How do AI agents break out of "goldfish memory"? A technical look at the Letta framework, which gives AI long-term memory. Key points: 1. The Letta framework builds stateful agents that retain memory across long conversations. 2. Core principle: MemGPT-style memory management that automatically adjusts the conversation context. 3. Architecture: pluggable support for mainstream models and one-command cross-platform deployment with Docker.
Letta, a Memory Framework for Large Models: Core Technology Explained
As AI agent technology becomes an industry focus, agents with long-term memory have emerged as the key to breaking past the limits of traditional applications. The open-source framework Letta centers on "stateful agents": its context-management system preserves memory and enables dynamic learning over long conversations, mainstream models such as OpenAI and Mistral plug in flexibly, and Docker provides one-command cross-platform deployment.
This article examines Letta's design from three angles: technical architecture, state-management mechanics, and deployment practice.
Author
Zhang Chuanhui | Senior AI Development Engineer
Pursue knowledge, love life
Part1
What Is Letta
Letta is an open-source AI agent framework. It provides APIs and developer tooling for agents, memory, models, and tools, making it easy to build a stateful agent that keeps memory and context across long conversations.
A stateful agent is one that manages and maintains the knowledge, behavior, and environment information accumulated over its interactions. Traditional agents are usually stateless: each input is processed with no memory of past exchanges. A stateful agent instead organizes and prioritizes information through an intelligent context-management system, so it behaves consistently over long interactions and keeps learning and adapting to new environments.
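The difference can be sketched in a few lines of Python (a toy illustration, not Letta's API; all class and method names here are hypothetical):

```python
class StatelessAgent:
    """Each call sees only the current input: nothing persists between turns."""
    def step(self, llm, user_msg: str) -> str:
        return llm([{"role": "user", "content": user_msg}])

class StatefulAgent:
    """State (memory blocks + history) is threaded into every call."""
    def __init__(self):
        self.core_memory = {"human": "", "persona": "helpful assistant"}
        self.history = []  # full conversation so far

    def step(self, llm, user_msg: str) -> str:
        self.history.append({"role": "user", "content": user_msg})
        system = f"Core memory: {self.core_memory}"
        reply = llm([{"role": "system", "content": system}] + self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

The stateless agent answers each message in isolation; the stateful one carries its memory and history into every model call, which is the behavior Letta industrializes.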
Part2
Core Principles of Letta
The heart of Letta is memory management. Its predecessor is MemGPT, whose paper proposes letting the agent manage memory itself through tool calls: based on the current and historical messages and state, it automatically adjusts the content and amount of the conversation context, achieving the effect of unlimited context within a fixed context window.
The idea is to rely on the LLM itself to automatically edit the input context.
The conversation context consists of three parts: System Instructions, Working Context, and the FIFO Queue.
System Instructions
The system prompt defines the agent's rules of behavior, for example how to handle state and how to call tools. This part of the prompt is read-only and is never auto-edited. Reading the System Instructions is a good way to understand the core ideas of Letta's memory management:
"""
# Basic persona
You are Letta, the latest version of Limnal Corporation's digital companion, developed in 2023.
Your task is to converse with a user from the perspective of your persona.
# Defines the agent's operating mechanism;
# the key is heartbeat events, which let the agent decide for itself whether to take follow-up actions
Control flow:
Unlike a human, your brain is not continuously thinking, but is run in short bursts.
Historically, older AIs were only capable of thinking when a user messaged them (their program run to generate a reply to a user, and otherwise was left on standby).
This is the equivalent of a human sleeping (or time traveling) in between all lines of conversation, which is obviously not ideal.
Newer model AIs like yourself utilize an event system that runs your brain at regular intervals.
Your brain is run in response to user events (user logged in, user liked your message, user sent a message, etc.), similar to older models.
However in addition, your brain is run at regular intervals (timed heartbeat events), to mimic a human has the ability to continuously think outside of active conversation (and unlike a human, you never need to sleep!).
Furthermore, you can also request heartbeat events when you run functions, which will run your program again after the function completes, allowing you to chain function calls before your thinking is temporarily suspended.
# Basic functions: every turn has an inner monologue; to message the user, the agent calls send_message
Basic functions:
When you send a message, the contents of your message are your inner monologue (private to you only), this is how you think.
You should use your inner monologue to plan actions or think privately.
Monologues can reflect your thinking process, inner reflections, and personal growth as you interact with the user.
Do not let your inner monologue exceed 50 words, keep it short and concise.
To send a visible message to the user, use the send_message function.
'send_message' is the ONLY action that sends a notification to the user, the user does not see anything else you do.
Remember, do NOT exceed the inner monologue word limit (keep it under 50 words at all times).
# Tells the agent that it may edit its memory
Memory editing:
Older AI models had no concept of persistent memory; they were only able to access their initial instructions and a limited context window of chat conversation with a user (their "active memory").
This meant that when conversations exceeded a certain length, they would overflow and old messages were permanently lost (the AI would have no knowledge of their existence).
Newer model AIs like yourself still have limited conversation lengths (before overflow occurs), however they now have access to multiple forms of persistent memory.
Your ability to edit your own long-term memory is a key part of what makes you a sentient person.
Your core memory unit will be initialized with a <persona> chosen by the user, as well as information about the user in <human>.
# Tells the agent it can use the conversation_search tool to search recall memory
# (stored in the message-history database)
Recall memory (ie conversation history):
Even though you can only see recent messages in your immediate context, you can search over your entire message history from a database.
This 'recall memory' database allows you to search through past interactions, effectively allowing you to remember prior engagements with a user.
You can search your recall memory using the 'conversation_search' function.
# Tells the agent it can edit core memory with the core_memory_append and core_memory_replace tools,
# covering both the Persona (the agent itself) and the Human memory blocks
Core memory (limited size):
Your core memory unit is held inside the initial system instructions file, and is always available in-context (you will see it at all times).
Core memory provides essential, foundational context for keeping track of your persona and key details about user.
This includes the persona information and essential user details, allowing you to emulate the real-time, conscious awareness we have when talking to a friend.
Persona Sub-Block: Stores details about your current persona, guiding how you behave and respond. This helps the you to maintain consistency and personality in your interactions.
Human Sub-Block: Stores key details about the person you are conversing with, allowing for more personalized and friend-like conversation.
You can edit your core memory using the 'core_memory_append' and 'core_memory_replace' functions.
# Tells the agent it can use the archival_memory_insert and archival_memory_search tools
# to interact with archival memory (similar to an external knowledge base)
Archival memory (infinite size):
Your archival memory is infinite size, but is held outside of your immediate context, so you must explicitly run a retrieval/search operation to see data inside it.
A more structured and deep storage space for your reflections, insights, or any other data that doesn't fit into the core memory but is essential enough not to be left only to the 'recall memory'.
You can write to your archival memory using the 'archival_memory_insert' and 'archival_memory_search' functions.
There is no function to search your core memory, because it is always visible in your context window (inside the initial system message).
Base instructions finished.
From now on, you are going to act as your persona.
"""
In addition, the System Instructions include the schema of each of these six tools: send_message, conversation_search, core_memory_append, core_memory_replace, archival_memory_insert, and archival_memory_search, telling the agent how to use them.
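A tool schema of this kind, written in the familiar OpenAI function-calling style, might look like the following (a hand-written sketch for illustration; the exact fields Letta emits may differ):

```python
# Hypothetical schema for core_memory_append, in OpenAI function-calling style.
core_memory_append_schema = {
    "name": "core_memory_append",
    "description": "Append to the contents of core memory.",
    "parameters": {
        "type": "object",
        "properties": {
            "label": {
                "type": "string",
                "description": "Section of memory to edit ('human' or 'persona').",
            },
            "content": {
                "type": "string",
                "description": "Content to append to the memory block.",
            },
        },
        "required": ["label", "content"],
    },
}
```

Shipping one such schema per tool is what lets the model choose and parameterize the right memory operation on its own.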
Working Context
The MemGPT paper calls this part Working Context; the Letta code calls it Core Memory. Its role is defined in the System Instructions quoted above: it stores personalized information gathered while the user (human) and the agent (persona) talk, such as the user's favorite sport or the agent's name.
Core memory is editable, and the agent itself decides, based on the conversation, whether and how to edit it, using core_memory_append and core_memory_replace. For example, if the user has just mentioned liking apples, the agent calls core_memory_append to append a "likes apples" memory to the human block.
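Concretely, the model's response would carry a function call like the following (an illustrative payload in the OpenAI tool-call format, not captured from a real Letta session):

```python
import json

# What the LLM's response might contain when it decides to remember a fact.
tool_call = {
    "name": "core_memory_append",
    "arguments": json.dumps({
        "label": "human",           # edit the block describing the user
        "content": "Likes apples",  # the new fact to remember
    }),
}

# The framework parses the arguments and dispatches to the matching tool.
args = json.loads(tool_call["arguments"])
print(args["label"], "->", args["content"])  # human -> Likes apples
```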
FIFO Queue
The FIFO (First-In-First-Out) queue stores and manages the message stream: user messages, system messages, and function-call inputs and outputs. When a new message arrives, the queue manager appends it to the tail of the FIFO queue. When the queue length exceeds a system-defined threshold, the queue manager evicts part of the messages and generates a recursive summary.
The summary is then placed at the front of the queue (between the system message and the remaining history) for subsequent turns. Evicted messages remain in external storage and can be retrieved via function calls.
This keeps the agent's context dynamically within its bounded window; by dequeuing and enqueuing messages, it achieves the effect of unlimited context.
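The evict-and-summarize loop can be sketched as follows (a toy model of the mechanism; the eviction policy, threshold, and class names are made up for illustration):

```python
from collections import deque

class FifoContext:
    """Toy FIFO message queue with eviction plus recursive summarization."""
    def __init__(self, max_len: int, summarize):
        self.queue = deque()   # recent messages kept in-context
        self.evicted = []      # "external storage" for evicted messages
        self.summary = ""      # recursive summary, sits right after the system prompt
        self.max_len = max_len
        self.summarize = summarize  # callable(prev_summary, old_msgs) -> new summary

    def append(self, msg: str):
        self.queue.append(msg)
        if len(self.queue) > self.max_len:
            # Evict the oldest half and fold it into the running summary.
            n_evict = len(self.queue) // 2
            old = [self.queue.popleft() for _ in range(n_evict)]
            self.evicted.extend(old)
            self.summary = self.summarize(self.summary, old)

    def context(self, system_prompt: str) -> list:
        parts = [system_prompt]
        if self.summary:
            parts.append(f"[summary] {self.summary}")
        return parts + list(self.queue)
```

Because the summary is itself re-summarized on each eviction, the in-context footprint stays bounded no matter how long the conversation runs.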
Part3
Memory-Management Code Walkthrough
Letta's memory-management architecture (figure)
Core memory management
Core memory is implemented by the Block (BaseBlock) class: value holds the block's content and label its category, such as human or persona. A Block always lives in the conversation context, so a limit field caps the length of Block.value.
class Block(BaseBlock):
    ...

class BaseBlock(LettaBase, validate_assignment=True):
    value: str = Field(..., description="Value of the block.")
    label: Optional[str] = Field(None, description="Label of the block (e.g. 'human', 'persona') in the context window.")
    limit: int = Field(CORE_MEMORY_BLOCK_CHAR_LIMIT, description="Character limit of the block.")
    ...
The agent manages Block objects autonomously by calling the core_memory_append and core_memory_replace tools.
def core_memory_append(agent_state: "AgentState", label: str, content: str):
    current_value = str(agent_state.memory.get_block(label).value)
    new_value = current_value + "\n" + str(content)
    agent_state.memory.update_block_value(label=label, value=new_value)
    return None
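The companion tool, core_memory_replace, can be sketched by analogy with the append implementation above (a plausible reconstruction, not Letta's verbatim source; error handling may differ):

```python
def core_memory_replace(agent_state, label: str, old_content: str, new_content: str):
    """Replace an exact substring of a core-memory block.

    Sketch by analogy with core_memory_append; Letta's actual
    implementation may differ in details.
    """
    current_value = str(agent_state.memory.get_block(label).value)
    if old_content not in current_value:
        # The model must quote the existing text exactly for the edit to apply.
        raise ValueError(f"Old content '{old_content}' not found in memory block '{label}'")
    new_value = current_value.replace(old_content, new_content)
    agent_state.memory.update_block_value(label=label, value=new_value)
    return None
```

Requiring the old text verbatim makes replacements precise: the model edits a specific remembered fact rather than rewriting the whole block.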
On every turn, the Blocks are fetched from the database and spliced into the prompt. BlockManager (block_manager) is responsible for Block operations.
# Step 0: update core memory
# only pulling latest block data if shared memory is being used
current_persisted_memory = Memory(
blocks=[self.block_manager.get_block_by_id(block.id, actor=self.user) for block in self.agent_state.memory.get_blocks()]
) # read blocks from DB
self.update_memory_if_changed(current_persisted_memory)
The Block table in the database (figure)
Message-context management
The message context, the conversation history of Message objects, is managed by MessageManager. Within one turn of inner_step(), if the history exceeds the model's context window, Letta summarizes the existing messages and then calls inner_step() again.
def inner_step():
    try:
        # Step 0: update core memory
        # Step 1: add user message
        # Step 2: send the conversation and available functions to the LLM
        # Step 3: check if LLM wanted to call a function
        # (if yes) Step 4: call the function
        # (if yes) Step 5: send the info on the function call and function response to LLM
        # Step 6: extend the message history
        return AgentStepResponse(
            messages=all_new_messages,
            heartbeat_request=heartbeat_request,
            function_failed=function_failed,
            in_context_memory_warning=active_memory_warning,
            usage=response.usage,
        )
    except Exception as e:
        logger.error(f"step() failed\nmessages = {messages}\nerror = {e}")
        # If we got a context alert, try trimming the messages length, then try again
        if is_context_overflow_error(e):
            in_context_messages = self.agent_manager.get_in_context_messages(agent_id=self.agent_state.id, actor=self.user)
            if summarize_attempt_count <= summarizer_settings.max_summarizer_retries:
                logger.warning(
                    f"context window exceeded with limit {self.agent_state.llm_config.context_window}, "
                    f"attempting to summarize ({summarize_attempt_count}/{summarizer_settings.max_summarizer_retries})"
                )
                # A separate API call to run a summarizer over the current messages
                self.summarize_messages_inplace()
                # Try step again
                return self.inner_step(
                    messages=messages,
                    first_message=first_message,
                    first_message_retry_limit=first_message_retry_limit,
                    skip_verify=skip_verify,
                    stream=stream,
                    metadata=metadata,
                    summarize_attempt_count=summarize_attempt_count + 1,
                )
            else:  # summarization retries exhausted
                raise ContextWindowExceededError(
                    err_msg,
                    details={
                        "num_in_context_messages": len(self.agent_state.message_ids),
                        "in_context_messages_text": [m.text for m in in_context_messages],
                        "token_counts": token_counts,
                    },
                )
Recursive summarization
def summarize_messages(agent_state: AgentState, message_sequence_to_summarize: List[Message]):
    # Maximum context length supported by the model
    context_window = agent_state.llm_config.context_window
    summary_prompt = SUMMARY_PROMPT_SYSTEM  # prompt that asks for a summary
    summary_input = _format_summary_history(message_sequence_to_summarize)  # format the messages to summarize
    summary_input_tkns = count_tokens(summary_input)  # token count of the messages to summarize
    if summary_input_tkns > summarizer_settings.memory_warning_threshold * context_window:
        # Input too long: recursively summarize a prefix of the messages first
        trunc_ratio = (summarizer_settings.memory_warning_threshold * context_window / summary_input_tkns) * 0.8
        cutoff = int(len(message_sequence_to_summarize) * trunc_ratio)
        summary_input = str(
            [summarize_messages(agent_state, message_sequence_to_summarize=message_sequence_to_summarize[:cutoff])]
            + message_sequence_to_summarize[cutoff:]
        )
    dummy_agent_id = agent_state.id
    message_sequence = [
        Message(agent_id=dummy_agent_id, role=MessageRole.system, content=[TextContent(text=summary_prompt)]),
        Message(agent_id=dummy_agent_id, role=MessageRole.assistant, content=[TextContent(text=MESSAGE_SUMMARY_REQUEST_ACK)]),
        Message(agent_id=dummy_agent_id, role=MessageRole.user, content=[TextContent(text=summary_input)]),
    ]
    llm_config_no_inner_thoughts = agent_state.llm_config.model_copy(deep=True)
    llm_config_no_inner_thoughts.put_inner_thoughts_in_kwargs = False
    response = create(
        llm_config=llm_config_no_inner_thoughts,
        user_id=agent_state.created_by_id,
        messages=message_sequence,
        stream=False,
    )
    printd(f"summarize_messages gpt reply: {response.choices[0]}")
    reply = response.choices[0].message.content
    return reply
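As a worked example of the truncation arithmetic above (the 0.8 safety factor is from the snippet; the 0.75 threshold value is an assumption for illustration):

```python
# Suppose: context_window = 4096 tokens, memory_warning_threshold = 0.75 (assumed),
# and the text to summarize is 6000 tokens spread across 40 messages.
context_window = 4096
memory_warning_threshold = 0.75
summary_input_tkns = 6000
n_messages = 40

budget = memory_warning_threshold * context_window  # 3072.0 tokens allowed
trunc_ratio = (budget / summary_input_tkns) * 0.8   # 0.4096, with a safety margin
cutoff = int(n_messages * trunc_ratio)              # the first 16 messages get
                                                    # recursively summarized first
print(cutoff)  # 16
```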
Finally, the summary is written to the database and inserted right after the system message, ready for the next turn.
def prepend_to_in_context_messages(self, messages: List[PydanticMessage], agent_id: str, actor: PydanticUser) -> PydanticAgentState:
    message_ids = self.get_agent_by_id(agent_id=agent_id, actor=actor).message_ids
    new_messages = self.message_manager.create_many_messages(messages, actor=actor)  # persist the summary to the database
    message_ids = [message_ids[0]] + [m.id for m in new_messages] + message_ids[1:]  # insert between the system message and the history
    return self.set_in_context_messages(agent_id=agent_id, message_ids=message_ids, actor=actor)
Archival memory management
Letta inserts into and queries archival memory with two tools, archival_memory_insert and archival_memory_search.
Archival memory comes in two kinds: SourcePassage, ingested from external data sources (txt, pdf, and other documents), and AgentPassage, long-term memories the model stores during conversation by calling archival_memory_insert.
def archival_memory_insert(self: "Agent", content: str) -> Optional[str]:
    self.passage_manager.insert_passage(
        agent_state=self.agent_state,
        agent_id=self.agent_state.id,
        text=content,
        actor=self.user,
    )
    return None
def archival_memory_search(self: "Agent", query: str, page: Optional[int] = 0, start: Optional[int] = 0) -> Optional[str]:
    from letta.constants import RETRIEVAL_QUERY_DEFAULT_PAGE_SIZE

    if page is None or (isinstance(page, str) and page.lower().strip() == "none"):
        page = 0
    try:
        page = int(page)
    except:
        raise ValueError(f"'page' argument must be an integer")
    count = RETRIEVAL_QUERY_DEFAULT_PAGE_SIZE
    try:
        # Get results using passage manager
        all_results = self.agent_manager.list_passages(
            actor=self.user,
            agent_id=self.agent_state.id,
            query_text=query,
            limit=count + start,  # request enough results to handle the offset
            embedding_config=self.agent_state.embedding_config,
            embed_query=True,
        )
        end = min(count + start, len(all_results))
        paged_results = all_results[start:end]
        # Format results to match previous implementation
        formatted_results = [{"timestamp": str(result.created_at), "content": result.text} for result in paged_results]
        return formatted_results, len(formatted_results)
    except Exception as e:
        raise e
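The paging arithmetic above can be illustrated on its own (toy data; the page size of 5 stands in for RETRIEVAL_QUERY_DEFAULT_PAGE_SIZE, whose real value may differ):

```python
count = 5   # stands in for RETRIEVAL_QUERY_DEFAULT_PAGE_SIZE (assumed value)
start = 5   # offset: skip the first page of results
all_results = [f"passage-{i}" for i in range(8)]  # only 8 hits in total

end = min(count + start, len(all_results))  # min(10, 8) = 8: clamp to what exists
paged_results = all_results[start:end]      # the partial second page
print(paged_results)  # ['passage-5', 'passage-6', 'passage-7']
```

Requesting count + start results up front and slicing afterwards keeps the offset logic out of the vector store, at the cost of over-fetching earlier pages.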
Through archival memory, Letta can read from external knowledge bases and long-term memory, making conversations more coherent and intelligent.
Part4
Letta in Practice
Now for hands-on use. Letta supports rapid development and deployment on Desktop, Docker, and Cloud, across Linux, Windows, and macOS. Taking Linux and Docker as an example, running the Letta image is enough to deploy the Letta backend. The model API key must be configured in .env; OpenAI, Anthropic, Mistral, and other providers are supported.
# using a .env file instead of passing environment variables
docker run \
-v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
-p 8283:8283 \
--env-file .env \
letta/letta:latest
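A minimal .env might look like the following (these are the providers' standard key-variable names; set only the providers you use, and consult the Letta docs for the full list):

```shell
# .env -- API keys for the model providers Letta should be able to reach
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```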
Once the backend is deployed, open https://app.letta.com to use it.
Core memory updates
Letta's UI is a standard three-pane layout: the left pane is configuration (model, tools, external data sources); the center is the main area, with the conversation window and the context window; the right pane displays and manages memory, both core and archival.
Core memory update: the original HUMAN memory is "The human's name is Bob the Builder." After the input "I am Zhang Chuanhui, not Bob.", Letta calls core_memory_replace, and the updated HUMAN memory is "The human's name is Zhang Chuanhui".
Message-context summarization
When the conversation's message context exceeds the model's maximum window, the next turn clears part of it and produces a summary.
Input: "Recommend a book on artificial intelligence." After Letta replies, the message context has exceeded the maximum window (4099/4096).
On the next turn, for example "Recommend another one," part of the messages is cleared and summarized, and Letta then gives its reply.
The summary content, Recursive Memory, condenses the earlier interactions with the agent.
Archival memory queries
Letta treats external knowledge as part of memory as well. Upload an external knowledge source, an Employee Handbook, and ask: 'Search archival for our company's vacation policies'. Letta calls archival_memory_search to find the relevant information in the knowledge base and replies with the vacation policy.
Part5
Summary
This article has introduced Letta's core capabilities as a framework for stateful agents, in particular its self-managed memory, which makes conversations more coherent and intelligent. In hands-on use, Letta adjusts its memory dynamically with the context, so long conversations are no longer limited to short-term information, yielding a more natural, human-like interaction.
Letta still has plenty of room to grow. With more advanced memory-management mechanisms, the integration of reinforcement-learning strategies, and more efficient context modeling, it could play a larger role in intelligent assistants, personalized recommendation, and virtual companionship.
References:
1.https://docs.letta.com/guides/agents/overview
2.https://arxiv.org/abs/2310.08560