LightRAG is a knowledge-graph-based RAG approach proposed by HKU's Data Lab. Compared with GraphRAG, it is faster and cheaper to run.
1 Indexing stage: documents are split into chunks; the entities and edges found in them are extracted, vectorized separately, and stored in the vector knowledge base.
2 Retrieval stage: local and global keywords are extracted from the user query and used to retrieve entities and edge relations, respectively, from the vector knowledge base; the results are then combined with the related chunks and summarized (a minimal end-to-end sketch follows this list).
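The two stages map directly onto the library's insert and query calls. Below is a minimal end-to-end sketch (installation is covered next), assuming an OPENAI_API_KEY in the environment and the gpt_4o_mini_complete helper from lightrag.llm; check the helper name against your installed version.
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete  # assumed helper name; verify against your version

WORKING_DIR = "./lightrag_demo"
os.makedirs(WORKING_DIR, exist_ok=True)

# Indexing stage: chunk the document, extract entities/relations, vectorize and store them
rag = LightRAG(working_dir=WORKING_DIR, llm_model_func=gpt_4o_mini_complete)
with open("./book.txt", encoding="utf-8") as f:
    rag.insert(f.read())

# Retrieval stage: local/global keywords are extracted from the query and matched
# against entities and relations; "hybrid" combines both levels
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))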
# Install from source (recommended)
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
pip install -e .

# Or install from PyPI
pip install lightrag-hku
Installing from PyPI still requires manually installing several extra packages, which is inconvenient. Installing from source is recommended, since it pulls in all dependencies directly.
import os
import numpy as np
from lightrag.llm import openai_complete_if_cache, openai_embedding

# Wrap an OpenAI-compatible chat endpoint (here Upstage Solar) as the LLM function
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs
    )

# Wrap the matching OpenAI-compatible embedding endpoint
async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar"
    )
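These two wrappers are then passed to LightRAG, roughly as sketched below. The embedding_dim of 4096 and max_token_size of 8192 are assumptions for the Solar embedding model; replace them with the model's actual output dimension and context limit.
from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,      # the Solar chat wrapper defined above
    embedding_func=EmbeddingFunc(
        embedding_dim=4096,             # assumed dimension; set to the model's real output size
        max_token_size=8192,            # assumed context limit for the embedding model
        func=embedding_func,            # the Solar embedding wrapper defined above
    ),
)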
from lightrag import LightRAG
from lightrag.llm import hf_model_complete, hf_embedding
from transformers import AutoModel, AutoTokenizer
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with a Hugging Face model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Use a Hugging Face model for text generation
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Model name on Hugging Face
    # Use a Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
    ),
)
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with an Ollama model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Use an Ollama model for text generation
    llm_model_name='your_model_name',  # Your model name
    # Use an Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
)
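With Ollama-backed models the default context window is often too small for graph extraction. The sketch below enlarges it; the parameter names (llm_model_max_token_size, llm_model_kwargs with options.num_ctx) follow the LightRAG documentation for Ollama at the time of writing, but treat them as assumptions and verify against your installed version.
# Sketch: enlarging the Ollama context window (parameter names assumed; verify locally)
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,
    llm_model_name='your_model_name',
    llm_model_max_token_size=32768,
    llm_model_kwargs={"options": {"num_ctx": 32768}},
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"),
    ),
)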
If you switch models, rebuild the graph in a new working directory; otherwise some parameters will raise errors.
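One simple convention (hypothetical, not required by LightRAG) is to keep a separate working directory per model configuration:
import os

# Hypothetical naming scheme: one cache directory per model, so indexes built with
# different embedding dimensions never collide
WORKING_DIR = "./lightrag_cache_llama31_8b"
os.makedirs(WORKING_DIR, exist_ok=True)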
Query-time parameters such as the retrieval mode and top_k can be configured:
class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive"] = "global"
    only_need_context: bool = False
    response_type: str = "Multiple Paragraphs"
    # Number of top-k items to retrieve; corresponds to entities in "local" mode and relationships in "global" mode.
    top_k: int = 60
    # Number of tokens for the original chunks.
    max_token_for_text_unit: int = 4000
    # Number of tokens for the relationship descriptions
    max_token_for_global_context: int = 4000
    # Number of tokens for the entity descriptions
    max_token_for_local_context: int = 4000
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))
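As a sketch, the same question can be asked under each retrieval mode, or with only_need_context=True to inspect the retrieved context without generation (the values below are illustrative):
question = "What are the top themes in this story?"

# Compare the four retrieval modes on the same question
for mode in ["naive", "local", "global", "hybrid"]:
    print(f"--- {mode} ---")
    print(rag.query(question, param=QueryParam(mode=mode, top_k=30)))

# Return only the retrieved context, skipping LLM generation
context = rag.query(question, param=QueryParam(mode="hybrid", only_need_context=True))
print(context)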
Incremental insertion works just like the initial graph construction: simply call insert.
with open("./newText.txt") as f:
    rag.insert(f.read())
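The same call can be repeated for a batch of new files, for example (the paths are illustrative):
import glob

# Insert every new text file under ./new_docs into the existing graph
for path in glob.glob("./new_docs/*.txt"):
    with open(path, encoding="utf-8") as f:
        rag.insert(f.read())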
Besides building the graph from documents, LightRAG also supports adding entities, relationships, and raw chunks offline as a custom knowledge graph.
custom_kg = {
    "entities": [
        {
            "entity_name": "CompanyA",
            "entity_type": "Organization",
            "description": "A major technology company",
            "source_id": "Source1"
        },
        {
            "entity_name": "ProductX",
            "entity_type": "Product",
            "description": "A popular product developed by CompanyA",
            "source_id": "Source1"
        }
    ],
    "relationships": [
        {
            "src_id": "CompanyA",
            "tgt_id": "ProductX",
            "description": "CompanyA develops ProductX",
            "keywords": "develop, produce",
            "weight": 1.0,
            "source_id": "Source1"
        }
    ],
    "chunks": [
        {
            "content": "ProductX, developed by CompanyA, has revolutionized the market with its cutting-edge features.",
            "source_id": "Source1",
        },
        {
            "content": "PersonA is a prominent researcher at UniversityB, focusing on artificial intelligence and machine learning.",
            "source_id": "Source2",
        },
        {
            "content": "None",
            "source_id": "UNKNOWN",
        },
    ],
}
rag.insert_custom_kg(custom_kg)
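Once inserted, the custom entities and relationships are queried like any other graph content, for example (the question is illustrative):
# Query against the manually added knowledge
print(rag.query("What does CompanyA develop?", param=QueryParam(mode="local")))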
# Delete the entity with the given name
rag.delete_by_entity("Project Gutenberg")
● During graph construction, a text key-value pair is generated for each entity node and each relationship edge. The index key is a word or short phrase used for efficient retrieval, and the corresponding value is a text paragraph summarizing the relevant external data, which supports text generation (an illustrative sketch follows after this list).
● The incremental update algorithm lets new documents be added without rebuilding the graph, which makes LightRAG significantly more economical and convenient.
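For intuition only, such a key-value pair might look roughly like the following; the field names are illustrative, not LightRAG's internal schema.
# Illustrative only: the shape of a key-value pair attached to one relationship edge
kv_pair = {
    "index_key": "develop, produce",  # short keywords used for retrieval
    "value": "CompanyA is a major technology company that develops ProductX, "
             "a popular product known for its cutting-edge features.",  # summarized paragraph used during generation
}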