LightRAG is a knowledge-graph-based RAG approach proposed by HKU's Data Lab. Compared with GraphRAG, it is faster and cheaper to run.
1 Indexing stage: documents are split into chunks; the entities and edges found in them are extracted, vectorized separately, and stored in the vector knowledge base.
2 Retrieval stage: local and global keywords are extracted from the user query and used to retrieve entities and edge relations, respectively, from the vector knowledge base; the results are then combined with the related chunks and summarized (a minimal end-to-end sketch follows this list).
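The two stages map directly onto the library's insert and query calls. Below is a minimal end-to-end sketch (installation is covered next), assuming an OPENAI_API_KEY in the environment and the gpt_4o_mini_complete helper from lightrag.llm; check the helper name against your installed version.
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete  # assumed helper name; verify against your version

WORKING_DIR = "./lightrag_demo"
os.makedirs(WORKING_DIR, exist_ok=True)

# Indexing stage: chunk the document, extract entities/relations, vectorize and store them
rag = LightRAG(working_dir=WORKING_DIR, llm_model_func=gpt_4o_mini_complete)
with open("./book.txt", encoding="utf-8") as f:
    rag.insert(f.read())

# Retrieval stage: local/global keywords are extracted from the query and matched
# against entities and relations; "hybrid" combines both levels
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))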
# Install from source (recommended)
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
pip install -e .

# Or install from PyPI
pip install lightrag-hku
Installing from PyPI still requires manually installing several extra packages, which is inconvenient. Installing from source is recommended, since it pulls in all dependencies directly.
import os
import numpy as np
from lightrag.llm import openai_complete_if_cache, openai_embedding

# Wrap an OpenAI-compatible chat endpoint (here Upstage Solar) as the LLM function
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs
    )

# Wrap the matching OpenAI-compatible embedding endpoint
async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar"
    )
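These two wrappers are then passed to LightRAG, roughly as sketched below. The embedding_dim of 4096 and max_token_size of 8192 are assumptions for the Solar embedding model; replace them with the model's actual output dimension and context limit.
from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,      # the Solar chat wrapper defined above
    embedding_func=EmbeddingFunc(
        embedding_dim=4096,             # assumed dimension; set to the model's real output size
        max_token_size=8192,            # assumed context limit for the embedding model
        func=embedding_func,            # the Solar embedding wrapper defined above
    ),
)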
from lightrag import LightRAG
from lightrag.llm import hf_model_complete, hf_embedding
from transformers import AutoModel, AutoTokenizer
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with a Hugging Face model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Use a Hugging Face model for text generation
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Model name on Hugging Face
    # Use a Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
    ),
)
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with an Ollama model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Use an Ollama model for text generation
    llm_model_name='your_model_name',  # Your model name
    # Use an Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
)
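With Ollama-backed models the default context window is often too small for graph extraction. The sketch below enlarges it; the parameter names (llm_model_max_token_size, llm_model_kwargs with options.num_ctx) follow the LightRAG documentation for Ollama at the time of writing, but treat them as assumptions and verify against your installed version.
# Sketch: enlarging the Ollama context window (parameter names assumed; verify locally)
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,
    llm_model_name='your_model_name',
    llm_model_max_token_size=32768,
    llm_model_kwargs={"options": {"num_ctx": 32768}},
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"),
    ),
)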
If you switch models, rebuild the graph in a new working directory; otherwise some parameters will raise errors.
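One simple convention (hypothetical, not required by LightRAG) is to keep a separate working directory per model configuration:
import os

# Hypothetical naming scheme: one cache directory per model, so indexes built with
# different embedding dimensions never collide
WORKING_DIR = "./lightrag_cache_llama31_8b"
os.makedirs(WORKING_DIR, exist_ok=True)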
Query-time parameters such as the retrieval mode and top_k can be configured:
class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive"] = "global"
    only_need_context: bool = False
    response_type: str = "Multiple Paragraphs"
    # Number of top-k items to retrieve; corresponds to entities in "local" mode and relationships in "global" mode.
    top_k: int = 60
    # Number of tokens for the original chunks.
    max_token_for_text_unit: int = 4000
    # Number of tokens for the relationship descriptions
    max_token_for_global_context: int = 4000
    # Number of tokens for the entity descriptions
    max_token_for_local_context: int = 4000
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))
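As a sketch, the same question can be asked under each retrieval mode, or with only_need_context=True to inspect the retrieved context without generation (the values below are illustrative):
question = "What are the top themes in this story?"

# Compare the four retrieval modes on the same question
for mode in ["naive", "local", "global", "hybrid"]:
    print(f"--- {mode} ---")
    print(rag.query(question, param=QueryParam(mode=mode, top_k=30)))

# Return only the retrieved context, skipping LLM generation
context = rag.query(question, param=QueryParam(mode="hybrid", only_need_context=True))
print(context)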
Incremental insertion works just like the initial graph construction: simply call insert.
with open("./newText.txt") as f:
    rag.insert(f.read())
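The same call can be repeated for a batch of new files, for example (the paths are illustrative):
import glob

# Insert every new text file under ./new_docs into the existing graph
for path in glob.glob("./new_docs/*.txt"):
    with open(path, encoding="utf-8") as f:
        rag.insert(f.read())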
Besides building the graph from documents, LightRAG also supports adding entities, relationships, and raw chunks offline as a custom knowledge graph.
custom_kg = {
    "entities": [
        {
            "entity_name": "CompanyA",
            "entity_type": "Organization",
            "description": "A major technology company",
            "source_id": "Source1"
        },
        {
            "entity_name": "ProductX",
            "entity_type": "Product",
            "description": "A popular product developed by CompanyA",
            "source_id": "Source1"
        }
    ],
    "relationships": [
        {
            "src_id": "CompanyA",
            "tgt_id": "ProductX",
            "description": "CompanyA develops ProductX",
            "keywords": "develop, produce",
            "weight": 1.0,
            "source_id": "Source1"
        }
    ],
    "chunks": [
        {
            "content": "ProductX, developed by CompanyA, has revolutionized the market with its cutting-edge features.",
            "source_id": "Source1",
        },
        {
            "content": "PersonA is a prominent researcher at UniversityB, focusing on artificial intelligence and machine learning.",
            "source_id": "Source2",
        },
        {
            "content": "None",
            "source_id": "UNKNOWN",
        },
    ],
}
rag.insert_custom_kg(custom_kg)
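Once inserted, the custom entities and relationships are queried like any other graph content, for example (the question is illustrative):
# Query against the manually added knowledge
print(rag.query("What does CompanyA develop?", param=QueryParam(mode="local")))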
# Delete the entity with the given name
rag.delete_by_entity("Project Gutenberg")
● During graph construction, a text key-value pair is generated for each entity node and each relationship edge. The index key is a word or short phrase used for efficient retrieval, and the corresponding value is a text paragraph summarizing the relevant external data, which supports text generation (an illustrative sketch follows after this list).
● The incremental update algorithm lets new documents be added without rebuilding the graph, which makes LightRAG significantly more economical and convenient.
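For intuition only, such a key-value pair might look roughly like the following; the field names are illustrative, not LightRAG's internal schema.
# Illustrative only: the shape of a key-value pair attached to one relationship edge
kv_pair = {
    "index_key": "develop, produce",  # short keywords used for retrieval
    "value": "CompanyA is a major technology company that develops ProductX, "
             "a popular product known for its cutting-edge features.",  # summarized paragraph used during generation
}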