On April 18 local time, Meta announced its latest large model, Llama 3, on its official website. So far, Llama 3 is available in two smaller-parameter versions, 8B and 70B, each with an 8K context window. Meta says that higher-quality training data and instruction fine-tuning give Llama 3 a "significant improvement" over Llama 2.
The Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases, and they outperform many open-source chat models on common benchmarks.
The open-source community responded quickly: ollama and LlamaIndex announced Llama 3 support right away, and LangChain announced that the new Llama 3 8B and 70B models can be tried out in the LangSmith Playground. Below we show how to use Llama 3 with ollama and LlamaIndex.
To run Llama 3 with ollama, simply open a terminal and execute ollama run llama3.
Example using curl:
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
For examples of the other endpoints, see the API documentation.
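For reference, the same request can be issued from Python. Below is a minimal sketch using the requests library (an assumption; any HTTP client works), with streaming disabled so the server returns a single JSON object:

# Python equivalent of the curl call above.
# "stream": False makes ollama return one JSON object instead of JSON lines.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
)
print(resp.json()["response"])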
Instruct is the version fine-tuned for chat/dialogue use cases.
Example: ollama run llama3
ollama run llama3:70b
Pre-trained is the base model.
Example: ollama run llama3:text
ollama run llama3:70b-text
This guide uses Llama-3-8B-Instruct as an example to show how to use Llama 3 with LlamaIndex.
!pip install llama-index
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface
To use Llama 3 from the official repository, you need an authorized Hugging Face account and your Hugging Face token.
hf_token = "hf_..."
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token=hf_token,
)

# Llama 3 ends an assistant turn with the special <|eot_id|> token, so
# generation should stop on either it or the regular EOS token.
stopping_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
HuggingFaceLLM
When setting up the LLM, you can load either the full-precision model or a 4-bit quantized version.
# generate_kwargs parameters are taken from https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# Optional quantization to 4bit
# import torch
# from transformers import BitsAndBytesConfig

# quantization_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_compute_dtype=torch.float16,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_use_double_quant=True,
# )

llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,  # comment this line and uncomment below to use 4bit
        # "quantization_config": quantization_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
## You can deploy the model on HF Inference Endpoint and use it
# from llama_index.llms.huggingface import HuggingFaceInferenceAPI

# llm = HuggingFaceInferenceAPI(
#     model_name="",
#     token=''
# )
response = llm.complete("Who is Paul Graham?")
print(response)
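Streaming also works. Here is a minimal sketch using LlamaIndex's stream_complete, which yields incremental deltas as the model generates:

# Print tokens as they are generated instead of waiting for the full text.
for chunk in llm.stream_complete("Who is Paul Graham?"):
    print(chunk.delta, end="")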
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are CEO of MetaAI"),
    ChatMessage(role="user", content="Introduce Llama3 to the world."),
]
response = llm.chat(messages)
print(response)
Download the data
!wget "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt" -O "paul_graham_essay.txt"
Load the data
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["paul_graham_essay.txt"]
).load_data()
Configure the embedding model
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
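As a quick sanity check (a hypothetical snippet, not part of the original cookbook), you can embed a string and inspect the vector's size:

# bge-small-en-v1.5 produces 384-dimensional embeddings.
vec = embed_model.get_text_embedding("Hello, world!")
print(len(vec))  # 384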
Set the default LLM and embedding model
from llama_index.core import Settings
# bge embedding model
Settings.embed_model = embed_model
# Llama-3-8B-Instruct model
Settings.llm = llm
Create the index
index = VectorStoreIndex.from_documents(
    documents,
)
Create the query engine. Setting similarity_top_k=3 retrieves the three most similar chunks as context for each query.
query_engine = index.as_query_engine(similarity_top_k=3)
Query
response = query_engine.query("What did paul graham do growing up?")
print(response)
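To see which chunks grounded the answer, you can inspect the response's source nodes (a minimal sketch using LlamaIndex's standard response object):

# Each source node carries a retrieved chunk and its similarity score.
for node in response.source_nodes:
    print(node.score, node.node.get_content()[:100])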
References:
llama3 model card: https://ollama.com/library/llama3
Llama 3 Cookbook (LlamaIndex): https://docs.llamaindex.ai/en/latest/examples/cookbooks/llama3_cookbook/