微信扫码
与创始人交个朋友
我要投稿
pip install -U optimum[neural-compressor] intel-extension-for-transformers
def quantize(model_name: str, output_path: str, calibration_set: "datasets.Dataset"):
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
def preprocess_function(examples):
return tokenizer(examples["text"], padding="max_length", max_length=512, truncation=True)
vectorized_ds = calibration_set.map(preprocess_function, num_proc=10)
vectorized_ds = vectorized_ds.remove_columns(["text"])
quantizer = INCQuantizer.from_pretrained(model)
quantization_config = PostTrainingQuantConfig(approach="static", backend="ipex", domain="nlp")
quantizer.quantize(
quantization_config=quantization_config,
calibration_dataset=vectorized_ds,
save_directory=output_path,
batch_size=1,
)
tokenizer.save_pretrained(output_path)
# 数据集地址https://huggingface.co/datasets/allenai/qasper
from optimum.intel import IPEXModelmodel = IPEXModel.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")
inputs = tokenizer(sentences, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
# get the [CLS] token
embeddings = outputs[0][:, 0]
从上面的结果可以看出,通过量化后模型的延迟和吞吐量都有大幅提升。大家是不是学会的呢。下篇我们继续介绍一个相关工具,辅助我们高效管理RAG流程。
53AI,企业落地应用大模型首选服务商
产品:大模型应用平台+智能体定制开发+落地咨询服务
承诺:先做场景POC验证,看到效果再签署服务协议。零风险落地应用大模型,已交付160+中大型企业
2024-11-08
一篇大模型RAG最新综述
2024-11-08
微软GraphRAG 0.4.0&DRIFT图推理搜索更新
2024-11-08
小模型在RAG(Retrieval-Augmented Generation)系统中的应用:提升效率与可扩展性的新路径
2024-11-08
RAG评估:RAGChecker重磅发布!精准诊断RAG系统的全新细粒度框架!
2024-11-07
蚂蚁KAG框架核心功能研读
2024-11-07
为什么它是从PDF中解析数据的最佳工具?PDF文件解析新选择,构建LLM 大模型数据基础
2024-11-06
RAG vs ICL:AI大模型的记忆术和临场发挥,谁才是最强辅助?
2024-11-06
Long2RAG:评估长上下文与长形式检索增强生成与关键点召回
2024-07-18
2024-07-09
2024-07-09
2024-05-05
2024-05-19
2024-07-07
2024-06-20
2024-07-07
2024-07-08
2024-07-09
2024-11-06
2024-11-06
2024-11-05
2024-11-04
2024-10-27
2024-10-25
2024-10-21
2024-10-21