01.
Overview
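A personal assistant tool that uses Retrieval-Augmented Generation (RAG) and the LLaMA-3.1-8B-Instant large language model (LLM) to analyze PDF documents. It combines machine learning with a retrieval-based system: passages relevant to a question are first retrieved from the PDFs, and the LLM then generates an answer grounded in those passages.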
02.
Origins of the RAG Architecture
03.
RAG Architecture Overview
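The three RAG stages (retrieve, augment, generate) can be illustrated with a toy pipeline. This is a minimal sketch, not the tool's implementation: it assumes a bag-of-words term-frequency vector as a stand-in for a neural embedding model, and it stops before the generation step, which in this tool would be a call to LLaMA-3.1-8B-Instant. The sample chunks and question are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real system would use a neural embedding model instead.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Retrieval step: rank every chunk by similarity to the query.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    # Augmentation step: place the retrieved chunks in front of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "The invoice total is 420 euros, due on 1 March.",
    "The contract term is twelve months.",
    "Payment is made by bank transfer.",
]
question = "What is the invoice total?"
prompt = build_prompt(question, retrieve(question, chunks, top_k=1))
print(prompt)
# Generation step (not shown): send `prompt` to the LLM.
```

Only the retrieved chunk reaches the prompt, which is what keeps the LLM's answer tied to the document rather than to its general training data.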
04.
Implementation Details
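Before PDFs can be indexed, their text is split into overlapping chunks; the code in this article imports llama_index's SentenceSplitter for that, whose chunk_size and chunk_overlap parameters control the window. The sketch below only illustrates what those two parameters mean. It is a simplified stand-in, not SentenceSplitter's algorithm: it counts whitespace-separated words rather than tokens and ignores sentence boundaries. The function name chunk_words is hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words that share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(1, 13))  # w1 ... w12
for c in chunk_words(text, chunk_size=5, chunk_overlap=2):
    print(c)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.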
05.
Installation
!conda install -n pa \
pytorch \
torchvision \
torchaudio \
cpuonly \
-c pytorch \
-c conda-forge \
--yes
%pip install -U ipywidgets
%pip install -U requests
%pip install -U llama-index
%pip install -U llama-index-embeddings-huggingface
%pip install -U llama-index-llms-groq
%pip install -U groq
%pip install -U gradio
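Because conda installs into the environment named pa while %pip installs into whatever environment the notebook kernel runs in, it can be worth checking that everything is importable from the current kernel. This is a minimal sketch using importlib.util.find_spec; the mapping from install names to import names is an assumption based on the packages listed above (for example, llama-index is imported as llama_index).

```python
import importlib.util

# Install name -> top-level import name (they differ for some packages).
REQUIRED = {
    "pytorch": "torch",
    "ipywidgets": "ipywidgets",
    "requests": "requests",
    "llama-index": "llama_index",
    "groq": "groq",
    "gradio": "gradio",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the install names whose import name cannot be found in this kernel."""
    missing = []
    for install_name, module_name in required.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module_name) is None:
            missing.append(install_name)
    return missing


print("Missing:", missing_packages(REQUIRED) or "none")
```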
import os
import platform
import subprocess
import requests

def install_tesseract():
    """
    Installs Tesseract OCR based on the operating system.
    """
    os_name = platform.system()
    if os_name == "Linux":
        print("Detected Linux. Installing Tesseract using apt-get...")
        subprocess.run(["sudo", "apt-get", "update"], check=True)
        subprocess.run(["sudo", "apt-get", "install", "-y", "tesseract-ocr"], check=True)
    elif os_name == "Darwin":
        print("Detected macOS. Installing Tesseract using Homebrew...")
        subprocess.run(["brew", "install", "tesseract"], check=True)
    elif os_name == "Windows":
        print("Detected Windows. Downloading the Tesseract installer...")
        tesseract_installer_url = "https://github.com/UB-Mannheim/tesseract/releases/download/v5.4.0.20240606/tesseract-ocr-w64-setup-5.4.0.20240606.exe"
        installer_path = os.path.abspath("tesseract-ocr-w64-setup-5.4.0.20240606.exe")
        response = requests.get(tesseract_installer_url, timeout=120)
        response.raise_for_status()  # stop on a failed download instead of saving an error page
        with open(installer_path, "wb") as file:
            file.write(response.content)
        # The installer is only downloaded here; it still has to be run (it asks for admin rights).
        print(f"Installer saved to {installer_path}. Run it, then re-run this cell to verify.")
        # Make the default install location visible to this Python process.
        tesseract_path = r"C:\Program Files\Tesseract-OCR"
        os.environ["PATH"] += os.pathsep + tesseract_path
        try:
            result = subprocess.run(["tesseract", "--version"], check=True, capture_output=True, text=True)
            print(result.stdout)
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            # FileNotFoundError means the tesseract executable is not on PATH yet.
            print(f"Error running Tesseract: {e}")
    else:
        print(f"Unsupported OS: {os_name}")

install_tesseract()
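The function above only verifies the installation on Windows. A portable check that works on Linux, macOS, and Windows can look Tesseract up on PATH with shutil.which and ask it for its version. This is a minimal sketch; the function name tesseract_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    # In case the version banner goes to stderr, fall back to it.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else None


print(tesseract_version() or "Tesseract not found on PATH")
```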
Convert the PDF to a searchable (OCR) PDF
import webbrowser
# Open iLovePDF's online OCR tool in the default browser; the PDF is uploaded and converted there.
url = "https://www.ilovepdf.com/ocr-pdf"
webbrowser.open_new(url)
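Since Tesseract was installed in the previous step, OCR can also be done locally instead of through the online service. This is a sketch under two assumptions: Tesseract is on PATH, and the PDF's pages have already been exported to images (Tesseract reads images such as PNG or TIFF, not PDFs). Tesseract's pdf output config writes a searchable PDF named after the output base. The helper names and the page_001 file names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_pdf_command(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract command that turns one page image into a searchable PDF.

    Tesseract writes `<output_base>.pdf`; the `pdf` argument selects PDF output.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def ocr_image_to_pdf(image_path: str, output_base: str, lang: str = "eng") -> Path:
    # Run Tesseract locally; raises CalledProcessError if OCR fails.
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    return Path(output_base + ".pdf")


print(tesseract_pdf_command("page_001.png", "page_001"))
```

Running OCR locally keeps the document on your own machine, which matters if the PDFs are confidential.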
import os
from llama_index.core import (
Settings,
VectorStoreIndex,
SimpleDirectoryReader,
StorageContext,
load_index_from_storage
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.groq import Groq
import gradio as gr
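StorageContext and load_index_from_storage are imported so that a built index can be reloaded instead of re-embedding every PDF on each run. In llama_index this is typically index.storage_context.persist(persist_dir=...) after building, and load_index_from_storage(StorageContext.from_defaults(persist_dir=...)) afterwards. The sketch below shows only the load-or-build control flow in plain Python; the JSON word index is a hypothetical stand-in for a real vector index.

```python
import json
import os
import tempfile


def build_index(documents: list[str]) -> dict:
    # Hypothetical stand-in for building a vector index:
    # maps each lowercase word to the ids of the documents containing it.
    index: dict[str, list[int]] = {}
    for doc_id, doc in enumerate(documents):
        for word in set(doc.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return index


def load_or_build(persist_dir: str, documents: list[str]) -> tuple[dict, bool]:
    """Load a persisted index if one exists, otherwise build and persist it.

    Returns (index, loaded_from_disk).
    """
    path = os.path.join(persist_dir, "index.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f), True
    index = build_index(documents)
    os.makedirs(persist_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, False


with tempfile.TemporaryDirectory() as tmp:
    docs = ["RAG retrieves context", "The LLM generates answers"]
    first, loaded1 = load_or_build(tmp, docs)   # builds and saves
    second, loaded2 = load_or_build(tmp, docs)  # loads from disk
    print(loaded1, loaded2, first == second)  # False True True
```

One caveat of this pattern: if the PDFs change, the persisted index is stale and the persist directory has to be deleted (or the index rebuilt) to pick up the new content.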