pip install magic-pdf
cp magic-pdf.template.json ~/magic-pdf.json
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
Layout regions such as images, tables, titles, and text; inline formulas and display formulas.
conda create -n pdfpipeline python=3.10
git clone https://github.com/opendatalab/PDF-Extract-Kit.git
pip3 install -r requirements+cpu.txt
pip install https://github.com/opendatalab/PDF-Extract-Kit/raw/main/assets/whl/detectron2-0.6-cp310-cp310-macosx_11_0_arm64.whl
PDF-Extract-Kit/configs/model_configs.yaml:2
PDF-Extract-Kit/modules/layoutlmv3/layoutlmv3_base_inference.yaml:72
git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit
python pdf_extract.py --pdf data/pdfs/ocr_1.pdf
File "/Users/linyu/ai/PDF-Extract-Kit/modules/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 71, in forward
position_embedding = F.interpolate(position_embedding, size=(Hp, Wp), mode='bicubic')
NotImplementedError: The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
export PYTORCH_ENABLE_MPS_FALLBACK=1
Or add the following to the Python file you run:
import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
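As a third option, the flag can be passed only to the extraction process instead of being exported for the whole shell. The sketch below is a minimal illustration, not part of PDF-Extract-Kit: the helper names `fallback_env` and `run_extract` are hypothetical, and it assumes you launch it from the PDF-Extract-Kit checkout so that `pdf_extract.py` resolves.

```python
import os
import subprocess
import sys

FLAG = "PYTORCH_ENABLE_MPS_FALLBACK"


def fallback_env(base=None):
    """Return a copy of the environment with the MPS CPU fallback enabled.

    The caller's mapping is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env[FLAG] = "1"
    return env


def run_extract(pdf_path, script="pdf_extract.py"):
    """Run the extraction script in a child process that has the flag set.

    `script` is assumed to be the pdf_extract.py from the repository;
    the function returns the child's exit code.
    """
    cmd = [sys.executable, script, "--pdf", pdf_path]
    return subprocess.run(cmd, env=fallback_env(), check=False).returncode
```

Calling `run_extract("data/pdfs/ocr_1.pdf")` would then correspond to the `python pdf_extract.py --pdf data/pdfs/ocr_1.pdf` command above, with the fallback applied to that one run only.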
RuntimeError: Expected repeatBuffer && cumsumBuffer && resultBuffer to be true, but got false.
magic-pdf pdf-command --pdf "/Users/linyu/ai/pdf储能政策知识库/《“十四五”能源领域科技创新规划》2021.pdf" --model ""
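Because the PDF path above contains non-ASCII characters and typographic quotes, it can be convenient to call the CLI from Python with an argument list so that no shell quoting is involved. This is only a sketch: `build_magic_pdf_command` is a hypothetical helper, and it uses just the `pdf-command`, `--pdf`, and `--model` arguments that appear in the commands above.

```python
import shlex
import subprocess


def build_magic_pdf_command(pdf_path, model_json_path=""):
    """Build the argument list for the magic-pdf call shown above.

    Each argument is a separate list item, so spaces, quotes, and
    non-ASCII characters in the path need no escaping.
    """
    return [
        "magic-pdf", "pdf-command",
        "--pdf", str(pdf_path),
        "--model", str(model_json_path),
    ]


def run_magic_pdf(pdf_path, model_json_path=""):
    """Run magic-pdf and return its exit code (requires magic-pdf on PATH)."""
    cmd = build_magic_pdf_command(pdf_path, model_json_path)
    # shlex.join gives a copy-pasteable shell form for logging only.
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

An empty `model_json_path` reproduces the `--model ""` form used in the command above.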