Run Google's Latest Gemma-2-9B Model on Your Own Computer with One Command

Published: 2024-08-24 22:48:43 | Views: 1734 | Source: Second State

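Google recently released the Gemma 2 models^[1] in 9B and 27B sizes, the latest members of its Gemma model family. According to the technical report, a Gemma-2-2b model will be open sourced in the coming days. The report also shows that Gemma-2-9B outperforms Mistral-7B, Llama-3-8B, and the Gemma 1.5 models on several benchmarks.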

To run Gemma-2-9b-it on your computer with a single command, run the following in a terminal:
bash <(curl -sSfL 'https://raw.githubusercontent.com/LlamaEdge/LlamaEdge/main/run-llm.sh') --model gemma-2-9b-it

This article uses Gemma-2-9B as an example and walks you through how to:

Run Gemma-2-9B on your own device
Create an API server for Gemma-2-9B

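We will use LlamaEdge^[2] (the Rust + Wasm stack) to develop and deploy applications for this model. There is no need to install complex Python packages or C++ toolchains! Learn why we chose Rust + Wasm^[3].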

Run Gemma-2-9B on your own device

Step 1: Install WasmEdge^[4] with the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.13.5

Step 2: Download the Gemma-2-9B-it model GGUF file^[5]. The model is 6.40 GB, so the download may take a while.

curl -LO https://huggingface.co/second-state/gemma-2-9b-it-GGUF/resolve/main/gemma-2-9b-it-Q5_K_M.gguf
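A multi-gigabyte download can fail silently and leave an HTML error page or a truncated file under the expected name. The sketch below checks that a file starts with the 4-byte `GGUF` magic that GGUF files begin with. The helper name `is_gguf` is hypothetical and not part of LlamaEdge or WasmEdge, and the check does not prove the download is complete.

```shell
# Hypothetical helper (an assumption for illustration, not a LlamaEdge tool):
# succeeds only if the given file exists and begins with the bytes "GGUF".
is_gguf() {
  [ -f "$1" ] || return 1
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example use, left commented out so the snippet has no side effects:
# is_gguf gemma-2-9b-it-Q5_K_M.gguf && echo "looks like a GGUF file"
```

If the check fails, rerunning the curl command from Step 2 is the simplest fix.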

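Step 3: Download a cross-platform, portable Wasm file for the chat app. The app lets you chat with the model on the command line. The Rust source code for the app is here^[6].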

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2-9b-it-Q5_K_M.gguf llama-chat.wasm -p gemma-instruct -c 4096
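The command above packs several settings into one line: `--dir .:.` exposes the current directory to the Wasm sandbox, `--nn-preload` takes an `alias:backend:target:file` value naming the model file, `-p` selects the prompt template, and `-c` sets the context size. As a rough sketch, the hypothetical helper `chat_cmd` below (not part of LlamaEdge) rebuilds the same command from variables, which can help when switching between GGUF files.

```shell
# Hypothetical helper (illustration only): print the chat command for a model file.
chat_cmd() {
  model_file="$1"                  # GGUF file in the current directory
  template="${2:-gemma-instruct}"  # value passed to -p (prompt template)
  ctx_size="${3:-4096}"            # value passed to -c (context size)
  printf 'wasmedge --dir .:. --nn-preload default:GGML:AUTO:%s llama-chat.wasm -p %s -c %s\n' \
    "$model_file" "$template" "$ctx_size"
}

# Example use, left commented out so the snippet only defines the function:
# chat_cmd gemma-2-9b-it-Q5_K_M.gguf
```

The helper only prints the command, so it can be reviewed before it is run.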

This portable Wasm app automatically takes advantage of the hardware accelerators (e.g. GPUs) on the device.

[You]:
Tell me what is gemma?

[Bot]:
Gemma is an open-weights AI assistant developed by Google DeepMind.

Here are some key things to know about Gemma:

* **Open Weights:** This means the underlying code and model weights are freely accessible to the public. Anyone can inspect, modify, or build upon Gemma.
* **Text-Only:** Gemma is designed primarily for text-based interactions. It can understand and generate human-like text but doesn't have capabilities like image recognition or voice synthesis.
* **Focus on Research & Development:** Gemma is intended as a tool for researchers and developers to explore and experiment with different AI models and techniques.

**Why is Gemma significant?**

* **Transparency and Collaboration:** Open-weights nature promotes transparency in AI development and encourages collaboration among researchers worldwide.
* **Customization:** Users can adapt Gemma to their specific needs by fine-tuning it on their own datasets.
* **Accessibility:**  Makes advanced AI technology accessible to a wider range of individuals and organizations, beyond those with large resources.

**Keep in mind:**

* Gemma is still under development and may have limitations compared to more mature, closed-source models.
* Its capabilities are primarily focused on text processing.

Create an OpenAI-compatible API server for Gemma-2-9b-it^[7]

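An OpenAI-compatible API lets Gemma-2-9b-it integrate seamlessly with different development frameworks and tools, such as flows.network^[8], LangChain, and LlamaIndex, which opens up a wider range of applications. You can also use its code as a reference to write your own API server or other LLM applications.

To start the API service, follow these steps. First, download the API server app. It is a cross-platform, portable Wasm app that runs on a wide range of CPU and GPU devices.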

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI so you can interact with the model through a chatbot UI.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, start the API server for the model with the following command line. Then open http://localhost:8080^[9] in your browser to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2-9b-it-Q5_K_M.gguf llama-api-server.wasm -p gemma-instruct -c 4096

In another terminal window, you can use curl to interact with the API server.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."}, {"role":"user", "content": "Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world."}], "model":"gemma-2-9b-it"}'
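Hand-writing the JSON body gets error-prone once the prompts contain quotes. As a minimal sketch, the hypothetical helpers `json_escape` and `chat_payload` below build a request body with the same shape as the one above. They assume single-line text without control characters, so they are not a general JSON encoder.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Assumption: the input is a single line with no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a chat completion request body.
# Arguments: system prompt, user prompt, model name.
chat_payload() {
  printf '{"messages":[{"role":"system", "content": "%s"}, {"role":"user", "content": "%s"}], "model":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Example use against the local server, commented out because it needs the
# API server to be running:
# curl -X POST http://localhost:8080/v1/chat/completions \
#   -H 'accept:application/json' \
#   -H 'Content-Type: application/json' \
#   -d "$(chat_payload 'You are a helpful assistant.' 'What is Gemma?' 'gemma-2-9b-it')"
```

For prompts that span multiple lines or contain other special characters, a dedicated JSON tool is a safer choice than this sketch.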

That's all. WasmEdge is the easiest, fastest, and most secure way to run LLM applications^[10]. Give it a try!
