我要投稿

从零开始的DeepSeek微调训练实战（SFT）

发布日期：2025-03-11 10:08:44 浏览次数： 2007 作者：阿里云开发者

前言

本文重点介绍使用微调框架unsloth，围绕DeepSeek R1 Distill 7B模型进行高效微调，并介绍用于推理大模型高效微调的COT数据集的创建和使用方法，并在一个medical-o1-reasoning-SFT数据集上完成高效微调实战，并最终达到问答风格优化&知识灌注目的。

你能收获什么：
亲手完成DeepSeek R1蒸馏模型的微调实战
对模型微调、推理数据集等知识有一定了解
对大模型运行的机制和原理有一定的了解
有机会制作一个属于自己的定制化大模型

复现仅需7G显存、半小时运行时间即可完成一次最小可行性实验，并获得微调效果。本人使用的GPU信息如下所示，使用平台为云开发平台ADC：

一、基础概念介绍

1.1 微调与强化学习、模型蒸馏

伴随着DeepSeek的兴起，关于强化学习训练、模型蒸馏等概念也逐渐被人熟知，这里简单总结下这三者的异同。微调、强化学习训练和模型蒸馏都是常用的技术方法，尽管这些方法在某些方面存在交集，但它们的核心原理和任务目标却有显著差异。

1. 微调（Fine-tuning）：

微调是指在已经训练好的大型预训练模型的基础上，进一步训练该模型以适应特定任务或特定领域的数据。相比从零开始训练一个模型，微调所需的数据和计算资源显著减少；可以在特定任务上取得更好的性能，因为模型在微调过程中会重点学习与任务相关的特性；可以在多种领域（如情感分析、问答系统等）上进行微调，从而快速适应不同应用场景。

举个?：想象一下，你有一只机器人狗，它已经在基本的狗行为上进行了初步训练，比如行走和听从简单的命令。微调就像是对这只机器狗进行进一步的训练以适应特定的任务环境。比如说，你希望这只机器狗能够在公园里捡回特定种类的球。通过微调，你可以在原有的训练基础上，用一组特定的数据集（比如各种颜色和大小的球）来调整其行为，使其在新环境中表现得更好。

●目标：通过少量的标注数据对预训练模型进行优化，适应具体任务。

●特点：微调的计算量相对较小，能够在有限的数据和计算资源下提升模型在特定任务上的性能。

●应用：常用于下游任务如情感分析、机器翻译、推荐系统等。

2. 强化学习（Reinforcement Learning）：

强化学习是一种机器学习方法，它通过让智能体在环境中执行动作，以获得反馈或奖励信号，从而学习最优策略。通过不断地试错和调整策略，智能体逐渐找到能够最大化长期回报的行为路径。这种学习方法常用于需要决策和动态环境交互的任务，如游戏、机器人导航和自动化控制系统。

举个?：强化学习训练则有点像是教这只机器狗通过尝试和错误来学习新技能。在这种情况下，你没有直接告诉它应该怎么做，而是为它设定一个目标，比如尽可能快地找到并捡起一只球。机器狗每完成一次任务都会获得奖励，然后它将通过调整自己的行为来最大化获得的奖励。例如，如果机器狗发现跑直线能更快地找到球，它可能会在未来的尝试中更倾向于这样做。

●目标：通过与环境的交互，学习最优的行为策略，最大化累积奖励。

●特点：强化学习强调动态决策，它通常不依赖于预定义的数据集，而是依赖于与环境的持续交互。

●应用：强化学习在游戏AI（如AlphaGo）、机器人控制、自动驾驶等任务中有广泛应用。

3. 模型蒸馏（Model Distillation）：

模型蒸馏是一种模型压缩技术，通过将一个复杂的大型模型（通常称为“教师模型”）中的知识迁移到一个更小的模型（称为“学生模型”）。在这个过程中，教师模型首先对训练数据进行预测，生成软标签即概率分布。这些软标签包含了有关任务的重要信息。学生模型则使用这些软标签进行训练，以接近教师模型的性能。模型蒸馏能够在保持高精度的同时，显著减少模型的大小和计算消耗，适用于在资源受限的环境下部署机器学习模型。

举个?：你有一只非常昂贵和精密的机器人狗，它可以完美执行任务。为了降低成本，你希望制造一个更简单的机器狗，同样能有效完成任务。通过模型蒸馏，你会使用大狗的行为数据来训练小狗，让后者理解和模仿前者的精妙动作，同时保持高效性。

●目标：通过教师模型的“知识转移” ，帮助学生模型提升性能，特别是计算能力有限的设备上。

●特点：蒸馏的核心在于知识的迁移，尤其是在模型压缩和部署方面的优势。学生模型通常在性能上能接近教师模型，但参数量更小，计算更高效。

●应用：常见于模型压缩、边缘计算、低功耗设备的部署中，用于提升部署效率并降低计算需求。

1.2 大模型微调

与RAG（Retrieval-Augmented Generation）或Agent技术依靠构建复杂的工作流以优化模型性能不同，微调通过直接调整模型的参数来提升模型的能力。这种方法让模型通过在特定任务的数据上进行再训练，从而'永久'掌握该任务所需的技能。微调不仅可以显著提高模型在特定领域或任务上的表现，还能使其适应于各种具体应用场景的需求。这种能力的增强是通过更精细地调整模型内部的权重和偏差，使其在理解和生成信息时更加精准，因此被广泛用于需要高精度和领域适应性的任务中。

1.2.1 全量微调与高效微调

从广义上讲，微调可以分为两种主要方式：全量微调和高效微调。全量微调是指利用所有可用数据来重新训练模型，以全面优化其参数。尽管这种方法对计算资源的需求较高，但它能够在最大程度上提升模型对特定任务的适应能力。相反，高效微调则采用更精简的策略，只使用部分数据进行调整，并主要修改模型的部分参数。这种方法以相对较低的计算开销，实现对模型性能的显著提升，类似于“以小博大”，非常适合在资源有限的情况下快速调整和增强模型的性能。

全量微调（Full Fine-Tuning）

举个?：想象一下你在一家公司管理一个团队，这个团队的所有成员已经接受了基础培训，知道如何处理一般的工作任务。现在，公司引入了一个全新的复杂项目，要求团队具备更多的专业技能和知识。

●优点：全面掌握所有相关技能，使模型对新任务有更高的适应性。

●缺点：耗时更长，资源消耗大。

高效微调（Efficient Fine-Tuning）

高效微调的方法更有针对性，它不需要花费大量的时间和资源。举个?：比如，如果机器人狗的任务只是要学会在一种新环境中识别特别的障碍物，你可以在已有的模型基础上，仅仅微调那些与识别相关的参数，而无需重新训练整个模型。

●优点：节省时间和资源，快速提升特定技能。

●缺点：可能不如全面培训那样细致和彻底，但能够在特定任务中高效达标。

现在绝大多数开源模型，在开源的时候都会公布两个版本的模型，其一是Base模型，该模型只经过了预训练，没有经过指令微调；其二则是Chat模型（或者就是不带尾缀的模型），则是在预训练模型基础上进一步进行全量指令微调之后的对话模型：

1.2.2 高效微调与LoRA、 QLoRA

尽管全量微调可以对模型的能力进行深度改造，但要带入模型全部参数进行训练，需要消耗大量的算力，且有一定的技术门槛。相比之下，在绝大多数场景中，如果我们只想提升模型某个具体领域的能力，那高效微调会更加合适。尽管在2020年前后，深度学习领域诞生了很多高效微调的方法，但现在适用于大模型的最主流的高效微调方法只有一种——LoRA。

LoRA（ Low-Rank Adaptation）微调是一种参数高效的微调方法，旨在通过引入低秩矩阵来减少微调时需要调整的参数数量，从而显著降低显存和计算资源的消耗。具体来说，LoRA 微调并不直接调整原始模型的所有参数，而是通过在某些层中插入低秩的适配器（Adapter）层来进行训练。

LoRA的原理：

●在标准微调中，会修改模型的所有权重，而在 LoRA 中，只有某些低秩矩阵（适配器）被训练和调整。这意味着原始模型的参数保持不变，只是通过少量的新参数来调整模型的输出。

●低秩矩阵的引入可以在显存和计算能力有限的情况下，依然有效地对大型预训练模型进行微调，从而让 LoRA 成为显存较小的设备上的理想选择。

举个?：想象你想教学生们怎样进行快速心算而不去完全打破他们原有的学习方法。你决定只引入一个简化版本的心算技巧，让他们在现有知识的基础上进行少量调整。这就像是把原有的学习方式轻量化处理，只增加所需的少量新知识，而不是重新教授整个数学课程。

LoRA的优势：

1.显存优化：只需要调整少量的参数（适配器），显著减少了显存需求，适合显存有限的GPU。

2.计算效率：微调过程中的计算负担也更轻，因为减少了需要调整的参数量。

3.灵活性：可以与现有的预训练模型轻松结合使用，适用于多种任务，如文本生成、分类、问答等。

而QLoRA（Quantized Low-Rank Adaptation）则是 LoRA 的一个扩展版本，它结合了 LoRA 的低秩适配器和量化技术。QLoRA 进一步优化了计算效率和存储需求，特别是在极端显存受限的环境下。与 LoRA 不同的是， QLoRA 会将插入的低秩适配器层的部分权重进行量化（通常是量化为INT4或INT8），在保持性能的同时显著降低模型的存储和计算需求。

举个?：针对学生中一些学习资源（如时间或精力）更加有限的情况，你进一步优化教学方法，不仅简化了学习内容（类似LoRA），同时还用了一些有助于记忆的技巧（比如使用图像或口诀），从而更有效地传授知识。这样，每个学生能在有限时间内学会心算法。在技术上，QLoRA涉及量化（quantization）技术，将模型的一部分权重参数存储在较低精度的数值格式中，以此减少内存使用和计算量，同时结合LoRA的低秩调整，让适应过程更加高效。

QLoRA的优势：

1.在显存非常有限的情况下仍能进行微调。

2.可以处理更大规模的模型。

3.适合用于边缘设备和需要低延迟推理的场景。

1.3 高效微调的应用场景

在实际大模型应用场景中，高效微调主要用于以下四个方面：

1.对话风格微调：高效微调可以用于根据特定需求调整模型的对话风格。例如，针对客服系统、虚拟助理等场景，模型可以通过微调来适应不同的语气、礼貌程度或回答方式，从而在与用户互动时提供更符合要求的对话体验。通过微调少量的参数（例如对话生成的策略、情感表达等），可以使模型表现出更具针对性和个性化的风格。

2.知识灌注：知识灌注是指将外部知识或领域特定的信息快速集成到已有的预训练模型中。通过高效微调，模型可以更好地学习新领域的专有知识，而无需重新从头开始训练。例如，对于法律、医疗等专业领域，可以使用少量的标注数据对预训练模型进行微调，帮助模型理解特定行业的术语、规则和知识，进而提升专业领域的问答能力。

3.推理能力提升：高效微调还可以用于提升大模型的推理能力，尤其是在处理更复杂推理任务时。通过微调，模型能够更加高效地理解长文本、推理隐含信息，或者从数据中提取逻辑关系，进而在多轮推理任务中提供更准确的答案。这种微调方式可以帮助模型在解答复杂问题时，提高推理准确性并减少错误。

4.Agent能力（Function calling & MCP能力）提升：在多任务协作或功能调用场景中，高效微调能够显著提升模型Agent能力，使得模型能够有效地与其他系统进行交互、调用外部API或执行特定MCP任务。通过针对性微调，模型可以学会更精准的功能调用策略、参数解析和操作指令，从而在自动化服务、智能助手或机器人控制等领域表现得更加高效和智能。

二、 DeepSeek R1 Distill高效微调环境准备

2.1 unsloth安装

unsloth是推理、微调一体式框架，unsloth将Llama 3.3、Mistral、Phi-4、Qwen 2.5和Gemma的微调速度提高2倍，同时节省80%的内存。

官网地址：GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory

https://github.com/unslothai/unsloth

使用如下命令快速安装：

pip install unslothpip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.gi

2.2 wandb安装与注册

2.2.1 wandb基本说明

在大规模模型训练中，往往需要监控和分析大量的训练数据，而WandB可以帮助实现这一目标。它提供了以下几个重要的功能（注意数据安全问题，wandb内网穿透⚠️）：

2.2.2 wandb注册与使用

wandb官网：https://wandb.ai/site

使用邮箱注册后，记得拷贝下APIkey

然后即可在令行中输入如下代码安装wandb:

pip install wandb

可设置wandb进行微调记录，并可在对应网站上观察到训练过程如下：

2.3 DeepSeek R1模型下载

ModelScope模型地址：https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

创建DeepSeek-R1-Distill-Qwen-7B文件夹，用于保存下载的模型权重：

mkdir ./DeepSeek-R1-Distill-Qwen-7B

即可使用如下命令进行模型下载：

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local_dir ./DeepSeek-R1-Distill-Qwen-7B

也有其他的下载方法，视使用情况选择。

下载后的模型的完整文件如下：

2.4 微调数据

DeepSeek R1及其蒸馏模型，推理过程的具体体现就是在回复内容中，会同时包含推理部分内容和最终回复部分内容，并且其推理部分内容会通过（一种在模型训练过程中注入的特殊标记）来进行区分。也就是说， DeepSeek R1模型组的回复格式是一种非常特殊的格式，即包含think部分内容，也包含 response部分内容。

因此，在围绕DeepSeek R1模型组进行微调的时候，微调数据集的回复部分文本也需要是包含推理和最终回复两部分内容，才能使得DeepSeek R1模型组在保持既定回复风格的同时，强化模型能力，反之则会导致指令消融问题（模型回复不再包含think部分）。

此时think部分和最终回复，共同构成有监督微调的标签。

这种同时包含思考和结果的数据集，在当下并不少见，例如非常著名的数学问答数据集NuminaMath CoT，就同时包含数学问题、问题的解题思路（也就是think部分）和问题最终的答案。而该数据集也是可以用于推理模型微调的数据集。除了NuminaMath CoT数据集外，还有APPs（编程数据集）、 TACO（编程数据集）、long_form_thought_data_5k （通用问答数据集）等，都是CoT数据集，均可用于推理模型微调。

若实际业务有需要，也可以构造类似结构的数据集。

本文选取的数据集是一个包含推理过程的医学数据集：由深圳大数据研究院发布的HuatuoGPT-o1模型的微调数据集—medical-o1-reasoning-SFT，地址：https://www.modelscope.cn/datasets/AI-ModelScope/medical-o1-reasoning-SFT。

为什么要选这个数据集：数学能力已经测试过了，想看看其在更难推理的场景下的能力

数据集总共包含25371条数据，均为医学领域疾病诊断数据集，且不乏一些疑难杂症的推理和判断，数据集整体质量较高，推理过程严谨准确，非常适合进行医疗领域模型微调，可以极大程度提高模型对于病理的推理过程，并在这个过程中完成一些医疗知识的灌注。

三、DeepSeek R1模型微调实操

3.1 unsloth LLama模型推理

需要借助unsloth进行模型推理

导入unsloth

from unsloth import FastLanguageModel

首先设置关键参数，并读取模型：

关键参数

max_seq_length = 2048 //这决定了模型的上下文长度。例如，Gemini 的上下文长度超过 100 万，而 Llama-3 的上下文长度为 8192。允许选择任意数字 - 但出于测试目的，建议将其设置为 2048。Unsloth 还支持非常长的上下文微调，表明可以提供比最佳长 4 倍的上下文长度。dtype = None //将其保持为 None，但可以为较新的 GPU 选择 torch.float16 或 torch.bfloat16。load_in_4bit = False //采用 4 位量化进行微调。这样可将内存使用量减少 4 倍，从而在空闲的 16GB 内存 GPU 中实际进行微调。4 位量化本质上将权重转换为一组有限的数字以减少内存使用量。这样做的缺点是准确度会下降 1-2%。如果想要获得如此微小的额外准确度，请在 H100 等较大的 GPU 上将其设置为 False

注，若显存不足，则可以load_in_4bit = True，运行4 bit量化版。

模型加载

model, tokenizer = FastLanguageModel.from_pretrained(model_name = "./DeepSeek-R1-Distill-Qwen-7B",max_seq_length = max_seq_length,dtype = dtype,load_in_4bit = load_in_4bit,

现在要自定义微调，可以编辑上面的数字，也可以忽略它，因为已经默认选择了相当合理的数字。目标是改变这些数字以提高准确率，同时也抵消过度拟合。过度拟合是指让语言模型记住数据集，而无法回答新颖的新问题。希望最终模型能够回答从未见过的问题，而不是进行记忆。

r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128

微调过程的等级。数值越大，占用的内存越多，速度越慢，但可以提高复杂任务的准确性。我们通常建议数值为 8（用于快速微调），最高可达 128。数值过大可能会导致过度拟合，从而损害模型的质量。

target_modules = ["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj",],

选择所有模块进行微调。可以删除一些模块以减少内存使用量并加快训练速度，但强烈不建议这样做。只需在所有模块上进行训练！

lora_alpha = 16,

微调的缩放因子。较大的数字将使微调更多地了解您的数据集，但可能会导致过度拟合。建议将其设置为等于等级r，或将其加倍。

lora_dropout = 0, # Supports any, but = 0 is optimized

将其保留为 0 以加快训练速度！可以减少过度拟合，但效果不大。

bias = "none",# Supports any, but = "none" is optimized

将其保留为 0，以实现更快、更少的过度拟合训练！

use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context

选项包括True、False 和"unsloth"。建议使用"unsloth"，因为内存使用量减少了 30%，并支持极长的上下文微调。https://unsloth.ai/blog/long-context了解更多详细信息。

random_state = 3407,

确定确定性运行的次数。训练和微调需要随机数，因此设置此数字可使实验可重复。

use_rslora = False,# We support rank stabilized LoRA

高级功能可自动设置lora_alpha = 16。

loftq_config = None, # And LoftQ

高级功能可将 LoRA 矩阵初始化为权重的前 r 个奇异向量。可以在一定程度上提高准确度，但一开始会使内存使用量激增。

输出

==((====))==Unsloth 2025.2.12: Fast Qwen2 patching. Transformers: 4.48.3. \\ /|GPU: Tesla V100S-PCIE-32GB. Max memory: 31.739 GB. Platform: Linux.O^O/ \_/ \Torch: 2.6.0+cu124. CUDA: 7.0. CUDA Toolkit: 12.4. Triton: 3.2.0\/Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False] "-____-" Free Apache license: http://github.com/unslothai/unslothUnsloth: Fast downloading is enabled - ignore downloading bars which are red colored!Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,4.23s/it]./DeepSeek-R1-Distill-Qwen-7B does not have a padding token! Will use pad_token = <|vision_pad|>.

在INT4量化情况下，8B模型推理仅需7G左右显存。

此时model就是读取进来的DeepSeek R1 蒸馏模型：

模型信息

model

输出

Qwen2ForCausalLM((model): Qwen2Model((embed_tokens): Embedding(152064, 3584, padding_idx=151654)(layers): ModuleList((0-27): 28 x Qwen2DecoderLayer((self_attn): Qwen2Attention((q_proj): Linear(in_features=3584, out_features=3584, bias=True)(k_proj): Linear(in_features=3584, out_features=512, bias=True)(v_proj): Linear(in_features=3584, out_features=512, bias=True)(o_proj): Linear(in_features=3584, out_features=3584, bias=False)(rotary_emb): LlamaRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): Linear(in_features=3584, out_features=18944, bias=False)(up_proj): Linear(in_features=3584, out_features=18944, bias=False)(down_proj): Linear(in_features=18944, out_features=3584, bias=False)(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): LlamaRotaryEmbedding())(lm_head): Linear(in_features=3584, out_features=152064, bias=False))

tokenizer是分词器：

分词器

tokenizer

输出

LlamaTokenizerFast(name_or_path='./DeepSeek-R1-Distill-Qwen-7B', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<｜begin▁of▁sentence｜>', 'eos_token': '<｜end▁of▁sentence｜>', 'pad_token': '<|vision_pad|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={151643: AddedToken("<｜end▁of▁sentence｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151644: AddedToken("<｜User｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151645: AddedToken("<｜Assistant｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151646: AddedToken("<｜begin▁of▁sentence｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151647: AddedToken("<|EOT|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151648: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151649: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),})

将模型调整为推理模式：

推理模式

FastLanguageModel.for_inference(model)

输出

Qwen2ForCausalLM((model): Qwen2Model((embed_tokens): Embedding(152064, 3584, padding_idx=151654)(layers): ModuleList((0-27): 28 x Qwen2DecoderLayer((self_attn): Qwen2Attention((q_proj): Linear(in_features=3584, out_features=3584, bias=True)(k_proj): Linear(in_features=3584, out_features=512, bias=True)(v_proj): Linear(in_features=3584, out_features=512, bias=True)(o_proj): Linear(in_features=3584, out_features=3584, bias=False)(rotary_emb): LlamaRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): Linear(in_features=3584, out_features=18944, bias=False)(up_proj): Linear(in_features=3584, out_features=18944, bias=False)(down_proj): Linear(in_features=18944, out_features=3584, bias=False)(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): LlamaRotaryEmbedding())(lm_head): Linear(in_features=3584, out_features=152064, bias=False))

然后即可和模型进行对话：

问题字符串

question = "你是谁？"

然后这里首先需要借助分词器，将输入的问题转化为标记索引：

转化为标记索引

inputs = tokenizer([question], return_tensors="pt").to("cuda")

索引信息

inputs

输出

{'input_ids': tensor([[151646, 105043, 100165,11319]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}

最后再带入inputs进行对话

回答问题

 outputs = model.generate(input_ids=inputs.input_ids,max_new_tokens=1200,use_cache=True,)

此时得到的回复也是词索引：

回复索引

outputs

tensor([[151646, 105043, 100165,..., 102454, 5373,99728]],device='cuda:0')

同样需要分词器将其转化为文本：

转化为文本

response = tokenizer.batch_decode(outputs)

回复文字

response

输出

['<｜begin▁of▁sentence｜>你是谁？我需要帮助你完成这个任务。8年前，你是一个刚进入职场的大学生，正在寻找工作。现在，你已经找到了一份工作，但可能需要进一步发展。你已经知道你的工作职责，但可能需要了解如何更好地完成任务。你已经知道如何处理日常事务，但可能需要学习如何更高效地完成任务。你已经知道如何与同事和客户沟通，但可能需要学习如何更好地管理时间。你已经知道如何处理工作中的问题，但可能需要学习如何处理压力。你已经知道如何制定工作计划，但可能需要学习如何调整计划以适应变化。你已经知道如何使用办公软件，但可能需要学习如何更好地利用这些工具。你已经知道如何处理紧急情况，但可能需要学习如何更好地预防和化解潜在风险。你已经知道如何进行沟通，但可能需要学习如何更有效地传达信息。你已经知道如何建立关系，但可能需要学习如何更好地维护和扩展这些关系。你已经知道如何完成任务，但可能需要学习如何更高效地完成任务。你已经知道如何处理压力，但可能需要学习如何更好地管理压力。你已经知道如何制定计划，但可能需要学习如何调整计划以适应变化。你已经知道如何使用工具，但可能需要学习如何更好地利用这些工具。你已经知道如何处理紧急情况，但可能需要学习如何更好地预防和化解潜在风险。你已经知道如何进行沟通，但可能需要学习如何更有效地传达信息。你已经知道如何建立关系，但可能需要学习如何更好地维护和扩展这些关系。\n\n好吧，现在我需要帮助用户完成这个任务。首先，我需要理解用户的需求。用户希望从一个刚入职的大学生的角度，逐步提升自己的职场技能，包括效率、时间管理、沟通、关系维护、压力管理、计划调整、工具使用、紧急情况处理和信息传达等。\n\n接下来，我需要考虑用户可能的身份。他可能是一个正在找工作或刚进入职场的大学生，对职场生活充满好奇和挑战。他可能对如何有效地完成工作感到困惑，或者想进一步提升自己的职业能力。\n\n用户可能没有明确说出的深层需求包括：他可能希望找到一个系统化的学习路径，帮助他逐步成长；他可能需要具体的建议或资源，比如培训课程、书籍、工具等；他可能希望了解如何平衡工作与生活，缓解压力，提升幸福感。\n\n因此，我应该提供一个结构化的计划，涵盖效率提升、时间管理、沟通技巧、压力管理、计划调整、工具使用、紧急情况处理、信息传达和关系维护等方面。同时，建议他利用学习资源和实践来持续成长。\n\n最后，我需要用鼓励和支持的语气，帮助他建立信心，相信自己能够通过努力实现职业目标。\n</think>\n\n好的，我将按照您的要求，帮助您逐步提升职场技能。以下是一个结构化的学习计划，涵盖您提到的各个方面：\n\n### 1. 效率提升\n- **学习工具使用**：掌握常用办公软件（如Excel、Word、Teams等）的高级功能，如自动化工具（Power Automate）、模板、快捷键等。\n- **时间管理**：\n- 使用时间管理方法（如番茄工作法、GTD）来提高工作效率。\n- 学习日计划表和周计划表的制作与使用，确保任务按计划完成。\n- 避免多任务处理，专注于一项任务直到完成。\n\n### 2. 时间管理\n- **设定优先级**：学会使用ABC分析法，确定任务的优先级。\n- **建立缓冲时间**：避免过度压缩时间，留出缓冲时间以应对突发情况。\n- **定期复盘**：每周复盘一周的工作，分析效率提升的空间。\n\n### 3. 沟通技巧\n- **有效沟通**：学习如何清晰、简洁地表达自己的观点，倾听他人的意见。\n- **非语言沟通**：观察和学习领导和同事的非语言沟通方式，如眼神交流、肢体语言等。\n- **团队协作**：参与团队项目，锻炼合作能力，学会妥协与协作。\n\n### 4. 压力管理\n- **压力识别**：学会识别压力源，并学会区分短期和长期的压力。\n- **放松技巧**：学习深呼吸、冥想等放松技巧，以应对压力。\n- **寻求支持**：建立支持网络，如朋友、家人或同事，分享压力和困难。\n\n### 5. 制计划与调整\n- **灵活计划**：学习根据实际情况调整计划的能力。\n- **定期评估计划**：每周评估计划的执行情况，及时调整。\n- **快速调整**：遇到变化时，快速调整计划以适应新情况。\n\n### 6. 工具使用\n- **自动化工具**：学习使用Power Automate、 Zapier等工具自动化工作流程。\n- **学习新工具**：根据工作需求，学习和掌握新工具，如Jira、Trello等项目管理工具。\n- **工具比较**：比较不同工具的优缺点，选择最适合自己的工具。\n\n### 7. 紧急情况处理\n- **应急预案**：制定并学习如何处理紧急情况的预案。\n- **快速反应**：练习如何在紧急情况下快速反应，解决问题。\n- **学习经验**：总结处理紧急情况的经验教训，提升应对能力。\n\n### 8. 信息传达\n- **清晰表达**：学习如何用简洁、有力的语言传达关键信息。\n- **视觉沟通**：利用图表、PPT等视觉工具，更有效地传达信息。\n- **倾听反馈**：学习如何从同事和客户那里获取反馈，改进信息传达方式。\n\n### 9. 关系维护\n- **建立联系**：主动与同事、领导']

格式化输出一下：

格式化

print(response[0])

输出

<｜begin▁of▁sentence｜>你是谁？我需要帮助你完成这个任务。8年前，你是一个刚进入职场的大学生，正在寻找工作。现在，你已经找到了一份工作，但可能需要进一步发展。你已经知道你的工作职责，但可能需要了解如何更好地完成任务。你已经知道如何处理日常事务，但可能需要学习如何更高效地完成任务。你已经知道如何与同事和客户沟通，但可能需要学习如何更好地管理时间。你已经知道如何处理工作中的问题，但可能需要学习如何处理压力。你已经知道如何制定工作计划，但可能需要学习如何调整计划以适应变化。你已经知道如何使用办公软件，但可能需要学习如何更好地利用这些工具。你已经知道如何处理紧急情况，但可能需要学习如何更好地预防和化解潜在风险。你已经知道如何进行沟通，但可能需要学习如何更有效地传达信息。你已经知道如何建立关系，但可能需要学习如何更好地维护和扩展这些关系。你已经知道如何完成任务，但可能需要学习如何更高效地完成任务。你已经知道如何处理压力，但可能需要学习如何更好地管理压力。你已经知道如何制定计划，但可能需要学习如何调整计划以适应变化。你已经知道如何使用工具，但可能需要学习如何更好地利用这些工具。你已经知道如何处理紧急情况，但可能需要学习如何更好地预防和化解潜在风险。你已经知道如何进行沟通，但可能需要学习如何更有效地传达信息。你已经知道如何建立关系，但可能需要学习如何更好地维护和扩展这些关系。
好吧，现在我需要帮助用户完成这个任务。首先，我需要理解用户的需求。用户希望从一个刚入职的大学生的角度，逐步提升自己的职场技能，包括效率、时间管理、沟通、关系维护、压力管理、计划调整、工具使用、紧急情况处理和信息传达等。
接下来，我需要考虑用户可能的身份。他可能是一个正在找工作或刚进入职场的大学生，对职场生活充满好奇和挑战。他可能对如何有效地完成工作感到困惑，或者想进一步提升自己的职业能力。
用户可能没有明确说出的深层需求包括：他可能希望找到一个系统化的学习路径，帮助他逐步成长；他可能需要具体的建议或资源，比如培训课程、书籍、工具等；他可能希望了解如何平衡工作与生活，缓解压力，提升幸福感。
因此，我应该提供一个结构化的计划，涵盖效率提升、时间管理、沟通技巧、压力管理、计划调整、工具使用、紧急情况处理、信息传达和关系维护等方面。同时，建议他利用学习资源和实践来持续成长。
最后，我需要用鼓励和支持的语气，帮助他建立信心，相信自己能够通过努力实现职业目标。</think>
好的，我将按照您的要求，帮助您逐步提升职场技能。以下是一个结构化的学习计划，涵盖您提到的各个方面：
### 1. 效率提升- **学习工具使用**：掌握常用办公软件（如Excel、Word、Teams等）的高级功能，如自动化工具（Power Automate）、模板、快捷键等。- **时间管理**：- 使用时间管理方法（如番茄工作法、GTD）来提高工作效率。- 学习日计划表和周计划表的制作与使用，确保任务按计划完成。- 避免多任务处理，专注于一项任务直到完成。
### 2. 时间管理- **设定优先级**：学会使用ABC分析法，确定任务的优先级。- **建立缓冲时间**：避免过度压缩时间，留出缓冲时间以应对突发情况。- **定期复盘**：每周复盘一周的工作，分析效率提升的空间。
### 3. 沟通技巧- **有效沟通**：学习如何清晰、简洁地表达自己的观点，倾听他人的意见。- **非语言沟通**：观察和学习领导和同事的非语言沟通方式，如眼神交流、肢体语言等。- **团队协作**：参与团队项目，锻炼合作能力，学会妥协与协作。
### 4. 压力管理- **压力识别**：学会识别压力源，并学会区分短期和长期的压力。- **放松技巧**：学习深呼吸、冥想等放松技巧，以应对压力。- **寻求支持**：建立支持网络，如朋友、家人或同事，分享压力和困难。
### 5. 制计划与调整- **灵活计划**：学习根据实际情况调整计划的能力。- **定期评估计划**：每周评估计划的执行情况，及时调整。- **快速调整**：遇到变化时，快速调整计划以适应新情况。
### 6. 工具使用- **自动化工具**：学习使用Power Automate、 Zapier等工具自动化工作流程。- **学习新工具**：根据工作需求，学习和掌握新工具，如Jira、Trello等项目管理工具。- **工具比较**：比较不同工具的优缺点，选择最适合自己的工具。
### 7. 紧急情况处理- **应急预案**：制定并学习如何处理紧急情况的预案。- **快速反应**：练习如何在紧急情况下快速反应，解决问题。- **学习经验**：总结处理紧急情况的经验教训，提升应对能力。
### 8. 信息传达- **清晰表达**：学习如何用简洁、有力的语言传达关键信息。- **视觉沟通**：利用图表、PPT等视觉工具，更有效地传达信息。- **倾听反馈**：学习如何从同事和客户那里获取反馈，改进信息传达方式。
### 9. 关系维护- **建立联系**：主动与同事、领导

带入问答模板进行回答

结构化输入方法

提示词

prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
### Instruction:你是一名助人为乐的助手。
### Question:{}
### Response:<think>{}"""

问题

question = "你好，好久不见！"

格式化输入

[prompt_style_chat.format(question, "")]

输出

['请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好，好久不见！\n\n### Response:\n<think>']

分词

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")

回复索引

outputs = model.generate(input_ids=inputs.input_ids,max_new_tokens=1200,use_cache=True,)

回复文本

response = tokenizer.batch_decode(outputs)

response

输出

['<｜begin▁of▁sentence｜>请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好，好久不见！\n\n### Response:\n<think>\n嗯，用户发来“你好，好久不见！”这句话，看起来像是一种友好的问候，带有亲切感。首先，我需要分析用户的意图，可能是想打招呼或者继续之前的对话。我应该回应得友好且温暖，同时保持专业性。\n\n考虑到用户可能是想建立联系，我应该用一种既亲切又正式的方式回应。比如，使用“你好！很高兴见到你！今天过得怎么样？”这样的回复既表达了问候，又询问了近况，有助于继续对话。\n\n另外，我需要确保语言简洁明了，避免使用复杂的词汇，让用户感觉轻松愉快。同时，保持语气友好，让用户感到被重视和欢迎。\n\n最后，检查一下回复是否符合所有要求，比如是否恰当、是否符合角色设定，以及是否能够有效传达信息。确认无误后，就可以发送这个回复了。\n</think>\n\n你好！很高兴见到你！今天过得怎么样？<｜end▁of▁sentence｜>']

格式化

print(response[0].split("### Response:")[1])

格式化回复

<think>嗯，用户发来“你好，好久不见！”这句话，看起来像是一种友好的问候，带有亲切感。首先，我需要分析用户的意图，可能是想打招呼或者继续之前的对话。我应该回应得友好且温暖，同时保持专业性。
考虑到用户可能是想建立联系，我应该用一种既亲切又正式的方式回应。比如，使用“你好！很高兴见到你！今天过得怎么样？”这样的回复既表达了问候，又询问了近况，有助于继续对话。
另外，我需要确保语言简洁明了，避免使用复杂的词汇，让用户感觉轻松愉快。同时，保持语气友好，让用户感到被重视和欢迎。
最后，检查一下回复是否符合所有要求，比如是否恰当、是否符合角色设定，以及是否能够有效传达信息。确认无误后，就可以发送这个回复了。</think>
你好！很高兴见到你！今天过得怎么样？<｜end▁of▁sentence｜>

3.2 初始模型问答测试

3.2.1 问答模版设置

为测试模型微调的能力，选取了医疗相关的数据集进行微调。

先简单测试下未经微调的模型能力，设置问答模板

提示词

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 
### Question:{}
### Response:<think>{}"""

翻译如下：

提示词翻译

prompt_style = """以下是一个任务说明，配有提供更多背景信息的输入。请写出一个恰当的回答来完成该任务。在回答之前，请仔细思考问题，并按步骤进行推理，确保回答逻辑清晰且准确。
### Instruction:您是一位具有高级临床推理、诊断和治疗规划知识的医学专家。请回答以下医学问题。
问题：
回复：
"""

3.2.2 问答测试

接下来抽取部分medical-o1-reasoning-SFT数据集中问题进行提问，并查看初始状态下模型回答结果。

问题1

question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

翻译

question_1 ：一位61岁的女性，有长期在咳嗽或打喷嚏等活动中发生不自主尿液流失的病史，但夜间没有漏尿。她接受了妇科检查和Q-tip测试。根据这些检查结果，膀胱测量（cystometry）最可能会显示她的残余尿量和逼尿肌收缩情况如何？

问题2

question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

翻译

question_2 ：面对一位突发胸痛并放射至颈部和左臂的患者，其既往病史包括高胆固醇血症和冠状动脉疾病，同时伴有升高的肌钙蛋白I水平和心动过速，根据这些临床表现，最可能受累的冠状动脉是哪一条？

问答测试

问题1测试

inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")

outputs1 = model.generate(input_ids=inputs1.input_ids,max_new_tokens=1200,use_cache=True,)response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])

<think>Alright, I'm trying to figure out what the cystometry would show for this 61-year-old woman. She's having involuntary urine loss during activities like coughing or sneezing but not at night. So, she's experiencing urgency, right? That's when you don't hold your pee because something upsets you.
She went through a gynecological exam and a Q-tip test. I remember that a Q-tip test is used to check for a urine stream. If she's getting a stream, the Q-tip would be inserted and they'd have to pull it back out, which is a sign of an orinary reflex. So, the fact that she had a Q-tip test suggests that they noticed this involuntary loss.
Now, the question is about what a cystometry would show. Cystometry measures the residual volume in the bladder and also assesses the detrusor muscle contractions. The detrusor is the muscle that helps push urine out when you squeeze. If the detrusor isn't contracting properly, the bladder might not empty, leaving residual urine.
Since she's losing urine during activities that cause urgency, like coughing or sneezing, that's probably because the detrusor isn't contracting strong enough. When the detrusor doesn't contract, the bladder can't empty completely, so there's residual volume. That would mean during cystometry, they'd see a low residual volume because the bladder isn't holding much. Also, the detrusor contractions would be weak, indicating that the muscle isn't working as it should to push the urine out.
I think residual volume is low here because she's continuously losing urine, so not much is left in the bladder. The detrusor contractions being weak would support that idea. So, putting it all together, the cystometry would show a low residual volume and weak detrusor contractions.</think>
The 61-year-old woman presents with involuntary urine loss during activities like coughing or sneezing, indicative of urgency. The Q-tip test confirmed this by showing a positive orinary reflex. Upon cystometry, the findings would reveal:
1. **Low Residual Volume**: The bladder likely holds minimal urine due to the continuous loss during activities, suggesting incomplete emptying.
2. **Weak Detrusor Contractions**: The detrusor muscle's contractions are inadequate, contributing to the inability to expel urine completely, thus maintaining residual urine in the bladder.
These observations point to a functional incontinence issue, likely due to insufficient detrusor activity.<｜end▁of▁sentence｜>

回复翻译

好的，我试着弄清楚这位61岁女性的膀胱压力测定会显示什么。她在咳嗽或打喷嚏等活动中出现不自主的尿液流失，但晚上没有。所以，她经历了尿急，对吧？就是当你因为某些事情而无法控制排尿的时候。
她进行了妇科检查和Q-tip测试。我记得Q-tip测试是用来检查尿流的。如果有尿流，Q-tip就会被插入，然后需要拉出来，这表明存在排尿反射。所以，她接受Q-tip测试这一事实表明医生注意到了这种不自主的流失。
现在，问题是膀胱压力测定会显示什么。膀胱压力测定测量膀胱内的残余尿量，并评估逼尿肌的收缩。逼尿肌是帮助排尿的肌肉。当逼尿肌收缩不当时，膀胱可能无法排空，留下残余尿液。
因为她在咳嗽或打喷嚏等引起尿急的活动中流失尿液，这可能是因为逼尿肌收缩不够有力。当逼尿肌不收缩时，膀胱无法完全排空，因此会有残余尿量。这意味着在膀胱压力测定中，他们会看到低残余尿量，因为膀胱中保持的尿液不多。此外，逼尿肌的收缩会很弱，表明该肌肉未能有效地排尿。
我认为这里的残余尿量低是因为她持续流失尿液，所以膀胱里剩下的不多。逼尿肌收缩弱也支持了这一观点。因此，综合来看，膀胱压力测定会显示低残余尿量和逼尿肌收缩弱。
这位61岁的女性在咳嗽或打喷嚏等活动中出现不自主的尿液流失，表明有尿急。Q-tip测试通过显示阳性的排尿反射确认了这一点。在膀胱压力测定中，结果将显示：
低残余尿量：由于活动期间的持续流失，膀胱内可能保持的尿液极少，提示排空不完全。
逼尿肌收缩弱：逼尿肌的收缩不足，导致无法完全排出尿液，从而使膀胱内保持残余尿液。
这些观察结果指向功能性尿失禁的问题，可能是由于逼尿肌活动不足所致。

问题2测试

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs2 = model.generate(input_ids=inputs2.input_ids,max_new_tokens=1200,use_cache=True,)
response2 = tokenizer.batch_decode(outputs2)print(response2[0].split("### Response:")[1])

<think>Okay, so I'm trying to figure out which coronary artery is involved in this patient's chest pain. Let me start by breaking down the information given.
The patient has sudden-onset chest pain that goes to the neck and left arm. That makes me think about the possible locations of the coronary artery on the left side. The left coronary artery is typically on the left side of the neck, so pain radiating to the left arm could be coming from that area. But I shouldn't jump to conclusions yet.
They have a past medical history of hypercholesterolemia and coronary artery disease. Hypercholesterolemia is a risk factor for heart disease, so it's something to consider, but not directly indicative of which artery is affected. The key here is the tachycardia, which is an elevated heart rate. Tachycardia can be a clue for several issues, including heart valve problems or arrhythmias, but in the context of chest pain, it's often linked to coronary artery issues.
Elevated troponin I levels are a big hint. Troponin I is a marker of heart muscle injury. Elevated levels suggest an acute myocardial infarction or a significant ischemia. Myocardial infarction typically affects one of the coronary arteries, causing tissue damage.
Now, considering the presentation: the pain is radiating to the left arm. The left arm is typically served by the left anterior descending (LAD) and left circumflex (LCx) coronary arteries. The LAD supplies the left side of the chest, including the left arm, while the LCx supplies the upper arm and neck.
The tachycardia could be due to the heart muscle being ischemic, which can cause the heart to beat faster as it tries to pump blood. This is more common in LAD disease because the LAD is responsible for the left side of the heart, which includes the left ventricle and part of the left atrium. If the LAD is blocked, the left ventricle might not get enough blood, leading to a rapid heart rate.
On the other hand, LCx disease affects the right upper arm and the neck. If that's the case, the pain might radiate to the right side of the neck or the right arm, not the left. But the patient's pain is specifically on the left side, so that makes me lean towards LAD being the culprit.
I should also consider the possible causes of LAD disease. It's often due to atherosclerosis, which is more common in patients with a history of hypercholesterolemia. Since the patient has a history of coronary artery disease, this adds to the suspicion that it's LAD.
So putting it all together: the presentation points to the left arm, the elevated troponin suggests myocardial injury, and the tachycardia supports ischemia. All these factors together make the most likely coronary artery affected the left anterior descending (LAD).</think>
The most likely coronary artery affected in this patient is the **left anterior descending (LAD)**. 
**Step-by-Step Explanation:**
1. **Presentation of Pain:** The patient experiences chest pain radiating to the left arm. This suggests involvement of the left coronary artery, as the LAD supplies blood to the left side of the chest and the left arm.
2. **Elevated Troponin I Levels:** Elevated troponin I indicates myocardial injury, commonly associated with a myocardial infarction or significant coronary artery disease.
3. **Tachycardia:** Increased heart rate is often due to ischemia, which can occur when blood flow to the heart muscle is reduced, as happens with coronary artery disease.
4. **Coronary Artery Disease and History:** The patient's past history of coronary artery disease and hypercholesterolemia (a risk factor for cardiovascular disease) supports the suspicion of ischemia in one of the coronary arteries.
5. **Coronary Artery Locations:** The LAD is responsible for the left side of the heart, including the left ventricle and part of the left atrium. Blockage of this artery would cause ischemia and tachycardia, aligning with the patient's symptoms.
**Conclusion:** Based on the presentation, elevation of troponin I, and tachycardia, the most likely coronary artery affected is the left anterior descending (LAD).<｜end▁of▁sentence｜>

回复翻译

好的，我在尝试弄清楚这位患者的胸痛涉及哪个冠状动脉。让我从给定的信息入手，逐步分析。
患者出现了突然发作的胸痛，疼痛放射到颈部和左臂。这让我考虑到可能涉及左侧的冠状动脉。左冠状动脉通常位于左侧颈部，因此放射到左臂的疼痛可能来自该区域。但我不应该急于下结论。
患者有高胆固醇血症和冠状动脉疾病的病史。高胆固醇血症是心脏病的危险因素，因此需要考虑，但并不能直接指示受累的冠状动脉。关键在于心动过速，即心率升高。心动过速可能是多种问题的线索，包括心脏瓣膜问题或心律失常，但在胸痛的背景下，通常与冠状动脉问题有关。
升高的肌钙蛋白I水平是一个重要线索。肌钙蛋白I是心肌损伤的标志物，升高的水平提示急性心肌梗死或显著的缺血。心肌梗死通常影响某条冠状动脉，导致组织损伤。
现在考虑表现：疼痛放射到左臂。左臂通常由左前降支（LAD）和左回旋支（LCx）供血。LAD为胸部左侧供血，包括左臂，而LCx则供给上臂和颈部。
心动过速可能是由于心肌缺血引起的，这可能导致心脏在试图泵血时加速跳动。这在LAD疾病中更为常见，因为LAD负责左侧心脏的供血，包括左心室和部分左心房。如果LAD阻塞，左心室可能得不到足够血液，导致心率加快。
另一方面，LCx疾病则会影响右上臂和颈部。如果是这种情况，疼痛可能放射到右侧颈部或右臂，而不是左侧。但患者的疼痛明确在左侧，因此我倾向于认为是LAD出现问题。
我还应该考虑LAD疾病的可能原因。这通常与动脉粥样硬化有关，而患有高胆固醇血症历史的患者更容易出现这种情况。由于患者有冠状动脉疾病的病史，这进一步增加了对LAD的怀疑。
所以综合考虑：疼痛指向左臂，肌钙蛋白升高提示心肌损伤，心动过速支持缺血。这些因素结合在一起，使得最可能受累的冠状动脉是左前降支（LAD）。
最有可能受累的冠状动脉是 左前降支（LAD）。
逐步解释：
疼痛表现：患者感到胸痛放射到左臂。这提示可能涉及左冠状动脉，因为LAD为胸部左侧和左臂供血。
升高的肌钙蛋白I水平：升高的肌钙蛋白I指示心肌损伤，通常与心肌梗死或显著的冠状动脉疾病有关。
心动过速：心率升高通常是由于缺血，当血流减少时，会发生这种情况，就像冠状动脉疾病中一样。
冠状动脉疾病及病史：患者的冠状动脉疾病和高胆固醇血症的病史（心血管疾病的危险因素）支持对其中一条冠状动脉缺血的怀疑。
冠状动脉位置：LAD负责左心室和部分左心房的供血。该动脉的阻塞将导致缺血和心动过速，与患者的症状相符。
结论：根据疼痛表现、肌钙蛋白I升高和心动过速，最有可能受累的冠状动脉是左前降支（LAD）。

3.2.3 问答测试结果

标准答案

问题1标准答案

在这种压力性尿失禁的情况下，膀胱测压检查（cystometry）最可能显示正常的排尿后残余尿量，因为压力性尿失禁通常不会影响膀胱排空功能。此外，由于压力性尿失禁主要与身体用力有关，而不是膀胱过度活动症（OAB），因此在测试过程中不太可能观察到逼尿肌的非自主收缩。

问题2标准答案

根据患者表现出的突然胸痛并放射至颈部和左臂，结合其有高胆固醇血症和冠状动脉疾病的病史，肌钙蛋白升高和心动过速，临床症状强烈提示左前降支（LAD）动脉受累。该动脉通常是引发此类症状的罪魁祸首，因为它供应了心脏的大部分区域。放射性疼痛和肌钙蛋白升高的组合表明心肌受损，这使得LAD成为最可能的致病动脉。然而，在没有进一步的诊断检查（如心电图）的情况下，最终的确诊仍需等待确认。

能够看出，在原始状态下，模型能够进行推理并给出回复，但实际上第一个回答过程并不符合医学规范，而第二个问题则直接回答错误。由此可见，在初始状态下，模型对于medical-o1-reasoning-SFT数据集问答效果并不好。

接下来尝试进行微调，并测试微调后模型问答效果。

3.3 最小可行性实验

接下来尝试进行模型微调，对于当前数据集而言，可以带入原始数据集的部分数据进行微调，也可以带入全部数据并遍历多次进行微调。对于大多数的微调实验，可以从最小可行性实验入手进行微调，也就是先尝试带入少量数据进行微调，并观测微调效果。若微调可以顺利执行，并能够获得微调效果，再考虑带入更多的数据进行更大规模微调。

3.3.1 数据集准备

这里直接从modelscope上下载medical-o1-reasoning-SFT数据集。

如果有报错，按提示解决，如果还不行可以取hugging face的数据集，或者换其他数据集、或者将数据集下载到本地再读取。本人尝试了好多，有些数据集能成功，有些不行，各种莫名奇妙的报错，按报错提示改动即可。

数据集下载

from modelscope.msdatasets import MsDatasetds =MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT')

查看

ds[0]

输出

{'Question': 'A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?', 'Complex_CoT': "Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.", 'Response': 'Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.'}

在最小可行性实验中，可以下载500条数据进行微调即可看出效果：

划分数据集

from modelscope.msdatasets import MsDatasetds =MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT', split = "train[0:500]")

3.3.2 文本进行结构化处理

提示词模版

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 
### Question:{}
### Response:<think>{}</think>{}"""

然后提取并设置文本生成结束的标记：

文本生成结束标记

EOS_TOKEN = tokenizer.eos_tokentokenizer.eos_token

输出

'<｜end▁of▁sentence｜>'

然后定义函数，用于对medical-o1-reasoning-SFT数据集进行修改，Complex_CoT列和Response列进行拼接，并加上文本结束标记：

格式化函数

def formatting_prompts_func(examples):inputs = examples["Question"]cots = examples["Complex_CoT"]outputs = examples["Response"]texts = []for input, cot, output in zip(inputs, cots, outputs):text = train_prompt_style.format(input, cot, output) + EOS_TOKENtexts.append(text)return {"text": texts,}

将数据集整理为如下形式：

数据格式化

ds = ds.map(formatting_prompts_func, batched = True,)

产看格式化后的数据

ds["text"][0]

输出

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\n</think>\nCystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.<｜end▁of▁sentence｜>"

3.3.3 开始微调

开启微调，把模型设置为微调模式：

微调模式

model = FastLanguageModel.get_peft_model(model,r=16,target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj",],lora_alpha=16,lora_dropout=0,bias="none",use_gradient_checkpointing="unsloth",# True or "unsloth" for very long contextrandom_state=3407,use_rslora=False,loftq_config=None,)

输出

Unsloth 2025.2.12 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.

然后导入相关的库：

导库

from trl import SFTTrainerfrom transformers import TrainingArgumentsfrom unsloth import is_bfloat16_supported

创建微调对象：

微调对象

trainer = SFTTrainer(model=model,tokenizer=tokenizer,train_dataset=ds,dataset_text_field="text",max_seq_length=max_seq_length,dataset_num_proc=2,args=TrainingArguments(per_device_train_batch_size=2,gradient_accumulation_steps=4,# Use num_train_epochs = 1, warmup_ratio for full training runs!warmup_steps=5,max_step=60,learning_rate=2e-4,fp16=not is_bfloat16_supported(),bf16=is_bfloat16_supported(),logging_steps=10,optim="adamw_8bit",weight_decay=0.01,lr_scheduler_type="linear",seed=3407,output_dir="outputs",),)

通常不建议更改上述参数：

per_device_train_batch_size = 2,

如果想更多地利用 GPU 的内存，增加批处理大小。同时增加批处理大小可以使训练更加流畅，并使过程不会过度拟合。通常不建议这样做，因为这可能会因填充问题而使训练速度变慢。通常会增加批处理大小，gradient_accumulation_steps这只会对数据集进行更多遍历。

gradient_accumulation_steps = 4,

相当于将批量大小增加到自身之上，但不会影响内存消耗，如果想要更平滑的训练损失曲线，通常建议增加这个值。

max_steps = 60, # num_train_epochs = 1,

将步骤设置为 60 以加快训练速度。对于可能需要数小时的完整训练运行，请注释掉max_steps，并将其替换为num_train_epochs = 1。将其设置为 1 表示对数据集进行 1 次完整传递。通常建议传递 1 到 3 次，不要更多，否则微调会过度拟合。

learning_rate = 2e-4,

如果想让微调过程变慢，但同时又最有可能收敛到更高的准确度结果，降低学习率。通常建议尝试 2e-4、1e-4、5e-5、2e-5 作为数字。

输出

Applying chat template to train dataset (num_proc=2): 100%|██████████| 500/500 [00:01<00:00, 348.86 examples/s]Tokenizing train dataset (num_proc=2): 100%|██████████| 500/500 [00:02<00:00, 221.20 examples/s]Tokenizing train dataset (num_proc=2): 100%|██████████| 500/500 [00:00<00:00, 920.96 examples/s] Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.

开始微调

trainer_stats = trainer.train()

这是训练损失，如果目标是设置参数，使其尽可能接近 0.5，如果微调未达到 1、0.8 或 0.5，可能需要调整一些数字。如果损失为 0，这可能也不是一个好兆头。

注意，unsloth在微调结束后，会自动更新模型权重（在缓存中），因此无需手动合并模型权重即可直接调用微调后的模型：

FastLanguageModel.for_inference(model)

输出

PeftModelForCausalLM((base_model): LoraModel((model): Qwen2ForCausalLM((model): Qwen2Model((embed_tokens): Embedding(152064, 3584, padding_idx=151654)(layers): ModuleList((0-27): 28 x Qwen2DecoderLayer((self_attn): Qwen2Attention((q_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=3584, bias=True)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=3584, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(k_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=512, bias=True)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=512, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(v_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=512, bias=True)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=512, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(o_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=3584, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=3584, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(rotary_emb): LlamaRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=18944, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=18944, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(up_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=18944, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=18944, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(down_proj): lora.Linear((base_layer): Linear(in_features=18944, out_features=3584, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=18944, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=3584, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): LlamaRotaryEmbedding())(lm_head): Linear(in_features=3584, out_features=152064, bias=False))))

验证之前的问题：

问题验证

inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs.input_ids,attention_mask=inputs.attention_mask,max_new_tokens=1200,use_cache=True,)response = tokenizer.batch_decode(outputs)

<think>Alright, so we have a 61-year-old woman here, and she's been dealing with this weird issue where she loses urine during activities like coughing or sneezing. That's interesting. Now, she doesn't have leakage at night, which is a good sign, but it's definitely happening during these activities. Hmm, I'm thinking this might be some kind of urinary incontinence, but not the usual type. Let's think about what could be causing this.
Okay, during coughing or sneezing, the bladder gets pressed. This pressure can push urine out, even if the leakage at night isn't happening. So, this sounds like it might be a case of interstitial incontinence, where the bladder empties during certain activities, but doesn't necessarily have a leakage at night.
Now, let's consider the gynecological exam and the Q-tip test. These are typically used to check for urethral strictures. A urethral stenosis, or a narrowing, can certainly cause urinary incontinence, especially during activities that press the bladder. The Q-tip test is a useful tool here to determine if there's a urethral obstruction.
If there's a urethral stenosis, it would definitely make sense that the Q-tip test might show a positive result, indicating some kind of obstruction. This would explain why she's losing urine during activities but not at night.
Now, let's think about what cystometry would reveal. Cystometry is a technique used to measure the residual volume in the bladder, which is the amount of urine left after the bladder empties. If there's a urethral stenosis, you'd expect to see a significant residual volume because the bladder doesn't empty completely, even though it's pressed during activities.
And what about detrusor contractions? In this scenario, with urethral stenosis, the detrusor muscles, which are responsible for bladder contraction, might not be fully contracting. This would lead to the bladder not emptying completely during the activities that press it, hence the residual volume.
So, putting this all together, if cystometry is done, it's likely to show a significant residual volume in the bladder and possibly reduced detrusor contractions due to the urethral obstruction. This would align with the symptoms she's experiencing during coughing or sneezing.</think>Based on the symptoms and the findings from the gynecological exam and Q-tip test, cystometry would likely reveal a significant residual volume in the bladder and reduced detrusor contractions. The presence of a urethral stenosis, inferred from these findings, would explain the urinary incontinence during activities like coughing or sneezing, while the absence of leakage at night suggests that the bladder doesn't empty completely due to the obstruction. The significant residual volume and reduced detrusor contractions are consistent with this type of incontinence.<｜end▁of▁sentence｜>

能够发现，问题回答更加规范，但仍可能存在一定的回答错误。由此可以考虑继续进行大规模微调。

3.3.4 模型合并

此时本地保存的模型权重在outputs文件夹中：

然后可使用如下代码进行模型权重合并：

权重合并

new_model_local = "DeepSeek-R1-Medical"model.save_pretrained(new_model_local) tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

3.4 微调参数解析

SFTTrainer 进行监督微调（Supervised Fine-Tuning, SFT），适用于 transformers 和 Unsloth 生态中的模型微调：1. 相关库

●SFTTrainer（来自 trl 库）：

○trl（Transformer Reinforcement Learning）是 Hugging Face 旗下的 trl 库，提供监督微调（SFT）和强化学习（RLHF）相关的功能。

○SFTTrainer 主要用于有监督微调（Supervised Fine-Tuning），适用于 LoRA 等低秩适配微调方式。

●TrainingArguments（来自 transformers 库）：

○这个类用于定义训练超参数，比如批量大小、学习率、优化器、训练步数等。

is_bfloat16_supported（来自 unsloth）：

○这个函数检查当前 GPU 是否支持 bfloat16（BF16），如果支持，则返回 True，否则返回 False

○bfloat16 是一种更高效的数值格式，在新款 NVIDIA A100/H100 等GPU上表现更优。

2. 初始化 SFTTrainer

SFTTrainer 部分

TrainingArguments 部分

3.5 完整高效微调实验

接下来尝试带入全部数据进行高效微调，以提升模型微调效果。

完整代码：

from unsloth import FastLanguageModelfrom modelscope.msdatasets import MsDatasetfrom trl import SFTTrainerfrom transformers import TrainingArgumentsfrom unsloth import is_bfloat16_supported
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. 
### Question:{}
### Response:<think>{}</think>{}"""
max_seq_length = 2048 dtype = None load_in_4bit = False
model, tokenizer = FastLanguageModel.from_pretrained(model_name = "./DeepSeek-R1-Distill-Qwen-7B",max_seq_length = max_seq_length,dtype = dtype,load_in_4bit = load_in_4bit,)
EOS_TOKEN = tokenizer.eos_token# Must add EOS_TOKEN
def formatting_prompts_func(examples):inputs = examples["Question"]cots = examples["Complex_CoT"]outputs = examples["Response"]texts = []for input, cot, output in zip(inputs, cots, outputs):text = train_prompt_style.format(input, cot, output) + EOS_TOKENtexts.append(text)return {"text": texts,}
ds =MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT', split = "train")dataset = ds.map(formatting_prompts_func, batched = True,)dataset["text"][0]
model = FastLanguageModel.get_peft_model(model,r=16,target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj",],lora_alpha=16,lora_dropout=0,bias="none",use_gradient_checkpointing="unsloth",# True or "unsloth" for very long contextrandom_state=3407,use_rslora=False,loftq_config=None,)
trainer = SFTTrainer(model=model,tokenizer=tokenizer,train_dataset=dataset,dataset_text_field="text",max_seq_length=max_seq_length,dataset_num_proc=2,args=TrainingArguments(per_device_train_batch_size=2,gradient_accumulation_steps=4,num_train_epochs = 3,warmup_steps=5,# max_steps=60,learning_rate=2e-4,fp16=not is_bfloat16_supported(),bf16=is_bfloat16_supported(),logging_steps=10,optim="adamw_8bit",weight_decay=0.01,lr_scheduler_type="linear",seed=3407,output_dir="outputs",),)
wandb.init()trainer_stats = trainer.train()
new_model_local = "DeepSeek-R1-Medical"model.save_pretrained(new_model_local) tokenizer.save_pretrained(new_model_local)model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

完全跑下来时间可能比较久：

10分钟构建能主动提问的智能导购

为助力商家全天候自动化满足顾客的购物需求，可通过百炼构建一个 Multi-Agent 架构的大模型应用实现智能导购助手。该系统能够主动询问顾客所需商品的具体参数，一旦收集齐备，便会自动从商品数据库中检索匹配的商品，并精准推荐给顾客。

53AI，企业落地大模型首选服务商

产品：场景落地咨询+大模型应用平台+行业解决方案

承诺：免费场景POC验证，效果验证后签署服务协议。零风险落地应用大模型，已交付160+中大型企业