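Master DeepSeek model fine-tuning in depth and quickly improve question-answering performance. Key points:
1. Efficient fine-tuning of the DeepSeek R1 Distill 7B model with the unsloth framework
2. How to build CoT (chain-of-thought) datasets and use them to fine-tune reasoning models
3. A hands-on case study covering answer-style optimization and knowledge injection
This article focuses on efficient fine-tuning of the DeepSeek R1 Distill 7B model with the unsloth fine-tuning framework. It explains how to create and use CoT datasets for fine-tuning reasoning models, then walks through a complete fine-tuning run on the medical-o1-reasoning-SFT dataset, with the goal of optimizing answer style and injecting domain knowledge.
Hands-on: fine-tune a DeepSeek R1 distilled model yourself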
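1.1 Fine-tuning, reinforcement learning, and model distillation
2. Reinforcement Learning:
3. Model Distillation:
●Goal: transfer knowledge from a teacher model to help a student model improve its performance, especially on devices with limited compute.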
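1.2 Fine-tuning large models
Unlike RAG (Retrieval-Augmented Generation) or Agent techniques, which improve results by building complex workflows around the model, fine-tuning improves the model by adjusting its parameters directly. The model is retrained on data for a specific task and thereby acquires the required skill "permanently". Fine-tuning can markedly improve performance in a specific domain or task and adapt the model to concrete application scenarios. Because it adjusts the model's internal weights and biases, the model understands and generates information more precisely, which is why fine-tuning is widely used for tasks that demand high accuracy and domain adaptation.
Full Fine-Tuning
Efficient Fine-Tuning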
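LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. It introduces low-rank matrices to reduce the number of parameters updated during fine-tuning, which significantly lowers GPU memory and compute requirements. Rather than adjusting all parameters of the original model, LoRA attaches low-rank adapters to selected layers and trains only those.
●Standard fine-tuning modifies all of the model's weights, whereas LoRA trains only the low-rank adapter matrices. The original parameters stay frozen, and a small number of new parameters adjust the model's output.
●Low-rank matrices make it possible to fine-tune large pre-trained models effectively even when GPU memory and compute are limited, which makes LoRA a good fit for devices with little GPU memory.
1.Memory savings: only a small number of parameters (the adapters) are updated, which significantly reduces GPU memory requirements and suits GPUs with limited memory.
2.Compute efficiency: fewer trainable parameters also mean a lighter computational load during fine-tuning.
3.Flexibility: adapters combine easily with existing pre-trained models and apply to many tasks, such as text generation, classification, and question answering.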
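To make the low-rank idea concrete, here is a minimal pure-Python sketch. It is not how unsloth or any particular library implements LoRA; the names `matvec`, `lora_forward`, `lora_param_count` and the toy dimensions are assumptions chosen for illustration. The frozen weight `W` is never modified, and only `A` and `B` would be trained.

```python
# Minimal LoRA sketch in pure Python (illustrative only; real libraries use tensors).
# y = W x + (alpha / r) * B (A x), where W is frozen and only A and B are trainable.

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank update."""
    base = matvec(W, x)                # frozen path: W x
    update = matvec(B, matvec(A, x))   # adapter path: B (A x)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters of one adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

# Toy dimensions (hypothetical): d_in = 8, d_out = 8, rank r = 2.
d_in, d_out, r, alpha = 8, 8, 2, 2
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weights
A = [[0.01 * (i + 1) for _ in range(d_in)] for i in range(r)]     # small non-zero init
B = [[0.0] * r for _ in range(d_out)]                             # zero init

x = [float(k + 1) for k in range(d_in)]
y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

# Because B starts at zero, the adapted layer initially matches the base layer.
print(y_base == y_lora)
print("toy layer, full vs. LoRA:", d_in * d_out, lora_param_count(d_in, d_out, r))
print("3584 x 3584 layer, r = 16:", 3584 * 3584, lora_param_count(3584, 3584, 16))
```

The last line uses the 3584 x 3584 shape of the `q_proj` layer printed later in this article: the adapter trains roughly one hundredth of the parameters that full fine-tuning of that layer would touch.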
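QLoRA (Quantized Low-Rank Adaptation) extends LoRA by combining low-rank adapters with quantization, which further reduces memory and storage requirements, especially in severely memory-constrained environments. Unlike plain LoRA, QLoRA stores the frozen base model weights in quantized form (typically 4-bit) while the inserted low-rank adapters are trained in higher precision. This significantly lowers the model's memory footprint while largely preserving performance.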
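As a rough illustration of the quantization half of QLoRA, the sketch below stores a block of frozen weights as small integers plus one scale factor. QLoRA itself uses a 4-bit NormalFloat (NF4) data type; the plain symmetric absmax rounding used here is a simplification assumed for illustration, and the function names and sample values are hypothetical.

```python
# Simplified 4-bit blockwise quantization (absmax rounding, NOT the NF4 format).
# Shows how frozen weights can be stored as small integers plus one scale.

def quantize_block(weights, levels=7):
    """Map floats to integers in [-levels, levels] using the block's max magnitude."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid division by zero
    scale = absmax / levels
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats; done on the fly when the weights are used."""
    return [qi * scale for qi in q]

# Hypothetical block of frozen weights.
block = [0.42, -0.13, 0.07, -0.88, 0.31, 0.0, 0.65, -0.27]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)

max_err = max(abs(a - b) for a, b in zip(block, restored))
print(q)
print(f"scale={scale:.4f}, max abs error={max_err:.4f}")
```

The rounding error per weight is bounded by half the scale, which is the price paid for the smaller storage format; the adapters are trained in higher precision on top of these approximate weights.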
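1.3 Application scenarios for efficient fine-tuning
4.Improving Agent capabilities (function calling and MCP): in multi-task collaboration or function-calling scenarios, efficient fine-tuning can significantly improve a model's Agent capabilities, enabling it to interact with other systems, call external APIs, or carry out specific MCP tasks. Through targeted fine-tuning, the model can learn more precise function-calling strategies, argument parsing, and operating instructions, and so perform more efficiently and intelligently in automated services, intelligent assistants, or robot control.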
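2. Environment setup for efficient fine-tuning of DeepSeek R1 Distill
2.1 Installing unsloth
unsloth is an integrated framework for inference and fine-tuning. According to the project, it makes fine-tuning Llama 3.3, Mistral, Phi-4, Qwen 2.5, and Gemma about 2x faster while using 80% less memory.
Project page: GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory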
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.gi
2.2 Installing and registering wandb
pip install wandb
2.3 Downloading the DeepSeek R1 model
mkdir ./DeepSeek-R1-Distill-Qwen-7B
modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local_dir ./DeepSeek-R1-Distill-Qwen-7B
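2.4 Fine-tuning data
Datasets that contain both the reasoning and the final result are now common. The well-known math question-answering dataset NuminaMath CoT, for example, contains the math problem, the solution approach (the "think" part), and the final answer, so it can be used to fine-tune reasoning models. Other CoT datasets that can be used for the same purpose include APPs (programming), TACO (programming), and long_form_thought_data_5k (general question answering).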
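A CoT sample is only useful for reasoning fine-tuning if the question, the reasoning, and the final answer are all present. The check below is a minimal sketch of that idea. The field names follow the medical-o1-reasoning-SFT sample shown later in this article (other CoT datasets may name their columns differently), and the helper `is_valid_cot_record` and the two records are hypothetical.

```python
# Validate that records carry the three parts a CoT sample needs.
# Field names follow the medical-o1-reasoning-SFT sample in this article;
# other CoT datasets may use different column names.

REQUIRED_FIELDS = ("Question", "Complex_CoT", "Response")

def is_valid_cot_record(record):
    """True if every required field is present and contains non-empty text."""
    return all(
        isinstance(record.get(field), str) and bool(record[field].strip())
        for field in REQUIRED_FIELDS
    )

# Hypothetical records for illustration only.
records = [
    {"Question": "What is 2 + 3?", "Complex_CoT": "Add 2 and 3 to get 5.", "Response": "5"},
    {"Question": "What is 7 - 4?", "Complex_CoT": "", "Response": "3"},  # reasoning missing
]
valid = [r for r in records if is_valid_cot_record(r)]
print(f"{len(valid)} of {len(records)} records usable for CoT fine-tuning")
```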
3. Fine-tuning the DeepSeek R1 model in practice
3.1 Model inference with unsloth
from unsloth import FastLanguageModel
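# Context length of the model. For reference, Gemini supports a context of more
# than 1 million tokens, while Llama-3 supports 8192. Any value is allowed, but
# 2048 is recommended for testing. Unsloth also supports very long context
# fine-tuning and reports context lengths up to 4x longer than the best alternative.
max_seq_length = 2048

# Keep this as None, or choose torch.float16 / torch.bfloat16 on newer GPUs.
dtype = None

# Set to True to fine-tune with 4-bit quantization. This cuts memory usage about 4x,
# which makes fine-tuning feasible on a 16 GB GPU. 4-bit quantization maps the
# weights to a limited set of values; the downside is an accuracy drop of 1-2%.
# Keep it False on larger GPUs such as the H100 to retain that extra accuracy.
load_in_4bit = False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)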
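The LoRA parameters passed to FastLanguageModel.get_peft_model later in this walkthrough have the following meanings:
●r = 16: choose any number greater than 0; suggested values are 8, 16, 32, 64, 128.
●target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]: the layers that receive LoRA adapters.
●lora_alpha = 16: set it equal to r, or double it.
●lora_dropout = 0: any value is supported, but 0 is optimized.
●bias = "none": any value is supported, but "none" is optimized.
●use_gradient_checkpointing = "unsloth": use True or "unsloth" for very long context. "unsloth" is recommended because it reduces memory usage by 30% and supports extremely long context fine-tuning; see https://unsloth.ai/blog/long-context for details.
●random_state = 3407
●use_rslora = False: rank-stabilized LoRA is supported; this advanced feature sets lora_alpha = 16 automatically.
●loftq_config = None: LoftQ is also supported.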
==((====))==  Unsloth 2025.2.12: Fast Qwen2 patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla V100S-PCIE-32GB. Max memory: 31.739 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,4.23s/it]
./DeepSeek-R1-Distill-Qwen-7B does not have a padding token! Will use pad_token = <|vision_pad|>.
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 3584, padding_idx=151654)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((3584,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)
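The layer shapes in this printout are enough to work out how many parameters the model has and roughly how much memory its weights need. The calculation below is a back-of-the-envelope sketch: the constant and helper names are mine, the memory figures cover weights only (no activations, optimizer state, or KV cache), and the 4-bit figure is an idealized lower bound that ignores quantization scales and any layers kept in higher precision.

```python
# Parameter count derived from the layer shapes in the printed architecture.
# Memory figures cover the weights only.

HIDDEN, VOCAB, LAYERS = 3584, 152064, 28
KV_DIM, MLP_DIM = 512, 18944

def linear(n_in, n_out, bias):
    """Parameters of a Linear layer: weight matrix plus optional bias vector."""
    return n_in * n_out + (n_out if bias else 0)

per_layer = (
    linear(HIDDEN, HIDDEN, True)          # q_proj
    + 2 * linear(HIDDEN, KV_DIM, True)    # k_proj, v_proj
    + linear(HIDDEN, HIDDEN, False)       # o_proj
    + 2 * linear(HIDDEN, MLP_DIM, False)  # gate_proj, up_proj
    + linear(MLP_DIM, HIDDEN, False)      # down_proj
    + 2 * HIDDEN                          # two RMSNorm weight vectors
)
total = (
    VOCAB * HIDDEN                        # embed_tokens
    + LAYERS * per_layer                  # 28 decoder layers
    + HIDDEN                              # final norm
    + linear(HIDDEN, VOCAB, False)        # lm_head
)

print(f"total parameters: {total:,}")
for name, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{name} weights: {total * bits / 8 / 1e9:.2f} GB")
```

The result, about 7.6 billion parameters, explains why the 16-bit weights alone occupy roughly 15 GB and why 4-bit loading matters on smaller GPUs.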
LlamaTokenizerFast(name_or_path='./DeepSeek-R1-Distill-Qwen-7B', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<|begin▁of▁sentence|>', 'eos_token': '<|end▁of▁sentence|>', 'pad_token': '<|vision_pad|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={151643: AddedToken("<|end▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151644: AddedToken("<|User|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151645: AddedToken("<|Assistant|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151646: AddedToken("<|begin▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151647: AddedToken("<|EOT|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151648: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151649: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151657: AddedToken("<tool_call>", 
rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),})
question = "你是谁?"  # "Who are you?"
inputs = tokenizer([question], return_tensors="pt").to("cuda")
{'input_ids': tensor([[151646, 105043, 100165,11319]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}
outputs = model.generate(input_ids=inputs.input_ids,max_new_tokens=1200,use_cache=True,)
tensor([[151646, 105043, 100165,..., 102454, 5373,99728]],device='cuda:0')
response = tokenizer.batch_decode(outputs)
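# Chat prompt template (in Chinese). First line: "Write an appropriate reply to
# complete the current dialogue task." Instruction: "You are a helpful assistant."
prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。

### Instruction:
你是一名助人为乐的助手。

### Question:
{}

### Response:
<think>{}"""
question = "你好,好久不见!"  # "Hello, long time no see!"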
[prompt_style_chat.format(question, "")]
['请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好,好久不见!\n\n### Response:\n<think>']
inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs.input_ids,max_new_tokens=1200,use_cache=True,)
response = tokenizer.batch_decode(outputs)
['<|begin▁of▁sentence|>请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好,好久不见!\n\n### Response:\n<think>\n嗯,用户发来“你好,好久不见!”这句话,看起来像是一种友好的问候,带有亲切感。首先,我需要分析用户的意图,可能是想打招呼或者继续之前的对话。我应该回应得友好且温暖,同时保持专业性。\n\n考虑到用户可能是想建立联系,我应该用一种既亲切又正式的方式回应。比如,使用“你好!很高兴见到你!今天过得怎么样?”这样的回复既表达了问候,又询问了近况,有助于继续对话。\n\n另外,我需要确保语言简洁明了,避免使用复杂的词汇,让用户感觉轻松愉快。同时,保持语气友好,让用户感到被重视和欢迎。\n\n最后,检查一下回复是否符合所有要求,比如是否恰当、是否符合角色设定,以及是否能够有效传达信息。确认无误后,就可以发送这个回复了。\n</think>\n\n你好!很高兴见到你!今天过得怎么样?<|end▁of▁sentence|>']
print(response[0].split("### Response:")[1])
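Splitting on "### Response:" returns the reasoning and the answer together. When only the final answer is needed, the two parts can be separated at the closing think tag. The helper below is a minimal sketch; the function name `split_reasoning` and the shortened sample string are hypothetical, and it assumes the prompt ends with "### Response:" followed by a think tag, as in the sample output above.

```python
# Split a decoded response into the reasoning part and the final answer.
# Assumes the prompt ends with "### Response:\n<think>" and that the model
# closes its reasoning with "</think>".

EOS = "<|end▁of▁sentence|>"

def split_reasoning(decoded: str) -> tuple[str, str]:
    """Return (reasoning, answer) from one decoded generation."""
    body = decoded.split("### Response:", 1)[1]
    body = body.replace(EOS, "")
    reasoning, sep, answer = body.partition("</think>")
    if not sep:  # no closing tag: treat everything as the answer
        return "", body.strip()
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()

# Shortened, made-up generation in the same layout as the real output.
sample = (
    "<|begin▁of▁sentence|>...\n\n### Response:\n<think>\n"
    "The user is greeting me.\n</think>\n\nHello! Nice to see you!"
    "<|end▁of▁sentence|>"
)
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Using `partition` rather than `split` keeps the helper safe when generation is cut off by `max_new_tokens` before the closing tag appears.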
3.2 Question-answering test on the base model
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""
question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"
inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
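# Same generation settings as in section 3.1.
outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)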
response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])
Alright, I'm trying to figure out what the cystometry would show for this 61-year-old woman. She's having involuntary urine loss during activities like coughing or sneezing but not at night. So, she's experiencing urgency, right? That's when you don't hold your pee because something upsets you.
She went through a gynecological exam and a Q-tip test. I remember that a Q-tip test is used to check for a urine stream. If she's getting a stream, the Q-tip would be inserted and they'd have to pull it back out, which is a sign of an orinary reflex. So, the fact that she had a Q-tip test suggests that they noticed this involuntary loss.
Now, the question is about what a cystometry would show. Cystometry measures the residual volume in the bladder and also assesses the detrusor muscle contractions. The detrusor is the muscle that helps push urine out when you squeeze. If the detrusor isn't contracting properly, the bladder might not empty, leaving residual urine.
Since she's losing urine during activities that cause urgency, like coughing or sneezing, that's probably because the detrusor isn't contracting strong enough. When the detrusor doesn't contract, the bladder can't empty completely, so there's residual volume. That would mean during cystometry, they'd see a low residual volume because the bladder isn't holding much. Also, the detrusor contractions would be weak, indicating that the muscle isn't working as it should to push the urine out.
I think residual volume is low here because she's continuously losing urine, so not much is left in the bladder. The detrusor contractions being weak would support that idea. So, putting it all together, the cystometry would show a low residual volume and weak detrusor contractions.
The 61-year-old woman presents with involuntary urine loss during activities like coughing or sneezing, indicative of urgency. The Q-tip test confirmed this by showing a positive orinary reflex. Upon cystometry, the findings would reveal:
1. **Low Residual Volume**: The bladder likely holds minimal urine due to the continuous loss during activities, suggesting incomplete emptying.
2. **Weak Detrusor Contractions**: The detrusor muscle's contractions are inadequate, contributing to the inability to expel urine completely, thus maintaining residual urine in the bladder.
These observations point to a functional incontinence issue, likely due to insufficient detrusor activity.<|end▁of▁sentence|>
inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
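# Same generation settings as in section 3.1.
outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)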
response2 = tokenizer.batch_decode(outputs2)
print(response2[0].split("### Response:")[1])
Okay, so I'm trying to figure out which coronary artery is involved in this patient's chest pain. Let me start by breaking down the information given.
The patient has sudden-onset chest pain that goes to the neck and left arm. That makes me think about the possible locations of the coronary artery on the left side. The left coronary artery is typically on the left side of the neck, so pain radiating to the left arm could be coming from that area. But I shouldn't jump to conclusions yet.
They have a past medical history of hypercholesterolemia and coronary artery disease. Hypercholesterolemia is a risk factor for heart disease, so it's something to consider, but not directly indicative of which artery is affected. The key here is the tachycardia, which is an elevated heart rate. Tachycardia can be a clue for several issues, including heart valve problems or arrhythmias, but in the context of chest pain, it's often linked to coronary artery issues.
Elevated troponin I levels are a big hint. Troponin I is a marker of heart muscle injury. Elevated levels suggest an acute myocardial infarction or a significant ischemia. Myocardial infarction typically affects one of the coronary arteries, causing tissue damage.
Now, considering the presentation: the pain is radiating to the left arm. The left arm is typically served by the left anterior descending (LAD) and left circumflex (LCx) coronary arteries. The LAD supplies the left side of the chest, including the left arm, while the LCx supplies the upper arm and neck.
The tachycardia could be due to the heart muscle being ischemic, which can cause the heart to beat faster as it tries to pump blood. This is more common in LAD disease because the LAD is responsible for the left side of the heart, which includes the left ventricle and part of the left atrium. If the LAD is blocked, the left ventricle might not get enough blood, leading to a rapid heart rate.
On the other hand, LCx disease affects the right upper arm and the neck. If that's the case, the pain might radiate to the right side of the neck or the right arm, not the left. But the patient's pain is specifically on the left side, so that makes me lean towards LAD being the culprit.
I should also consider the possible causes of LAD disease. It's often due to atherosclerosis, which is more common in patients with a history of hypercholesterolemia. Since the patient has a history of coronary artery disease, this adds to the suspicion that it's LAD.
So putting it all together: the presentation points to the left arm, the elevated troponin suggests myocardial injury, and the tachycardia supports ischemia. All these factors together make the most likely coronary artery affected the left anterior descending (LAD).
The most likely coronary artery affected in this patient is the **left anterior descending (LAD)**.
**Step-by-Step Explanation:**
1. **Presentation of Pain:** The patient experiences chest pain radiating to the left arm. This suggests involvement of the left coronary artery, as the LAD supplies blood to the left side of the chest and the left arm.
2. **Elevated Troponin I Levels:** Elevated troponin I indicates myocardial injury, commonly associated with a myocardial infarction or significant coronary artery disease.
3. **Tachycardia:** Increased heart rate is often due to ischemia, which can occur when blood flow to the heart muscle is reduced, as happens with coronary artery disease.
4. **Coronary Artery Disease and History:** The patient's past history of coronary artery disease and hypercholesterolemia (a risk factor for cardiovascular disease) supports the suspicion of ischemia in one of the coronary arteries.
5. **Coronary Artery Locations:** The LAD is responsible for the left side of the heart, including the left ventricle and part of the left atrium. Blockage of this artery would cause ischemia and tachycardia, aligning with the patient's symptoms.
**Conclusion:** Based on the presentation, elevation of troponin I, and tachycardia, the most likely coronary artery affected is the left anterior descending (LAD).<|end▁of▁sentence|>
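In short, the base model answers that the most likely affected coronary artery is the left anterior descending artery (LAD).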
3.3 Minimum viable experiment
from modelscope.msdatasets import MsDataset
ds = MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT')
{'Question': 'A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?', 'Complex_CoT': "Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. 
Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.", 'Response': 'Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.'}
from modelscope.msdatasets import MsDataset
ds = MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT', split = "train[0:500]")
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""
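# '<|end▁of▁sentence|>'; appended to every sample so the model learns where to stop.
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    # Batched map: each field is a list with one entry per example.
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # Fill the template with question, reasoning, and answer, then append EOS.
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}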
ds = ds.map(formatting_prompts_func, batched = True,)
"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. 
Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\n</think>\nCystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.<|end▁of▁sentence|>"
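The formatting step can be tried without loading a tokenizer or the real dataset. The sketch below mimics a batched map callback on a made-up two-row batch. `toy_template`, `format_batch`, and the batch contents are hypothetical, the template is shortened to the question and response sections, and `EOS_TOKEN` is hard-coded to the value the tokenizer reports.

```python
# Toy run of the batched formatting step, without a tokenizer or dataset.
# EOS_TOKEN is hard-coded here (assumption) to the tokenizer's eos_token value.

EOS_TOKEN = "<|end▁of▁sentence|>"

# Shortened template: question, reasoning inside think tags, then the answer.
toy_template = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

def format_batch(examples):
    """Mimic a batched map callback: dict of lists in, dict of lists out."""
    texts = []
    for question, cot, answer in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        texts.append(toy_template.format(question, cot, answer) + EOS_TOKEN)
    return {"text": texts}

# Hypothetical two-row batch in the same column layout as the dataset.
batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["step a, step b", "step c"],
    "Response": ["A1", "A2"],
}
out = format_batch(batch)
print(out["text"][0])
```

Each training text therefore carries the reasoning inside the think tags and ends with the end-of-sequence token, which is the layout visible in the formatted sample above.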
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
Unsloth 2025.2.12 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
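With r = 16 and the seven target modules, the number of trainable parameters follows directly from the layer shapes printed earlier. The calculation below is a sketch under that assumption; the dictionary and variable names are mine, and only the adapter matrices A (r x n_in) and B (n_out x r) are counted.

```python
# Trainable LoRA parameters for r = 16 over the seven target modules per layer.
# Each adapted Linear(n_in, n_out) adds A (r x n_in) and B (n_out x r).

R, LAYERS = 16, 28
HIDDEN, KV_DIM, MLP_DIM = 3584, 512, 18944

target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

per_layer = sum(R * (n_in + n_out) for n_in, n_out in target_shapes.values())
trainable = LAYERS * per_layer

print(f"adapter parameters per layer: {per_layer:,}")
print(f"trainable parameters in total: {trainable:,}")
```

About 40 million trainable parameters amount to roughly 0.5% of a base model with about 7.6 billion parameters, which is where LoRA's memory savings come from.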
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
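The key training arguments:
●per_device_train_batch_size = 2
●gradient_accumulation_steps = 4: raising this value increases the effective batch size (per_device_train_batch_size x gradient_accumulation_steps, 8 here), so each optimizer step simply covers more of the dataset.
●max_steps = 60: for a full training run, comment out max_steps and replace it with num_train_epochs = 1. Setting it to 1 means one full pass over the dataset. One to three passes are usually recommended, and no more, otherwise the fine-tuned model will overfit.
●learning_rate = 2e-4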
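The interplay of these settings is easy to check with a little arithmetic. The sketch below assumes a single GPU and the 500-sample subset loaded earlier. `linear_lr` is my own approximation of the shape of a linear schedule with warmup, not a call into any library, so exact values in a given library version may differ slightly.

```python
# Back-of-the-envelope view of the training configuration (single GPU assumed).

per_device_batch, grad_accum = 2, 4
max_steps, warmup_steps = 60, 5
base_lr, dataset_size = 2e-4, 500

effective_batch = per_device_batch * grad_accum  # samples per optimizer step
samples_seen = effective_batch * max_steps
epochs = samples_seen / dataset_size

def linear_lr(step):
    """Linear warmup to base_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen} (~{epochs:.2f} epochs)")
for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

With these values, 60 steps cover 480 of the 500 samples, so this quick experiment is just short of one full pass over the subset.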
Applying chat template to train dataset (num_proc=2): 100%|██████████| 500/500 [00:01<00:00, 348.86 examples/s]
Tokenizing train dataset (num_proc=2): 100%|██████████| 500/500 [00:02<00:00, 221.20 examples/s]
Tokenizing train dataset (num_proc=2): 100%|██████████| 500/500 [00:00<00:00, 920.96 examples/s]
Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
trainer_stats = trainer.train()
PeftModelForCausalLM((base_model): LoraModel((model): Qwen2ForCausalLM((model): Qwen2Model((embed_tokens): Embedding(152064, 3584, padding_idx=151654)(layers): ModuleList((0-27): 28 x Qwen2DecoderLayer((self_attn): Qwen2Attention((q_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=3584, bias=True)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=3584, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(k_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=512, bias=True)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=512, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(v_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=512, bias=True)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=512, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(o_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=3584, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=3584, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(rotary_emb): LlamaRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): lora.Linear((base_layer): 
Linear(in_features=3584, out_features=18944, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=18944, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(up_proj): lora.Linear((base_layer): Linear(in_features=3584, out_features=18944, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=3584, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=18944, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(down_proj): lora.Linear((base_layer): Linear(in_features=18944, out_features=3584, bias=False)(lora_dropout): ModuleDict((default): Identity())(lora_A): ModuleDict((default): Linear(in_features=18944, out_features=16, bias=False))(lora_B): ModuleDict((default): Linear(in_features=16, out_features=3584, bias=False))(lora_embedding_A): ParameterDict()(lora_embedding_B): ParameterDict()(lora_magnitude_vector): ModuleDict())(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): LlamaRotaryEmbedding())(lm_head): Linear(in_features=3584, out_features=152064, bias=False))))
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
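# Same generation settings as before fine-tuning, so the answers are comparable.
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)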
response = tokenizer.batch_decode(outputs)
Alright, so we have a 61-year-old woman here, and she's been dealing with this weird issue where she loses urine during activities like coughing or sneezing. That's interesting. Now, she doesn't have leakage at night, which is a good sign, but it's definitely happening during these activities. Hmm, I'm thinking this might be some kind of urinary incontinence, but not the usual type. Let's think about what could be causing this.
Okay, during coughing or sneezing, the bladder gets pressed. This pressure can push urine out, even if the leakage at night isn't happening. So, this sounds like it might be a case of interstitial incontinence, where the bladder empties during certain activities, but doesn't necessarily have a leakage at night.
Now, let's consider the gynecological exam and the Q-tip test. These are typically used to check for urethral strictures. A urethral stenosis, or a narrowing, can certainly cause urinary incontinence, especially during activities that press the bladder. The Q-tip test is a useful tool here to determine if there's a urethral obstruction.
If there's a urethral stenosis, it would definitely make sense that the Q-tip test might show a positive result, indicating some kind of obstruction. This would explain why she's losing urine during activities but not at night.
Now, let's think about what cystometry would reveal. Cystometry is a technique used to measure the residual volume in the bladder, which is the amount of urine left after the bladder empties. If there's a urethral stenosis, you'd expect to see a significant residual volume because the bladder doesn't empty completely, even though it's pressed during activities.
And what about detrusor contractions? In this scenario, with urethral stenosis, the detrusor muscles, which are responsible for bladder contraction, might not be fully contracting. This would lead to the bladder not emptying completely during the activities that press it, hence the residual volume.
So, putting this all together, if cystometry is done, it's likely to show a significant residual volume in the bladder and possibly reduced detrusor contractions due to the urethral obstruction. This would align with the symptoms she's experiencing during coughing or sneezing.
Based on the symptoms and the findings from the gynecological exam and Q-tip test, cystometry would likely reveal a significant residual volume in the bladder and reduced detrusor contractions. The presence of a urethral stenosis, inferred from these findings, would explain the urinary incontinence during activities like coughing or sneezing, while the absence of leakage at night suggests that the bladder doesn't empty completely due to the obstruction. The significant residual volume and reduced detrusor contractions are consistent with this type of incontinence.<|end▁of▁sentence|>
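The decoded string holds the prompt, the model's reasoning, the final answer and the end-of-sentence token in one piece. A minimal sketch for separating them is shown below. It assumes the prompt ends with a `### Response:` marker and that the reasoning is wrapped in `<think>`/`</think>` tags, which is the usual DeepSeek R1 convention but is not visible in the output above. The helper `split_response` is hypothetical.

```python
EOS = "<|end▁of▁sentence|>"


def split_response(decoded: str, marker: str = "### Response:") -> tuple[str, str]:
    """Return (reasoning, answer) from one decoded generation."""
    # Keep only what follows the last response marker, then drop the EOS token.
    body = decoded.rsplit(marker, 1)[-1].replace(EOS, "").strip()
    if "</think>" in body:
        reasoning, answer = body.split("</think>", 1)
        reasoning = reasoning.replace("<think>", "").strip()
        return reasoning, answer.strip()
    # No closing tag found: treat everything as the answer.
    return "", body


sample = (
    "### Question:\nQ\n\n### Response:\n"
    "<think>\nstep one\n</think>\nfinal answer<|end▁of▁sentence|>"
)
reasoning, answer = split_response(sample)
print(reasoning)
print(answer)
```

With the real output this would be called as `split_response(response[0])`.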
new_model_local = "DeepSeek-R1-Medical"
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)
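3.4 Fine-tuning parameters explained
is_bfloat16_supported (from unsloth):
○This function checks whether the current GPU supports bfloat16 (BF16). It returns True if it does and False otherwise.
○bfloat16 is a 16-bit format that keeps the 8-bit exponent of float32, so it covers the same dynamic range and is less prone to overflow than fp16 during mixed-precision training. It is supported natively on NVIDIA Ampere and later GPUs such as the A100 and H100.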
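To make the two points above concrete, here is a small standard-library sketch. `precision_flags` mirrors how the boolean is usually turned into an `fp16`/`bf16` pair for the trainer, and `to_bfloat16` emulates the format by truncating a float32 to its top 16 bits. Both helpers are hypothetical illustrations, and real hardware rounds to nearest rather than truncating.

```python
import struct

FP16_MAX = 65504.0  # largest finite float16 value


def precision_flags(bf16_supported: bool) -> dict:
    """Pick exactly one 16-bit mode: bf16 when available, otherwise fp16."""
    return {"fp16": not bf16_supported, "bf16": bf16_supported}


def to_bfloat16(x: float) -> float:
    """Truncate a float32 value to bfloat16 precision (round toward zero)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000  # keep sign, 8 exponent bits, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(precision_flags(True))
print(to_bfloat16(3.14159))          # coarser mantissa than float32
print(to_bfloat16(1e30) > FP16_MAX)  # still finite, far beyond the fp16 range
```

The trade-off is visible in the output: bfloat16 gives up mantissa precision but keeps the float32 range.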
3.5 Complete efficient fine-tuning experiment
from unsloth import FastLanguageModel
from modelscope.msdatasets import MsDataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}

### Response:
<think>
{}
</think>
{}"""
max_seq_length = 2048
dtype = None
load_in_4bit = False
# from_pretrained returns both the model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    # Each column arrives as a list because the map below runs in batched mode
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    answers = examples["Response"]
    texts = []
    for question, cot, answer in zip(questions, cots, answers):
        text = train_prompt_style.format(question, cot, answer) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
ds = MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT', split = "train")
dataset = ds.map(formatting_prompts_func, batched = True,)
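model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank, matching the rank-16 lora_A/lora_B layers in the printed model
    lora_dropout=0,  # the printed model shows Identity() for lora_dropout
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    # Other LoRA settings are missing from this listing and are left at unsloth's defaults
)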
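# Most trainer arguments are missing from this listing. Those marked "assumed" are not from the source.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column produced by formatting_prompts_func
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        num_train_epochs = 3,
        # max_steps=60,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),  # assumed counterpart of the fp16 flag
        output_dir="outputs",  # assumed output directory
    ),
)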
trainer_stats = trainer.train()
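new_model_local = "DeepSeek-R1-Medical"
# Merge the LoRA adapters into the base weights and save in 16-bit precision
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)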
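The dataset step in the script is the part most worth checking by hand before a long training run. The sketch below reproduces the same batched formatting logic on a toy, made-up record so that the resulting training string can be inspected without loading a model. The shortened template, the `<think>` tags, the hard-coded EOS string and the name `format_batch` are all assumptions for illustration. In the real script the EOS token comes from `tokenizer.eos_token`.

```python
EOS_TOKEN = "<|end▁of▁sentence|>"  # stand-in for tokenizer.eos_token

TEMPLATE = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""


def format_batch(examples: dict) -> dict:
    """Turn a column-oriented batch into a list of training strings."""
    texts = []
    for question, cot, answer in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        # Appending EOS teaches the model where a finished answer stops.
        texts.append(TEMPLATE.format(question, cot, answer) + EOS_TOKEN)
    return {"text": texts}


# Toy record, not taken from medical-o1-reasoning-SFT
toy_batch = {
    "Question": ["What is 2 + 2?"],
    "Complex_CoT": ["Add the two numbers."],
    "Response": ["4"],
}
print(format_batch(toy_batch)["text"][0])
```

Printing one formatted sample this way makes it easy to confirm that the question, chain of thought and final answer land in the intended slots.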
