我要投稿

这可能是全网最保姆的DeepSeek微调教学

发布日期：2025-03-06 18:05:26 浏览次数： 1971

作者：赋范大模型技术圈

微信搜一搜，关注“赋范大模型技术圈”

微调是增强大模型原生能力的最佳方法。

借助微调，我们可以优化大模型的问答语气风格、可以增强模型推理能力和Agent 能力，甚至是进行专业领域的知识灌注。

本次以医疗领域为例，对 DeepSeek 进行专项提升！

最终达到问答风格优化 + 知识灌注目的，让模型在微调过程中掌握复杂医学问题的专业推理过程，并提高疾病诊断的准确率。

这套微调流程可以适用于任意尺寸任意精度的 DeepSeek R1 模型。

微调后还可以一键创建微调后的 GGUF 模型权重，无缝代入 Ollama 、 vLLM 等主流大模型推理工具进行对话！

硬件要求：本节公开课最小化复现仅需 7G 显存、半小时运行时间即可完成，并获得微调效果。

训练流程迁移：本节公开课介绍的 DeepSeek R1 模型的高效微调流程可以迁移至 DeepSeek R1 任意蒸馏模型、任意COT 数据集，甚至是进行DeepSeek R1 模型高效微调。

课件代码：公开课随课提供全部课件、代码、训练数据、模型微调前后权重等各项内容。后台回复“777”即可无偿领取

课程参考资料：为了更好的辅助学习，随公开课附赠相关参考资料。

下面正式开始！

1.借助 Unsloth 进行模型推理

from unsloth import FastLanguageModel

尝试用 unsloth 进行 LLama 模型推理

首先设置关键参数，这些参数会影响模型的性能和资源需求，并加载模型：

max _ seq _ length  = 2048 dtype  = None load _ in _ 4bit  = False

* 注，若显存不足，则可以 load _ in _ 4bit = True ，运行 4 bit 量化版。

在 INT4 量化情况下， 8B 模型推理仅需 7G 左右显存。此时model 就是读取进来的 DeepSeek R1 8B 蒸馏模型，而 tokenizer 则是分词器。

将模型调整为推理模式：

FastLanguageModel . for _ inference  ( model  )

然后即可和模型进行对话：

question  =  " 请问如何证明根号 2 是无理数？ "

然后这里我们首先需要借助分词器，将输入的问题转化为标记索引：

inputs  = tokenizer  (  [  question  ] , return _ tensors=  "pt"  ).to  (  "cuda"  )

最后再带入 inputs 进行对话

outputs  = model.generate  ( input _ ids  = inputs.input _ ids ,max _ new _ tokens  = 1200 , use _ cache  = True ,  )

此时得到的回复也是词索引，同样需要分词器将其转化为文本：

response  = tokenizer.batch _ decode ( outputs )

response

print ( response [ 0 ])

至此我们就完成了 unsloth 模型推理流程。

2.原始模型的医疗问题问答

设置问答模板

prompt _ style = """Below is an instruction that describes a task , paired with an input that provides further context.Write a response that appropriately completes the request.Before answering , think carefully about the question and create a step- by  -step chain of thoughts to ensure a logical and accurate response.
 ### Instruction : You are a medical expert with advanced knowledge in clinical reasoning , diagnostics , and treatment planning.Please answer the following medical question.
 ### Question : { }
 ### Response :  <think> { } """

翻译如下：

prompt _ style = """ 以下是一个任务说明，配有提供更多背景信息的输入。
请写出一个恰当的回答来完成该任务。
在回答之前，请仔细思考问题，并按步骤进行推理，确保回答逻辑清晰且准确。

 ### Instruction : 
您是一位具有高级临床推理、诊断和治疗规划知识的医学专家。
请回答以下医学问题。

接下来我们抽取部分 medical-o1-reasoning-SFT 数据集中问题进行提问，并查看初始状态下模型回答结果。

question _ 1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test.Based on these findings , what would cystometry most likely reveal about her residual volume and detrusor contractions?  "

翻译：一位 61 岁的女性，有长期在咳嗽或打喷嚏等活动中发生不自主尿液流失的病史，但夜间没有漏尿。她接受了妇科检查和 Q-tip 测试。根据这些检查结果，膀胱测量（ cystometry ）最可能会显示她的残余尿量和逼尿肌收缩情况如何？

question _ 2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm , with a past medical history of hypercholesterolemia and coronary artery disease , elevated troponin I levels , and tachycardia , what is the most likely coronary artery involved based on this presentation?  "

翻译：面对一位突发胸痛并放射至颈部和左臂的患者，其既往病史包括高胆固醇血症和冠状动脉疾病，同时伴有升高的肌钙蛋白 I 水平和心动过速，根据这些临床表现，最可能受累的冠状动脉是哪一条？

问答测试

inputs1 = tokenizer ([ prompt _ style.format ( question _ 1 , ""  )  ] , return _ tensors=  "pt"  ).to  (  "cuda"  ) 

outputs1 = model.generate  ( input _ ids=inputs1.input _ ids ,max _ new _ tokens=1200 , use _ cache=True ,  ) 
response1 = tokenizer.batch _ decode ( outputs1 )

print  ( response1  [  0  ] . split  (  "### Response : "  )  [  1  ]  )

回复如下：

 <think> Okay , so I  'm trying to figure out this medical question.Let' s break it down step by step.The patient is a 61  -year-old woman with a history of involuntary urine loss during activities like coughing or sneezing , but she doesn  't leak at night.She' s undergoing a gynecological exam and a Q-tip test.The question is about what cystometry would show regarding her residual volume and detrusor contractions.
First , I know that involuntary urine loss during activities like sneezing or coughing is often related to stress urinary incontinence.Stress incontinence usually happens when the urethral muscles aren  't strong enough to prevent the bladder from leaking when there' s increased pressure , like from coughing .
Now , the Q-tip test.From what I remember , the Q-tip is a urethral catheter used to measure urethral pressure profiles.It  's often used to assess urethral function.A positive Q-tip test result , where the urethral pressure is lower than the intravesicular pressure during the Valsalva maneuver , is associated with intrinsic sphincteric deficiency , which is a type of stress incontinence .
Since the patient has a history of involuntary leakage during activities but not at night , it' s more likely stress incontinence rather than something like urgency incontinence , which would also involve nighttime leakage.So ,if the Q-tip test is positive , it points towards intrinsic sphincteric deficiency .
Now , moving to cystometry.Cystometry is a test that measures how the bladder responds to filling and the contractions of the detrusor muscle.It can show if there  's an overactive bladder ( OAB ), which causes urgency and frequency , or if there' s underactive detrusor , which can lead to retention .
In this case , the patient  's main issue is stress incontinence , which is more about the inability to hold urine under increased pressure.Cystometry would look at the detrusor contractions.If the detrusor is underactive , it won' t contract strongly enough to empty the bladder , leading to residual volume.But if it  's overactive , it might contract too strongly , causing urgency .
Given that the patient has a history of stress incontinence and a positive Q-tip test suggesting intrinsic sphincteric deficiency , I think the cystometry would show that the detrusor contractions are normal.The issue isn' t with the detrusor muscle  's ability to contract but rather the inability to seal the urethra to hold the pressure.So , the residual volume might be normal unless there' s significant retention , but the key finding would be the detrusor contractions being normal ,not overactive.
Wait , but could there be residual volume? If the patient has some urine left in the bladder after urination , that  's residual volume.But without symptoms of retention , like a full bladder or difficulty starting urination , it' s less likely.The main issue is the incontinence during activities , so the detrusor contractions are normal ,and the residual volume is within normal limits unless there  's another condition.
So , putting it all together , the cystometry would likely show normal detrusor contractions and normal residual volume.The issue is more on the sphincteric side , not the detrusor . </think> 
Basedontheanalysisofthepatient' shistory and theQ-tiptestresults , thecystometrywouldmostlikelyrevealnormaldetrusorcontractions and normalresidualvolume.Theprimaryissueappearstobestressurinaryincontinenceduetointrinsicsphinctericdeficiency , asindicatedbythepositiveQ-tiptest.Thisconditiontypicallyaffectstheurethralsphincter  'sabilitytopreventleakageunderincreasedpressure , ratherthanthedetrusormuscle' scontractility.Therefore , thedetrusorcontractionsare not overactive , and theresidualvolumeiswithinnormallimits.< ｜end ▁ of ▁ sentence ｜ >

翻译如下：

<think> 好的，我正在尝试分析这个医学问题。我们一步步来分解。患者是一位 61 岁的女性，有在咳嗽或打喷嚏等活动中发生不自主尿液流失的病史，但她夜间没有漏尿。她正在接受妇科检查和 Q-tip 测试。问题是关于膀胱测量（ cystometry ）会显示她的残余尿量和逼尿肌收缩情况。

首先，我知道在像打喷嚏或咳嗽等活动中发生不自主尿液流失通常与压力性尿失禁有关。压力性尿失禁通常发生在尿道肌肉不足以在压力增大的情况下（比如咳嗽时）防止膀胱漏尿时。

接下来是 Q-tip 测试。根据我记得的， Q-tip 是一种用于测量尿道压力曲线的尿道导管。它通常用于评估尿道功能。Q-tip 测试阳性结果，即在 Valsalva 操作过程中尿道压力低于膀胱内压，与内源性括约肌缺陷相关，这是一种压力性尿失禁类型。

由于患者有在活动中出现不自主漏尿的病史，但夜间没有漏尿，更可能是压力性尿失禁，而不是像急迫性尿失禁那样的情况，急迫性尿失禁通常伴有夜间漏尿。因此，如果 Q-tip 测试阳性，提示内源性括约肌缺陷。

现在，谈到膀胱测量。膀胱测量是一种测试，旨在测量膀胱在充盈过程中的反应以及逼尿肌的收缩情况。它可以显示是否存在膀胱过度活动症（ OAB ），即引起急迫感和频尿的情况，或是否存在逼尿肌低活动性，导致尿潴留。

在这种情况下，患者的主要问题是压力性尿失禁，这更与无法在压力增大时保持尿液有关。膀胱测量会查看逼尿肌的收缩情况。如果逼尿肌低活动性，它将不能强有力地收缩以排空膀胱，导致残余尿量。但如果逼尿肌过度活跃，可能会收缩过度，导致急迫感。

鉴于患者有压力性尿失禁的病史和 Q-tip 测试阳性，提示内源性括约肌缺陷，我认为膀胱测量会显示逼尿肌的收缩是正常的。问题不在于逼尿肌收缩的能力，而是无法密封尿道以保持压力。因此，残余尿量可能是正常的，除非有明显的尿潴留，但关键发现是逼尿肌的收缩是正常的，而不是过度活跃。

等等，但会不会有残余尿量？如果患者排尿后膀胱中残留一些尿液，那就是残余尿量。但如果没有尿潴留的症状，比如膀胱饱胀或排尿困难，那么这种情况的可能性较小。主要问题是在活动中发生的尿失禁，因此逼尿肌收缩是正常的，残余尿量在正常范围内，除非有其他情况。

所以，综合来看，膀胱测量可能会显示逼尿肌的收缩正常，残余尿量正常。问题更多是在括约肌方面，而不是逼尿肌。 </think>

根据对患者病史和 Q-tip 测试结果的分析，膀胱测量最可能显示逼尿肌的收缩正常，残余尿量正常。主要问题似乎是由于内源性括约肌缺陷引起的压力性尿失禁，如 Q-tip 测试阳性所示。这种情况通常影响尿道括约肌在压力增大时防止漏尿的能力，而不是逼尿肌的收缩能力。因此，逼尿肌的收缩并未过度活跃，残余尿量在正常范围内。

标准答案：

在这种压力性尿失禁的情况下，膀胱测压检查（ cystometry ）最可能显示正常的排尿后残余尿量，因为压力性尿失禁通常不会影响膀胱排空功能。此外，由于压力性尿失禁主要与身体用力有关，而不是膀胱过度活动症（ OAB ），因此在测试过程中不太可能观察到逼尿肌的非自主收缩。

问题二的测试如下：

inputs2 = tokenizer ([ prompt _ style.format ( question _ 2 , ""  )  ] , return _ tensors=  "pt"  ).to  (  "cuda"  ) 

outputs2 = model.generate  ( input _ ids=inputs2.input _ ids ,max _ new _ tokens=1200 , use _ cache=True ,  ) 
response2 = tokenizer.batch _ decode ( outputs2 )

print  ( response2  [  0  ] . split  (  "### Response : "  )  [  1  ]  )

 <think> Okay , so I have a patient presenting with sudden-onset chest pain that's radiating to the neck and left arm.That makes me think of a heart attack because those symptoms are classic — angina or myocardial infarction.The left arm pain , neck pain , and sometimes jaw or back pain can all be related to coronary artery issues .
Looking at the past medical history , the patient has hypercholesterolemia , which is high cholesterol , and coronary artery disease .Those are both risk factors for atherosclerosis , which can lead to blockages in the coronary arteries.The elevated troponin I levels are a big clue because troponin is a cardiac enzyme released when the heart muscle is damaged , which is a sign of a heart attack.Also , the patient is experiencing tachycardia , which means their heart is beating faster than usual.In a heart attack , the heart might beat faster as it tries to pump blood to compensate for the blocked artery.
Now , considering the coronary arteries , the left main coronary artery supplies blood to the entire left side of the heart , including the left ventricle , which is a large muscle that's crucial for pumping blood.If there's a blockage here , it can lead to a more severe heart attack because the left ventricle is so vital.The right coronary artery supplies the right ventricle and the inferior wall of the left ventricle.Blockages here are possible too , but the left main is more commonly associated with the symptoms described , especially when troponin is elevated.
So putting it all together , the most likely coronary artery involved is the left main coronary artery.The combination of the patient's history , the elevated troponin , and the typical chest pain radiation points to this artery being the culprit . </think> 
The most likely coronary artery involved in this presentation is the ** left main coronary artery  ( LMCA  )  **.
 ** Explanation : **  - ** Symptoms : ** The patient's sudden chest pain radiating to the neck and left arm , along with elevated troponin levels , suggests an acute coronary syndrome , likely a myocardial infarction ( heart attack ). - ** Past Medical History : ** History of hypercholesterolemia and coronary artery disease are risk factors for atherosclerosis , which can lead to blockages in the coronary arteries. - ** Tachycardia : ** Increased heart rate may occur as the heart compensates for reduced blood flow to the heart muscle. - ** Coronary Artery Consideration : ** The left main coronary artery supplies the left ventricle , a large muscle that is crucial for cardiac function.Blockages in the LMCA can lead to more severe and life-threatening heart attacks compared to blockages in the right coronary artery , which typically supply less critical areas .
Thus , thecombinationofsymptoms , elevatedtroponin , andthepatient'shistorystronglypointstothe** left maincoronaryartery** as themostlikelyculprit.< ｜ end ▁ of ▁ sentence ｜ >

翻译如下：

<think> 好的，我有一位患者，突然出现胸痛，并放射到颈部和左臂。这让我想到了心脏病发作，因为这些症状很经典——心绞痛或心肌梗死。左臂痛、颈部痛，有时还会伴随下颌或背部的疼痛，这些都可能与冠状动脉问题相关。

从病史来看，患者有高胆固醇血症（即高胆固醇）和冠状动脉疾病，这两个因素都是动脉粥样硬化的风险因素，可能导致冠状动脉发生堵塞。肌钙蛋白 I 升高是一个很大的线索，因为肌钙蛋白是心肌受损时释放的心脏酶，通常表明发生了心肌梗死。另外，患者还出现了心动过速，即心跳比平常快。在心肌梗死时，心脏可能会加速跳动，以试图通过增加心脏输出量来补偿被阻塞的冠状动脉。

考虑到冠状动脉，左主冠状动脉（ LMCA ）为整个左侧心脏提供血液，包括左心室，而左心室是一个关键的泵血肌肉。如果这里发生堵塞，可能导致更严重的心肌梗死，因为左心室至关重要。右冠状动脉为右心室和左心室下壁提供血液，这里的堵塞也是可能的，但左主冠状动脉通常与上述症状更相关，尤其是当肌钙蛋白升高时。

所以，将所有因素综合考虑，最可能受累的冠状动脉是左主冠状动脉（ LMCA ）。患者的病史、肌钙蛋白升高以及典型的胸痛放射症状都指向了这一动脉作为罪魁祸首。

</think>

最可能受累的冠状动脉是左主冠状动脉（ LMCA ）。

解释：

症状：患者突发胸痛并放射至颈部和左臂，以及肌钙蛋白升高，提示急性冠状动脉综合症，可能是心肌梗死。
病史：高胆固醇血症和冠状动脉疾病病史是动脉粥样硬化的风险因素，可能导致冠状动脉堵塞。
心动过速：心率增加可能是心脏为补偿心肌血流减少而产生的反应。
冠状动脉考虑：左主冠状动脉供应左心室，这个肌肉对心脏功能至关重要。与右冠状动脉相比，左主冠状动脉的堵塞会导致更严重且危及生命的心肌梗死，右冠状动脉通常供应的是不那么关键的区域。
因此，症状、肌钙蛋白升高以及患者的病史强烈指向左主冠状动脉（ LMCA ）作为最可能的罪魁祸首。 < ｜ end ▁ of ▁ sentence ｜ >

标准答案：

根据患者表现出的突然胸痛并放射至颈部和左臂，结合其有高胆固醇血症和冠状动脉疾病的病史，肌钙蛋白升高和心动过速，临床症状强烈提示左前降支（ LAD ）动脉受累。该动脉通常是引发此类症状的罪魁祸首，因为它供应了心脏的大部分区域。放射性疼痛和肌钙蛋白升高的组合表明心肌受损，这使得 LAD 成为最可能的致病动脉。然而，在没有进一步的诊断检查（如心电图）的情况下，最终的确诊仍需等待确认。

能够看出，在原始状态下，模型能够进行推理并给出回复，但实际上第一个回答过程并不符合医学规范。

第二个问题则直接回答错误。

由此可见，在初始状态下，模型对于 medical-o1-reasoning-SFT 数据集问答效果并不好。

接下来尝试进行微调，并测试微调后模型问答效果。

二、最小可行性实验

对于当前数据集而言，我们可以带入原始数据集的部分数据进行微调，也可以带入全部数据并遍历多次进行微调。

对于大多数的微调实验，我们都可以从最小可行性实验入手进行微调，也就是先尝试带入少量数据进行微调，并观测微调效果。

若微调可以顺利执行，并能够获得微调效果，再考虑带入更多的数据进行更大规模微调。

1.数据集准备

这里我们直接从 huggingface 上下载 medical-o1-reasoning-SFT 数据集。

设置代理环境

由于 huggingface 网络受限，下载数据集前需要先进行网络环境设置。

若是 AutoDL 服务器，则可以按照如下方式开启学术加速，从而顺利连接 huggingface 并进行数据集下载：

import subprocess import os 
result = subprocess.run ( 'bash - c  "source /etc/network _ turbo && env | grep proxy"  ' , shell= True , capture _ output= True , text= True  ) output = result.stdout for line in output.splitlines  (  ) :if  '=' in line : var , value = line .split  (  '=' , 1  ) os.environ  [ var  ]  = value

下载数据集

接下来使用 datasets 进行数据集下载

！pip install datasets

import os from datasets import load _ dataset

提取并设置文本生成结束的标记：

EOS _ TOKEN = tokenizer.eos _ token tokenizer.eos _ token

 '< ｜ end ▁ of ▁ sentence ｜ >'

然后定义函数，用于对 medical-o1-reasoning-SFT 数据集进行修改， Complex _ CoT 列和 Response 列进行拼接，并加上文本结束标记：

def formatting _ prompts _ func  (  examples  ) : inputs = examples  [  "Question"  ] cots = examples  [  "Complex _ CoT"  ] outputs = examples  [  "Response"  ] texts =  [  ] for input , cot , output in zip ( inputs , cots , outputs ):text = train _ prompt _ style.format ( input , cot , output ) + EOS _ TOKEN texts.append ( text )return {  "text" : texts , }

在最小可行性实验中，我们可以只下载 500 条数据进行微调即可看出效果：

dataset  =load _ dataset  (  "FreedomIntelligence/medical-o1-reasoning-SFT" ,  "en" , split=  "train [ 0 : 500 ] " , trust _ remote _ code= True  )

Using the latest cached version of the dataset since FreedomIntelligence/medical-o1-reasoning-SFT couldn't be found on the Hugging Face Hub Found thelatestcacheddatasetconfiguration'en'at/root/.cache/huggingface/datasets/FreedomIntelligence ___ medical-o1-reasoning-sft/en/0.0.0/4c9573e7de1e8660b88158db2efa7c7204bbd269 ( lastmodified on WedFeb501 : 06 : 322025  ) .

dataset  [  0  ]

{  'Question' :  'A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test.Based on these findings , what would cystometry most likely reveal about her residual volume and detrusor contractions?' ,  'Complex _ CoT' :  "Okay , let's think about this step by step.There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing.This sounds a lot like stress urinary incontinence to me.Now , it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping.This likely means her bladder's ability to hold urine is fine when she isn't under physical stress.Hmm , that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem.\n\nThe fact that she underwent a Q-tip test is intriguing too.This test is usually done to assess urethral mobility.In stress incontinence , a Q-tip might move significantly , showing urethral hypermobility.This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure.So , that's aligning well with stress incontinence.\n\nNow , let's think about what would happen during cystometry.Since stress incontinence isn't usually about sudden bladder contractions , I wouldn't expect to see involuntary detrusor contractions during this test.Her bladder isn't spasming or anything; it's more about the support structure failing under stress.Plus , she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying.So , her residual volume should be pretty normal.\n\nAll in all , it seems like if they do a cystometry on her , it will likely show a normal residual volume and no involuntary contractions.Yup , I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence." ,  'Response' :  'Cystometryinthiscaseofstressurinaryincontinencewouldmostlikelyrevealanormalpost-voidresidualvolume , asstressincontinencetypicallydoesnotinvolveissueswithbladderemptying.Additionally , sincestressurinaryincontinenceisprimarilyrelatedtophysicalexertionandnotanoveractivebladder , youwouldnotexpecttoseeanyinvoluntarydetrusorcontractionsduringthetest.'  }

然后进行结构化处理：

dataset  =dataset.map ( formatting _ prompts _ func , batched= True ,  )

将数据集整理为如下形式：

dataset  [  "text"  ]  [ 0  ]

数据集保存地址

默认情况下数据集保存在主目录下. cache 文件夹中，数据文件格式如下所示：

2.开启微调

然后即可把模型设置为微调模式：

model = FastLanguageModel.get _ peft _ model  ( model , r= 16 , target _ modules=  [  "q _ proj" ,  "k _ proj" ,  "v _ proj" ,  "o _ proj" ,  "gate _ proj" ,  "up _ proj" ,  "down _ proj" ,  ] , lora _ alpha= 16 , lora _ dropout= 0 , bias=  "none" , use _ gradient _ checkpointing=  "unsloth" , # True or  "unsloth" for very long context random _ state= 3407 , use _ rslora=False , loftq _ config=None ,  )

Unsloth 2025 . 1 . 8 patched 32 layers with 32 QKV layers , 32 O layers and 32 MLP layers .

然后导入相关的库：

from trl import SFTTrainer from transformers import TrainingArguments from unsloth import is _ bfloat16 _ supported

创建有监督微调对象：

trainer  = SFTTrainer  ( model  = model , tokenizer  = tokenizer , train _ dataset  = dataset , dataset _ text _ field  =  "text" , max _ seq _ length  = max _ seq _ length , dataset _ num _ proc  = 2 , args  = TrainingArguments  ( per _ device _ train _ batch _ size  = 2 , gradient _ accumulation _ steps  = 4 ,  # Use num _ train _ epochs = 1 , warmup _ ratio for full training runs! warmup _ steps  = 5 , max _ steps  = 60 , learning _ rate  = 2e-4 , fp16  = not is _ bfloat16 _ supported  (  ) , bf16  = is _ bfloat16 _ supported  (  ) , logging _ steps  = 10 , optim  =  "adamw _ 8bit" , weight _ decay  = 0.01 ,lr _ scheduler _ type  =  "linear" , seed  = 3407 , output _ dir  =  "outputs" ,  ) ,  )

这段代码主要是用SFTTrainer 进行监督微调（ Supervised Fine-Tuning ,SFT ），适用于transformers 和Unsloth 生态中的模型微调：

1.导入相关库

SFTTrainer （来自trl库）：

trl （ Transformer Reinforcement Learning ）是 Hugging Face 旗下的trl 库，提供监督微调（ SFT ）和强化学习（ RLHF ）相关的功能。
SFTTrainer 主要用于有监督微调（ Supervised Fine-Tuning ），适用于LoRA 等低秩适配微调方式。

TrainingArguments （来自transformers 库）：

这个类用于定义训练超参数，比如批量大小、学习率、优化器、训练步数等。

is _ bfloat16 _ supported ( ) （来自unsloth ）：

这个函数检查当前GPU 是否支持bfloat16 （ BF16 ），如果支持，则返回True ，否则返回False 。
bfloat16 是一种更高效的数值格式，在新款 NVIDIA A100/H100 等 GPU 上表现更优。

2.初始化SFTTrainer 进行模型微调

参数解析

①`SFTTrainer` 部分

②`TrainingArguments` 部分

然后开始微调：

trainer _ stats  = trainer.train  (  )

Tracking run with wandb version 0.19.5

Run data is saved locally in

 /root/autodl-tmp/models/wandb/run-20250205 _ 004957-k0dz6rg7

Syncing run outputs to Weights & Biases ( docs )

View project at https : //wandb.ai/2323365771-ff/huggingface

View run at https : //wandb.ai/2323365771-ff/huggingface/runs/k0dz6rg7

此时 wandb 中显示内容如下：

trainer _ stats

注意， unsloth 在微调结束后，会自动更新模型权重（在缓存中），因此无需手动合并模型权重即可直接调用微调后的模型：

FastLanguageModel . for _ inference  ( model  )

inputs = tokenizer ([ prompt _ style.format ( question _ 1 , ""  )  ] , return _ tensors=  "pt"  ).to  (  "cuda"  ) 
outputs = model.generate  ( input _ ids=inputs.input _ ids ,attention _ mask=inputs.attention _ mask ,max _ new _ tokens=1200 , use _ cache=True ,  ) response = tokenizer.batch _ decode ( outputs )

print  ( response  [  0  ] . split  (  "### Response : "  )  [  1  ]  )

能够发现，第一个问题回答更加规范，并且回答正确。但第二个问题仍然回答错误。由此可以考虑继续进行大规模微调。

不过在此之前，我们可以将现在小规模微调的模型进行本地保存。

3.模型合并

此时本地保存的模型权重在outputs 文件夹中：

然后可使用如下代码进行模型权重合并：

new _ model _ local =  "DeepSeek-R1-Medical-COT-Tiny" model.save _ pretrained ( new _ model _ local )tokenizer.save _ pretrained ( new _ model _ local )
model.save _ pretrained _ merged ( new _ model _ local , tokenizer , save _ method =  "merged _ 16bit" ,  )

保存结束后，即可在当前文件夹中看到对应模型：

然后即可将其推送到 huggingface 上并保存为 GGUF 格式文件并进行调用。

三、完整高效微调实验

接下来我们尝试带入全部数据进行高效微调，以提升模型微调效果。

train _ prompt _ style = """Below is an instruction that describes a task , paired with an input that provides further context.Write a response that appropriately completes the request.Before answering , think carefully about the question and create a step- by  -step chain of thoughts to ensure a logical and accurate response.
 ### Instruction : You are a medical expert with advanced knowledge in clinical reasoning , diagnostics , and treatment planning.Please answer the following medical question.
 ### Question : { }
 ### Response :  <think> { } </think> { } """

EOS _ TOKEN = tokenizer.eos _ token  # Must add EOS _ TOKEN 

def formatting _ prompts _ func  (  examples  ) : inputs = examples  [  "Question"  ] cots = examples  [  "Complex _ CoT"  ] outputs = examples  [  "Response"  ] texts =  [  ] for input , cot , output in zip ( inputs , cots , outputs ):text = train _ prompt _ style.format ( input , cot , output ) + EOS _ TOKEN texts.append ( text )return {  "text" : texts , }

此时读取全部数据

dataset = load _ dataset ( "FreedomIntelligence/medical-o1-reasoning-SFT" , "en" , split = "train" , trust _remote _ code=True  ) dataset = dataset.map ( formatting _ prompts _func , batched = True , ) dataset  [  "text"  ]  [ 0  ]

model = FastLanguageModel.get _ peft _ model  ( model , r= 16 , target _ modules=  [  "q _ proj" ,  "k _ proj" ,  "v _ proj" ,  "o _ proj" ,  "gate _ proj" ,  "up _ proj" ,  "down _ proj" ,  ] , lora _ alpha= 16 , lora _ dropout= 0 , bias=  "none" , use _ gradient _ checkpointing=  "unsloth" , # True or  "unsloth" for very long context random _ state= 3407 , use _ rslora=False , loftq _ config=None ,  )

这里设置 epoch 为 3 ，遍历 3 次数据集：

from trl import SFTTrainer from transformers import TrainingArguments from unsloth import is _ bfloat16 _ supported 
trainer  = SFTTrainer  ( model  = model , tokenizer  = tokenizer , train _ dataset  = dataset , dataset _ text _ field  =  "text" , max _ seq _ length  = max _ seq _ length , dataset _ num _ proc  = 2 , args  = TrainingArguments  ( per _ device _ train _ batch _ size  = 2 , gradient _ accumulation _ steps  = 4 , num _ train _ epochs  = 3 , warmup _ steps  = 5 ,  # max _ steps=60 , learning _ rate  = 2e-4 , fp16  = not is _ bfloat16 _ supported  (  ) , bf16  = is _ bfloat16 _ supported  (  ) , logging _ steps  = 10 , optim  =  "adamw _ 8bit" , weight _ decay  = 0.01 ,lr _ scheduler _ type  =  "linear" , seed  = 3407 , output _ dir  =  "outputs" ,  ) ,  )

trainer _ stats  = trainer.train  (  )

这里总共训练约 15 个小时。再带入两个问题进行测试，均有较好的回答效果：

问题一输出：

 <think> Alright , let  's think this through step by step.We' ve got a 61  -year-old woman who  's been dealing with involuntary urine loss whenever she does something like coughing or sneezing.That sounds like stress urinary incontinence , which usually means there' s some kind of leakage when there  's increased pressure in the abdomen.Now , the fact that she doesn' t have any leakage at night is interesting.That  's a big hint because it suggests that the problem isn' t about bladder control or retention , since there  's no issue when lying down.
Now , let' s consider the Q-tip test.This test is a clever way to check for urethral mobility.If the Q-tip moves a lot , it means the urethra is mobile , which is a typical sign of stress incontinence.So , we  're probably looking at a condition where the urethra isn' t staying closed under pressure.
Now , if we were to do a cystometry on her , we  'd be checking a few things.First , there' s the residual volume.Since she  's not leaking urine at night , her bladder should be empty , so we' d expect the residual volume to be normal.Then , there  's the detrusor contractions.Stress incontinence doesn' t usually involve the detrusor muscle contracting abnormally.So , we  'd expect normal or maybe slightly increased contractions , but nothing out of the ordinary .
So , putting it all together , if we do the cystometry , we' d likely see a normal residual volume and normal detrusor contractions.That fits with everything we know about her symptoms and the Q-tip test results.Everything lines up with stress urinary incontinence , confirming that our conclusion makes sense . </think> Based on the information provided , the 61  -year-old woman is likely experiencing stress urinary incontinence , as suggested by the involuntary urine loss during activities like coughing or sneezing , coupled with the fact that she doesn  't experience leakage at night.During a cystometry exam , we would expect the following findings :
1.**Residual Volume : ** This would be normal.Since she does not experience leakage at night , her bladder is likely empty at bedtime , indicating a normal post-void residual volume .
2.**Detrusor Contractions : ** These would also be normal or possibly slightly increased.Stress urinary incontinence typically does not involve abnormal detrusor contractions , as the issue is more related to the closure mechanism of the urethra under increased abdominal pressure rather than bladder control or contractions .
Thus , cystometry would most likely reveal a normal residual volume and normal detrusor contractions , consistent with stress urinary incontinence.< ｜ end ▁ of ▁ sentence ｜ >

问题二输出：

 <think> Okay , let  's think about this.The patient is having sudden chest pain that's shooting to their neck and left arm.That's pretty classic for something like a myocardial infarction , right  ? I mean , the pain spreading like that usually points to a big problem in the heart's supply.
Now , considering the heart's coronary arteries ,I know the left anterior descending artery , or LAD , is a big player.It's like the main highway for blood to reach the front wall of the heart.If there's a blockage there , it can definitely cause pain that radiates to the neck and arm .
Then , there's the right coronary artery , or RCA , which supplies the right side of the heart and can affect the inferior wall of the heart.But wait , the pain pattern here seems to be more on the left side , so maybe the RCA is less likely.
The patient has hypercholesterolemia and coronary artery disease.These conditions put them at risk for atherosclerosis , which can lead to blockages in the coronary arteries.The LAD is commonly involved in such scenarios , especially when the pain spreads to the neck and arm .
Also , the elevated troponin I levels and tachycardia are strong signals that something serious is happening in the heart.These are usually seen in myocardial infarctions.Given the pain pattern and the patient's risk factors , the LAD seems like the most likely culprit here.
So , when I put all this together , it really seems like the left anterior descending artery is the most likely artery involved in this situation.It just fits with the classic presentation of anterior myocardial infarction.Yeah , I'm pretty confident about that. </think> Based on the presentation of sudden-onset chest pain radiating to the neck and left arm , along with the patient's history of hypercholesterolemia and coronary artery disease , the most likely coronary artery involved is the left anterior descending  ( LAD  )  artery.This artery supplies the front wall of the heart , and a blockage here can cause the classic symptoms described .The elevated troponin I levels and tachycardia further support the likelihood of a myocardial infarction , with the LAD being a common site for such events.< ｜ end ▁ of ▁ sentence ｜ >

最后进行模型权重保存：

new _ model _ local =  "DeepSeek-R1-Medical-COT" model.save _ pretrained ( new _ model _ local )tokenizer.save _ pretrained ( new _ model _ local )
model.save _ pretrained _ merged ( new _ model _ local , tokenizer , save _ method=  "merged _ 16bit" ,  )

以上，即完成了本次微调，你也来试试看吧