
Learning Large Models from Scratch | Let's Fine-Tune a Large Model with LoRA!

Published: 2024-04-17 08:02:28 Views: 2632 Author: 牛爷儿

Introduction

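In deep learning, a model's accuracy is usually tied to its complexity and its number of parameters. More parameters let the model capture more complex features, but they also demand more compute for training and inference. When compute is limited, fine-tuning or even deploying a large model may simply not run. With LoRA, we can reduce the compute needed for fine-tuning while largely preserving the model's accuracy.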

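LoRA adjusts the weights by introducing low-rank matrices (see the earlier post: Learning Large Models from Scratch | LoRA Is This Simple: You Can Understand It in 5 Minutes!), so the model can be adapted and optimized without significantly increasing the parameter count. This approach is especially useful for tasks such as image classification, natural language processing, and recommender systems.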

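The figure above shows the core idea of fine-tuning. When fine-tuning for a specific domain, we freeze the model's original parameters during training and introduce two matrices, B (d × 1) and A (1 × k), which add only d + k new parameters. In the forward pass, the product B × A (a d × k matrix) is added to the original weight matrix W to produce the new output. In the backward pass, no gradient is computed for W and W is not updated; only B and A are updated, which avoids adjusting all d × k original parameters.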
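To make the rank-1 case concrete, here is a minimal plain-Python sketch of the forward pass described above. The sizes and numbers are made up for illustration and are not from the original example. It also shows that adding B × A to W and then multiplying gives the same result as keeping the adapter separate and computing W x + B (A x).

```python
# Rank-1 LoRA forward pass in plain Python (illustrative values only).
d, k = 3, 4

W = [[1, 2, 3, 4],
     [0, 1, 0, 1],
     [2, 0, 2, 0]]      # frozen original weights, shape d x k
B = [[1], [2], [3]]     # trainable, shape d x 1
A = [[1, 0, -1, 2]]     # trainable, shape 1 x k
x = [1, 1, 1, 1]        # input vector of length k


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


# Option 1: build W + B*A explicitly, then multiply by x.
delta = [[B[i][0] * A[0][j] for j in range(k)] for i in range(d)]
W_eff = [[W[i][j] + delta[i][j] for j in range(k)] for i in range(d)]
y_merged = matvec(W_eff, x)

# Option 2: keep the adapter separate and compute W x + B (A x).
Ax = matvec(A, x)       # a single number wrapped in a list
y_split = [wx + B[i][0] * Ax[0] for i, wx in enumerate(matvec(W, x))]

trainable = d * 1 + 1 * k   # entries of B and A: d + k
frozen = d * k              # entries of W, never updated

print(y_merged, y_split, trainable, frozen)
```

Both options give the same output, and only the d + k entries of B and A would be trained.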

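After fine-tuning, we only need to save B and A. When we use the model later, we can attach the LoRA as a plug-in, and even swap in different LoRAs. The model's original parameters stay unchanged. Take a 7B model as an example: at inference time, enabling LoRA adds only d + k parameters per adapted weight matrix (at rank 1) on top of the original model, and the model then produces the fine-tuned results for the specific scenario.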
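The plug-in idea can be sketched as follows. This is a toy illustration in plain Python, not how any particular library stores adapters; the adapter names `task_a` and `task_b` and all values are hypothetical. One frozen base matrix is shared, and each adapter is just its own (B, A) pair.

```python
# One shared, frozen base weight matrix (illustrative 2 x 2 example).
base_W = [[1.0, 0.0],
          [0.0, 1.0]]

# Hypothetical adapters: each one stores only its own B (d x 1) and A (1 x k).
adapters = {
    "task_a": {"B": [[1.0], [0.0]], "A": [[0.0, 2.0]]},
    "task_b": {"B": [[0.0], [1.0]], "A": [[3.0, 0.0]]},
}


def effective_weights(W, adapter=None):
    """Return W unchanged, or W + B*A when an adapter is attached."""
    if adapter is None:
        return [row[:] for row in W]
    B, A = adapter["B"], adapter["A"]
    return [[W[i][j] + B[i][0] * A[0][j] for j in range(len(W[0]))]
            for i in range(len(W))]


w_plain = effective_weights(base_W)
w_a = effective_weights(base_W, adapters["task_a"])
w_b = effective_weights(base_W, adapters["task_b"])
print(w_plain, w_a, w_b)
```

Switching adapters never touches `base_W`, so several fine-tuned variants can share one copy of the original model.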

Fine-Tuning a Model with LoRA

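Next, we use LoRA to fine-tune a demo model. The example is taken straight from the community and is relatively simple. If you have no local GPU, you can use Google Colab, which offers GPU resources for free, so you can play with large models even without a graphics card.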

Model code:

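import torch
import torch.nn as nn

# Assumption: `device` is not defined in this excerpt; use the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class RichBoyNet(nn.Module):
    def __init__(self, hidden_size_1=1000, hidden_size_2=2000):
        super(RichBoyNet, self).__init__()
        # Three fully connected layers: 28*28 inputs -> 10 outputs
        self.linear1 = nn.Linear(28*28, hidden_size_1)
        self.linear2 = nn.Linear(hidden_size_1, hidden_size_2)
        self.linear3 = nn.Linear(hidden_size_2, 10)
        self.relu = nn.ReLU()

    def forward(self, img):
        # Flatten each image into a vector of 28*28 values
        x = img.view(-1, 28*28)
        x = self.relu(self.linear1(x))
        x = self.relu(self.linear2(x))
        x = self.linear3(x)
        return x

net = RichBoyNet().to(device)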
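For a sense of scale, the parameter count of this network can be worked out by hand from the default layer sizes. The sketch below does the arithmetic in plain Python and compares it with what rank-1 LoRA matrices would add, assuming (as an illustration) that LoRA is applied to the weight matrix of each of the three linear layers and not to the biases.

```python
# (inputs, outputs) of the three nn.Linear layers with the default hidden sizes
layer_shapes = [(28 * 28, 1000), (1000, 2000), (2000, 10)]


def linear_params(n_in, n_out):
    """Parameters of one fully connected layer: weight matrix plus bias."""
    return n_in * n_out + n_out


# All parameters of the original network
total = sum(linear_params(n_in, n_out) for n_in, n_out in layer_shapes)

# Rank-1 LoRA adds one column and one row per weight matrix: n_in + n_out
lora_rank1 = sum(n_in + n_out for n_in, n_out in layer_shapes)

print(total, lora_rank1)
```

Under these assumptions the original network has about 2.8 million parameters, while the rank-1 adapters add fewer than 7 thousand.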

Defining LoRA:

class LoRAParametrization(nn.Module):
    def __init__(self, features_in, features_out, rank=1, alpha=1, device='cpu'):
        super().__init__()
        # Section 4.1 of the paper:
        #   We use a random Gaussian initialization for A and zero for B, so ∆W = BA is zero at the beginning of training
        self.lora_A = nn.Parameter(torch.zeros((rank, features_out)).to(device))
        self.lora_B = nn.Parameter(torch.zeros((features_in, rank)).to(device))
        nn.init.normal_(self.lora_A, mean=0, std=1)

        # Section 4.1 of the paper:
        #   We then scale ∆Wx by α/r , where α is a constant in r.
        #   When optimizing with Adam, tuning α is roughly the same as tuning the learning rate if we scale the initialization appropriately.
        #   As a result, we simply set α to the first r we try and do not tune it.
        #   This scaling helps to reduce the need to retune hyperparameters when we vary r.
        self.scale = alpha / rank
        self.enabled = True

    def forward(self, original_weights):
        if self.enabled:
            # Return W + (B*A)*scale
            return original_weights + torch.matmul(self.lora_B, self.lora_A).view(original_weights.shape) * self.scale
        else:
            return original_weights
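The behaviour of this forward logic can be checked on standalone tensors. The sketch below mirrors the same formula in a small helper function (`effective_weight` is a hypothetical name, and the sizes are made up), so it does not depend on the class above. It illustrates three points: with B initialised to zero the weights are unchanged, once B is non-zero the weights shift by a matrix of rank at most `rank`, and disabling LoRA returns the original weights.

```python
import torch

torch.manual_seed(0)

# Hypothetical small sizes, chosen only for illustration
features_in, features_out, rank, alpha = 4, 3, 2, 1

W = torch.randn(features_in, features_out)   # stand-in for the frozen weights
lora_A = torch.randn(rank, features_out)     # Gaussian initialisation
lora_B = torch.zeros(features_in, rank)      # zero initialisation
scale = alpha / rank


def effective_weight(W, B, A, scale, enabled=True):
    """Same formula as the forward pass above: W + (B*A)*scale."""
    if not enabled:
        return W
    return W + torch.matmul(B, A).view(W.shape) * scale


# At initialisation B is zero, so the update is zero and W is unchanged.
w_init = effective_weight(W, lora_B, lora_A, scale)

# Pretend training has moved B away from zero.
lora_B_trained = torch.ones(features_in, rank)
w_after = effective_weight(W, lora_B_trained, lora_A, scale)

# Disabling LoRA gives back the original weights.
w_disabled = effective_weight(W, lora_B_trained, lora_A, scale, enabled=False)

# The update W' - W is a product of two thin matrices, so its rank is at most `rank`.
delta_rank = int(torch.linalg.matrix_rank(w_after - W))
print(delta_rank)
```

Because the update is exactly zero at the start, the model begins fine-tuning from the behaviour of the original network.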