LoRA

🤖: This article was generated by gpt-4.1 and lightly edited by a human.

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) method used mainly to fine-tune large pretrained models such as GPT and BERT.

Paper: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)

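Core idea: keep the large model's parameters frozen and add small trainable low-rank weights only to certain layers (for example, linear layers), which are then trained for the specific downstream task. This greatly reduces the number of trainable parameters and the GPU memory needed for their gradients and optimizer states.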
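To make the frozen/trainable split concrete, here is a minimal PyTorch sketch (an illustration, not the paper's code). It assumes a single `nn.Linear` as a stand-in for one pretrained layer, with small sizes `d = 64` and `r = 4` chosen only for speed; `A` and `B` are illustrative names for the low-rank factors. After one backward pass, only the low-rank factors have gradients, so nothing extra has to be stored for the frozen weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, r = 64, 4

base = nn.Linear(d, d, bias=False)  # stand-in for one pretrained linear layer (W0)
base.weight.requires_grad_(False)   # frozen: W0 receives no gradient

# Small trainable low-rank pair added next to the frozen layer.
# B starts at zero (as in the LoRA paper), so the added term is initially zero.
A = nn.Parameter(torch.randn(r, d) * 0.01)
B = nn.Parameter(torch.zeros(d, r))

x = torch.randn(8, d)
y = base(x) + (x @ A.t()) @ B.t()   # W0 x + B(A x), batched with rows as samples
loss = y.pow(2).mean()
loss.backward()

print(base.weight.grad)             # None: the frozen weight gets no gradient
print(A.grad.shape, B.grad.shape)   # gradients exist only for the low-rank factors
```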

Mathematical Explanation

Suppose a large language model has a layer that applies a linear transformation \(W_0\). Ordinary fine-tuning updates every element of \(W_0\).

Full Fine-Tuning

\[y = W_0 x \]

\(W_0\): a large \(d \times d\) matrix (here \(d = 4096\))
Training updates all \(d^2 = 16{,}777{,}216\) parameters
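As a sanity check on this count, a sketch that builds a bias-free `nn.Linear(4096, 4096)` in PyTorch (assumed here as the stand-in for \(W_0\)) and counts its trainable parameters. In full fine-tuning all of them are trainable, which is PyTorch's default:

```python
import torch.nn as nn

d = 4096
layer = nn.Linear(d, d, bias=False)  # W0 with no bias, matching y = W0 x

# Full fine-tuning: every parameter has requires_grad=True by default
full_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(full_trainable)  # 16777216
```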

LoRA Fine-Tuning

LoRA instead:

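Freezes \(W_0\), so it is never updated
Trains only two small matrices, \(B \in \mathbb{R}^{d \times r}\) and \(A \in \mathbb{R}^{r \times d}\), with rank \(r = 8 \ll d\)
Uses their product \(BA\) to approximate the weight update \(\Delta W\)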
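A quick numerical check of why \(BA\) is called low-rank: although \(BA\) has the full \(d \times d\) shape, its rank is at most \(r\). This sketch assumes random \(A\) and \(B\) (not LoRA's zero initialization for \(B\), which would give rank 0) and a smaller `d = 512` purely for speed:

```python
import torch

torch.manual_seed(0)
d, r = 512, 8
B = torch.randn(d, r, dtype=torch.float64)
A = torch.randn(r, d, dtype=torch.float64)

delta_W = B @ A                                  # d x d matrix built from 2*d*r numbers
rank = torch.linalg.matrix_rank(delta_W).item()  # numerical rank of the product
print(delta_W.shape, rank)                       # the rank can never exceed r
```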

\[y = (W_0 + \Delta W) x = (W_0 + BA) x \]
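Because matrix multiplication distributes over addition, \((W_0 + BA)x = W_0 x + B(Ax)\). The sketch below checks this numerically with random float64 tensors; the sizes are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
d, r = 256, 8
W0 = torch.randn(d, d, dtype=torch.float64)
B = torch.randn(d, r, dtype=torch.float64)
A = torch.randn(r, d, dtype=torch.float64)
x = torch.randn(d, dtype=torch.float64)

y_merged = (W0 + B @ A) @ x     # (W0 + BA) x: materializes the d x d update
y_split = W0 @ x + B @ (A @ x)  # W0 x + B(Ax): only an r-dimensional intermediate
print(torch.allclose(y_merged, y_split))  # True
```

The split form avoids building the \(d \times d\) matrix \(BA\), which is convenient during training; the merged form lets \(BA\) be folded into \(W_0\) once training is done, so inference costs the same as the original layer.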

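Only \(2dr = 2 \times 4096 \times 8 = 65{,}536\) parameters are updated, which is \(1/256\) (about 0.39%) of the full fine-tuning count.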
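The same arithmetic applies to non-square layers: for \(W_0 \in \mathbb{R}^{d_{out} \times d_{in}}\), \(B\) is \(d_{out} \times r\) and \(A\) is \(r \times d_{in}\), so LoRA trains \(r(d_{in} + d_{out})\) parameters. `lora_param_count` is a hypothetical helper written only for this illustration:

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # Hypothetical helper: B has d_out * r entries, A has r * d_in entries
    return d_out * r + r * d_in

d, r = 4096, 8
full = d * d                       # parameters updated by full fine-tuning
lora = lora_param_count(d, d, r)   # parameters updated by LoRA
print(full, lora, lora / full)     # 16777216 65536 0.00390625
```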

Code Example

Suppose we want to apply LoRA to a simple linear layer:

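```python
import torch as th
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.linear.weight.requires_grad = False  # freeze the pretrained weight W0

        # LoRA low-rank factors: A is random, B starts at zero (as in the LoRA paper),
        # so BA = 0 and the layer initially behaves exactly like the frozen one
        self.lora_A = nn.Parameter(th.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(th.zeros(out_features, r))

    def forward(self, x):
        # LoRA correction term B(Ax), computed as (x A^T) B^T so the full
        # out_features x in_features matrix BA is never materialized
        lora_update = (x @ self.lora_A.t()) @ self.lora_B.t()
        return self.linear(x) + lora_update

# Usage example
layer = LoRALinear(4096, 4096, r=8)
x = th.randn(2, 4096)
output = layer(x)  # shape: (2, 4096)
```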
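A hedged sketch of one training step and of merging the weights afterwards. It assumes a self-contained minimal LoRA layer (with \(B\) initialized to zero, as in the LoRA paper), 64-dimensional sizes, plain SGD, and an MSE loss on random data, all chosen only for illustration. Only parameters with `requires_grad=True` are given to the optimizer; the sketch then checks that \(W_0\) did not change and that folding \(BA\) into a single weight reproduces the layer's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LoRALinear(nn.Module):
    # Minimal LoRA layer for this sketch: y = W0 x + B(A x)
    def __init__(self, in_features, out_features, r=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.linear.weight.requires_grad = False                   # frozen W0
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))   # BA = 0 at init

    def forward(self, x):
        return self.linear(x) + (x @ self.lora_A.t()) @ self.lora_B.t()

layer = LoRALinear(64, 64, r=4)
w0_before = layer.linear.weight.detach().clone()

# Only the trainable LoRA factors (A and B) go to the optimizer
trainable = [p for p in layer.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=0.1)

x, target = torch.randn(16, 64), torch.randn(16, 64)
loss = nn.functional.mse_loss(layer(x), target)
opt.zero_grad()
loss.backward()
opt.step()

# W0 is untouched by the update
print(torch.equal(layer.linear.weight, w0_before))  # True

# For inference, fold the update into one weight: W = W0 + BA
with torch.no_grad():
    merged = layer.linear.weight + layer.lora_B @ layer.lora_A
    print(torch.allclose(x @ merged.t(), layer(x), atol=1e-5))
```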

posted @ 2025-06-08 01:30 Undefined443