Complete Tutorial: Reflections on ms-swift Training, Part 2
Official Chinese documentation for ms-swift:
https://swift.readthedocs.io/zh-cn/latest/BestPractices/Reranker.html
Original text:
By default, MAX_POSITIVE_SAMPLES positive samples and MAX_NEGATIVE_SAMPLES negative samples are taken from each data point. Each positive sample is paired with MAX_NEGATIVE_SAMPLES negative samples to form a group, so each data point is expanded into MAX_POSITIVE_SAMPLES × (1 + MAX_NEGATIVE_SAMPLES) rows. If a data point does not contain enough positive/negative examples, all of them are used; if it contains more than MAX_POSITIVE_SAMPLES positives or MAX_NEGATIVE_SAMPLES negatives, they are randomly sampled. IMPORTANT: the expanded rows are placed in the same batch, so the effective batch size on each device will be per_device_train_batch_size × MAX_POSITIVE_SAMPLES × (1 + MAX_NEGATIVE_SAMPLES). Adjust per_device_train_batch_size accordingly to avoid running out of GPU memory.
MAX_POSITIVE_SAMPLES × (1 + MAX_NEGATIVE_SAMPLES): why the "1 +" here, rather than just MAX_POSITIVE_SAMPLES × MAX_NEGATIVE_SAMPLES?
This comes down to the training paradigm: the reranker is point2point (pointwise) to begin with; only the loss function differs.
What is point2point? The prompts show it directly.
For example, an original sample:
You are an excellent data expert. From refer_doc, select the data most relevant to the user's query.
In which year did Ultraman Tiga air?
a. Ultraman Tiga premiered in Japan in 1996
b. Ultraman Tiga's human host is Daigo
c. Ultraman Gaia is the destroyer of the earth
point2point
You are an excellent data expert. From refer_doc, select the data most relevant to the user's query.
In which year did Ultraman Tiga air?
a. Ultraman Tiga premiered in Japan in 1996
You are an excellent data expert. From refer_doc, select the data most relevant to the user's query.
In which year did Ultraman Tiga air?
a. Ultraman Tiga's human host is Daigo
You are an excellent data expert. From refer_doc, select the data most relevant to the user's query.
In which year did Ultraman Tiga air?
a. Ultraman Gaia is the destroyer of the earth
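To make the flattening concrete, here is a minimal sketch of how one grouped sample could be turned into pointwise (query, doc) prompts like the ones above. It is illustrative only, not ms-swift's actual implementation; the prompt template and the query/pos/neg field names are assumptions.

```python
# Illustrative sketch: flatten one grouped sample into pointwise rows.
# Not ms-swift's real code; the template and field names are assumed.
PROMPT = (
    "You are an excellent data expert. From refer_doc, select the data "
    "most relevant to the user's query.\nquery: {query}\nrefer_doc: {doc}"
)

def flatten_to_pointwise(sample: dict) -> list[tuple[str, int]]:
    """Turn one {query, pos, neg} sample into (prompt, label) rows."""
    rows = [(PROMPT.format(query=sample["query"], doc=d), 1) for d in sample["pos"]]
    rows += [(PROMPT.format(query=sample["query"], doc=d), 0) for d in sample["neg"]]
    return rows

sample = {
    "query": "In which year did Ultraman Tiga air?",
    "pos": ["Ultraman Tiga premiered in Japan in 1996"],
    "neg": ["Ultraman Tiga's human host is Daigo",
            "Ultraman Gaia is the destroyer of the earth"],
}
for prompt, label in flatten_to_pointwise(sample):
    print(label, "|", prompt.splitlines()[-1])  # three pointwise rows
```

Each row gets its own forward pass; only the loss looks across the rows of a group.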
MAX_POSITIVE_SAMPLES × (1 + MAX_NEGATIVE_SAMPLES)
Why 1 + MAX_NEGATIVE_SAMPLES?
Example:
MAX_POSITIVE_SAMPLES = 1
MAX_NEGATIVE_SAMPLES = 2
{
  query: a
  pos: [A]
  neg: [B, C]
}
a-A a-B a-C
1 × (1 + 2) = 3 // plugging into the official formula, it matches
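The same expansion, written out as a hypothetical sketch (ms-swift's real sampling code may differ) using the numbers from the example; it also reproduces the effective-batch-size arithmetic from the docs quoted at the top:

```python
# Hypothetical reconstruction of the group expansion described in the docs.
import random

MAX_POSITIVE_SAMPLES = 1
MAX_NEGATIVE_SAMPLES = 2
per_device_train_batch_size = 4  # assumed value, only for the arithmetic

def expand(sample: dict) -> list[tuple[str, str]]:
    """Expand one sample into query-doc pairs, one group per positive."""
    pos = random.sample(sample["pos"], min(len(sample["pos"]), MAX_POSITIVE_SAMPLES))
    neg = random.sample(sample["neg"], min(len(sample["neg"]), MAX_NEGATIVE_SAMPLES))
    pairs = []
    for p in pos:  # the "1 +" is the positive itself, sitting inside its group
        pairs.append((sample["query"], p))
        pairs.extend((sample["query"], n) for n in neg)
    return pairs

pairs = expand({"query": "a", "pos": ["A"], "neg": ["B", "C"]})
print(pairs)                                     # [('a','A'), ('a','B'), ('a','C')]
print(len(pairs))                                # 1 * (1 + 2) = 3
print(per_device_train_batch_size * len(pairs))  # effective batch size: 12
```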
Because, in the end, everything is point2point.
https://huggingface.co/Qwen/Qwen3-Reranker-0.6B
You can see that even when the reranker is called with one query and multiple docs, it still resolves the request by running inference on individual query-doc pairs.
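As a sketch of that pairwise scoring: each (query, doc) pair gets its own forward pass, and relevance is read off the next-token logits for "yes" vs "no". This follows the pattern described on the Qwen3-Reranker model card, but the prompt below is deliberately simplified, so treat the template as an assumption rather than the official usage:

```python
# Pairwise scoring sketch for Qwen3-Reranker (simplified prompt, see above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen3-Reranker-0.6B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

def score(query: str, doc: str) -> float:
    text = (
        'Judge whether the Document answers the Query. Answer only "yes" or "no".\n'
        f"Query: {query}\nDocument: {doc}\nAnswer:"
    )
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]                      # next-token logits
    return torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()  # P("yes")

query = "In which year did Ultraman Tiga air?"
docs = ["Ultraman Tiga premiered in Japan in 1996",
        "Ultraman Gaia is the destroyer of the earth"]
print(sorted(docs, key=lambda d: score(query, d), reverse=True))
```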
As for listwise, it just takes the point2point losses and composes them after the fact; the question is how to compose them so that point2point works better.
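For instance, one common arrangement (a sketch only; the exact listwise loss in ms-swift may differ) treats each group's pointwise scores as logits for a softmax cross-entropy with the positive fixed at index 0:

```python
# Listwise loss built purely from pointwise scores (illustrative arrangement).
import torch
import torch.nn.functional as F

def listwise_loss(group_scores: torch.Tensor) -> torch.Tensor:
    """group_scores: (num_groups, 1 + num_negatives), positive doc first."""
    targets = torch.zeros(group_scores.size(0), dtype=torch.long)  # positive = index 0
    return F.cross_entropy(group_scores, targets)

# Pointwise scores for one group [A (pos), B (neg), C (neg)]:
scores = torch.tensor([[2.1, 0.3, -0.5]])
print(listwise_loss(scores))  # shrinks as the positive outscores the negatives
```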
