
Transformers 4.37 Chinese Documentation (Part 26)

Original: huggingface.co/docs/transformers

PEGASUS-X

Original text: huggingface.co/docs/transformers/v4.37.2/en/model_doc/pegasus_x

Overview

The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao, and Peter J. Liu.

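PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and by using staggered block-local attention with global tokens in the encoder.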
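To make "staggered block-local attention with global tokens" concrete, here is a minimal pure-Python sketch that builds a boolean attention mask. It assumes that global tokens attend to every token, that each local (input) token attends to all global tokens plus the tokens in its own block, and that on staggered layers the block boundaries are shifted by half a block. The function name `global_local_mask` is hypothetical; this is an illustration of the pattern, not the library's implementation.

```python
def global_local_mask(seq_len, block_size, num_global, staggered=False):
    """Return mask where mask[q][k] is True if query q may attend to key k.

    Indices 0..num_global-1 are global tokens; the next seq_len indices are
    the local (input) tokens. Simplified illustration, not library code.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be an exact multiple of block_size")
    if staggered and block_size % 2 != 0:
        raise ValueError("block_size must be even when staggering")
    # Staggered layers shift every block boundary by half a block.
    offset = block_size // 2 if staggered else 0

    def block_id(local_pos):
        return (local_pos + offset) // block_size

    total = num_global + seq_len
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < num_global or k < num_global:
                # Global queries see everything; every query sees global keys.
                mask[q][k] = True
            else:
                # Local-to-local attention stays inside a single block.
                mask[q][k] = block_id(q - num_global) == block_id(k - num_global)
    return mask


# 2 global tokens, 8 local tokens, blocks of 4; "x" marks an allowed pair.
for staggered in (False, True):
    print("staggered" if staggered else "aligned")
    for row in global_local_mask(8, 4, 2, staggered=staggered):
        print("".join("x" if v else "." for v in row))
```

Alternating aligned and staggered layers lets information cross block boundaries after two layers, while the global tokens provide a path between any two blocks within a single layer.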

The abstract from the paper is the following:

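While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are longer than the maximum input context of most pretrained models. Through an extensive set of experiments, we investigate what model architectural changes and pretraining paradigms can most efficiently adapt a pretrained Transformer for long input summarization. We find that a staggered, block-local Transformer with global encoder tokens strikes a good balance of performance and efficiency, and that an additional pretraining phase on long sequences meaningfully improves downstream summarization performance. Based on our findings, we introduce PEGASUS-X, an extension of the PEGASUS model with additional long input pretraining to handle inputs of up to 16K tokens. PEGASUS-X achieves strong performance on long input summarization tasks comparable with much larger models while adding few additional parameters and not requiring model parallelism to train.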
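A back-of-the-envelope count shows where the efficiency comes from. The sketch below counts query-key score pairs for one attention head in one encoder layer, using the defaults block_size = 512 and num_global_tokens = 32 from the PegasusXConfig signature and a 16,384-token input. It assumes local tokens score against their own block plus all global tokens and global tokens score against everything; it ignores the extra padding on staggered layers, so it is an estimate of work, not a measured speedup. The function name `attention_scores` is hypothetical.

```python
def attention_scores(seq_len, block_size=None, num_global=0):
    """Count query-key score pairs per attention layer (one head)."""
    if block_size is None:
        # Full self-attention: every token scores against every token.
        return seq_len * seq_len
    # Local tokens: their own block plus all global tokens.
    local = seq_len * (block_size + num_global)
    # Global tokens: all global tokens plus all local tokens.
    global_ = num_global * (num_global + seq_len)
    return local + global_


full = attention_scores(16384)
blocked = attention_scores(16384, block_size=512, num_global=32)
print(full, blocked, round(full / blocked, 1))  # 268435456 9438208 28.4
```

Because the local term grows linearly in the sequence length (for a fixed block size), doubling the input roughly doubles the block-local cost, whereas full attention would quadruple it.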

This model was contributed by zphang. The original code can be found here.

Documentation resources

Translation task guide
Summarization task guide

PEGASUS-X uses the same tokenizer as PEGASUS.

PegasusXConfig

`class transformers.PegasusXConfig`


( vocab_size = 96103, max_position_embeddings = 16384, encoder_layers = 16, encoder_ffn_dim = 4096, encoder_attention_heads = 16, decoder_layers = 16, decoder_ffn_dim = 4096, decoder_attention_heads = 16, encoder_layerdrop = 0.0, decoder_layerdrop = 0.0, use_cache = True, is_encoder_decoder = True, activation_function = 'gelu', d_model = 1024, dropout = 0.1, attention_dropout = 0.0, activation_dropout = 0.0, init_std = 0.02, decoder_start_token_id = 0, scale_embedding = True, pad_token_id = 0, eos_token_id = 1, forced_eos_token_id = 1, num_global_tokens = 32, block_size = 512, stagger_local_blocks = True, **kwargs )

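Parameters

vocab_size (int, optional, defaults to 96103): Vocabulary size of the PEGASUS-X model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling PegasusXModel.
d_model (int, optional, defaults to 1024): Dimensionality of the layers and the pooler layer.
encoder_layers (int, optional, defaults to 16): Number of encoder layers.
decoder_layers (int, optional, defaults to 16): Number of decoder layers.
encoder_attention_heads (int, optional, defaults to 16): Number of attention heads for each attention layer in the Transformer encoder.
decoder_attention_heads (int, optional, defaults to 16): Number of attention heads for each attention layer in the Transformer decoder.
decoder_ffn_dim (int, optional, defaults to 4096): Dimensionality of the "intermediate" (often named feed-forward) layer in the decoder.
encoder_ffn_dim (int, optional, defaults to 4096): Dimensionality of the "intermediate" (often named feed-forward) layer in the encoder.
activation_function (str or function, optional, defaults to "gelu"): The non-linear activation function (function or string) in the encoder and pooler. If a string, "gelu", "relu", "silu", and "gelu_new" are supported.
dropout (float, optional, defaults to 0.1): The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.0): The dropout ratio for the attention probabilities.
activation_dropout (float, optional, defaults to 0.0): The dropout ratio for activations inside the fully connected layer.
max_position_embeddings (int, optional, defaults to 16384): The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512, 1024, or 2048).
init_std (float, optional, defaults to 0.02): The standard deviation of the truncated normal initializer for initializing all weight matrices.
encoder_layerdrop (float, optional, defaults to 0.0): The LayerDrop probability for the encoder. See the LayerDrop paper for more details.
decoder_layerdrop (float, optional, defaults to 0.0): The LayerDrop probability for the decoder. See the LayerDrop paper for more details.
use_cache (bool, optional, defaults to True): Whether or not the model should return the last key/value attentions (not used by all models).
forced_eos_token_id (int, optional, defaults to 1): The id of the token to force as the last generated token when max_length is reached. Usually set to eos_token_id.
num_global_tokens (int, optional, defaults to 32): Number of global tokens to use for the encoder.
block_size (int, optional, defaults to 512): Block size for encoder local attention. The sequence length should be an exact multiple of the block size. If stagger_local_blocks is True, block_size must be a multiple of 2.
stagger_local_blocks (bool, optional, defaults to True): Whether to stagger every other local attention layer by half a block.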
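The block_size constraints above reduce to simple arithmetic: round the input length up to a whole number of blocks, and require an even block size when blocks are staggered by half a block. The helper below is a hypothetical sketch of that arithmetic (`padded_length` is not part of the transformers API); whether and how the model pads inputs internally is not covered here.

```python
def padded_length(seq_len, block_size=512, stagger_local_blocks=True):
    """Round seq_len up to the next exact multiple of block_size.

    Hypothetical helper illustrating the documented block_size constraints;
    it is not part of the transformers API.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if stagger_local_blocks and block_size % 2 != 0:
        # Staggering shifts blocks by block_size // 2, so it must split evenly.
        raise ValueError("block_size must be a multiple of 2 when staggering")
    # Ceiling division, then scale back up to a whole number of blocks.
    num_blocks = (seq_len + block_size - 1) // block_size
    return num_blocks * block_size


print(padded_length(3000))   # 3072: 6 blocks of 512
print(padded_length(16384))  # 16384: already 32 blocks of 512
```

Any padding positions added this way would still need to be excluded through the attention mask; the sketch only covers the length arithmetic.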

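This is the configuration class to store the configuration of a PegasusXModel. It is used to instantiate a PEGASUS-X model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the PEGASUS-X google/pegasus-x-large architecture.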

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

>>> from transformers import PegasusXConfig, PegasusXModel

>>> # Initializing a PEGASUS-X google/pegasus-x-large style configuration
>>> configuration = PegasusXConfig()

>>> # Initializing a model (with random weights) from the google/pegasus-x-large style configuration
>>> model = PegasusXModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

