OpenKiwi Study Notes
Printing the call stack in Python:
import traceback
traceback.print_stack()
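A minimal sketch of where such a call might be dropped while tracing OpenKiwi's code (the function below is hypothetical); traceback.format_stack() returns the same information as a list of strings, which is handier for logging:

import traceback

def some_hook(batch):  # hypothetical call site: place the call wherever you want to see the call path
    traceback.print_stack()            # writes the current call stack to stderr
    frames = traceback.format_stack()  # same stack as a list of strings, e.g. for a logger
    return batch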
For CLI usage, the general command is:
kiwi (train|pretrain|predict|evaluate|search) CONFIG_FILE
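For example, the BERT system configured in these notes is trained with:

kiwi train config/bert.yaml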
Importing the BERT model:
from transformers import (
BERT_PRETRAINED_MODEL_ARCHIVE_LIST,
DISTILBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
AutoTokenizer,
BertConfig,
BertModel,
)
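A minimal sketch of loading the same multilingual checkpoint that the config below names (bert-base-multilingual-cased), just to illustrate the Hugging Face calls these imports provide; OpenKiwi's BertEncoder is built on these same classes, and its surrounding preprocessing is not shown here:

from transformers import AutoTokenizer, BertConfig, BertModel

model_name = "bert-base-multilingual-cased"   # same checkpoint as in config/bert.yaml
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = BertConfig.from_pretrained(model_name, output_hidden_states=True)  # expose all hidden states
bert = BertModel.from_pretrained(model_name, config=config)

encoded = tokenizer("The fortress sits on a hill.", return_tensors="pt")
outputs = bert(**encoded)
print(outputs.last_hidden_state.shape)        # [1, seq_len, 768]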
Components of a QE system:
Furthermore, all of our QE systems share a similar architecture. They are composed of:
- Encoder: embedding and creating features to be used for downstream tasks, e.g. Predictor, BERT, etc.
- Decoder: responsible for learning feature transformations better suited for the downstream task, e.g. MLP, LSTM, etc.
- Output: simple feedforwards that take decoder features and transform them into the prediction required by the downstream task, along the same lines as the common "classification heads" used with transformers.
- (Optionally) TLM Output: a simple output layer that trains for the specific TLM objective. It can be useful to continue fine-tuning the predictor during training of the complete QE system.
QE system class hierarchy:
All QE systems inherit from `kiwi.systems.qe_system.QESystem`. Use `kiwi train` to train these systems. Currently available are:
- `kiwi.systems.nuqe.NuQE`
- `kiwi.systems.predictor_estimator.PredictorEstimator`
- `kiwi.systems.bert.Bert`
- `kiwi.systems.xlm.XLM`
- `kiwi.systems.xlmroberta.XLMRoberta`
TLM class hierarchy:
TLM systems (`kiwi.systems.tlm_system`): all TLM systems inherit from `kiwi.systems.tlm_system.TLMSystem`. Use `kiwi pretrain` to train these systems. These systems can then be used as the encoder part in QE systems via the `load_encoder` flag. Currently available are:
- `kiwi.systems.predictor.Predictor`
Configuration classes:
- `kiwi.lib.train.Configuration`
- `kiwi.lib.train.RunConfig`
- `kiwi.lib.train.TrainerConfig`
- `kiwi.data.datasets.wmt_qe_dataset.WMTQEDataset.Config`
- `kiwi.systems.qe_system.QESystem.Config`
The encoder is a BERT model; the overall model structure is:
Bert(
(encoder): BertEncoder(
(bert): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(119547, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
................
(11): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
(scalar_mix): ScalarMixWithDropout(
(scalar_parameters): ParameterList(
(0): Parameter containing: [torch.FloatTensor of size 1]
(1): Parameter containing: [torch.FloatTensor of size 1]
(2): Parameter containing: [torch.FloatTensor of size 1]
(3): Parameter containing: [torch.FloatTensor of size 1]
(4): Parameter containing: [torch.FloatTensor of size 1]
(5): Parameter containing: [torch.FloatTensor of size 1]
(6): Parameter containing: [torch.FloatTensor of size 1]
(7): Parameter containing: [torch.FloatTensor of size 1]
(8): Parameter containing: [torch.FloatTensor of size 1]
(9): Parameter containing: [torch.FloatTensor of size 1]
(10): Parameter containing: [torch.FloatTensor of size 1]
(11): Parameter containing: [torch.FloatTensor of size 1]
(12): Parameter containing: [torch.FloatTensor of size 1]
)
)
(output_embeddings): Embedding(119547, 768, padding_idx=0)
)
(decoder): LinearDecoder(
(linear_outs): ModuleDict(
(target): Sequential(
(0): Linear(in_features=768, out_features=768, bias=True)
(1): Tanh()
)
(source): Sequential(
(0): Linear(in_features=768, out_features=768, bias=True)
(1): Tanh()
)
(target_sentence): Sequential(
(0): Linear(in_features=768, out_features=768, bias=True)
(1): Tanh()
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=768, out_features=768, bias=True)
(4): Tanh()
(5): Dropout(p=0.1, inplace=False)
)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(outputs): QEOutputs(
(word_outputs): ModuleDict(
(target_tags): WordLevelOutput(
(linear): Linear(in_features=768, out_features=2, bias=True)
(loss_fn): CrossEntropyLoss()
)
(gap_tags): GapTagsOutput(
(linear): Linear(in_features=1536, out_features=2, bias=True)
(loss_fn): CrossEntropyLoss()
)
(source_tags): WordLevelOutput(
(linear): Linear(in_features=768, out_features=2, bias=True)
(loss_fn): CrossEntropyLoss()
)
)
(sentence_outputs): ModuleDict(
(sentence_scores): SentenceScoreRegression(
(sentence_pred): Sequential(
(linear_0): Linear(in_features=768, out_features=384, bias=True)
(activation_0): Tanh()
(dropout_0): Dropout(p=0.0, inplace=False)
(linear_1): Linear(in_features=384, out_features=1, bias=True)
)
(loss_fn): MSELoss()
)
)
)
(tlm_outputs): TLMOutputs(
(masked_word_outputs): ModuleDict()
)
)
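In this dump, scalar_mix holds 13 scalar parameters, which matches one weight per hidden state: the embedding output plus the 12 BertLayer outputs. Below is a minimal sketch of the general scalar-mixture idea (softmax-normalized layer weights and a global scale, as in ELMo-style mixing); the actual ScalarMixWithDropout presumably also applies dropout to the mixture, as its name suggests, which is not reproduced here:

import torch
import torch.nn as nn

class SimpleScalarMix(nn.Module):
    """Weighted sum of per-layer hidden states with learned softmax weights (illustrative only)."""

    def __init__(self, num_layers: int = 13):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))  # one scalar per layer, as in the dump above
        self.gamma = nn.Parameter(torch.ones(1))               # global scale (an assumption of this sketch)

    def forward(self, layer_outputs):
        # layer_outputs: list of 13 tensors, each [batch, seq_len, 768]
        norm_weights = torch.softmax(self.weights, dim=0)
        mixed = sum(w * h for w, h in zip(norm_weights, layer_outputs))
        return self.gamma * mixed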
Output of print(arguments) in cli.py:
{'--example': False,
'--help': False,
'--quiet': False,
'--verbose': False,
'--version': False,
'CONFIG_FILE': 'config/bert.yaml',
'OVERWRITES': [],
'evaluate': False,
'predict': False,
'pretrain': False,
'search': False,
'train': True}
The parsed config_dict in cli.py:
{
'run': {
'experiment_name': 'BERT WMT20 EN-ZH',
'seed': 42,
'use_mlflow': False
},
'trainer': {
'deterministic': True,
'gpus': 1,
'epochs': 10,
'main_metric': ['WMT19_MCC', 'PEARSON'],
'gradient_max_norm': 1.0,
'gradient_accumulation_steps': 1,
'amp_level': 'O2',
'precision': 16,
'log_interval': 100,
'checkpoint': {
'validation_steps': 0.2,
'early_stop_patience': 10
}
},
'system': {
'class_name': 'Bert',
'batch_size': 2,
'num_data_workers': 1,
'model': {
'encoder': {
'model_name': 'bert-base-multilingual-cased',
'use_mlp': False,
'freeze': False
},
'decoder': {
'hidden_size': 768,
'bottleneck_size': 768,
'dropout': 0.1
},
'outputs': {
'word_level': {
'target': True,
'gaps': True,
'source': True,
'class_weights': {
'target_tags': {
'BAD': 3.0
},
'gap_tags': {
'BAD': 5.0
},
'source_tags': {
'BAD': 3.0
}
}
},
'sentence_level': {
'hter': True,
'use_distribution': False,
'binary': False
},
'n_layers_output': 2,
'sentence_loss_weight': 1
},
'tlm_outputs': {
'fine_tune': False
}
},
'optimizer': {
'class_name': 'adamw',
'learning_rate': 1e-05,
'warmup_steps': 0.1,
'training_steps': 12000
},
'data_processing': {
'share_input_fields_encoders': True
}
},
'data': {
'train': {
'input': {
'source': 'data/WMT20/en-zh/train/train.src',
'target': 'data/WMT20/en-zh/train/train.mt',
'alignments': 'data/WMT20/en-zh/train/train.src-mt.alignments',
'post_edit': 'data/WMT20/en-zh/train/train.pe'
},
'output': {
'source_tags': 'data/WMT20/en-zh/train/train.source_tags',
'target_tags': 'data/WMT20/en-zh/train/train.tags',
'sentence_scores': 'data/WMT20/en-zh/train/train.hter'
}
},
'valid': {
'input': {
'source': 'data/WMT20/en-zh/dev/dev.src',
'target': 'data/WMT20/en-zh/dev/dev.mt',
'alignments': 'data/WMT20/en-zh/dev/dev.src-mt.alignments',
'post_edit': 'data/WMT20/en-zh/dev/dev.pe'
},
'output': {
'source_tags': 'data/WMT20/en-zh/dev/dev.source_tags',
'target_tags': 'data/WMT20/en-zh/dev/dev.tags',
'sentence_scores': 'data/WMT20/en-zh/dev/dev.hter'
}
},
'test': {
'input': {
'source': 'data/WMT20/en-zh/test-blind/test.src',
'target': 'data/WMT20/en-zh/test-blind/test.mt',
'alignments': 'data/WMT20/en-zh/test-blind/test.src-mt.alignments'
}
}
},
'verbose': False,
'quiet': False
}
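This dictionary is essentially the parsed CONFIG_FILE with the CLI flags merged in. A minimal sketch of reading the same YAML by hand, e.g. to check a field before launching a run (assuming config/bert.yaml mirrors the structure above; OpenKiwi's own loader presumably does more, such as validating against the Configuration classes listed earlier and applying OVERWRITES, which this sketch skips):

import yaml

with open("config/bert.yaml") as f:        # the CONFIG_FILE passed to `kiwi train`
    config = yaml.safe_load(f)

print(config["system"]["model"]["encoder"]["model_name"])  # bert-base-multilingual-cased
print(config["trainer"]["epochs"])                          # 10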
QESystem forward method:
def forward(self, batch_inputs):
encoder_features = self.encoder(batch_inputs)
features = self.decoder(encoder_features, batch_inputs)
outputs = self.outputs(features, batch_inputs)
# For fine-tuning the encoder
if self.tlm_outputs:
outputs.update(self.tlm_outputs(encoder_features, batch_inputs))
return outputs
PyTorch Lightning execution flow:
The PL workflow is simple: like a production line, everything runs in a fixed order. The following part runs only once:
1. `__init__()` (initialize the LightningModule)
2. `prepare_data()` (prepare the data: downloading, preprocessing, and so on)
3. `configure_optimizers()` (configure the optimizers)
The validation code is then exercised up front; the point of doing this early is that you do not have to wait through a long training run before discovering that the validation code is broken (a generic hook skeleton follows the list below):
1. `val_dataloader()`
2. `validation_step()`
3. `validation_epoch_end()`
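A generic LightningModule skeleton showing these hooks in the order described above (module name, data, and losses are made up purely for illustration, not OpenKiwi's actual system classes):

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class SketchModule(pl.LightningModule):
    def __init__(self):                          # 1. build the model, runs once
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def prepare_data(self):                      # 2. download / preprocess data, runs once
        self.dataset = TensorDataset(torch.randn(32, 8), torch.randn(32, 1))

    def configure_optimizers(self):              # 3. set up the optimizer(s)
        return torch.optim.AdamW(self.parameters(), lr=1e-5)

    # the validation hooks run first, as a sanity check, before training starts
    def val_dataloader(self):
        return DataLoader(self.dataset, batch_size=2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def validation_epoch_end(self, outputs):     # aggregate per-step results (pre-2.0 Lightning API)
        self.log("val_loss", torch.stack(outputs).mean())

    def train_dataloader(self):
        return DataLoader(self.dataset, batch_size=2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)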
Batch data:
{
'source': BatchedSentence(
tensor = tensor([
[
15846, 10491, 82978, 10226, 75312, 10571, 10105, 106095, 11942,
38587, 10108, 10955, 11586, 119, 102, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0
],
[
10135, 10386, 11288, 10207, 117, 17668, 11945, 39091, 10226,
10751, 14310, 107, 17316, 10230, 10108, 11589, 107, 117,
17846, 15736, 10188, 13209, 14951, 13240, 117, 11406, 14320,
10270, 10108, 10226, 84977, 11305, 68999, 10731, 11202, 31419,
119, 102
]
], device = 'cuda:0'),
lengths = tensor([15, 38], device = 'cuda:0'),
bounds = tensor([
[0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 34, 36, 37]
], device = 'cuda:0'),
bounds_lengths = tensor([13, 31], device = 'cuda:0'),
strict_masks = tensor([
[
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False
],
[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, False
]
], device = 'cuda:0'),
number_of_tokens = tensor([12, 30], device = 'cuda:0', dtype = torch.int32)
),
'target': BatchedSentence(
tensor = tensor([
[ 101, 4877, 113183, 113227, 118188, 3031, 2128, 114696, 217,
2674, 115512, 118188, 5718, 7724, 111978, 6348, 114696, 2079,
5740, 115551, 2196, 5718, 2773, 111915, 2206, 119, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0
],
[ 101, 10207, 3642, 10186, 4460, 10386, 4348, 10064, 4536,
4580, 114286, 7735, 117459, 2196, 5718, 84977, 11305, 68999,
10731, 4237, 113162, 6063, 10270, 8272, 10064, 4163, 112293,
2146, 2196, 5718, 4333, 5718, 107, 3976, 5718, 4614,
113664, 107, 10064, 2460, 111870, 4461, 3228, 118573, 113826,
217, 4608, 5718, 4784, 112939, 1882, 102
]
], device = 'cuda:0'),
lengths = tensor([27, 52], device = 'cuda:0'),
bounds = tensor([
[0, 1, 5, 6, 8, 9, 12, 13, 15, 17, 18, 20, 21, 22, 24, 25, 26, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 19, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 41, 42, 45, 46, 47, 48, 50, 51]
], device = 'cuda:0'),
bounds_lengths = tensor([17, 40], device = 'cuda:0'),
strict_masks = tensor([
[
False, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False
],
[
False, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, False
]
], device = 'cuda:0'),
number_of_tokens = tensor([15, 38], device = 'cuda:0', dtype = torch.int32)
),
'alignments': tensor([
[
[1, 0, 0, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
..., [0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]
],
[
[1, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
..., [0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 1, 0],
[0, 0, 0, ..., 0, 0, 0]
]
], device = 'cuda:0', dtype = torch.int32),
'pe': BatchedSentence(
tensor = tensor([
[
101, 4877, 113183, 113227, 118188, 10060, 15846, 10061, 3031,
2128, 114696, 217, 2674, 115512, 118188, 10060, 10955, 11586,
10061, 3740, 117490, 2775, 5718, 2212, 114236, 2468, 7475,
117244, 5740, 115551, 2196, 5718, 2773, 112165, 1882, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0
],
[
101, 10207, 3642, 10186, 4460, 10386, 4348, 10064, 2234,
114346, 4520, 7735, 117459, 2196, 5718, 84977, 11305, 68999,
10731, 4237, 113162, 6063, 10270, 8272, 8505, 114003, 2196,
5718, 4333, 4447, 115521, 100, 17316, 10230, 10108, 11589,
100, 10064, 4792, 114213, 5611, 13209, 14951, 13240, 2726,
111847, 5162, 112652, 1882, 102
]
], device = 'cuda:0'),
lengths = tensor([36, 50], device = 'cuda:0'),
bounds = tensor([
[0, 1, 5, 6, 7, 8, 9, 11, 12, 15, 16, 17, 18, 19, 21, 22, 23, 25, 26, 28, 30, 31, 32, 34, 35, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 15, 19, 21, 22, 23, 24, 26, 27, 28, 29, 31, 32, 34, 35, 36, 37, 38, 40, 41, 43, 44, 46, 48, 49
]
], device = 'cuda:0'), bounds_lengths = tensor([25, 37], device = 'cuda:0'),
strict_masks = tensor([
[ False, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False
],
[ False, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, False
]
],
device = 'cuda:0'),
number_of_tokens = tensor([23, 35], device = 'cuda:0', dtype = torch.int32)
),
'source_tags': BatchedSentence(
tensor = tensor([
[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]
], device = 'cuda:0'),
lengths = tensor([12, 30], device = 'cuda:0'),
bounds = tensor([
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
], device = 'cuda:0'),
bounds_lengths = tensor([12, 30], device = 'cuda:0'),
strict_masks = tensor([
[ True, True, True, True, True, True, True, True, True, True,
True, True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False
],
[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True
]
], device = 'cuda:0'),
number_of_tokens = tensor([12, 30], device = 'cuda:0', dtype = torch.int32)
),
'target_tags': BatchedSentence(
tensor = tensor([
[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
], device = 'cuda:0'),
lengths = tensor([15, 38], device = 'cuda:0'),
bounds = tensor([
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]
], device = 'cuda:0'),
bounds_lengths = tensor([15, 38], device = 'cuda:0'),
strict_masks = tensor([
[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False
],
[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True
]
], device = 'cuda:0'),
number_of_tokens = tensor([15, 38], device = 'cuda:0', dtype = torch.int32)
),
'sentence_scores': tensor([0.6522, 0.5143], device = 'cuda:0'),
'gap_tags': BatchedSentence(
tensor = tensor([
[1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
], device = 'cuda:0'),
lengths = tensor([16, 39], device = 'cuda:0'),
bounds = tensor([
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]
], device = 'cuda:0'),
bounds_lengths = tensor([16, 39], device = 'cuda:0'),
strict_masks = tensor([
[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False
],
[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True
]
], device = 'cuda:0'),
number_of_tokens = tensor([16, 39], device = 'cuda:0', dtype = torch.int32)),
'binary': tensor([1, 1], device = 'cuda:0')
}
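Besides the wordpiece ids (tensor) and their lengths, each BatchedSentence carries a bounds tensor that appears to hold one wordpiece position per original token (padded with -1), plus strict_masks and number_of_tokens. A minimal sketch of how such bounds can be used to pull one feature vector per original token out of per-wordpiece encoder features; this is my reading of the fields, not OpenKiwi's actual pooling code:

import torch

def select_token_features(hidden, bounds):
    """hidden: [batch, seq_len, dim] wordpiece features; bounds: [batch, max_tokens], -1 = padding.
    Returns [batch, max_tokens, dim] with one vector per original token."""
    mask = bounds.ne(-1)                           # valid token positions
    safe_bounds = bounds.clamp(min=0)              # make the -1 padding indexable
    idx = safe_bounds.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    token_feats = hidden.gather(1, idx)            # pick the indexed wordpiece for each token
    return token_feats * mask.unsqueeze(-1)        # zero out padded token slots

# toy usage with shapes matching the source side of the batch above (2 x 38 wordpieces)
hidden = torch.randn(2, 38, 768)
bounds = torch.tensor([[0, 1, 3] + [-1] * 28,
                       [0, 1, 2] + list(range(3, 31))])   # [2, 31]; values are only for shape
feats = select_token_features(hidden, bounds)              # [2, 31, 768]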
Example result (validation log):
2021-08-27T08:15:17Z INFO kiwi.training.callbacks:117: Best validation so far was in epoch 0: val_loss: 85.9468, val_WMT19_MCC: 0.5628, val_loss_target_tags: 30.9671, val_loss_gap_tags: 22.8837, val_loss_source_tags: 32.0960, val_WMT19_F1_MULT: 0.5807, val_F1_BAD: 0.7120, val_F1_OK: 0.3879, val_WMT19_CORRECT: 0.7842, val_target_tags_F1_MULT: 0.3510, val_target_tags_MCC: 0.3523, val_target_tags_CORRECT: 0.6544, val_gap_tags_F1_MULT: 0.2005, val_gap_tags_MCC: 0.1625, val_gap_tags_CORRECT: 0.9068, val_source_tags_F1_MULT: 0.2762, val_source_tags_MCC: 0.2858, val_source_tags_CORRECT: 0.6083
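In this log, F1_MULT is the standard WMT word-level metric: the product of the F1 scores of the OK and BAD classes; MCC is the Matthews correlation coefficient; PEARSON (the sentence-level main metric) is presumably the Pearson correlation between predicted and gold HTER. A minimal sketch of the word-level metrics with scikit-learn, on toy tags:

from sklearn.metrics import f1_score, matthews_corrcoef

# toy gold / predicted word-level tags: 1 = OK, 0 = BAD (encoding chosen only for this example)
gold = [1, 1, 0, 1, 0, 0, 1, 1]
pred = [1, 0, 0, 1, 0, 1, 1, 1]

f1_ok = f1_score(gold, pred, pos_label=1)
f1_bad = f1_score(gold, pred, pos_label=0)
f1_mult = f1_ok * f1_bad             # F1_MULT: product of per-class F1 scores
mcc = matthews_corrcoef(gold, pred)  # MCC
print(f1_ok, f1_bad, f1_mult, mcc)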
To be continued...