实验记录 | BERT 文本情感分类

Bert 文本情感分类实战

Intro

BERT 模型直接下载来自 HuggingFace 上由 Google 发布的 bert-case-uncased 预训练模型。

数据集采用 IMDB（Internet Movie Database Dataset），这是 NLP 领域中一个非常著名和广泛使用的数据集，主要应用于文本情感分析任务。

Experimental Setup

参考 ^[1] ^[2] 进行实验。加速卡为 3080Ti。

nproc = 64
Core(s) per socket = 16
Intel Xeon Silver4314@2.40 GHz
NVIDIA Corporation GA102 [GeForce RTX 3080 Ti]
CUDA Version = v11.8.89
Python 3.10.5

Train

教程给出的显存占用约 6G，在 3090 的运行时间约 40 min. 实测 3080 运行时间在 2~3 h.

Debug

首次运行迟缓

  1%|          | 48/9375 [07:28<24:12:23,  9.34s/it]

排查后定位为 pytorch-xla 未能自动选择 GPU 作为默认 PJRT (Portable JIT Runtime)。排查及修改如下。

# print(os.environ.get('PJRT_DEVICE'))
os.environ['PJRT_DEVICE'] = 'GPU'

解决该问题后又运行了两次，第一次在 model.save_pretrain 处中断，建议首先执行 model.to(cpu) 再保存。下文展示图表为第二次运行结果。

Results

SwanLab

Train

Eval

Predict

{'train_runtime': 9730.7196, 'train_samples_per_second': 7.708, 'train_steps_per_second': 0.963, 'train_loss': 0.17324476619084675, 'epoch': 3.0}                                                                         
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9375/9375 [2:42:06<00:00,  1.04s/it]
sft finished, saving pre_model...
Input Text: I absolutely loved this movie! The storyline was captivating and the acting was top-notch. A must-watch for everyone.
Predicted class: 1 positive
Input Text: This movie was a complete waste of time. The plot was predictable and the characters were poorly developed.
Predicted class: 0 negative
Input Text: An excellent film with a heartwarming story. The performances were outstanding, especially the lead actor.
Predicted class: 1 positive
Input Text: I found the movie to be quite boring. It dragged on and didn't really go anywhere. Not recommended.
Predicted class: 0 negative
Input Text: A masterpiece! The director did an amazing job bringing this story to life. The visuals were stunning.
Predicted class: 1 positive
Input Text: Terrible movie. The script was awful and the acting was even worse. I can't believe I sat through the whole thing.
Predicted class: 0 negative
Input Text: A delightful film with a perfect mix of humor and drama. The cast was great and the dialogue was witty.
Predicted class: 1 positive
Input Text: I was very disappointed with this movie. It had so much potential, but it just fell flat. The ending was particularly bad.
Predicted class: 0 negative
Input Text: One of the best movies I've seen this year. The story was original and the performances were incredibly moving.
Predicted class: 1 positive
Input Text: I didn't enjoy this movie at all. It was confusing and the pacing was off. Definitely not worth watching.
Predicted class: 0 negative

Summary

遇到的报错怀疑还是 pytorch-xla 版本问题所致。做实验的服务器正在复现 MAGIS，需要做多个后端的对比实验，各种软件包为兼容性做出牺牲，均控制在较低版本，导致这个 BERT 实验的代码不能直接运行。

posted @ 2025-04-21 14:08 Miya_Official 阅读(189) 评论(0) 收藏举报

刷新页面返回顶部

isumi

实验记录 | BERT 文本情感分类

Bert 文本情感分类实战

Intro

Experimental Setup

Train

Debug

Results

Train

Eval

Predict

Summary

公告