实验记录 | BERT 文本情感分类

Bert 文本情感分类实战

Intro

BERT 模型直接下载来自 HuggingFace 上由 Google 发布的 bert-case-uncased 预训练模型。

数据集采用 IMDB(Internet Movie Database Dataset),这是 NLP 领域中一个非常著名和广泛使用的数据集,主要应用于文本情感分析任务。

Experimental Setup

参考 [1] [2] 进行实验。加速卡为 3080Ti。

  • nproc = 64
  • Core(s) per socket = 16
  • Intel Xeon Silver4314@2.40 GHz
  • NVIDIA Corporation GA102 [GeForce RTX 3080 Ti]
  • CUDA Version = v11.8.89
  • Python 3.10.5

Train

教程给出的显存占用约 6G,在 3090 的运行时间约 40 min. 实测 3080 运行时间在 2~3 h.

Debug

首次运行迟缓

  1%|          | 48/9375 [07:28<24:12:23,  9.34s/it]

排查后定位为 pytorch-xla 未能自动选择 GPU 作为默认 PJRT (Portable JIT Runtime)。排查及修改如下。

# print(os.environ.get('PJRT_DEVICE'))
os.environ['PJRT_DEVICE'] = 'GPU'

解决该问题后又运行了两次,第一次在 model.save_pretrain 处中断,建议首先执行 model.to(cpu) 再保存。下文展示图表为第二次运行结果。

Results

SwanLab

Train

image

Eval

image

Predict

{'train_runtime': 9730.7196, 'train_samples_per_second': 7.708, 'train_steps_per_second': 0.963, 'train_loss': 0.17324476619084675, 'epoch': 3.0}                                                                         
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9375/9375 [2:42:06<00:00,  1.04s/it]
sft finished, saving pre_model...
Input Text: I absolutely loved this movie! The storyline was captivating and the acting was top-notch. A must-watch for everyone.
Predicted class: 1 positive
Input Text: This movie was a complete waste of time. The plot was predictable and the characters were poorly developed.
Predicted class: 0 negative
Input Text: An excellent film with a heartwarming story. The performances were outstanding, especially the lead actor.
Predicted class: 1 positive
Input Text: I found the movie to be quite boring. It dragged on and didn't really go anywhere. Not recommended.
Predicted class: 0 negative
Input Text: A masterpiece! The director did an amazing job bringing this story to life. The visuals were stunning.
Predicted class: 1 positive
Input Text: Terrible movie. The script was awful and the acting was even worse. I can't believe I sat through the whole thing.
Predicted class: 0 negative
Input Text: A delightful film with a perfect mix of humor and drama. The cast was great and the dialogue was witty.
Predicted class: 1 positive
Input Text: I was very disappointed with this movie. It had so much potential, but it just fell flat. The ending was particularly bad.
Predicted class: 0 negative
Input Text: One of the best movies I've seen this year. The story was original and the performances were incredibly moving.
Predicted class: 1 positive
Input Text: I didn't enjoy this movie at all. It was confusing and the pacing was off. Definitely not worth watching.
Predicted class: 0 negative

Summary

遇到的报错怀疑还是 pytorch-xla 版本问题所致。做实验的服务器正在复现 MAGIS,需要做多个后端的对比实验,各种软件包为兼容性做出牺牲,均控制在较低版本,导致这个 BERT 实验的代码不能直接运行。


  1. Zeyi-Lin/BERT-IMDB: BERT-IMDB电影评论情感分类实战:SwanLab可视化训练 ↩︎

  2. Bert文本情感分类(完整代码、数据集、实验过程) - 知乎 ↩︎

posted @ 2025-04-21 14:08  Miya_Official  阅读(173)  评论(0)    收藏  举报