Post archive for September 20, 2023 - 热爱工作的宁致桑

September 20, 2023

Summary: Filter the collected txt data first to delete invalid information, then process it with the following steps: class DataPreprocessor(): '''def __init__(self, vocab_file, longest_sentence): self.tok = BertTokenizer(vocab_file) Read more

posted @ 2023-09-20 20:22 热爱工作的宁致桑 Views (158) Comments (0) Recommendations (0)
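The first summary says the collected txt data should be filtered to remove invalid information before preprocessing, but the excerpt does not define "invalid". The Python sketch below assumes a hypothetical rule: a line is dropped if it is blank, has fewer than `min_chars` word characters (Python's Unicode `\w`, which also matches Chinese characters), or exactly repeats an earlier line. The names `is_valid`, `filter_lines`, `filter_file`, and the default threshold of 5 are illustrative assumptions, not the author's code.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Hypothetical validity rule (an assumption, not from the original post):
# a line must contain at least `min_chars` word characters. In Python 3,
# \w on str patterns matches Unicode letters and digits, including CJK.
_WORD_CHAR = re.compile(r"\w")


def is_valid(line: str, min_chars: int = 5) -> bool:
    """Return True if the line has enough word characters to keep."""
    return len(_WORD_CHAR.findall(line)) >= min_chars


def filter_lines(lines: Iterable[str], min_chars: int = 5) -> List[str]:
    """Strip each line, then drop invalid lines and exact duplicates, keeping order."""
    seen = set()
    kept = []
    for raw in lines:
        line = raw.strip()
        if not is_valid(line, min_chars) or line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return kept


def filter_file(src: Path, dst: Path, min_chars: int = 5) -> int:
    """Filter one UTF-8 txt file into dst and return the number of lines kept."""
    # errors="ignore" silently drops undecodable bytes (another assumption).
    text = src.read_text(encoding="utf-8", errors="ignore")
    kept = filter_lines(text.splitlines(), min_chars)
    dst.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(kept)


if __name__ == "__main__":
    sample = [
        "  Hello world, this is text.  ",
        "",
        "----",
        "Hello world, this is text.",
        "短句",
        "这是一个有效的中文句子。",
    ]
    print(filter_lines(sample))
    # → ['Hello world, this is text.', '这是一个有效的中文句子。']
```

Tune `min_chars`, or add further rules such as stripping URLs, to match whatever your corpus actually treats as noise.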

Text Data Preprocessing (Part 1)

Summary: # Copy all txt files into the alltxt folder import os import shutil # Create new folder if it doesn't exist if not os.path.exists("alltxt"): os.makedirs("alltxt") # Loop Read more

posted @ 2023-09-20 20:14 热爱工作的宁致桑 Views (35) Comments (0) Recommendations (0)
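The second summary's snippet is cut off at the `# Loop` comment. A minimal completion, assuming the goal is to gather every `.txt` file found anywhere under a source directory into one flat `alltxt` folder, might look like this. The function name `collect_txt`, the recursive `os.walk` traversal, and the `_1`, `_2` suffix policy for clashing file names are assumptions; `os.makedirs(dest, exist_ok=True)` does the same job as the original `os.path.exists` check.

```python
import os
import shutil
from typing import List


def collect_txt(src_root: str, dest: str = "alltxt") -> List[str]:
    """Copy every .txt file under src_root into the flat folder dest.

    Returns the destination paths. Name clashes get a numeric suffix
    (a hypothetical policy; the original snippet stops before the loop).
    """
    # Equivalent to: if not os.path.exists(dest): os.makedirs(dest)
    os.makedirs(dest, exist_ok=True)
    dest_abs = os.path.abspath(dest)
    copied = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        # Prune the destination folder so already-copied files are not re-copied.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) != dest_abs
        ]
        for name in filenames:
            if not name.lower().endswith(".txt"):
                continue
            stem, ext = os.path.splitext(name)
            target = os.path.join(dest, name)
            n = 1
            # Pick the first free name: doc.txt, doc_1.txt, doc_2.txt, ...
            while os.path.exists(target):
                target = os.path.join(dest, f"{stem}_{n}{ext}")
                n += 1
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    print(len(collect_txt(".")), "files copied")
```

`shutil.copy2` also preserves file timestamps; `shutil.copy` is enough if only the contents matter.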

Eva's Notes