Using fastText: corpus format and how to call it

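Data format: one example per line, made up of the word-segmented sentence (tokens separated by spaces), a tab (\t), and then the label prefixed with __label__.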
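
A minimal sketch of writing and reading lines in this format. It assumes the sentence has already been segmented into a list of tokens by whatever segmenter you use. The helper names format_line and parse_line are hypothetical and not part of fastText. fastText itself treats the tab as ordinary whitespace and recognises labels only by the __label__ prefix, so the tab mainly makes it easy to split a line back into sentence and label, as ceshi_model in the script does. Unlike that script, parse_line here strips the prefix from the label.

```python
# Hypothetical helpers for the "segmented sentence<TAB>__label__<label>" line format.
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def format_line(tokens, label):
    """Join pre-segmented tokens with spaces, then append a tab and the prefixed label."""
    return " ".join(tokens) + "\t" + LABEL_PREFIX + label


def parse_line(line):
    """Split one line back into (sentence, label without prefix); None if malformed."""
    parts = line.strip().split("\t")
    if len(parts) != 2 or not parts[1].startswith(LABEL_PREFIX):
        return None
    return parts[0], parts[1][len(LABEL_PREFIX):]


if __name__ == "__main__":
    line = format_line(["我", "喜欢", "自然语言", "处理"], "tech")
    print(repr(line))        # '我 喜欢 自然语言 处理\t__label__tech'
    print(parse_line(line))  # ('我 喜欢 自然语言 处理', 'tech')
```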

fasttext_model.py

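import os

import fasttext
import numpy as np

def get_data_path(by_word=True, train=True):
    # by_word picks the word-segmented files; train picks the training vs. test split
    if by_word:
        return "./classify/data_by_word_train.txt" if train else "./classify/data_by_word_test.txt"
    else:
        return "./classify/data_train.txt" if train else "./classify/data_test.txt"

def prepar_model():
    # Train a supervised classifier: 100-dim vectors, 20 epochs, word bigrams as extra features
    data_path = get_data_path(by_word=True, train=True)
    model = fasttext.train_supervised(input=data_path, dim=100, epoch=20, wordNgrams=2)
    # save_model does not create missing directories, so make sure the target exists
    os.makedirs("./fasttext_model", exist_ok=True)
    model.save_model("./fasttext_model/classify_by_word_100_20_2.model")

def ceshi_model():
    # Evaluate the saved model on the test split ("ceshi" is pinyin for "test")
    model = fasttext.load_model("./fasttext_model/classify_by_word_100_20_2.model")
    test_data_path = get_data_path(by_word=True, train=False)

    sentences = []
    labels = []
    with open(test_data_path, encoding="utf-8") as f:
        for line in f:
            # Each line is "segmented sentence\t__label__<label>"; skip malformed lines
            temp_ret = line.strip().split("\t")
            if len(temp_ret) == 2:
                sentences.append(temp_ret[0])
                labels.append(temp_ret[1])

    # predict() on a list returns (label tuples, probabilities); keep each top-1 label
    ret = model.predict(sentences)[0]
    ret = [i[0] for i in ret]
    # Both sides keep the "__label__" prefix, so the strings compare directly
    acc = np.mean([1 if labels[i] == ret[i] else 0 for i in range(len(labels))])
    print(acc)

if __name__ == '__main__':
    prepar_model()
    ceshi_model()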
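
The accuracy line in ceshi_model assumes that labels and ret have the same length and that the test file parsed into at least one example (otherwise np.mean returns nan with a warning). A small standalone version of that computation, written in plain Python so it can be checked without a trained model, might look like this. The function name accuracy is hypothetical, not part of fastText.

```python
def accuracy(predicted_labels, gold_labels):
    """Share of examples whose top-1 predicted label equals the gold label.

    predicted_labels: label tuples in the shape returned by fastText's
        model.predict(list_of_sentences)[0], e.g. [("__label__a",), ...]
    gold_labels: label strings with the same prefix, e.g. ["__label__a", ...]
    """
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("predicted and gold label lists differ in length")
    if not gold_labels:
        # An empty test set usually means the file was not parsed as expected
        raise ValueError("no test examples")
    correct = sum(
        1
        for pred, gold in zip(predicted_labels, gold_labels)
        # pred can be an empty tuple, e.g. when predict() is called with a threshold
        if pred and pred[0] == gold
    )
    return correct / len(gold_labels)


if __name__ == "__main__":
    preds = [("__label__sports",), ("__label__tech",), ("__label__tech",)]
    gold = ["__label__sports", "__label__tech", "__label__finance"]
    print(accuracy(preds, gold))  # 0.6666666666666666
```

For a test file with exactly one label per line, fastText's built-in model.test(test_data_path), which returns (number of examples, precision@1, recall@1), should report the same figure as precision@1, up to any lines the two approaches parse differently.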

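wordNgrams=2 in the training call tells fastText to use word bigrams as extra features alongside single words, which lets the classifier capture some local word order. The sketch below only enumerates those features as strings, to show what "n-grams up to length 2" covers. fastText does not store them this way; it hashes them into a fixed number of buckets (the bucket parameter), so this is an illustration, not a reproduction of its internals. The function name word_ngrams is hypothetical.

```python
def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length 1..n (unigrams included)."""
    feats = []
    for size in range(1, n + 1):
        # Slide a window of `size` tokens across the sentence
        for i in range(len(tokens) - size + 1):
            feats.append(" ".join(tokens[i:i + size]))
    return feats


if __name__ == "__main__":
    print(word_ngrams(["我", "喜欢", "自然语言", "处理"], 2))
    # ['我', '喜欢', '自然语言', '处理', '我 喜欢', '喜欢 自然语言', '自然语言 处理']
```
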
posted @ 2020-02-20 00:05 高颜值的殺生丸