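transformers for Complete Beginners

The transformers library is a framework released by Hugging Face. It is very easy to use: many problems that look sophisticated to outsiders can be solved with it in a few lines. Let's start with a small example: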

1. Sentiment analysis

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier('you are beautiful')

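These three lines of code are enough to decide whether the sentence "you are beautiful" carries positive sentiment (a compliment) or negative sentiment (an insult). If all goes well, the output looks something like this: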

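[{'label': 'POSITIVE', 'score': 0.9998794794082642}] means the sentence is positive. The score can be read as a confidence: 0.9998 is 99.98%. Note also that the first time the sentiment-analysis classifier is used, the model it depends on is downloaded from Hugging Face.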
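To make the score a little more concrete: for a two-class classifier like this one, a score of this form is typically the softmax probability of the winning class. Below is a minimal sketch in plain Python. The logits are made-up numbers for illustration, not values taken from the model, and the `softmax` helper is written here by hand rather than taken from the library.

```python
import math

def softmax(logits):
    """Turn raw model outputs (logits) into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two labels (illustrative only, not model output).
labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.3, 4.7]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
result = {"label": labels[best], "score": probs[best]}
print(result)
```

The larger the gap between the two logits, the closer the winning score gets to 1, which is why a clear-cut sentence yields a score such as 0.9998.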

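Getting started is always the hardest part. If this very first example fails with an error, the installed transformers version is most likely too old. You can run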

import transformers
transformers.__version__

to see the current version. If it is 2.1.1, it is too old; open another terminal and run:

pip install --upgrade transformers -i https://pypi.tuna.tsinghua.edu.cn/simple

to upgrade it to the latest version.
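The version check can also be done in code rather than by eye. The sketch below compares dotted version strings in plain Python. The helper names and the `MINIMUM` threshold are hypothetical, chosen only for illustration; the post itself only states that 2.1.1 is too old.

```python
def version_tuple(version):
    """Convert '4.30.2' into (4, 30, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def is_too_old(installed, minimum):
    """Return True when the installed version is lower than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)

# MINIMUM is an assumed threshold for illustration, not an official requirement.
MINIMUM = "4.0.0"
print(is_too_old("2.1.1", MINIMUM))   # True
print(is_too_old("4.30.2", MINIMUM))  # False
```

In practice you would pass `transformers.__version__` as the first argument.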

import transformers  # needed for transformers.__version__ below
from transformers import pipeline
print(transformers.__version__)
classifier = pipeline('sentiment-analysis')
classifier('you are beautiful')

This time it works, but a warning is printed:

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

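This means that no specific model was given, so sentiment analysis fell back to the default model, https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english. The recommendation is to name a model explicitly.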

import transformers  # needed for transformers.__version__ below
from transformers import pipeline
print(transformers.__version__)
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
classifier = pipeline('sentiment-analysis', model=model_id)
classifier('you are beautiful')

With that, the warning goes away. The default model does not handle Chinese well, though. You can search Hugging Face for "sentiment chinese".

Many models come up. Pick the one ranked first by downloads, copy its name, and try it:

from transformers import pipeline
model_id="hw2942/bert-base-chinese-finetuning-financial-news-sentiment-v2"
classifier = pipeline('sentiment-analysis',model=model_id)
classifier(['这是什么鬼天气!','你可真棒!','看你那脸,拉得跟驴似的!','今天手气真差,又他妈输了!'])

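The model is downloaded on first use, and then the results are printed. Overall they are fairly reasonable, but some are questionable: "这是什么鬼天气!" ("What awful weather!") and "看你那脸,拉得跟驴似的!" ("Look at that long face of yours, like a donkey's!") are clearly negative, yet both are labeled neutral. How well this works depends mainly on the quality of the model itself. Still, on the whole it handles Chinese better than the default English model used earlier.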
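One way to compare two models is to run both on the same sentences and line the labels up side by side. The sketch below does this for any callables that take a list of strings and return a list of {'label', 'score'} dicts, which is the shape the sentiment-analysis pipeline returned above. The helper `compare_classifiers` and the two stub classifiers are hypothetical: the stubs return fixed dummy labels so the sketch runs without downloading anything, and their output says nothing about the real models.

```python
def compare_classifiers(texts, classifiers):
    """Run every classifier on the same texts and collect results row by row."""
    # Call each classifier once on the whole batch.
    outputs = {name: classify(texts) for name, classify in classifiers.items()}
    rows = []
    for i, text in enumerate(texts):
        row = {"text": text}
        for name, results in outputs.items():
            row[name] = (results[i]["label"], round(results[i]["score"], 4))
        rows.append(row)
    return rows

# Stand-in classifiers (hypothetical): fixed dummy output, not real predictions.
def stub_english_model(texts):
    return [{"label": "STUB_EN", "score": 0.5} for _ in texts]

def stub_chinese_model(texts):
    return [{"label": "STUB_ZH", "score": 0.5} for _ in texts]

texts = ["这是什么鬼天气!", "你可真棒!"]
table = compare_classifiers(
    texts,
    {"english": stub_english_model, "chinese": stub_chinese_model},
)
for row in table:
    print(row)
```

To compare the real models, replace the stubs with the two pipeline objects built earlier in this section (one for distilbert-base-uncased-finetuned-sst-2-english, one for hw2942/bert-base-chinese-finetuning-financial-news-sentiment-v2); a pipeline can be called with a list of strings.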

 

2. Zero-shot classification

from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

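What it does: given a piece of text and several candidate labels, the code estimates how well each label matches. In the example above, the closest match is education.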
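The zero-shot pipeline returns a dict with the keys sequence, labels and scores, where labels and scores are parallel lists. A small sketch of reading off the best label follows. The `rank_labels` helper is hypothetical, and the scores in `example` are made up for illustration; they are not the model's real output.

```python
def rank_labels(result):
    """Pair each candidate label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hypothetical result in the pipeline's output shape (scores are invented).
example = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.84, 0.11, 0.05],
}

ranked = rank_labels(example)
best_label, best_score = ranked[0]
print(best_label, best_score)
```

With a real pipeline, pass the return value of the `classifier(...)` call to `rank_labels` instead of `example`.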

 

3. Text generation

from transformers import pipeline
generator = pipeline("text-generation",model="distilgpt2")
generator("once upon a time", max_length=30,num_return_sequences=2)

Put simply, you give it an opening and let it make up the rest.

   

4. Fill-mask

from transformers import pipeline
unmasker = pipeline("fill-mask",model="distilroberta-base")
unmasker("I love sweet foods,such as <mask>.", top_k=2)

The <mask> part is filled in automatically by the model.
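One detail worth knowing: the placeholder is model-specific. RoBERTa-style models such as distilroberta-base use `<mask>`, while BERT-style models use `[MASK]`. A small sketch that keeps the sentence template independent of the model follows; the `with_mask` helper and the `{mask}` placeholder are hypothetical conveniences, not part of the library.

```python
def with_mask(template, mask_token):
    """Replace the neutral {mask} placeholder with a model-specific mask token."""
    return template.format(mask=mask_token)

template = "I love sweet foods, such as {mask}."
print(with_mask(template, "<mask>"))   # RoBERTa-style models
print(with_mask(template, "[MASK]"))   # BERT-style models
```

With a real pipeline, the right token can be read from `unmasker.tokenizer.mask_token` instead of being typed by hand.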

 

5. Reading comprehension (answer extraction)

from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Is it raining today?",
    context="In the evening, a large cloud drifted in the distance, and soon it began to rain"
)

Roughly speaking, you give it a passage and ask a question, and it picks out the part of the passage that answers the question.
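"Picks out" is meant literally: the question-answering pipeline returns a dict with score, start, end and answer, where start and end are character offsets into the context. The sketch below illustrates that relationship. The answer span is chosen by hand and the score is invented, so this is not the model's actual output.

```python
context = ("In the evening, a large cloud drifted in the distance, "
           "and soon it began to rain")

# Hypothetical answer span, picked by hand for illustration (not model output).
answer_text = "it began to rain"
start = context.find(answer_text)
end = start + len(answer_text)

# A dict in the pipeline's output shape; the score is a made-up placeholder.
result = {"score": 0.5, "start": start, "end": end, "answer": answer_text}

# Extractive QA: the answer is always a slice of the original context.
extracted = context[result["start"]:result["end"]]
print(extracted)
```

This also explains a limitation: the model cannot answer "yes" or "no" to the question above, because neither word appears in the passage. It can only return a span that supports the answer.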

 

6. Translation

Chinese to English

from transformers import pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
translator("今天是周四,我要吃肯德基。")

English to Chinese

from transformers import pipeline
translator= pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
translator("It's Thursday. I'm gonna eat Kentucky Fried Chicken.")

 

7. Summarization

from transformers import pipeline
summarizer = pipeline("summarization",model="sshleifer/distilbart-cnn-12-6")
summarizer("""Speaking a language is a skill, like driving a car, playing a musical instrument or learning to swim. 
To be a good driver, you need to practise driving. You can read a book about car mechanics. You can study the rules of the road. 
But nothing is as good for your driving as spending time behind the wheel of a car, actually driving.
It's the same with speaking English. No matter how much you study grammar and vocabulary, if you don't practise spoken communication, it's very difficult to get good at it. 
So maybe you talk to yourself in English as you go about your day. Or maybe you look for opportunities to chat in English with people you meet. 
But however you do it, the most powerful way to improve your English speaking skills is to use them. """,max_length=100)

 


posted @ 2023-08-20 16:05  菩提树下的杨过