# Sentiment Analysis with ML.NET [Beginner's Edition]

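1. Describe the scenario in which the problem arises

2. Collect data for that specific scenario

3. Preprocess the data

4. Choose a model (algorithm) and train it

5. Validate and tune the trained model

6. Use the model for prediction

I will walk through each of these steps with a worked example.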

#### Collecting data for the specific scenario

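kaggle.com is a well-known competition website for data science and machine learning. Each line of the training file holds one review sentence, a tab, and a label: 1 for positive, 0 for negative. The first few lines look like this: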

    A very, very, very slow-moving, aimless movie about a distressed, drifting young man.  	0
    Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out.  	0
    Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent.  	0
    Very little music or anything to speak of.  	0
    The best scene in the movie was when Gerardo is trying to find a song that keeps running through his head.  	1
    The rest of the movie lacks art, charm, meaning... If it's about emptiness, it works I guess because it's empty.  	0
    Wasted two hours.  	0
    ...
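To make the file layout concrete, here is a minimal sketch of splitting one such line into its sentence and its label by hand. The `LabelledLine` helper is hypothetical and exists only for illustration; it assumes the label is whatever follows the last tab on the line.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class LabelledLine
{
    // Splits "sentence<TAB>label" into its two columns.
    public static (string Text, float Label) Parse(string line)
    {
        // The label sits after the last tab; the sentence itself may contain spaces.
        int tab = line.LastIndexOf('\t');
        if (tab < 0)
            throw new FormatException("Expected a tab between the sentence and its label.");

        string text = line.Substring(0, tab).TrimEnd();
        float label = float.Parse(line.Substring(tab + 1).Trim(), CultureInfo.InvariantCulture);
        return (text, label);
    }
}

public static class ParseDemo
{
    public static void Main()
    {
        var row = LabelledLine.Parse("Wasted two hours.  \t0");
        Console.WriteLine($"text=\"{row.Text}\" label={row.Label}");
        // prints: text="Wasted two hours." label=0
    }
}
```

The label is parsed as a `float` here only to match the numeric 0/1 values in the file; any numeric type would do for this sketch.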

#### Preprocessing the data

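    // Training and test files from the "sentiment labelled sentences" data folder.
    const string _dataPath = @".\data\sentiment labelled sentences\imdb_labelled.txt";
    const string _testDataPath = @".\data\sentiment labelled sentences\yelp_labelled.txt";

    // One input row: column 0 is the sentence, column 1 is the 0/1 label.
    public class SentimentData
    {
        [Column(ordinal: "0")]
        public string SentimentText;
        [Column(ordinal: "1", name: "Label")]
        public float Sentiment;
    }

    var pipeline = new LearningPipeline();
    // Load the tab-separated training file (it has no header row).
    pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
    // Convert the raw text in SentimentText into a numeric "Features" column.
    pipeline.Add(new TextFeaturizer("Features", "SentimentText"));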
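The featurization step exists because a learner works on numbers, not on raw sentences. As a rough illustration of the idea, the sketch below builds a plain bag-of-words vector: one slot per known word, holding how often that word occurs. This is an assumption-laden simplification and not how `TextFeaturizer` is implemented; the `BagOfWords` class and all of its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a hypothetical bag-of-words featurizer, not TextFeaturizer itself.
public static class BagOfWords
{
    private static readonly char[] Separators = { ' ', ',', '.', '!', '?', ';', ':', '-', '&' };

    // Lower-cases the text and splits it into word tokens.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Gives each distinct word in the corpus its own position in the feature vector.
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> corpus)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (string word in corpus.SelectMany(Tokenize))
        {
            if (!vocabulary.ContainsKey(word))
                vocabulary.Add(word, vocabulary.Count);
        }
        return vocabulary;
    }

    // Counts how often each vocabulary word occurs in one sentence; unknown words are ignored.
    public static float[] Featurize(string text, Dictionary<string, int> vocabulary)
    {
        var features = new float[vocabulary.Count];
        foreach (string word in Tokenize(text))
        {
            if (vocabulary.TryGetValue(word, out int index))
                features[index] += 1f;
        }
        return features;
    }
}

public static class FeaturizeDemo
{
    public static void Main()
    {
        var corpus = new[] { "Wasted two hours.", "Very little music or anything to speak of." };
        Dictionary<string, int> vocabulary = BagOfWords.BuildVocabulary(corpus);

        float[] features = BagOfWords.Featurize("Two hours, two wasted hours.", vocabulary);
        Console.WriteLine(string.Join(" ", features));
    }
}
```

Every sentence maps to a vector of the same length, which is what lets a classifier compare sentences that have different numbers of words.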

#### Choosing a model (algorithm) and training it

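    // Output row: the model writes its decision into the PredictedLabel column.
    public class SentimentPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool Sentiment;
    }

    // Add the binary classifier: a small boosted decision tree learner.
    pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

    // Run every pipeline step and fit the model.
    PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();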


#### Validating and tuning the trained model

    // Score the model on a separate, labelled test file.
    var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
    var evaluator = new BinaryClassificationEvaluator();
    BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
    Console.WriteLine();
    Console.WriteLine("PredictionModel quality metrics evaluation");
    Console.WriteLine("------------------------------------------");
    Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
    Console.WriteLine($"Auc: {metrics.Auc:P2}");
    Console.WriteLine($"F1Score: {metrics.F1Score:P2}");

Accuracy, Auc, and F1Score are common evaluation metrics that score things such as how often the model is right and how much it errs. If the scores are low, adjust the parameter values that were set when the model was defined in the previous step. For detailed explanations, see the Machine learning glossary.

#### Using the model for prediction

Once you have trained a model you are happy with, you can put it to real use. In essence, you take some data that has no human-assigned labels and let the model analyze it and return its prediction for the target value. The code snippet is as follows:

    // The labels are unknown at prediction time; Sentiment = 0 is only a placeholder.
    IEnumerable<SentimentData> sentiments = new[]
    {
        new SentimentData { SentimentText = "Contoso's 11 is a wonderful experience", Sentiment = 0 },
        new SentimentData { SentimentText = "The acting in this movie is very bad", Sentiment = 0 },
        new SentimentData { SentimentText = "Joe versus the Volcano Coffee Company is a great film.", Sentiment = 0 }
    };

    IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

    Console.WriteLine();
    Console.WriteLine("Sentiment Predictions");
    Console.WriteLine("---------------------");

    // Pair each input sentence with the prediction at the same position.
    var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
    foreach (var item in sentimentsAndPredictions)
    {
        Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
    }
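The prediction snippet lines its inputs up with the model's outputs using LINQ's `Zip`. The sketch below isolates that pairing step with stand-in values so it runs without ML.NET; the `ZipDemo` class, its `Describe` method, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipDemo
{
    // Pairs each input text with the prediction at the same position and formats the pair.
    public static List<string> Describe(IEnumerable<string> texts, IEnumerable<bool> predictions) =>
        texts.Zip(predictions, (text, isPositive) => (text, isPositive))
             .Select(pair => $"{pair.text} | {(pair.isPositive ? "Positive" : "Negative")}")
             .ToList();

    public static void Main()
    {
        // Stand-in values; in the article the predictions come from model.Predict(sentiments).
        var texts = new[] { "A wonderful experience", "The acting is very bad" };
        var predictions = new[] { true, false };

        foreach (string line in Describe(texts, predictions))
            Console.WriteLine(line);
    }
}
```

`Zip` stops at the end of the shorter sequence, so this pairing is only meaningful when the predictions come back in the same order and number as the inputs.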

The complete program:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Models;
    using Microsoft.ML.Runtime;
    using Microsoft.ML.Runtime.Api;
    using Microsoft.ML.Trainers;
    using Microsoft.ML.Transforms;

    namespace SentimentAnalysis
    {
        class Program
        {
            const string _dataPath = @".\data\sentiment labelled sentences\imdb_labelled.txt";
            const string _testDataPath = @".\data\sentiment labelled sentences\yelp_labelled.txt";

            // One input row: column 0 is the sentence, column 1 is the 0/1 label.
            public class SentimentData
            {
                [Column(ordinal: "0")]
                public string SentimentText;
                [Column(ordinal: "1", name: "Label")]
                public float Sentiment;
            }

            // Output row: the model's decision, read from the PredictedLabel column.
            public class SentimentPrediction
            {
                [ColumnName("PredictedLabel")]
                public bool Sentiment;
            }

            public static PredictionModel<SentimentData, SentimentPrediction> Train()
            {
                var pipeline = new LearningPipeline();
                // Load the training data, featurize the text, then add the learner.
                pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
                pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
                pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

                PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();
                return model;
            }

            public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
            {
                // Score the model on a separate, labelled test file.
                var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
                var evaluator = new BinaryClassificationEvaluator();
                BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
                Console.WriteLine();
                Console.WriteLine("PredictionModel quality metrics evaluation");
                Console.WriteLine("------------------------------------------");
                Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
                Console.WriteLine($"Auc: {metrics.Auc:P2}");
                Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
            }

            public static void Predict(PredictionModel<SentimentData, SentimentPrediction> model)
            {
                // The labels are unknown at prediction time; Sentiment = 0 is only a placeholder.
                IEnumerable<SentimentData> sentiments = new[]
                {
                    new SentimentData { SentimentText = "Contoso's 11 is a wonderful experience", Sentiment = 0 },
                    new SentimentData { SentimentText = "The acting in this movie is very bad", Sentiment = 0 },
                    new SentimentData { SentimentText = "Joe versus the Volcano Coffee Company is a great film.", Sentiment = 0 }
                };

                IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

                Console.WriteLine();
                Console.WriteLine("Sentiment Predictions");
                Console.WriteLine("---------------------");

                // Pair each input sentence with the prediction at the same position.
                var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
                foreach (var item in sentimentsAndPredictions)
                {
                    Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
                }
                Console.WriteLine();
            }

            static void Main(string[] args)
            {
                var model = Train();
                Evaluate(model);
                Predict(model);
            }
        }
    }

Posted on 2018-05-10 23:28 by Bean.Hsiang