ML.NET sample notes 4: running the samples on ML.NET v2

1 ML.NET version update

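The current Microsoft.ML package versions are as follows:

The samples in https://gitee.com/mirrors_feiyun0112/machinelearning-samples.zh-cn use version 1.6.0.
To switch the sample projects to another version:
1 Directory.Build.props and nuget.config
Edit the Directory.Build.props file in the samples directory.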


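[Snippet lost in extraction. What survives: the old Microsoft.ML version value is struck out and replaced with **2.0.1**; the other version value, 0.18.0, is left unchanged.]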


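2 Open the samples\csharp\All-Samples.sln solution.
Visual Studio then loads the new version of the Microsoft.ML libraries.

In the older projects, the place that references the ML.NET libraries looks similar to the following:

2 Updating the samples to ML.NET 2.0.1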

3 Sentiment analysis sample [SentimentAnalysis]

The settings in the SentimentAnalysisConsoleApp.csproj project are changed as follows:

Before: OutputType `Exe`, TargetFramework **netcoreapp2.1**

**Changed to:** OutputType `Exe`, TargetFramework **net6.0**, LangVersion `latest`

The final build output differs as follows. Build output after switching to ML.NET v2 and targeting .NET 6:

![](https://cdn.nlark.com/yuque/0/2023/png/2964849/1703036631994-5b2ce9b6-8404-4dc8-ba12-c400939c8b4d.png#averageHue=%23f9f8f7&id=txQRo&originHeight=982&originWidth=1165&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

Running the program:

![](https://cdn.nlark.com/yuque/0/2023/png/2964849/1703036632316-e337db90-d185-49ac-ab2a-b65535e7e1de.png#averageHue=%23fcf5e2&id=KlWIL&originHeight=543&originWidth=1404&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

### 4 **Spam detection**

SpamDetectionConsoleApp, run after applying the same settings:

![](https://cdn.nlark.com/yuque/0/2023/png/2964849/1703036632637-f37893ac-1301-4050-a855-dbb4cc56ab11.png#averageHue=%23faf2de&id=rbSFU&originHeight=980&originWidth=1340&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

## 5 Official ML.NET 2 samples

[https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/MLNET2](https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/MLNET2)

[https://gitee.com/mirrors_dotnet/machinelearning-samples](https://gitee.com/mirrors_dotnet/machinelearning-samples)

The second link is the Gitee mirror in China. The repository is large, about 1.8 GB. These samples are currently available in English only.

6 AutoML

  • AutoMLQuickStart - C# console application that shows how to get started with the AutoML API.
  • AutoMLAdvanced - C# console application that shows the following concepts:
    • Modifying column inference results
    • Excluding trainers
    • Configuring monitoring
    • Choosing tuners
    • Cancelling experiments
  • AutoMLEstimators - C# console application that shows how to:
    • Customize search spaces
    • Create sweepable estimators
  • AutoMLTrialRunner - C# console application that shows how to create your own trial runner for the Text Classification API.

7 Natural Language Processing (NLP)

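8 Sample walkthrough

Data source: download from this address

9 Sentence similarity (SentenceSimilarity)

(The test machine had no CUDA environment, so training ran on the CPU.)
train.csv: the training set, containing products, search terms, and relevance scores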

| id | product_uid | product_title | search_term | relevance |
| --- | --- | --- | --- | --- |
| 2 | 100001 | Simpson Strong-Tie 12-Gauge Angle | angle bracket | 3 |
| 3 | 100001 | Simpson Strong-Tie 12-Gauge Angle | l bracket | 2.5 |
| 9 | 100002 | BEHR Premium Textured DeckOver 1-gal. #SC-141 Tugboat Wood and Concrete Coating | deck over | 3 |

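The home-depot-sentence-similarity.csv file is not included in the code repository. For how it relates to the original train.csv, and how to download and generate it, see
https://github.com/dotnet/machinelearning-samples/issues/982. I wrote a preprocessing step that merges the CSV files into the format the sample code expects: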

    using System;
    using System.Collections.Generic;
    using System.IO;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Transforms;

    namespace SentenceSimilarity
    {
        internal class GenData
        {
            // Row layout of train.csv:
            // id product_uid product_title search_term relevance
            // 2 100001 Simpson Strong-Tie 12-Gauge Angle angle bracket 3
            public class HomeDepot
            {
                [LoadColumn(0)]
                public int id { get; set; }
                [LoadColumn(1)]
                public int product_uid { get; set; }
                [LoadColumn(2)]
                public string product_title { get; set; }
                [LoadColumn(3)]
                public string search_term { get; set; }
                [LoadColumn(4)]
                public string relevance { get; set; }
            }

            // https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.custommappingcatalog.custommapping?view=ml-dotnet
            [CustomMappingFactoryAttribute("product_description")]
            private class ProdDescCustomAction : CustomMappingFactory<HomeDepot, CustomMappingOutput>
            {
                // The custom mapping applied to every row: look up the description
                // for the row's product_uid (throws if the uid has no description).
                public static void CustomAction(HomeDepot input, CustomMappingOutput output)
                    => output.product_description = prodDesc[input.product_uid.ToString()];

                public override Action<HomeDepot, CustomMappingOutput> GetMapping()
                    => CustomAction;
            }

            // Defines only the column generated by the custom mapping,
            // in addition to the columns already present.
            private class CustomMappingOutput
            {
                public string product_description { get; set; }
            }

            // product_uid -> product_description, filled from product_descriptions.csv
            static Dictionary<string, string> prodDesc = new Dictionary<string, string>();

            static void Main(string[] args)
            {
                var mlContext = new MLContext(seed: 1);

                // Step 1: read all product descriptions into the dictionary.
                var DataPath = Path.GetFullPath(@"..\..\..\..\Data\product_descriptions.csv");
                {
                    IDataView dv = mlContext.Data.LoadFromTextFile(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true,
                        columns: new[] {
                            new TextLoader.Column("product_uid", DataKind.String, 0),
                            new TextLoader.Column("product_description", DataKind.String, 1)
                        });
                    foreach (var row in dv.Preview(maxRows: 15_0000).RowView)
                    {
                        string uid = "", desc = "";
                        foreach (KeyValuePair<string, object> column in row.Values)
                        {
                            if (column.Key == "product_uid")
                            {
                                uid = column.Value.ToString();
                            }
                            else
                            {
                                desc = column.Value.ToString();
                            }
                        }
                        prodDesc[uid] = desc;
                    }
                }

                // Step 2: load train.csv with the HomeDepot schema and print a few rows.
                DataPath = Path.GetFullPath(@"..\..\..\..\Data\train.csv");
                IDataView dataView = mlContext.Data.LoadFromTextFile<HomeDepot>(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true);
                var preViewTransformedData = dataView.Preview(maxRows: 5);
                foreach (var row in preViewTransformedData.RowView)
                {
                    var ColumnCollection = row.Values;
                    string lineToPrint = "Row--> ";
                    foreach (KeyValuePair<string, object> column in ColumnCollection)
                    {
                        lineToPrint += $"| {column.Key}:{column.Value}";
                    }
                    Console.WriteLine(lineToPrint + "\n");
                }

                // Step 3: append the product_description column through the custom mapping.
                var pipeline = mlContext.Transforms.CustomMapping(new ProdDescCustomAction().GetMapping(), contractName: "product_description");
                var transformedData = pipeline.Fit(dataView).Transform(dataView);
                // Only needed when a saved model containing this mapping is loaded later:
                //mlContext.ComponentCatalog.RegisterAssembly(typeof(ProdDescCustomAction).Assembly);

                // Step 4: write the merged rows as a comma-separated file.
                Console.WriteLine("save file");
                using FileStream fs = new FileStream(Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv"), FileMode.Create);
                mlContext.Data.SaveAsText(transformedData, fs, schema: false, separatorChar: ',');
            }
        }
    }

For the full file see https://gitee.com/iamops/x-unix-dotnet/blob/main/ml.net2/SentenceSimilarity/GenData.cs

Once the data is in place, running the sample downloads the model file, with output similar to:
[Source=NasBertTrainer; TrainModel, Kind=Trace] Channel started
[Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel started
[Source=NasBertTrainer; Ensuring model file is present., Kind=Info] Downloading NasBert2000000.tsm from https://aka.ms/mlnet-resources/models/NasBert2000000.tsm to C:\Users\homelap\AppData\Local\Temp\mlnet\NasBert2000000.tsm
[Source=NasBertTrainer; Ensuring model file is present., Kind=Info] NasBert2000000.tsm: Downloaded 3620 bytes out of 17907563
...

TorchSharp has no official stable release yet, and the samples run into many problems. With the data files in place as described above, running the sample directly fails with:
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:

at TorchSharp.torch+random.THSGenerator_manual_seed(Int64)

at TorchSharp.torch+random.manual_seed(Int64) ...

Setting the versions as described in https://github.com/dotnet/machinelearning/issues/6669 does not fix it either:

| ML.NET Version | TorchSharp Package Version |
| --- | --- |
| 2.0.0 | 0.98.1 |
| 2.0.1 | 0.98.1 |
| 3.0.0-preview | 0.98.3 |
| Next preview | 0.99.5 |
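
The pairing in this table can be kept next to the project as a small lookup, so a mismatch is noticed before a run. The following is only a sketch: the class and method names are hypothetical, the "Next preview" row is left out because it is not a concrete version, and in my test matching the table did not by itself make the sample run.

```csharp
using System;
using System.Collections.Generic;

public static class TorchSharpCompat
{
    // Pairs copied from the table in the linked issue
    // (ML.NET version -> TorchSharp package version).
    private static readonly Dictionary<string, string> Table = new Dictionary<string, string>
    {
        ["2.0.0"] = "0.98.1",
        ["2.0.1"] = "0.98.1",
        ["3.0.0-preview"] = "0.98.3",
    };

    // Returns true and the listed TorchSharp version when the ML.NET version
    // appears in the table; otherwise returns false and an empty string.
    public static bool TryLookup(string mlNetVersion, out string torchSharpVersion)
    {
        if (Table.TryGetValue(mlNetVersion, out var found))
        {
            torchSharpVersion = found;
            return true;
        }
        torchSharpVersion = string.Empty;
        return false;
    }
}
```

For example, `TorchSharpCompat.TryLookup("2.0.1", out var v)` yields `"0.98.1"` in `v`.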

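Setting the versions according to the table above did not help either; the run still failed, so the next step is to look into the TorchSharp source code.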

10 TorchSharp

Preliminary structure of the libraries (figure image.png not preserved):

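As shown above, within the overall Microsoft.ML framework (Microsoft.ML.Core.dll, Microsoft.ML.dll, Microsoft.ML.PCA.dll, Microsoft.ML.Transforms.dll, Microsoft.ML.Data.dll, Microsoft.ML.KMeansClustering.dll, Microsoft.ML.StandardTrainers.dll),
a Microsoft.ML.TorchSharp layer (Microsoft.ML.TorchSharp.dll) was added for Torch, following the ML.NET framework structure. An overview of this extension follows:

Separate libraries are provided for the CPU and GPU scenarios.
TorchSharp, TorchAudio and TorchVision are C# libraries for different domains. They wrap PyTorch's native library (libtorch) in an abstraction layer and serve Microsoft.ML.TorchSharp. The project also provides the LibTorchSharp library, written in C++, for TorchSharp/TorchAudio/TorchVision to call.
The C# libraries call LibTorchSharp through P/Invoke, and LibTorchSharp in turn calls PyTorch's native C++ library.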
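
To make the P/Invoke layer concrete, here is a minimal sketch of what a managed declaration for a LibTorchSharp export looks like. The function name and its Int64 parameter are taken from the stack trace earlier in these notes; the `IntPtr` return type and the wrapper class `PInvokeSketch` are my assumptions, not code copied from TorchSharp.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PInvokeSketch
{
    // Managed declaration of a function exported by the native LibTorchSharp shim.
    // Name and Int64 parameter come from the stack trace; the IntPtr return type
    // is an assumption made for this sketch.
    [DllImport("LibTorchSharp")]
    private static extern IntPtr THSGenerator_manual_seed(long seed);

    // Attempts the native call and describes the outcome instead of throwing
    // when the native library is missing or cannot be loaded.
    // Note: an AccessViolationException raised inside native code cannot be
    // caught here; the runtime terminates the process with "Fatal error".
    public static string TrySeed(long seed)
    {
        try
        {
            IntPtr handle = THSGenerator_manual_seed(seed);
            return handle == IntPtr.Zero ? "call returned a null handle" : "seeded";
        }
        catch (DllNotFoundException e)
        {
            return "native library not found: " + e.Message;
        }
        catch (EntryPointNotFoundException e)
        {
            return "export not found: " + e.Message;
        }
        catch (BadImageFormatException e)
        {
            return "native library has the wrong format: " + e.Message;
        }
    }
}
```

Because the managed side only declares the signature, the native binary that actually gets loaded must match it exactly, which is why the version of the native files in the output directory matters so much for the rest of this note.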
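The following is the TorchSharp project configuration for wrapping the native library and packaging it:

Reference links for the related projects: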
https://github.com/dotnet/TorchSharp
https://github.com/dotnet/TorchSharpExamples
https://www.nuget.org/packages/TorchSharp/
As we build up to a v1.0 release, we will continue to make breaking changes, but only when we consider it necessary for usability. Similarity to the PyTorch experience is a primary design tenet, and we will continue on that path.
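As the quote shows, TorchSharp has not reached 1.0, so its API changes quickly. The versions published to NuGet also do not always have a matching branch or tag in the official repository, so compatibility problems are significant.
For example, even the official examples under this tag, https://github.com/dotnet/TorchSharpExamples/releases/tag/v0.95.4, fail to run:
TorchSharpExamples-0.95.4.tar\TorchSharpExamples-0.95.4\src\CSharp\CSharpExamples

random.manual_seed(1); cannot be called at all and throws an exception.
With the main branch at version 0.100.5, however, the examples run normally.

Back to the current sample:
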
TorchSharp.dll!TorchSharp.torch.random.manual_seed(long seed) Line 21623
at TorchSharp\torch.cs(21623)

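Version 0.98.1 has no branch or tag in the https://github.com/dotnet/TorchSharp repository, which makes the code hard to trace. It is unclear which source the version published at https://www.nuget.org/packages/TorchSharp-cpu was built from.
Version 0.99.3 (https://github.com/dotnet/TorchSharp/releases/tag/v0.99.3) is not compatible with the Microsoft.ML.TorchSharp version: some implementations are missing at run time, probably because the two projects iterate at very different paces.
So I picked a version that does have a branch, 0.98.2, and analyzed that.

11 Debugging TorchSharp 0.98.2

I checked out the whole https://github.com/dotnet/TorchSharp/tree/0.98.2 branch and built it in the Visual Studio developer environment, following its DEVGUIDE.md: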

msbuild TorchSharp.sln

msbuild TorchSharp.sln /p:Configuration=Release /p:Platform=x64

libtorch versions correspond to PyTorch versions; for example, libtorch 1.6.0 corresponds to PyTorch 1.6.0.

  • Debug build

https://download.pytorch.org/libtorch/cpu/
https://download.pytorch.org/libtorch/cu113
https://download.pytorch.org/libtorch/cu113/libtorch-win-shared-with-deps-debug-1.11.0%2Bcu113.zip
https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-1.11.0%2Bcpu.zip
The build downloads these prebuilt archives automatically. The debug archives are very large:

libtorch-cpu\libtorch-win-shared-with-deps-debug-1.11.0%2Bcpu.zip 650M
libtorch-cuda-11.3\libtorch-win-shared-with-deps-debug-1.11.0%2Bcu113.zip 2.7G

The release archives, for comparison:

libtorch-cpu\libtorch-win-shared-with-deps-1.11.0%2Bcpu.zip 143M
libtorch-cuda-11.3\libtorch-win-shared-with-deps-1.11.0%2Bcu113.zip 2G

With everything in place, debugging the failing function directly in Visual Studio works without problems.

Copying the sample code into this build and running it also works.

  • Release build

https://download.pytorch.org/libtorch/cu113/libtorch-win-shared-with-deps-1.11.0%2Bcu113.zip
https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-1.11.0%2Bcpu.zip
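
The four archive URLs above follow one pattern, which can be captured in a small helper when scripting the download. This is a sketch that only reproduces the pattern seen in those URLs (Windows archives, flavor `cpu` or `cu113`); the class and parameter names are hypothetical, and other versions or platforms may be named differently.

```csharp
using System;

public static class LibTorchUrl
{
    // Builds the download URL of a Windows libtorch archive.
    // version: libtorch version, e.g. "1.11.0"
    // flavor:  "cpu" or a CUDA tag such as "cu113"
    // debug:   true selects the "-debug" archive
    public static string Build(string version, string flavor, bool debug)
    {
        string debugPart = debug ? "debug-" : "";
        // "%2B" is the URL-encoded "+" between the version and the flavor.
        return $"https://download.pytorch.org/libtorch/{flavor}/" +
               $"libtorch-win-shared-with-deps-{debugPart}{version}%2B{flavor}.zip";
    }
}
```

For instance, `LibTorchUrl.Build("1.11.0", "cpu", debug: false)` produces the release CPU URL listed above.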

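12 Getting the sample to run

After some experimentation, it turned out to be enough to replace the following libtorch libraries:

The files the sample ships by default do not work; after replacing them, the output directory looks like this:

My preliminary guess is that the 0.98.2 package published on NuGet has an inconsistency somewhere.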
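
When swapping native files like this, it can help to check which ones the process is able to load at all before running the sample. The sketch below uses `NativeLibrary.TryLoad` for that; the class name `NativeProbe` is hypothetical, and the file names in the usage note are assumptions, since the original list of replaced libraries was a screenshot.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public static class NativeProbe
{
    // Tries to load each named native library from the given directory and
    // returns the names that could not be loaded. A failure can mean the file
    // is missing, has the wrong architecture, or one of its own dependencies
    // could not be resolved.
    public static List<string> FindUnloadable(string directory, IEnumerable<string> fileNames)
    {
        var failed = new List<string>();
        foreach (string name in fileNames)
        {
            string path = Path.Combine(directory, name);
            // The handle is intentionally not freed: this is a one-off probe.
            if (!NativeLibrary.TryLoad(path, out IntPtr _))
            {
                failed.Add(name);
            }
        }
        return failed;
    }
}
```

A call such as `NativeProbe.FindUnloadable(AppContext.BaseDirectory, new[] { "LibTorchSharp.dll", "torch_cpu.dll", "c10.dll" })` would list the files that fail to load from the output directory. A successful load does not prove the binaries are mutually compatible; it only rules out the missing-file case.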
For the complete project see
https://gitee.com/iamops/x-unix-dotnet/blob/main/ml.net2/SentenceSimilarity/SentenceSimilarity.csproj
Run output:

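13 Summary

Training with Torch on the CPU is indeed very slow.

SentenceSimilarity: this official sample mixes C# with PyTorch's native library, and because the Torch side is unstable across versions, it is troublesome to use. The specific causes are analyzed above.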

posted @ 2023-12-20 09:52  2012