对于大文件操作,readline 的方式读取文档,那操作起来跟蜗牛爬一样的慢了, 于是使用内存映射技术,

参考微软的这个使用方法说明

https://msdn.microsoft.com/zh-cn/library/dd997372(v=vs.110).aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1

1: 主要用到下面两个方法,一个是打开一个文本,一个是对文本进行操作

using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(srcFile, FileMode.Open, "xyz", size))

using (mmf_reader = mmf.CreateViewAccessor(0, portion1, MemoryMappedFileAccess.Read))
可以对文本按照位置提取相应内容,提取的内容可以使用byte方式 ,若是都为英文或者数字键盘符号的文档.这样节省很多资源

2:按照块提取会存在一个问题,你的分块肯定会破坏文档里边的整个行,所以我的方法是按照分块,确定块的附近换行符的位置.

3:根据换行符 确定了位置,把一个大文件分成块,当然也可以一次读入到内存,做进一步处理,你可以参考微软的帮助做相应的优化.

4: 打开一个新的保存文件,不破坏源文件,不在源文件上操作, 

5:启动task[]线程组,每个块分配一个task去做相关处理.

6:for循环涉及到每块处理的先后顺序

  Task[] tasks = new Task[t];
            for (int i = 0; i < mappedFiles.Count; i++)
            {
                tasks[i] = Task.Factory.StartNew(action, i);
                tasks[i].Wait();
            }

这里使用了Action 无返回值,和Fuc 带参数有返回值两种方式,我选择了前者 .

  Action<object> action = (object obj) =>
 {
}

7:还可以使用streamreader 对打开的文本进行操作

 using (FileStream fs = new FileStream(TXT_FILE_PATH, FileMode.Open, FileAccess.ReadWrite))
            {
                long targetRowNum = ttargetRowNum + 1;//目标行
                long curRowNum = 1;//当前行
                FILE_SIZE = fs.Length;
                using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(fs, "test", fs.Length, MemoryMappedFileAccess.ReadWrite, null, HandleInheritability.None, false))
                {
                    long offset = 0;
                    //int limit = 250;
                    int limit = 200;
                    try
                    {
                        StringBuilder sbDefineRowLine = new StringBuilder();
                        do
                        {
                            long remaining = fs.Length - offset;

  using (MemoryMappedViewStream mmStream = mmf.CreateViewStream(offset, remaining > limit ? limit : remaining))
//using (MemoryMappedViewStream mmStream = mmf.CreateViewStream(offset, remaining))
                            {
                                offset += limit;
                                using (StreamReader sr = new StreamReader(mmStream))
                                {
                                    //string ss = sr.ReadToEnd().ToString().Replace("\n", "囧").Replace(Environment.NewLine, "囧");
                                    string ss = sr.ReadToEnd().ToString().Replace("\n", SPLIT_VARCHAR).Replace(Environment.NewLine, SPLIT_VARCHAR);

可以直接读取块到byte[] buffer 块 我用的这种

private static void SpiltFile(string srcFile, int portionSize)  
       {  
           string savedPath = @"\\stcsrv-c81\MMFeedHealthyDatacache\2016_07_10\Feedkeys\No_Process_test.txt";  
           FileInfo fi = new FileInfo(srcFile);  
           // total size in bytes  
           Int64 size = fi.Length;  
           object locker = new object();  
           object writeLock = new object();  
           List<MappedFile> mappedFiles = new List<MappedFile>();  
           Int64 fileToRead = size;//文件总的大小  
       
           portionSize = portionSize * 1024 * 1024; //每块大小  
  
           Int64 portion = (Int64)Math.Ceiling(size * 1.0 / portionSize); //分成多少块  
  
           Int64 fileOffset = 0;  
  
           MemoryMappedViewAccessor mmf_reader = null;  
           Stopwatch watch = Stopwatch.StartNew();  
           watch.Start();  
           Int64 fileSize = 0;  
           using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(srcFile, FileMode.Open, "xyz", size))  
           {  
               //using (var writeMap = MemoryMappedFile.CreateFromFile(savedPath, FileMode.Create, "test", size, MemoryMappedFileAccess.ReadWrite))  
               //{  
                   //bool mutexCreated;  
                   //Mutex mutex = new Mutex(true, "testmapmutex", out mutexCreated);//进程间同步  
                   Parallel.For(0, portion, (i, ParallelLoopState) =>  
                   {  
  
                       //for (int i = 26; i < portion; i++)  
                       //{  
                       lock (locker)  
                       {  
                           fileSize = Math.Min(portionSize, fileToRead - portionSize * i);  
                           if (fileSize > 0)  
                           {  
                               byte[] buffer;  
                               using (mmf_reader = mmf.CreateViewAccessor(i * portionSize, fileSize, MemoryMappedFileAccess.Read))  
                               {  
                                   buffer = new byte[fileSize];  
                                   mmf_reader.ReadArray(0, buffer, 0, (int)fileSize);  
                                   mappedFiles.Add(new MappedFile  
                                   {  
                                       Offset = i * portionSize, //fileOffset,  
                                       Buffer = buffer,  
                                       FileSize = fileSize  
                                   });  
                               }  
  
                               //fileToRead -= fileSize;  
                               //lock (writeLock)  
                               //{  
                               //using (var writeMmf = MemoryMappedFile.OpenExisting("xyz"))  
                               //{  
                               //    using (var writeAccessor = writeMmf.CreateViewStream(i * portionSize, fileSize))  
                               //    {  
                               //        var w = new BinaryWriter(new FileStream(savedPath, FileMode.Create, FileAccess.Write));  
                               //        //writeAccessor.WriteArray(i * portionSize, buffer, 0, buffer.Length);  
                               //        //writeAccessor.Write(buffer, 0, buffer.Length);  
                               //        w.Write(buffer);  
                               //    }  
                               //}  
  
                               //using (MemoryMappedViewAccessor writeView = writeMap.CreateViewAccessor())  
                               //{  
                               //    writeView.WriteArray(i * portionSize, buffer, 0, (int)fileSize);  
                               //}  
  
                           }  
                           //}  
                       }  
  
                   });  
               }  
  
  
           using (var writeMap = MemoryMappedFile.CreateFromFile(savedPath, FileMode.Create, "test", size, MemoryMappedFileAccess.ReadWrite))  
           {  
               using (MemoryMappedViewAccessor writeView = writeMap.CreateViewAccessor())  
               {  
                   Parallel.For(0, mappedFiles.Count, i =>  
                   {  
                       try  
                       {  
                           Monitor.Enter(locker);  
                           writeView.WriteArray(mappedFiles[i].Offset, mappedFiles[i].Buffer, 0, (int)mappedFiles[i].FileSize);  
                       }  
                       catch (Exception)  
                       {  
  
                           throw;  
                       }  
                       finally  
                       {  
                           Monitor.Exit(locker);  
                       }  
  
                   });  
               }  
           }  
           watch.Stop();  
           Console.WriteLine(watch.ElapsedMilliseconds);  
           #region MyRegion  
       }  
[csharp] view plain copy
public class MappedFile  
{  
    public long Offset { get; set; }  
    public byte[] Buffer { get; set; }  
    public long FileSize { get; set; }  

 

C# 中字符串string和字节数组byte[]的转换

string转byte[]:

byte[] byteArray = System.Text.Encoding.Default.GetBytes ( str );

byte[]转string:

string str = System.Text.Encoding.Default.GetString ( byteArray );

string转ASCII byte[]:

byte[] byteArray = System.Text.Encoding.ASCII.GetBytes ( str );

ASCII byte[]转string:

string str = System.Text.Encoding.ASCII.GetString ( byteArray );
posted on 2017-08-14 11:11  小石头的一天  阅读(783)  评论(0编辑  收藏  举报