About Parallel.For and Parallel.ForEach

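.NET 4 introduced parallel loops (the Parallel class in System.Threading.Tasks). "Parallel" means the work is spread over several threads that run at the same time. Because these threads are independent of one another, distributing work (for example, giving each thread a different job) is very easy. Combining their results is much harder. Here is a very simple example: computing the sum of 1 to 1000.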

If you try to compute it with the code below, the result will probably surprise you!

[C#]

int sum = 0;
// Sums 1..1000 (the upper bound 1001 is exclusive), but sum += i is not atomic
Parallel.For(1, 1001, i => { sum += i; });

[VB.NET]

Dim sum As Integer = 0
' Sums 1..1000 (the upper bound 1001 is exclusive), but sum += i is not atomic
Parallel.For(1, 1001, Sub(i) sum += i)

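The reason is that .NET runs the loop body `sum += i` on several threads at the same time by default. `sum += i` is not a single step: it reads sum, adds i, and writes the result back. Because all the threads share sum, one thread is often interrupted halfway through this sequence while another thread steps in. One of the two updates then overwrites the other, so the final result is wrong (usually less than 500500) and can differ from run to run.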
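To make the lost updates visible, here is a minimal C# sketch that writes `sum += i` out as the three separate steps it consists of. The class name `RaceDemo` is hypothetical. The code is deliberately broken. The returned value is at most 500500, is usually smaller, and may vary between runs, so no exact output is given.

```csharp
using System.Threading.Tasks;

public static class RaceDemo
{
    // Deliberately unsafe: several threads update the shared local "sum" without synchronization.
    public static int UnsafeSum(int n)
    {
        int sum = 0;
        Parallel.For(1, n + 1, i =>
        {
            int temp = sum;   // 1. read the shared value
            temp = temp + i;  // 2. add on a private copy
            sum = temp;       // 3. write back, possibly overwriting another thread's update
        });
        return sum;
    }
}
```

For example, `Console.WriteLine(RaceDemo.UnsafeSum(1000));` can print a different number each time it runs.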

There are several ways to solve this:

【1】Decomposition:

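Decomposition targets the root of the problem: one variable is shared by several threads. The sum over 1 to 1000 is split into several chunks. Each chunk writes its partial result to its own storage slot, which amounts to giving each thread its own piece of memory. At the end, the partial results are added together. The code looks like this: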

[C#]

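int[] numbers = Enumerable.Range(1, 1000).ToArray();
int[] values = new int[4];      // one slot per chunk: no two iterations share a variable
Parallel.For(0, 4, i =>
{
    int subtotal = 0;           // sums the 250 numbers starting at index i * 250
    for (int j = i * 250; j < (i + 1) * 250; j++)
        subtotal += numbers[j];
    values[i] = subtotal;
});
int sum = values.Sum();         // combine the partial results on one thread
Console.WriteLine(sum);         // 500500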

[VB.NET]

Dim numbers As Integer() = Enumerable.Range(1, 1000).ToArray()
Dim values As Integer() = New Integer(3) {}   ' one slot per chunk
Parallel.For(0, 4, Sub(i)
                       Dim subtotal As Integer = 0
                       For j As Integer = i * 250 To (i + 1) * 250 - 1
                           subtotal += numbers(j)
                       Next
                       values(i) = subtotal
                   End Sub)
Dim sum As Integer = values.Sum()
Console.WriteLine(sum)   ' 500500
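The TPL also has a built-in form of this decomposition. The generic overload `Parallel.For<TLocal>` keeps a running subtotal per task and merges it once at the end. The sketch below assumes .NET 4 or later, and the class name `LocalSum` is hypothetical. It uses `Interlocked.Add` (see method 【3】) only for the final per-task merge.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class LocalSum
{
    // Sums 1..n: each task keeps its own subtotal, and the shared total is touched once per task.
    public static int Sum(int n)
    {
        int total = 0;
        Parallel.For<int>(1, n + 1,
            () => 0,                                            // localInit: a fresh subtotal per task
            (i, state, subtotal) => subtotal + i,               // body: no locking needed
            subtotal => Interlocked.Add(ref total, subtotal));  // localFinally: merge once per task
        return total;
    }
}
```

With the fixed four-slot array, the number of partial results had to be chosen by hand. Here it simply equals the number of tasks the scheduler actually uses.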

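【2】Use lock. The thread that holds the lock on an object does its work and releases the lock automatically when it leaves the block. Then the next thread enters, and so on: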

[C#]

using System;
using System.Threading.Tasks;

public class Program
{
    int sum = 0;
    private static readonly object obj = new object();

    public void ShowResult()
    {
        // Only one thread at a time can execute the body of lock (obj) { ... }
        Parallel.For(1, 1001, i => { lock (obj) { sum += i; } });
        Console.WriteLine(sum);   // 500500
    }

    static void Main(string[] args)
    {
        Program p = new Program();
        p.ShowResult();
    }
}

[VB.NET]

Imports System.Threading.Tasks

Public Class Program
    Private sum As Integer = 0
    Private Shared ReadOnly obj As New Object()

    Public Sub ShowResult()
        ' Every iteration locks the same shared object, so updates to sum are serialized
        Parallel.For(1, 1001, Sub(i)
                                  SyncLock obj
                                      sum = sum + i
                                  End SyncLock
                              End Sub)
        Console.WriteLine(sum)   ' 500500
    End Sub
End Class

Module M
    Sub Main()
        Dim p As New Program
        p.ShowResult()
    End Sub
End Module

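【3】Use the Interlocked class. Interlocked.Add performs the read, the addition and the write as one atomic operation, so no lock is needed: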

[C#]

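using System;
using System.Threading;
using System.Threading.Tasks;

public class Program
{
    int sum = 0;

    public void ShowResult()
    {
        // Interlocked.Add adds i to sum atomically
        Parallel.For(1, 1001, i => { Interlocked.Add(ref sum, i); });
        Console.WriteLine(sum);   // 500500
    }

    static void Main(string[] args)
    {
        Program p = new Program();
        p.ShowResult();
    }
}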

[VB.NET]

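Imports System.Threading
Imports System.Threading.Tasks

Public Class Program
    Private sum As Integer = 0
    Public Sub ShowResult()
        ' The field sum is passed ByRef, so Interlocked.Add updates it atomically
        Parallel.[For](1, 1001, Sub(i)
                                    Interlocked.Add(sum, i)
                                End Sub)
        Console.WriteLine(sum)   ' 500500
    End Sub

    Shared Sub Main(args As String())
        Dim p As New Program()
        p.ShowResult()
    End Sub
End Class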
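`Interlocked.Add` only covers addition. For other read-modify-write operations, a common pattern is a compare-and-swap retry loop built on `Interlocked.CompareExchange`. The sketch below keeps a running maximum without a lock. The class `AtomicMax` is a hypothetical name, and the pattern is an illustration only. The original article does not use it.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AtomicMax
{
    // Atomically sets target = max(target, value) using a compare-and-swap retry loop.
    public static void UpdateMax(ref int target, int value)
    {
        int current = target;
        while (value > current)
        {
            // Store value only if target still equals current; returns what was actually there.
            int observed = Interlocked.CompareExchange(ref target, value, current);
            if (observed == current) return;   // swap succeeded
            current = observed;                // another thread changed target; re-check
        }
    }

    // Returns int.MinValue for an empty array.
    public static int ParallelMax(int[] data)
    {
        int max = int.MinValue;
        Parallel.For(0, data.Length, i => UpdateMax(ref max, data[i]));
        return max;
    }
}
```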


A more complex example: traversing all the files in a folder "in parallel" and adding them to a DataTable:

[C#]

using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class Program
{
    /// <summary>
    /// Number of files handled by one loop iteration (one "block")
    /// </summary>
    public const int BLOCKFILEPROCESS = 3;
    private static readonly object flag = new object();

    static void Main(string[] args)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("FileName", typeof(string));
        dt.Columns.Add("ExtensionName", typeof(string));
        dt.Columns[0].AutoIncrement = true;
        dt.Columns[0].AutoIncrementSeed = 1;
        dt.Columns[0].AutoIncrementStep = 1;

        // "c:\\FolderName" is a placeholder: replace it with a real folder
        string[] files = Directory.GetFiles("c:\\FolderName", "*.*", SearchOption.AllDirectories);
        // Number of blocks, rounded up so that a partial last block is not dropped
        int blockCount = (files.Length + BLOCKFILEPROCESS - 1) / BLOCKFILEPROCESS;

        Parallel.For<List<string>>(0, blockCount,
            // localInit: called once per task, gives that task its own list
            () => new List<string>(),
            // body: append block i to the task's list; the returned list is passed
            // to the next iteration that runs on the same task
            (i, state, processCollection) =>
            {
                processCollection.AddRange(files.Skip(i * BLOCKFILEPROCESS).Take(BLOCKFILEPROCESS));
                return processCollection;
            },
            // localFinally: called once per task; DataTable is not thread-safe, so lock it
            (result) =>
            {
                Monitor.Enter(flag);
                try
                {
                    foreach (var item in result)
                    {
                        DataRow row = dt.NewRow();
                        row["FileName"] = Path.GetFileName(item);
                        row["ExtensionName"] = Path.GetExtension(item);
                        dt.Rows.Add(row);
                    }
                }
                finally
                {
                    Monitor.Exit(flag);   // always release, even if an exception is thrown
                }
            });

        foreach (DataRow item in dt.Rows)
        {
            Console.WriteLine(item["Id"].ToString() + "<==>" + item["FileName"] + "<==>" + item["ExtensionName"].ToString());
        }
    }
}

[VB.NET]

Imports System.Collections.Generic
Imports System.Data
Imports System.IO
Imports System.Linq
Imports System.Threading
Imports System.Threading.Tasks

Public Class Program
    ''' <summary>
    ''' Number of files handled by one loop iteration (one "block")
    ''' </summary>
    Public Const BLOCKFILEPROCESS As Integer = 3
    Private Shared ReadOnly flag As New Object()

    Public Shared Sub Main(args As String())
        Dim dt As New DataTable()
        dt.Columns.Add("Id", GetType(Integer))
        dt.Columns.Add("FileName", GetType(String))
        dt.Columns.Add("ExtensionName", GetType(String))
        dt.Columns(0).AutoIncrement = True
        dt.Columns(0).AutoIncrementSeed = 1
        dt.Columns(0).AutoIncrementStep = 1

        ' "c:\FolderName" is a placeholder: replace it with a real folder
        Dim files As String() = Directory.GetFiles("c:\FolderName", "*.*", SearchOption.AllDirectories)
        ' Number of blocks, rounded up so that a partial last block is not dropped
        Dim blockCount As Integer = (files.Length + BLOCKFILEPROCESS - 1) \ BLOCKFILEPROCESS

        ' localInit gives each task its own list, the body appends block i to it,
        ' and localFinally (once per task) copies the list into the table under a lock
        Parallel.For(Of List(Of String))(0, blockCount,
            Function() New List(Of String)(),
            Function(i As Integer, state As ParallelLoopState, processCollection As List(Of String))
                processCollection.AddRange(files.Skip(i * BLOCKFILEPROCESS).Take(BLOCKFILEPROCESS))
                Return processCollection
            End Function,
            Sub(result As List(Of String))
                Monitor.Enter(flag)
                Try
                    For Each item As String In result
                        Dim row As DataRow = dt.NewRow()
                        row("FileName") = Path.GetFileName(item)
                        row("ExtensionName") = Path.GetExtension(item)
                        dt.Rows.Add(row)
                    Next
                Finally
                    Monitor.[Exit](flag)
                End Try
            End Sub)

        For Each item As DataRow In dt.Rows
            Console.WriteLine(item("Id").ToString() & "<==>" & item("FileName").ToString() & "<==>" & item("ExtensionName").ToString())
        Next
    End Sub
End Class

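The example above uses Monitor to lock and unlock the object that several threads access. The lock statement is in fact compiled into calls to Monitor.Enter and Monitor.Exit inside a try/finally block, so the two cost essentially the same. lock is simply shorter and guarantees the lock is released. Without lock or Monitor, several threads would write to the same DataTable at once and cause a race condition, because DataTable is not thread-safe for writes. The key difference from the first, non-generic For example is storage. Solution 【1】 gave each chunk of the sum its own storage slot, whereas here the storage (the DataTable) is shared.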
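To see why lock and Monitor cost about the same, compare the two methods in this sketch. The class `LockExpansion` is a hypothetical name. The second method is written out by hand to approximate what the C# compiler (C# 4 and later) generates for the first. Both protect the same counter with the same object.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class LockExpansion
{
    private static readonly object flag = new object();
    private static int counter;

    // What you write
    public static void IncrementWithLock()
    {
        lock (flag) { counter++; }
    }

    // Roughly what the compiler emits for the lock statement above
    public static void IncrementWithMonitor()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(flag, ref lockTaken);
            counter++;
        }
        finally
        {
            if (lockTaken) Monitor.Exit(flag);
        }
    }

    // Runs both versions n times each in parallel and returns the final counter value
    public static int RunBoth(int n)
    {
        counter = 0;
        Parallel.For(0, n, i =>
        {
            IncrementWithLock();
            IncrementWithMonitor();
        });
        return counter;
    }
}
```

Because both methods take the same lock, `RunBoth(1000)` always returns 2000.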

In general, whenever several threads share a resource and operate on it, access must be controlled with a synchronization mechanism!


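The For call here also uses the generic overload Parallel.For<TLocal>(fromInclusive, toExclusive, localInit, body, localFinally). The first two arguments set the number of iterations, which here is the number of blocks of files. They do not set the number of threads: the scheduler decides that. localInit is called once per task to create that task's local state (TLocal). body is called once per iteration with the index, a ParallelLoopState and the current local state. Its return value becomes the local state for the next iteration on the same task, so it should add to the local state rather than replace it. The last delegate, localFinally, is an Action<TLocal>, which takes one parameter, not zero. It is called once per task with that task's final local state, and this is where the collected results are added to the table.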
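These rules for localInit, body and localFinally can be seen in the following minimal generic sketch. The class `BlockCollector` and method `CollectInBlocks` are hypothetical names, not part of .NET. Each task collects blocks into its own `List<T>` without locking. Only the per-task merge in localFinally takes a lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockCollector
{
    // Gathers all items, blockSize at a time, via Parallel.For<TLocal>.
    public static List<T> CollectInBlocks<T>(T[] items, int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        var result = new List<T>();
        var gate = new object();
        int blockCount = (items.Length + blockSize - 1) / blockSize;   // round up

        Parallel.For<List<T>>(0, blockCount,
            () => new List<T>(),                   // localInit: once per task
            (i, state, local) =>
            {
                // body: append block i (the last block may be shorter)
                int end = Math.Min(items.Length, (i + 1) * blockSize);
                for (int k = i * blockSize; k < end; k++)
                    local.Add(items[k]);
                return local;                      // handed to this task's next iteration
            },
            local =>
            {
                lock (gate) { result.AddRange(local); }   // localFinally: once per task
            });
        return result;
    }
}
```

For example, `CollectInBlocks(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, 3)` returns all ten numbers. They are not necessarily in their original order, because tasks finish in an unpredictable order.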

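Parallel.ForEach can also be used here. Note that with ForEach you do not split the work into blocks yourself. It partitions the source internally, and you do not directly choose how many threads it uses. ParallelOptions.MaxDegreeOfParallelism can cap that number, and a custom Partitioner can control the chunking. Sample code: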

[C#]

using System;
using System.Data;
using System.IO;
using System.Threading.Tasks;

public class Program
{
    private static readonly object flag = new object();

    static void Main(string[] args)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("FileName", typeof(string));
        dt.Columns.Add("ExtensionName", typeof(string));
        dt.Columns[0].AutoIncrement = true;
        dt.Columns[0].AutoIncrementSeed = 1;
        dt.Columns[0].AutoIncrementStep = 1;

        string[] files = Directory.GetFiles("c:\\安装", "*.jpg", SearchOption.TopDirectoryOnly);

        // ForEach partitions the array itself; each call handles one file
        Parallel.ForEach<string>(files, (s) =>
        {
            lock (flag)   // DataTable is not thread-safe for concurrent writes
            {
                DataRow row = dt.NewRow();
                row["FileName"] = Path.GetFileName(s);
                row["ExtensionName"] = Path.GetExtension(s);
                dt.Rows.Add(row);
            }
        });

        foreach (DataRow item in dt.Rows)
        {
            Console.WriteLine(item["Id"].ToString() + "<==>" + item["FileName"] + "<==>" + item["ExtensionName"].ToString());
        }
    }
}

[VB.NET]

Imports System.Data
Imports System.IO
Imports System.Threading.Tasks

Public Class Program
    Private Shared ReadOnly flag As New Object()

    Public Shared Sub Main(args As String())
        Dim dt As New DataTable()
        dt.Columns.Add("Id", GetType(Integer))
        dt.Columns.Add("FileName", GetType(String))
        dt.Columns.Add("ExtensionName", GetType(String))
        dt.Columns(0).AutoIncrement = True
        dt.Columns(0).AutoIncrementSeed = 1
        dt.Columns(0).AutoIncrementStep = 1

        Dim files As String() = Directory.GetFiles("c:\安装", "*.jpg", SearchOption.TopDirectoryOnly)

        ' ForEach partitions the array itself; each call handles one file
        Parallel.ForEach(Of String)(files, Sub(s)
                                               SyncLock flag
                                                   Dim row As DataRow = dt.NewRow()
                                                   row("FileName") = Path.GetFileName(s)
                                                   row("ExtensionName") = Path.GetExtension(s)
                                                   dt.Rows.Add(row)
                                               End SyncLock
                                           End Sub)

        For Each item As DataRow In dt.Rows
            Console.WriteLine(item("Id").ToString() & "<==>" & item("FileName").ToString() & "<==>" & item("ExtensionName").ToString())
        Next
    End Sub
End Class
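If you do need control over chunk size and thread count with ForEach, one option is to pass it a range partitioner from `System.Collections.Concurrent` together with `ParallelOptions`. This sketch revisits the 1 to 1000 sum. The class `RangeForEach` is a hypothetical name, and the sketch assumes `n >= 1` because `Partitioner.Create` throws when the range is empty.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangeForEach
{
    // Sums 1..n by handing Parallel.ForEach explicit index ranges of rangeSize numbers.
    public static long SumInRanges(int n, int rangeSize, int maxThreads)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        // Yields Tuple<int,int> ranges [Item1, Item2) covering 1..n
        var ranges = Partitioner.Create(1, n + 1, rangeSize);
        Parallel.ForEach(ranges, options, range =>
        {
            long subtotal = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                subtotal += i;
            Interlocked.Add(ref total, subtotal);   // one atomic update per range
        });
        return total;
    }
}
```

`SumInRanges(1000, 250, 2)` splits the work into four ranges of 250 numbers and runs at most two of them at a time. Each range performs a single `Interlocked.Add`, so synchronization costs stay low.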

posted @ 2012-05-12 16:48  Serviceboy