
It Is Not Only Linq's Distinct That Falls Short

2011-08-02 18:45 by 鹤冲天

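Yesterday I read the article 《Linq的Distinct太不给力了》 ("Linq's Distinct Really Falls Short"). It points out that one overload of Linq's Distinct method takes an IEqualityComparer<T> parameter, so calling it usually means creating a new class just to implement that interface, which is a nuisance. The article offers a workaround that is somewhat cumbersome, so I wrote 《c# 扩展方法 奇思妙用 基础篇 八:Distinct 扩展》 ("C# Extension Methods, Clever Uses, Basics Part 8: Distinct Extension"), which simplifies things with extension methods.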

But the problem goes well beyond Distinct. The inconvenience comes from taking IEqualityComparer<T> as a parameter, and .NET does that in quite a few places:

IEqualityComparer<T> as a parameter

In .NET, the places where IEqualityComparer<T> appears as a parameter fall roughly into two groups:

1. Linq

public static class Enumerable
{
    public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value, IEqualityComparer<TSource> comparer);
    public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer);
    public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer);
    public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);
    public static IEnumerable<TSource> Intersect<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer);
    public static bool SequenceEqual<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer);
    public static Dictionary<TKey, TSource> ToDictionary<TSource, TKey>(this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);
    public static ILookup<TKey, TSource> ToLookup<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer);
    public static IEnumerable<TSource> Union<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer);
    //...
}

The Queryable class has a similar set of methods.

2. Dictionary and collection classes

public class Dictionary<TKey, TValue> : IDictionary<TKey, TValue>, ICollection<KeyValuePair<TKey, TValue>>, 
    IEnumerable<KeyValuePair<TKey, TValue>>, IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback
{
    public Dictionary();
    public Dictionary(IDictionary<TKey, TValue> dictionary);
    public Dictionary(IEqualityComparer<TKey> comparer);
    public Dictionary(int capacity);
    public Dictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer);
    public Dictionary(int capacity, IEqualityComparer<TKey> comparer);
    //...
}

public class HashSet<T> : ISerializable, IDeserializationCallback, ISet<T>, ICollection<T>, IEnumerable<T>, IEnumerable
{
    public HashSet();
    public HashSet(IEnumerable<T> collection);
    public HashSet(IEqualityComparer<T> comparer);
    public HashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer);
    //...
}

The constructors of both Dictionary<TKey, TValue> and HashSet<T> accept an IEqualityComparer<T>.

Besides these two, ConcurrentDictionary<TKey, TValue>, KeyedCollection<TKey, TItem> (an abstract class), SynchronizedKeyedCollection<K, T> and others also take an IEqualityComparer<T> as a constructor parameter.
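
As a small illustration of these constructor overloads, the sketch below passes one of the framework's ready-made comparers, StringComparer.OrdinalIgnoreCase (which implements IEqualityComparer<string>), to three of the collection types mentioned. The class name BuiltInComparerDemo and its method names are made up for this example and do not come from the original article.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical demo class: shows the comparer-taking constructors in use.
public static class BuiltInComparerDemo
{
    // Dictionary whose string keys are compared without regard to case.
    public static Dictionary<string, int> CreateCaseInsensitiveDictionary()
    {
        return new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    }

    // HashSet that treats "a" and "A" as the same element.
    public static HashSet<string> CreateCaseInsensitiveSet(IEnumerable<string> items)
    {
        return new HashSet<string>(items, StringComparer.OrdinalIgnoreCase);
    }

    // ConcurrentDictionary accepts the same kind of comparer.
    public static ConcurrentDictionary<string, int> CreateCaseInsensitiveConcurrentDictionary()
    {
        return new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    }
}
```

This works without any extra class only because a suitable comparer for strings already ships with the framework; for your own types there is usually nothing ready-made.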


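IEqualityComparer<T> mostly shows up in the more elaborate overloads, which exist to cover special cases, whereas the simpler overloads are the ones used day to day. So although IEqualityComparer<T> is widespread in .NET, we rarely run into it in everyday coding.
That said, once you do need it, it is quite a hassle. Most of the time you have to create a new class that implements IEqualityComparer<T> and then new up an instance, when all you really want is to compare on some property (such as ID). Creating that class adds not only code but also complexity: you have to decide where the class should live, what to call it, and so on.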
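
To make the overhead concrete, here is a minimal sketch of the hand-written approach described above. The Person type (with ID and Name properties, as assumed by the article's later snippets), the PersonIdComparer class and the HandWrittenComparerDemo helper are all illustrative names, not code from the original article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model type with the ID and Name properties the article's snippets assume.
public class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// A whole class whose only job is to compare two Person objects by ID.
public class PersonIdComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x == null || y == null) return false;  // exactly one is null
        return x.ID == y.ID;
    }

    public int GetHashCode(Person obj)
    {
        // Must be derived from the same property that Equals uses.
        return obj.ID.GetHashCode();
    }
}

public static class HandWrittenComparerDemo
{
    // Counts the people that remain after removing duplicates by ID.
    public static int CountDistinctById(IEnumerable<Person> people)
    {
        return people.Distinct(new PersonIdComparer()).Count();
    }
}
```

A second class of the same shape would be needed to compare by Name, a third for any other property, and so on.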

So we would like a simple way to create an IEqualityComparer<T> instance directly. 《c# 扩展方法 奇思妙用 基础篇 八:Distinct 扩展》 introduced a simple, practical class, CommonEqualityComparer<T, V>, which we can reuse here to reach that goal.

CommonEqualityComparer<T, V>

using System;
using System.Collections.Generic;

// Compares two T values by a key of type V extracted from each of them.
public class CommonEqualityComparer<T, V> : IEqualityComparer<T>
{
    private readonly Func<T, V> keySelector;
    private readonly IEqualityComparer<V> comparer;

    public CommonEqualityComparer(Func<T, V> keySelector, IEqualityComparer<V> comparer)
    {
        this.keySelector = keySelector;
        this.comparer = comparer;
    }

    // Falls back to the default equality comparer for the key type.
    public CommonEqualityComparer(Func<T, V> keySelector)
        : this(keySelector, EqualityComparer<V>.Default)
    {  }

    // Two items are equal when their keys are equal.
    public bool Equals(T x, T y)
    {
        return comparer.Equals(keySelector(x), keySelector(y));
    }

    // The hash code comes from the same key, so it stays consistent with Equals.
    public int GetHashCode(T obj)
    {
        return comparer.GetHashCode(keySelector(obj));
    }
}

With this class, an IEqualityComparer<T> instance can be created easily from a lambda expression:

var dict = new Dictionary<Person, string>(new CommonEqualityComparer<Person, string>(p => p.Name));

List<Person> persons = null;
Person p1 = null;
//...
var ps = persons.Contains(p1, new CommonEqualityComparer<Person, int>(p => p.ID));

Looking at the code above, you will probably find new CommonEqualityComparer<Person, string>(p => p.Name) too verbose. We can improve on it with the following class:

public static class Equality<T>
{
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector)
    {
        return new CommonEqualityComparer<T, V>(keySelector);
    }
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector, IEqualityComparer<V> comparer)
    {
        return new CommonEqualityComparer<T, V>(keySelector, comparer);
    }
}

The calling code can then be simplified:

var dict = new Dictionary<Person, string>(Equality<Person>.CreateComparer(p => p.Name));
var ps = persons.Contains(p1, Equality<Person>.CreateComparer(p => p.ID));

Leaving the class and method names aside, Equality<Person>.CreateComparer(p => p.ID) is about as concise as it gets (if you can trim it further, do let me know).

In fact, now that we have the Equality<T> class, we can wrap CommonEqualityComparer<T, V> up and hide it.

The Equality<T> class

public static class Equality<T>
{
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector)
    {
        return new CommonEqualityComparer<V>(keySelector);
    }
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector, IEqualityComparer<V> comparer)
    {
        return new CommonEqualityComparer<V>(keySelector, comparer);
    }

    class CommonEqualityComparer<V> : IEqualityComparer<T>
    {
        private Func<T, V> keySelector;
        private IEqualityComparer<V> comparer;

        public CommonEqualityComparer(Func<T, V> keySelector, IEqualityComparer<V> comparer)
        {
            this.keySelector = keySelector;
            this.comparer = comparer;
        }
        public CommonEqualityComparer(Func<T, V> keySelector)
            : this(keySelector, EqualityComparer<V>.Default)
        { }

        public bool Equals(T x, T y)
        {
            return comparer.Equals(keySelector(x), keySelector(y));
        }
        public int GetHashCode(T obj)
        {
            return comparer.GetHashCode(keySelector(obj));
        }
    }
}

CommonEqualityComparer<T, V> has become CommonEqualityComparer<V>, a nested class of Equality<T> that is not visible from outside, which makes the API simpler to use.

The Distinct extension methods from 《c# 扩展方法 奇思妙用 基础篇 八:Distinct 扩展》 also become simpler to write:

public static class DistinctExtensions
{
    public static IEnumerable<T> Distinct<T, V>(this IEnumerable<T> source, Func<T, V> keySelector)
    {
        return source.Distinct(Equality<T>.CreateComparer(keySelector));
    }
    public static IEnumerable<T> Distinct<T, V>(this IEnumerable<T> source, Func<T, V> keySelector, IEqualityComparer<V> comparer)
    {
        return source.Distinct(Equality<T>.CreateComparer(keySelector, comparer));
    }
}

Besides Distinct, many other Linq methods take an IEqualityComparer<T>. Writing an extension for each of them is not necessarily a good approach; using Equality<T>.CreateComparer is the wiser choice.
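
As a rough sketch of that point, one comparer created through Equality<T>.CreateComparer can be passed to several comparer-taking members without writing a separate extension for each. To keep the sketch compilable on its own it includes a compact copy of Equality<T> (with the nested class renamed KeyComparer<V>); the Person type, the sample data and the ComparerReuseDemo class are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model type with the ID and Name properties the article's snippets assume.
public class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Compact copy of the article's Equality<T>, repeated so this sketch compiles alone.
public static class Equality<T>
{
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector)
    {
        return new KeyComparer<V>(keySelector, EqualityComparer<V>.Default);
    }

    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector, IEqualityComparer<V> comparer)
    {
        return new KeyComparer<V>(keySelector, comparer);
    }

    private sealed class KeyComparer<V> : IEqualityComparer<T>
    {
        private readonly Func<T, V> keySelector;
        private readonly IEqualityComparer<V> comparer;

        public KeyComparer(Func<T, V> keySelector, IEqualityComparer<V> comparer)
        {
            this.keySelector = keySelector;
            this.comparer = comparer;
        }

        public bool Equals(T x, T y) { return comparer.Equals(keySelector(x), keySelector(y)); }
        public int GetHashCode(T obj) { return comparer.GetHashCode(keySelector(obj)); }
    }
}

public static class ComparerReuseDemo
{
    // Sample data invented for the example.
    public static readonly List<Person> First = new List<Person>
    {
        new Person { ID = 1, Name = "Ann" },
        new Person { ID = 2, Name = "Bob" }
    };
    public static readonly List<Person> Second = new List<Person>
    {
        new Person { ID = 2, Name = "bob" },
        new Person { ID = 3, Name = "Cy" }
    };

    // People in First whose ID does not occur in Second.
    public static List<Person> OnlyInFirstById()
    {
        return First.Except(Second, Equality<Person>.CreateComparer(p => p.ID)).ToList();
    }

    // People in First whose name also occurs in Second, ignoring case.
    public static List<Person> InBothByName()
    {
        var byName = Equality<Person>.CreateComparer(p => p.Name, StringComparer.OrdinalIgnoreCase);
        return First.Intersect(Second, byName).ToList();
    }

    // The same kind of comparer also fits a HashSet<T> constructor.
    public static HashSet<Person> MergedById()
    {
        return new HashSet<Person>(First.Concat(Second), Equality<Person>.CreateComparer(p => p.ID));
    }
}
```

The second CreateComparer overload is what lets a key comparer such as StringComparer.OrdinalIgnoreCase be plugged in when the default equality of the key type is not what you want.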

Summary

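.NET often uses IEqualityComparer<T> as a parameter of certain overloads.
These overloads are not used that often day to day, but when they are, you usually have to create a new class implementing IEqualityComparer<T>, which is tedious.
This article creates the generic Equality<T> class, which, together with a lambda expression, creates an IEqualityComparer<T> instance quickly.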