
April 11, 2008
Reposted from http://community.bartdesmet.net/blogs/bart/archive/2008/03/30/a-functional-c-type-switch.aspx
A while ago I posted a functional way of exception handling, introducing functionality similar to exception filters (VB's When keyword). I admitted it was a crazy Sunday afternoon idea; maybe I should create a category entitled "Crazy Sundays", since this post very much belongs to that same category (update: I did create the category).
It all started a few weeks ago when I was explaining the way LINQ works, starting with the concept of extension methods and mentioning sexy words like monads, continuations, etc. After the session somebody came up to me and wondered what other ideas could be expressed in a way similar to LINQ's query operator chaining. I came up with a couple of uses, and this post makes one of those concrete.
The rules
Okay, let's get started. What's up? Cloning the switch functionality as is? Well, almost, but adding some other stuff to it. First, take a look at what we have today, baked into the language (section 15.7.2 of the C# Language Specification):
switch-statement:
    switch ( expression ) switch-block
switch-block:
    { [switch-sections] }
switch-sections:
    switch-section
    switch-sections switch-section
switch-section:
    switch-labels statement-list
switch-labels:
    switch-label
    switch-labels switch-label
switch-label:
    case constant-expression :
    default :
What's interesting about the above are the limitations imposed by the switch statement. First of all, there's the governing type defined by the switch expression. It must be a built-in integral type (including char), an enum type or string. Alternatively, it must be convertible to one of those integral types or to string through exactly one user-defined implicit conversion (the exact list of types is specified in 15.7.2). You can think of it as a big if-else if-else statement, although the implementation might be quite different. The compiler can emit jump tables (the IL switch instruction) and, for string switches with more than a handful of cases (more than 5, I believe), a Dictionary lookup. I won't go into further IL detail, nor dive into the subtleties of nullables (maybe another time).
Another thing to know is that the language has a "no fall-through" rule: the end of every switch section must be unreachable, so forgetting a break statement is a compile-time error rather than a silent bug. This eliminates a series of common problems in other curly-brace languages that will remain unnamed. In addition, one can reorder the switch sections at will without affecting the semantics of the switch. And last but not least, all of the (values of the) labels must be unique.
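For contrast, here is a small sketch of those rules in an ordinary C# switch statement. The ClassicSwitchDemo class and its Describe method are made up for illustration. Stacked labels let sections share one statement list. The only way to continue into another section is an explicit goto case.

```csharp
static class ClassicSwitchDemo
{
    // The governing type is string; every case label must be a distinct constant.
    public static string Describe(string day)
    {
        string prefix = "";
        switch (day)
        {
            // Stacked labels share one statement list; this is not fall-through.
            case "Saturday":
            case "Sunday":
                return "weekend";
            case "Friday":
                prefix = "almost weekend, ";
                // No implicit fall-through: moving on to another section
                // requires an explicit goto case (or goto default).
                goto case "Monday";
            case "Monday":
            case "Tuesday":
            case "Wednesday":
            case "Thursday":
                return prefix + "weekday";
            default:
                return "unknown";
        }
    }
}
```

Removing the goto case line turns the Friday section into a compile-time error, because the end of that section would then be reachable.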
All of this being said, we're going to break certain rules over the course of this fun activity. Beware of this, especially when you'd be tempted (I doubt) to embrace this idea.
Simple switch
We'll start by defining simple switch logic. How could we mimic a switch statement by means of method calls? Right, extension methods. Before we go there, it's quite important to pick a target for those extensions (the 'this' parameter). Sure, we could go for System.Object, but do we want to spoil such a fundamental type with (seemingly) additional methods? I'm tempted to say no (feel free to have another opinion), so we'll define a wrapper. Quick-n-dirty, here it is:
class Switch
{
    public Switch(object o)
    {
        Object = o;
    }

    public object Object { get; private set; }
}
Exercise: Adding a special class for switching logic has some drawbacks. What about a struct? Predict what would happen if you trade the class keyword for the struct keyword above. Will the fragment still compile? If not, what needs to change? If you're convinced about the alternative, try to carry the choice of a struct through the rest of this post.
In fact, we could even forget about extension methods at this point in time, since we own the Switch class. You can choose either way, but to keep myself honest on the goal of Crazy Sunday posts I'll stick with extension methods.
Exercise: Abandon my idea of using extension methods and go with instance methods from here on.
A first difference is apparent already: our switch will support values of any type. Next, we have to pick the syntax we're aiming for. It should go along these lines:
void Do(int age)
{
    new Switch(age)
        .Case(a => (int)a < 18, a =>
        {
            Console.WriteLine("Young");
        })
        .Case(18, a =>
        {
            Console.WriteLine("Middle-age");
        })
        .Default(a =>
        {
            Console.WriteLine("Old");
        });
}
There are a couple of remarkable things in here. Let's analyze case by case:
- We use System.Object as our base type, so the first case needs a cast. Further on, we'll do something about this.
- Again in the first case, notice we use a Func<object, bool> as the switching condition. This goes beyond the simple constant-based comparison of the typical switch.
- The second case is a typical one, just comparing a value for equality.
- Finally, we have the familiar default base case.
The whole thing 'returns' void, but to allow for chaining we obviously need to pass objects through between the Case 'labels'. We could go further and make the whole thing a valued expression, but let's not go there for now.
Another notable (but obvious) thing is the lack of break keywords. There's nothing to break after all, so we need to bake the semantics into the method calls. We'll stick with "no fall-through by default" but will provide an overload:
void Do(string name)
{
    new Switch(name)
        .Case(s => ((string)s).StartsWith("B"), s =>
        {
            Console.WriteLine(((string)s) + " starts with B.");
        }, true)
        .Case(s => ((string)s).StartsWith("Ba"), s =>
        {
            Console.WriteLine(((string)s) + " starts with Ba.");
        })
        .Default(s =>
        {
            Console.WriteLine(((string)s) + " starts with who knows what.");
        });
}
The true argument to the first Case call requests fall-through. Time for some implementation work. Here's a first set of (extension) methods:
static class SwitchExtensions
{
    public static Switch Case(this Switch s, object o, Action<object> a)
    {
        return Case(s, o, a, false);
    }

    public static Switch Case(this Switch s, object o, Action<object> a, bool fallThrough)
    {
        return Case(s, x => object.Equals(x, o), a, fallThrough);
    }

    public static Switch Case(this Switch s, Func<object, bool> c, Action<object> a)
    {
        return Case(s, c, a, false);
    }

    public static Switch Case(this Switch s, Func<object, bool> c, Action<object> a, bool fallThrough)
    {
        // null means an earlier case matched without fall-through: skip this case.
        if (s == null)
        {
            return null;
        }
        else if (c(s.Object))
        {
            a(s.Object);
            // Returning null "breaks"; returning s lets the next case run.
            return fallThrough ? s : null;
        }
        // No match: hand the switch on to the next case.
        return s;
    }
}
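Notice the way chaining works: returning null breaks the chain. Extension methods on a class make sense after all. An extension method can be called on a null reference, whereas calling an instance method on null throws a NullReferenceException. Still, (exercise) you can still (?) work around it (what about a Switch.Break thingy?). Let's bring Default onto the scene too: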
public static void Default(this Switch s, Action<object> a)
{
    // Runs unless an earlier case matched without falling through.
    if (s != null)
    {
        a(s.Object);
    }
}
This is where we close the loop by returning void, so that no subsequent Case or Default calls can be made (which really wouldn't make sense).
Exercise: What would it take to turn the whole thing into a valued expression?
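Put together, the non-generic pieces compile into a single unit. The sketch below trims the overloads to the predicate-based ones. It adds a hypothetical FallThroughDemo helper that records which branches run instead of printing them. It also surfaces a consequence of the design worth knowing: a fall-through case that isn't followed by another matching case flows on into Default.

```csharp
using System;
using System.Collections.Generic;

class Switch
{
    public Switch(object o)
    {
        Object = o;
    }

    public object Object { get; private set; }
}

static class SwitchExtensions
{
    public static Switch Case(this Switch s, Func<object, bool> c, Action<object> a)
    {
        return Case(s, c, a, false);
    }

    public static Switch Case(this Switch s, Func<object, bool> c, Action<object> a, bool fallThrough)
    {
        if (s == null)
        {
            return null;                    // an earlier case already "broke"
        }
        if (c(s.Object))
        {
            a(s.Object);
            return fallThrough ? s : null;  // null stops the rest of the chain
        }
        return s;                           // no match: try the next case
    }

    public static void Default(this Switch s, Action<object> a)
    {
        if (s != null)
        {
            a(s.Object);
        }
    }
}

// Hypothetical helper: the "starts with" sample, logging branches instead of printing.
static class FallThroughDemo
{
    public static List<string> Run(string name)
    {
        var log = new List<string>();
        new Switch(name)
            .Case(s => ((string)s).StartsWith("B"), s => log.Add("B"), true)
            .Case(s => ((string)s).StartsWith("Ba"), s => log.Add("Ba"))
            .Default(s => log.Add("default"));
        return log;
    }
}
```

Run("Bart") logs B and Ba, Run("Bob") logs B and default, and Run("Alice") logs only default.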
Generic switch
Remember the first case 'label' of our first sample? A reminder:
.Case(a => (int)a < 18, a =>
{
    Console.WriteLine("Young");
})
The cast is ugly, and this became even more apparent in the second sample, where we had to cast to string multiple times. Not only is this inefficient, it's also a bummer for IntelliSense. Let's fix this by introducing a generic Switch<T> class:
class Switch<T>
{
    public Switch(T o)
    {
        Object = o;
    }

    public T Object { get; private set; }
}
The extensions are simple once more:
public static Switch<T> Case<T>(this Switch<T> s, T t, Action<T> a)
{
    return Case(s, t, a, false);
}

public static Switch<T> Case<T>(this Switch<T> s, T t, Action<T> a, bool fallThrough)
{
    return Case(s, x => object.Equals(x, t), a, fallThrough);
}

public static Switch<T> Case<T>(this Switch<T> s, Func<T, bool> c, Action<T> a)
{
    return Case(s, c, a, false);
}

public static Switch<T> Case<T>(this Switch<T> s, Func<T, bool> c, Action<T> a, bool fallThrough)
{
    if (s == null)
    {
        return null;
    }
    else if (c(s.Object))
    {
        a(s.Object);
        return fallThrough ? s : null;
    }
    return s;
}

public static void Default<T>(this Switch<T> s, Action<T> a)
{
    if (s != null)
    {
        a(s.Object);
    }
}
This allows us to write our previous samples more concisely:
void Do(string name)
{
    new Switch<string>(name)
        .Case(s => s.StartsWith("B"), s =>
        {
            Console.WriteLine(s + " starts with B.");
        }, true)
        .Case(s => s.StartsWith("Ba"), s =>
        {
            Console.WriteLine(s + " starts with Ba.");
        })
        .Default(s =>
        {
            Console.WriteLine(s + " starts with who knows what.");
        });
}
Much cleaner.
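The same exercise works for the generic version. Below is a compilable sketch with every extension method declared generic in T. It adds a hypothetical AgeDemo.Classify helper that reruns the first (age) sample without the (int) cast and returns the label rather than printing it.

```csharp
using System;

class Switch<T>
{
    public Switch(T o)
    {
        Object = o;
    }

    public T Object { get; private set; }
}

static class GenericSwitchExtensions
{
    public static Switch<T> Case<T>(this Switch<T> s, T t, Action<T> a)
    {
        return Case(s, t, a, false);
    }

    public static Switch<T> Case<T>(this Switch<T> s, T t, Action<T> a, bool fallThrough)
    {
        return Case(s, x => object.Equals(x, t), a, fallThrough);
    }

    public static Switch<T> Case<T>(this Switch<T> s, Func<T, bool> c, Action<T> a)
    {
        return Case(s, c, a, false);
    }

    public static Switch<T> Case<T>(this Switch<T> s, Func<T, bool> c, Action<T> a, bool fallThrough)
    {
        if (s == null)
        {
            return null;
        }
        if (c(s.Object))
        {
            a(s.Object);
            return fallThrough ? s : null;
        }
        return s;
    }

    public static void Default<T>(this Switch<T> s, Action<T> a)
    {
        if (s != null)
        {
            a(s.Object);
        }
    }
}

// Hypothetical helper: the first sample, now typed as int, so no cast is needed.
static class AgeDemo
{
    public static string Classify(int age)
    {
        string result = null;
        new Switch<int>(age)
            .Case(a => a < 18, a => result = "Young")
            .Case(18, a => result = "Middle-age")
            .Default(a => result = "Old");
        return result;
    }
}
```

Overload resolution picks the predicate overload for the lambda and the value overload for 18. In both cases T is already fixed to int by the receiver.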
Type switch
Crazy or not, there's almost always something useful in it. What about capturing the following pattern?
void Do(Control c)
{
    if (c is Label)
    {
        Label l = (Label)c;
        // ...
    }
    else if (c is Button)
    {
        Button b = (Button)c;
        // ...
    }
    else
    {
        // ...
    }
}
This is a common pattern when dealing with extensions to UI code that need to process all sorts of controls, or when writing parsers as with System.Linq.Expressions, where you have to switch on the type of the expression. Unfortunately, the code above isn't the most efficient: each branch does a type check followed by a cast, which checks the type a second time. Using the as keyword is better (even FxCop will tell you so):
void Do(Control c)
{
    Label l;
    Button b;
    if ((l = c as Label) != null)
    {
        // ...
    }
    else if ((b = c as Button) != null)
    {
        // ...
    }
    else
    {
        // ...
    }
}
But it soon starts to get uglier. I'm not claiming to improve readability or efficiency in this post; suffice it to say I'm capturing a pattern. Enter our type switch. We'd like to be able to rewrite the code above as:
void Do(Control c)
{
    new Switch(c)
        .Case<Label>(l =>
        {
            // ...
        })
        .Case<Button>(b =>
        {
            // ...
        })
        .Default(cc =>
        {
            // ...
        });
}
First of all, notice we piggyback on the non-generic switch. Every Case 'label' already carries type information and can only be entered if the switch expression is of the specified type. Therefore each label's action receives the original expression cast to the specified type. E.g. when typing b. in the second label, you'll see the IntelliSense list for a Button variable. The only drawback is the Default block, where cc won't be of a more specific type. Obviously you could make it Default<T>, passing in Control in the sample above.
Exercise: Think about the reason not to use the generic Switch<T> in here (tip: see the implementation below).
On to the implementation. Almost trivial again:
public static Switch Case<T>(this Switch s, Action<T> a) where T : class
{
    return Case<T>(s, o => true, a, false);
}

public static Switch Case<T>(this Switch s, Action<T> a, bool fallThrough) where T : class
{
    return Case<T>(s, o => true, a, fallThrough);
}

public static Switch Case<T>(this Switch s, Func<T, bool> c, Action<T> a) where T : class
{
    return Case<T>(s, c, a, false);
}

public static Switch Case<T>(this Switch s, Func<T, bool> c, Action<T> a, bool fallThrough) where T : class
{
    if (s == null)
    {
        return null;
    }
    else
    {
        // as yields null unless the value is a T (or derives from T).
        T t = s.Object as T;
        if (t != null)
        {
            if (c(t))
            {
                a(t);
                return fallThrough ? s : null;
            }
        }
    }
    return s;
}
Default has been specified already, although you could have a Default<T> as well (as outlined previously).
Exercise: Why the generic constraint in the code above? Any way around it (without using plain casts and exception handling obviously...)? What about nullables?
Ultimately the same chaining is made possible with the above, but this time by switching on types. Notice fall-through is still relevant, not just because we have all sorts of conditions through Func<T, bool>, but also because of the type hierarchy we're dealing with: a case for a base type also matches every derived type. That is (one of the rules we're breaking): order matters.
To show you the above works like a charm:
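Here is a minimal sketch, using a System.Linq.Expressions tree as the subject (in line with the NodeType remark that follows). It repeats the non-generic Switch class and the type-switch extensions, trimmed to the overloads in use. The ExpressionDemo.Describe method is hypothetical and describes the root node of a tree. Note how Default still receives a plain object and needs a cast, the drawback mentioned earlier.

```csharp
using System;
using System.Linq.Expressions;

class Switch
{
    public Switch(object o)
    {
        Object = o;
    }

    public object Object { get; private set; }
}

static class TypeSwitchExtensions
{
    public static Switch Case<T>(this Switch s, Action<T> a) where T : class
    {
        return Case<T>(s, o => true, a, false);
    }

    public static Switch Case<T>(this Switch s, Func<T, bool> c, Action<T> a, bool fallThrough) where T : class
    {
        if (s == null)
        {
            return null;
        }
        T t = s.Object as T;   // null when the value is not a T
        if (t != null && c(t))
        {
            a(t);
            return fallThrough ? s : null;
        }
        return s;
    }

    public static void Default(this Switch s, Action<object> a)
    {
        if (s != null)
        {
            a(s.Object);
        }
    }
}

// Hypothetical helper: describes the root node of an expression tree.
static class ExpressionDemo
{
    public static string Describe(Expression e)
    {
        string result = null;
        new Switch(e)
            .Case<BinaryExpression>(b => result = "binary " + b.NodeType)
            .Case<ConstantExpression>(c => result = "constant " + c.Value)
            .Case<ParameterExpression>(p => result = "parameter " + p.Name)
            // Default only sees an object, so a cast is needed again.
            .Default(o => result = "other " + ((Expression)o).NodeType);
        return result;
    }
}
```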
Yes, there are ways around this with a classic switch on the Expression.NodeType enum value, but sometimes you want more, and other (sealed) object hierarchies lack such infrastructure vehicles.
Valued switches
So, what would it take to make the switch valued, meaning it doesn't return void but any "projection" you want? In fact, this is much like functional languages, where we have if-expressions (instead of statements), and much like the ternary operator in curly-brace languages (and the new If in VB 9.0). I won't nag too much about this, but such a construct is not uncommon. Take LISP, for example, with:
(cond (e1 e1') (e2 e2') ... (en en'))
so that
(if e1 e2 e3) = (cond (e1 e2) ('T e3))
where if is redefined in terms of cond. If e1 evaluates to true, e2 is returned. Otherwise, the car of the second clause is evaluated (car is the old name for head, short for "Contents of the Address part of Register", a historical name). That car is 'T (i.e. true), so it always returns true, and e3 is returned.
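To make the cond semantics concrete, here is a rough C# rendering that requires C# 7 tuples. The Lisp class and its Cond and If methods are hypothetical helpers, not part of any library. Each clause pairs a test with a lazily evaluated value, and If is defined exactly as in the equation above. Unlike LISP, where a cond without a matching clause yields NIL, this sketch throws.

```csharp
using System;

// Hypothetical helpers illustrating LISP's cond and if in C#.
static class Lisp
{
    // Tests the clauses in order and evaluates only the value of the
    // first clause whose test holds.
    public static T Cond<T>(params (Func<bool> Test, Func<T> Value)[] clauses)
    {
        foreach (var clause in clauses)
        {
            if (clause.Test())
            {
                return clause.Value();
            }
        }
        throw new InvalidOperationException("No clause matched.");
    }

    // (if e1 e2 e3) = (cond (e1 e2) ('T e3))
    public static T If<T>(Func<bool> e1, Func<T> e2, Func<T> e3)
    {
        return Cond<T>((e1, e2), (() => true, e3));
    }
}
```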
In order to enable this, we'll need to create a new generic Switch class that takes not only a type parameter for the source but also one for the target type. This is our definition:
class Switch<T, R>
{
    public Switch(T o)
    {
        Object = o;
    }

    public T Object { get; private set; }
    public bool HasValue { get; private set; }
    public R Value { get; private set; }

    public void Set(R value)
    {
        Value = value;
        HasValue = true;
    }
}
It looks a bit like Nullable<R> with the HasValue and Value properties. Essentially, once a value has been assigned (through Set), HasValue flips to true, which indicates we've found a match. The semantics are that the first match in a Switch-expression wins, although one could easily adapt this. However, notice this is less efficient than an early return from a function, since we have to forward the result to the end of the method call chain that makes up the Switch-expression. Let's make it concrete with just three functions (it only gets easier, it seems):
public static Switch<T, R> Case<T, R>(this Switch<T, R> s, T t, Func<T, R> f)
{
    return Case<T, R>(s, x => object.Equals(x, t), f);
}

public static Switch<T, R> Case<T, R>(this Switch<T, R> s, Func<T, bool> c, Func<T, R> f)
{
    if (!s.HasValue && c(s.Object))
    {
        s.Set(f(s.Object));
    }
    return s;
}

public static R Default<T, R>(this Switch<T, R> s, Func<T, R> f)
{
    if (!s.HasValue)
    {
        s.Set(f(s.Object));
    }
    return s.Value;
}
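Assembled into one compilable unit, the valued switch can be checked against the "first match wins" semantics just described. The ValuedDemo class and its Sign method are hypothetical; they count how often the lambda predicates run. The && in Case short-circuits, so no predicate is evaluated once a value has been set. Every remaining call in the chain still executes, though, which is the efficiency cost mentioned above.

```csharp
using System;

class Switch<T, R>
{
    public Switch(T o)
    {
        Object = o;
    }

    public T Object { get; private set; }
    public bool HasValue { get; private set; }
    public R Value { get; private set; }

    public void Set(R value)
    {
        Value = value;
        HasValue = true;
    }
}

static class ValuedSwitchExtensions
{
    public static Switch<T, R> Case<T, R>(this Switch<T, R> s, T t, Func<T, R> f)
    {
        return Case<T, R>(s, x => object.Equals(x, t), f);
    }

    public static Switch<T, R> Case<T, R>(this Switch<T, R> s, Func<T, bool> c, Func<T, R> f)
    {
        // && short-circuits: once a value is set, c is no longer evaluated.
        if (!s.HasValue && c(s.Object))
        {
            s.Set(f(s.Object));
        }
        return s;
    }

    public static R Default<T, R>(this Switch<T, R> s, Func<T, R> f)
    {
        if (!s.HasValue)
        {
            s.Set(f(s.Object));
        }
        return s.Value;
    }
}

// Hypothetical helper that counts how often the lambda predicates run.
static class ValuedDemo
{
    public static int PredicateCalls;

    static bool Counted(bool result)
    {
        PredicateCalls++;
        return result;
    }

    public static string Sign(int n)
    {
        return new Switch<int, string>(n)
            .Case(x => Counted(x < 0), x => "negative")
            .Case(0, x => "zero")
            .Case(x => Counted(x > 0), x => "positive")
            .Default(x => "none");
    }
}
```

For -5 only the first predicate runs. For 7 both counted predicates run before "positive" is chosen.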
Actually this starts to look a little LINQ-familiar, with Func<T, bool> being a predicate (as in Where) and Func<T, R> being a projection (as in Select*). The idea is simple: a case evaluates the condition only if the switch doesn't have a final value yet. If the test (c) passes, the projection is carried out (f) and the value is set (Set). Default is unconditional and has to be supplied as the final 'projection', but it could well be trivial (especially when case labels are present for all cases that can occur, e.g. when switching on an enumeration or a fixed object hierarchy). Notice there's no type switch functionality (exercise). Here's a trivial sample of this switch at work:
var res =
    from x in typeof(string).GetMembers()
    select new Switch<MemberInfo, string>(x)
        .Case(m => m is MethodInfo, m => m.Name + " is a method")
        .Case(m => m is PropertyInfo, m => m.Name + " is a property")
        .Default(m => m.Name + " is something else");
foreach (var s in res)
    Console.WriteLine(s);
which prints one line per member of System.String, stating whether it is a method, a property or something else.
Conclusion
Crazy, but a lot of fun, and there's much room for follow-up. Just a few ideas: Expression<T>, Reflection.Emit. Anyway, enough for now. Have a nice week!
Posted @ 2008-04-11 12:17 by SZW

January 28, 2008
The singleton pattern is one of the best-known patterns in software engineering.
Essentially, a singleton is a class which only allows a single instance of itself
to be created, and usually gives simple access to that instance. Most commonly,
singletons don't allow any parameters to be specified when creating the instance -
as otherwise a second request for an instance but with a different parameter could
be problematic! (If the same instance should be accessed for all requests with the
same parameter, the factory pattern is more appropriate.) This article deals only with
the situation where no parameters are required. Typically a requirement of singletons
is that they are created lazily - i.e. that the instance isn't created until it is
first needed.
There are various different ways of implementing the singleton pattern in C#. I shall
present them here in reverse order of elegance, starting with the most commonly seen,
which is not thread-safe, and working up to a fully lazily-loaded, thread-safe, simple
and highly performant version. Note that in the code here, I omit the private
modifier, as it is the default for class members. In many other languages such as Java, there
is a different default, and private should be used.
All these implementations share four common characteristics, however:
- A single constructor, which is private and parameterless. This prevents other classes from instantiating it (which would be a violation of the pattern). Note that it also prevents subclassing - if a singleton can be subclassed once, it can be subclassed twice, and if each of those subclasses can create an instance, the pattern is violated. The factory pattern can be used if you need a single instance of a base type, but the exact type isn't known until runtime.
- The class is sealed. This is unnecessary, strictly speaking, due to the above point, but may help the JIT to optimise things more.
- A static variable which holds a reference to the single created instance, if any.
- A public static means of getting the reference to the single created instance, creating one if necessary.
Note that all of these implementations also use a public static property Instance
as the means of accessing the instance. In all cases, the property could easily be converted
to a method, with no impact on thread-safety or performance.
First version - not thread-safe
// Bad code! Do not use!
public sealed class Singleton
{
    static Singleton instance=null;

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            if (instance==null)
            {
                instance = new Singleton();
            }
            return instance;
        }
    }
}
As hinted at before, the above is not thread-safe. Two different threads could both
have evaluated the test if (instance==null) and found it to be true,
then both create instances, which violates the singleton pattern. Note that in fact
the instance may already have been created before the expression is evaluated, but
the memory model doesn't guarantee that the new value of instance will be seen by
other threads unless suitable memory barriers have been passed.
Second version - simple thread-safety
public sealed class Singleton
{
    static Singleton instance=null;
    static readonly object padlock = new object();

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            lock (padlock)
            {
                if (instance==null)
                {
                    instance = new Singleton();
                }
                return instance;
            }
        }
    }
}
This implementation is thread-safe. The thread takes out a lock on a shared
object, and then checks whether or not the instance has been created before creating the instance.
This takes care of the memory barrier issue (as locking makes sure that
all reads occur logically after the lock acquire, and unlocking makes sure that all writes occur
logically before the lock release) and ensures that only one thread will create an instance
(as only one thread can be in that part of the code at a time - by the time the second thread
enters it, the first thread will have created the instance, so the expression will evaluate to false).
Unfortunately, performance suffers as a lock is acquired every time the instance is requested.
Note that instead of locking on typeof(Singleton) as some versions of this
implementation do, I lock on the value of a static variable which is private to the class.
Locking on objects which other classes can access and lock on (such as the type) risks
performance issues and even deadlocks. This is a general style preference of mine - wherever
possible, only lock on objects specifically created for the purpose of locking, or which
document that they are to be locked on for specific purposes (e.g. for waiting/pulsing a queue).
Usually such objects should be private to the class they are used in. This helps to make
writing thread-safe applications significantly easier.
Third version - attempted thread-safety using double-check locking
// Bad code! Do not use!
public sealed class Singleton
{
    static Singleton instance=null;
    static readonly object padlock = new object();

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            if (instance==null)
            {
                lock (padlock)
                {
                    if (instance==null)
                    {
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }
    }
}
This implementation attempts to be thread-safe without the necessity of taking out a lock every time.
Unfortunately, there are four downsides to the pattern:
- It doesn't work in Java. This may seem an odd thing to comment on, but it's worth knowing if you ever need the singleton pattern in Java, and C# programmers may well also be Java programmers. The Java memory model doesn't ensure that the constructor completes before the reference to the new object is assigned to instance. The Java memory model underwent a reworking for version 1.5, but double-check locking is still broken after this without a volatile variable (as in C#).
- Without any memory barriers, it's broken in the ECMA CLI specification too. It's possible that under the .NET 2.0 memory model (which is stronger than the ECMA spec) it's safe, but I'd rather not rely on those stronger semantics, especially if there's any doubt as to the safety. Making the instance variable volatile can make it work, as would explicit memory barrier calls, although in the latter case even experts can't agree exactly which barriers are required. I tend to try to avoid situations where experts don't agree what's right and what's wrong!
- It's easy to get wrong. The pattern needs to be pretty much exactly as above - any significant changes are likely to impact either performance or correctness.
- It still doesn't perform as well as the later implementations.
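As noted in the list above, making instance volatile is one way to make double-check locking work. Here is a sketch of that variant of the third version. The caution above still applies, and the fourth and fifth versions sidestep the question entirely.

```csharp
public sealed class Singleton
{
    // volatile: reads of 'instance' have acquire semantics and writes have
    // release semantics, so a thread that sees a non-null reference also
    // sees the fully constructed object.
    static volatile Singleton instance;
    static readonly object padlock = new object();

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            if (instance == null)            // first check, without the lock
            {
                lock (padlock)
                {
                    if (instance == null)    // second check, under the lock
                    {
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }
    }
}
```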
Fourth version - not quite as lazy, but thread-safe without using locks
public sealed class Singleton
{
    static readonly Singleton instance=new Singleton();

    static Singleton()
    {
    }

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            return instance;
        }
    }
}
As you can see, this really is extremely simple - but why is it thread-safe and how lazy is it?
Well, static constructors in C# are specified to execute only when an instance of the class is
created or a static member is referenced, and to execute only once per AppDomain. Given that
this check for the type being newly constructed needs to be executed whatever else happens, it
will be faster than adding extra checking as in the previous examples. There are a couple of
wrinkles, however:
- It's not as lazy as the other implementations. In particular, if you have static members other than Instance, the first reference to those members will involve creating the instance. This is corrected in the next implementation.
- There are complications if one static constructor invokes another which invokes the first again. Look in the .NET specifications (currently section 9.5.3 of partition II) for more details about the exact nature of type initializers - they're unlikely to bite you, but it's worth being aware of the consequences of static constructors which refer to each other in a cycle.
- The laziness of type initializers is only guaranteed by .NET when the type isn't marked with a special flag called beforefieldinit. Unfortunately, the C# compiler (as provided in the .NET 1.1 runtime, at least) marks all types which don't have a static constructor (i.e. a block which looks like a constructor but is marked static) as beforefieldinit. I now have a discussion page with more details about this issue. Also note that it affects performance, as discussed near the bottom of this article.
One shortcut you can take with this implementation (and only this one) is to just make
instance a public static readonly variable, and get rid of the property entirely.
This makes the basic skeleton code absolutely tiny! Many people, however, prefer to have a
property in case further action is needed in future, and JIT inlining is likely to make
the performance identical. (Note that the static constructor itself is still required
if you require laziness.)
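Here is a sketch of that shortcut. It keeps the explicit static constructor so that the type isn't marked beforefieldinit.

```csharp
public sealed class Singleton
{
    // The public static readonly field replaces the Instance property.
    public static readonly Singleton Instance = new Singleton();

    // Still needed for laziness: without an explicit static constructor
    // the C# compiler marks the type beforefieldinit.
    static Singleton()
    {
    }

    Singleton()
    {
    }
}
```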
Fifth version - fully lazy instantiation
public sealed class Singleton
{
    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            return Nested.instance;
        }
    }

    class Nested
    {
        static Nested()
        {
        }

        internal static readonly Singleton instance = new Singleton();
    }
}
Here, instantiation is triggered by the first reference to the static member of the nested
class, which only occurs in Instance. This means the implementation is fully
lazy, but has all the performance benefits of the previous ones. Note that although nested
classes have access to the enclosing class's private members, the reverse is not true, hence
the need for instance to be internal here. That doesn't raise any other problems,
though, as the class itself is private. The code is a bit more complicated in order to make
the instantiation lazy, however.
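The sketch below makes the laziness difference concrete. It pairs the fourth and fifth versions, each with an unrelated static method. The class names, the OtherStaticMember methods and the Constructions counters are hypothetical additions for observation only. Both types have explicit static constructors, so their type initializers run exactly on first use. Calling the unrelated method on the fourth version therefore constructs the instance. The fifth version only constructs it when Instance is first used.

```csharp
// Hypothetical counters recording how many instances each version has created.
static class Constructions
{
    public static int Eager;
    public static int Lazy;
}

// Fourth version, plus an unrelated static method.
public sealed class EagerSingleton
{
    static readonly EagerSingleton instance = new EagerSingleton();

    static EagerSingleton()
    {
    }

    EagerSingleton()
    {
        Constructions.Eager++;
    }

    public static EagerSingleton Instance
    {
        get { return instance; }
    }

    public static void OtherStaticMember()
    {
    }
}

// Fifth version, plus the same unrelated static method.
public sealed class LazySingleton
{
    LazySingleton()
    {
        Constructions.Lazy++;
    }

    public static LazySingleton Instance
    {
        get { return Nested.instance; }
    }

    public static void OtherStaticMember()
    {
    }

    class Nested
    {
        static Nested()
        {
        }

        internal static readonly LazySingleton instance = new LazySingleton();
    }
}
```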
Performance vs laziness
In many cases, you won't actually require full laziness - unless your class initialization
does something particularly time-consuming, or has some side-effect elsewhere, it's probably
fine to leave out the explicit static constructor shown above. This can increase performance
as it allows the JIT compiler to make a single check (for instance at the start of a method)
to ensure that the type has been initialized, and then assume it from then on. If your
singleton instance is referenced within a relatively tight loop, this can make a (relatively)
significant performance difference. You should decide whether or not fully lazy instantiation
is required, and document this decision appropriately within the class. (See below for more on
performance, however.)
Exceptions
Sometimes, you need to do work in a singleton constructor which may throw an exception, but
might not be fatal to the whole application. Potentially, your application may be able to
fix the problem and want to try again. Using type initializers to construct the singleton
becomes problematic at this stage. Different runtimes handle this case differently,
but I don't know of any which do the desired thing (running the type initializer again), and
even if one did, your code would be broken on other runtimes. To avoid these problems, I'd
suggest using the second pattern listed on the page - just use a simple lock, and go through
the check each time, building the instance in the method/property if it hasn't already been
successfully built.
Thanks to Andriy Tereshchenko for raising this issue.
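Here is a sketch of that approach, based on the second version. The FailuresRemaining field is a hypothetical hook for simulating a constructor that fails once. Because instance is only assigned after the constructor returns successfully, a failed attempt leaves it null, and the next caller simply tries again.

```csharp
using System;

public sealed class Singleton
{
    static Singleton instance = null;
    static readonly object padlock = new object();

    // Hypothetical test hook: the number of constructor calls that should fail.
    public static int FailuresRemaining;

    Singleton()
    {
        if (FailuresRemaining > 0)
        {
            FailuresRemaining--;
            throw new InvalidOperationException("Simulated initialization failure.");
        }
    }

    public static Singleton Instance
    {
        get
        {
            lock (padlock)
            {
                // If an earlier attempt threw, instance is still null here,
                // so this call simply tries to build it again.
                if (instance == null)
                {
                    instance = new Singleton();
                }
                return instance;
            }
        }
    }
}
```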
A word on performance
A lot of the reason for this page stemmed from people trying to be clever, and thus coming
up with the double-checked locking algorithm. There is an attitude of locking being expensive
which is common and misguided. I've written a very quick benchmark
which just acquires singleton instances in a loop a billion times, trying different variants.
It's not terribly scientific, because in real life you may want to know how fast it is if each
iteration actually involved a call into a method fetching the singleton, etc. However, it does
show an important point. On my laptop, the slowest solution (by a factor of about 5) is the locking
one (solution 2). Is that important? Probably not, when you bear in mind that it still managed to
acquire the singleton a billion times in under 40 seconds. That means that if you're "only"
acquiring the singleton four hundred thousand times per second, the cost of the acquisition
is going to be 1% of the performance - so improving it isn't going to do a lot. Now, if you are
acquiring the singleton that often - isn't it likely you're using it within a loop? If you care
that much about improving the performance a little bit, why not declare a local variable outside the loop,
acquire the singleton once and then loop. Bingo, even the slowest implementation becomes easily
adequate.
I would be very interested to see a real world application where the difference between using
simple locking and using one of the faster solutions actually made a significant performance difference.
Conclusion (modified slightly on January 7th 2006)
There are various different ways of implementing the singleton pattern in C#.
A reader has written to me detailing a way he has encapsulated the synchronization aspect,
which I acknowledge may be useful in a few very particular situations
(specifically where you want very high performance, the ability to determine whether or not
the singleton has been created, and full laziness regardless of other static
members being called). I don't personally see that situation coming up often enough
to merit going further with it on this page, but please mail
me if you're in that situation.
My personal preference is for solution 4: the only time I would normally go away from it
is if I needed to be able to call other static methods without triggering initialization, or
if I needed to know whether or not the singleton has already been instantiated. I don't remember
the last time I was in that situation, assuming I even have. In that case, I'd probably go
for solution 2, which is still nice and easy to get right.
Solution 5 is elegant, but trickier than 2 or 4, and as I said above, the benefits it provides
seem to only be rarely useful.
(I wouldn't use solution 1 because it's broken, and I wouldn't use solution 3 because it has no
benefits over 5.)
Quoted from: http://www.yoda.arachsys.com/csharp/singleton.html
Posted @ 2008-01-28 11:35 by SZW