Clojure STM Notes, Part 1

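Before the New Year I had some time to look into Clojure. What interested me were its Lisp nature and its approach to concurrency. For the latter, Lao Nie recommended a rather good article, "Software Transactional Memory". It is a good entry point: a way to learn Clojure and, at the same time, a chance to revisit Chapter 12, "Concurrency", of "Programming Language Pragmatics", without sticking too closely to the article itself. The article is long, so I am splitting these notes into three parts.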

The Pain of Concurrency

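Since the article is about solutions to concurrency, it opens with a review of why concurrency hurts and a summary of the existing approaches: remember the bitterness to savor the sweetness, and when drinking the water, don't forget who dug the pit.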

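Concurrency brings many challenges that single-threaded programming does not have. The root cause is that the execution order of the code is not fixed, so problems are hard to reproduce and hard to debug, and testing takes much more effort. The author divides the solutions into two broad categories: one partitions jobs in advance so that they never conflict, the other coordinates the execution of multiple jobs. The latter is clearly the more common scenario, and there are four typical models for it: locks, actors, transactional memory and futures. The article covers only the first three; Clojure also supports futures, so I add them here.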

The author then gives a brief pros-and-cons analysis of locks, actors and transactional memory, summarized in the mind map below:

future

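First, futures, which the article does not mention. A future hands a computation off to another thread, and the corresponding Clojure primitive is simply future. Here is some sample code: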

user=> (def long-calculation (future (apply + (range 1e8))))
#'user/long-calculation
user=> (type long-calculation)
clojure.core$future_call$reify__6110
user=> @long-calculation
4999999950000000
user=> (deref long-calculation)
4999999950000000

What the future macro does is described clearly in the docstring stored in its metadata:

user=> (source future)
(defmacro future
  "Takes a body of expressions and yields a future object that will
  invoke the body in another thread, and will cache the result and
  return it on all subsequent calls to deref/@. If the computation has
  not yet finished, calls to deref/@ will block, unless the variant of
  deref with timeout is used. See also - realized?."
  {:added "1.1"}
  [& body] `(future-call (^{:once true} fn* [] ~@body)))
nil

The @ used above is reader shorthand for deref: @x is read as (deref x).
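The docstring above also mentions the timeout variant of deref and realized?. Below is a minimal sketch of both; the 500 ms sleep, the 50 ms timeout and the :still-running fallback are arbitrary values chosen for illustration, and the early checks assume the machine is not so overloaded that the 500 ms elapse before they run.

```clojure
;; A deliberately slow computation: sleeps 500 ms, then returns 42.
(def slow (future (Thread/sleep 500) 42))

;; Not finished yet: realized? is false, and a deref with a 50 ms timeout
;; returns the fallback value instead of blocking until the result is ready.
(def early-realized? (realized? slow))
(def early-value (deref slow 50 :still-running))

;; A plain @ blocks until the result is ready; later derefs return the cached value.
(def final-value @slow)
(def late-realized? (realized? slow))

(println early-realized? early-value final-value late-realized?)

;; Only needed when this runs as a standalone script, so the JVM can exit promptly.
(shutdown-agents)
```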

Promise

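Promises are often mentioned together with futures.

A promise's value is read by dereferencing it, optionally with a timeout, and the deref blocks until a value is available. A promise can be assigned only once. What sets a promise apart from a future is that it is not created with the code or function that will eventually produce its value: promise creates an empty container, which is filled in later with deliver.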

user=> (def a (promise))
#'user/a
user=> (def b (promise))
#'user/b
user=> (def c (promise))
#'user/c
user=> (future
  (deliver c (+ @a @b))
  (println "Delivery complete!"))
#<core$future_call$reify__6110@74aa513b: :pending>
user=> (deliver a 101)
#<core$promise$reify__6153@23293541: 101>
user=> (deliver b 102)
Delivery complete!
#<core$promise$reify__6153@582b0e7b: 102>
user=>

Here is its implementation:

user=> (source promise)
(defn promise
  "Alpha - subject to change.
  Returns a promise object that can be read with deref/@, and set,
  once only, with deliver. Calls to deref/@ prior to delivery will
  block, unless the variant of deref with timeout is used. All
  subsequent derefs will return the same delivered value without
  blocking. See also - realized?."
  {:added "1.1"
   :static true}
  []
  (let [d (java.util.concurrent.CountDownLatch. 1)
        v (atom d)]
    (reify
     clojure.lang.IDeref
       (deref [_] (.await d) @v)
     clojure.lang.IBlockingDeref
       (deref
        [_ timeout-ms timeout-val]
        (if (.await d timeout-ms java.util.concurrent.TimeUnit/MILLISECONDS)
          @v
          timeout-val))
     clojure.lang.IPending
      (isRealized [this]
       (zero? (.getCount d)))
     clojure.lang.IFn
     (invoke
      [this x]
      (when (and (pos? (.getCount d))
                 (compare-and-set! v d x))
        (.countDown d)
        this)))))
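The implementation above encodes three behaviors that are easy to check: a timed deref returns its fallback while the CountDownLatch count is still 1; the first deliver wins the compare-and-set!, counts the latch down and returns the promise itself; a second deliver fails the (pos? (.getCount d)) test, so the when returns nil and the value stays unchanged. A small sketch exercising them (p, before and the other names are illustrative):

```clojure
(def p (promise))

;; Nothing delivered yet: the latch count is 1, so the timed deref gives up after 100 ms.
(def before (deref p 100 :timed-out))
(def realized-before (realized? p))

;; First deliver: sets the value, counts the latch down, returns the promise itself.
(def first-result (deliver p :first))

;; Second deliver: the latch count is already 0, so nothing changes and nil comes back.
(def second-result (deliver p :second))

(println before realized-before (identical? first-result p) second-result @p (realized? p))
;; prints: :timed-out false true nil :first true
```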

C# Parallel FX

The C# Parallel FX example in "Programming Language Pragmatics" is rather dated, so here are two examples written in today's style:

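using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Console.WriteLine("Hello Parallel Program.") is handed off to another thread;
        // tasks run on the thread pool by default.
        Task hello = Task.Factory.StartNew(() => { Console.WriteLine("Hello Parallel Program."); });

        // Iterations may run in parallel, so the numbers come out in no fixed order.
        Parallel.For(0, 100, index =>
        {
            Console.WriteLine(index);
        });

        hello.Wait();

        // The next example uses Lazy<T>: the task is created and started only
        // when lazyTask.Value is first accessed.
        Lazy<Task<string>> lazyTask = new Lazy<Task<string>>(() =>
            Task.Factory.StartNew(() =>
            {
                Console.WriteLine("Task Body working......");
                return "Task Result";
            }));

        Console.WriteLine("Calling lazy variable");
        Console.WriteLine("Result from task: {0}", lazyTask.Value.Result);
    }
}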

Back to the mind map: let me briefly expand on its STM branch.

A Brief Introduction to STM

First things first: what is STM?

Wikipedia's definition:

In computer science, software transactional memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It is an alternative to lock-based synchronization. A transaction in this context is a piece of code that executes a series of reads and writes to shared memory. These reads and writes logically occur at a single instance in time; intermediate states are not visible to other (successful) transactions. The idea of providing hardware support for transactions originated in a 1986 paper by Tom Knight. The idea was popularized by Maurice Herlihy and J. Eliot B. Moss. In 1995 Nir Shavit and Dan Touitou extended this idea to software-only transactional memory (STM). STM has recently been the focus of intense research and support for practical implementations is growing.

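In terms of observable behavior, the memory modifications made in a transaction appear to take effect all at once, at the moment the transaction commits; before the commit, none of the changes are visible to other threads. A transaction executes against a consistent snapshot of memory. If transaction A modifies some data but transaction B commits a change to the same data first, A's code has to be run again. STM essentially brings the way traditional database transactions work into memory, so it does not guarantee durability: a software crash or hardware failure loses the data, and persistence still has to be provided by something like a relational database.

STM uses optimistic concurrency: each transaction runs on the assumption that there are no conflicting concurrent writes, and if that assumption turns out to be false, the transaction discards all the work it has done and starts over. Since a transaction may be retried, the operations inside it must be undoable; if any step cannot be undone, for example I/O performed along the way, the transaction as a whole can no longer be safely undone and redone. Clojure's answer is the combination of Refs and Agents, covered later.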
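A minimal sketch of Refs and Agents working together, assuming a simple bank-transfer scenario (account-a, account-b, audit-log and transfer! are illustrative names, not taken from the article). The alter calls inside dosync may be re-run when transactions conflict, which is harmless because they only compute new values. The log entry is a send to an agent, and Clojure holds sends issued inside a transaction until it commits (and discards them on retry), so a retried attempt never produces a duplicate entry.

```clojure
;; Two accounts whose balances are coordinated by transactions.
(def account-a (ref 100))
(def account-b (ref 100))

;; An agent collecting log lines: the non-undoable side effect lives here.
(def audit-log (agent []))

(defn transfer! [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)
    ;; Held by the STM until this transaction commits.
    (send audit-log conj (str "moved " amount))))

;; 50 concurrent transfers from a to b, and 50 from b to a.
(let [workers (doall (concat
                       (for [_ (range 50)] (future (transfer! account-a account-b 1)))
                       (for [_ (range 50)] (future (transfer! account-b account-a 1)))))]
  (doseq [w workers] @w))

;; Wait until every log action queued so far has run.
(await audit-log)

(println (+ @account-a @account-b))  ; the total is preserved: 200
(println (count @audit-log))          ; one entry per committed transfer: 100

;; Only needed when this runs as a standalone script, so the JVM can exit.
(shutdown-agents)
```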

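Transactional memory distinguishes committed values from in-transaction values. Inside a transaction, a variable's value is either its initial value or a value committed by another, successful transaction. A change made to a variable inside a transaction is visible only to that transaction; once the transaction commits successfully, the change becomes visible to code outside it.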
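To make the committed versus in-transaction distinction concrete, the sketch below holds a transaction open in another thread while the main thread reads the same ref from outside. The two promises (written and may-commit) are hypothetical hand-off points, added only to control the timing. Inside the transaction the ref already shows its new value; outside, reads keep returning the last committed value until the commit happens.

```clojure
(def r (ref :committed))

;; Hand-off points between the transaction thread and this thread.
(def written (promise))
(def may-commit (promise))

(def tx
  (future
    (dosync
      (ref-set r :in-transaction)
      ;; Inside the transaction, the new value is already visible.
      (deliver written @r)
      ;; Keep the transaction open until the main thread has read r.
      @may-commit
      @r)))

(def seen-inside @written)   ; value read inside the running transaction
(def seen-outside @r)        ; committed value, read outside any transaction
(deliver may-commit true)
(def tx-result @tx)          ; waits for the transaction to commit
(def after-commit @r)

(println seen-inside seen-outside after-commit)

;; Only needed when this runs as a standalone script, so the JVM can exit promptly.
(shutdown-agents)
```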

Advantages of STM

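More parallelism, instead of the serialization that pessimistic locking causes
Easier development: no need to reason about execution order or about acquiring and releasing locks; you only need to decide which variables must be read and written consistently
The mechanism guarantees that deadlocks and race conditions cannot occur on data managed by transactions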

Drawbacks of STM

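Work wasted by frequent retries
Overhead of keeping committed state and in-transaction state tracked separately
Missing tooling: tools are needed to see how many times a transaction retried and why (for debugging and tuning)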
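I am not aware of a built-in way in Clojure to count transaction retries, so the sketch below uses a crude stand-in for the missing tooling: an atom incremented at the start of every transaction attempt. Because the body of dosync re-runs on every retry, the atom counts attempts rather than commits; the swap! is itself a non-undoable side effect, which is exactly why it exposes the retries. The Thread/sleep only widens the window for conflicts, and counter, attempts and slow-increment! are hypothetical names.

```clojure
(def counter (ref 0))

;; Hypothetical instrumentation: counts transaction attempts, including retries.
(def attempts (atom 0))

(defn slow-increment! []
  (dosync
    (swap! attempts inc)        ; runs once per attempt, not once per commit
    (let [v @counter]
      (Thread/sleep 5)          ; make concurrent transactions more likely to conflict
      (ref-set counter (inc v)))))

;; Ten concurrent increments of the same ref.
(let [workers (doall (for [_ (range 10)] (future (slow-increment!))))]
  (doseq [w workers] @w))

;; counter is always 10; attempts is at least 10, and the excess is wasted work.
(println "final value:" @counter "attempts:" @attempts)

;; Only needed when this runs as a standalone script, so the JVM can exit promptly.
(shutdown-agents)
```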

Persistent Data Structures ?!

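Transactional memory works best in languages that draw a sharp line between mutable and immutable variables, where variables can be modified only inside transactions. Without support from the language, developers are on their own and must ensure by themselves that variables are modified only inside transactions. In Clojure, the infrastructure for immutable data is implemented with persistent data structures. So how do these two concepts relate?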

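One way to keep data safe from concurrent modification is to make it immutable: a value that never changes needs no protection. A common pattern in programs is to create new data from existing data, for example by adding an element to the head of a list. Persistent data structures provide the structural foundation for this: they keep versions of the data, and a new version shares memory with the existing one, which saves both memory and time. Here is Wikipedia's explanation of persistent data structures: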

In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. (A persistent data structure is not a data structure committed to persistent storage, such as a disk; this is a different and unrelated sense of the word "persistent.")
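The list example above can be checked directly at the REPL. In this minimal sketch (old-list and new-list are illustrative names), conj on a Clojure list adds the element at the head and returns a new list whose tail is the old list object itself: nothing is copied, and the old version stays intact.

```clojure
(def old-list '(2 3 4))
(def new-list (conj old-list 1))   ; conj on a list adds at the head

(println new-list)                              ; (1 2 3 4)
(println old-list)                              ; (2 3 4), unchanged
(println (identical? (rest new-list) old-list)) ; true: the tail is shared, not copied
```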

See the mind map below:

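If only the latest version can be updated while any earlier version can still be queried, the structure is called partially persistent. If every version can be both queried and updated, it is called fully persistent. In purely functional programming languages, all data structures are fully persistent.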
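Clojure's persistent collections let you derive a new version from any existing version, not only the latest one, which matches the fully persistent case described above. A small sketch with vectors (v1, v2 and v1-branch are illustrative names): after v2 has been built from v1, v1 can still be "updated" into a separate branch, and all three versions remain valid.

```clojure
(def v1 [1 2 3])
(def v2 (conj v1 4))           ; a new version built from the latest one
(def v1-branch (assoc v1 0 :x)) ; a new version built from an older one, after v2 exists

(println v1)         ; [1 2 3]
(println v2)         ; [1 2 3 4]
(println v1-branch)  ; [:x 2 3]
```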

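STM can be implemented in many ways. What sets Clojure's implementation apart is that "data coordinated by transactions cannot be modified outside a transaction." This restriction is enforced by the language itself, rather than relying on developers to follow the rule voluntarily.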
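The restriction is easy to observe. In the sketch below (balance and try-alter-outside are illustrative names), calling alter on a ref outside dosync throws an IllegalStateException and leaves the ref untouched, while the same call inside dosync succeeds.

```clojure
(def balance (ref 100))

;; alter outside any transaction: the ref refuses the change.
(defn try-alter-outside []
  (try
    (alter balance + 10)
    :modified
    (catch IllegalStateException e
      (.getMessage e))))

(def outside-result (try-alter-outside))

;; The same change inside dosync is accepted and committed.
(def inside-result (dosync (alter balance + 10)))

(println outside-result)  ; No transaction running
(println inside-result)   ; 110
(println @balance)        ; 110
```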

That's all for this part. The next one continues the reading of "Software Transactional Memory", focusing on Clojure's basic concurrency primitives.

Happy New Year!


Finally, one small picture to close with. Happy New Year!