Programming Clojure – Concurrency - fxjwind

On Clojure concurrency, this author's write-up is quite systematic: http://www.blogjava.net/killme2008/archive/2010/07/archive/2010/07/14/326027.html

Concurrency is a fact of life and, increasingly, a fact of software.

Why do we need concurrency?
• Expensive computations may need to execute in parallel on multiple cores.
• Tasks that are blocked waiting for a resource can stand down and let other tasks run.
• User interfaces need to remain responsive while performing long-running tasks.
• Operations that are logically independent are easier to implement if the platform can recognize and exploit their independence.

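What is the problem with concurrency? Without doubt, it is state synchronization.
If every task runs on its own, as in pure FP, there is no problem at all.
But once tasks need to coordinate and synchronize, things get complicated.
To solve this, Clojure provides a powerful concurrency library and divides the state that needs synchronization into four categories, each handled by a different API.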

Clojure provides a powerful concurrency library, consisting of four APIs that enforce different concurrency models: refs, atoms, agents, and vars.
• Refs manage coordinated, synchronous changes to shared state.
• Atoms manage uncoordinated, synchronous changes to shared state.
• Agents manage asynchronous changes to shared state.
• Vars manage thread-local state.
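
As a quick orientation, the four APIs can be sketched side by side. This is only a minimal sketch: the names account, hits, event-log, and *user* are made up for illustration and do not come from the book.

```clojure
;; Hypothetical names, for illustration only.
(def account   (ref 100))          ; ref: coordinated, synchronous
(def hits      (atom 0))           ; atom: uncoordinated, synchronous
(def event-log (agent []))         ; agent: asynchronous
(def ^:dynamic *user* "nobody")    ; var: thread-local through binding

(dosync (alter account - 30))      ; refs change only inside a transaction
(swap! hits inc)                   ; atoms change immediately
(send event-log conj "withdrew 30") ; agents change later, on another thread
(await event-log)                  ; block until the sent action has run

(def user-inside-binding
  (binding [*user* "alice"]        ; visible only on this thread, in this scope
    *user*))
```

After this runs, @account is 70, @hits is 1, @event-log is ["withdrew 30"], and *user* is back to "nobody" outside the binding form.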

6.1 The Problem with Locks

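In traditional languages, state is mutable by default, so under concurrency every piece of state that might be written concurrently must be protected by a lock. Miss one and you have a serious problem. Locking is also far from simple: race conditions, deadlocks, and the like are all hard to reason about. This solution has no elegance at all. Is there a better one?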

Yes, there is. In Clojure, immutable state is the default. Most data is immutable. The small parts of the codebase that truly benefit from mutability are distinct and must explicitly select one or more concurrency APIs. Using these APIs, you can split your models into two layers:

• A functional model that has no mutable state. Most of your code will normally be in this layer, which is easier to read, easier to test, and easier to run concurrently.
• A mutable model for the parts of the application that you find more convenient to deal with using mutable state (despite its disadvantages).

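This is, of course, one of FP's core strengths: handling mutable state with care.
The benefit in Clojure is that all state is immutable by default, so you only need to pay attention to the small portion of state that truly needs to be mutable. (When state is mutable by default, you have to consider the full codebase, and a single omission anywhere can cause serious problems.) Making mutable state explicit and separate also makes it easier to manage.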

6.2 Refs and Software Transactional Memory

Most objects in Clojure are immutable.
When you really want mutable data, you create a mutable reference (ref) to an immutable object.
Clojure objects themselves are immutable; if you need mutable data, you create a mutable reference that can point to different immutable objects over time.

Refs support synchronous state changes, and a single transaction can change several pieces of state together.

Creating a ref

In a music player, a song itself is an immutable object, but the currently playing song is state that changes.
Create a ref named current-track:

(def current-track (ref "Mars, the Bringer of War"))

To read the contents of the reference, call deref (or use the @ reader macro):

(deref current-track)
"Mars, the Bringer of War"

@current-track
"Mars, the Bringer of War"

Modifying a ref: ref-set

(ref-set reference new-value)

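Calling ref-set directly, outside a transaction, throws an error (IllegalStateException: No transaction running). This is a useful safeguard against accidental updates.
In Clojure the update is wrapped in a transaction, whereas most languages would require a lock; the difference comes down to how each is implemented.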

Because refs are mutable, you must protect their updates. In many languages, you would use a lock for this purpose.
In Clojure, you can use a transaction. Transactions are wrapped in a dosync:

(dosync & exprs)
(dosync (ref-set current-track "Venus, the Bringer of Peace")) 
"Venus, the Bringer of Peace"

The example above switches the ref to a new value; the song object itself did not change.
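
The safeguard can be seen directly by trying the same ref-set with and without dosync. A minimal sketch, where the names track and outcome are hypothetical:

```clojure
(def track (ref "Mars, the Bringer of War"))   ; hypothetical name

;; Outside a transaction, ref-set throws instead of silently mutating.
(def outcome
  (try
    (ref-set track "Venus, the Bringer of Peace")
    (catch IllegalStateException _
      :no-transaction-running)))

;; Inside dosync the same call succeeds and returns the new value.
(dosync (ref-set track "Venus, the Bringer of Peace"))
```

Here outcome ends up as :no-transaction-running, and the ref only changes once the call is wrapped in dosync.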

Transactional properties: ACI is guaranteed, D is not
Like database transactions, STM transactions guarantee some important properties:
• Updates are atomic.
• Updates are consistent.
• Updates are isolated.
Databases provide the additional guarantee that updates are durable.
Because Clojure’s transactions are in-memory transactions, Clojure does not guarantee that updates are durable.

A transaction can contain multiple statements:

(def current-track (ref "Venus, the Bringer of Peace"))
(def current-composer (ref "Holst"))

(dosync
  (ref-set current-track "Credo")
  (ref-set current-composer "Byrd"))
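
The point of putting both updates in one dosync is that no reader can observe the new track paired with the old composer. A small sketch of a coordinated writer and reader, assuming hypothetical helper names play and now-playing:

```clojure
(def track    (ref "Venus, the Bringer of Peace"))
(def composer (ref "Holst"))

;; Hypothetical helper: both refs change together or not at all.
(defn play [title by]
  (dosync
    (ref-set track title)
    (ref-set composer by)))

;; Hypothetical helper: both reads come from one consistent snapshot.
(defn now-playing []
  (dosync [@track @composer]))

(play "Credo" "Byrd")
(now-playing)   ; ["Credo" "Byrd"]
```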

Read-and-write, alter, commute

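ref-set simply overwrites the value, which is the simple case.
More common is read-and-write: a simple accumulator, for example, must know the current value before it can update it.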

(alter ref update-fn & args...)  ;ref = update-fn(ref, &args)

In a messenger application, adding a message:

(def messages (ref ()))

(defn add-message [msg]
  (dosync (alter messages conj msg)))

How STM Works: MVCC
Clojure’s STM uses a technique called Multiversion Concurrency Control (MVCC), which is also used in several major databases.
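The same mechanism is widely used in databases, for example CouchDB; see the post "Practical Clojure - 简介" (Introduction).
Clojure also relies on persistent data structures to keep MVCC space-efficient.
This is why Clojure can implement transactions and guarantee ACI so simply: every update becomes visible to the outside only at the moment the reference is switched.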

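Read-and-write must then deal with conflicts: what happens if another transaction modifies the value between the read and the write?

The current transaction is forced to retry, which preserves the ordering of operations within the transaction.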
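
Retries can be made visible by counting how often the transaction body runs. The sketch below is an illustration only: the names counter, attempts, and increment are hypothetical, and the swap! inside dosync is a deliberate side effect used purely to observe retries, not something to do in real code.

```clojure
(def counter  (ref 0))
(def attempts (atom 0))   ; how many times the body ran, retries included

(defn increment []
  (dosync
    (swap! attempts inc)  ; side effect, only here to make retries observable
    (alter counter inc)))

;; 20 threads, 100 increments each.
(let [workers (doall (for [_ (range 20)]
                       (future (dotimes [_ 100] (increment)))))]
  (doseq [w workers] @w))

@counter   ; 2000, no update is lost
```

@attempts is at least 2000, and larger whenever some transaction was retried; the exact number varies from run to run.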

What if you don’t care that another transaction altered a reference out from under you in the middle of your transaction?

If another transaction alters a ref that you are trying to commute, the STM will not restart your transaction. Instead, it will simply run your commute function again, out of order.

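If you can tolerate another transaction altering the reference in the middle of your transaction, use commute.
On a conflict, the whole transaction is not restarted; only the commute function is run again. This means the commute update must give a correct result no matter when it runs (otherwise you have a bug).
This lets the STM reorder updates as an optimization. The tradeoff buys more concurrency and better performance.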

(commute ref update-fn & args...)

Unless you have a specific need, do not use commute: alter is always logically correct, while misusing commute leads to bugs.
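
A counter is the classic case where commute is safe, because increments give the same final result in any order. A minimal sketch with hypothetical names hit-count and record-hit:

```clojure
(def hit-count (ref 0))

;; inc is commutative, so the order in which hits are applied does not matter.
(defn record-hit []
  (dosync (commute hit-count inc)))

(let [workers (doall (for [_ (range 10)]
                       (future (dotimes [_ 100] (record-hit)))))]
  (doseq [w workers] @w))

@hit-count   ; 1000
```

What makes commute risky is relying on the value it returns inside the transaction: the function may be re-run at commit time against a newer value, so the in-transaction result is not necessarily what gets committed. Code that needs the exact committed value (handing out unique ids, for instance) should use alter.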

Adding validation to refs: constraints

The following adds a validation function to the messages reference, guaranteeing that all messages have non-nil values for :sender and :text:

(def validate-message-list 
    (partial every? #(and (:sender %) (:text %))))
(def messages (ref () :validator validate-message-list))
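
To see the validator at work, the sketch below commits one valid message and then tries an invalid one. The names valid-messages?, inbox, and rejected? are hypothetical stand-ins so the example is self-contained.

```clojure
(def valid-messages?
  (partial every? #(and (:sender %) (:text %))))

(def inbox (ref () :validator valid-messages?))

;; A well-formed message passes validation and is committed.
(dosync (alter inbox conj {:sender "ann" :text "hello"}))

;; A malformed value fails validation; the transaction throws
;; IllegalStateException and the ref keeps its previous value.
(def rejected?
  (try
    (dosync (alter inbox conj "not a message"))
    false
    (catch IllegalStateException _ true)))
```

After this, rejected? is true and @inbox still holds only the one valid message.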

6.3 Use Atoms for Uncoordinated, Synchronous Updates

Atoms are a lighter-weight mechanism than refs.
Where multiple ref updates can be coordinated in a transaction, atoms allow updates of a single value, uncoordinated with anything else.

Atoms are lightweight refs and more efficient. They only allow updating a single piece of state,
so no transaction wrapper is needed, which reduces overhead.

Atoms do not participate in transactions and thus do not require a dosync. To set the value of an atom, simply call reset!

(def current-track (atom "Venus, the Bringer of Peace"))
(reset! current-track "Credo") ; the atom counterpart of ref-set

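Why do reset! and swap! both end in a "!"? By Clojure convention, a trailing ! marks a function that is not safe to call inside an STM transaction, because its effect is applied immediately and would be repeated if the transaction retried.

(swap! an-atom f & args) ; the atom counterpart of alter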
(def current-track (atom {:title "Credo" :composer "Byrd"}))
(swap! current-track assoc :title "Sancte Deus")
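
swap! stays correct under contention because it re-applies the function whenever the atom changed in the meantime. A small sketch, where the name visits is hypothetical:

```clojure
(def visits (atom 0))

;; 10 threads, 1000 increments each; swap! retries internally on conflict.
(let [workers (doall (for [_ (range 10)]
                       (future (dotimes [_ 1000] (swap! visits inc)))))]
  (doseq [w workers] @w))

@visits   ; 10000

;; compare-and-set! is the primitive underneath: it only writes
;; if the current value equals the expected one.
(def stale-write (compare-and-set! visits 0 -1))     ; expected value is stale
(def fresh-write (compare-and-set! visits 10000 0))  ; expected value matches
```

Because the function passed to sw! may run more than once, it should be free of side effects, just like the body of a transaction.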

6.4 Use Agents for Asynchronous Updates

Agents are for asynchronous updates.

(def counter (agent 0))

The function for updating an agent is send. The name is apt: asynchronous means you send the action and return immediately.

(send agent update-fn & args)
(send counter inc)

Notice that the call to send does not return the new value of the agent, returning instead the agent itself.

If you want to be sure that the agent has completed the actions you sent to it, you can call await or await-for:

(await & agents) 
(await-for timeout-millis & agents)
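
Putting send and await together, a minimal sketch (the name tally is hypothetical):

```clojure
(def tally (agent 0))

;; Each send returns immediately; the increments run later on a pool thread.
(dotimes [_ 5] (send tally inc))

;; Block until every action sent so far from this thread has completed.
(await tally)
@tally   ; 5

;; await-for gives up after the timeout; it returns a logical true
;; value if the actions finished in time and nil otherwise.
(def finished-in-time (await-for 1000 tally))
```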

Validating Agents and Handling Errors

Like a ref, an agent can carry a validation constraint:

(def counter (agent 0 :validator number?))

If the validator fails, an error is raised. But how are errors handled when everything is asynchronous?

(send counter (fn [_] "boo"))

Look at the agent and you find it has errors:

@counter
java.lang.Exception: Agent has errors

Use agent-errors to see the specific errors:

(agent-errors counter)
(#<IllegalStateException ...>)

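Finally, clear the errors with clear-agent-errors; after that, dereferencing the agent no longer throws. (Since Clojure 1.2, agent-errors and clear-agent-errors are deprecated in favor of agent-error and restart-agent.)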

(clear-agent-errors counter)
@counter
0
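
On Clojure 1.2 and later, roughly the same inspect-and-recover flow can be written with agent-error and restart-agent. This is a sketch under that assumption; the name checked is hypothetical, and the polling loop is just a simple way to wait for the asynchronous failure.

```clojure
(def checked (agent 0 :validator number?))

;; The action returns a string, which the validator rejects.
(send checked (fn [_] "boo"))

;; The failure happens on another thread, so poll until it is recorded.
(while (nil? (agent-error checked))
  (Thread/sleep 10))

(def failure (agent-error checked))   ; the exception that put the agent in a failed state

;; restart-agent clears the failure and sets a new valid state.
(restart-agent checked 0)
@checked   ; 0
```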

Including Agents in Transactions

Transactions should not have side effects, because Clojure may retry a transaction an arbitrary number of times.
However, sometimes you want a side effect when a transaction succeeds.
Agents provide a solution.
If you send an action to an agent from within a transaction, that action will be sent exactly once, if and only if the transaction succeeds.

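Because of conflicts, a transaction may be retried many times, so it must not contain side effects such as writing files or other I/O; otherwise they would be executed many times. Given how Clojure transactions are implemented, they cannot roll back such effects the way MySQL rolls back a transaction.
Agents are a good solution, because the agent action is sent exactly once, and only when the transaction succeeds.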

The side effects discussed here are usually I/O operations, which are slow, so send is not appropriate; use send-off instead.
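
A sketch of the pattern: the transaction updates a ref and hands the side effect to an agent with send-off. The names balance, audit-log, and withdraw are hypothetical, and the agent's vector stands in for a real file or socket.

```clojure
(def balance   (ref 100))
(def audit-log (agent []))   ; stands in for a file, socket, or other I/O target

(defn withdraw [amount]
  (dosync
    (let [new-balance (alter balance - amount)]
      ;; Held until the transaction commits, then dispatched exactly once,
      ;; no matter how many times the transaction body was retried.
      (send-off audit-log conj {:withdrew amount :balance new-balance})
      new-balance)))

(withdraw 30)
(await audit-log)   ; must be called outside the transaction
@audit-log          ; [{:withdrew 30, :balance 70}]
```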

For how agents are implemented, see this blog post: http://www.blogjava.net/killme2008/archive/2010/07/archive/2010/07/archive/2010/07/19/326540.html

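An agent is just an ordinary Java object that internally maintains a state and a queue.
Threads take actions from the queue and process them. These threads live in thread pools.
There are two thread pools:
a fixed-size pool (number of CPU cores plus 2), which handles actions sent with send;
an unbounded pool (limited only by memory), which handles actions sent with send-off.
Most importantly, these pools are shared by all agents, so sending a slow action with send can block updates to other agents.
The implementation is not complicated: a typical producer-consumer pattern.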

6.5 Managing Per-Thread State with Vars

See this blog post:
http://www.blogjava.net/killme2008/archive/2010/07/archive/2010/07/archive/2010/07/archive/2010/07/23/326976.html

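There are several kinds of binding:

root binding: defined with def and shared by all threads

local binding: defined with let; a static, lexical binding that takes effect only within its lexical scope

thread-local dynamic binding: defined with binding; a per-thread binding that is not limited to lexical scope

An example shows the difference between let and binding: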

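(def ^:dynamic foo 1) ; ^:dynamic is required for binding since Clojure 1.3
(defn print-foo [] (println foo))

(let [foo 2] (print-foo)) ; print-foo is outside the let's lexical scope, so let has no effect
1
(binding [foo 2] (print-foo)) ; same thread, so the binding takes effect
2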

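This is because let's binding is static: it does not change the value of the var foo, but shadows the outer foo with a lexically scoped foo.
binding, by contrast, stores a value in the thread's ThreadLocal storage, on top of the var's root binding, so it is visible anywhere on that thread and is not limited to lexical scope.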
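
That the binding lives on one thread only can be checked by reading the var from a freshly started thread. A minimal sketch with hypothetical names *greeting* and seen-in-new-thread; it uses a plain java.lang.Thread on purpose, since Clojure's future and agents carry the caller's bindings over to the worker thread.

```clojure
(def ^:dynamic *greeting* "root")

;; Reads *greeting* from a brand-new thread and hands the value back.
(defn seen-in-new-thread []
  (let [result (promise)
        t      (Thread. (fn [] (deliver result *greeting*)))]
    (.start t)
    (.join t)
    @result))

(def observed
  (binding [*greeting* "local"]
    [*greeting* (seen-in-new-thread)]))
;; observed is ["local" "root"]: the new thread sees only the root binding
```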

binding can be used to temporarily change a function's behavior.

For example, wrapping fib with memoize only within the current thread:

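user=> (defn ^:dynamic fib [n] ; ^:dynamic is required for binding since Clojure 1.3
         ;; Note: despite its name, this loop computes n! (factorial).
         (loop [n n r 1]
           (if (= n 1)
             r
             (recur (dec n) (* n r)))))
user=> (defn call-fibs [a b] ; assumed definition; the original post does not show call-fibs
         (+ (fib a) (fib b)))
user=> (binding [fib (memoize fib)]
         (call-fibs 9 10))
3991680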

Used occasionally, dynamic binding has great power. But it should not become your primary mechanism for extension or reuse. Functions that use dynamic bindings are not pure functions and can quickly lose the benefits of Clojure’s functional style.
Use it sparingly: such functions are not pure, and it adds complexity that runs against the spirit of FP.

Working with Java Callback APIs

Several Java APIs depend on callback event handlers.
XML parsers such as SAX depend on the user implementing a callback handler interface.

These callback handlers are written with mutable objects in mind. Also, they tend to be single-threaded.
In Clojure, the best way to meet such APIs halfway is to use dynamic bindings. This will involve mutable references that feel almost like variables, but because they are used in a single-threaded setting, they will not present any concurrency problems.

Clojure provides the set! special form for setting a thread-local dynamic binding:
(set! var-symbol new-value)

set! should be used rarely. In fact, the only place in the entire Clojure core that uses set! is the Clojure implementation of a SAX ContentHandler.

Clojure can modify a thread-local dynamic binding with set!, but set! should be used sparingly, essentially only for handling Java callbacks.
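
The callback pattern can be sketched without a real SAX parser: a handler function mutates a thread-local binding with set!, and the caller establishes that binding first. The names *element-count* and on-start-element are hypothetical stand-ins for a real ContentHandler.

```clojure
(def ^:dynamic *element-count* 0)

;; Stands in for a callback such as a SAX startElement handler.
(defn on-start-element []
  (set! *element-count* (inc *element-count*)))

;; set! works because binding has established a thread-local value.
(def counted
  (binding [*element-count* 0]
    (dotimes [_ 3] (on-start-element))
    *element-count*))

;; Without a thread-local binding, set! refuses to touch the root value.
(def root-attempt
  (try
    (on-start-element)
    :changed
    (catch IllegalStateException _ :no-thread-binding)))
```

counted is 3, root-attempt is :no-thread-binding, and the root value of *element-count* is still 0, so the mutation never leaks outside the binding form.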

posted on 2013-02-20 11:55 fxjwind
