The Joy of Clojure – Laziness(6.3)

Clojure is partially a lazy language.

Clojure is lazy in the way it handles its sequence types.

Laziness是FP重要的特征, pure FP应该是lazy language, 比如Haskell. 当然完全的lazy带来问题是, 无法保证程序的执行顺序.
而Clojure是在实用性和pure FP之间做妥协的, 所以他只是部分的支持lazy, 主要就表现在处理sequence的时候, 在其他大部分的时候仍然是eager(not lazy)

比如, 典型的例子, 函数参数,
(- 13 (+ 2 2))
对于eager的做法是, 先算出2+2=4, 然后将4传入
而lazy的做法是, 直接将(+ 2 2)传入, 至到减法时, 必须要算的时候, 才会算出4

 

既然Clojure只有在处理sequence时, 才是lazy的, 具体来看看clojure的lazy-seq

Understanding the lazy-seq recipe

先来个例子看看, 为什么需要lazy-seq?

(defn rec-step [[x & xs]] ;第一个item传入x,剩下的传入xs
  (if x
    [x (rec-step xs)]
    []))


(rec-step [1 2 3 4])
;=> [1 [2 [3 [4 []]]]]

(rec-step (range 200000))
;=> java.lang.StackOverflowError


会blow stack, 因为使用递归函数, 所以要解决这个问题, 要不使用tail optimization, 但这儿使用lazy-seq更合适一些

怎样使用lazy-seq, 有如下recipe...

lazy-seq recipe for applying laziness to your own functions:

1. Use the lazy-seq macro at the outermost level of your lazy sequence producing expression(s).
2. If you happen to be consuming another sequence during your operations, then use ‘rest’ instead of ‘next’. 因为next比rest多一次realize, 因为next要先判断是否为空
3. Prefer higher-order functions when processing sequences.
4. Don’t hold onto your head.

根据这个recipe, lazy version实现如下, 其实只是简单的加上lazy-seq封装(first, rest其实等同于上面的写法)

(defn lz-rec-step [s]
  (lazy-seq
    (if (seq s)
    [(first s) (lz-rec-step (rest s))]
    [])))


更简单的形式如下,

(defn simple-range [i limit]
  (lazy-seq
    (when (< i limit)
    (cons i (simple-range (inc i) limit)))))

(simple-range 0 9)
;=> (0 1 2 3 4 5 6 7 8)


(simple-range 0 200000) 是否会blew stack?

不会, 原因是没有真正使用递归
你可以试想, 如果直接使用递归, lazy是没有意义的, 递归需要从底至上的回溯得到值
所以虽然这儿加上lazy-seq, 形式上仍然好像使用了函数递归调用, 其实lazy-seq这儿做了优化.

 

那么lazy-seq做了怎样的优化?

image

Each step of a lazy seq may be in one of two states. 
Unrealized state, it’ll contain a function or closure of no arguments (a thunk) that can be called later to realize the step.
Realized state, the thunk’s return value is cached instead, and the thunk itself is released
Note that although not shown here, a realized lazy seq may simply contain nothing at all, called nil, indicating the end of the seq.

比较清晰的可以看出lazy-seq是怎么实现的
刚开始所有items都是unrealized, 其实我的理解就是一个闭包(simple-range 0 9)
每调用一次first, 就会做一次realize, 并产生一个realized item, 如图, 调用两次first产生两个realized item, 0, 1, 而rest得到剩下的unrealized items(simple-range 2 9)
可以看出, 完全没有使用到stack, 所以不会出现blow stack的现象.
需要注意的是, realized item只是被cache在memory里面, 如果没有被referenced, 会自动被垃圾回收掉, 所以realized lazy seq是不会包含任何结果的, nil, 用于指向seq尾部.
所以对于很大的seq, 不要hold head, 即不要refer realized item, 这样会导致无法被自动回收, 导致outofmemory

 

对于lose head, 可以看下面的例子, 只是(first r) (last r)的顺序不同, 就会导致outofmemory

(let [r (range 1e9)] [(first r) (last r)])
;=> [0 999999999]
(let [r (range 1e9)] [(last r) (first r)])
; java.lang.OutOfMemoryError: GC overhead limit exceeded

原因就是第二个例子, 需要一直hold主first item, 所以导致JVM无法进行GC, 所以OutOfMemoryError
部分由于clojure不是pure FP, 必须严格按照顺序执行, 对于Haskell这样的pure FP, 可以通过简单的complier优化解决这个问题

这个例子如果不够清晰, 看这个blog, 配图的, 很清楚. Clojure惰性序列的头保持问题

 

The delay and force macros

Although Clojure sequences are largely lazy, Clojure itself isn’t.
In most cases, expressions in Clojure are evaluated once prior to their being passed into a function rather than at the time of need. But Clojure does provide mechanisms for implementing what are known as call-by-need semantics. The most obvious of these mechanisms is itsmacro facilities, but we’ll defer that discussion until chapter 8.

The other mechanism for providing what we’ll call explicit laziness are Clojure’s delay and force.
In short, the delay macro is used to defer the evaluation of an expression until explicitly forced using the force function.

Clojure本身不是lazy语言, 但他也支持这种call by need的语义. 简单的方式就是通过delay, force

例子,

(defn inf-triangles [n]  ;函数返回lazy linked-list, head存值, tail存产生下个item的闭包
  {:head (triangle n)    ;triangle表示计算逻辑
   :tail (delay (inf-triangles (inc n)))}) ;通过delay, 不会实际调用函数

 

(defn head [list] (:head list))
(defn tail [list] (force (:tail list))) ;通过force, 强制delay的闭包执行

 

;定义tri-nums, 这种lazy structure叫 ‘head strict’, 不同于lazy-seq, 因为head的值会先被realize, 而lazy-seq不会realize, 必须显示调用first

(def tri-nums (inf-triangles 1))
(head tri-nums)
;=> 1
(head (tail tri-nums))
;=> 3
(head (tail (tail tri-nums)))
;=> 6

image

 

Of course, writing programs using delay and force is an onerous way to go about the problem of laziness, and you’d be better served by using Clojure’s lazy sequences to full effect rather than building your own from these basic blocks.

posted on 2013-02-17 15:28  fxjwind  阅读(602)  评论(0编辑  收藏  举报