The essence of RCU in the Linux kernel

RCU (Read-Copy Update) is a synchronization mechanism. What, in essence, does it synchronize?

1. It has only a reader-side lock, and that lock produces no lock contention.
2. It synchronizes reader-side critical sections with reclaim-side critical sections, not with writer-side critical sections.
3. RCU readers concurrently access different versions (copies) of the data reached through a shared pointer, whereas a (reader/writer) spinlock synchronizes every access to a single copy of the data.
4. Synchronization between writer-side critical sections must be supplied by the user of RCU.
5. RCU synchronizes read access to an old copy of the data with the reclamation of that copy; what it protects is the old copy.

(A code sketch of the reader side follows this list.)
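
A minimal sketch, assuming a hypothetical structure struct foo published through a hypothetical shared pointer gbl_foo: the reader only marks a read-side critical section and dereferences the shared pointer; there is no contended lock and no atomic operation.

#include <linux/rcupdate.h>

struct foo {
        int a;
};

static struct foo __rcu *gbl_foo;       /* the shared pointer; its target is the current copy */

static int foo_get_a(void)
{
        int a;

        rcu_read_lock();                        /* begin read-side critical section */
        a = rcu_dereference(gbl_foo)->a;        /* read whichever copy is current */
        rcu_read_unlock();                      /* end read-side critical section */
        return a;
}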

 

The main idea of RCU is to split an update into two steps, removal and reclamation, and to defer the reclamation (destruction) of the old copy.

RCU takes the read and write operations that would otherwise be mutually exclusive on a single copy of the data and turns them into read and update operations on different versions (copies) of that data, relying on switching the pointer that publishes the data. Because the data now exists in several copies, the old copy must eventually be reclaimed, so RCU splits the update operation into a removal operation and a reclamation operation.

This takes the write to each individual copy out of the race condition altogether: it needs no write_lock and is contained entirely in the update-side removal step. Reads, for their part, run concurrently on the different copies; the only race between a reader and the update-side removal step is on the access to the shared pointer. The switch between copies is performed by the update-side removal step, which uses nothing more than memory barriers to synchronize with reader-side accesses to that shared pointer.

The update-side reclamation step frees the old copy of the data. It must synchronize with the reader-side critical sections that are still accessing the old copy, yet it has no lock contention with the reader-side lock; in other words, a reader-side critical section never blocks on the reader-side lock because of lock contention. This is in fact how it works: although RCU provides the primitives rcu_read_lock and rcu_read_unlock, the counter they maintain is not updated with atomic operations, and it does not belong to a lock at all; it belongs to a thread and merely records how deeply rcu_read_lock is nested in the current thread. So even when threads on different CPUs all enter RCU reader-side critical sections, each of them uses its own counter, and none of them blocks on rcu_read_lock.

A reclaim-side critical section must be ordered after the reader-side critical sections that access the old copy, but it is not synchronized with reader-side critical sections that access the new copy. Seen this way, what RCU's synchronization protects is not access to a single shared copy of the data, but the access to, and reclamation of, an old copy of the data. A code sketch of this split into removal and reclamation follows.
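
Reusing struct foo and gbl_foo from the earlier sketch and assuming a single updater (with concurrent updaters the caller must add its own serialization, as discussed at the end of this post), the update side looks roughly like this: the removal is the pointer switch, the reclamation is the deferred kfree().

#include <linux/rcupdate.h>
#include <linux/slab.h>

static void foo_update_a(int new_a)
{
        struct foo *new_fp;
        struct foo *old_fp;

        new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);  /* error handling omitted */
        old_fp = rcu_dereference_protected(gbl_foo, 1); /* "1": sole updater, no lock needed */
        *new_fp = *old_fp;                      /* copy the current version */
        new_fp->a = new_a;                      /* modify the copy */
        rcu_assign_pointer(gbl_foo, new_fp);    /* removal: switch the shared pointer */
        synchronize_rcu();                      /* wait for readers still on the old copy */
        kfree(old_fp);                          /* reclamation: destroy the old copy */
}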

 

Below is the official design document (whatisRCU):

What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
situations.  Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge.  Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU.  Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU.  This document provides several
different paths, as follows:

1.    RCU OVERVIEW
2.    WHAT IS RCU'S CORE API?
3.    WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
4.    WHAT IF MY UPDATING THREAD CANNOT BLOCK?
5.    WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
6.    ANALOGY WITH READER-WRITER LOCKING
7.    FULL LIST OF RCU APIs
8.    ANSWERS TO QUICK QUIZZES

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Section 6.  Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)


1.  RCU OVERVIEW

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is the semantics of modern CPUs guarantee that readers will see
either the old or the new version of the data structure rather than a
partially updated reference.  The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase.  Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.    Remove pointers to a data structure, so that subsequent
    readers cannot gain a reference to it.

b.    Wait for all previous readers to complete their RCU read-side
    critical sections.

c.    At this point, there cannot be any readers who hold references
    to the data structure, so it now may safely be reclaimed
    (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

A brief restatement of the above (whatisRCU):

The idea of RCU is to separate updates into two actions, removal and reclamation.

removal:
Replace references to the old versions of items within a data structure with references to new versions. This can safely run concurrently with readers, because modern CPUs guarantee that a reader sees either the old or the new version of the data structure, never a partially updated reference.

reclamation:
Reclaim the data items that were removed during the removal action. Because reclaiming them would disrupt any concurrent readers still holding references to those items, reclamation must not start until no readers still hold such references.

The benefit of splitting updates is that the removal can be performed immediately, while the reclamation is deferred until all readers that were active during the removal have completed. The deferred reclamation either blocks until they finish or registers a callback that is invoked after they finish. Only the readers active during the removal phase matter, because a reader that starts after the removal phase cannot obtain a reference to the removed items and therefore cannot be disrupted by the reclamation.

So the typical RCU update sequence is the following three steps (a sketch in code follows the list):
1. Remove the pointer(s) to a data structure, so that readers arriving later cannot gain a reference to it.
2. Wait for all pre-existing readers to complete their RCU read-side critical sections.
3. At that point no reader can still hold a reference to the (old) data structure, so it can now be reclaimed safely.
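
The three steps map directly onto the kernel's RCU list API. A sketch with a hypothetical element type struct my_elem, assuming the caller already holds whatever lock serializes updaters of the list:

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_elem {
        struct list_head list;
        int key;
};

static void remove_elem(struct my_elem *e)
{
        list_del_rcu(&e->list);         /* step 1: unlink, so later readers cannot find it */
        synchronize_rcu();              /* step 2: wait for pre-existing readers */
        kfree(e);                       /* step 3: no reader can still reference it */
}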

Step 2 is the key idea behind RCU's deferred destruction. Being able to wait until all readers are done lets RCU readers use very lightweight synchronization, and in some cases no synchronization at all. In more conventional lock-based schemes, by contrast, readers must use heavyweight synchronization to keep an updater from deleting the data structure out from under them, because lock-based updaters typically update data items in place and must therefore exclude readers. RCU-based updaters instead exploit the fact that, on modern CPUs, a write to a single aligned pointer is atomic, which allows data items to be inserted into, removed from, and replaced in a linked structure atomically, without disrupting readers. Concurrent RCU readers can keep accessing the old versions, and can dispense with the atomic operations, memory barriers, and communication cache misses that are so expensive on present-day SMP systems, even in the absence of lock contention.

In the three-step procedure above, the updater performs both the removal and the reclamation, but it is often helpful to have an entirely different thread do the reclamation, as is in fact the case for the kernel's dcache. Even when the same thread performs both step 1 and step 3, it is still helpful to think of them separately. For example, RCU readers and updaters need not communicate at all, yet RCU provides implicit, low-overhead communication between readers and reclaimers, namely in step 2.
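
When the updater must not block in step 2, the wait can be turned into a callback, so that the reclamation runs later in a different context. A minimal sketch with hypothetical names (struct my_obj, retire_obj, my_obj_free_rcu):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_obj {
        struct rcu_head rcu;    /* lets call_rcu() queue this object */
        int key;
};

/* Invoked only after a grace period has elapsed, typically from softirq
 * context rather than the updater's own thread. */
static void my_obj_free_rcu(struct rcu_head *head)
{
        kfree(container_of(head, struct my_obj, rcu));
}

static void retire_obj(struct my_obj *obj)
{
        /* ... removal: unlink obj from its data structure under the
         * updater-side lock ... */
        call_rcu(&obj->rcu, my_obj_free_rcu);   /* deferred reclamation */
}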

 

In the kernel source, rcu/rcupdate.h documents rcu_read_lock() with the following comment:

/**
 * rcu_read_lock() - mark the beginning of an RCU read-side critical section
 *
 * When synchronize_rcu() is invoked on one CPU while other CPUs
 * are within RCU read-side critical sections, then the
 * synchronize_rcu() is guaranteed to block until after all the other
 * CPUs exit their critical sections.  Similarly, if call_rcu() is invoked
 * on one CPU while other CPUs are within RCU read-side critical
 * sections, invocation of the corresponding RCU callback is deferred
 * until after the all the other CPUs exit their critical sections.
 *
 * Note, however, that RCU callbacks are permitted to run concurrently
 * with new RCU read-side critical sections.  One way that this can happen
 * is via the following sequence of events: (1) CPU 0 enters an RCU
 * read-side critical section, (2) CPU 1 invokes call_rcu() to register
 * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
 * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
 * callback is invoked.  This is legal, because the RCU read-side critical
 * section that was running concurrently with the call_rcu() (and which
 * therefore might be referencing something that the corresponding RCU
 * callback would free up) has completed before the corresponding
 * RCU callback is invoked.
 *
 * RCU read-side critical sections may be nested.  Any deferred actions
 * will be deferred until the outermost RCU read-side critical section
 * completes.
 *
 * You can avoid reading and understanding the next paragraph by
 * following this rule: don't put anything in an rcu_read_lock() RCU
 * read-side critical section that would block in a !PREEMPT kernel.
 * But if you want the full story, read on!
 *
 * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
 * it is illegal to block while in an RCU read-side critical section.
 * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPT
 * kernel builds, RCU read-side critical sections may be preempted,
 * but explicit blocking is illegal.  Finally, in preemptible RCU
 * implementations in real-time (with -rt patchset) kernel builds, RCU
 * read-side critical sections may be preempted and they may also block, but
 * only when acquiring spinlocks that are subject to priority inheritance.
 */
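
The nesting rule from the comment above can be made concrete with a small sketch (function names are hypothetical):

#include <linux/rcupdate.h>

static void inner_reader(void)
{
        rcu_read_lock();        /* nesting depth 2 */
        /* ... access RCU-protected data ... */
        rcu_read_unlock();      /* back to depth 1: still inside the outer section */
}

static void outer_reader(void)
{
        rcu_read_lock();        /* nesting depth 1 */
        inner_reader();
        rcu_read_unlock();      /* depth 0: only now, as far as this task is
                                 * concerned, may synchronize_rcu() return or a
                                 * call_rcu() callback run */
}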

 

The following comment explains that there is no write_lock at all to contend with the read lock; synchronization between write operations, however, must be provided by the users of RCU themselves (a short sketch follows the comment).

/*
 * So where is rcu_write_lock()?  It does not exist, as there is no
 * way for writers to lock out RCU readers.  This is a feature, not
 * a bug -- this property is what provides RCU's performance benefits.
 * Of course, writers must coordinate with each other.  The normal
 * spinlock primitives work well for this, but any other technique may be
 * used as well.  RCU does not care how the writers keep out of each
 * others' way, as long as they do so.
 */
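
A minimal sketch of that coordination, with hypothetical names (struct blob, current_blob, publish_blob): updaters serialize on an ordinary spinlock of their own choosing, and readers never touch it, because there is no rcu_write_lock() for them to contend with.

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct blob { int data; };

static struct blob __rcu *current_blob;
static DEFINE_SPINLOCK(updater_lock);   /* writer-vs-writer only; readers never take it */

static void publish_blob(struct blob *new_blob)
{
        struct blob *old_blob;

        spin_lock(&updater_lock);       /* keeps writers out of each other's way */
        old_blob = rcu_dereference_protected(current_blob,
                                             lockdep_is_held(&updater_lock));
        rcu_assign_pointer(current_blob, new_blob);
        spin_unlock(&updater_lock);     /* readers were never excluded */

        synchronize_rcu();              /* wait for readers of the old blob */
        kfree(old_blob);
}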

 
