Demystifying Random Number Generators (2): The Linear Congruential Algorithm in the Java Source

Random

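Java's Random class generates pseudorandom numbers. It uses a 48-bit seed, which is advanced by a linear congruential formula (see Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1).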

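If two Random instances are created with the same seed and the same sequence of method calls is made on each, they produce identical sequences of numbers.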
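
A minimal sketch of that property follows. The class name SameSeedDemo, the seed 42 and the bound 100 are my own choices for illustration; only java.util.Random itself comes from the JDK.

```java
import java.util.Arrays;
import java.util.Random;

final class SameSeedDemo {
    /** Draws count values in [0, 100) from a Random created with the given seed. */
    static int[] draw(long seed, int count) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(100);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = draw(42L, 5);
        int[] second = draw(42L, 5);
        // Same seed and same calls, so the two arrays are element-wise equal.
        System.out.println(Arrays.toString(first));
        System.out.println(Arrays.equals(first, second));
    }
}
```

This is also why a fixed seed is handy in tests: the "random" input can be replayed exactly.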

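You can also call Math.random(), which returns a double in [0.0, 1.0).

Random instances are thread-safe, but sharing one Random instance between threads causes contention and hurts performance; consider java.util.concurrent.ThreadLocalRandom (since JDK 1.7) instead.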

/**
 * A random number generator isolated to the current thread.  Like the
 * global {@link java.util.Random} generator used by the {@link
 * java.lang.Math} class, a {@code ThreadLocalRandom} is initialized
 * with an internally generated seed that may not otherwise be
 * modified. When applicable, use of {@code ThreadLocalRandom} rather
 * than shared {@code Random} objects in concurrent programs will
 * typically encounter much less overhead and contention.  Use of
 * {@code ThreadLocalRandom} is particularly appropriate when multiple
 * tasks (for example, each a {@link ForkJoinTask}) use random numbers
 * in parallel in thread pools.
 *
 * <p>Usages of this class should typically be of the form:
 * {@code ThreadLocalRandom.current().nextX(...)} (where
 * {@code X} is {@code Int}, {@code Long}, etc).
 * When all usages are of this form, it is never possible to
 * accidently share a {@code ThreadLocalRandom} across multiple threads.
 *
 * <p>This class also provides additional commonly used bounded random
 * generation methods.
 *
 * <p>Instances of {@code ThreadLocalRandom} are not cryptographically
 * secure.  Consider instead using {@link java.security.SecureRandom}
 * in security-sensitive applications. Additionally,
 * default-constructed instances do not use a cryptographically random
 * seed unless the {@linkplain System#getProperty system property}
 * {@code java.util.secureRandomSeed} is set to {@code true}.
 *
 * @since 1.7
 * @author Doug Lea
 */
public class ThreadLocalRandom extends Random {

 int nextInt = ThreadLocalRandom.current().nextInt(10);

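Random instances are not cryptographically secure; for security-sensitive work, use java.security.SecureRandom, which provides a cryptographically strong generator.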
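
As a rough sketch of what that looks like in practice, the snippet below fills a byte array from SecureRandom and prints it as hex. The class name SecureTokenDemo, the helper names and the 16-byte length are assumptions made for this example, not anything prescribed by the JDK.

```java
import java.security.SecureRandom;

final class SecureTokenDemo {
    /** Returns the requested number of bytes from a SecureRandom. */
    static byte[] randomBytes(int length) {
        SecureRandom secureRandom = new SecureRandom();
        byte[] bytes = new byte[length];
        secureRandom.nextBytes(bytes);
        return bytes;
    }

    /** Formats bytes as lowercase hex, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Output differs on every run, which is the point.
        System.out.println(toHex(randomBytes(16)));
    }
}
```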

Random implements Serializable, so instances can be serialized.

The seed is held in the field AtomicLong seed, an atomic variable.


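In the previous post we looked at true random number generators based on physical phenomena. True random numbers are slow to produce, however, so for practical computing the random numbers in a computer are generated by algorithms, that is, by formulas. For a given seed and function the resulting sequence is fixed, so the numbers are predictable and periodic. They are not truly random, which is why they are called pseudorandom numbers (Pseudo Random Number).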

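Don't look down on them because of the word "pseudo", though. There is real depth here: a few seemingly simple formulas can represent generations of work by earlier researchers, and the related research fills several books.
Incidentally, Turing Award laureate Andrew Chi-Chih Yao (姚期智) worked on exactly this area, the theory of pseudorandom number generation.
Here I focus on two commonly used algorithms: the congruential method and the Mersenne Twister.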

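1. The congruential method

The congruential method is a very common way to generate random numbers and is used in many programming languages. The most visible example is Java: java.util.Random uses one member of the family, the linear congruential method. Other variants include the multiplicative congruential method and the mixed congruential method. So let's open the Java source and see what the linear congruential method really looks like.

In Eclipse, type java.util.Random and press F3 to jump to the source of the Random class: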

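The first thing we see is the class-level Javadoc, which reads:

An instance of this class is used to generate a stream of pseudorandom numbers. The class uses a 48-bit seed, which is modified using a linear congruential formula. (See Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1.)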

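Clearly, Java's Random class uses the linear congruential method to produce its random numbers.

Reading further, we find its constructors and a few helper methods, which show how the 48-bit seed is obtained: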

private static final AtomicLong seedUniquifier = new AtomicLong(8682522807148012L);
/**
 * Creates a new random number generator. This constructor sets
 * the seed of the random number generator to a value very likely
 * to be distinct from any other invocation of this constructor.
 */
public Random() {
    this(seedUniquifier() ^ System.nanoTime());
}

private static long seedUniquifier() {
    // L'Ecuyer, "Tables of Linear Congruential Generators of
    // Different Sizes and Good Lattice Structure", 1999
    for (;;) {
        long current = seedUniquifier.get();
        long next = current * 181783497276652981L;
        if (seedUniquifier.compareAndSet(current, next))
            return next;
    }
}

public Random(long seed) {
    if (getClass() == Random.class)
        this.seed = new AtomicLong(initialScramble(seed));
    else {
        // subclass might have overriden setSeed
        this.seed = new AtomicLong();
        setSeed(seed);
    }
}
private static long initialScramble(long seed) {
    return (seed ^ multiplier) & mask;
}

java.util.concurrent.atomic.AtomicLong
public final boolean compareAndSet(long expect,
                                   long update)
Atomically sets the value to the given updated value if the current value == the expected value.
Parameters:
expect - the expected value
update - the new value
Returns:
true if successful. False return indicates that the actual value was not equal to the expected value.

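The no-argument constructor calls System.nanoTime() to obtain a nanosecond-scale time value that takes part in forming the 48-bit seed. It also XORs in the result of seedUniquifier(), which multiplies the shared seedUniquifier value by 181783497276652981L once per call. The for (;;) loop is only a compare-and-set retry, repeated when another thread has changed the value in the meantime, so that every constructor call gets a different value. The nanoTime value is hard to predict, which is what makes the default seed vary from run to run. Note that nanoTime differs from the more familiar System.currentTimeMillis(): it does not return the time elapsed since January 1, 1970, but a count of nanoseconds from an arbitrary fixed origin. It is only meaningful for comparing two readings to measure an interval, such as the running time of a line of code or of a database import, and it cannot tell you what day it is.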

    /**
     * Returns the current value of the running Java Virtual Machine's
     * high-resolution time source, in nanoseconds.
     *
     * <p>This method can only be used to measure elapsed time and is
     * not related to any other notion of system or wall-clock time.
     * The value returned represents nanoseconds since some fixed but
     * arbitrary <i>origin</i> time (perhaps in the future, so values
     * may be negative).  The same origin is used by all invocations of
     * this method in an instance of a Java virtual machine; other
     * virtual machine instances are likely to use a different origin.
     *
     * <p>This method provides nanosecond precision, but not necessarily
     * nanosecond resolution (that is, how frequently the value changes)
     * - no guarantees are made except that the resolution is at least as
     * good as that of {@link #currentTimeMillis()}.
     *
     * <p>Differences in successive calls that span greater than
     * approximately 292 years (2<sup>63</sup> nanoseconds) will not
     * correctly compute elapsed time due to numerical overflow.
     *
     * <p>The values returned by this method become meaningful only when
     * the difference between two such values, obtained within the same
     * instance of a Java virtual machine, is computed.
     *
     * <p> For example, to measure how long some code takes to execute:
     *  <pre> {@code
     * long startTime = System.nanoTime();
     * // ... the code being measured ...
     * long estimatedTime = System.nanoTime() - startTime;}</pre>
     *
     * <p>To compare two nanoTime values
     *  <pre> {@code
     * long t0 = System.nanoTime();
     * ...
     * long t1 = System.nanoTime();}</pre>
     *
     * one should use {@code t1 - t0 < 0}, not {@code t1 < t0},
     * because of the possibility of numerical overflow.
     *
     * @return the current value of the running Java Virtual Machine's
     *         high-resolution time source, in nanoseconds
     * @since 1.5
     */
    public static native long nanoTime();

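Now I have to admire how thorough this engineer was. So far the program has mixed at least three ingredients into the seed:

1. Start from a long value as the "initial seed" (the default is 8682522807148012L).

2. Multiply it by a monstrous number, 181783497276652981L (it looks as if it came from mashing the keyboard, although the code comment attributes it to L'Ecuyer's tables of linear congruential generators), and store the product back into the static field seedUniquifier, so that each constructor call sees a different value. Because the compare-and-set can fail when several threads race, the for (;;) loop retries until the update succeeds.

3. XOR the result with the value of System.nanoTime() to get the final seed.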

Further down are the methods we actually use to get random numbers. I started with the most common one, nextInt(), whose code is:

public int nextInt() {
    return next(32);
}

The code is terse and delegates straight to next:

protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}

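OK, time for a small celebration, because we have reached the core of the linear congruential method. Yes, it is just these few lines!

Before analyzing the code, here is a brief introduction to the linear congruential method.

In programs, to keep the result of an expression below some bound, we often take a remainder, so that every result is a remainder with respect to the same divisor. Generators built on this operation are called congruential methods.

The linear congruential method is a very old random number generation algorithm. Its mathematical form is:

X_{n+1} = (a * X_n + c) mod m

where

m > 0, 0 < a < m, 0 <= c < m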

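Here the sequence X_n is the stream of random numbers and X_0 is the seed. The quality of the numbers depends heavily on the choice of the three parameters m, a and c. The numbers are not truly random; they are spread over one period, and the period is at most m. According to the Hull-Dobell Theorem, for c != 0 the period equals m for every seed if and only if:

1. c and m are coprime;

2. a-1 is divisible by every prime factor of m;

3. a-1 is a multiple of 4 whenever m is a multiple of 4.

This is why m is usually chosen very large: it makes a long period possible.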
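
To make the theorem concrete, here is a small sketch that measures the cycle length of a toy generator by brute force. The class name LcgPeriodDemo and the parameter sets (m = 16 with a = 5 or a = 3, c = 3) are chosen by me for illustration; they are far too small for real use. With a = 5 all three conditions hold, while with a = 3 the third one fails because a-1 = 2 is not a multiple of 4.

```java
import java.util.Arrays;

final class LcgPeriodDemo {
    /**
     * Returns the length of the cycle that x -> (a*x + c) mod m
     * eventually enters when started from x0 (0 <= x0 < m).
     */
    static int cycleLength(int a, int c, int m, int x0) {
        int[] firstSeenAt = new int[m];
        Arrays.fill(firstSeenAt, -1);
        int x = x0;
        int step = 0;
        while (firstSeenAt[x] == -1) {
            firstSeenAt[x] = step++;
            x = (a * x + c) % m;
        }
        // x is the first repeated value; the cycle spans the steps in between.
        return step - firstSeenAt[x];
    }

    public static void main(String[] args) {
        System.out.println(cycleLength(5, 3, 16, 0)); // conditions hold: 16
        System.out.println(cycleLength(3, 3, 16, 0)); // condition 3 fails: 8
    }
}
```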

Now let's go back to the code and look at this line:

nextseed = (oldseed * multiplier + addend) & mask;

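It looks a lot like X_{n+1} = (a * X_n + c) mod m, doesn't it?

Exactly: this one line applies the linear congruential formula. One question remains, though: where is the remainder operator? Let's look at how the three constants are declared: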

    private static final long multiplier = 0x5DEECE66DL;
    private static final long addend = 0xBL;
    private static final long mask = (1L << 48) - 1;

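multiplier and addend are a and c in the formula, which is easy to see, but what is mask? In fact, x & ((1L << 48) - 1) is equivalent to x mod 2^48. Here is why.

Consider x modulo 2 to the power N. The divisor is a power of two, such as:

0001, 0010, 0100, 1000, ...

Dividing by it amounts to shifting the binary form of x right by N bits, and the bits that end up to the right of the binary point are the remainder. For example:

13 = 1101, 8 = 1000

13 / 8 = 1.101 in binary, so the 101 to the right of the point is the remainder, which is 5 in decimal.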

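However, in both C and Java, the bits shifted out by a shift operation are simply gone. (What, you say they are in the carry flag, CF? Fine, but that is a bit too low-level, and there is a neater way.) How can we keep the bits that are about to be lost?

Taking a hint from mask above, let's subtract one from each power of two:

0000, 0001, 0011, 0111, 01111, 011111, ...

Does that suggest anything?

We know that a single bit (0 or 1) ANDed (&) with 1 stays unchanged, while ANDed with 0 it always gives 0:

a & 1 = a, a & 0 = 0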

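What we want from x mod 2^N can be stated as:

1. every bit at or above the 2^N position becomes 0;

2. every bit below the 2^N position is left unchanged.

Therefore x & (2^N - 1) is equivalent to x mod 2^N. Take 13 and 8 again:

1101 % 1000 = 0101, 1101 & 0111 = 0101

The two results agree.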
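
The equivalence is easy to check in code. The sketch below is my own illustration (the class and method names are made up); it compares the mask with the % operator for 13 and 8, and with Math.floorMod for a negative input, where Java's % would return a negative remainder but the mask still gives the mathematical result.

```java
final class MaskModDemo {
    /** Computes x mod 2^n with a bit mask (intended for 0 <= n <= 62). */
    static long modPowerOfTwo(long x, int n) {
        long mask = (1L << n) - 1; // the low n bits are 1, the rest are 0
        return x & mask;
    }

    public static void main(String[] args) {
        System.out.println(modPowerOfTwo(13, 3));      // 5
        System.out.println(13 % 8);                    // 5
        System.out.println(modPowerOfTwo(-3, 4));      // 13
        System.out.println(Math.floorMod(-3L, 16L));   // 13
    }
}
```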

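With the meaning of this AND operation clear, the line above should be obvious: it is a direct application of the linear congruential formula with a = 0x5DEECE66DL, c = 0xBL and m = 2^48, which yields a 48-bit value. The do-while loop around it does not add randomness; it is a compare-and-set retry that keeps the seed update atomic when several threads share the generator. The final shift, nextseed >>> (48 - bits), keeps the top bits of the 48-bit state and returns a random number with the requested number of bits.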
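
To convince myself, here is a single-threaded sketch that re-implements just the seed scramble and the next step using the constants quoted above, and compares its output with java.util.Random. MiniLcg and matchesJdk are names I made up; the sketch drops the AtomicLong and the compare-and-set loop, so unlike the real class it is not thread-safe.

```java
import java.util.Random;

final class MiniLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private long seed;

    MiniLcg(long seed) {
        // Same scramble as initialScramble in the JDK source.
        this.seed = (seed ^ MULTIPLIER) & MASK;
    }

    /** One LCG step; returns the top 'bits' bits of the 48-bit state. */
    int next(int bits) {
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        return (int) (seed >>> (48 - bits));
    }

    int nextInt() {
        return next(32);
    }

    /** Checks that the first 'count' outputs agree with java.util.Random. */
    static boolean matchesJdk(long seed, int count) {
        Random reference = new Random(seed);
        MiniLcg mini = new MiniLcg(seed);
        for (int i = 0; i < count; i++) {
            if (reference.nextInt() != mini.nextInt()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesJdk(42L, 1000));
    }
}
```

The outputs agree because the Javadoc of Random specifies this exact algorithm, which is what makes seeded sequences reproducible across JDKs.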

Next, let's study an even more commonly used method, nextInt with a parameter n:

    public int nextInt(int n) {
        if (n <= 0)
            throw new IllegalArgumentException("n must be positive");
 
        if ((n & -n) == n)  // i.e., n is a power of 2
            return (int)((n * (long)next(31)) >> 31);
 
        int bits, val;
        do {
            bits = next(31);
            val = bits % n;
        } while (bits - val + (n-1) < 0);
        return val;
    }

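The basic idea is the same. The method calls next to produce a 31-bit random number (a non-negative int) and then inspects n. If n is a power of 2, it multiplies by n and shifts right by 31 bits, which keeps the high-order bits of the random value and gives the result directly. Otherwise it takes the remainder modulo n so that the result lies in [0, n). The do-while loop is rejection sampling: 2^31 is not a multiple of n, so the values of bits in the last, incomplete block of n would make small results slightly more likely. For those values bits - val + (n-1) overflows to a negative int, and the loop draws again.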
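
The bias is easier to see on a tiny source. The sketch below is my own illustration, not JDK code: it pretends the generator produces only 3 bits (values 0 to 7) and counts how often each result in [0, 3) appears, first with a plain % and then with the incomplete last block rejected. It uses an explicit limit instead of the overflow trick in the JDK, but the idea is the same.

```java
import java.util.Arrays;

final class RejectionDemo {
    /** Result counts when every value of a 'bits'-bit source is reduced with % n. */
    static int[] naiveCounts(int bits, int n) {
        int[] counts = new int[n];
        for (int v = 0; v < (1 << bits); v++) {
            counts[v % n]++;
        }
        return counts;
    }

    /** Same, but values in the final incomplete block of n are rejected. */
    static int[] rejectionCounts(int bits, int n) {
        int range = 1 << bits;
        int limit = range - (range % n); // largest multiple of n that fits
        int[] counts = new int[n];
        for (int v = 0; v < limit; v++) {
            counts[v % n]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naiveCounts(3, 3)));     // [3, 3, 2]
        System.out.println(Arrays.toString(rejectionCounts(3, 3))); // [2, 2, 2]
    }
}
```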

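You may wonder why (n & -n) == n tests whether a number is a power of 2. It took me a while to work it out; it comes down to a property of two's complement.

As is well known, computers store negative numbers in two's complement (look it up if the term is unfamiliar). A few examples: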

  2: 0000 0010      -2: 1111 1110

  8: 0000 1000      -8: 1111 1000

 18: 0001 0010     -18: 1110 1110

 20: 0001 0100     -20: 1110 1100

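Two's complement has this property: for a number n and its negation -n, the lowest bit that is 1 is the same in both, all bits below it are 0 in both, and all bits above it differ. So n & -n has exactly one bit set, the lowest set bit of n. The only way that result can still equal n is if n had a single 1 bit to begin with, which means n is exactly a power of 2.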

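Personally, though, I think there is an even nicer test for a power of 2:

(n & (n - 1)) == 0

In Java the parentheses around n & (n - 1) are required, because == binds more tightly than &. If you are interested, work out why it holds ^o^.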
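
Both tests side by side, as a small sketch. The class and method names are mine; I added the n > 0 guard because 0 passes both bit tests without being a power of 2 (nextInt(int n) has already rejected n <= 0 by that point).

```java
final class PowerOfTwoDemo {
    /** The test used in Random.nextInt(int): n & -n isolates the lowest set bit. */
    static boolean viaNegation(int n) {
        return n > 0 && (n & -n) == n;
    }

    /** The alternative test: n - 1 flips the lowest set bit and everything below it. */
    static boolean viaDecrement(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 8, 18, 20, 1024}) {
            System.out.println(n + ": " + viaNegation(n) + " " + viaDecrement(n));
        }
    }
}
```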

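That wraps up the linear congruential method. Next, a brief look at another congruential method, the multiplicative congruential method.

The linear congruential method above is mainly used to generate integers. In some settings, scientific research for example, we often only need fractions in (0, 1), and there the multiplicative congruential method is a better choice. Its basic formula is very similar to the linear one:

X_{n+1} = (a * X_n) mod m

It simply sets c = 0 in the linear formula. To obtain a fraction, we add one more step:

Y_n = X_n / m

Since X_n is a remainder modulo m, Y_n lies between 0 and 1, which gives a sequence of random numbers on the interval (0, 1).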
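
A minimal sketch of the two formulas above. The article does not name any parameters, so I am assuming the well-known Park-Miller "minimal standard" choice, a = 16807 and m = 2^31 - 1; the class and method names are also mine.

```java
final class MultiplicativeLcg {
    private static final long A = 16807L;       // assumed multiplier (Park-Miller)
    private static final long M = 2147483647L;  // assumed modulus, 2^31 - 1

    private long state;

    MultiplicativeLcg(long seed) {
        // With c = 0 a state of 0 would stay 0 forever, so it is not allowed.
        if (seed <= 0 || seed >= M) {
            throw new IllegalArgumentException("seed must be in [1, m-1]");
        }
        this.state = seed;
    }

    /** X_{n+1} = (a * X_n) mod m; the product fits comfortably in a long. */
    long nextRaw() {
        state = (A * state) % M;
        return state;
    }

    /** Y_n = X_n / m, a value strictly between 0 and 1. */
    double nextDouble() {
        return (double) nextRaw() / M;
    }

    public static void main(String[] args) {
        MultiplicativeLcg generator = new MultiplicativeLcg(1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.nextDouble());
        }
    }
}
```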

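There are also the mixed congruential method, the quadratic congruential method, the cubic congruential method and similar schemes. Their formulas are alike and each has its strengths and weaknesses, so I will not go into detail here.

The strengths of congruential methods are speed and low memory use. However, successive numbers are not independent and the sequence is strongly correlated, so the method is a poor fit for applications that need high-quality random numbers, which includes many areas of scientific research.

Don't go away: the next post covers a more powerful algorithm, the Mersenne Twister. Stay tuned!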

http://www.myexception.cn/program/1609435.html

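Atomic: since JDK 5, the java.util.concurrent package has provided many classes for concurrent programming. Using them on multi-core machines gives comparatively good performance.
The main reason is that most of these classes use optimistic locking (fail and retry) rather than the pessimistic locking of synchronized.

Today I had time to trace the implementation of AtomicInteger's incrementAndGet.
I am no expert in concurrent programming; these are just notes to make later study easier.

1. Implementation of incrementAndGet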

    public final int incrementAndGet() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return next;
        }
    }

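First, we can see that it spins in an infinite loop until the increment succeeds.
Each pass of the loop does the following:
1. Read the current value.
2. Compute the value plus 1.
3. If the current value is still valid (no other thread has changed it), set the new value.
4. If the set fails because the current value is stale (another thread changed it in the meantime; the expect parameter is what checks for this), start again from step 1.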
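
The same loop can be written against the public AtomicInteger API. The sketch below is my own illustration (class and method names are made up): several threads increment a shared counter through a hand-written compare-and-set loop, and no update is lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasLoopDemo {
    /** Hand-written version of the increment loop: read, compute, CAS, retry. */
    static int incrementWithCas(AtomicInteger counter) {
        for (;;) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread changed the value, so try again.
        }
    }

    /** Runs the given number of threads and returns the final counter value. */
    static int runThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    incrementWithCas(counter);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 1000)); // 4000
    }
}
```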

2. Implementation of compareAndSet

    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }

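It directly calls the compareAndSwapInt method of the Unsafe class,
whose full name is sun.misc.Unsafe. This class is part of the Oracle (Sun) implementation; JDKs from other vendors may use a different class.

3. Implementation of compareAndSwapInt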

    /**
     * Atomically update Java variable to <tt>x</tt> if it is currently
     * holding <tt>expected</tt>.
     * @return <tt>true</tt> if successful
     */
    public final native boolean compareAndSwapInt(Object o, long offset,
                                                  int expected,
                                                  int x);

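As you can see, it is not implemented in Java: it is a native method whose implementation lives in native code inside the JVM.

4. The native implementation of compareAndSwapInt
If you have downloaded the OpenJDK source, you can find unsafe.cpp under hotspot\src\share\vm\prims\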

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

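As you can see, it actually calls the cmpxchg function of the Atomic class.

5. Atomic's cmpxchg
The implementation of this class depends on the operating system and on the CPU architecture. For x86 on Windows,
it is in the file atomic_windows_x86.inline.hpp under hotspot\src\os_cpu\windows_x86\vm\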

inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}

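Here we can see that it is implemented with inline assembly, and the key CPU instruction is cmpxchg.
There is no more code to follow from here. In other words, the atomicity of CAS is provided by the CPU. There is still exclusive locking at this level (the LOCK prefix applied on multiprocessor systems), but it is held for a far shorter time than a synchronized block, which is why performance is better under multithreading.

The code contains the comment "alternative for InterlockedCompareExchange".
InterlockedCompareExchange is a function in the Windows API that does the same thing as the assembly above.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.85%29.aspx

6. Finally, the reference entry for the x86 cmpxchg instruction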

Opcode CMPXCHG

CPU: I486+
Type of Instruction: User

Instruction: CMPXCHG dest, src

Description: Compares the accumulator with dest. If equal the "dest"
is loaded with "src", otherwise the accumulator is loaded
with "dest".

Flags Affected: AF, CF, OF, PF, SF, ZF

CPU mode: RM,PM,VM,SMM
+++++++++++++++++++++++
Clocks:
CMPXCHG reg, reg 6
CMPXCHG mem, reg 7 (10 if comparison fails)

http://www.blogjava.net/mstar/archive/2013/04/24/398351.html

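Andrew Chi-Chih Yao (姚期智):
He first joined the Center for Advanced Study at Tsinghua University as a full-time professor, and then led the creation of the "Yao Class".
He started this experimental class because he felt that undergraduate computer science education in China still lagged somewhat behind that of leading universities abroad such as MIT and Stanford. He hoped to use his many years of theoretical research and teaching experience abroad to train the students of this class into world-class computer scientists on a par with those from MIT and Stanford.
In a letter to all Tsinghua students he wrote:
"Our goal is not to train excellent software programmers; we want to train first-rate computer science talent of international standing."

He said: "I felt that research in physics was somewhat different from what I had imagined. Just at that time computing was taking off, and there were many interesting problems waiting to be solved. I happened to run into this field, and I think the choice was the right one."
"Life is like an egg: broken from the outside, it is pressure; broken from the inside, it is growth. Only by constantly correcting yourself will you have the strength to climb upward!"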

posted @ 2016-08-31 00:21 by 沧海一滴
