永不言弃,简单就好
每一个付出,都有回报,永远不放弃
随笔- 110  文章- 0  评论- 29 
博客园  首页  新随笔  联系  管理  订阅 订阅
2009年7月9日
dbms_random包体

CREATE OR REPLACE PACKAGE BODY SYS.dbms_random AS
    mem        num_array;           -- big internal state hidden from the user
    counter    BINARY_INTEGER := 55;-- counter through the results
    saved_norm NUMBER := NULL;      -- unused random normally distributed value
    need_init  BOOLEAN := TRUE;     -- do we still need to initialize


    -- Seed the random number generator with a binary_integer
    PROCEDURE seed(val IN BINARY_INTEGER) IS
    BEGIN
 seed(TO_CHAR(val));
    END seed;


    -- Seed the random number generator with a string.
    PROCEDURE seed(val IN VARCHAR2) IS
        junk     VARCHAR2(2000);
        piece    VARCHAR2(20);
        randval  NUMBER;
        mytemp   NUMBER;
        j        BINARY_INTEGER;
    BEGIN
        need_init   := FALSE;
        saved_norm  := NULL;
        counter     := 0;
        junk        := TO_SINGLE_BYTE(val);
        FOR i IN 0..54 LOOP
            piece   := SUBSTR(junk,1,19);
            randval := 0;
            j       := 1;

            -- convert 19 characters to a 38-digit number
            FOR j IN 1..19 LOOP
                randval := 1e2*randval + NVL(ASCII(SUBSTR(piece,j,1)),0.0);
            END LOOP;

            -- try to avoid lots of zeros
            randval := randval*1e-38+i*.01020304050607080910111213141516171819;
            mem(i)  := randval - TRUNC(randval);

            -- we've handled these first 19 characters already; move on
            junk    := SUBSTR(junk,20);
        END LOOP;

 randval := mem(54);
        FOR j IN 0..10 LOOP
            FOR i IN 0..54 LOOP

                -- barrelshift mem(i-1) by 24 digits
                randval := randval * 1e24;
                mytemp  := TRUNC(randval);
                randval := (randval - mytemp) + (mytemp * 1e-38);

                -- add it to mem(i)
                randval := mem(i)+randval;
                IF (randval >= 1.0) THEN
                    randval := randval - 1.0;
                END IF;

  -- record the result
                mem(i) := randval;
            END LOOP;
        END LOOP;
    END seed;


    -- give values to the user
    -- Delayed Fibonacci, pilfered from Knuth volume 2
    FUNCTION value RETURN NUMBER IS
    randval  NUMBER;
    BEGIN
        counter := counter + 1;
        IF counter >= 55 THEN

            -- initialize if needed
            IF (need_init = TRUE) THEN
                seed(TO_CHAR(SYSDATE,'MM-DD-YYYY HH24:MI:SS') ||
                     USER || USERENV('SESSIONID'));
            ELSE
                -- need to generate 55 more results
                FOR i IN 0..30 LOOP
                    randval := mem(i+24) + mem(i);
                    IF (randval >= 1.0) THEN
                        randval := randval - 1.0;
                    END IF;
                    mem(i) := randval;
                END LOOP;
                FOR i IN 31..54 LOOP
                    randval := mem(i-31) + mem(i);
                    IF (randval >= 1.0) THEN
                        randval := randval - 1.0;
                    END IF;
                    mem(i) := randval;
                END LOOP;
            END IF;
            counter := 0;
        END IF;
        RETURN mem(counter);
    END value;


    -- Random 38-digit number between LOW and HIGH.
    FUNCTION value ( low in NUMBER, high in NUMBER) RETURN NUMBER is
    BEGIN
        RETURN (value*(high-low))+low;
    END value;


    -- Random numbers in a normal distribution.
    -- Pilfered from Knuth volume 2.
    FUNCTION normal RETURN NUMBER is  -- 38 decimal places: Mean 0, Variance 1
        v1  NUMBER;
        v2  NUMBER;
        r2  NUMBER;
        fac NUMBER;
    BEGIN
        IF saved_norm is not NULL THEN     -- saved from last time
            v1 := saved_norm;              -- to be returned this time
            saved_norm := NULL;
        ELSE
            r2 := 2;
            -- Find two independent uniform variables
            WHILE r2 > 1 OR r2 = 0 LOOP
                v1 := value();
                v1 := v1 + v1 - 1;
                v2 := value();
                v2 := v2 + v2 - 1;
                r2 := v1*v1 + v2*v2;  -- r2 is radius
            END LOOP;      -- 0 < r2 <= 1:  in unit circle
            /* Now derive two independent normally-distributed variables */
            fac := sqrt(-2*ln(r2)/r2);
            v1 := v1*fac;          -- to be returned this time
            saved_norm := v2*fac;  -- to be saved for next time
        END IF;
        RETURN v1;
    END  normal;


    -- Random string.  Pilfered from Chris Ellis.
    FUNCTION string (opt char, len NUMBER)
        RETURN VARCHAR2 is -- string of <len> characters
        optx char (1)      := lower(opt);
        lo   NUMBER;
        rng  NUMBER;
        n    NUMBER;
        xstr VARCHAR2 (4000) := NULL;
    BEGIN
        IF    optx = 'u' THEN  -- upper case alpha characters only
            lo := 65; rng := 26; -- ASCII 41 to 5A (hex)
        ELSIF optx = 'l' THEN  -- lower case alpha characters only
            lo := 97; rng := 26; -- ASCII 61 to 7A (hex)
        ELSIF optx = 'a' THEN  -- alpha characters only (mixed case)
            lo := 65; rng := 52; -- ASCII 41 to 5A and 61 to 7A (see below)
        ELSIF optx = 'x' THEN   -- any alpha-numeric characters (upper)
            lo := 48; rng := 36; -- ASCII 30 to 39 and 41 to 5A (see below)
        ELSIF optx = 'p' THEN  -- any printable characters
            lo := 32; rng := 95; -- ASCII 20 to 7E (hex)
        ELSE
            lo := 65; rng := 26; -- default to upper case
        END IF;
        FOR i IN 1 .. least(len,4000) LOOP
            /* Get random ASCII character value in specified range */
            n := lo + TRUNC(rng * value); -- between lo and (lo + rng -1)
            /* Adjust FOR split range */
            IF    optx = 'a' AND n > 90 THEN
                n := n+6;               -- exclude ASCII characters 5B to 60
            ELSIF optx = 'x' AND n > 57 THEN
                n := n+7;               -- exclude ASCII characters 3A to 40
            END IF;
            xstr := xstr||chr(n); -- Append character to string
        END LOOP;
        RETURN xstr;
    END string;


    -- For compatibility with 8.1
    PROCEDURE initialize(val IN BINARY_INTEGER) IS
    BEGIN
 seed(to_char(val));
    END initialize;


    -- For compatibility with 8.1
    -- Random binary_integer, -power(2,31) <= Random < power(2,31)
    -- Delayed Fibonacci, pilfered from Knuth volume 2
    FUNCTION random RETURN BINARY_INTEGER IS
    BEGIN
 RETURN TRUNC(Value*4294967296)-2147483648;
    END random;


    -- For compatibility with 8.1
    PROCEDURE terminate IS
    BEGIN
 NULL;
    END terminate;

END dbms_random;
/
select dbms_random.value(10,20) from dual;
select dbms_random.string('ABcasdfasd',100) from dual

posted @ 2009-07-09 12:17 嘎子 阅读(115) 评论(0) 编辑
随机抽取表中数据

方法1:

select gmsfhm, xm

 from ( select gmsfhm, xm from czrk where gmsfhm like '%%' order by dbms_random.value )

 where rownum <= 100

方法2:

select gmsfhm, xm
  from ( select gmsfhm, xm from czrk where gmsfhm like '%%' order by dbms_random.random )

  where rownum <= 100;

方法3:(速度会快点)

select gmsfhm,xm

  from ( select gmsfhm, xm from czrk sample(1) where gmsfhm like '%%' order by dbms_random.random )

  where rownum <= 100;

posted @ 2009-07-09 12:03 嘎子 阅读(22) 评论(0) 编辑
ORACLE SAMPLE 语法应用说明

这两天发了一个贴子,寻找比较好的办法随机取出表的一条记录

地址见:http://www.oracle.com.cn/viewthread.php?tid=20848

 

本文介绍了一下ORACLE SAMPLE语法的应用,采用SAMPLE也是一种解决方法,不过还不是很好,主要是返回的记录分布不均匀,要么是靠表前面的一些,要么是不返回记录。
下面是SAMPLE语法的使用示例:
选择10%的记录
select * from t1 sample(10)
选择0.1%的记录
select * from t1 sample(0.1)
根据数据块选择1%的记录
select * from t1 sample block(1)
使用数据块选择与使用记录行选择的区别:使用数据块选择表示样本的采集是基于数据块采集的,也就是说样本如果一个数据块被采集为样本,则数据块里的记录全部都是样本

样本统计是基于统计学采集的,是有概率问题,不一定完全准确,如你要取50%的记录,但实际可能返回给你49%的记录集,也可能返回给你51%的记录集

例如
如果表T1有数据块B1,B2
B1有记录R1,R2,R3,R4,R5
B2有记录R6,R7,R8,R9,R10
如果使用如下SQL选择50%的数据
select * from t1 sample block(50)
则返回的结果可能是数据块B1的记录
R1,R2,R3,R4,R5
也可能是数据块B2的记录
R6,R7,R8,R9,R10
也可能不返回记录集

如果使用如下SQL选择50%的数据
select * from t1 sample (50)
则返回的结果可能是
R2,R3,R5,R8,R9
也可能是如下的样子
R1,R3,R4,R8

应用示例:
随机从表中取中1条记录,选取记录的概率是1%
select * from t1 sample(1) where rownum=1
随机从表中取中10条记录,选取记录的概率是0.1%
select * from t1 sample(0.1) where rownum<=10
注:当选取的概率越低,访问表的记录数将越多

ORACLE参考手册中的相关说明:
sample_clause
The sample_clause lets you instruct Oracle to select from a random sample of rows from the table, rather than from the entire table.

BLOCK
BLOCK instructs Oracle to perform random block sampling instead of random row sampling.

sample_percent
sample_percent is a number specifying the percentage of the total row or block count to be included in the sample. The value must be in the range .000001 to (but not including) 100.
Restrictions on Sampling During Queries
You can specify SAMPLE only in a query that selects from a single table. Joins are not supported. However, you can achieve the same results by using a CREATE TABLE ... AS SELECT query to materialize a sample of an underlying table and then rewrite the original query to refer to the newly created table sample. If you wish, you can write additional queries to materialize samples for other tables.
When you specify SAMPLE, Oracle automatically uses cost-based optimization. Rule-based optimization is not supported with this clause.
--------------------------------------------------------------------------------
Caution:
The use of statistically incorrect assumptions when using this feature can lead to incorrect or undesirable results.
--------------------------------------------------------------------------------
 
译:
Sample选项
使用sample选项的意思是指定Oracle从表中随机选择记录样本,这样比从整个表中选择更高效.
block选项
加上 BLOCK选项时表示随机取数据块,而不是随机取记录行.
sample_percent选项
sample_percent是指定总记录行或数据块为数据样本的百分比数值,这个值只能在0.000001到100之间,且不能等于100
限制
只能在单表查询的SQL中指定sample选项,不支持有连接的查询。但是,你可以使用CREATE TABLE ... AS SELECT查询的语法完成同样的效果,然后再采用新建的样本表重新编写查询SQL。
当你指定用sample时,不支持基于规则(rule)的优化法则,ORACLE自动使用基本成本(cost)的优化法则.
注意:
The use of statistically incorrect assumptions when using this feature can lead to incorrect or undesirable results.
这句话不太理解,估计是说采用样本采集特性可能会产生不准确的结果集。

 

 

posted @ 2009-07-09 11:55 嘎子 阅读(255) 评论(0) 编辑
Copyright ©2012 嘎子