

Posted 2011-10-12 14:47 by Tracy

Brian -- Thanks for the question regarding "Partitioning question", version 8.1.7

Submitted on 12-Oct-2001 13:16 Central time zone
Last updated 8-Oct-2011 6:53

You Asked

Let's say we have a table of documents with key of doc_id.  This table contains millions 
of documents (rows).  There is also a table that assigns one or more industry codes to 
each doc:

create table doc_ind
( doc_id    integer,
  ind_code      varchar2(10)
)

This table will therefore also contain millions of rows.

The industry code is a part of the default set of industry codes used by our business.

Now let's say that we sign deals with clients that want to use their own industry 
classifications.  I don't want to create new tables for each client (there could 
eventually be hundreds of clients and therefore hundreds of new tables) and have some 
scheme where logic is built into the application to derive the correct table_name to 
query based on the client.

My first thought is to create a table like:

create table doc_client_ind
( client_id     varchar2(10),
  doc_id    integer,
  ind_code      varchar2(10)
)

The client id could be any value - usually resembling the client name.

Because each new client would add millions of rows to this table, I want to use 
partitioning.  After reading up on partitioning in your book, my situation does not seem 
to fit perfectly into standard scenarios.  At first, it seemed I would need to use hash 
partitioning.  However, the number of distinct client_id values in this table will be 
very small to start with and grow over time.  It is not clear how I would easily manage 
hash partitioning well over time.

In order to use range partitioning, it seems to me that client_id's would need to be 
assigned values that have a natural ascending sort order (e.g., c0001, c0002, etc.). 
Then I could have one partition per client and create them on an ongoing, as-needed 
basis.

I have four questions:

1. Are there other options (with or without partitioning) I don't see to manage this?
2. Is there a way to manage this effectively using the original essentially random client 
id values?
3. What do you suggest given the limited info above?
4. Given your suggestion in question 3, what are the disadvantages, caveats I need to be 
aware of?

Thanks - Brian

 

and we said...

Well, in 9i there is LIST partitioning as well; that may be something to consider.  Here 
each client_id could be targeted to a specific partition.
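A minimal sketch of what that could look like (the client names here are hypothetical; the syntax is Oracle 9i list partitioning):

```sql
create table doc_client_ind
( client_id  varchar2(10),
  doc_id     integer,
  ind_code   varchar2(10)
)
partition by list (client_id)
( partition p_acme  values ('ACME'),
  partition p_bigco values ('BIGCO')
);

-- signing a new client later is just a DDL operation:
alter table doc_client_ind add partition p_newco values ('NEWCO');

-- and dropping a client drops all of their rows in one go:
alter table doc_client_ind drop partition p_acme;
```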


If you just want to achieve a nice distribution of data across many disks/partitions, you 
could hash right now on DOC_ID.  There will be millions of unique values for that from 
the get-go -- that will hash very nicely.  Since you'll always join on doc_id and 
client_id, partition elimination will take place just as effectively.

So, either look at 9i with list partitioning if you feel you would gain some advantage by 
partitioning on client_id (one advantage would be that if you drop a client, getting rid 
of their data is trivial -- drop the partition).

Or, hash partition on the doc_id to achieve a good overall spread of data.
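That alternative would look something like this (the partition count of 8 is an assumption; powers of two tend to give the most even hash distribution):

```sql
create table doc_client_ind
( client_id  varchar2(10),
  doc_id     integer,
  ind_code   varchar2(10)
)
-- millions of distinct doc_id values hash into an even spread
partition by hash (doc_id) partitions 8;
```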

I wanted to know if client_id was a small domain -- something manageable.  If not, they 
would not be able to range partition by it, as it would be an ongoing job just to split 
and maintain the partitions.

50 -- they could do it.
500,000 -- they would have a hard time at it (but hash partitioning would start to come into play).

values in between with ranges/hashes -- not practical at some point.  


Index key compression is a feature whereby repeating fields on the leading edge of the index are 
removed from the leaf block row entries.  It compresses the data.

If you create an index on (owner, object_type, object_name) on a copy of ALL_OBJECTS -- with 
COMPRESS 2 -- we would store the owner and object type ONCE per leaf block, rather than once per 
ROW per leaf block, leading to significant savings in space.
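A sketch of how you might see that for yourself (the table and index names are made up; ANALYZE ... VALIDATE STRUCTURE populates the single-row INDEX_STATS view with the leaf block count):

```sql
create table t as select * from all_objects;

-- uncompressed baseline
create index t_idx on t(owner, object_type, object_name);
analyze index t_idx validate structure;
select lf_blks from index_stats;   -- note the leaf block count

-- rebuild with the first two columns compressed
drop index t_idx;
create index t_idx on t(owner, object_type, object_name) compress 2;
analyze index t_idx validate structure;
select lf_blks from index_stats;   -- should be noticeably smaller
```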
 
A couple of questions for you:

1. What if the column that I'm partitioning on contains null entries?  Is this allowed?  How will 
Oracle know where to place the data?

2. On page 620 of the Expert One-on-One book, error ORA-01502 was raised because the 
local_nonprefixed index was unusable.  Why?  Are "queries" on partitioned tables (PTs) index 
dependent?

3. Say I have a table:
create table detail
( cust_id      number
, cust_balance number
, some other columns.....
)

and cust_id has 65 distinct values over 7 million records.  Will partitioning by hash(cust_id) be 
of any use, assuming that the front-end apps issue a dynamic query that will not use any indexed 
columns in any of its predicates?

One advantage that I see in using PTs is that data will be forced to group by partition.

 

Followup June 27, 2003 - 9am Central time zone:

1) yes allowed, it'll put them where it "wants".  

ops$tkyte@ORA920> create table t
  2  (
  3    object_id        number ,
  4    object_type      varchar2(18)
  5  )
  6  partition by hash(object_id) partitions 8
  7  /

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into t select object_id, object_type from all_objects;

31589 rows created.

ops$tkyte@ORA920> insert into t select NULL, 'xxx' from all_users;

47 rows created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> declare
  2      l_cnt number;
  3  begin
  4      for x in ( select partition_name
  5                   from user_tab_partitions
  6                  where table_name = 'T' )
  7      loop
  8          execute immediate
  9          'select count(*)
 10             from t partition(' || x.partition_name || ')'
 11             into l_cnt;
 12          dbms_output.put_line
 13          ( x.partition_name || ' ' || l_cnt || ' rows...' );
 14
 15          execute immediate
 16          'select count(*)
 17             from t partition(' || x.partition_name || ')
 18            where object_id is null'
 19            into l_cnt;
 20          dbms_output.put_line
 21          ( x.partition_name || ' ' || l_cnt || ' NULL rows...' );
 22      end loop;
 23  end;
 24  /
SYS_P347 3965 rows...
SYS_P347 47 NULL rows...
SYS_P348 3948 rows...
SYS_P348 0 NULL rows...
SYS_P349 4042 rows...
SYS_P349 0 NULL rows...
SYS_P350 3930 rows...
SYS_P350 0 NULL rows...
SYS_P351 3868 rows...
SYS_P351 0 NULL rows...
SYS_P352 3895 rows...
SYS_P352 0 NULL rows...
SYS_P353 4072 rows...
SYS_P353 0 NULL rows...
SYS_P354 3916 rows...
SYS_P354 0 NULL rows...

PL/SQL procedure successfully completed.


for example -- they all went into the "first" partition in that case.

2) page 640 maybe?  it was raised because

   a) the index was unusable
   b) the query plan said "i need to read that index"

queries against partitioned tables are no more or less "index dependent" than queries against 
unpartitioned tables.  the same thing would happen against a "normal" table if its indexes were 
unusable and the query plan said "i want to use that index"

3) 65 is too small in all likelihood, unless you are really lucky.  It would be far too easy to 
have this happen:

ops$tkyte@ORA920> select count(distinct object_id) from t;

COUNT(DISTINCTOBJECT_ID)
------------------------
                      65

ops$tkyte@ORA920>
ops$tkyte@ORA920> declare
  2      l_cnt number;
  3  begin
  4      for x in ( select partition_name
  5                   from user_tab_partitions
  6                  where table_name = 'T' )
  7      loop
  8          execute immediate
  9          'select count(*)
 10             from t partition(' || x.partition_name || ')'
 11             into l_cnt;
 12          dbms_output.put_line
 13          ( x.partition_name || ' ' || l_cnt || ' rows...' );
 14      end loop;
 15  end;
 16  /
SYS_P395 65 rows...
SYS_P396 0 rows...
SYS_P397 0 rows...
SYS_P398 0 rows...
SYS_P399 0 rows...
SYS_P400 0 rows...
SYS_P401 0 rows...
SYS_P402 0 rows...

PL/SQL procedure successfully completed.


All 7 million rows could be in the same hash partition.  You would use range or list instead.
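With only 65 known cust_id values, list partitioning gives explicit control over which rows land where.  A sketch of that alternative (the value groupings are illustrative, and the DEFAULT catch-all partition requires 9iR2):

```sql
create table detail
( cust_id      number,
  cust_balance number
)
partition by list (cust_id)
( partition p_big_custs   values (1, 2, 3),    -- highest-volume customers
  partition p_other_custs values (4, 5, 6),
  partition p_rest        values (default)     -- catch-all for the remaining ids
);
```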