Hive重写表数据丢失风险记录

若在Hive中执行INSERT OVERWRITE重写同一个表的数据时,有可能会造成数据丢失。

如 INSERT OVERWRITE TABLE table_name SELECT * FROM table_name


一、新建一张分区表

create table test_chj_cols (id string, name string, age string) partitioned by (ds string) stored as textfile;

二、插入一条记录

insert into test_chj_cols partition (ds='20181224') values ('1','chj','18');

三、确认表数据及结构

> select * from test_chj_cols;
OK
test_chj_cols.id        test_chj_cols.name      test_chj_cols.age       test_chj_cols.ds
1       chj     18      20181224
 

> desc formatted test_chj_cols partition (ds='20181224');
OK
col_name        data_type       comment
# col_name              data_type               comment             
                 
id                      string                                      
name                    string                                      
age                     string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
ds                      string                                      
                 
# Detailed Partition Information                 
Partition Value:        [20181224]               
Database:               hduser05db               
Table:                  test_chj_cols            
CreateTime:             Mon Dec 24 19:35:28 CST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Location:               hdfs://bdphdp02/user/hive/warehouse/hduser05/hduser05db.db/test_chj_cols/ds=20181224     
Partition Parameters:            
        COLUMN_STATS_ACCURATE   true                
        numFiles                1                   
        numRows                 1                   
        rawDataSize             8                   
        totalSize               17                  
        transient_lastDdlTime   1545651329          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        serialization.format    1                   
Time taken: 0.099 seconds, Fetched: 37 row(s)

四、在表中间新增字段

alter table test_chj_cols replace columns (id string, name string, money string, age string);
 

> desc formatted test_chj_cols;
OK
col_name        data_type       comment
# col_name              data_type               comment             
                 
id                      string                                      
name                    string                                      
money                   string                                      
age                     string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
ds                      string                                      
                 
# Detailed Table Information             
Database:               hduser05db               
Owner:                  hadoop                   
CreateTime:             Mon Dec 24 19:34:46 CST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://bdphdp02/user/hive/warehouse/hduser05/hduser05db.db/test_chj_cols         
Table Type:             MANAGED_TABLE            
Table Parameters:                
        last_modified_by        hadoop              
        last_modified_time      1545651722          
        transient_lastDdlTime   1545651722          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        serialization.format    1                   
Time taken: 0.051 seconds, Fetched: 36 row(s)

五、重写数据

insert overwrite table test_chj_cols partition (ds='20181224') select id,name,age,name from

test_chj_cols;

六、age字段数据丢失

> select * from test_chj_cols;
OK
test_chj_cols.id        test_chj_cols.name      test_chj_cols.age       test_chj_cols.money     test_chj_cols.ds
1       chj     NULL    NULL    20181224
posted @ 2019-01-06 17:23  George_sz  Views(...)  Comments(...Edit  收藏