I wrote a Python program that imports a file into Hive with the LOAD command. At first the code looked like this:
def loadToHive(bakFilePath, tbName):
    try:
        transport = TSocket.TSocket(HIVE_SERVER, HIVE_PORT)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute("LOAD DATA LOCAL INPATH '" + bakFilePath + "' INTO TABLE " + tbName)
        print "LOAD DATA LOCAL INPATH '" + bakFilePath + "' INTO TABLE " + tbName
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)


def test():
    try:
        bak = file('tmp.bak', 'w')  # the file has to be opened for some processing first
        ........  # process the file
        loadToHive('tmp.bak', 'test')  # load into the test table
    except IOError as err:
        print('File Error: ' + str(err))
    finally:
        bak.close()
When I ran it, Hive printed:
Copying data from file:****/tmp.bak
Copying file: file:****/tmp.bak
Loading data to table default.test100
OK
In other words, the load was reported as successful. But when I queried the test table in Hive, nothing had been loaded at all!
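Later I found that the file has to be closed before calling loadToHive('tmp.bak', 'test'). After moving bak.close() ahead of loadToHive('tmp.bak', 'test'), the import succeeded. Presumably the data written to bak was still sitting in Python's write buffer, so the file on disk was empty or incomplete at the moment Hive copied it.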
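The effect of the ordering can be reproduced without Hive. The following is a minimal sketch, written for Python 3 rather than the Python 2 of the original code. The names write_then_load, fake_load and demo are hypothetical and not part of the original program; fake_load only stands in for loadToHive and reports how many bytes are visible on disk when the "load" happens.

```python
import os
import tempfile


def write_then_load(path, lines, load):
    """Write lines to path, close the file, then hand the path to load."""
    # Leaving the with block closes (and therefore flushes) the file,
    # so everything written is on disk before load() runs.
    with open(path, "w") as bak:
        for line in lines:
            bak.write(line + "\n")
    return load(path)


def fake_load(path):
    """Hypothetical stand-in for loadToHive: bytes visible on disk right now."""
    return os.path.getsize(path)


def demo():
    tmp_dir = tempfile.mkdtemp()

    # Original order: load while the file is still open and unflushed.
    early_path = os.path.join(tmp_dir, "early.bak")
    bak = open(early_path, "w")
    bak.write("1\talice\n")
    seen_early = fake_load(early_path)
    bak.close()

    # Fixed order: close first, then load.
    late_path = os.path.join(tmp_dir, "late.bak")
    seen_late = write_then_load(late_path, ["1\talice"], fake_load)

    return seen_early, seen_late
```

With default buffering, a small write like this normally stays in memory until the file is flushed or closed, so the early "load" sees fewer bytes than the late one. In the original test() function, the same effect can be had by moving bak.close() before loadToHive, as described above; calling bak.flush() before the load should also work.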
So it seems Hive's way of deciding whether a load succeeded has a problem.
