Writing to HBase via Spark: Task not serializable
I'm trying to write some simple data to HBase (0.96.0-hadoop2) using Spark 1.0, but I keep getting serialization problems. Here is the relevant code:
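It boils down to creating an HTable on the driver and calling table.put inside rdd.foreach, roughly like this (a minimal reconstruction, not the exact snippet; the table name "table1", column family "cf" and column "col1" are just placeholders):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseWrite {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseWrite"))
    val hbaseConf = HBaseConfiguration.create()
    val table = new HTable(hbaseConf, "table1") // created on the driver

    sc.parallelize(Array(1, 2, 3, 4, 5)).foreach { value =>
      // the closure captures `table`, which is not serializable
      val put = new Put(Bytes.toBytes(value.toString))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(value.toString))
      table.put(put)
    }
  }
}
```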
Running the code results in: org.apache.spark.SparkException: Task not serializable
Replacing the foreach with map doesn't crash, but it doesn't write anything either. Any help will be greatly appreciated.
The class HBaseConfiguration represents a pool of connections to the HBase servers. Obviously, it can't be serialized and sent to the worker nodes, and since HTable uses this pool to communicate with the HBase servers, it can't be serialized either. Basically, there are three ways to handle this problem:

Open a connection on each of the worker nodes. Note the use of the foreachPartition method in the sketch below: the connection is opened inside the partition function, on the worker, so nothing HBase-specific has to be serialized on the driver.
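A minimal sketch of this approach, reusing the placeholder table "table1" and column family "cf" from the question (sc is the SparkContext):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

val tableName = "table1"

sc.parallelize(Array(1, 2, 3, 4, 5)).foreachPartition { iter =>
  // Runs on the worker: the configuration and the table are created
  // locally, so nothing HBase-related crosses the serialization boundary.
  val conf = HBaseConfiguration.create()
  val table = new HTable(conf, tableName)
  iter.foreach { value =>
    val put = new Put(Bytes.toBytes(value.toString))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(value.toString))
    table.put(put)
  }
  table.close()
}
```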
Note that each of the worker nodes must have access to the HBase servers and must have the required jars preinstalled or shipped with the job (for example via spark-submit --jars).

Also note that since the connection pool is opened for each partition, it would be a good idea to reduce the number of partitions to roughly the number of worker nodes (with the coalesce function). It's also possible to share a single connection on each worker node, but that isn't as trivial.

Serialize all data to a single box and write it to HBase. It's possible to write all the data from an RDD from a single machine, even if the data doesn't fit in memory, by pulling it to the driver partition by partition (for example with RDD.toLocalIterator). The details are explained in this answer: Spark: Best practice for retrieving big data from RDD to local machine. Of course, it would be slower than distributed writing, but it's simple, doesn't bring painful serialization issues, and might be the best approach if the data size is reasonable.

Use HadoopOutputFormat. It's possible to create a custom Hadoop OutputFormat for HBase or use an existing one, such as HBase's TableOutputFormat (see the sketch at the end of this answer). I'm not sure whether something fits your needs exactly, but Google should help here.

P.S. By the way, the map call doesn't crash because it never gets evaluated: RDDs are lazy, and transformations only run when an action forces them. If you appended an action, for example myRDD.map(...).saveAsTextFile(...), it would crash in the same way.
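As for the HadoopOutputFormat option above, here is a minimal sketch using the existing TableOutputFormat from HBase's old mapred API together with saveAsHadoopDataset (again with the placeholder table "table1", column family "cf" and column "col1"; sc is the SparkContext; this is an illustration of the idea, not a tested drop-in solution):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.0

// Configure the output format on the driver; the JobConf is shipped
// to the workers by Spark, not the HBase connection itself.
val jobConf = new JobConf(HBaseConfiguration.create())
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "table1")

sc.parallelize(Array(1, 2, 3, 4, 5))
  .map { value =>
    // TableOutputFormat expects (ImmutableBytesWritable, Put) pairs
    val put = new Put(Bytes.toBytes(value.toString))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(value.toString))
    (new ImmutableBytesWritable(Bytes.toBytes(value.toString)), put)
  }
  .saveAsHadoopDataset(jobConf)
```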