Calling C++ through JNI on Hadoop

Note: this article is mainly based on http://www.linuxidc.com/Linux/2012-12/75535.htm. If it infringes any rights, please let me know and I will remove it.

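I worked through the reference above and ran into a few problems, plus a few points worth watching out for. They are described in detail below.

The attempt is split into two stages:
Stage 1: get a standalone JNI program running on Linux, i.e. Java calling C++.
Stage 2: get the same program running on Hadoop.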

Stage 1:

1. Create a project directory jni1, and create bin and src directories inside it.

[hadoop@Master jni1]$ mkdir bin
[hadoop@Master jni1]$ mkdir src

Create FakeSegmentForJni.java in src:

package FakeSegmentForJni ;  
  
public class FakeSegmentForJni {  
    public static native String SegmentALine (String line);  
    static  
    {  
        System.loadLibrary("FakeSegmentForJni");  
    }  
}

2. Go back to the parent directory and compile the FakeSegmentForJni class with javac to produce the .class file.

[hadoop@Master jni1]$ javac -d ./bin ./src/*.java

A new FakeSegmentForJni directory appears under bin, containing FakeSegmentForJni.class.

3. Starting from FakeSegmentForJni.class, generate the header for the C++ function with javah. Run this from the bin directory.

[hadoop@Master bin]$ javah -jni -classpath . FakeSegmentForJni.FakeSegmentForJni

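Here -classpath names the directory that contains the .class files, and "." means the current directory (note: the dot has a space on each side!). In the last argument, the first "FakeSegmentForJni" is the package name and the second is the class name. After running the command, the C++ header FakeSegmentForJni_FakeSegmentForJni.h appears in the current directory. (On JDK 10 and later javah has been removed; javac -h generates the same header.) Opening the file shows that the native method of the Java class has been turned into a C++ declaration: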

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class FakeSegmentForJni_FakeSegmentForJni */

#ifndef _Included_FakeSegmentForJni_FakeSegmentForJni
#define _Included_FakeSegmentForJni_FakeSegmentForJni
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     FakeSegmentForJni_FakeSegmentForJni
 * Method:    SegmentALine
 * Signature: (Ljava/lang/String;)Ljava/lang/String;
 */
JNIEXPORT jstring JNICALL Java_FakeSegmentForJni_FakeSegmentForJni_SegmentALine
  (JNIEnv *, jclass, jstring);

#ifdef __cplusplus
}
#endif
#endif

The first line says not to edit this file.
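
The function name in the header is derived mechanically from the package, class and method names. The following is a minimal sketch of that derivation, based on the name-mangling rules in the JNI specification. JniNames and its methods are hypothetical helpers written for illustration, not part of the JDK, and the sketch ignores the extra signature suffix that overloaded native methods receive.

```java
// Sketch only: JniNames is a hypothetical helper, not a JDK class.
final class JniNames {

    // Escape one name following the JNI mangling rules:
    // '.' or '/' -> "_", '_' -> "_1", ';' -> "_2", '[' -> "_3",
    // any other non-alphanumeric character -> "_0xxxx" (four lowercase hex digits).
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum) {
                sb.append(c);
            } else if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else {
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Symbol the JVM looks for when a non-overloaded native method is called.
    static String nativeSymbol(String fullyQualifiedClass, String method) {
        return "Java_" + mangle(fullyQualifiedClass) + "_" + mangle(method);
    }

    public static void main(String[] args) {
        // The class from this article.
        System.out.println(
                nativeSymbol("FakeSegmentForJni.FakeSegmentForJni", "SegmentALine"));
        // Same class moved to a made-up package: the symbol changes with it.
        System.out.println(
                nativeSymbol("com.example.FakeSegmentForJni", "SegmentALine"));
    }
}
```

The first line printed is the name that appears in the generated header. The second shows why a library built for one package cannot serve a class in another: the JVM would look for a different symbol.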

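4. With the C++ header generated, the next step is the .cpp file. The reason for using C++ at all is to reuse existing, mature code, so this step is usually not feature development but a wrapper around code you already have. If it were pure feature development, you would simply write it in Java.

Create the .cpp file, still in the bin directory, as follows (note: the function signature in the .cpp must match the one in the header exactly!):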

#include <jni.h>
#include <string>
#include "FakeSegmentForJni_FakeSegmentForJni.h"

/*
 * Class:     FakeSegmentForJni_FakeSegmentForJni
 * Method:    SegmentALine
 * Signature: (Ljava/lang/String;)Ljava/lang/String;
 */
JNIEXPORT jstring JNICALL Java_FakeSegmentForJni_FakeSegmentForJni_SegmentALine
  (JNIEnv *env, jclass cls, jstring line)
{
    // Borrow the Java string as modified UTF-8. The second argument is an
    // optional jboolean* "isCopy" out-parameter; NULL means we do not need it.
    const char *str = env->GetStringUTFChars(line, NULL);
    if (str == NULL)
        return NULL;  // an OutOfMemoryError is already pending in the JVM
    // Copy into a std::string so that long input cannot overflow a fixed-size buffer.
    std::string buf(str);
    env->ReleaseStringUTFChars(line, str);
    buf += "--copy that\n";
    return env->NewStringUTF(buf.c_str());
}

[hadoop@Master bin]$ ll
total 12
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 10:48 FakeSegmentForJni
-rw-rw-r-- 1 hadoop hadoop 700 Nov 28 11:01 FakeSegmentForJni_FakeSegmentForJni.cpp
-rw-rw-r-- 1 hadoop hadoop 562 Nov 28 10:52 FakeSegmentForJni_FakeSegmentForJni.h

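5. Compile the C++ shared library in the native environment. Why stress "in the native environment"? It means exactly what it says: if you use JNI on Windows, compile FakeSegmentForJni_FakeSegmentForJni.cpp on Windows to produce a .dll; on Linux, compile it on Linux (make sure you know whether it is 32-bit or 64-bit) to produce a .so. The reason is that shared libraries are loaded differently on every system, so the library has to match the platform the JVM runs on.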

[hadoop@Master bin]$ g++ -I/usr/java/jdk1.6.0_31/include -I/usr/java/jdk1.6.0_31/include/linux FakeSegmentForJni_FakeSegmentForJni.cpp -fPIC -shared -o libFakeSegmentForJni.so

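The command is long but easy to read. "-I" adds a directory to the header search path. Normally the system paths are already set up for g++, but jni.h ships with the JDK rather than with the C++ toolchain, so g++ cannot find it unless you tell the compiler where it is.

Here jni.h is in /usr/java/jdk1.6.0_31/include. /usr/java/jdk1.6.0_31/include/linux has to be added as well, because jni.h includes jni_md.h from that directory; without it the compile fails.

The library file name must also start with lib. System.loadLibrary("FakeSegmentForJni") looks for libFakeSegmentForJni.so on Linux, so without the prefix the library is not found even when the library path is correct.

This produces libFakeSegmentForJni.so: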

[hadoop@Master bin]$ ll
total 20
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 10:48 FakeSegmentForJni
-rw-rw-r-- 1 hadoop hadoop 700 Nov 28 11:01 FakeSegmentForJni_FakeSegmentForJni.cpp
-rw-rw-r-- 1 hadoop hadoop 562 Nov 28 10:52 FakeSegmentForJni_FakeSegmentForJni.h
-rwxrwxr-x 1 hadoop hadoop 7703 Nov 28 11:03 libFakeSegmentForJni.so
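
As a quick check of the two points above (platform and the lib prefix), the JVM can report what it expects. This is a small sketch using only standard JDK calls; NativeLibInfo is a made-up class name used for illustration.

```java
// Sketch only: NativeLibInfo is a hypothetical class name.
final class NativeLibInfo {

    // File name that System.loadLibrary(name) searches for on this platform.
    static String fileNameFor(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        // Operating system and architecture of the running JVM; the shared
        // library has to be built for the same combination.
        System.out.println("os.name = " + System.getProperty("os.name"));
        System.out.println("os.arch = " + System.getProperty("os.arch"));

        // On Linux this is libFakeSegmentForJni.so, on Windows FakeSegmentForJni.dll.
        System.out.println("file name = " + fileNameFor("FakeSegmentForJni"));

        // Directories searched by System.loadLibrary; -Djava.library.path overrides it.
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
    }
}
```

Running it on the machine that will execute the Java program shows which file name and which architecture the .so has to match.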

6. Call libFakeSegmentForJni.so from a local Java program. Go back to the src directory and create TestFakeSegmentForJni.java:

package FakeSegmentForJni;  
  
/**
 * This class verifies the JNI setup.
 * It calls the native function declared in FakeSegmentForJni.java.
 */

public class TestFakeSegmentForJni {  
      
    public static void main(String[] args) throws Exception {  
  
        System.out.println ("In this project, we test jni!\n");  
          
        String s = FakeSegmentForJni.SegmentALine("now we test FakeSegmentForJni");  
        System.out.print(s);  
          
    }
  
}

Go back to the jni1 directory and run [hadoop@Master jni1]$ javac -d ./bin ./src/*.java. A TestFakeSegmentForJni.class file appears in jni1/bin/FakeSegmentForJni.

7. Package it as a jar.

Go back to the bin directory and run [hadoop@Master bin]$ jar -cvf TestFakeSegmentForJni.jar FakeSegmentForJni. The jar command packs every class file under the FakeSegmentForJni directory into one jar, so a .jar file appears in bin.

8. Run it.

[hadoop@Master bin]$ java -Djava.library.path='.' -cp TestFakeSegmentForJni.jar FakeSegmentForJni.TestFakeSegmentForJni

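Running with -jar instead fails with "no main manifest attribute, in TestFakeSegmentForJni.jar". The cause is that jar -cvf writes a manifest without a Main-Class entry, so java -jar does not know which class to start. Passing the jar with -cp and naming the main class explicitly works; alternatively, the e option of the jar command can record the entry point in the manifest when the jar is built.

The output is: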

[hadoop@Master bin]$ java -Djava.library.path='.' -cp TestFakeSegmentForJni.jar FakeSegmentForJni.TestFakeSegmentForJni
In this project, we test jni!

now we test FakeSegmentForJni--copy that
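
The -jar error from step 8 comes down to one manifest attribute. The sketch below builds two empty jars in memory, one without and one with Main-Class, and reads the attribute back with the standard java.util.jar API. ManifestDemo and its methods are hypothetical names used for illustration; the jars contain nothing but a manifest.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch only: ManifestDemo is a hypothetical class name.
final class ManifestDemo {

    // Build an in-memory jar that holds only a manifest.
    // mainClass == null mimics a jar created without an entry point.
    static byte[] buildJar(String mainClass) {
        Manifest manifest = new Manifest();
        // Manifest-Version is required, otherwise no attributes are written.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(bytes, manifest)) {
            out.flush(); // no further entries; closing the stream finishes the jar
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Return the Main-Class attribute, or null when the manifest has none.
    static String mainClassOf(byte[] jar) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jar))) {
            Manifest manifest = in.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without entry point: " + mainClassOf(buildJar(null)));
        System.out.println("with entry point:    "
                + mainClassOf(buildJar("FakeSegmentForJni.TestFakeSegmentForJni")));
    }
}
```

A jar in the first state can only be started with -cp plus an explicit class name, which is what step 8 does.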

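Stage 2:

1. Create a MapReduce project in Eclipse. Prerequisites: Hadoop is installed and running, and the Eclipse development environment is set up, including the plugin needed to develop MapReduce projects. For details see section 4 of http://www.cnblogs.com/hequn/articles/3438301.html.

Create the classes FakeSegmentForJni.java and JniTest.java. The package and class name of FakeSegmentForJni must be the same as when the library was built. Otherwise the call into the library fails even though the library itself loads correctly. The reason is that the exported function name, Java_FakeSegmentForJni_FakeSegmentForJni_SegmentALine, encodes the package and class, and the JVM looks up the native method by that name. This is slightly inconvenient: changing the package means regenerating the header and rebuilding the library.

For the exact paths, see the figure below: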

FakeSegmentForJni.java:

package FakeSegmentForJni;

public class FakeSegmentForJni { 
    public static native String SegmentALine (String line); 
    static 
    { 
        System.loadLibrary("FakeSegmentForJni"); 
    } 
}

JniTest.java:

package FakeSegmentForJni;


import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class JniTest {
	
	public static class MapTestJni extends Mapper<Object, Text, Text, Text> {  
        
        protected String s;  
        protected void setup(Context context) throws IOException, InterruptedException  
        {  
 
            s = FakeSegmentForJni.SegmentALine("jni-value");  
        }  
          
        protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {  

            context.write(value, new Text(s.toString()));  
        }  
    } 
	
	
	public static class ReduceTestJni extends Reducer<Text, Text, Text, Text> {  
        
        protected void reduce(Text key, Iterable<Text> values, Context context)  
        throws IOException, InterruptedException {  
              
            String outString = "";  
            for (Text value: values)  
            {  
                outString = value.toString();  
                context.write(key, new Text(outString));
            }  
              
              
        }  
    } 
	
	public void runTestJni (String[] args) throws Exception {  
        
        //  the configuration  
        Configuration conf = new Configuration();  
     //   conf.set("mapred.job.tracker", "192.168.178.92:9001");
     //   conf.addResource(new Path("E:\\HadoopWorkPlat\\hadoop-1.1.2\\conf\\hdfs-site.xml"));
     //  String[] ars = { "/user/hadoop/0", "/user/hadoop/colorToGrayOutput2" };
        
        GenericOptionsParser goparser = new GenericOptionsParser(conf, args);  
        String otherargs [] = goparser.getRemainingArgs(); 
        
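        // Use the parsed arguments: args still contains "-files <path>",
        // so args[2] is not the output path.
        removeDir(new Path(otherargs[2]), conf);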
          
        // the job  
        Job job;  
        job = new Job(conf, "@here-TestFakeSegmentForJni-hadoopJni");  
        job.setJarByClass(JniTest.class);  
          
        // the mapper  
        job.setMapperClass(MapTestJni.class);  
        job.setMapOutputKeyClass(Text.class);  
        job.setMapOutputValueClass(Text.class);  
          
        // the reducer  
        job.setReducerClass(ReduceTestJni.class);  
        job.setOutputKeyClass(Text.class);  
        job.setOutputValueClass(Text.class);  
        job.setNumReduceTasks(1);  
          
        // the path  
        FileInputFormat.addInputPath(job, new Path(otherargs[1]));  
        FileOutputFormat.setOutputPath(job, new Path(otherargs[2]));  
          
        job.waitForCompletion(true);  
    } 
	
	
	public static void main(String[] args) throws Exception {  
		  
        System.out.println ("In this project, we test jni!\n");  
          
        // test jni on linux local  
        /*String s = FakeSegmentForJni.SegmentALine("now we test FakeSegmentForJni"); 
        System.out.print(s);*/  
          
        // test jni on hadoop  
        new JniTest().runTestJni(args);  
          
    } // main 
	
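	// Delete the output directory if it already exists, so the job can be rerun.
	public static void removeDir(Path outputPath, Configuration conf)
			throws IOException {
		FileSystem fs = FileSystem.get(conf);
		if (fs.exists(outputPath)) {
			// true = delete recursively; the one-argument delete() is deprecated
			fs.delete(outputPath, true);
		}
	}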

}

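In runTestJni(), the two GenericOptionsParser lines are the key. They parse and apply the generic options passed on the hadoop command line. Together with the command shown later, they distribute the shared library to the task nodes, into the same directory the job's jar runs from.

Hadoop programs are usually run as a jar, and the settings they need (input path, output path, mapper, combiner, reducer and so on) can all be set through Job, so no extra command-line options are required. That keeps the command line short, which is welcome because hadoop command lines tend to be long. By contrast, running a program through C++ streaming requires options such as -input and -output.

In a jar, besides using Job, you can also let GenericOptionsParser pick up the generic options it understands from the command line, as long as the command line supplies them. For the input and output paths there is no need to. One option that matters here is -files: the files listed after it (comma-separated if there are several) are distributed to the task nodes together with the jar. That is exactly what is needed to ship the shared library.

The statement goparser.getRemainingArgs() returns the arguments that are left after GenericOptionsParser has consumed the generic options. In our case these are the name of the class containing main, the input path and the output path; see the command line below.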
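
To make the argument positions concrete, here is a simplified stand-in for what getRemainingArgs() returns. It is not Hadoop's implementation: RemainingArgsDemo and remainingArgs are hypothetical, the sketch only understands leading "-files <list>" pairs, and it assumes the whole argument list reaches main unchanged.

```java
import java.util.Arrays;

// Sketch only: a hypothetical stand-in, not Hadoop's GenericOptionsParser.
final class RemainingArgsDemo {

    // Skip leading "-files <list>" pairs and return everything after them.
    static String[] remainingArgs(String[] args) {
        int i = 0;
        while (i + 1 < args.length && args[i].equals("-files")) {
            i += 2; // skip the option and its value
        }
        return Arrays.copyOfRange(args, i, args.length);
    }

    public static void main(String[] unused) {
        // The arguments from this article's command line, after the jar name.
        String[] args = {
            "-files", "/home/hadoop/Documents/java/jni1/bin/libFakeSegmentForJni.so",
            "FakeSegmentForJni.TestFakeSegmentForJni", "input", "output"
        };
        String[] otherargs = remainingArgs(args);

        System.out.println("args[2]      = " + args[2]);      // the class name, not a path
        System.out.println("otherargs[1] = " + otherargs[1]); // input
        System.out.println("otherargs[2] = " + otherargs[2]); // output
    }
}
```

The point of the sketch is the indexing: paths have to be read from the remaining arguments, because the raw argument array is shifted by the generic options in front.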

2. Package the project as a jar: Export -> Runnable JAR file -> launch configuration (choose the corresponding project) -> Export destination -> Extract required libraries into generated JAR.

3. Upload the generated jar to Master. Here the jar is JniTest3.jar.

[hadoop@Master ~]$ ll -a | grep JniTest3.jar
-rw-r--r-- 1 hadoop hadoop 27495223 Nov 28 13:14 JniTest3.jar

4. Run it.

[hadoop@Master ~]$ hadoop jar JniTest3.jar -files /home/hadoop/Documents/java/jni1/bin/libFakeSegmentForJni.so FakeSegmentForJni.TestFakeSegmentForJni input output

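The first argument after hadoop jar must be the packaged jar, with its path if needed. In this example that is JniTest3.jar.
Because the driver uses GenericOptionsParser, the generic options must come directly after the jar. Here the option is "-files /home/hadoop/Documents/java/jni1/bin/libFakeSegmentForJni.so", which distributes the local file libFakeSegmentForJni.so along with the jar.
The rest is straightforward: the name of the class containing main, the input path and the output path. Because the jar was exported as a Runnable JAR, its manifest already names the main class, so hadoop jar passes everything after the jar name to main unchanged. The class name on the command line is not used to pick the class; it only occupies position 0 of the remaining arguments, which is why the code reads the paths from otherargs[1] and otherargs[2].

5. Result.

Each line of part-r-00000 in output has the form "key jni-value--copy that", where key is one line of the input.

At this point the call has succeeded.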

FAQ:

1. Returning an int[]. The main code is as follows:

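/* Needs <jni.h> and <stdlib.h>. The (*env)-> call style is C, so compile this as C. */
JNIEXPORT jintArray JNICALL Java_ArrayTest_initIntArray(JNIEnv *env, jclass cls, jint size)
{
 jintArray result;
 jint *fill;
 jint i;
 if (size < 0) {
     return NULL; /* invalid length */
 }
 result = (*env)->NewIntArray(env, size);
 if (result == NULL) {
     return NULL; /* out of memory error thrown */
 }
 /* temporary native buffer, sized to match the array so a large size cannot overflow it */
 fill = (jint *)calloc(size > 0 ? (size_t)size : 1, sizeof(jint));
 if (fill == NULL) {
     return NULL; /* native allocation failed */
 }
 for (i = 0; i < size; i++) {
     fill[i] = 0; /* put whatever logic you want to populate the values here */
 }
 /* copy from the native buffer into the Java array */
 (*env)->SetIntArrayRegion(env, result, 0, size, fill);
 free(fill);
 return result;
}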

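Without the temporary jint buffer (fill), the returned array never came back correctly, which at first looked like the result of NewIntArray() being destroyed before it reached Java. The actual reason is that a jintArray is an opaque reference to a Java array, not a pointer to native memory, so its elements cannot be written directly. Put the values into the native jint buffer first, then copy them into the jintArray with SetIntArrayRegion.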

posted on 2013-11-28 10:38 hequn8128
