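Java and R have different strengths. Java is a mainstream development language that is well suited to building systems, while R excels at statistical analysis. Combining the two, with Java building the system and R serving as the analysis engine, yields an application with analytical capabilities.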

There are two ways to call R from Java code: Rserve and JRI.

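I. Rserve (remote communication mode)

Rserve is a TCP/IP server that transfers data over a binary protocol. It accepts remote connections, so client languages can call R.

1. Configuration

Rserve is published as a package on CRAN and can be installed directly with install.packages("Rserve"). To use it, load the package in the R console with library(Rserve) and then run Rserve() to start the server; clients can then connect to it.

In Eclipse, add the two jars REngine.jar and RserveEngine.jar as dependencies. They can be obtained (a) from the library\Rserve\java directory once Rserve has been installed in R, or (b) by downloading them from the web. For a Maven project, adding the following to pom.xml is enough.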

        <!-- https://mvnrepository.com/artifact/org.rosuda.REngine/Rserve -->
        <dependency>
            <groupId>org.rosuda.REngine</groupId>
            <artifactId>Rserve</artifactId>
            <version>1.8.1</version>
        </dependency>

2. Basic code

import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;

public class Temp {

    public static void main(String[] args) throws REXPMismatchException {
        RConnection connection = null;
        try {
            // Connect to the Rserve instance on localhost (default port 6311)
            connection = new RConnection();

            System.out.println("Mean");
            String vector = "c(1,2,3,4)";
            // Compute the mean inside R and keep it in the R variable meanVal
            connection.eval("meanVal<-mean(" + vector + ")");
            // Read the R variable back as a Java double
            double mean = connection.eval("meanVal").asDouble();
            System.out.println("the mean of given vector is=" + mean);

            System.out.println("Run script");
            // Load a script that defines myAdd; the path can also be written as D:\\\\myAdd.R
            connection.eval("source('D:/myAdd.R')");
            int num1 = 20;
            int num2 = 10;
            int sum = connection.eval("myAdd(" + num1 + "," + num2 + ")").asInteger();
            System.out.println("the sum=" + sum);
        } catch (RserveException e) {
            e.printStackTrace();
        } finally {
            // Close only if the connection was actually opened
            if (connection != null) {
                connection.close();
            }
        }
    }
}
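The example above builds each R expression by concatenating strings by hand. A minimal sketch of a helper for that step follows. It uses only the standard library, and the class and method names (RExpr, vector, call) are hypothetical, not part of the Rserve client API. It assumes finite numeric values.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the Rserve client library.
final class RExpr {

    private RExpr() {
    }

    // Turns {1, 2, 3, 4} into the R literal "c(1.0,2.0,3.0,4.0)".
    // Assumes finite values: Java prints infinity as "Infinity", which R does not parse.
    static String vector(double... values) {
        StringJoiner joiner = new StringJoiner(",", "c(", ")");
        for (double v : values) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }

    // Builds a call such as "myAdd(20,10)" from a function name and integer arguments.
    static String call(String function, int... args) {
        StringJoiner joiner = new StringJoiner(",", function + "(", ")");
        for (int a : args) {
            joiner.add(Integer.toString(a));
        }
        return joiner.toString();
    }
}
```

With such a helper, the mean computation could be written as connection.eval("meanVal<-mean(" + RExpr.vector(1, 2, 3, 4) + ")"), and the script call as connection.eval(RExpr.call("myAdd", 20, 10)).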

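3. Multithreading (Unix)

On Unix, a Java application can access a single Rserve instance from multiple threads. Rserve starts a new process for every new connection, and each connection, and therefore each process, has its own working directory.

Example:

Step 1: start Rserve

R CMD Rserve --RS-port 1000

The multithreaded Java code follows. In it, one Rserve instance serves four Java threads.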

package com.studytrails.rserve;
 
public class RServeMultiThreadClient {
    public static void main(String[] args) {
        RServeMultiThread thread1 = new RServeMultiThread(1000);
        RServeMultiThread thread2 = new RServeMultiThread(1000);
        RServeMultiThread thread3 = new RServeMultiThread(1000);
        RServeMultiThread thread4 = new RServeMultiThread(1000);
 
        thread1.start();
        thread2.start();
        thread3.start();
        thread4.start();
 
        try {
            thread1.join();
            thread2.join();
            thread3.join();
            thread4.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
 
package com.studytrails.rserve;
 
import org.rosuda.REngine.REXP;
import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
 
public class RServeMultiThread extends Thread {
 
    private int port = 0;
 
    public RServeMultiThread(int port) {
        this.port = port;
    }
 
    public void run() {
        RConnection c = null;
        try {
            // Each thread opens its own connection, so it gets its own R process
            c = new RConnection("localhost", port);
            c.eval("N = " + port);
            c.eval("x1=rnorm(N)");
            c.eval("x2 = 1 + x1 + rnorm(N)");
            c.eval("y <- 1 + x1 + x2");
            c.eval("df <- data.frame(y,x1,x2)");
            c.eval("fit <- lm(y ~ x1 + x2, data = df)");
            // First element of fit holds the coefficients; [2] is the one for x1
            REXP x1 = c.eval("fit[[1]][2]");
            System.out.println("Thread with port " + port + " result: "
                    + x1.asDouble());
            Thread.sleep(5000);
        } catch (RserveException e1) {
            e1.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (REXPMismatchException e) {
            e.printStackTrace();
        } finally {
            // Close the connection so the Rserve child process can exit
            if (c != null) {
                c.close();
            }
        }
    }
}

Reference: http://www.studytrails.com/r/rserve/r-java-multi-thread-using-rserve-unix/
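The four hand-managed threads could also be driven by a thread pool. The following is only a sketch of that idea, not the referenced article's method. So that it runs without Rserve, the R call is replaced by a placeholder function passed in as work; in real use that function would open its own RConnection to the given port, evaluate, and close the connection, as RServeMultiThread.run() does. The class name PooledRClients is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical helper for illustration; the Rserve call itself is supplied by the caller.
final class PooledRClients {

    private PooledRClients() {
    }

    // Runs `clients` tasks in parallel and returns their results in submission order.
    // `work` receives the Rserve port and stands in for the per-connection R work.
    static List<String> runAll(int clients, int port, IntFunction<String> work)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                Callable<String> task = () -> work.apply(port);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as PooledRClients.runAll(4, 1000, p -> "port " + p) runs four placeholder tasks. Collecting results through Future.get() also surfaces exceptions that a bare Thread would only print.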

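4. Starting Rserve automatically from Java code

In a real application it is not practical to start Rserve by hand, so the Java code needs to start it automatically. See the file library\Rserve\client\java\Rserve\test\StartRserve.java under the R installation directory, or https://github.com/s-u/REngine/blob/master/Rserve/test/StartRserve.java. The logic is as follows:

  (1) Run RConnection c = new RConnection(); and check whether it throws an exception to find out whether Rserve is already running;

  (2) If it is not running, start Rserve:

    a. On Windows, look up the R installation path in the registry, then start it;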

                Process rp = Runtime.getRuntime().exec("reg query HKLM\\Software\\R-core\\R");
                StreamHog regHog = new StreamHog(rp.getInputStream(), true);
                rp.waitFor();
                regHog.join();
                installPath = regHog.getInstallPath();
                launchRserve(installPath+"\\bin\\R.exe");

    b. On Unix, try the following paths (my guess: if typing R on the command line starts the interactive R console, launchRserve("R") will work)

            (launchRserve("R") || /* try some common unix locations of R */
            ((new File("/Library/Frameworks/R.framework/Resources/bin/R")).exists() && launchRserve("/Library/Frameworks/R.framework/Resources/bin/R")) ||
            ((new File("/usr/local/lib/R/bin/R")).exists() && launchRserve("/usr/local/lib/R/bin/R")) ||
            ((new File("/usr/lib/R/bin/R")).exists() && launchRserve("/usr/lib/R/bin/R")) ||
            ((new File("/usr/local/bin/R")).exists() && launchRserve("/usr/local/bin/R")) ||
            ((new File("/sw/bin/R")).exists() && launchRserve("/sw/bin/R")) ||
            ((new File("/usr/common/bin/R")).exists() && launchRserve("/usr/common/bin/R")) ||
            ((new File("/opt/bin/R")).exists() && launchRserve("/opt/bin/R"))
            );
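For the Windows branch in (a), the excerpt does not show how StreamHog extracts the path from the registry output. A minimal sketch of such parsing follows, assuming reg query prints a line of the form "InstallPath    REG_SZ    <path>". The class name RegistryOutput is hypothetical.

```java
// Hypothetical parser for illustration; it assumes the output format described above.
final class RegistryOutput {

    private RegistryOutput() {
    }

    // Returns the value of the InstallPath entry, or null if no such line exists.
    static String installPath(String regQueryOutput) {
        for (String line : regQueryOutput.split("\\r?\\n")) {
            int i = line.indexOf("InstallPath");
            if (i < 0) {
                continue; // not the line we are looking for
            }
            // Drop the key name, then the REG_SZ type marker, keeping only the path
            String rest = line.substring(i + "InstallPath".length()).trim();
            int j = rest.indexOf("REG_SZ");
            if (j >= 0) {
                rest = rest.substring(j + "REG_SZ".length()).trim();
            }
            return rest;
        }
        return null;
    }
}
```

The returned path would then be combined with \bin\R.exe, as in the launchRserve call shown in (a).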

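    The launchRserve function first starts Rserve with the R command, then runs RConnection c = new RConnection(); and checks whether it throws an exception to determine whether the startup succeeded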

            if (osname != null && osname.length() >= 7 && osname.substring(0,7).equals("Windows")) {
                isWindows = true; /* Windows startup */
                p = Runtime.getRuntime().exec("\""+cmd+"\" -e \"library(Rserve);Rserve("+(debug?"TRUE":"FALSE")+",args='"+rsrvargs+"')\" "+rargs);
            } else /* unix startup */
                p = Runtime.getRuntime().exec(new String[] {
                                  "/bin/sh", "-c",
                                  "echo 'library(Rserve);Rserve("+(debug?"TRUE":"FALSE")+",args=\""+rsrvargs+"\")'|"+cmd+" "+rargs
                                  });
        int attempts = 5; /* try up to 5 times before giving up. We can be conservative here, because at this point the process execution itself was successful and the start up is usually asynchronous */
        while (attempts > 0) {
            try {
                RConnection c = new RConnection();
                System.out.println("Rserve is running.");
                c.close();
                return true;
            } catch (Exception e2) {
                System.out.println("Try failed with: "+e2.getMessage());
            }
            /* a safety sleep just in case the start up is delayed or asynchronous */
            try { Thread.sleep(500); } catch (InterruptedException ix) { };
            attempts--;
        }
        return false;
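The retry loop above probes with RConnection. As a variation that needs no Rserve jars, the same wait-and-retry idea can be sketched with a plain TCP probe. This is a different technique from the one StartRserve.java uses: an open port only shows that something is listening, not that it is Rserve. The class name PortProbe and the 500 ms connect timeout are assumptions; 6311 is Rserve's default port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; it checks the TCP port, not the Rserve protocol.
final class PortProbe {

    private PortProbe() {
    }

    // True if a TCP connection to host:port can be opened within timeoutMs.
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Polls the port up to `attempts` times, sleeping sleepMs after each failed try.
    static boolean waitUntilOpen(String host, int port, int attempts, long sleepMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (isOpen(host, port, 500)) {
                return true;
            }
            Thread.sleep(sleepMs);
        }
        return false;
    }
}
```

For example, PortProbe.waitUntilOpen("127.0.0.1", 6311, 5, 500) mirrors the five attempts and 500 ms sleep of the loop above.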

5. Limitations of Rserve on Windows

  • no parallel connections are supported, subsequent connections share the same namespace
  • sessions are not supported - this is a consequence of the fact that parallel connections are not supported

Since the Windows operating system doesn't support fork method for spawning copies of a process, it is not possible to initialize R and use initialized copies for all subsequent connections in parallel. Therefore the Rserve for Windows supports no concurrent connections. This implies that all subsequent connections share the same namespace and sessions (as in >=0.4 version on unix) cannot be supported. It is still possible to start multiple Rserves to handle multiple connections (just make sure you use different port for each one). 
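The workaround in the last sentence, several Rserve instances on different ports, needs some way for the Java side to pick a port. A minimal round-robin sketch follows. The class name RservePorts and the ports in the usage note are assumptions, and each instance still has to be started on its port separately.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; it only hands out port numbers in rotation.
final class RservePorts {

    private final int[] ports;
    private final AtomicInteger next = new AtomicInteger();

    RservePorts(int... ports) {
        if (ports.length == 0) {
            throw new IllegalArgumentException("at least one port is required");
        }
        this.ports = ports.clone();
    }

    // Returns the configured ports in rotation; safe to call from several threads.
    int nextPort() {
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), ports.length);
        return ports[i];
    }
}
```

With RservePorts pool = new RservePorts(6311, 6312, 6313), a caller would connect with new RConnection("localhost", pool.nextPort()). Rotation spreads connections across instances but does not make them exclusive: with more concurrent callers than instances, two callers can still share one namespace.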

Reference: http://rforge.net/Rserve/rserve-win.html

 

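II. JRI (embedded mode)

JRI, short for Java/R Interface, takes a completely different approach: it calls R's dynamic link library directly in order to use R functions and other facilities.

For usage, see https://www.cnblogs.com/tomcattd/p/3369938.html

I have not tried this approach in practice, so I will not go into detail.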

 

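Excerpts comparing the pros and cons of the two approaches

Excerpt 1

  1.1 JRI (embedded mode): in my experience its biggest advantage is good support for Chinese text. However, JRI can easily bring down the whole system. If an exception or error occurs while Java is calling R, it is usually fatal: it crashes the Java virtual machine and with it the entire system, which is a nightmare.

  1.2 Rserve (remote communication mode): the biggest advantage of this mode is that the Java web project does not have to manage the R runtime and communicates with it directly over TCP/IP. Its big drawback is weak support for Chinese text, especially on Windows, where Chinese is essentially unsupported; on Linux the support seems somewhat better. Without full Chinese support, it is unusable for systems whose inputs or outputs contain Chinese.

  Summary: I first used JRI in the project, but after deployment it crashed frequently, so I eventually abandoned JRI and switched to remote calls through Rserve. Rserve does not support Chinese, but the parameters passed by the project contain no Chinese, and R writes its results to the database after processing and returns only a status value to the web server.

Excerpt 2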

Instead of RServe, you can use JRI, that is shipped with rJava package.

In my opinion JRI is better than RServe, because instead of creating a separate process it uses native calls to integrate Java and R.

With JRI you don't have to worry about ports, connections, watchdogs, etc... The calls to R are done using an operating system library (libjri).

The methods are pretty similar to RServe, and you can still use REXP objects.

 

posted on 2017-11-18 17:01  guoxiang