Wednesday 19 August 2015

Integrate R with Java using Rserve

Introduction

Building machine-learning-based analytics applications requires a range of technologies. Java is a great language for building enterprise solutions, but it is weak on the analytics front. Languages like R fill this gap with a rich set of machine learning and statistical libraries. By integrating the two technologies we can create high-end machine-learning-based applications. In the previous post, Integrate R with Java using rJava, I explained in detail the benefits of integrating R with Java and the application architectures where this kind of integration is required.
There are two main packages to integrate R with Java:
  1. rJava
  2. Rserve
In the previous post we discussed how to integrate R with Java using the rJava library. In this post we will discuss the differences between the rJava and Rserve packages, and then walk step by step through how to integrate R with Java using the Rserve library.

Difference between Rserve and rJava packages

The main differences between rJava and Rserve can be discussed under the following headings:
  1. Operating in a server mode.
  2. Ease of use.
Operating in a server mode refers to whether the library runs as a server to which a client program connects to get its work done, or whether it is used as an API that is called directly from inside the program, with no client-server communication. By this criterion, rJava is used as an API: the program calls rJava directly to execute R code. Rserve, on the other hand, works in a client-server manner. You start an instance of the Rserve server, and clients communicate with it over TCP/IP (for more information about the Rserve library, refer to Rserve).
Note: Basically, rJava provides low-level (system-level) communication with R, while Rserve communicates over TCP/IP.
Ease of use is a subjective criterion, but in my experience Rserve is easier to work with than rJava. As we saw in the previous tutorial, configuring rJava requires setting various paths and configuring various DLLs. With Rserve you simply add the package to your R setup and use it directly from your Java code.
Now that we have compared rJava and Rserve, we will move on to the technicalities of integrating R with Java using Rserve.

How to Integrate R with Java using Rserve package?

The components and their versions used for this tutorial are mentioned below:
  1. Operating System: Windows 7, 32 bit.
  2. JDK: Version 1.7 or above.
  3. Eclipse: Luna.
  4. R Workbench: This is the GUI used to run R scripts. We are using R 3.1.3; you could download the latest version from the same link.

Configuring R

Simply install the R workbench downloaded above. Try to install it at a location other than the C:\ drive, as that drive has some permission issues. For this tutorial, the R workbench is installed in the D:\ProgramFiles\R directory.

Integration Steps

Step-1 (Installing Rserve package)
Open your R workbench and enter the following command on your R console:
    install.packages("Rserve")
A window with the header CRAN mirror will open, asking you to select the mirror from which you want to install the package. For this tutorial we chose USA (KS), but you could select any other mirror. After selecting the mirror, press OK. R will start installing the package, and once it is installed your R console will show the message in Fig. 1.
Fig. 1
Step-2 (Starting Rserve server)
Once you have installed the Rserve package, you need to start the server. First load the package into your current R instance by typing the following command on your R console:
    library(Rserve)
Then type the following command to start the Rserve server on the default port 6311:
    Rserve()
Your console will look something like Fig. 2.
Fig. 2
That is all you need to run an instance of Rserve server.
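Before moving on to the client, it can be useful to confirm from the Java side that something is actually accepting connections on port 6311. The following is a minimal sketch using only the JDK; the class RservePortCheck and its method isListening are hypothetical names introduced for illustration and are not part of Rserve. It only checks that a TCP connection can be opened, not that the listener is really Rserve.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Rserve): checks whether a TCP port accepts connections.
class RservePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing is accepting connections there.
            return false;
        }
    }

    public static void main(String[] args) {
        // 6311 is the default Rserve port used in Step-2.
        boolean up = isListening("localhost", 6311, 2000);
        System.out.println(up
                ? "Something is listening on port 6311"
                : "Nothing is listening on port 6311");
    }
}
```

If the check reports that nothing is listening, go back to the R console and make sure the server was started as described in Step-2.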
Step-3 (Creating Java client)
Now that you have the Rserve server running, you need a Java program that communicates with R through Rserve and uses R functionality inside Java code. We will create the program in Eclipse as follows:
  1. Open Eclipse Luna.
  2. Create a Java project named RserveProject.
  3. Rserve provides some client jars that are used inside the Java program to communicate with R. These jar files are included in the Rserve package that you installed from the R console.
  4. For my installation the jar files are located at D:\ProgramFiles\R\R-3.1.3\library\Rserve\java\. If you installed R somewhere else, your path to these files will be <YOUR_R_HOME>\library\Rserve\java\. The main jar files needed are REngine.jar and Rserve.jar.
  5. You need to include these two jars in your Eclipse project. In the Package Explorer section, right click on the project and select Build Path > Configure Build Path. (Fig. 3)
  6. In the window titled Properties for RserveProject, select the Libraries tab. (Fig. 4)
  7. Now select the Add External JARs button in the right panel. Browse to the location <YOUR_R_HOME>\library\Rserve\java\, select the files REngine.jar and Rserve.jar, click the Open button on the current window and the OK button on the next window.
  8. The structure of your Eclipse project RserveProject will now look similar to Fig. 5.
  9. Now create a package named pkg under the src folder of RserveProject and create a class Temp.java under pkg.
That is all that is needed for the configuration part of Java. Now we need to write the Java code.
Step-4 (Java client for Rserve)
For the Java code we will use a use case where we have an R vector c(1,2,3,4) and want to compute its mean using R. The Java client connects to the running Rserve server, asks R to evaluate the expression, and reads the result back.
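A minimal sketch of such a client is shown here, written against the Rserve Java client classes (RConnection, RserveException, REXPMismatchException) found in the two jars added in Step-3. It assumes the class Temp in package pkg created above and an Rserve instance already running on localhost at the default port 6311. The wording of the printed message is an assumption. The program needs REngine.jar and Rserve.jar on the build path and will not compile without them.

```java
package pkg;

import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;

public class Temp {

    public static void main(String[] args) {
        RConnection connection = null;
        try {
            // With no arguments, RConnection connects to localhost on the default port 6311.
            connection = new RConnection();

            // The expression is evaluated by R on the server; the result is read back as a double.
            double mean = connection.eval("mean(c(1,2,3,4))").asDouble();
            System.out.println("Mean of the vector: " + mean);
        } catch (RserveException e) {
            // Thrown when the server cannot be reached or R reports an evaluation error.
            e.printStackTrace();
        } catch (REXPMismatchException e) {
            // Thrown when the R result cannot be converted to the requested Java type.
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.close();
            }
        }
    }
}
```

Closing the connection in the finally block releases the server-side session even when evaluation fails.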
Step-5 (Output of Java program)
Because the R vector is c(1,2,3,4), its mean should be (1+2+3+4)/4 = 10/4 = 2.5.

Calling User-defined R functions in Java

The program above shows how to use R's built-in functions from Java. However, you may have some user-defined functions in an R script that you want to use from Java code. Let's say we need a custom myAdd() function that adds two integers. To solve this use case, proceed as follows:
Step-1 (Create an R script)
Open a text editor and paste in a function definition such as the following:
    myAdd <- function(x, y) {
        return(x + y)
    }
Here we are just defining a function myAdd() that takes two parameters x and y and returns their sum. Save this file as MyScript.R on your disk (we used the location D:\MyScript.R).
Step-2 (Java program to call external R script)
Now create a Java program as you did above. It first loads the code written in D:\MyScript.R into your Rserve context with source(), and then calls the user-defined function myAdd(). Running it should return the result 30.
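A sketch of such a program follows, again using the Rserve Java client classes and assuming an Rserve instance on localhost at the default port. The class name CustomFunctionClient and the arguments 10 and 20 are assumptions chosen so that the sum matches the result 30 mentioned above. It needs REngine.jar and Rserve.jar on the build path.

```java
package pkg;

import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;

// Hypothetical class name; create it under pkg like Temp.java.
public class CustomFunctionClient {

    public static void main(String[] args) {
        RConnection connection = null;
        try {
            connection = new RConnection();

            // Load the script into this Rserve session so that myAdd() becomes available.
            // The four backslashes reach R as two, which R reads as one literal backslash.
            connection.eval("source(\"D:\\\\MyScript.R\")");

            // Call the user-defined function; the arguments 10 and 20 are illustrative.
            int sum = connection.eval("myAdd(10, 20)").asInteger();
            System.out.println("Result of myAdd: " + sum);
        } catch (RserveException e) {
            e.printStackTrace();
        } catch (REXPMismatchException e) {
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.close();
            }
        }
    }
}
```

The script has to be sourced on the same connection that later calls myAdd(), since the function is defined inside that connection's R session.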
A note on slashes in the path: As you can see, the path above uses four backslashes (\\\\). In R, if you use a backslash (\) in the source() command, you have to escape it with another backslash. So the actual R command is:
    source("D:\\MyScript.R")
Because this command is passed as a String in Java code, each backslash (\) has to be escaped with another backslash in Java as well. The java.lang.String form of the above R command is:
    "source(\"D:\\\\MyScript.R\")"
However, if you use a forward slash (/) in your path, the path is written the same way in R and in Java. The R command then looks like:
    source("D:/MyScript.R")
The Java version is similar:
    "source(\"D:/MyScript.R\")"
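To avoid counting backslashes by hand, the command string can be built from the path. The following is a small sketch using only the JDK; the class RSourceCommand and its two methods are hypothetical helpers, not part of the Rserve client, and they assume the path contains no double quotes.

```java
// Hypothetical helper for illustration; not part of the Rserve client API.
class RSourceCommand {

    /** Builds an R source() command using forward slashes, which need no escaping in R. */
    static String forwardSlashes(String windowsPath) {
        return "source(\"" + windowsPath.replace('\\', '/') + "\")";
    }

    /** Builds an R source() command that keeps backslashes, doubling each one for R. */
    static String escapedBackslashes(String windowsPath) {
        return "source(\"" + windowsPath.replace("\\", "\\\\") + "\")";
    }

    public static void main(String[] args) {
        // In Java source, "D:\\MyScript.R" denotes the path D:\MyScript.R.
        String path = "D:\\MyScript.R";
        System.out.println(forwardSlashes(path));     // prints source("D:/MyScript.R")
        System.out.println(escapedBackslashes(path)); // prints source("D:\\MyScript.R")
    }
}
```

Either returned string can be passed as the command text that the Java client sends to R.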

A note on the multi-threaded nature of Rserve

Because the Rserve library runs as a server, it can handle multiple requests simultaneously. By this we mean that when we start an instance of Rserve with the command Rserve(), that instance should handle the connections opened by different invocations of new RConnection() from Java code. Rserve handles simultaneous connections by creating a separate process for each one using the fork() system call.
Linux environment
On Linux you can simply launch one instance of Rserve using the Rserve() command and then use several new RConnection() calls to create multiple connections to it, because fork() is available on Linux.
Windows environment
As fork() is not available on Windows, you cannot handle multiple simultaneous requests using the above commands (one request at a time works fine). There is a workaround for this situation. Suppose you have a Java application that creates 3 threads, and all of them create a connection to R using new RConnection(). This scenario will not work on Windows, because Rserve cannot fork a new process for each connection there. To overcome this, start 3 instances of Rserve from the R console, each on its own port (the port numbers here are examples):
    Rserve(port = 6311)
    Rserve(port = 6312)
    Rserve(port = 6313)
Now that you have 3 separate instances, each of your 3 threads can connect to its own instance:
Thread 1: new RConnection("localhost", 6311)
Thread 2: new RConnection("localhost", 6312)
Thread 3: new RConnection("localhost", 6313)
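When the number of threads is not fixed, the ports can be handed out in rotation instead of being hard-coded per thread. The sketch below uses only the JDK; RservePortPool is a hypothetical helper, and the ports 6311 to 6313 are assumed to match the Rserve instances you started. It only prints which port each thread would use; opening the actual connection with the Rserve client is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out Rserve ports to worker threads in round-robin order.
class RservePortPool {
    private final int[] ports;
    private final AtomicInteger next = new AtomicInteger();

    RservePortPool(int... ports) {
        if (ports.length == 0) {
            throw new IllegalArgumentException("at least one port is required");
        }
        this.ports = ports.clone();
    }

    /** Thread-safe: each call returns the next port, wrapping around at the end. */
    int nextPort() {
        // Masking the sign bit keeps the index non-negative if the counter overflows.
        int ticket = next.getAndIncrement() & Integer.MAX_VALUE;
        return ports[ticket % ports.length];
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed ports: one per Rserve instance started from the R console.
        final RservePortPool pool = new RservePortPool(6311, 6312, 6313);
        List<Thread> workers = new ArrayList<Thread>();
        for (int i = 1; i <= 3; i++) {
            final int id = i;
            Thread worker = new Thread(new Runnable() {
                @Override
                public void run() {
                    int port = pool.nextPort();
                    // With the Rserve jars on the build path, the thread would open
                    // its connection here, e.g. new RConnection("localhost", port).
                    System.out.println("Thread " + id + " would connect to localhost:" + port);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```

With more threads than instances, several threads end up sharing a port, so on Windows their requests to that instance still have to run one at a time.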
