How to analyze a Neo4j graph via Blueprints interface using JUNG in Scala

This is a summary of the three-part blog series I wrote for the www.clueda.com blog over the last month. Please find the original blog posts here:

http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-1/
http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-2/
http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-3/

Clueda AG

About

Graph databases are attracting more and more interest, as they bring significant advantages for specific problems. They are mostly used to store interactions such as those found in social networks. Their biggest advantage is presumably how easy it is to query for friends and other relations between nodes. Here at Clueda, we use graph databases to store our knowledge graph extracted from medical and financial news. Hence, our requirements are a bit different: besides using the graph as storage, we also want to run extensive graph analysis algorithms on it in order to get the best insights into the data. In this tutorial, we will learn how to apply the Java Universal Network/Graph (JUNG) Framework to an embedded Neo4j database. As we want to keep the database interface flexible, we will use the Blueprints interface to communicate with the graph database. Clueda loves Scala, but the examples shown can easily be transferred to plain Java.

Preliminaries

This tutorial concentrates on Blueprints, Neo4j and JUNG, especially on how to integrate them. Thus we assume you are familiar with Scala and SBT. If not, there are many good introductions to these topics, for example here:

http://www.scalatutorial.de/

http://www.scala-sbt.org/release/tutorial/Setup.html

In the tutorial, we will use IntelliJ IDEA as the IDE on an Ubuntu machine. Please make sure that you have a Java SDK installed, as well as Scala and SBT running on your system.

You can download and install IntelliJ IDEA community edition for free from here: http://www.jetbrains.com/idea/download/

If you prefer developing with Eclipse, the examples should be easy to adapt. You might want to skip the first chapter of this tutorial, which explains how to create a new Scala SBT project in IntelliJ IDEA 13.

The tutorial is split into 3 parts:

Part 1: Create a Scala SBT project
Part 2: Install Neo4j
Part 3: Install Blueprints Neo4j and use JUNG

1. Create a Scala SBT project

To get started, we first create a new Scala SBT project in IntelliJ IDEA. On the welcome screen, select “Create New Project”:

Create new Scala SBT project

Select project type “Scala” on the left side and “SBT” on the right side to create a project backed by SBT:

Select Scala SBT

Give the project a meaningful name (I will call it Neo4JBlueprintJung) and select the Java Project SDK that is installed on your machine.
Create the project by clicking the “Finish” button:

Naming a project in IntelliJ IDEA

 

IntelliJ IDEA will open a new project for you. It can take a few seconds until the project is initialized completely and the “src” folder shows up. Right click on the “src/main/scala” folder and select “New” -> “Package” to create a new package:

Create new package

I am calling my package “com.clueda”. Next, let's create a new Scala file and simply name it “Main”:

Add new Scala file

Make sure to set the kind to “Object” in order to get an executable Scala object:

Name Scala file and choose kind

Time to execute our class for the first time in order to see that everything works as expected. Before we can run the class, we need to update it like this:

object Main extends App{
  println("Hello Clueda!")
}

Right-click somewhere in the file and choose “Run Main”:

Run Main scala object

The run dialog pops up at the bottom of IntelliJ IDEA and should print “Hello Clueda!”.

Next

In the next part of this tutorial, we will install and configure Neo4j, before we set up the Blueprints interface and analyze the graph in Part 3.

2. Install Neo4j

In order to install Neo4j, we first need to download it from the Neo4j website:

http://neo4j.com/download/

Neo4j comes in a free community edition as well as a paid enterprise edition. For this tutorial, we will go with the free edition.

At this point, we need to pay attention! As we want to use the Neo4j database from Scala via the Blueprints interface, we cannot just install the latest version. We need to check which Neo4j version is supported by the latest Neo4j Blueprints implementation. To do so, we check the Maven repository for the latest release:

http://search.maven.org/#artifactdetails|com.tinkerpop.blueprints|blueprints-neo4j2-graph|2.5.0|jar

Please note that the correct artifact for Neo4j versions 2 and higher is called neo4j2 instead of neo4j!

From the Maven dependencies, we see that the latest supported Neo4j version is 2.0.1.

Maven blueprints-neo4j2-graph repository

Thus, from the Neo4j website we download the latest 2.0.x release (2.0.4 in this tutorial), even though the latest version at the time of writing this blog entry was 2.1.2!

Neo4j comes in a compressed archive. For installation, all you need to do is extract the archive. As we will use Neo4j embedded in the Scala project, I will extract the files inside the project structure of our Neo4JBlueprintJung project.

Extract Neo4j to project

Since IntelliJ IDEA 13, there is an integrated Terminal which can be very useful. You will find it at the bottom left of the window. Of course, all the following can also be done from the system terminal directly.

We will first test whether Neo4j works properly by starting the database for the first time.

Inside the neo4j-community-2.0.4 folder, run this command from the terminal:

./bin/neo4j start

The console will print some logging and, once it's finished, it will present you with the REST starting point at http://localhost:7474. If you open that address in a browser, you will see the Neo4j web interface, which is pretty handy for executing commands and visualizing nodes and relations of the database. Please see the Neo4j documentation for more information:

http://docs.neo4j.org/chunked/stable/tools-webadmin.html

Neo4j web interface

The database created is actually stored in a directory that can be configured inside the /conf/neo4j-server.properties file.

neo4j-server.properties file

From this file we see that the database is stored in the directory data/graph.db, relative to the neo4j-community-2.0.4 folder. We could easily change the location by replacing the path here.

You can stop the database by calling:

./bin/neo4j stop

Attention! Please make sure that the Neo4j database server is NOT running when using the database embedded from the Scala project! The server locks the graph.db folder, which leads to a connection problem when opening it via Blueprints. If you want to use both in parallel, you need to run the server and connect to it using Rexster instead of embedding it!
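As a rough guard against this mistake, the program could check whether something is listening on the server's port before opening the embedded database. The following is only a sketch: the object name ServerCheck and the method isListening are made up for this example, and it assumes the server runs on localhost:7474 as shown above. It cannot detect another embedded process holding the lock.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper, not part of Neo4j or Blueprints.
object ServerCheck {
  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  def isListening(host: String, port: Int, timeoutMs: Int = 500): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // connection refused, timeout, unknown host, ...
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // 7474 is the port of the Neo4j web interface used in this tutorial
    if (isListening("localhost", 7474))
      println("A server seems to be running, stop it first with ./bin/neo4j stop")
    else
      println("No server found on port 7474, the embedded database can be opened")
  }
}
```

Calling ServerCheck.isListening("localhost", 7474) at the top of Main would let the program fail early with a readable message instead of the long lock exception.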

OK great, Neo4j is installed and running! Next we need to integrate it into our Scala project.

Next

Now that we have fulfilled all prerequisites and set up everything we need, it's time to see the real magic: how we can use Neo4j via Blueprints and run graph analytics on it, in the last Part 3.

3. Install Blueprints Neo4j

In order to connect to Neo4j via Blueprints, we first need to add the following dependency to our project's build.sbt file (this is copied from the Maven repository as seen above):

libraryDependencies += "com.tinkerpop.blueprints" % "blueprints-neo4j2-graph" % "2.5.0"

After editing the build.sbt file, IntelliJ IDEA will ask you to re-import the project. Press “Refresh”.

Refresh sbt dependencies in IntelliJ IDEA

While refreshing, IntelliJ IDEA updates and downloads the SBT dependencies, which may take a few minutes. You can see the status in the status bar at the bottom of IntelliJ IDEA.

Connect to Graph Database

Once refreshing is finished, we can create a new Blueprints Neo4j2Graph by calling:

// get embedded blueprint neo4j graph
val graph: Neo4j2Graph = new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db")

// close connection to graph, important!
graph.shutdown()

Note that the constructor takes the path to the database directory that should be used for the embedded version of the Neo4j database.
It is very important to shut down the graph at the end of the program in order to release the lock on the graph.db directory!

Further, we need to add the following import:

import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph

Run the project to see that everything works fine. It should compile and run without any errors.
If you get an error like this:

Exception in thread “main” java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /home/torsten/IdeaProjects/Neo4JBlueprintJung/neo4j-community-2.0.4/data/graph.db
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(Neo4j2Graph.java:163)
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(Neo4j2Graph.java:135)
    at com.clueda.Main$delayedInit$body.apply(Main.scala:17)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.clueda.Main$.main(Main.scala:13)
    at com.clueda.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /home/torsten/IdeaProjects/Neo4JBlueprintJung/neo4j-community-2.0.4/data/graph.db
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:330)
    at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:63)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:92)
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:198)
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(Neo4j2Graph.java:153)
    … 16 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component ‘org.neo4j.kernel.StoreLockerLifecycleAdapter@30a6aae0’ was successfully initialized, but failed to start. Please see attached cause exception.
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:509)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:307)
    … 20 more
Caused by: org.neo4j.kernel.StoreLockException: Unable to obtain lock on store lock file: neo4j-community-2.0.4/data/graph.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
    at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:82)
    at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:44)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:503)
    … 22 more
Caused by: java.io.IOException: Unable to lock sun.nio.ch.FileChannelImpl@2be0c055
    at org.neo4j.kernel.impl.nioneo.store.FileLock.wrapFileChannelLock(FileLock.java:38)
    at org.neo4j.kernel.impl.nioneo.store.FileLock.getOsSpecificFileLock(FileLock.java:93)
    at org.neo4j.kernel.DefaultFileSystemAbstraction.tryLock(DefaultFileSystemAbstraction.java:89)
    at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
    … 24 more

you probably forgot to stop the Neo4j server which is locking the graph.db file. Run

./bin/neo4j stop

from a terminal in the Neo4j directory and try running the Scala program again.

If you still get a lock error, you might not have closed the graph properly from the code. In this case you need to kill the process which is holding the lock. See lsof for help on how to find process locks on Unix based systems.
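One way to make sure the graph is always closed, even when the code in between throws an exception, is a small loan-style helper built on try/finally. This is a minimal sketch, not something Blueprints provides: GraphLoan and withGraph are names invented for this example, and the in-memory TinkerGraph from blueprints-core only stands in for the Neo4j graph so that the example runs without touching graph.db.

```scala
import com.tinkerpop.blueprints.Graph
import com.tinkerpop.blueprints.impls.tg.TinkerGraph

// Hypothetical helper, not part of the Blueprints API.
object GraphLoan {
  /** Opens a graph, passes it to body and always calls shutdown() afterwards. */
  def withGraph[G <: Graph, A](open: => G)(body: G => A): A = {
    val graph = open
    try body(graph)
    finally graph.shutdown() // runs on success and on exceptions
  }

  def main(args: Array[String]): Unit = {
    // TinkerGraph is used here only as a stand-in for Neo4j2Graph
    val hasVertex = withGraph(new TinkerGraph()) { g =>
      g.addVertex(null).setProperty("name", "Marge")
      g.getVertices().iterator().hasNext
    }
    println(s"Graph contained a vertex: $hasVertex")
  }
}
```

With Neo4j, the call would look like GraphLoan.withGraph(new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db")) { graph => ... }, so the lock is released no matter how the block ends.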

Add Data to Graph

So far, we successfully connected to a Neo4j graph database via Blueprints from Scala. It’s time to add some data to it. As we are using the Blueprints interface, this is pretty easy and straightforward.

First we create three nodes, which are called Vertex in Blueprints:

val a: Vertex = graph.addVertex(null)
val b: Vertex = graph.addVertex(null)
val c: Vertex = graph.addVertex(null)

Let's give them names:

a.setProperty("name", "Marge")
b.setProperty("name", "Homer")
c.setProperty("name", "Barney")

And finally, we add some edges between the nodes:

val e:Edge = graph.addEdge(null, a, b, "knows")
e.setProperty("since", 1989)
val e2:Edge = graph.addEdge(null, b, c, "knows")
e2.setProperty("since", 1989)

Let's check what our graph looks like. Make sure the Scala program has finished and shut down the graph, then start the Neo4j server from the terminal again and open the web interface at localhost:7474. You can get an overview of all nodes by querying

MATCH (n) RETURN n

Graph in Neo4j web interface

Cool, we just created some nodes and edges in Neo4j using Scala with the Blueprints interface.
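The data can also be read back through Blueprints itself. The sketch below looks up a person by the “name” property and follows the outgoing “knows” edges. KnowsQuery, addSimpsons and knownBy are names made up for this example, and an in-memory TinkerGraph is assumed as a stand-in so the code runs without the Neo4j lock. Since both functions only use the Blueprints Graph interface, they should work with a Neo4j2Graph as well, although a property lookup without an index may be slow on large graphs.

```scala
import com.tinkerpop.blueprints.{Direction, Graph}
import com.tinkerpop.blueprints.impls.tg.TinkerGraph
import scala.collection.JavaConverters._

// Hypothetical example object, not part of the tutorial project.
object KnowsQuery {
  /** Adds Marge -> Homer -> Barney, connected by "knows" edges, to any Blueprints graph. */
  def addSimpsons(graph: Graph): Unit = {
    val marge = graph.addVertex(null)
    val homer = graph.addVertex(null)
    val barney = graph.addVertex(null)
    marge.setProperty("name", "Marge")
    homer.setProperty("name", "Homer")
    barney.setProperty("name", "Barney")
    graph.addEdge(null, marge, homer, "knows").setProperty("since", Int.box(1989))
    graph.addEdge(null, homer, barney, "knows").setProperty("since", Int.box(1989))
  }

  /** Names of everyone the given person knows, following outgoing "knows" edges. */
  def knownBy(graph: Graph, name: String): List[String] =
    for {
      person <- graph.getVertices("name", name).asScala.toList
      friend <- person.getVertices(Direction.OUT, "knows").asScala.toList
    } yield friend.getProperty[String]("name")

  def main(args: Array[String]): Unit = {
    val graph = new TinkerGraph() // stand-in for Neo4j2Graph
    try {
      addSimpsons(graph)
      println("Marge knows: " + knownBy(graph, "Marge").mkString(", "))
    } finally {
      graph.shutdown()
    }
  }
}
```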

Of course, we can also fire Cypher queries at the graph directly without using Blueprints. To do so, we need to extract the raw Neo4j graph from the Neo4j Blueprints graph we just created. That is easily done by calling:

graph.getRawGraph

To query for example all nodes as we just did from the web interface, we could do something like:

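val engine = new ExecutionEngine(graph.getRawGraph)
val result: ExecutionResult = engine.execute("MATCH (n) RETURN n")
val it = result.columnAs[Node]("n")
val lst = it.toList
result.close()

// reading node properties has to happen inside a transaction
val tx: Transaction = graph.getRawGraph.beginTx()
try {
  for (node <- lst) println("Node: " + node.getProperty("name"))
  tx.success()
} finally {
  tx.close()
}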

and add the imports:

import org.neo4j.cypher.{ExecutionResult, ExecutionEngine} 
import org.neo4j.graphdb.{Transaction, Node}

The output will look like

Hello Clueda!
Node: Marge 
Node: Homer 
Node: Barney

First, a new ExecutionEngine is created for the raw Neo4j graph. On this engine, we can execute a Cypher command and select a column from the result. We then go through the resulting nodes, extract the “name” property from each of them and print it to the console. Please note that all communication has to be done inside a Transaction!

Install JUNG

Now that we have a “nice” graph in place, we are going to run some general graph analysis tasks on it. Blueprints comes with a nice approach for graph algorithms, called “Furnace”.

https://github.com/tinkerpop/furnace/wiki

Unfortunately, at the time of writing this post, its development has just started. I am really looking forward to seeing how this project develops in the future. Fortunately, there is a very good, stable and well-developed graph algorithms library, the Java Universal Network/Graph Framework (JUNG):

http://jung.sourceforge.net/

Even better, there is a little tweak that lets us use the JUNG framework on our Blueprints graph. First, we need to add another dependency to our build.sbt file:

libraryDependencies += "com.tinkerpop.blueprints" % "blueprints-graph-jung" % "2.5.0"

IntelliJ IDEA may want to refresh the project again.
If we have a look at the corresponding Maven repository, we see that what I called a tweak is actually a Blueprints wrapper for JUNG.
Thus, we can easily instantiate a new GraphJung from an existing Blueprints graph, in our case the Neo4j graph:

val jungGraph = new GraphJung(graph)

On the JUNG graph, we can thus easily perform graph algorithms. For example calculating the Dijkstra distance:

val dj:DijkstraDistance[Vertex,Edge] = new DijkstraDistance(jungGraph)

val distanceMargeHomer = dj.getDistance(graph.getVertex("0"), graph.getVertex("1"))
val distanceMargeBarney = dj.getDistance(graph.getVertex("0"), graph.getVertex("2"))

println(s"Distance between Marge and Homer: $distanceMargeHomer")
println(s"Distance between Marge and Barney: $distanceMargeBarney")

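Note that we can use the original Neo4j Blueprints graph (graph instead of jungGraph) to select the nodes, which is great! In this example, the IDs "0", "1" and "2" refer to Marge, Homer and Barney.
Make sure you import

import com.tinkerpop.blueprints.{Vertex, Edge}
import com.tinkerpop.blueprints.oupls.jung.GraphJung
import edu.uci.ics.jung.algorithms.shortestpath.DijkstraDistance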

From the output we see that Marge and Homer have distance 1.0, whereas Marge and Barney have distance 2.0 as they are connected via Homer.

Distance between Marge and Homer: 1.0 
Distance between Marge and Barney: 2.0
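Besides the distance, JUNG can also return the path itself. The following sketch uses JUNG's DijkstraShortestPath and selects the vertices by their “name” property instead of by ID, which avoids depending on the IDs the database happens to assign. PathByName, vertexNamed and pathNames are hypothetical names for this example, and the in-memory TinkerGraph is again only an assumed stand-in for the Neo4j graph.

```scala
import com.tinkerpop.blueprints.{Direction, Edge, Graph, Vertex}
import com.tinkerpop.blueprints.impls.tg.TinkerGraph
import com.tinkerpop.blueprints.oupls.jung.GraphJung
import edu.uci.ics.jung.algorithms.shortestpath.DijkstraShortestPath
import scala.collection.JavaConverters._

// Hypothetical example object, not part of the tutorial project.
object PathByName {
  /** First vertex whose "name" property equals the given name, if any. */
  def vertexNamed(graph: Graph, name: String): Option[Vertex] =
    graph.getVertices("name", name).asScala.headOption

  /** Names along the shortest directed path, or Nil if no path of at least one edge exists. */
  def pathNames(graph: Graph, from: String, to: String): List[String] = {
    val names = for {
      source <- vertexNamed(graph, from)
      target <- vertexNamed(graph, to)
    } yield {
      val dijkstra = new DijkstraShortestPath[Vertex, Edge](new GraphJung(graph))
      // getPath returns the edges on the path, empty if the target is unreachable
      val edges = dijkstra.getPath(source, target).asScala.toList
      if (edges.isEmpty) Nil
      else source.getProperty[String]("name") ::
        edges.map(_.getVertex(Direction.IN).getProperty[String]("name"))
    }
    names.getOrElse(Nil)
  }

  def main(args: Array[String]): Unit = {
    val graph = new TinkerGraph() // stand-in for Neo4j2Graph
    val marge = graph.addVertex(null); marge.setProperty("name", "Marge")
    val homer = graph.addVertex(null); homer.setProperty("name", "Homer")
    val barney = graph.addVertex(null); barney.setProperty("name", "Barney")
    graph.addEdge(null, marge, homer, "knows")
    graph.addEdge(null, homer, barney, "knows")
    println(pathNames(graph, "Marge", "Barney").mkString(" -> "))
    graph.shutdown()
  }
}
```

On the three-person graph, the path from Marge to Barney should run via Homer, matching the distance of 2.0 above. In the opposite direction no path is found, because the “knows” edges are directed.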

Summary

In this tutorial, we learned how to integrate a graph database into a Scala project using the Blueprints interface. The interface allows us, at least in theory, to replace Neo4j with any other graph database supporting Blueprints easily. First, we created a new Scala project using IntelliJ IDEA 13. We saw how to install, run and configure an embedded Neo4j database, inserted some simple data and queried it from the database. Finally, we used the JUNG framework to run graph analysis on our Blueprints graph database.

Graph databases are gaining more and more interest, as they provide an obvious way to store data that is best represented as a network, such as social network data or company interactions. The hype has just started, and more and more projects, such as the Blueprints interface, are being launched to provide extended functionality for graph databases. Graph databases are mainly made for storage, but what brings the real benefit, at least from my view as a Big Data Scientist, is the possibility to analyze the graph in order to get valuable insights and information out of nodes and relations. Being able to use the JUNG Framework is already a great step, and I am really looking forward to what is to come in the next years!

Directions to go from here

Rexster

In this tutorial, we used an embedded version of the Neo4j database. In many real-world scenarios, the graph database is running on a separate server. In order to connect to such a server, the TinkerPop stack that Blueprints belongs to provides Rexster, a graph server that exposes any Blueprints graph through REST and a binary protocol called RexPro.

https://github.com/tinkerpop/rexster/wiki

AnormCypher

Another very interesting project is AnormCypher, a Neo4j client library for the HTTP Cypher endpoints:

https://github.com/AnormCypher/AnormCypher

During this tutorial, I explained how to use Cypher directly on the raw Neo4j graph and how to read the results within a Transaction. When using Neo4j as a separate server, AnormCypher takes care of all this communication, which is a great advantage.

Files

build.sbt

name := "Neo4JBlueprintJung"

version := "1.0"

libraryDependencies ++= Seq(
  "com.tinkerpop.blueprints" % "blueprints-neo4j2-graph" % "2.5.0",
  "com.tinkerpop.blueprints" % "blueprints-graph-jung" % "2.5.0"
)

Main.scala

package com.clueda

import com.tinkerpop.blueprints.oupls.jung.GraphJung
import com.tinkerpop.blueprints.{Vertex, Edge}
import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph
import edu.uci.ics.jung.algorithms.shortestpath.DijkstraDistance
import org.neo4j.cypher.{ExecutionResult, ExecutionEngine}
import org.neo4j.graphdb.{Transaction, Node}

/**
 * Created by Torsten on 09.07.14.
 */
object Main extends App{
  println("Hello Clueda!")

  // get embedded blueprint neo4j graph
  val graph:Neo4j2Graph = new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db");

  val engine = new ExecutionEngine(graph.getRawGraph)
  val result:ExecutionResult = engine.execute("MATCH (n) RETURN n")
  val it = result.columnAs[Node]("n")
  val lst = it.toList
  result.close()
  // reading node properties has to happen inside a transaction
  val tx: Transaction = graph.getRawGraph.beginTx()
  try {
    for (node <- lst) println("Node: " + node.getProperty("name"))
    tx.success()
  } finally {
    tx.close()
  }

  val jungGraph = new GraphJung(graph)

  val dj:DijkstraDistance[Vertex,Edge] = new DijkstraDistance(jungGraph)

  val distanceMargeHomer = dj.getDistance(graph.getVertex("0"), graph.getVertex("1"))
  val distanceMargeBarney = dj.getDistance(graph.getVertex("0"), graph.getVertex("2"))

  println(s"Distance between Marge and Homer: $distanceMargeHomer")
  println(s"Distance between Marge and Barney: $distanceMargeBarney")

  // uncomment the following lines to add data to the graph database
//  val a: Vertex = graph.addVertex(null)
//  val b: Vertex = graph.addVertex(null)
//  val c: Vertex = graph.addVertex(null)
//
//  a.setProperty("name", "Marge")
//  b.setProperty("name", "Homer")
//  c.setProperty("name", "Barney")
//
//  val e:Edge = graph.addEdge(null, a, b, "knows")
//  e.setProperty("since", 1989)
//  val e2:Edge = graph.addEdge(null, b, c, "knows")
//  e2.setProperty("since", 1989)

  // close connection to graph, important!
  graph.shutdown();
} 

Downloads

Please also find the complete project as a zip compressed file at Clueda’s Website:

http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-3/