How to analyze a Neo4j graph via Blueprints interface using JUNG in Scala

This is a summary of the three-part blog series I wrote for the blog over the last month. Please find the original blog posts here:

Clueda AG


Graph databases are attracting more and more interest as they bring significant advantages for specific problems. Mostly, graph databases are used to store interactions, as seen for example in social networks. The biggest advantage of graph databases is presumably how easily one can query for friends and relations between nodes. Here at Clueda, we use graph databases to store our knowledge graph extracted from medical and financial news. Hence, our requirements are a bit different: besides using the graph as storage, we also want to run extensive graph analysis algorithms on it in order to get the best insights into the data. In this tutorial, we will learn how to apply the Java Universal Network/Graph (JUNG) Framework to an embedded Neo4j database. As we want to keep the database interface flexible, we will use the Blueprints interface to communicate with the graph database. Clueda loves Scala, but the examples shown can easily be transferred to plain Java.


This tutorial concentrates on Blueprints, Neo4j and JUNG, especially on how to integrate them. Thus we assume you are familiar with Scala and SBT. If not, there are many good introductions to these topics, for example here:

In the tutorial, we will use IntelliJ IDEA as the IDE on an Ubuntu machine. Please make sure that you have a Java SDK installed, as well as Scala and SBT running on your system.

You can download and install IntelliJ IDEA community edition for free from here:

If you prefer developing with Eclipse, the examples should be easy to adapt. You might want to skip the first chapter of this tutorial, which explains how to create a new Scala SBT project in IntelliJ IDEA 13.

The tutorial is split into 3 parts:

Part 1: Create a Scala SBT project
Part 2: Install Neo4j
Part 3: Install Blueprints Neo4j and use JUNG

1. Create a Scala SBT project

To get started, we first create a new Scala SBT project in IntelliJ IDEA. On the welcome screen, select “Create New Project”:

Create new Scala SBT project

Select project type “Scala” on the left side and “SBT” on the right side to create a project backed by SBT:

Select Scala SBT

Give the project a meaningful name (I will call it Neo4JBlueprintJung) and select the Java Project SDK that is installed on your machine.
Create the project by clicking the “Finish” button:

Naming a project in IntelliJ IDEA


IntelliJ IDEA will open a new project for you. It can take a few seconds until the project is initialized completely and the “src” folder shows up. Right click on the “src/main/scala” folder and select “New” -> “Package” to create a new package:

Create new package

I am calling my package “com.clueda”. Next, let’s create a new Scala file and simply name it “Main”:

Add new Scala file

Make sure to choose the kind “Object” in order to get an executable Scala object:

Name Scala file and choose kind

Time to execute our class for the first time in order to see that everything works as expected. Before we can run the class, we need to update it like this:

object Main extends App {
  println("Hello Clueda!")
}

Right click somewhere in the file and choose “Run Main”:

Run Main scala object

The run dialog pops up at the bottom of IntelliJ IDEA and should print “Hello Clueda!”


In the next part of this tutorial, we will install and configure Neo4j, before we set up the Blueprints interface and analyze the graph in Part 3.

2. Install Neo4j

In order to install Neo4j, we first need to download it from the Neo4j website:

Neo4j comes in a free community edition as well as a paid enterprise edition. For this tutorial, we will go with the free edition.

At this point, we need to pay attention! As we want to use the Neo4j database from Scala via the Blueprints interface, we cannot just install the latest version. We need to check which is the latest version supported by the Neo4j Blueprints implementation. To do so, we check the Maven repository for the latest release of blueprints-neo4j2-graph (group com.tinkerpop.blueprints, version 2.5.0, packaged as a jar).

Please note that the correct repository for Neo4j versions higher than 2 is called neo4j2 instead of neo4j!

From the Maven dependencies we see that the latest supported Neo4j version is 2.0.1.

Maven blueprints-neo4j2-graph repository

Thus, from the Neo4j website we download the latest 2.0.x release rather than the newest release overall, which was 2.1.2 at the time of writing this blog entry!

Neo4j comes in a compressed archive. For installation, all you need to do is extract the archive. As we will use Neo4j embedded in the Scala project, I will extract the files inside the project structure of our Neo4JBlueprintJung project.

Extract Neo4j to project

Since IntelliJ IDEA 13, there is an integrated Terminal which can be very useful. You will find it at the bottom left of the window. Of course, all the following can also be done from the system terminal directly.

We will first test whether Neo4j works properly by starting the database for the first time.

Inside the neo4j-community-2.0.4 folder, run this command from the terminal:

./bin/neo4j start

The console will print some logging and, once it’s finished, it will present you with the REST starting point at http://localhost:7474. If you open that address in a browser, you will see the Neo4j web interface, which is pretty handy for executing commands and visualizing nodes and relations of the database. Please see the Neo4j documentation for more information:

Neo4j web interface

The database is stored in a directory that can be configured in the configuration file under /conf/.

From there we see that the database is stored inside the neo4j-community-2.0.4 folder, in data/graph.db. We could easily change the location by replacing the path there.

You can stop the database by calling:

./bin/neo4j stop

Attention! Please make sure that the Neo4j database server is NOT running when using the database embedded from the Scala project! The server locks the graph.db folder, which leads to a connection problem when starting the database via Blueprints. If you want to use both in parallel, you need to run the server and connect to it using Rexster instead of embedding it!

OK great, Neo4j is installed and running! Next we need to integrate it into our Scala project.


Now that we have fulfilled all prerequisites and set up everything we need, it’s time to see the real magic in Part 3: how we can use Neo4j via Blueprints and run graph analytics on it.

3. Install Blueprints Neo4j

In order to connect to Neo4j via Blueprints, we first need to add the following dependency to our project’s build.sbt file (this is copied from the Maven repository as seen above):

libraryDependencies += "com.tinkerpop.blueprints" % "blueprints-neo4j2-graph" % "2.5.0"

After editing the build.sbt file, IntelliJ IDEA will ask you to re-import the project. Press “Refresh”:

Refresh sbt dependencies in IntelliJ IDEA

On refreshing, IntelliJ IDEA is updating and downloading SBT dependencies, which may take some minutes. You can see the status in the status bar at the bottom of IntelliJ IDEA.

Connect to Graph Database

Once refreshing is finished, we can create a new Blueprints Neo4j2Graph by calling:

// get embedded blueprint neo4j graph
val graph:Neo4j2Graph = new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db");

// close connection to graph, important!
graph.shutdown()

Note that the constructor takes the path to the database directory that should be used for the embedded version of the Neo4j database.
It is very important to shut down the graph at the end of the program in order to release the lock on graph.db!

Further, we need to add the following import:

import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph

Run the project to see that everything works fine. It should compile and run without any errors.
If you get an error like this:

Exception in thread “main” java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /home/torsten/IdeaProjects/Neo4JBlueprintJung/neo4j-community-2.0.4/data/graph.db
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(
    at com.clueda.Main$delayedInit$body.apply(Main.scala:17)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.clueda.Main$.main(Main.scala:13)
    at com.clueda.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at com.intellij.rt.execution.application.AppMain.main(
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /home/torsten/IdeaProjects/Neo4JBlueprintJung/neo4j-community-2.0.4/data/graph.db
    at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(
    … 16 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component ‘org.neo4j.kernel.StoreLockerLifecycleAdapter@30a6aae0’ was successfully initialized, but failed to start. Please see attached cause exception.
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(
    at org.neo4j.kernel.lifecycle.LifeSupport.start(
    … 20 more
Caused by: org.neo4j.kernel.StoreLockException: Unable to obtain lock on store lock file: neo4j-community-2.0.4/data/graph.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
    at org.neo4j.kernel.StoreLocker.checkLock(
    at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(
    … 22 more
Caused by: Unable to lock
    at org.neo4j.kernel.DefaultFileSystemAbstraction.tryLock(
    at org.neo4j.kernel.StoreLocker.checkLock(
    … 24 more

you probably forgot to stop the Neo4j server which is locking the graph.db file. Run

./bin/neo4j stop

from a terminal in the Neo4j directory and try running the Scala program again.

If you still get a lock error, you might not have closed the graph properly from the code. In this case you need to kill the process which is holding the lock. See lsof for help on how to find process locks on Unix based systems.
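To make the lock error less mysterious, here is a minimal sketch of what an exclusive file lock does, using only the JDK from Scala. It is not Neo4j's code: the object name StoreLockSketch and the temporary file standing in for store_lock are assumptions for illustration. Inside a single JVM the second attempt fails with an OverlappingFileLockException, while a second process would typically just get no lock back, so the sketch treats both cases as "refused".

```scala
import java.nio.channels.{FileChannel, OverlappingFileLockException}
import java.nio.file.{Files, StandardOpenOption}

// Hypothetical illustration, not part of Neo4j or Blueprints.
object StoreLockSketch {

  /** Returns true if a second attempt to lock the same file is refused. */
  def secondLockRefused(): Boolean = {
    // A temporary file plays the role of the store_lock file.
    val lockFile = Files.createTempFile("store_lock", ".tmp")
    val first = FileChannel.open(lockFile, StandardOpenOption.WRITE)
    val second = FileChannel.open(lockFile, StandardOpenOption.WRITE)
    // The first holder (think: the running server) takes the exclusive lock.
    val held = first.tryLock()
    try {
      try {
        // The second holder (think: the embedded database) tries the same.
        val other = second.tryLock()
        if (other != null) other.release()
        other == null
      } catch {
        // Raised when the lock is already held within the same JVM.
        case _: OverlappingFileLockException => true
      }
    } finally {
      held.release()
      first.close()
      second.close()
      Files.delete(lockFile)
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Second lock refused: ${secondLockRefused()}")
}
```

The point is simply that whoever opens the store first wins, and everyone else has to wait until the lock is released, which is why shutting down the graph matters so much.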

Add Data to Graph

So far, we successfully connected to a Neo4j graph database via Blueprints from Scala. It’s time to add some data to it. As we are using the Blueprints interface, this is pretty easy and straightforward.

First we create three nodes, which are called Vertex in Blueprints

val a: Vertex = graph.addVertex(null)
val b: Vertex = graph.addVertex(null)
val c: Vertex = graph.addVertex(null)

Let’s give them names:

a.setProperty("name", "Marge")
b.setProperty("name", "Homer")
c.setProperty("name", "Barney")

And finally add some edges between the nodes

val e:Edge = graph.addEdge(null, a, b, "knows")
e.setProperty("since", 1989)
val e2:Edge = graph.addEdge(null, b, c, "knows")
e2.setProperty("since", 1989)

Let’s check what our graph looks like by starting the Neo4j server from the terminal again and opening the web interface at localhost:7474. You can get an overview of all nodes by querying


Graph in Neo4j web interface

Cool, we just created some nodes and edges in Neo4j using Scala with the Blueprints interface.

Of course, we can also fire Cypher queries at the graph directly without using Blueprints. To do so, we need to extract the raw Neo4j graph from the Neo4j Blueprints graph we just created. That is easily done by calling getRawGraph on the Blueprints graph.


To query for example all nodes as we just did from the web interface, we could do something like:

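val engine = new ExecutionEngine(graph.getRawGraph)
val result:ExecutionResult = engine.execute("MATCH (n) RETURN n")
val it = result.columnAs[Node]("n")
val lst = it.toList
// node properties have to be read inside a transaction
val tx:Transaction = graph.getRawGraph().beginTx()
try {
  for (node <- lst) println("Node: " + node.getProperty("name"))
  tx.success()
} finally {
  tx.close()
}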

and add imports

import org.neo4j.cypher.{ExecutionResult, ExecutionEngine} 
import org.neo4j.graphdb.{Transaction, Node}

The output will look like

Hello Clueda!
Node: Marge 
Node: Homer 
Node: Barney

First, a new ExecutionEngine is created for the raw Neo4j graph. On it, we can execute a Cypher query and select a column from the result. Using an iterator, we go through the results, extract the “name” property from each node and print it to the console. Please note that all communication has to be done inside a Transaction!

Install JUNG

Now that we have a “nice” graph in place, we are going to run some general graph analysis tasks on it. Blueprints comes with a nice approach for graph algorithms, called “Furnace”.

Unfortunately, at the time of writing this post, its development has only just started. I am really looking forward to seeing how this project develops in the future. Fortunately, there is a very good, stable and well-developed graph algorithms library, the Java Universal Network/Graph Framework (JUNG).

Even better, there is a little tweak that lets us use the JUNG framework on our Blueprints graph. First, we need to add another dependency to our build.sbt file:

libraryDependencies += "com.tinkerpop.blueprints" % "blueprints-graph-jung" % "2.5.0"

IntelliJ IDEA may want to refresh the project again.
If we have a look at the corresponding Maven repository, we see that what I called a tweak is actually a Blueprints wrapper for JUNG.
Thus, we can easily instantiate a new GraphJung from an existing Blueprints graph, in our case the Neo4j graph:

val jungGraph = new GraphJung(graph)

On the JUNG graph, we can thus easily perform graph algorithms. For example calculating the Dijkstra distance:

val dj:DijkstraDistance[Vertex,Edge] = new DijkstraDistance(jungGraph)

val distanceMargeHomer = dj.getDistance(graph.getVertex("0"), graph.getVertex("1"))
val distanceMargeBarney = dj.getDistance(graph.getVertex("0"), graph.getVertex("2"))

println(s"Distance between Marge and Homer: $distanceMargeHomer")
println(s"Distance between Marge and Barney: $distanceMargeBarney")

Note that we can use the original Neo4j Blueprints Graph (graph instead of jungGraph) to select the nodes, which is great!
Make sure you import

import edu.uci.ics.jung.algorithms.shortestpath.DijkstraDistance

From the output we see that Marge and Homer have distance 1.0, whereas Marge and Barney have distance 2.0 as they are connected via Homer.

Distance between Marge and Homer: 1.0 
Distance between Marge and Barney: 2.0
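To see where these numbers come from, here is a small, self-contained sketch of Dijkstra's algorithm with every edge counted as weight 1, which is what DijkstraDistance does when no edge weights are supplied. It does not use JUNG or Blueprints; the object name and the plain Map holding the directed “knows” edges are assumptions for illustration only.

```scala
import scala.collection.mutable

// Hypothetical illustration, independent of JUNG and Blueprints.
object UnitWeightDijkstraSketch {

  // Directed "knows" edges from the tutorial: Marge -> Homer -> Barney.
  val knows: Map[String, List[String]] = Map(
    "Marge" -> List("Homer"),
    "Homer" -> List("Barney"),
    "Barney" -> Nil
  )

  /** Dijkstra with every edge weighing 1.0; unreachable vertices are absent from the result. */
  def distances(source: String): Map[String, Double] = {
    val dist = mutable.Map(source -> 0.0)
    // Smallest tentative distance first.
    val byDistance = Ordering.by((entry: (Double, String)) => -entry._1)
    val queue = mutable.PriorityQueue.empty[(Double, String)](byDistance)
    queue.enqueue((0.0, source))
    while (queue.nonEmpty) {
      val (d, v) = queue.dequeue()
      // Skip stale queue entries that were superseded by a shorter path.
      if (d <= dist(v)) {
        for (w <- knows.getOrElse(v, Nil)) {
          val candidate = d + 1.0
          if (candidate < dist.getOrElse(w, Double.PositiveInfinity)) {
            dist(w) = candidate
            queue.enqueue((candidate, w))
          }
        }
      }
    }
    dist.toMap
  }

  def main(args: Array[String]): Unit = {
    val fromMarge = distances("Marge")
    println(s"Distance between Marge and Homer: ${fromMarge("Homer")}")
    println(s"Distance between Marge and Barney: ${fromMarge("Barney")}")
  }
}
```

Marge reaches Homer over one edge and Barney over two, which matches the 1.0 and 2.0 reported above.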


In this tutorial, we learned how to integrate a graph database into a Scala project using the Blueprints interface. The interface allows us, at least in theory, to replace Neo4j with any other graph database supporting Blueprints easily. First, we created a new Scala project using IntelliJ IDEA 13. We saw how to install, run and configure an embedded Neo4j database, inserted some simple data and queried it from the database. Finally, we used the JUNG framework to run graph analysis on our Blueprints graph database.

Graph databases are gaining more and more interest as they provide obvious ways to store data that is best represented as a network, such as social network data or company interactions. The hype has just started, and more and more projects are being launched to provide extended functionality for graph databases, such as the Blueprints interface. Graph databases are mainly made for storage, but the real benefit, at least from my view as a Big Data Scientist, is the possibility to analyze the graph in order to get valuable insights and information out of nodes and relations. Being able to use the JUNG Framework is already a great step, and I am really looking forward to what is to come in the next years!

Directions to go from here


In this tutorial, we used an embedded version of the Neo4j database. In many real-world scenarios, the graph database runs on a separate server. In order to connect to this server, Blueprints provides Rexster, a graph server that exposes any Blueprints graph through REST and a binary protocol called RexPro.


Another very interesting project is AnormCypher, a Neo4j client library for the HTTP Cypher endpoints. During this tutorial, I explained how to use Cypher directly on the raw Neo4j graph and how to read the results within a Transaction. When using Neo4j as a separate server, AnormCypher takes care of all this communication, which is a great advantage.



name := "Neo4JBlueprintJung"

version := "1.0"

libraryDependencies ++= Seq(
  "com.tinkerpop.blueprints" % "blueprints-neo4j2-graph" % "2.5.0",
  "com.tinkerpop.blueprints" % "blueprints-graph-jung" % "2.5.0"
)


package com.clueda

import com.tinkerpop.blueprints.oupls.jung.GraphJung
import com.tinkerpop.blueprints.{Vertex, Edge}
import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph
import edu.uci.ics.jung.algorithms.shortestpath.DijkstraDistance
import org.neo4j.cypher.{ExecutionResult, ExecutionEngine}
import org.neo4j.graphdb.{Transaction, Node}

// Created by Torsten on 09.07.14.
object Main extends App{
  println("Hello Clueda!")

  // get embedded blueprint neo4j graph
  val graph:Neo4j2Graph = new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db");

  val engine = new ExecutionEngine(graph.getRawGraph)
  val result:ExecutionResult = engine.execute("MATCH (n) RETURN n")
  val it = result.columnAs[Node]("n")
  val lst = it.toList
  // node properties have to be read inside a transaction
  val tx:Transaction = graph.getRawGraph().beginTx()
  try {
    for (node <- lst) println("Node: " + node.getProperty("name"))
    tx.success()
  } finally {
    tx.close()
  }

  val jungGraph = new GraphJung(graph)

  val dj:DijkstraDistance[Vertex,Edge] = new DijkstraDistance(jungGraph)

  val distanceMargeHomer = dj.getDistance(graph.getVertex("0"), graph.getVertex("1"))
  val distanceMargeBarney = dj.getDistance(graph.getVertex("0"), graph.getVertex("2"))

  println(s"Distance between Marge and Homer: $distanceMargeHomer")
  println(s"Distance between Marge and Barney: $distanceMargeBarney")

  // uncomment the following lines to add data to the graph database
//  val a: Vertex = graph.addVertex(null)
//  val b: Vertex = graph.addVertex(null)
//  val c: Vertex = graph.addVertex(null)
//  a.setProperty("name", "Marge")
//  b.setProperty("name", "Homer")
//  c.setProperty("name", "Barney")
//  val e:Edge = graph.addEdge(null, a, b, "knows")
//  e.setProperty("since", 1989)
//  val e2:Edge = graph.addEdge(null, b, c, "knows")
//  e2.setProperty("since", 1989)

  // close connection to graph, important!
  graph.shutdown()
}


Please also find the complete project as a zip compressed file at Clueda’s Website:

Statistics made easy using dotplot designer


There are many reasons why people use statistics software: verifying results for scientific papers, generating business reports or exploring data to gain new knowledge. Although statistics is a discipline of its own, it is needed in nearly all other areas. Thus, there is a very high demand for proper statistics software, and a number of such tools have appeared over the last years, including IBM SPSS and SAS, currently the most popular desktop applications and widely used in enterprises. On the other hand, probably the most popular open source software is R, which is not really an application but rather a programming language for statistical analysis. While SPSS and SAS are very expensive and not easily affordable for personal users, small organisations or university departments, R is rather hard to understand and learn for people who are not computer scientists or programmers. Further, all three, like most other big players, are limited to the computational power of the user’s local machine.

There is a need for more modern, easy-to-learn, powerful and cheap software. That’s why dotplot introduced the ‘dotplot designer’ in 2013, a cloud analytics software that is free for personal usage.


In this post, I want to introduce the basics of dotplot designer to give you an easy start with the software. First, let’s have a quick look at the benefits and why it’s worth giving it a try:

  • Data Analysis Modelling: There is an easily understandable graphical user interface where the statistical process is modelled by a flow diagram
  • Cloud Analytics: The power of theoretically endless CPU and storage
  • Accessible from everywhere and on any OS, not only on your local Computer
  • It’s free
  • Huge amount of functionality
  • Predefined solutions: Examples managed by dotplot experts and the community

Once you’ve registered, the designer can be started from the top right of the website by pressing the ‘Launch Designer’ button. Once the designer has loaded completely, which may take quite a while, there is a welcome dialogue from which you can enter the application in different ways: loading an existing project, opening a solution etc. Whatever you choose, you will be faced with the common user interface, which should look something like this (the software is pretty new and the design is updated from time to time):


We can roughly divide the interface into 4 parts:



Part 1

The toolbar is located at the top and lets you open, save or create new projects. Further, it opens the reporter and gives access to solutions and to the help system.

Part 2

All functions that can be used for modelling are located in the so-called function tree or function repository. The tree is expandable and organized to model a common data analysis workflow, starting with data IO and exploration of the data, followed by preprocessing steps and analysis. Further, functions to create plots and graphs can be found in the group ‘Visualisation’.

Part 3

The middle of the screen shows open projects in different tabs. Each project runs in its own environment and thus, several projects can be used in parallel.

Part 4

The right side of the user interface shows project specific settings, configuration values for selected function cells and their results.

Having a closer look at the function tree, one can see that dotplot addresses a wide group of users. The first groups of the tree are ready-made functions that can easily be used by inexperienced users as well. We will see how easily these functions can be used in one of my next posts. The last group, ‘Packages’, is made especially for experienced R users and provides a visual interface to R packages. R functions can be used in exactly the same way as in R, but within a graphical modeller.

Getting started

To get started with dotplot designer, it is recommended to work through the interactive tutorial that can be started directly from the welcome dialogue or from the help menu in the top toolbar.

But generally, using dotplot designer is pretty easy and straightforward. Functions can be added to the project canvas by either a double click or by drag and drop. For a first test of the application, I suggest using one of the datasets provided by dotplot. You will find them in the ‘Data Management / Data Repository’ group of the function tree:



Functions typically have inputs and outputs: you plug data into the inputs, the function does some processing on it and provides the generated results at its output nodes. To plot a histogram, for example, we add the iris dataset and connect the output of the cell to the input of the histogram function. Some functions require specific parameters to be set in the function configuration in part 4 of the interface. The generated model could look like this:

The resulting plot can be seen in the result window on the bottom right of the screen, or by double clicking the Histogram’s output node.



And there you go, that’s all you need to do. Drop a dataset and connect it to some statistics functions. Of course, there are many more functions, and some statistics or data analysis tasks require a more complex model. But I hope this little introduction gave a good overview of how dotplot designer works and why it’s worth trying. Have fun with it!


Sample instances from a BIF XML file using WEKA

The WEKA machine learning library (version 3.7.3) offers the class BayesNetGenerator to generate Bayesian networks artificially. The class provides a method called generateInstances to sample instances from a randomly generated network. However, it was not clear to me how to use this class to sample instances from a given Bayesian network stored in a BIF XML file.
Assuming that your network file is stored at the path filename and your generated instances should be saved in ARFF format to target, you can use the following code:

public void generateDataFromXmlFile(String filename, String target) throws Exception {
    // create a weka BayesNetGenerator
    BayesNetGenerator generator = new BayesNetGenerator();

    // clear stack

    // set the bif xml file

    // define the number of instances to be sampled

    // generate a "random" network
    // is internally generated with the bif xml file

    // generates the instances

    // write the instances to an ARFF file
    StringBuffer text = new StringBuffer();
    FileWriter outfile = new FileWriter(target);
}

First, the BayesNetGenerator object is created and the stack is cleared (not sure if this is needed?).
A BIF XML file can be set as template to the generator by calling


where filename is the path to the BIF XML file. After setting the number of samples to be created to 1000, the generator is asked to create a random network. Note that the network is NOT generated randomly, but read from the BIF XML file, as we defined the file before.
Lastly, the instances are written to an ARFF file and we’re done :-)
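
For readers wondering what sampling instances from a Bayesian network actually means, here is a minimal sketch in Scala, written without WEKA. It draws each variable after its parents (often called ancestral or forward sampling) and renders the rows as ARFF text. The two-node network, its probabilities and all names are made up for illustration, and this is not a claim about how BayesNetGenerator works internally.

```scala
import scala.util.Random

// Hypothetical illustration, independent of WEKA.
object BayesNetSamplingSketch {

  // Made-up two-node network: Rain -> WetGrass.
  val pRain = 0.2
  val pWetGivenRain: Map[Boolean, Double] = Map(true -> 0.9, false -> 0.1)

  /** Draws one instance by sampling the parent first and the child conditioned on it. */
  def sampleInstance(rng: Random): (Boolean, Boolean) = {
    val rain = rng.nextDouble() < pRain
    val wet = rng.nextDouble() < pWetGivenRain(rain)
    (rain, wet)
  }

  /** Renders sampled instances as ARFF text with two nominal attributes. */
  def toArff(instances: Seq[(Boolean, Boolean)]): String = {
    val header = Seq(
      "@relation sketch",
      "@attribute rain {true,false}",
      "@attribute wetgrass {true,false}",
      "@data"
    )
    val rows = instances.map { case (rain, wet) => s"$rain,$wet" }
    (header ++ rows).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    // Five instances keep the printed output short; the post samples 1000.
    val instances = Seq.fill(5)(sampleInstance(rng))
    println(toArff(instances))
  }
}
```

With a network read from a BIF XML file, the structure and the probability tables would come from the file instead of being hard-coded as they are here.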