Am I worth nothing? Math says YES!

I am a scientist! And as a scientist, I always try to study things, understand them and invent something new, something that changes the world, makes it a bit better or makes something easier. Or maybe just discover some knowledge that leads other researchers to invent something useful. We seek to understand the world we live in, and we want to contribute something.

Albert Einstein and friends of the past did tremendously valuable stuff that changed the whole direction of technical evolution in the last decades. And before them, there were other smart people who came up with something game changing, like how to make fire, or that a wheel turns more easily than a square.

This fact led me to think about what my work is worth for humanity. But then I thought: humanity has only existed for a very short time relative to the age of the earth. To be more precise, the earth has existed for about 4.54 billion (10^9) years, whereas humans have existed for only roughly 200,000 years. This picture illustrates the relation a bit more colourfully.

Age of world and humans

It shows the age of our earth in blue and the portion of time in which we have existed in black. So even if I invent something that is really cool, in the history of the earth it just disappears.

But wait, for how long has our planet existed? And isn’t it just one tiny little planet in the universe? The age of the universe is estimated by calculating the time elapsed since the big bang, as this is the best we can guess. That gives 13.7 billion years, about 3 times the age of our earth. And the universe has some more planets than just the earth. So if I start asking myself what I am worth, I cannot ask what I am worth for the world. I should rather ask: what is my value for existence in general?

Ok, let’s start some calculations. Imagine I invented the most brilliant thing that humans have ever seen. It would only affect mankind, and thus only

\frac{2 \cdot 10^5}{13.7 \cdot 10^9} \approx 0.0000146

of all time until now, which is on the order of 10^-5. This is not too bad; I can still believe that the earth and humans will exist for a few more billion years in which they can profit from my discovery. But whatever happens, in about 7.5 billion years the sun will finally absorb the earth.

But what happens if the earth dies? What is the influence on existence as a whole? Scary, but the answer is: absolutely nothing. There are about 10^24 planets in the universe, so who cares if one of them dies? You do not want to believe that? About 10^16 ants live on earth, which is not even one millionth of the estimated number of planets, and do you care when an ant dies?

This reduces my influence dramatically, namely by a factor of 10^24. Together with the time influence, we have an impact on the order of 10^-29, which is very little:

\frac{1}{100,000,000,000,000,000,000,000,000,000}
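Written out as one chain, the two reductions combine as follows. This only restates the numbers already used in this post (200,000 years of humans, 13.7 billion years of universe, 10^24 planets) and is not a new estimate:

```latex
\[
\underbrace{\frac{2 \cdot 10^{5}}{13.7 \cdot 10^{9}}}_{\text{share of time}}
\cdot
\underbrace{\frac{1}{10^{24}}}_{\text{one planet among many}}
\approx 1.46 \cdot 10^{-5} \cdot 10^{-24}
= 1.46 \cdot 10^{-29}
\]
```

The fraction above simply rounds this down to the bare order of magnitude.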

Just to get a better feeling for this number: if one printed the letter ‘A’ 10^29 times next to each other in Times New Roman at font size 12, the line would be about 10^15 times the way to the sun and back. Light would take roughly 30 billion years to travel along this document of A letters. Assuming further that it takes one byte to store a character, the text file of As would be so big that all the storage available in the world would by far not be able to hold it. There is not even a familiar name for this dimension; the well-known prefixes end with yottabyte, which is 10^24 bytes.
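Since corrections are welcome, here is a small Scala sketch that recomputes the length of the line of letters. The glyph width of about 3 mm for a 12 pt ‘A’, the distance to the sun and the length of a light year are my own rough assumptions and not numbers from this post, so treat the result as an order-of-magnitude check only:

```scala
object LetterLine {
  // Assumption: a 12 pt Times New Roman 'A' is roughly 3 mm wide.
  val glyphWidthMeters: Double = 3e-3
  // Number of printed letters, taken from the text.
  val letters: Double = 1e29
  // Assumption: mean distance between earth and sun, in meters.
  val sunDistanceMeters: Double = 1.496e11
  // Assumption: length of one light year, in meters.
  val lightYearMeters: Double = 9.461e15

  // Total length of the printed line in meters.
  val lineLengthMeters: Double = letters * glyphWidthMeters
  // How many times the line covers the way to the sun and back.
  val sunRoundTrips: Double = lineLengthMeters / (2 * sunDistanceMeters)
  // Years that light needs to travel along the whole line.
  val lightTravelYears: Double = lineLengthMeters / lightYearMeters

  def main(args: Array[String]): Unit = {
    println(f"line length:     $lineLengthMeters%.2e m")
    println(f"sun round trips: $sunRoundTrips%.2e")
    println(f"light travel:    $lightTravelYears%.2e years")
  }
}
```

Under these assumptions the line is on the order of 10^26 m long, which is about 10^15 round trips to the sun, and light needs on the order of 10^10 years to pass it.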

Well, the result of asking myself what my influence on overall existence is, is not very motivating, is it? What do you think?

To all you theoretical physicists and computer storage experts out there: Please feel free to correct my numbers and calculations!


How to analyze a Neo4j graph via Blueprints interface using JUNG in Scala

This is a summary of the 3 part blog series I wrote for the www.clueda.com blog in the last month. Please find the original blog posts here:

http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-1/
http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-2/
http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-3/

Clueda AG

About

Graph databases are attracting more and more interest, as they bring significant advantages for specific problems. Mostly, graph databases are used as storage for interactions, as one can see for example in social networks. The biggest advantage of graph databases is presumably how easy it is to query for friends and relations between nodes. Here at Clueda, we use graph databases to store our knowledge graph extracted from medical and financial news. Hence, our requirements for graph databases are a bit different: besides using the graph as storage, we also want to run extensive graph analysis algorithms on it in order to get the best insights into the data. In this tutorial, we will learn how to apply the Java Universal Network/Graph (JUNG) Framework to an embedded Neo4j database. As we want to keep the database interface flexible, we will use the Blueprints interface to communicate with the graph database. Clueda loves Scala, but the examples shown can easily be transferred to plain Java.

Preliminaries

This tutorial concentrates on Blueprints, Neo4j and JUNG, especially on how to integrate them. Thus we assume you are familiar with Scala and SBT. If not, there are many good introductions to these topics, for example here:

http://www.scalatutorial.de/

http://www.scala-sbt.org/release/tutorial/Setup.html

In the tutorial, we will use IntelliJ IDEA as IDE on an Ubuntu machine. Please make sure that you have a Java SDK installed, as well as Scala and SBT running on your system.

You can download and install IntelliJ IDEA community edition for free from here: http://www.jetbrains.com/idea/download/

If you prefer developing with Eclipse, the examples should be easy to adapt. You might want to skip the first chapter of this tutorial, which explains how to create a new Scala SBT project in IntelliJ IDEA 13.

The tutorial is split into 3 parts:

Part 1: Create a Scala SBT project
Part 2: Install Neo4j
Part 3: Install Blueprints Neo4j and use JUNG

1. Create a Scala SBT project

To get started, we first create a new Scala SBT project in IntelliJ IDEA. On the welcome screen, select “Create New Project”:

Create new Scala SBT project

Select project type “Scala” on the left side and “SBT” on the right side to create a project backed by SBT:

Select Scala SBT

Give the project a meaningful name (I will call it Neo4JBlueprintJung) and select the Java Project SDK that is installed on your machine.
Create the project by clicking the “Finish” button:

Naming a project in IntelliJ IDEA


IntelliJ IDEA will open a new project for you. It can take a few seconds until the project is initialized completely and the “src” folder shows up. Right click on the “src/main/scala” folder and select “New” -> “Package” to create a new package:

Create new package

I am calling my package “com.clueda”. Next, let’s create a new Scala file and simply name it “Main”:

Add new Scala file

Please make sure to choose the kind “Object” in order to get an executable Scala object:

Name Scala file and choose kind

Time to execute our class for the first time in order to see that everything works as expected. Before we can run the class, we need to update it like this:

object Main extends App{
  println("Hello Clueda!")
}

Right click somewhere in the file and choose “Run Main”:

Run Main scala object

The run dialog pops up at the bottom of IntelliJ IDEA and should print “Hello Clueda!”

Next

In the next part of this tutorial, we will install and configure Neo4j, before we set up the Blueprints interface and analyze the graph in Part 3.

2. Install Neo4j

In order to install Neo4j, we first need to download it from the Neo4j website:

http://neo4j.com/download/

Neo4j comes in a free community edition as well as a paid enterprise edition. For this tutorial, we will go with the free edition.

At this point, we need to pay attention! As we want to use the Neo4j database from Scala via the Blueprints interface, we cannot just install the latest version. We need to check which is the latest version supported by the Neo4j Blueprints implementation. To do so, we check the Maven repository for the latest release:

http://search.maven.org/#artifactdetails|com.tinkerpop.blueprints|blueprints-neo4j2-graph|2.5.0|jar

Please note that the correct repository for Neo4j versions higher than 2 is called neo4j2 instead of neo4j!

From the Maven dependencies we see that the latest supported Neo4j version is 2.0.1:

Maven blueprints-neo4j2-graph repository

Thus, from the Neo4j website we download the latest 2.0.x version (2.0.4 in my case), even though the latest version at the time of writing this blog entry was 2.1.2!

Neo4j comes in a compressed archive. For installation, all you need to do is extract the archive. As we will use Neo4j embedded in the Scala project, I will extract the files inside the project structure of our Neo4JBlueprintJung project.

Extract Neo4j to project

Since IntelliJ IDEA 13, there is an integrated Terminal which can be very useful. You will find it at the bottom left of the window. Of course, all the following can also be done from the system terminal directly.

First, we are going to test whether Neo4j works properly by starting the database for the first time.

Inside the neo4j-community-2.0.4 folder, run this command from the terminal:

./bin/neo4j start

The console will print some logging, and once it’s finished it will present you with the REST starting point at http://localhost:7474. If you open that address in a browser, you will see the Neo4j web interface, which is pretty handy for executing commands and visualizing nodes and relations of the database. Please see the Neo4j documentation for more information:

http://docs.neo4j.org/chunked/stable/tools-webadmin.html

Neo4j web interface

The database is stored in a directory that can be configured in the conf/neo4j-server.properties file.

neo4j-server.properties file

From here we see that the database is stored inside the neo4j-community-2.0.4 folder, relative to its root directory, in data/graph.db. We could easily change the location by replacing the path here.

You can stop the database by calling:

./bin/neo4j stop

Attention! Please make sure that the Neo4j database server is NOT running when using the database embedded from the Scala project! The server locks the graph.db folder, which leads to a connection problem when opening it via Blueprints. If you want to use both in parallel, you need to run the server and connect to it using Rexster instead of embedding it!

OK great, Neo4j is installed and running! Next we need to integrate it into our Scala project.

Next

As we have fulfilled all prerequisites and set up everything we need, it’s time to see the real magic in the final Part 3: how we can use Neo4j via Blueprints and run graph analytics on it.

3. Install Blueprints Neo4j

In order to connect to Neo4j via Blueprints, we first need to add the following dependency to our project’s build.sbt file (this is copied from the Maven repository as seen above):

libraryDependencies += "com.tinkerpop.blueprints" % "blueprints-neo4j2-graph" % "2.5.0"

After editing the build.sbt file, IntelliJ IDEA will ask you to re-import the project. Press “Refresh”:

Refresh sbt dependencies in IntelliJ IDEA

On refreshing, IntelliJ IDEA updates and downloads the SBT dependencies, which may take some minutes. You can see the progress in the status bar at the bottom of IntelliJ IDEA.

Connect to Graph Database

Once refreshing is finished, we can create a new Blueprints Neo4j2Graph by calling:

// get embedded Blueprints Neo4j graph
val graph: Neo4j2Graph = new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db")

// close connection to graph, important!
graph.shutdown()

Note that the constructor takes a string with the path to the database directory that should be used for the embedded version of the Neo4j database.
It is very important to shut down the graph at the end of the program in order to release the lock on the graph.db directory!
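One way to make sure the shutdown is never skipped, even when the code in between throws an exception, is a small try/finally helper. The sketch below is generic Scala and not part of Blueprints; `withResource` and its parameter names are hypothetical and chosen only for this illustration:

```scala
object Loan {
  /** Opens a resource, runs `body` on it and always calls `close`,
    * even if `body` throws an exception. */
  def withResource[R, A](open: => R)(close: R => Unit)(body: R => A): A = {
    val resource = open
    try body(resource)
    finally close(resource)
  }
}
```

With the graph from this tutorial, a call could look like `Loan.withResource(new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db"))(_.shutdown()) { graph => ... }`, so that the lock is released however the block ends.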

Further, we need to add the following import:

import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph

Run the project to see that everything works fine. It should compile and run without any errors.
If you get an error like this:

Exception in thread "main" java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /home/torsten/IdeaProjects/Neo4JBlueprintJung/neo4j-community-2.0.4/data/graph.db
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(Neo4j2Graph.java:163)
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(Neo4j2Graph.java:135)
    at com.clueda.Main$delayedInit$body.apply(Main.scala:17)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.clueda.Main$.main(Main.scala:13)
    at com.clueda.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /home/torsten/IdeaProjects/Neo4JBlueprintJung/neo4j-community-2.0.4/data/graph.db
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:330)
    at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:63)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:92)
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:198)
    at com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph.<init>(Neo4j2Graph.java:153)
    ... 16 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.StoreLockerLifecycleAdapter@30a6aae0' was successfully initialized, but failed to start. Please see attached cause exception.
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:509)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:307)
    ... 20 more
Caused by: org.neo4j.kernel.StoreLockException: Unable to obtain lock on store lock file: neo4j-community-2.0.4/data/graph.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
    at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:82)
    at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:44)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:503)
    ... 22 more
Caused by: java.io.IOException: Unable to lock sun.nio.ch.FileChannelImpl@2be0c055
    at org.neo4j.kernel.impl.nioneo.store.FileLock.wrapFileChannelLock(FileLock.java:38)
    at org.neo4j.kernel.impl.nioneo.store.FileLock.getOsSpecificFileLock(FileLock.java:93)
    at org.neo4j.kernel.DefaultFileSystemAbstraction.tryLock(DefaultFileSystemAbstraction.java:89)
    at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
    ... 24 more

you probably forgot to stop the Neo4j server, which is locking the graph.db directory. Run

./bin/neo4j stop

from a terminal in the Neo4j directory and try running the Scala program again.

If you still get a lock error, you might not have closed the graph properly from the code. In this case you need to kill the process which is holding the lock. See lsof for help on how to find the process holding a lock on Unix-based systems.

Add Data to Graph

So far, we successfully connected to a Neo4j graph database via Blueprints from Scala. It’s time to add some data to it. As we are using the Blueprints interface, this is pretty easy and straightforward.

First we create three nodes, which are called Vertex in Blueprints

val a: Vertex = graph.addVertex(null)
val b: Vertex = graph.addVertex(null)
val c: Vertex = graph.addVertex(null)

Let’s give them names

a.setProperty("name", "Marge")
b.setProperty("name", "Homer")
c.setProperty("name", "Barney")

And finally add some edges between the nodes

val e:Edge = graph.addEdge(null, a, b, "knows")
e.setProperty("since", 1989)
val e2:Edge = graph.addEdge(null, b, c, "knows")
e2.setProperty("since", 1989)

Let’s check what our graph looks like by starting the Neo4j server from the terminal again and opening the web interface at localhost:7474. You can get an overview of all nodes by querying

MATCH (n) RETURN n

Graph in Neo4j web interface

Cool, we just created some nodes and edges in Neo4j using Scala with the Blueprints interface.

Of course, we can also fire Cypher queries at the graph directly without using Blueprints. To do so, we need to extract the raw Neo4j graph from the Neo4j Blueprints graph we just created. That is easily done by calling:

graph.getRawGraph

To query for example all nodes as we just did from the web interface, we could do something like:

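val engine = new ExecutionEngine(graph.getRawGraph)
val result:ExecutionResult = engine.execute("MATCH (n) RETURN n")
val it = result.columnAs[Node]("n")
val lst = it.toList
result.close()
val tx:Transaction = graph.getRawGraph().beginTx()
try {
  // node properties have to be read inside a transaction
  for (node <- lst) println("Node: " + node.getProperty("name"))
  tx.success()
} finally {
  // always close the transaction, even if something goes wrong
  tx.close()
}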

and add imports

import org.neo4j.cypher.{ExecutionResult, ExecutionEngine} 
import org.neo4j.graphdb.{Transaction, Node}

The output will look like

Hello Clueda!
Node: Marge 
Node: Homer 
Node: Barney

First, a new ExecutionEngine is created for the raw Neo4j graph. On this engine, we can execute a Cypher query and select a column from the result. We then go through the resulting nodes, extract the “name” property from each of them and print it to the console. Please note that all communication has to be done inside a Transaction!

Install JUNG

Now that we have a “nice” graph in place, we are going to run some general graph analysis tasks on it. Blueprints comes with a nice approach for graph algorithms, called “Furnace”.

https://github.com/tinkerpop/furnace/wiki

Unfortunately, at the time of writing this post, development has only just started. I am really looking forward to seeing how this project develops in the future. But fortunately there is a very good, stable and well developed graph algorithms library, the Java Universal Network/Graph Framework (JUNG):

http://jung.sourceforge.net/

And even more luckily, there is a little tweak that lets us use the JUNG framework on our Blueprints graph. First, we need to add another dependency to our build.sbt file:

libraryDependencies += "com.tinkerpop.blueprints" % "blueprints-graph-jung" % "2.5.0"

IntelliJ IDEA may want to refresh the project again.
If we have a look at the corresponding Maven repository, we see that what I called a tweak is actually a Blueprints wrapper for JUNG.
Thus, we can easily instantiate a new GraphJung from an existing Blueprints graph, in our case the Neo4j graph:

val jungGraph = new GraphJung(graph)

On the JUNG graph, we can now easily run graph algorithms, for example calculating the Dijkstra distance:

val dj:DijkstraDistance[Vertex,Edge] = new DijkstraDistance(jungGraph)

val distanceMargeHomer = dj.getDistance(graph.getVertex("0"), graph.getVertex("1"))
val distanceMargeBarney = dj.getDistance(graph.getVertex("0"), graph.getVertex("2"))

println(s"Distance between Marge and Homer: $distanceMargeHomer")
println(s"Distance between Marge and Barney: $distanceMargeBarney")

Note that we can use the original Neo4j Blueprints Graph (graph instead of jungGraph) to select the nodes, which is great!
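Make sure you import

import com.tinkerpop.blueprints.oupls.jung.GraphJung
import com.tinkerpop.blueprints.{Vertex, Edge}
import edu.uci.ics.jung.algorithms.shortestpath.DijkstraDistance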

From the output we see that Marge and Homer have distance 1.0, whereas Marge and Barney have distance 2.0 as they are connected via Homer.

Distance between Marge and Homer: 1.0 
Distance between Marge and Barney: 2.0
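To see why these are the expected values, the same three-node graph can be rebuilt as a plain Scala map and searched breadth-first. This is not JUNG’s DijkstraDistance implementation, just a dependency-free sketch: since every “knows” edge counts as one hop, a breadth-first search yields the same distances here. The names `knows` and `distance` are made up for this illustration:

```scala
import scala.annotation.tailrec

object HopDistance {
  // Directed "knows" edges from the tutorial: Marge -> Homer -> Barney.
  val knows: Map[String, List[String]] = Map(
    "Marge"  -> List("Homer"),
    "Homer"  -> List("Barney"),
    "Barney" -> Nil
  )

  /** Number of hops from `from` to `to`, or None if `to` is unreachable. */
  def distance(from: String, to: String): Option[Int] = {
    @tailrec
    def search(frontier: Set[String], seen: Set[String], hops: Int): Option[Int] =
      if (frontier.isEmpty) None
      else if (frontier.contains(to)) Some(hops)
      else {
        // Expand all nodes of the current level by one hop, skipping visited ones.
        val next = frontier.flatMap(n => knows.getOrElse(n, Nil)) -- seen
        search(next, seen ++ next, hops + 1)
      }
    search(Set(from), Set(from), 0)
  }

  def main(args: Array[String]): Unit = {
    println(s"Distance between Marge and Homer: ${distance("Marge", "Homer")}")
    println(s"Distance between Marge and Barney: ${distance("Marge", "Barney")}")
  }
}
```

Because the edges are directed, the sketch finds no path from Barney back to Marge and returns None in that case.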

Summary

In this tutorial, we learned how to integrate a graph database into a Scala project using the Blueprints interface. The interface allows us, at least in theory, to replace Neo4j with any other graph database supporting Blueprints easily. First, we created a new Scala project using IntelliJ IDEA 13. We saw how to install, run and configure an embedded Neo4j database, inserted some simple data and queried it from the database. Finally, we used the JUNG framework to run graph analysis on our Blueprints graph database.

Graph databases are gaining more and more interest as they provide obvious ways to store data that is best represented as a network, such as social network data or company interactions. The hype has just started, and more and more projects are being started in order to provide extended functionality for graph databases, such as the Blueprints interface. Graph databases are mainly made for storage, but what brings the real benefit, at least from my view as a Big Data Scientist, is the possibility to analyze the graph in order to get valuable insights and information out of nodes and relations. Being able to use the JUNG Framework is already a great step, and I am really looking forward to what is to come in the next years!

Directions to go from here

Rexster

In this tutorial, we used an embedded version of the Neo4j database. In many real world scenarios, the graph database is running on a separate server. In order to connect to this server, the Blueprints stack provides Rexster, a graph server that exposes any Blueprints graph through REST and a binary protocol called RexPro.

https://github.com/tinkerpop/rexster/wiki

AnormCypher

Another very interesting project is AnormCypher

https://github.com/AnormCypher/AnormCypher

AnormCypher is a Neo4j client library for the HTTP Cypher endpoints. In this tutorial, I explained how to use Cypher directly on the raw Neo4j graph and how to read the results within a Transaction. When using Neo4j as a separate server, AnormCypher takes care of all this communication, which is a great advantage.

Files

build.sbt

name := "Neo4JBlueprintJung"

version := "1.0"

libraryDependencies ++= Seq(
  "com.tinkerpop.blueprints" % "blueprints-neo4j2-graph" % "2.5.0",
  "com.tinkerpop.blueprints" % "blueprints-graph-jung" % "2.5.0"
)

Main.scala

package com.clueda

import com.tinkerpop.blueprints.oupls.jung.GraphJung
import com.tinkerpop.blueprints.{Vertex, Edge}
import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph
import edu.uci.ics.jung.algorithms.shortestpath.DijkstraDistance
import org.neo4j.cypher.{ExecutionResult, ExecutionEngine}
import org.neo4j.graphdb.{Transaction, Node}

/**
 * Created by Torsten on 09.07.14.
 */
object Main extends App{
  println("Hello Clueda!")

  // get embedded blueprint neo4j graph
  val graph:Neo4j2Graph = new Neo4j2Graph("neo4j-community-2.0.4/data/graph.db");

  val engine = new ExecutionEngine(graph.getRawGraph)
  val result:ExecutionResult = engine.execute("MATCH (n) RETURN n")
  val it = result.columnAs[Node]("n")
  val lst = it.toList
  result.close()
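  val tx:Transaction = graph.getRawGraph().beginTx()
  try {
    // node properties have to be read inside a transaction
    for (node <- lst) println("Node: " + node.getProperty("name"))
    tx.success()
  } finally {
    // always close the transaction, even if something goes wrong
    tx.close()
  }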

  val jungGraph = new GraphJung(graph)

  val dj:DijkstraDistance[Vertex,Edge] = new DijkstraDistance(jungGraph)

  val distanceMargeHomer = dj.getDistance(graph.getVertex("0"), graph.getVertex("1"))
  val distanceMargeBarney = dj.getDistance(graph.getVertex("0"), graph.getVertex("2"))

  println(s"Distance between Marge and Homer: $distanceMargeHomer")
  println(s"Distance between Marge and Barney: $distanceMargeBarney")

  // uncomment the following lines to add data to the graph database
//  val a: Vertex = graph.addVertex(null)
//  val b: Vertex = graph.addVertex(null)
//  val c: Vertex = graph.addVertex(null)
//
//  a.setProperty("name", "Marge")
//  b.setProperty("name", "Homer")
//  c.setProperty("name", "Barney")
//
//  val e:Edge = graph.addEdge(null, a, b, "knows")
//  e.setProperty("since", 1989)
//  val e2:Edge = graph.addEdge(null, b, c, "knows")
//  e2.setProperty("since", 1989)

  // close connection to graph, important!
  graph.shutdown();
} 

Downloads

Please also find the complete project as a zip compressed file at Clueda’s Website:

http://www.clueda.com/blog/how-to-analyze-a-neo4j-graph-via-blueprints-using-jung-in-scala-part-3/

Import data to dotplot designer

dotplot designer is a new statistical cloud analytics tool based on R, which not only offers a visual interface to common R functions, but also provides a huge repository of ready-made functions for different statistical tasks.

Although dotplot is working on improving the usability of their products, it is not very obvious to inexperienced users how to import their own data.

To learn how to import your own data, there is a new tutorial video:

Note that the import wizard works in a similar manner for Excel, SPSS or WEKA’s arff files.

Remember that there is a basic distinction between your local file system and the cloud file system of dotplot. All files you want to use within dotplot designer need to be uploaded to the cloud first. Files generated by dotplot are stored in the cloud as well, but can be downloaded to your local file system.

Once your data is imported, you can use it with any function provided in the function tree.

Hint: Try getting started with the Summary function of the group ‘Exploration’ to get a quick overview of your data. A well rendered output can be seen by double clicking its ‘Report’ output node.


Statistics made easy using dotplot designer

Introduction

There are many reasons why people use statistics software: verifying results for scientific papers, generating business reports or trying to get insight into data to gather new knowledge. Although statistics is a field of its own, it is needed in nearly all other areas. Thus, there is a very high demand for proper statistics software, and a bunch of such tools have shown up over the last years, including the currently most popular IBM SPSS and SAS desktop applications, which are widely used in enterprises. On the other hand, probably the most popular open source software is R, which is not really an application but rather a programming language for statistical analysis. While SPSS and SAS are very expensive and not easily affordable for personal users, small organisations or university departments, R is rather hard to understand and learn for people who are not computer scientists or programmers. Further, all three, and also most other big players, are limited to the computational power of the user’s local machine.

There is a need for more modern, easy-to-learn, powerful and cheap software. That’s why dotplot introduced the ‘dotplot designer’ in 2013, a cloud analytics software that is free for personal usage.

Dotplot

In this post, I want to introduce the basics of dotplot designer to give you an easy start with the software. First, let’s have a quick look at the benefits and why it’s worth giving it a try:

  • Data Analysis Modelling: There is an easily understandable graphical user interface where the statistical process is modelled by a flow diagram
  • Cloud Analytics: The power of theoretically endless CPU and storage
  • Accessible from everywhere and on any OS, not only on your local Computer
  • It’s free
  • Huge amount of functionality
  • Predefined solutions: Examples managed by dotplot experts and the community

Once you’ve registered, the designer can be started from the top right of the website by pressing the ‘Launch Designer’ button. Once the designer has loaded completely, which may take quite a while, a welcome dialogue lets you enter the application in different ways: loading an existing project, opening a solution, etc. Whatever you choose, you will be faced with the common user interface, which should look something like this (the software is pretty new and the design is updated from time to time):


We can roughly divide the interface into 4 parts:


Part 1

The toolbar is located at the top and allows you to open, save or create projects. It also opens the reporter and gives access to solutions and to the help system.

Part 2

All functions that can be used for modelling are located in the so-called function tree or function repository. The tree is expandable and organized to model a common data analysis workflow, starting with data IO and exploration of the data, followed by preprocessing steps and analysis. Further, functions to create plots and graphs can be found in the group ‘Visualisation’.

Part 3

The middle of the screen shows open projects in different tabs. Each project runs in its own environment and thus, several projects can be used in parallel.

Part 4

The right side of the user interface shows project specific settings, configuration values for selected function cells and their results.

Having a closer look at the function tree, one can see that dotplot addresses a wide group of users. The first groups of the tree are ready-made functions that can easily be used by inexperienced users as well. We will see how easily these functions can be used in one of my next posts. The last group, ‘Packages’, is especially made for experienced R users and provides visualisation for R packages. R functions can be used in exactly the same way as in R, but within a graphical modeller.

Getting started

To get started with dotplot designer, it is recommended to work through the interactive tutorial that can be started directly from the welcome dialogue or from the help menu in the top toolbar.

But generally, using dotplot designer is pretty easy and straightforward. Functions can be added to the project canvas either by a double click or by drag and drop. For a first test of the application, I suggest using one of the datasets provided by dotplot. You will find them in the ‘Data Management / Data Repository’ group of the function tree:


Functions typically have inputs and outputs: data is plugged into the inputs, the function does some processing on it and provides the generated results as output nodes. To plot a histogram, for example, we add the iris dataset and connect the output of its cell to the input of the histogram function. Some functions require specific parameters to be set in the function configuration in part 4 of the interface. The generated model could look like this:

The resulting plot can be seen in the result window at the bottom right of the screen, or by double clicking the Histogram’s output node.


And here you go, that’s all you need to do. Drop a dataset and connect it to some statistics functions. Of course, there are many more functions, and there are statistics or data analysis tasks that require a more complex model. But I hope this little introduction gave a good overview of how dotplot designer works and why it’s worth trying. Have fun with it!


Howto Write a Thesis using LaTeX: Custom commands

The tweaks presented in this post are in my opinion the most useful ones. When writing a thesis, there are typically a few expressions and words that appear in many, many places across the document. I wrote my thesis about a slime mold called Physarum Polycephalum and I can’t count how often I needed to write this name in my thesis. Lazy as I am, I didn’t want to write this nasty word each time, and I also wanted to avoid misspelling it.

Fortunately, LaTeX lets you define your own commands. This means you can define a new command that inserts a string of your choice and use it throughout your document. To do so, simply write:

\newcommand{\phys}{\textit{Physarum Polycephalum}}

The expression in the first pair of curly braces defines the command name, which I called \phys, and the expression in the second pair defines what the command expands to. Here I want the string Physarum Polycephalum to be printed in italics. So far so good, but where exactly is the benefit?

At any place in your document where you want to insert Physarum Polycephalum, no matter whether it is within a text section or the caption of an image, you simply write

 My topic is about \phys{}, which is a slime mold.

Please note that the curly braces after the command are necessary: without them, LaTeX swallows the white space that follows the command, and the inserted text runs into the next word!
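
If the empty braces feel clumsy, one alternative is the xspace package, which decides on its own whether a space is needed after the command. The following is only a sketch, assuming xspace is installed with your TeX distribution; \physx is a hypothetical name chosen here so that it does not clash with \phys:

% in the preamble
\usepackage{xspace}
\newcommand{\physx}{\textit{Physarum Polycephalum}\xspace}

% in the text: no braces needed
My topic is about \physx, which is a slime mold. I like \physx a lot.

With this definition, a space should appear before "a lot" but none before the comma.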

Apart from saving time and reducing misspellings, there is another great benefit of this technique: it allows you to change such a word in only one place. Imagine I finished writing my whole thesis and then my Prof tells me that polycephalum should be written in lower case. I am unbelievably thankful that I do NOT need to go through hundreds of pages and search for every place where I need to replace it. I only need to change it in one single place!
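
For the example above, the whole fix would be one edited line in the preamble. Every \phys{} in the text picks up the change on the next compile:

% before
% \newcommand{\phys}{\textit{Physarum Polycephalum}}
% after
\newcommand{\phys}{\textit{Physarum polycephalum}}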

By the way, one typical command I use in all my documents is

\newcommand{\etal}{\textit{et~al.}}

Another trick, somewhat related to this topic, turned out to be very useful for me while writing the thesis. I defined three little commands and used them to leave notes to myself in the document:

% These commands need \usepackage{xcolor} (or color) in the preamble.
\newcommand{\note}[1]{\textcolor{red}{(#1!)}}
\newcommand{\missref}{\note{[REF]}}
\newcommand{\todo}[1]{\textcolor{blue}{[Todo: #1]}}

The first command inserts a note for me in red, so that I cannot miss it when reviewing the document. The second one indicates that a reference is missing and that I need to insert it later on. But please, I really recommend inserting references on the fly! Believe me, after weeks or months of writing your thesis, the very last thing you want to do is insert 100+ references before submitting. I used the third command to remind myself that there is something left to do. Have a look at the screenshot of a document using these commands. Isn’t that really helpful for reviewing?

GlobalCommands

The source for this piece of text looks like this:

Sch\"on \etal{} \missref{} have scientifically proven that the 
psychological stress of writing your PhD thesis leaves a disorder 
from which you will never recover completely \note{Better double check 
that!}. 
\todo{Search for and include some examples here, maybe a graphic!}
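
To try these commands in isolation, a minimal test document could look like the sketch below. It assumes the xcolor package provides the colours, and it writes \note with \textcolor so that the colour ends together with the note. The file name and the sample sentences are made up for illustration.

% notes-test.tex (hypothetical file name)
\documentclass{article}
\usepackage{xcolor} % provides \textcolor

\newcommand{\note}[1]{\textcolor{red}{(#1!)}}
\newcommand{\missref}{\note{[REF]}}
\newcommand{\todo}[1]{\textcolor{blue}{[Todo: #1]}}

\begin{document}
This sentence still needs a source \missref{}. \note{Check the wording}
\todo{Add a figure here}
\end{document}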

Howto Write a Thesis using LaTeX: Algorithms

People writing their thesis, paper or any other report with LaTeX tend to have some connection to programming. Thus, it is not uncommon to include some algorithms in the document. I prefer using the algorithmic package for typesetting them. To include the package, add the following lines to the preamble in the file Thesis.tex.

\usepackage{algorithm}
\usepackage{algorithmic}

Inserting an algorithm is then pretty straightforward. You only need to know the algorithmic-specific commands; for a description, see for example http://en.wikibooks.org/wiki/LaTeX/Algorithms.

\begin{algorithm}[ht]
\caption{See how easy it is to provide algorithms}
\label{myFirstAlgorithm}
\begin{algorithmic}
\REQUIRE $a$
\STATE $b = 0$
\STATE $x \leftarrow 1:10$
\FORALL{x}
	\STATE $b = b+a$
\ENDFOR
\RETURN $b$
\end{algorithmic}
\end{algorithm}

Algorithms can be referenced like an image, by calling:

\ref{myFirstAlgorithm}

If you would like to include a list of algorithms at the end of your thesis, simply write

\listofalgorithms

at the position you want the list to appear.
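
Putting the pieces together, a complete test file could look like the following sketch. It reuses the label from above; the loop is written with \FOR instead of \FORALL purely for illustration, and the sentence in the text is sample wording.

\documentclass{article}
\usepackage{algorithm}
\usepackage{algorithmic}

\begin{document}
Algorithm \ref{myFirstAlgorithm} adds $a$ to $b$ ten times.

\begin{algorithm}[ht]
\caption{See how easy it is to provide algorithms}
\label{myFirstAlgorithm}
\begin{algorithmic}
\REQUIRE $a$
\STATE $b = 0$
\FOR{$i = 1$ to $10$}
	\STATE $b = b+a$
\ENDFOR
\RETURN $b$
\end{algorithmic}
\end{algorithm}

\listofalgorithms
\end{document}

If the reference shows up as ?? in the output, compile the document a second time so that LaTeX can resolve the label.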

Howto Write a Thesis using LaTeX: Export Scalable Graphs from Excel to Inkscape

The trick to creating nice looking and scalable graphs is the same as discussed in Howto Write a Thesis using LaTeX: Generate Resolution Independent Figures Using Inkscape. Instead of saving the graph as a ‘common’ image, meaning as bitmap, jpeg or png, graphs should also be generated as vector graphics.

Unfortunately, Excel does not provide an obvious way to export graphs as vector graphics (at least, I do not know any). But, fortunately, there is a little workaround which I will describe in this little tutorial.

First, assume we have some data in an Excel sheet and have generated a plot from it. In the example below, I made up some data off the top of my head (without any scientific verification!) on how probable it is that you go totally crazy, format your hard drive, destroy all your lab equipment and move to an abbey in Tibet for the rest of your life, depending on the amount of time you have already spent preparing your thesis.
PFreakingOut

To export this nice and somewhat scary graph from Excel and import it into our LaTeX document, we first need to select the graph by clicking on it. You should see a border around the figure. Note that it is important to select only the figure, as we just want to export the graph and not the whole document!

Next, we select File->Save As … from the menu and save it as PDF! See the screenshot below (sorry, my installation of Excel is in German).

PFreakingOutSaveAsPDF

That’s all that needs to be done in Excel. Actually, we could import the PDF we just generated directly into our LaTeX document, but it needs some beautification, which is best done with Inkscape.

To do so, we first open Inkscape and select File->Open (do NOT use Import!) from the menu. Browse to the PDF file we just generated and press the Open button.
OpenGraphInInkscape

It might take a few seconds until the next dialogue pops up. Do not change any settings in the import dialogue, simply press OK. The graph may be placed like this:
GraphImported

Note that the imported graph behaves as if it had been created with Inkscape, meaning that you can easily click elements and remove or edit them. I personally do not like the grey border around the graph, so I click on it and delete it. You might want to change the colour, axis titles, font size or whatever…

Next, we go to File->Document Properties… to set the page size to fit the graph with some spacing of 10px.
Inkscape_resize_to_content

Last, we save the plot as a PDF file by selecting File->Save as from the menu.
Remember that you need to select PDF as the file type. That’s it, we just created a resolution independent, scalable graph with Excel and Inkscape. Of course, this PDF can be included in our LaTeX document in the same way as any other image, namely by writing:

A scary and maybe exaggerated Graph is shown in Figure \ref{fig:imageGraph}.
\begin{figure}[ht]
	\centering
    \includegraphics[width=0.5\textwidth]{fig/FreakingOut.pdf}
    \caption[Short caption]{Detailed caption}
    \label{fig:imageGraph}
\end{figure}

Also, the graphicx package should be added to the preamble of the Thesis.tex file:

\usepackage{graphicx}
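
If all figures live in one folder, as the fig/ prefix above suggests, graphicx can be told where to look, so the folder does not have to be repeated in every \includegraphics. A small sketch, assuming the folder is called fig and sits next to Thesis.tex:

\usepackage{graphicx}
\graphicspath{{fig/}} % note the double braces and the trailing slash

% later, in the text: the file is found in fig/ automatically
\includegraphics[width=0.5\textwidth]{FreakingOut.pdf}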

Howto Write a Thesis using LaTeX: Excel to LaTeX Table

Generating nice tables in plain LaTeX can be really annoying, as it is very hard to keep an overview of columns and rows in raw text. One possibility is to use a WYSIWYG table editor, which comes with many development environments.

You can find a nice tool for generating tables in TexMaker by selecting Quick Tabular from the Wizard menu item.
TabularWizard

But, in most cases, we do not want to insert tables manually. Instead, most of the data already exists in some other program and we would like to generate a table from our existing data. Also, there are much more specialised applications for editing and generating tables, the most common one probably being Microsoft Excel.

Fortunately, there is a great tool that lets you export your existing Excel table to LaTeX code, called Excel2LaTeX!

1. Install Software

Go to http://www.ctan.org/tex-archive/support/excel2latex/ and download the latest Excel2LaTeX.xla file. Next, open the file with Excel.
You might get asked whether you want to enable macros, with a warning that this is a potential security issue. As we know what we are about to do, we enable macros. And that’s it, the add-in is already installed!

You will find two additional buttons in the Add-Ins tab of your Excel Toolbar.
Excel2LaTeX_buttons

Note that the Add-Ins tab needs to be activated for some Excel versions separately!

Now comes the easy part: select the area of your table you want to export to LaTeX and click the Convert table to LaTeX button. The following dialog pops up:
ExportToLatex

Either click the Copy to Clipboard button to copy the LaTeX text, or save it to a file by choosing Save to File.
For some reason, copying the text snippet did not work for me on Windows 8, so I had to copy it manually!

Next, as we have the table as LaTeX code in our Clipboard, we only need to paste it to our LaTeX file. Navigate to the position where you want to insert the table in your TexMaker file and paste the content. Note that you might need to load the following packages in the preamble depending on how fancy your table is styled:

\usepackage{booktabs}
\usepackage{color}

The generated code for the example table looks like:

% Table generated by Excel2LaTeX from sheet 'Tabelle1'
\begin{table}[htbp]
  \centering
  \caption{Add caption}
    \begin{tabular}{rrr}
    \toprule
    \multicolumn{1}{c}{\textbf{Name}} & \multicolumn{1}{c}{\textbf{Age}} & \multicolumn{1}{c}{\textbf{Score}} \\
    \midrule
    Maria & 23    & 1 \\
    Thomas & 21    & 0.78 \\
    \textit{Alicia} & 19    & 0.27 \\
    Mark  & 31    & 0.45 \\
          &       &  \\
    \bottomrule
    \end{tabular}%
  \label{tab:addlabel}%
\end{table}%

Note that you also might want to update the table caption:
insert_table

Cool, we just inserted a table from Excel to our LaTeX document! Wasn’t that much easier than typing it yourself?

Please also note that the newly inserted table is automatically listed in the list of tables we inserted at the end of the document.
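
Referencing the table and producing the list can be sketched as follows. The label tab:addlabel is the default from the generated code above (in a real thesis you would probably rename it), and the sentence is only sample text:

The scores are listed in Table \ref{tab:addlabel}.

% at the position where the list should appear, e.g. near the end of Thesis.tex
\listoftables

As with other references, a second compile run may be needed before the table number and the list entry show up.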

Howto write a thesis using LaTeX, Part 3: Tips and Tricks

We’ve already seen in part 1 and part 2 of this tutorial how to install and set up the software components and how to organize folders and files. Also, we have created a basic structure for our thesis. In this final part 3, I will give you some tips and tricks that made my life easier while creating my PhD thesis.

To make them easier to find for search engines, I’m going to create a separate post for each single tip, trick or hack, whatever name you prefer. But, in order to stay within the scope of this tutorial, we will see how to include the generated stuff in our test thesis.

So, here you go, a list of tips, tricks and hacks helping you with your thesis:

Assuming you have generated an image as described in the link above, you can use the following code to include it in your thesis:

An example figure is shown in Figure \ref{fig:image}.
\begin{figure}[ht]
	\centering
    \includegraphics[width=0.5\textwidth]{fig/image.pdf}
    \caption[Short caption]{Detailed caption}
    \label{fig:image}
\end{figure}

And of course, we need to import the graphicx package in the preamble of the Thesis.tex file:

\usepackage{graphicx}

The source files generated within this tutorial can be downloaded here:
Thesis Template

If you are a student of the faculty “Biologie und Vorklinische Medizin der Universität Regensburg” in Germany, you can also download the predefined title page:
UR-Titlepage
Thesis Template with UR title page included

I hope you enjoyed reading this tutorial. Now you are all set: you are prepared to write a thesis so fancy that your supervisor is forced to give you the best mark available without reading even a single word of what you’ve written. I’m just kidding. Of course, content is most important, but at least you do not have to figure out every LaTeX problem on your own and you can invest your time in writing text.

If you like this tutorial, please recommend it and leave a comment. If you don’t like it, tell me what you want me to improve or just don’t tell anybody 😉