Querying Neo4j with Cypher
This article is the final installment of a three-part series on programming Neo4j with Java. Part 1, “Introduction to Neo4j,” presented an overview of Neo4j, discussing why you should use it, what problems it solves, and how to set up a local Neo4j environment. Part 2, “Programming Neo4j with Java,” reviewed two of the three primary interfaces for working with Neo4j in Java: the Core Java API and the Traversal API. This article addresses the third, and most powerful, way to work with Neo4j: the Cypher query language.
Introduction to Cypher
The Cypher query language allows you to use human-readable text to describe how you want the traverser to retrieve your results. A typical Cypher query begins with a pattern that is used to match starting nodes, followed by a set of relationships the traverser will follow to find your results. You might also add matching conditions, filters, and so on to fine-tune your results.
Let's begin by using Cypher queries to find nodes in the user/movie example presented in the previous articles in this series. For our first example, let's find all movies that a user in our database has seen. As a reminder, the following snippet shows how we found users by using the Core Java API:
    Node michael = graphDB.findNode( Labels.USER, "name", "Michael" );

    // Find all of Michael's movies
    System.out.println( "Michael's movies: " );
    for( Relationship relationship : michael.getRelationships( Direction.OUTGOING, RelationshipTypes.HAS_SEEN ) ) {
        System.out.println( ( String )relationship.getOtherNode( michael ).getProperty( "name" ) );
    }
We found our starting node by searching for the node with the USER label and the name property of “Michael”; then we retrieved all of Michael's HAS_SEEN relationships and iterated over the results, displaying the movie names.
Here's how we found a user's movies by using the Traversal API:
    Node michael = graphDB.findNode( Labels.USER, "name", "Michael" );

    // Find all movies that Michael has seen
    TraversalDescription myMovies = graphDB.traversalDescription()
            .breadthFirst()
            .relationships( RelationshipTypes.HAS_SEEN )
            .evaluator( Evaluators.atDepth( 1 ) );
    Traverser traverser = myMovies.traverse( michael );

    System.out.println( "Michael's movies: " );
    for( Node movie : traverser.nodes() ) {
        System.out.println( "\t" + movie.getProperty( "name" ) );
    }
Using the Traversal API, we created a TraversalDescription, through which we told the traverser to traverse the graph using a breadth-first algorithm and follow all HAS_SEEN relationships to a depth of 1 (immediate children of the starting node). We invoked the TraversalDescription's traverse() method, passing it the Michael node to create a traverser, and then we iterated over the results.
Let's review how we would accomplish the same objective by using a Cypher query:
    System.out.println( "Michael's movies" );
    try( Transaction txn = graphDB.beginTx();
         Result results = graphDB.execute( "match( michael:USER { name: 'Michael' } )-[:HAS_SEEN]-(movie) return movie.name " ) ) {
        while( results.hasNext() ) {
            Map<String,Object> result = results.next();
            System.out.println( "\t" + result.get( "movie.name" ) );
        }
        txn.success();
    }
The core aspect in this example is the Cypher query:
match( michael:USER { name: 'Michael' } )-[:HAS_SEEN]-(movie) return movie.name
A Cypher query usually begins with a match clause that tells the Cypher engine at which node(s) to start searching. In this example, we want to find all nodes with the USER label and a name property of “Michael” (which is our Michael user). Nodes in the pattern are enclosed within parentheses, and relationships are enclosed by square brackets. Relationship types are written with a colon (:) prefix, and the direction is described using ASCII arrows, in the format shown in the following table.
Relationship Navigation       | Instruction
------------------------------|----------------------------------------------------------------------------
()-[:RELATIONSHIP_NAME]-()    | Follow the specified relationship in both directions (inbound or outbound).
()-[:RELATIONSHIP_NAME]->()   | Follow outbound relationships.
()<-[:RELATIONSHIP_NAME]-()   | Follow inbound relationships.
In this case, we start with the Michael node and follow both inbound and outbound HAS_SEEN relationships to a movie node. The Michael node is named “michael” so that we can access it later in the query, and the resultant node is named “movie”, which we access in the return statement.
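To make the three direction forms concrete, here is a minimal in-memory sketch of how a single pattern step could pick its neighbors. Nothing in it is Neo4j API: the Rel and Dir types and the neighbors() helper are hypothetical stand-ins, and the friends Ann and Bob are invented for illustration. Only the two movie names come from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for a graph; none of these types are Neo4j API.
final class DirectionSketch {

    // One directed, typed relationship: (from)-[:type]->(to)
    static final class Rel {
        final String from;
        final String type;
        final String to;

        Rel(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Mirrors the three rows of the relationship navigation table.
    enum Dir { BOTH, OUTGOING, INCOMING }

    // Returns the nodes one step away from start over relationships of the
    // given type, honoring the direction the pattern asks for.
    static List<String> neighbors(List<Rel> rels, String start, String type, Dir dir) {
        List<String> found = new ArrayList<>();
        for (Rel r : rels) {
            if (!r.type.equals(type)) {
                continue;
            }
            if (r.from.equals(start) && dir != Dir.INCOMING) {
                found.add(r.to);      // (start)-[:type]->(other)
            } else if (r.to.equals(start) && dir != Dir.OUTGOING) {
                found.add(r.from);    // (start)<-[:type]-(other)
            }
        }
        return found;
    }

    // Sample data: the movies come from the article; Ann and Bob are made up.
    static List<Rel> sampleGraph() {
        return Arrays.asList(
                new Rel("Michael", "HAS_SEEN", "Cinderella"),
                new Rel("Michael", "HAS_SEEN", "Big Hero 6"),
                new Rel("Michael", "IS_FRIEND_OF", "Ann"),
                new Rel("Bob", "IS_FRIEND_OF", "Michael"));
    }

    public static void main(String[] args) {
        List<Rel> graph = sampleGraph();
        for (Dir dir : Dir.values()) {
            System.out.println("HAS_SEEN " + dir + ": "
                    + neighbors(graph, "Michael", "HAS_SEEN", dir));
            System.out.println("IS_FRIEND_OF " + dir + ": "
                    + neighbors(graph, "Michael", "IS_FRIEND_OF", dir));
        }
    }
}
```

In this sample data every HAS_SEEN relationship points from a user to a movie, so the undirected and outbound forms find the same movies. The direction only changes the result for IS_FRIEND_OF, where relationships point both ways.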
The return statement specifies the values we want to return; in this case, just the value of the movie's name property.
We execute our Cypher query by invoking the GraphDatabaseService's execute() method to obtain a Result object. The Result object follows the next()/hasNext() pattern of iteration; each result is a Map<String,Object>. This map associates each requested key (movie.name in this case) with the resulting value from the node. Note that the value is an Object; our node property values are not restricted to just String values, but can include primitive types and arrays.
Next, we iterate over the results and display the movie.name values:
    Michael's movies
        Cinderella
        Big Hero 6
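Because each value in a result row is an Object rather than a String, code that prints results may need to cope with numbers and arrays as well. The following is only a sketch of one way to do that: the render(), column(), and row() helpers are hypothetical names, and the rows are plain java.util maps standing in for the maps a query returns, so the example runs without a database.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for printing result values; not part of the Neo4j API.
final class ResultValueSketch {

    // Turns a value of unknown type into display text. Arrays (including
    // primitive arrays such as int[]) are rendered element by element.
    static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value.getClass().isArray()) {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < Array.getLength(value); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(render(Array.get(value, i)));
            }
            return sb.append("]").toString();
        }
        return String.valueOf(value);
    }

    // Drains rows using the hasNext()/next() pattern and renders one column.
    static List<String> column(Iterator<Map<String, Object>> rows, String key) {
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(render(rows.next().get(key)));
        }
        return out;
    }

    // Builds a one-column row, standing in for a row returned by a query.
    static Map<String, Object> row(String key, Object value) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put(key, value);
        return map;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("movie.name", "Cinderella"));
        rows.add(row("movie.name", "Big Hero 6"));

        System.out.println("Michael's movies");
        for (String name : column(rows.iterator(), "movie.name")) {
            System.out.println("\t" + name);
        }
    }
}
```

The loop in column() has the same shape as the while loop in the Cypher listing above; only the source of the rows differs.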
Now let's extend this example to find all movies that Michael's friends have seen:
match( michael:USER { name: 'Michael' } )-[:IS_FRIEND_OF]-(friend)-[:HAS_SEEN]->(movie) return movie.name
We start by matching the Michael node, as in the previous example, but this time we follow the IS_FRIEND_OF relationships (both incoming and outgoing) to a friend node and then follow that friend's outbound HAS_SEEN relationships to the movies he or she has seen. Because we're not using the “michael” or “friend” variables directly, we could write this query as follows:
match( :USER { name: 'Michael' } )-[:IS_FRIEND_OF]-()-[:HAS_SEEN]->(movie) return movie.name
We still need :USER to identify that the node has the USER label, but we can completely omit friend in the middle of the relationship. (I'm including the variable names for clarity.) The output for this query is the following:
    Michael's friend's movies
        Cinderella
        Big Hero 6
        Divergent
        Cinderella
        Big Hero 6
        The Interview
        Cinderella
        Big Hero 6
        Divergent
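The repetition in this output follows from how the pattern is matched: the query yields one row for every friend-to-movie path, not one row per movie. The sketch below imitates that with plain Java collections. The friend names are placeholders, and the grouping of the nine rows into three friends is my reading of the repeated output rather than something stated in the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory imitation of a two-step pattern; not Neo4j API.
final class FriendMoviesSketch {

    // friend -> movies seen. Friend names are placeholders; the movie lists
    // are chosen to reproduce the nine rows printed by the query.
    static Map<String, List<String>> sampleHasSeen() {
        Map<String, List<String>> seen = new LinkedHashMap<>();
        seen.put("Friend A", Arrays.asList("Cinderella", "Big Hero 6", "Divergent"));
        seen.put("Friend B", Arrays.asList("Cinderella", "Big Hero 6", "The Interview"));
        seen.put("Friend C", Arrays.asList("Cinderella", "Big Hero 6", "Divergent"));
        return seen;
    }

    // Emits one row per (friend, movie) path, so a movie seen by several
    // friends appears several times.
    static List<String> friendMovies(List<String> friends, Map<String, List<String>> hasSeen) {
        List<String> rows = new ArrayList<>();
        for (String friend : friends) {
            List<String> movies = hasSeen.get(friend);
            if (movies != null) {
                rows.addAll(movies);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> seen = sampleHasSeen();
        List<String> friends = new ArrayList<>(seen.keySet());

        System.out.println("Michael's friend's movies");
        for (String movie : friendMovies(friends, seen)) {
            System.out.println("\t" + movie);
        }
    }
}
```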
This query certainly finds all movies that Michael's friends have seen, but I'm not sure that's what we really wanted. Like the Core Java API and Traversal API concept of uniqueness, Cypher has the notion of distinct results. We can rewrite the query as follows:
match( michael:USER { name: 'Michael' } )-[:IS_FRIEND_OF]-(friend)-[:HAS_SEEN]->(movie) return DISTINCT movie.name
As with SQL queries, we can add the DISTINCT keyword to tell the Cypher engine that we want distinct (unique) results. Our new results are as follows:
    Michael's friend's movies (distinct)
        Cinderella
        Big Hero 6
        Divergent
        The Interview
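For readers who think in Java collections, the effect of DISTINCT on these rows resembles passing them through a set. The sketch below is an analogy, not a description of how the Cypher engine works; distinct() is a hypothetical helper. It uses a LinkedHashSet so the first-seen order is kept, which happens to match the output above, but I would not rely on any particular order from a query that has no ORDER BY clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical Java analogy for "return DISTINCT movie.name".
final class DistinctSketch {

    // Drops repeated values while keeping the order of first appearance.
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // The nine rows returned by the query without DISTINCT.
        List<String> rows = Arrays.asList(
                "Cinderella", "Big Hero 6", "Divergent",
                "Cinderella", "Big Hero 6", "The Interview",
                "Cinderella", "Big Hero 6", "Divergent");

        System.out.println("Michael's friend's movies (distinct)");
        for (String movie : distinct(rows)) {
            System.out.println("\t" + movie);
        }
    }
}
```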
Now let's sort our results with the ORDER BY clause:
match( michael:USER { name: 'Michael' } )-[:IS_FRIEND_OF]-(friend)-[:HAS_SEEN]->(movie) return DISTINCT movie.name ORDER BY movie.name
The ORDER BY clause tells the Cypher engine how to sort its results:
    Michael's friend's movies (sorted by name)
        Big Hero 6
        Cinderella
        Divergent
        The Interview
Likewise, you can sort in descending order by using the DESC keyword:
match( michael:USER { name: 'Michael' } )-[:IS_FRIEND_OF]-(friend)-[:HAS_SEEN]->(movie) return DISTINCT movie.name ORDER BY movie.name DESC
which yields the following output:
    Michael's friend's movies (sorted by name descending)
        The Interview
        Divergent
        Cinderella
        Big Hero 6
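If you were to sort the distinct names on the Java side instead, the equivalent of ORDER BY and ORDER BY ... DESC might look like the following sketch. The orderBy() helper is hypothetical, and it uses Java's natural String ordering, which agrees with the query output for these four names; I am not claiming the two orderings agree for every possible string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical Java analogy for "ORDER BY movie.name" and "ORDER BY movie.name DESC".
final class OrderBySketch {

    // Returns a sorted copy; the input list is left untouched.
    static List<String> orderBy(List<String> names, boolean descending) {
        List<String> sorted = new ArrayList<>(names);
        Comparator<String> byName = Comparator.naturalOrder();
        sorted.sort(descending ? byName.reversed() : byName);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> distinctNames = Arrays.asList(
                "Cinderella", "Big Hero 6", "Divergent", "The Interview");

        System.out.println("Ascending:  " + orderBy(distinctNames, false));
        System.out.println("Descending: " + orderBy(distinctNames, true));
    }
}
```

Letting the database sort is usually the simpler choice, since the rows then arrive in the order you need and no client-side copy is required.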
Now let's try something a little more complicated: we'll find all movies that Michael's friends have seen, but Michael hasn't:
match( michael:USER { name: 'Michael' } )-[:IS_FRIEND_OF]-(friend)-[:HAS_SEEN]->(movie) WHERE NOT (michael)-[:HAS_SEEN]-(movie) return DISTINCT movie.name ORDER BY movie.name
In this example, we have the same match and return clauses, but we added a new WHERE clause. The WHERE clause allows us to specify the condition(s) that must be true for a result to stay in the result set; for instance, checking the value of properties (such as user.age > 18), or, as in this case, checking that the result is not in the specified sub-query. Here we've combined the WHERE clause with the NOT operator in order to reverse the logic.
The sub-query in this example uses the michael variable from the match clause and follows all of his HAS_SEEN relationships to a movie node. It is important that the movie variable in this section matches the movie variable in the match clause so that Cypher knows where to make the comparison. Conceptually, Cypher finds all friends' movies and all of Michael's movies, and only movies that are not in Michael's movies will be returned in the result set:
    Michael's friend's movies (that he hasn't seen)
        Divergent
        The Interview
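The same filter can be pictured as a set difference. The sketch below is a conceptual analogy only and says nothing about how the Cypher engine evaluates the query; unseenByMichael() is a hypothetical helper. It takes the friends' movie rows and Michael's own movies from the earlier output, keeps the rows Michael has not seen, and uses a TreeSet to get the DISTINCT and ORDER BY behavior in one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Java analogy for "WHERE NOT (michael)-[:HAS_SEEN]-(movie)".
final class WhereNotSketch {

    // Keeps only the friends' movies that are absent from michaelSeen,
    // de-duplicated and sorted by name.
    static List<String> unseenByMichael(List<String> friendRows, Set<String> michaelSeen) {
        Set<String> kept = new TreeSet<>();   // distinct and sorted
        for (String movie : friendRows) {
            if (!michaelSeen.contains(movie)) {
                kept.add(movie);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<String> friendRows = Arrays.asList(
                "Cinderella", "Big Hero 6", "Divergent",
                "Cinderella", "Big Hero 6", "The Interview",
                "Cinderella", "Big Hero 6", "Divergent");
        Set<String> michaelSeen = new HashSet<>(Arrays.asList("Cinderella", "Big Hero 6"));

        System.out.println("Michael's friend's movies (that he hasn't seen)");
        for (String movie : unseenByMichael(friendRows, michaelSeen)) {
            System.out.println("\t" + movie);
        }
    }
}
```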