Big Data Architectures – NoSQL Use Cases for Graph Databases
Again, regarding NoSQL databases: when would we use them? This post covers the standard use cases for “graph databases,” building on my last blog post, which was about “key-value stores.”
Relational databases are used ubiquitously in organizations these days to solve all kinds of structured data storage problems, but they may not be the optimal solution for some applications. I believe most of our relational database solutions are perfectly appropriate unless you are having particular problems. Even then, there are probably good software or hardware solutions, such as flash storage, that don’t require changing databases. Most NoSQL database use cases are in new areas that may not have relational database back ends anyway, if those areas even exist yet in your organization. So don’t panic! The NoSQL police will not come and take away your relational database.
Graph databases are very good for storing information about the relationships between things, where the relationship between two items in the database is at least as important as the items themselves. Don’t confuse this strength with the “relational” in relational databases: that name refers to tables of data (relations), not to links between items, and following long chains of links is not where relational databases shine. Graph databases are very good for analyzing how closely things are related and how many steps are required to get from one point to another. Analyzing relationships between people on social media such as LinkedIn, Facebook, and Twitter is a typical use case for graph databases: how many “degrees of separation” are there between two people?
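To make “degrees of separation” concrete, here is a minimal in-memory sketch in Python, not tied to any particular graph database: a breadth-first search over a hypothetical adjacency dictionary, where the function name and the sample network are purely illustrative. A graph database performs this kind of traversal natively; in a relational schema, each additional hop typically means another self-join.

```python
from collections import deque

def degrees_of_separation(graph, start, target):
    """Breadth-first search: return the number of hops on the shortest path
    from start to target, or None if target cannot be reached.
    graph maps each person to a list of the people they are directly connected to."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, []):
            if friend == target:
                return distance + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, distance + 1))
    return None

# Hypothetical, undirected social network (each connection listed in both directions).
network = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
    "frank": [],
}

print(degrees_of_separation(network, "alice", "erin"))   # 3
print(degrees_of_separation(network, "alice", "frank"))  # None
```

Breadth-first search explores everyone one hop away before anyone two hops away, so the first time it reaches the target it has found the shortest chain of connections.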
In the analysis of relationships between people to discover terrorist cells, the NSA surely uses graph databases to infer relationships from phone records, social media interactions, and email correspondence. Sometimes the entities in the analysis are not known (for example, the real person behind a Twitter handle or a phone number), but the relationships may matter more for identifying groups that interact, or how central an individual is to a group. In a less scary way, organizations may use a very similar approach to identify “influencers” in a group they want to target as potential customers, and may find that winning over the key influencers can draw the entire group to their products.
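“How central an individual is to a group” can be measured in several ways. The simplest is degree centrality: a person’s number of direct connections divided by the largest possible number. The sketch below assumes an undirected graph stored as a plain Python dictionary; the helper names and the sample group are hypothetical, and real influencer analysis often uses richer measures such as betweenness or eigenvector centrality.

```python
def degree_centrality(graph):
    """Return each node's degree centrality: its number of direct
    connections divided by the maximum possible (n - 1) in an undirected graph."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

def top_influencers(graph, k):
    """Return the k best-connected nodes, highest centrality first."""
    scores = degree_centrality(graph)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical group of potential customers (undirected connections).
group = {
    "ann": ["ben", "cat", "dan", "eve"],
    "ben": ["ann", "cat"],
    "cat": ["ann", "ben", "dan"],
    "dan": ["ann", "cat"],
    "eve": ["ann"],
}

print(top_influencers(group, 2))  # ['ann', 'cat']
```

Here “ann” is connected to everyone else in the group, so she scores 1.0 and would be the first person to target.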
The analysis shown above is of Twitter relationships. It was created using Greenplum, a relational database especially good for high-volume analytics, but it visually demonstrates the idea of items (or people) and their relationships. This shows the Twitter relationships I had in 2012 with people who were also tweeting about Big Data. It displays how well-connected people were within that particular “community.”
Semantic analysis, or trying to understand the meaning of things, favors graph database solutions. Graph databases are frequently implemented as triple stores, built on the triple of subject, predicate, and object (for example, “Alice” “works for” “Acme”), a basic concept in semantic analysis. Semantic technology may be a key enabler of Big Data solutions when the volumes of information are so great, and the formats so varied, that meanings must be inferred automatically rather than documented manually.
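A triple is usually described as subject, predicate, and object. Production triple stores typically hold RDF data and are queried with SPARQL; the toy Python sketch below just keeps triples as tuples and answers pattern queries, with None acting as a wildcard. The `match` helper and the sample facts are hypothetical, chosen only to show how chaining two patterns lets you infer a fact that was never stated directly.

```python
# Each fact is a (subject, predicate, object) triple. The data is hypothetical.
triples = [
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("alice", "knows", "bob"),
    ("acme", "located_in", "boston"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Who works for acme?
employees = [s for (s, _, _) in match(triples, predicate="works_for", obj="acme")]
print(employees)  # ['alice', 'bob']

# Chain two patterns (a simple join): in which city is alice's employer located?
cities = [
    city
    for (_, _, employer) in match(triples, subject="alice", predicate="works_for")
    for (_, _, city) in match(triples, subject=employer, predicate="located_in")
]
print(cities)  # ['boston']
```

Nothing in the data says “alice is in boston”; that meaning only emerges by following the relationships, which is the kind of inference semantic technology automates at scale.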
Routing, dispatch, and logistics applications are also implemented using graph databases. How can packages or service staff be distributed most efficiently to respond quickly to requests, make pickups and deliveries, or get from point A to point B as swiftly as possible?
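The “point A to point B” question is a weighted shortest-path problem: locations are nodes, roads are edges, and each edge carries a cost such as travel time. A classic way to solve it is Dijkstra’s algorithm, sketched below in Python with the standard `heapq` module. The road network, the `shortest_route` name, and the use of minutes as weights are assumptions for illustration; real dispatch systems add constraints such as time windows and vehicle capacity.

```python
import heapq

def shortest_route(roads, start, goal):
    """Dijkstra's algorithm. roads maps each location to a list of
    (neighbour, travel_minutes) pairs; all weights must be non-negative.
    Returns (total_minutes, path), or (float('inf'), []) if goal is unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbour, minutes in roads.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical one-way road network with travel times in minutes.
roads = {
    "depot": [("A", 4), ("B", 1)],
    "B": [("A", 2), ("C", 5)],
    "A": [("C", 1)],
    "C": [],
}

print(shortest_route(roads, "depot", "C"))  # (4, ['depot', 'B', 'A', 'C'])
```

The direct road from the depot to A takes 4 minutes, but going via B gets there in 3, so the fastest route to C is the indirect one; a priority queue lets the algorithm discover this without trying every possible path.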
Graph databases are not the best solution for set-based updates (changing many records at once) or, unlike most classes of NoSQL data store, for very large volumes of data. Keep relational databases for complex and secure financial transactions.
Examples of graph databases include Neo4j, InfiniteGraph, OrientDB, and FlockDB.