Introduction
In the earlier article titled Data Schema of Relational Table Importing to Graph Database – Dimensionality, we explored how to observe real-world data and transform it into a digitised format within a Relational Database. The greater the dimensions we account for, the higher the fidelity with which the Relational Database represents reality.
In this article, we are going to transform the data stored in the Relational Database , to the Graph Database.
Why to transform data from Relational Database to Graph Database
While of course there are many ways for us to directly input the data into the Graph Database, there are still day to day scenario which we should input data into Relational Database before and transform them into Graph Database:
Data already recorded in the Relational Database.
Relational Database is much more popular than Graph Database. Most of the systems in most of the companies stored their data in Relational Database or in tabular format.
Common Form as the Input Interface
While we can use Cypher to input the data directly into Graph Database, the user needs to go through a long learning curve before they can master the new computer language Cypher. To provide a commonly used Form-style Input interface for the user for their CRUD activities will encourage the user to engage the system.
Problem Pattern – Solution – Data Schema Trio
The next question you may ask is : So why do we need to use Graph Database instead of Relational Database?
To answer this question, I will prefer to write in a trio of (1) Relational Database Problem Pattern – (2) Graph Database Solution – (3) Graph Database Data Schema such that all the 3 pieces of information will be interrelated, even though it may not map the content with the title of (1),(2) and (3) explicitly.
Relational Database Problem Pattern – Lack of Recursive Relationships
Table Citizen
Citizen# | First Name | Last Name |
---|---|---|
1001 | Barbie | Stereotypical |
1002 | Barbie | Weird |
1003 | Kenneth | Carson |
While table Citizen
is a typical relational data table which perfectly records the information (i.e. the 3 properties) of the 3 people (i.e. there are 3 records), can you imagine how to record the relationships among the records inside the same table?
For example, what if I want to record the facts:
- Feb : Stereotypical Barbie in relationship with Kenneth
- March : Stereotypical Barbie broke up with Kenneth
- April : Weird Barbie in relationship with Kenneth
- May : Stereotypical Barbie also reunion with Kenneth at the same time
May be you have throught of to append new columns is...of
and Target
at the end of the table Citizen
as below:
Citizen# | First Name | Last Name | is….of | Target | Date |
---|---|---|---|---|---|
1001 | Barbie | Stereotypical | Girl friend | Kenneth | Feb |
1002 | Barbie | Weird | Girl friend | Kenneth | April |
1003 | Kenneth | Carson | Boy friend | Stereotypical | Feb |
You will immediately realize that you cannot record both fact#1 and #2 at the same time. If you record #1 in Feb and modify it to #2 in March, you will miss the historical record of their relationship.
On top of that , you will also need to modify both Record#1001 and #1003 at the same time after they broke up as both #1001 and #1003 are in fact describing the same fact in different directions. (i.e. Stereotypical is Girl friend of Kenneth, and Kenneth is the Boy friend of Stereotypical), which we called this update anomaly in relational database.
Besides, also we cannot record both fact #3 and #4 at the same time because there is only 1 value able to be recorded in is...of
Column.
It seems that a normal tabular table cannot handle the fact descripting the relationship among different records inside the same table. This data pattern is regarded as Recursive Relationship.
In order to remedy the shortcoming, a new table, which is classifie as a Bridge Table, Love-Relationship
is necessary to be created to record the recursive relationships among the records in the same table Citizen
as below:
Love-Relationship# | Date | Subject | is….of | Object |
---|---|---|---|---|
2001 | Feb | Stereotypical Barbie | Girl friend | Kenneth |
2002 | Feb | Kenneth | Boy friend | Stereotypical Barbie |
2003 | March | Stereotypical Barbie | Ex-Girl Friend | Kenneth |
2004 | March | Kenneth | Ex-Boy friend | Stereotypical Barbie |
2005 | April | Weird Barbie | Girl friend | Kenneth |
2006 | April | Kenneth | Boy friend | Weird Barbie |
2007 | May | Stereotypical Barbie | Girl friend | Kenneth |
2008 | May | Kenneth | Boy friend | Stereotypical Barbie |
2009 | May | Stereotypical Barbie | Rival | Weird Barbie |
Graph Database Solution
In Figure 1 which is built by a Graph Database, you can cleary and easily address all the facts #1,#2,#3,#4 mentioned previously. All fidelity is preserved via Graph Database.

You may realize that some of the relationships inside the Graph Database may be duplicated or redundant. For example, there is no need to record both directions of the Rival relationship between the 2 Barbies. While we call that Direction in Graph Database, this is not our focus in this article and we will leave the discussion in the later session of this paragraph.
Graph Database Data Schema
Below is teh Graph Database Data Schema in Cypher, which creates the graph shown in Figure 1 :
// Figure 1 - Create nodes with Label: Citizen
CREATE (:Citizen {firstName: "Stereotypical", lastName: "Barbie"});
CREATE (:Citizen {firstName: "Weird", lastName: "Barbie"});
CREATE (:Citizen {firstName: "Kenneth", lastName: "Carson"});
// Relationships for February
MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
(kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (stereotypical)-[:Girlfriend_of {month: "Feb"}]->(kenneth),
(kenneth)-[:Boyfriend_of {month: "Feb"}]->(stereotypical);
// Relationships for March
MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
(kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (stereotypical)-[:Ex_Girlfriend_of {month: "March"}]->(kenneth),
(kenneth)-[:Ex_Boyfriend_of {month: "March"}]->(stereotypical);
// Relationships for April
MATCH (weird:Citizen {firstName: "Weird", lastName: "Barbie"}),
(kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (weird)-[:Girlfriend_of {month: "April"}]->(kenneth),
(kenneth)-[:Boyfriend_of {month: "April"}]->(weird);
// Relationships for May
MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
(weird:Citizen {firstName: "Weird", lastName: "Barbie"}),
(kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (stereotypical)-[:Girlfriend_of {month: "May"}]->(kenneth),
(kenneth)-[:Boyfriend_of {month: "May"}]->(stereotypical),
(stereotypical)-[:Rival_of {month: "May"}]->(weird);
The comparison between of the Dimensionality of the Data Schema between Relational Database and Graph Database are as below:
Dimension | Relational Database Data Schema | Graph Database Data Schema |
---|---|---|
1-D | Attribute (Column) | Node Properties |
2-D | Records (Row) | Node |
3-D | Table | Label |
4-D | Bridge Table | Type (i.e. Edge) |
5-D | Attribute in Bridge Table | Type Properties |
The Graph Database perfectly caters the recursive relationships between different records inside the same Table in Relational Database.
Relational Database Problem Pattern – Data Duplication
In fact, when we created a new Bridge Table, Love Relationship
Table in this case, you will find that both the names of Stereotypical Barbie
, Weird Barbie
and Kenneth Carson
had been shown up more than once inside the Love Relationship
Bridge Table, as well as duplicating with the records inside the Citizen
Table. (e..g you can find Stereotypical Barbie in both Love Relationship
Table and Citizen
Table)
This data duplication made the description about the reality lose its fidelity that while the database records Stereotypical Barbie
(and all other people) more than once, in reality there is only one Stereotypical Barbie
. There is discrepancy between the records (i.e. the Model) and the reality.
The Graph Database , on the contrary , records Stereotypical Barbie
once, which precisely describes the fact that there is one and only one Stereotypical Barbie
in reality.
Relational Database Problem Pattern – Lack of Functional Dependency
Recall in the previous paragraph that a Recursive Relationship (or self-referential relationship) is the relationship betweens any of the 2 (or more) records inside the same Relational Table.
If Recursive Relationship is describing the vertical dimension of a relationship, (i.e. whenever you add a new record in a table, the length of the table will be extended vertically.) Functional Dependency , on the contrary, is describing the horizontal dimension of the relationship. (i.e. whenever you add a new column (i.e. attribute) in a Table, the length of that Table will be extended horizontally).
Functional Dependency is referring to a specific column (attribute) in a table ,is dependent on another column (attribute) in the same Table.
Let’s illustrate the concept Functional Dependency with the example table Citizen
in below:
Citizen# | First Name | Last Name | Gender |
---|---|---|---|
1001 | Barbie | Stereotypical | F |
1002 | Baribe | Weird | F |
1003 | Kenneth | Carson | M |
With common sense, we can infer by the attribute First Name
that Baribe
should be a Female, while Kenneth
should be a Male. We can say that the attribute Gender
is dependent on the attribute First Name
. (regardless of their Last Name
, of course).
This kind of dependency is called Functional Dependency.
Thanks to the transformation of SQL, the 2 most popular Relational Databases, MySQL and MariaDB , started supporting the SQL Keyword CHECK
after the version 8.0.16 and 10.3.10 respectively. By using SQL Keyword CHECK
, we can apply the functional dependency by adding the CONSTRAINT
in the SQL statement as Figure 2 in below:
// Figure 2 - Create Relational Table and associated Constraints
CREATE TABLE Citizen (
Citizen# INT PRIMARY KEY,
First_Name VARCHAR(50),
Last_Name VARCHAR(50),
Gender CHAR(1),
CONSTRAINT chk_gender_ken CHECK (First_Name = 'Ken' AND Gender = 'M' OR First_Name <> 'Ken'),
CONSTRAINT chk_gender_barbie CHECK (First_Name = 'Barbie' AND Gender = 'F' OR First_Name <> 'Barbie')
);
The compatibility of the SQL Keyword CHECK
is in fact a good move and big move in relational database world which makes our coding life much easier, until you realize that you have to hard code the constraints (i.e. the rules) into the SQL.
What if there are 10,000 known First Names in the world and I want to turn them all to the constraint?
Obviously it is extremely hard , if not impossible , for any programmer to hard code the constraint into the SQL statement, not to mention these 10,000 additional SQL CONSTRAINT
statements will significantly drag down the performance of the query.
Moreover, whenever an end-user of the system discovers a new First Name and wants to add it into the CONSTRAINT
fleet, there is no way for the end user to insert the new CONSTRAINT
as you will not expect him/her to write the SQL statement himself / herself. The Extensibility of the system is suck.
In order to cater the extensibility problem, how about if we create an additional lookup Table FirstNameGenderRule
to restore all the rules as below:
Rules# | First Name | Gender |
---|---|---|
3001 | Barbie | F |
3002 | Baribe | F |
3003 | Kenneth | M |
In this sense, every time before a new record is inserted into the Table Citizen
, a constraint lookup to the Table FirstNameGenderRule
will be triggered in order to validate the value in the Gender of the record. Whenever a new First Name
is found, the end-user can append a new record in this FirstNameGenderRule
Table to a new rule via the user Form.
While this method makes perfect sense and served the functional dependency as well as solved the extensibility problem of the system, the nature of this FirstNameGenderRule
is similar to the Bridge Table we have mentioned previously in this article, which the data redundancy is happened again due to the fact that the value of both First Name
and Gender
stored twice in both Table Citizen
and FirstNameGenderRule
.
Meanwhile, in the Graph Database, in order to cater both the functional dependency objective and cater the extensibility problem of the system, we come up a Cypher solution in Figure 3 below:
// Figure 3 - Create Rule Node
CREATE (:Rule {FirstName: 'Ken', Gender: 'Male'}),
(:Rule {FirstName: 'Barbie', Gender: 'Female'}),
(:Rule {FirstName: 'Sam', Gender: 'Non-Binary'});
// Figure 3 - Validation
MATCH (c:Citizen)
OPTIONAL MATCH (r:Rule {FirstName: c.FirstName})
WHERE c.Gender <> r.Gender
RETURN c.CitizenID AS ViolatingCitizenID, c.FirstName AS FirstName, c.Gender AS CitizenGender, r.Gender AS ExpectedGender;
Based on the 3 Nodes (i.e. 3 records) under the Citizen Label we have already created in Figure 1, simply create 2 Nodes for the 2 constraint rules. You can of course put these 2 Nodes under the Label FNGenderRule
to categorize the rule Nodes
.
In the future, whenever a new First Name is found, we can simply add a new Node under the FNGenderRule
Label and that is!
Once you run the WHERE
clause in Figure 3, all the unvalid entry will be filtered out. (and modify automatically by using the SET keyword if you want. But i will skip this part.)
Unfortunately , when we look closely at the newly created FNGenderRule
Node, we may realize that in fact the property of First Name
and Gender
still exists and is duplicated with that of the columns in Citizen
. We cannot fix the data redundancy in Graph Database either.
Maybe you think crazily enough like me to externalize every single properties to become a node. Nevertheless, even though you can write the Cypher to execute like we have done in Figure 4 below:

While in the first insight it sounds we have served the functional dependency and extensibility requirements without sacrificing the data redundancy, in fact we have just created another worm-hole.
The question we should ask : how do we interpret from Figure 4 that {Kenneth, Carson, Male} is in fact the property of a Citizen
?
Unfortunately, there is no way for us to link up the Nodes Kenneth
, Carson
and Male
. Even though if you can, you may spend much more time than the benefit you gaining from solving the data redundancy problem.
And therefore, you keep optimizing your data model by using the Citizen#
as the Label of each leaf Nodes such that you can filter out a specific Citizen
based on the Citizen#
inside the Label like Figure 5 in below:

While technically and theoretically it is feasible, before you model your data in this way, think about what if there are 1,000,000 citizens, and what if you need to update/delete/modify the value of a record? And what if Kenneth changed his name from Kenneth to Ken? You need to firstly add a new Node Ken and then delete the Node Kenneth.
This operation is just not worth compared with the benefit brought from the data redundancy.
Leave a Reply