Relation Database Importing Data To Graph Database

Data Schema of Relational Table Importing to Graph Database – Execution

Jan Tang Avatar

( modified at

) by

in category

Introduction

In the earlier article titled Data Schema of Relational Table Importing to Graph Database – Dimensionality, we explored how to observe real-world data and transform it into a digitised format within a Relational Database. The greater the dimensions we account for, the higher the fidelity with which the Relational Database represents reality.

In this article, we are going to transform the data stored in the Relational Database , to the Graph Database.


Why to transform data from Relational Database to Graph Database

While of course there are many ways for us to directly input the data into the Graph Database, there are still day to day scenario which we should input data into Relational Database before and transform them into Graph Database:

Data already recorded in the Relational Database.

Relational Database is much more popular than Graph Database. Most of the systems in most of the companies stored their data in Relational Database or in tabular format. 

Common Form as the Input Interface 

While we can use Cypher to input the data directly into Graph Database, the user needs to go through a long learning curve before they can master the new computer language Cypher. To provide a commonly used Form-style Input interface for the user for their CRUD activities will encourage the user to engage the system. 


Problem Pattern – Solution – Data Schema Trio

The next question you may ask is : So why do we need to use Graph Database instead of Relational Database?

To answer this question, I will prefer to write in a trio of (1) Relational Database Problem Pattern – (2) Graph Database Solution – (3) Graph Database Data Schema such that all the 3 pieces of information will be interrelated, even though it may not map the content with the title of (1),(2) and (3) explicitly.


Relational Database Problem Pattern – Lack of Recursive Relationships

Table Citizen

Citizen#First NameLast Name
1001BarbieStereotypical
1002BarbieWeird
1003KennethCarson

While table Citizen is a typical relational data table which perfectly records the information (i.e. the 3 properties) of the 3 people (i.e. there are 3 records), can you imagine how to record the relationships among the records inside the same table?

For example, what if I want to record the facts:

  1. Feb : Stereotypical Barbie in relationship with Kenneth
  2. March : Stereotypical Barbie broke up with Kenneth
  3. April : Weird Barbie in relationship with Kenneth
  4. May : Stereotypical Barbie also reunion with Kenneth at the same time

May be you have throught of to append new columns is...of and Target at the end of the table Citizenas below:

Citizen#First NameLast Nameis….ofTargetDate
1001BarbieStereotypical Girl friendKennethFeb
1002BarbieWeird Girl friendKennethApril
1003KennethCarsonBoy friendStereotypicalFeb

You will immediately realize that you cannot record both fact#1 and #2 at the same time. If you record #1 in Feb and modify it to #2 in March, you will miss the historical record of their relationship.

On top of that , you will also need to modify both Record#1001 and #1003 at the same time after they broke up as both #1001 and #1003 are in fact describing the same fact in different directions. (i.e. Stereotypical is Girl friend of Kenneth, and Kenneth is the Boy friend of Stereotypical), which we called this update anomaly in relational database.

Besides, also we cannot record both fact #3 and #4 at the same time because there is only 1 value able to be recorded in is...of Column.

It seems that a normal tabular table cannot handle the fact descripting the relationship among different records inside the same table. This data pattern is regarded as Recursive Relationship.

In order to remedy the shortcoming, a new table, which is classifie as a Bridge Table, Love-Relationship is necessary to be created to record the recursive relationships among the records in the same table Citizen as below:

Love-Relationship#DateSubjectis….ofObject
2001FebStereotypical BarbieGirl friendKenneth
2002FebKennethBoy friendStereotypical Barbie
2003MarchStereotypical BarbieEx-Girl FriendKenneth
2004MarchKennethEx-Boy friendStereotypical Barbie
2005AprilWeird BarbieGirl friendKenneth
2006AprilKennethBoy friendWeird Barbie
2007MayStereotypical BarbieGirl friendKenneth
2008MayKennethBoy friendStereotypical Barbie
2009MayStereotypical BarbieRivalWeird Barbie

Graph Database Solution

In Figure 1 which is built by a Graph Database, you can cleary and easily address all the facts #1,#2,#3,#4 mentioned previously. All fidelity is preserved via Graph Database.

You may realize that some of the relationships inside the Graph Database may be duplicated or redundant. For example, there is no need to record both directions of the Rival relationship between the 2 Barbies. While we call that Direction in Graph Database, this is not our focus in this article and we will leave the discussion in the later session of this paragraph.

Graph Database Data Schema

Below is teh Graph Database Data Schema in Cypher, which creates the graph shown in Figure 1 :

// Figure 1 - Create nodes with Label: Citizen
CREATE (:Citizen {firstName: "Stereotypical", lastName: "Barbie"});
CREATE (:Citizen {firstName: "Weird", lastName: "Barbie"});
CREATE (:Citizen {firstName: "Kenneth", lastName: "Carson"});

// Relationships for February
MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
      (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (stereotypical)-[:Girlfriend_of {month: "Feb"}]->(kenneth),
       (kenneth)-[:Boyfriend_of {month: "Feb"}]->(stereotypical);

// Relationships for March
MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
      (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (stereotypical)-[:Ex_Girlfriend_of {month: "March"}]->(kenneth),
       (kenneth)-[:Ex_Boyfriend_of {month: "March"}]->(stereotypical);

// Relationships for April
MATCH (weird:Citizen {firstName: "Weird", lastName: "Barbie"}),
      (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (weird)-[:Girlfriend_of {month: "April"}]->(kenneth),
       (kenneth)-[:Boyfriend_of {month: "April"}]->(weird);

// Relationships for May
MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
      (weird:Citizen {firstName: "Weird", lastName: "Barbie"}),
      (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
CREATE (stereotypical)-[:Girlfriend_of {month: "May"}]->(kenneth),
       (kenneth)-[:Boyfriend_of {month: "May"}]->(stereotypical),
       (stereotypical)-[:Rival_of {month: "May"}]->(weird);

The comparison between of the Dimensionality of the Data Schema between Relational Database and Graph Database are as below:

DimensionRelational Database Data SchemaGraph Database Data Schema
1-DAttribute (Column)Node Properties
2-DRecords (Row)Node
3-DTableLabel
4-DBridge TableType (i.e. Edge)
5-DAttribute in Bridge TableType Properties

The Graph Database perfectly caters the recursive relationships between different records inside the same Table in Relational Database. 


Relational Database Problem Pattern – Data Duplication

In fact, when we created a new Bridge Table, Love Relationship Table in this case, you will find that both the names of Stereotypical Barbie, Weird Barbie and Kenneth Carson had been shown up more than once inside the Love Relationship Bridge Table, as well as duplicating with the records inside the Citizen Table. (e..g you can find Stereotypical Barbie in both Love Relationship Table and Citizen Table)

This data duplication made the description about the reality lose its fidelity that while the database records Stereotypical Barbie (and all other people) more than once, in reality there is only one Stereotypical Barbie. There is discrepancy between the records (i.e. the Model) and the reality.

The Graph Database , on the contrary , records Stereotypical Barbie once, which precisely describes the fact that there is one and only one Stereotypical Barbie in reality. 


Relational Database Problem Pattern – Lack of Functional Dependency

Recall in the previous paragraph that a Recursive Relationship (or self-referential relationship) is the relationship betweens any of the 2 (or more) records inside the same Relational Table. 

If Recursive Relationship is describing the vertical dimension of a relationship, (i.e. whenever you add a new record in a table, the length of the table will be extended vertically.) Functional Dependency , on the contrary, is describing the horizontal dimension of the relationship.  (i.e. whenever you add a new column (i.e. attribute) in a Table, the length of that Table will be extended horizontally).

Functional Dependency is referring to a specific column (attribute) in a table ,is dependent on another column (attribute) in the same Table.

Let’s illustrate the concept Functional Dependency with the example table Citizen in below:

Citizen#First NameLast NameGender
1001BarbieStereotypicalF
1002BaribeWeirdF
1003KennethCarsonM

With common sense, we can infer by the attribute First Name that Baribe should be a Female, while Kenneth should be a Male. We can say that the attribute Gender is dependent on the attribute First Name. (regardless of their Last Name, of course). 

This kind of dependency is called Functional Dependency.

Thanks to the transformation of SQL, the 2 most popular Relational Databases, MySQL and MariaDB , started supporting the SQL Keyword CHECK after the version 8.0.16 and 10.3.10 respectively. By using SQL Keyword CHECK, we can apply the functional dependency by adding the CONSTRAINT in the SQL statement as Figure 2 in below:

// Figure 2 - Create Relational Table and associated Constraints
CREATE TABLE Citizen (
    Citizen# INT PRIMARY KEY,
    First_Name VARCHAR(50),
    Last_Name VARCHAR(50),
    Gender CHAR(1),
    CONSTRAINT chk_gender_ken CHECK (First_Name = 'Ken' AND Gender = 'M' OR First_Name <> 'Ken'),
    CONSTRAINT chk_gender_barbie CHECK (First_Name = 'Barbie' AND Gender = 'F' OR First_Name <> 'Barbie')
);


The compatibility of the SQL Keyword CHECKis in fact a good move and big move in relational database world which makes our coding life much easier, until you realize that you have to hard code the constraints (i.e. the rules) into the SQL.

What if there are 10,000 known First Names in the world and I want to turn them all to the constraint? 

Obviously it is extremely hard , if not impossible , for any programmer to hard code the constraint into the SQL statement, not to mention these 10,000 additional SQL CONSTRAINT statements will significantly drag down the performance of the query.

Moreover, whenever an end-user of the system discovers a new First Name and wants to add it into the CONSTRAINT fleet, there is no way for the end user to insert the new CONSTRAINT as you will not expect him/her to write the SQL statement himself / herself. The Extensibility of the system is suck.

In order to cater the extensibility problem, how about if we create an additional lookup Table FirstNameGenderRule to restore all the rules as below:

Rules#First NameGender
3001BarbieF
3002BaribeF
3003KennethM

In this sense, every time before a new record is inserted into the Table Citizen, a constraint lookup to the Table FirstNameGenderRule will be triggered in order to validate the value in the Gender of the record.  Whenever a new First Name is found, the end-user can append a new record in this FirstNameGenderRule Table to a new rule via the user Form.

While this method makes perfect sense and served the functional dependency as well as solved the extensibility problem of the system, the nature of this FirstNameGenderRule is similar to the Bridge Table we have mentioned previously in this article, which the data redundancy is happened again due to the fact that the value of both First Name and Gender stored twice in both Table Citizen and FirstNameGenderRule.

Meanwhile, in the Graph Database, in order to cater both the functional dependency objective and cater the extensibility problem of the system, we come up a Cypher solution in Figure 3 below:

// Figure 3 - Create Rule Node
CREATE (:Rule {FirstName: 'Ken', Gender: 'Male'}),
       (:Rule {FirstName: 'Barbie', Gender: 'Female'}),
       (:Rule {FirstName: 'Sam', Gender: 'Non-Binary'});
// Figure 3 - Validation
MATCH (c:Citizen)
OPTIONAL MATCH (r:Rule {FirstName: c.FirstName})
WHERE c.Gender <> r.Gender
RETURN c.CitizenID AS ViolatingCitizenID, c.FirstName AS FirstName, c.Gender AS CitizenGender, r.Gender AS ExpectedGender;

Based on the 3 Nodes (i.e. 3 records) under the Citizen Label we have already created in Figure 1, simply create 2 Nodes for the 2 constraint rules. You can of course put these 2 Nodes under the Label FNGenderRule to categorize the rule Nodes

In the future, whenever a new First Name is found, we can simply add a new Node under the FNGenderRule Label and that is!

Once you run the WHEREclause in Figure 3, all the unvalid entry will be filtered out. (and modify automatically by using the SET keyword if you want. But i will skip this part.)

Unfortunately , when we look closely at the newly created FNGenderRule Node, we may realize that in fact the property of First Name and Gender still exists and is duplicated with that of the columns in Citizen. We cannot fix the data redundancy in Graph Database either.

Maybe you think crazily enough like me to externalize every single properties to become a node. Nevertheless, even though you can write the Cypher to execute like we have done in Figure 4 below:

Figure 4 bGraph Functional Dependency Citizen Properties Not Linked
Figure 4 bGraph Functional Dependency Citizen Properties Not Linked

While in the first insight it sounds we have served the functional dependency and extensibility requirements without sacrificing the data redundancy, in fact we have just created another worm-hole.

The question we should ask : how do we interpret from Figure 4 that {Kenneth, Carson, Male} is in fact the property of a Citizen?

Unfortunately, there is no way for us to link up the Nodes Kenneth, Carson and Male. Even though if you can, you may spend much more time than the benefit you gaining from solving the data redundancy problem.

And therefore, you keep optimizing your data model by using the Citizen# as the Label of each leaf Nodes such that you can filter out a specific Citizen based on the Citizen# inside the Label like Figure 5 in below: 

Figure 4 bGraph Functional Dependency Citizen Properties Linked By Label
Figure 5 bGraph Functional Dependency Citizen Properties Linked By Label

While technically and theoretically it is feasible, before you model your data in this way, think about what if there are 1,000,000 citizens, and what if you need to update/delete/modify the value of a record? And what if Kenneth changed his name from Kenneth to Ken? You need to firstly add a new Node Ken and then delete the Node Kenneth.  

This operation is just not worth compared with the benefit brought from the data redundancy.


Relational Database Problem Pattern – Data Duplication


Relational Database Problem Pattern – Data Duplication

Conclusion


Footnotes




Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Diamond Digital Marketing International