Category: bGraph

  • Data Schema of Relational Table Importing to Graph Database – Execution

    Data Schema of Relational Table Importing to Graph Database – Execution

    Introduction

    In the earlier article titled Data Schema of Relational Table Importing to Graph Database – Dimensionality, we explored how to observe real-world data and transform it into a digitised format within a Relational Database. The greater the dimensions we account for, the higher the fidelity with which the Relational Database represents reality.

    In this article, we are going to transform the data stored in the Relational Database into a Graph Database.


    Why Transform Data from a Relational Database to a Graph Database

    While there are of course many ways to input data directly into a Graph Database, there are still day-to-day scenarios in which we should input data into a Relational Database first and then transform it into a Graph Database:

    Data already recorded in the Relational Database.

    Relational Databases are far more popular than Graph Databases. Most systems in most companies store their data in a Relational Database or in a tabular format.

    Common Form as the Input Interface 

    While we can use Cypher to input the data directly into the Graph Database, the user needs to go through a long learning curve before they can master the new query language Cypher. Providing a commonly used Form-style input interface for the user's CRUD activities will encourage users to engage with the system.
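
    To make this concrete, below is a minimal sketch of what a Form submission could translate to behind the scenes: the Form backend binds the submitted field values to Cypher parameters, so the end user never writes Cypher themselves (the property names here are illustrative, not a prescribed schema):

    // Hypothetical backend statement for a "New Citizen" Form submission;
    // $firstName and $lastName are bound from the Form fields at run time
    CREATE (:Citizen {firstName: $firstName, lastName: $lastName});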


    Problem Pattern – Solution – Data Schema Trio

    The next question you may ask is: so why do we need to use a Graph Database instead of a Relational Database?

    To answer this question, I prefer to present each case as a trio of (1) Relational Database Problem Pattern – (2) Graph Database Solution – (3) Graph Database Data Schema, so that all 3 pieces of information are interrelated, even though the content may not map explicitly to the titles of (1), (2) and (3).


    Relational Database Problem Pattern – Lack of Recursive Relationships

    Table Citizen

    Citizen# | First Name | Last Name
    1001     | Barbie     | Stereotypical
    1002     | Barbie     | Weird
    1003     | Kenneth    | Carson

    While table Citizen is a typical relational data table which perfectly records the information (i.e. the 3 properties) of the 3 people (i.e. there are 3 records), can you imagine how to record the relationships among the records inside the same table?

    For example, what if I want to record the facts:

    1. Feb : Stereotypical Barbie in a relationship with Kenneth
    2. March : Stereotypical Barbie broke up with Kenneth
    3. April : Weird Barbie in a relationship with Kenneth
    4. May : Stereotypical Barbie also reunited with Kenneth at the same time

    Maybe you have thought of appending new columns is...of, Target and Date at the end of the table Citizen as below:

    Citizen# | First Name | Last Name     | is...of     | Target        | Date
    1001     | Barbie     | Stereotypical | Girl friend | Kenneth       | Feb
    1002     | Barbie     | Weird         | Girl friend | Kenneth       | April
    1003     | Kenneth    | Carson        | Boy friend  | Stereotypical | Feb

    You will immediately realize that you cannot record both facts #1 and #2 at the same time. If you record #1 in Feb and modify it to #2 in March, you will lose the historical record of their relationship.

    On top of that, you will also need to modify both Record #1001 and Record #1003 at the same time after they break up, as both #1001 and #1003 are in fact describing the same fact in different directions (i.e. Stereotypical is the Girl friend of Kenneth, and Kenneth is the Boy friend of Stereotypical). This is what we call an update anomaly in a relational database.

    Besides, we also cannot record both facts #3 and #4 at the same time, because only 1 value can be recorded in the is...of column.

    It seems that a normal tabular table cannot handle facts describing the relationships among different records inside the same table. This data pattern is known as a Recursive Relationship.

    In order to remedy this shortcoming, a new table classified as a Bridge Table, Love-Relationship in this case, needs to be created to record the recursive relationships among the records in the same table Citizen, as below:

    Love-Relationship# | Date  | Subject              | is...of        | Object
    2001               | Feb   | Stereotypical Barbie | Girl friend    | Kenneth
    2002               | Feb   | Kenneth              | Boy friend     | Stereotypical Barbie
    2003               | March | Stereotypical Barbie | Ex-Girl friend | Kenneth
    2004               | March | Kenneth              | Ex-Boy friend  | Stereotypical Barbie
    2005               | April | Weird Barbie         | Girl friend    | Kenneth
    2006               | April | Kenneth              | Boy friend     | Weird Barbie
    2007               | May   | Stereotypical Barbie | Girl friend    | Kenneth
    2008               | May   | Kenneth              | Boy friend     | Stereotypical Barbie
    2009               | May   | Stereotypical Barbie | Rival          | Weird Barbie

    Graph Database Solution

    In Figure 1, which is built with a Graph Database, you can clearly and easily address all the facts #1, #2, #3 and #4 mentioned previously. All fidelity is preserved via the Graph Database.

    You may realize that some of the relationships inside the Graph Database may be duplicated or redundant. For example, there is no need to record both directions of the Rival relationship between the 2 Barbies. This is what we call Direction in a Graph Database; it is not our focus in this article and we will leave the discussion to a later section.
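
    One reason the second direction is optional: Cypher can match a relationship from either end. A minimal sketch against the Figure 1 schema shown below:

    // An undirected pattern matches Rival_of from either end, so storing a single
    // edge is enough (each rival pair will simply appear once per end in the result)
    MATCH (a:Citizen)-[r:Rival_of]-(b:Citizen)
    RETURN a.firstName AS citizen, b.firstName AS rival, r.month AS month;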

    Graph Database Data Schema

    Below is the Graph Database Data Schema in Cypher, which creates the graph shown in Figure 1:

    // Figure 1 - Create nodes with Label: Citizen
    CREATE (:Citizen {firstName: "Stereotypical", lastName: "Barbie"});
    CREATE (:Citizen {firstName: "Weird", lastName: "Barbie"});
    CREATE (:Citizen {firstName: "Kenneth", lastName: "Carson"});
    
    // Relationships for February
    MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
          (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
    CREATE (stereotypical)-[:Girlfriend_of {month: "Feb"}]->(kenneth),
           (kenneth)-[:Boyfriend_of {month: "Feb"}]->(stereotypical);
    
    // Relationships for March
    MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
          (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
    CREATE (stereotypical)-[:Ex_Girlfriend_of {month: "March"}]->(kenneth),
           (kenneth)-[:Ex_Boyfriend_of {month: "March"}]->(stereotypical);
    
    // Relationships for April
    MATCH (weird:Citizen {firstName: "Weird", lastName: "Barbie"}),
          (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
    CREATE (weird)-[:Girlfriend_of {month: "April"}]->(kenneth),
           (kenneth)-[:Boyfriend_of {month: "April"}]->(weird);
    
    // Relationships for May
    MATCH (stereotypical:Citizen {firstName: "Stereotypical", lastName: "Barbie"}),
          (weird:Citizen {firstName: "Weird", lastName: "Barbie"}),
          (kenneth:Citizen {firstName: "Kenneth", lastName: "Carson"})
    CREATE (stereotypical)-[:Girlfriend_of {month: "May"}]->(kenneth),
           (kenneth)-[:Boyfriend_of {month: "May"}]->(stereotypical),
           (stereotypical)-[:Rival_of {month: "May"}]->(weird);
    

    The comparison of the Dimensionality of the Data Schema between the Relational Database and the Graph Database is as below:

    Dimension | Relational Database Data Schema | Graph Database Data Schema
    1-D       | Attribute (Column)               | Node Properties
    2-D       | Records (Row)                    | Node
    3-D       | Table                            | Label
    4-D       | Bridge Table                     | Type (i.e. Edge)
    5-D       | Attribute in Bridge Table        | Type Properties

    The Graph Database perfectly caters for the recursive relationships between different records inside the same Table in a Relational Database.
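
    As a quick illustration, the whole relationship history between Stereotypical Barbie and Kenneth, which needed the Love-Relationship Bridge Table above, becomes a single pattern match against the Figure 1 schema:

    // Every relationship (and its month) between the two citizens, in either direction
    MATCH (b:Citizen {firstName: "Stereotypical", lastName: "Barbie"})
          -[r]-(k:Citizen {firstName: "Kenneth", lastName: "Carson"})
    RETURN type(r) AS relationship, r.month AS month;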


    Relational Database Problem Pattern – Data Duplication

    In fact, when we created the new Bridge Table, the Love-Relationship Table in this case, you will find that the names Stereotypical Barbie, Weird Barbie and Kenneth Carson each show up more than once inside the Love-Relationship Bridge Table, as well as duplicating the records inside the Citizen Table (e.g. you can find Stereotypical Barbie in both the Love-Relationship Table and the Citizen Table).

    This data duplication makes the description of reality lose fidelity: while the database records Stereotypical Barbie (and every other person) more than once, in reality there is only one Stereotypical Barbie. There is a discrepancy between the records (i.e. the Model) and the reality.

    The Graph Database, on the contrary, records Stereotypical Barbie once, which precisely describes the fact that there is one and only one Stereotypical Barbie in reality.
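
    A sketch of how this can be verified against the Figure 1 schema: her name is stored on exactly one node, however many relationships hang off it:

    // One node for Stereotypical Barbie, counted once, regardless of how many
    // relationships she participates in
    MATCH (b:Citizen {firstName: "Stereotypical", lastName: "Barbie"})
    OPTIONAL MATCH (b)-[r]-()
    RETURN count(DISTINCT b) AS barbieNodes, count(r) AS relationships;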


    Relational Database Problem Pattern – Lack of Functional Dependency

    Recall from the previous paragraph that a Recursive Relationship (or self-referential relationship) is a relationship between any 2 (or more) records inside the same Relational Table.

    If a Recursive Relationship describes the vertical dimension of a relationship (i.e. whenever you add a new record to a table, the table is extended vertically), Functional Dependency, on the contrary, describes the horizontal dimension (i.e. whenever you add a new column (i.e. attribute) to a Table, that Table is extended horizontally).

    Functional Dependency refers to a specific column (attribute) in a table being dependent on another column (attribute) in the same Table.

    Let’s illustrate the concept of Functional Dependency with the example table Citizen below:

    Citizen# | First Name | Last Name     | Gender
    1001     | Barbie     | Stereotypical | F
    1002     | Barbie     | Weird         | F
    1003     | Kenneth    | Carson        | M

    Using common sense, we can infer from the attribute First Name that Barbie should be a Female, while Kenneth should be a Male. We can say that the attribute Gender is dependent on the attribute First Name (regardless of the Last Name, of course).

    This kind of dependency is called Functional Dependency.

    Thanks to the evolution of SQL, the 2 most popular Relational Databases, MySQL and MariaDB, started supporting the SQL keyword CHECK from versions 8.0.16 and 10.3.10 respectively. By using the SQL keyword CHECK, we can apply the functional dependency by adding a CONSTRAINT to the SQL statement as in Figure 2 below:

    -- Figure 2 - Create Relational Table and associated Constraints
    CREATE TABLE Citizen (
        `Citizen#` INT PRIMARY KEY,
        First_Name VARCHAR(50),
        Last_Name VARCHAR(50),
        Gender CHAR(1),
        -- Each CHECK encodes one functional dependency rule: a given First_Name implies a Gender
        CONSTRAINT chk_gender_ken CHECK ((First_Name = 'Ken' AND Gender = 'M') OR First_Name <> 'Ken'),
        CONSTRAINT chk_gender_barbie CHECK ((First_Name = 'Barbie' AND Gender = 'F') OR First_Name <> 'Barbie')
    );


    The support for the SQL keyword CHECK is in fact a good and big move in the relational database world which makes our coding life much easier, until you realize that you have to hard code the constraints (i.e. the rules) into the SQL.

    What if there are 10,000 known First Names in the world and I want to turn them all into constraints?

    Obviously it is extremely hard, if not impossible, for any programmer to hard code these constraints into the SQL statement, not to mention that these 10,000 additional SQL CONSTRAINT statements will significantly drag down the performance of the query.

    Moreover, whenever an end-user of the system discovers a new First Name and wants to add it to the CONSTRAINT fleet, there is no way for the end user to insert the new CONSTRAINT, as you would not expect him/her to write the SQL statement himself/herself. The extensibility of the system suffers.

    In order to cater for the extensibility problem, how about creating an additional lookup Table FirstNameGenderRule to store all the rules as below:

    Rules# | First Name | Gender
    3001   | Barbie     | F
    3002   | Barbie     | F
    3003   | Kenneth    | M

    In this sense, every time before a new record is inserted into the Table Citizen, a constraint lookup against the Table FirstNameGenderRule is triggered in order to validate the value of Gender in that record. Whenever a new First Name is found, the end-user can append a new record to this FirstNameGenderRule Table to add a new rule via the user Form.

    While this method makes perfect sense and serves the functional dependency as well as solving the extensibility problem of the system, the nature of this FirstNameGenderRule table is similar to the Bridge Table we mentioned previously in this article: data redundancy happens again, because the values of both First Name and Gender are stored twice, in both the Table Citizen and the Table FirstNameGenderRule.

    Meanwhile, in the Graph Database, in order to cater for both the functional dependency objective and the extensibility problem of the system, we come up with a Cypher solution in Figure 3 below:

    // Figure 3 - Create Rule nodes (one per known First Name)
    CREATE (:Rule {FirstName: 'Ken', Gender: 'Male'}),
           (:Rule {FirstName: 'Barbie', Gender: 'Female'}),
           (:Rule {FirstName: 'Sam', Gender: 'Non-Binary'});
    
    // Figure 3 - Validation
    // Assumes each Citizen node also carries a gender property encoded the same way
    // as the Rule nodes; citizens whose first name has no Rule are simply skipped
    MATCH (c:Citizen)
    MATCH (r:Rule {FirstName: c.firstName})
    WHERE c.gender <> r.Gender
    RETURN c.firstName AS FirstName, c.lastName AS LastName,
           c.gender AS CitizenGender, r.Gender AS ExpectedGender;

    Based on the 3 Nodes (i.e. 3 records) under the Citizen Label we have already created in Figure 1, simply create Nodes for the constraint rules (one per known First Name). You can of course put these rule Nodes under a dedicated Label such as FNGenderRule to categorize them.

    In the future, whenever a new First Name is found, we can simply add a new Node under the FNGenderRule Label and that is it!
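
    A sketch of that extension step, using the Rule label from Figure 3 (swap in FNGenderRule if you prefer a dedicated label; the name Skipper is just an illustrative value):

    // Supporting a newly discovered First Name is one extra node - no schema change, no new CONSTRAINT
    CREATE (:Rule {FirstName: 'Skipper', Gender: 'Female'});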

    Once you run the WHERE-filtered validation query in Figure 3, all the invalid entries will be listed (and you can even correct them automatically by using the SET keyword if you want, but I will skip that part).
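
    For completeness, a minimal sketch of that automatic correction, reusing the Figure 3 property names; treat it as illustrative rather than a recommended default, since silently overwriting data is not always desirable:

    // Overwrite the gender of every citizen that violates its First Name rule
    MATCH (c:Citizen)
    MATCH (r:Rule {FirstName: c.firstName})
    WHERE c.gender <> r.Gender
    SET c.gender = r.Gender;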

    Unfortunately, when we look closely at the newly created FNGenderRule Nodes, we may realize that the properties First Name and Gender in fact still exist and are duplicated with the corresponding columns in Citizen. We cannot fix the data redundancy in the Graph Database either.

    Maybe you are thinking, crazily enough like me, of externalizing every single property to become a node of its own. You can indeed write Cypher to do so, as we have done in Figure 4 below:

    Figure 4 bGraph Functional Dependency Citizen Properties Not Linked

    While at first sight it sounds as if we have served the functional dependency and extensibility requirements without sacrificing anything to data redundancy, in fact we have just created another wormhole.

    The question we should ask: how do we interpret from Figure 4 that {Kenneth, Carson, Male} is in fact the set of properties of one Citizen?

    Unfortunately, there is no way for us to link up the Nodes Kenneth, Carson and Male. Even if you could, you might spend far more time than the benefit you gain from solving the data redundancy problem.

    And therefore, you keep optimizing your data model by using the Citizen# as the Label of each leaf Node, such that you can filter out a specific Citizen based on the Citizen# inside the Label, as in Figure 5 below:

    Figure 5 bGraph Functional Dependency Citizen Properties Linked By Label

    While technically and theoretically it is feasible, before you model your data in this way, think about what happens if there are 1,000,000 citizens, and what if you need to update/delete/modify the value of a record? What if Kenneth changed his name from Kenneth to Ken? You would first need to add a new Node Ken and then delete the Node Kenneth.

    This operation is just not worth it compared with the benefit brought by removing the data redundancy.



    Conclusion


    Footnotes


  • Data Schema of Relational Table Importing to Graph Database – Dimensionality

    Definition


    Relational Table

    The Table is the fundamental component inside a Relational Database which stores data in tabular format, using Column and Row to coordinate a specific Value (i.e. the Cell). Forms and Tables (i.e. tabular formats) are everywhere in your daily life.

    Data Schema in Relational Database

    A data schema in a Relational Database is a blueprint or structure that defines how data is organized, stored, and related within the database. It describes the database’s logical design, covering when and what to create in terms of Tables, Columns, Indexes, Constraints, Relationships and Data Types inside a Relational Database.

    Graph Database

    A graph database is a type of database that uses graph structures to represent and store data. Instead of organizing data into rows and columns like relational databases, graph databases focus on relationships and connections between data points.


    Objective of Importing Data From Relational Table to Graph Database

    While a Graph Database can act as the Enterprise Knowledge Base which shortens the learning curve of both staff and clients, how to import the data into the Graph Database is a challenge.

    As data in tabular format is dominant in the world, it is inevitable for us to import data from a Relational Database into a Graph Database. There are a few ways we can import the data into the graph database:

    1. Directly coding in Cypher – Cypher is a Graph Query Language, analogous to SQL in a relational database
    2. Importing both the data schema (i.e. the metadata, e.g. datatype, column constraints) and the business data (i.e. the normal records inside a table in a relational database) directly from the relational database

    While graph databases are not common among the general public, both of these methods require some kind of expertise in order to get the job done. Besides, the methods above are good for batch importing; if you want to import the data piecemeal, they are not handy.

    It is necessary to have a normal input Form (just like the Forms you see every day in your life) with zero learning curve for the user, so that data can be imported manually into the graph database.


    Data Types

    The wording “Data Type” has more than one meaning. It can refer to the description of the data format, e.g. a column in a Form which can only accept a value of integer, decimal, text, or an autonumber.

    Data Type also has another meaning, which is used to classify data as Business Data, Metadata and Model Data. For details about the differences between the 3, it is strongly recommended that you read the article bGraph Architecture – Model Data beforehand.

    The focus of this article is the latter.


    Interpreting Relational and Graph Database in a Dimensional Perspective

    Before we dive into the problem patterns and their paired solutions for importing the data from a relational table into a graph database, the concept of interpreting data in terms of dimensions is crucial for us to understand what we are solving and why we solve it in the way we do.

    What is Dimension

    By definition, in physics, a Dimension is a space that can be measured and extended. For example:

    Dimension (x-D) | Example of Unit | Example in Reality
    0-D             | N/A             | a Point (or a Spot)
    1-D             | cm              | a Line
    2-D             | cm²             | a Plane (or an Area)
    3-D             | cm³             | a Volume
    4-D             | Hour            | Time
    5-D             | ??              | ??

    Relationship Between Dimension and Vector

    A practical purpose of the concept “Dimension” is to coordinate something. 

    For example, can you tell me where the letter “A” is in the Line below?

    ______A___

    While the “A” is definitely not in the middle, it is not at the starting or ending point of the line either. You can say it is skewed to the right, but you cannot tell exactly where it is.

    How about now?

    123456A89

    Now you can confidently say that the “A” is located in the position “7” of this 9-unit-long Line.

    The Line is regarded as “1 dimension” because you can measure the length of the line (1-9) in width (and only width), but not in height, depth or time.

    Dimension | Co-ordinate of Data Point Example
    0-D       | N/A
    1-D       | {1}
    2-D       | {1,2}
    3-D       | {1,2,8}
    4-D       | {1,2,8,10}

    So it is quite easy to understand that the number (or you can say the “digits”) of data points is exactly the same as the degree of dimension. Whenever you want to coordinate an additional dimension, simply add an additional data point (i.e. digit) to the array.

    In mathematics, we call this long and extensible array of numbers (e.g. {1,2,8,10}) a Vector, a concept under the mathematics branch of Linear Algebra.

    While as human beings we cannot imagine a material object in a 5-D physical world, there is no limitation on how many dimensions you can add in the mathematical world. In fact, you can add 100, 1,000 or 10,000 data points to the Vector if needed, as long as you can think of the use case (e.g. to coordinate something) and provided that you have sufficient computing power to do the vector calculations and operations.

    That’s the reason why, even though Graph Theory was coined almost 300 years ago and computers have existed for almost 90 years, we still did not hear of Graph Databases until the last decade or two (maybe because you were not even born!), due to the limitation of computational power and infrastructure.


    Relationship Between Dimension and Database


    0-Dimension in Database

    As mentioned before, the main purpose of a Vector is to coordinate a data point. In fact, coordinating something is the initial step of “Searching”.

    The benchmark of whether we can successfully coordinate something mostly depends on whether the coordination can refer to one and only one outcome (i.e. a singleton).

    Allow me to start with an example based on the 2023 Barbie film.


    Imagine that in a country, Barbie Land, Barbie1 is the one and only citizen. To record Barbie as the citizen of this country (most likely Barbie will be the one who carries out this recording task!), Barbie can simply record it like below:

    Barbie

    That is it!

    While there is one and only one citizen in this country, every time anyone talks about “Barbie” in this country, the word “Barbie” can uniquely identify the material substance of the person being referred to, which means it is perfectly good enough to “coordinate” the one and only material-substance citizen “Barbie” in Barbie Land.

    You can regard this “Cell” as 0-Dimension because there is one and only one value, and there is no direction for you to extend horizontally or vertically. (Remember the definition of Dimension?)

    0-Dimension : In terms of Data Structure, there is one and only one Cell which stores the Value.


    1-Dimension

    When someone outside Barbie Land describes Barbie, he/she may say that Barbie is 29 cm tall, the citizen of Barbie Land. If height = 29cm, citizenship = Barbie Land, then X = Barbie; what is the “X”?

    Obviously the X = Name.

    In order to facilitate others to describe the material-substance object (i.e. Barbie herself), we can add an additional Cell on top of the Cell Barbie as below.

    By applying the definition of Dimension, you can see that an additional Cell, Name, has been added to the Cell (i.e. 0-Dimension), turning it into a vertical line (i.e. a Column). Therefore, the data structure has been transformed from 0-Dimension to 1-Dimension.

    From now on, you can call this data structure a Column.

    1-Dimension : In terms of Data Structure, when a “Column Name” is appended on top of the 0-Dimension Cell, it can be regarded as 1-Dimension.


    2-Dimension

    One day, when Barbie experienced her own imperfection, she realized that she is in fact only a stereotype. She therefore gave herself a Last Name as below:

    First Name | Last Name
    Barbie     | Stereotypical

    When a column Last Name is added, the word “Name” alone is not enough to differentiate between the two. To uniquely identify the two, “First” and “Last” are added in front of “Name”.

     
    You may notice the pattern that whenever you want to uniquely identify some objects, you simply attach some attributes (e.g. Last Name) to those objects, in order to let the Name refer to one and only one instance. (This is the definition of the word “Definition”!)

    Back to our database use case: as an additional Column is now attached to Barbie, you can see the data structure has in fact been extended from a Column (1-Dimension) to a Plane (2-Dimension). If you remember the vector example, you can write it as the vector {Barbie, Stereotypical}.

    From now on, you can call this Plane a Table.

    2-Dimension : In terms of data structure, whenever there are 2 columns, each with a Column Name, and 1 Record, it can be regarded as 2-Dimension.


    Adding a New Record in 2-Dimension

    As time goes by, the population of Barbie Land increases by 100%, from 1 person to 2 people, and the newcomer is also named Barbie.

    First Name | Last Name
    Barbie     | Stereotypical
    Barbie     | Weird

    Applying the same logic of the definition of Dimension: although an additional record, i.e. Barbie Weird, is appended to the list, the data structure is still 2-Dimension, as the new record by itself does not extend in any new direction (it extends vertically, a direction which already existed before). That means that no matter how many records you add, the Table is still 2-Dimension.


    Adding a New Column in 2-Dimension

    Although logically anyone can distinguish the 2 Barbies by their First Name and Last Name, in reality, as the 2 Barbies will not seal their names on their foreheads, there is no way to teach a person who has never seen them before to differentiate between the two.

    Having discussed this problem, in order to uniquely identify themselves visually, the 2 Barbies agreed to give a Hair Style description to themselves as below:

    In this case, an observable characteristic is needed to enrich the data structure, bringing the information inside the data structure closer to reality.

    First Name | Last Name     | Hair Style
    Barbie     | Stereotypical | Floating
    Barbie     | Weird         | Quirky

    Applying the same logic of the definition of Dimension: although an additional column, i.e. Hair Style, is appended to the Table, the data structure is still 2-Dimension, as the new Column by itself does not extend in any new direction (it extends horizontally, a direction which already existed before). That means that no matter how many Columns you add, the Table is still 2-Dimension.


    3-Dimension

    It seems the 2-Dimension Table can describe the reality of Barbie Land well, until both Barbies want to create a sunglasses list which can help them manage all the eyewear in their wardrobe.

    After hours of effort, they created the sunglasses list as below:

    As a new Data Table is created, another question arises: how do we uniquely identify the 2 Tables during communication?

    Well, as you can imagine, simply giving a name to each of the 2 Tables solves the problem.

    Sunglasses# | Sunglasses Style | Sunglasses Frame Color
    S01         | Cat-Eye Frame    | Orange
    S02         | Aviator          | Deep Blue
    S03         | Sporty           | Silver

    Therefore, Stereotypical Barbie named the 2 Tables Citizen and Sunglasses respectively.

    As in your own life, the 2 Barbies have just opened a can of worms. While both Citizen and Sunglasses are, by themselves, 2-dimension objects (i.e. Tables), listing out 2 Tables, referring back to the definition of Dimension, creates another new Dimension (2 Tables are definitely measurable, and you can extend as many objects (i.e. Tables) as you want in the same direction).

    And therefore, a 3-Dimension data structure is born! Now you can call this 3-Dimension data structure a Database.


    Testing The Theory From 1 to 3 Dimension

    It’s time for us to take a deep breath before we dive into the 4-Dimension data world. Limited by the human brain’s structure, any dimension higher than 3 is hard to visualise and project in our mind. Therefore we have to make sure we are well acquainted with the 0-3 Dimension concepts.

    The next question is, how do we prove that our 0-3 Dimension data structure theory is correct and practical?

    Let us go back to the basics of why we have to build the data structure in the first place. This is the fundamental objective of the information system. The logic is as below:

    1. Making decisions brings value in the real world.
    2. Decision-making quality affects the value of the decision made.
    3. Accurate, timely and relevant Information facilitates decision-making quality.
    4. Information is refined from how we analyze and process (e.g. egress, load and transfer) the Data in the Database.
    5. The Data stored in the Database is classified as Descriptive Data and Inferential Data.
    6. Descriptive Data comes from real-world observation.
    7. Inferential Data is the product of both Descriptive Data and Intelligence (i.e. how we analyse the data, for example using a logistic regression model to classify the Descriptive Data).

    To streamline the deduction steps above: the purpose of the Information System is that its user (i.e. the user of the Database) can simply make decisions by observing the world, leaving all the processing and analysing steps to the Information System.

    And hence, whether or not the database is competent depends on whether the data stored can reflect reality with fidelity (i.e. is it descriptive enough) and uniquely identify the underlying objects found in the real world (i.e. coordinating), while leaving the inferential duty to other systems (e.g. an AI facial recognition system).

    It is straightforward to test the fidelity of the database by asking the simple questions below:


    Q1: How many citizens are there in Barbie Land?

    A1: Two.

    By counting the number of records (i.e. Rows) in the Table Citizen, we can easily figure out that there are 2 records in the Table Citizen.

    The 1-Dimension data structure (i.e. any of the Columns inside the Table) perfectly performs the task of describing the objects in reality, and hence it can be regarded as a competent database.


    Q2: Please describe the citizen in Barbie Land whose last name starts with “S”.

    A2: Stereotypical Barbie is the citizen in Barbie Land who has a floating hair style.

    By filtering the Last Name column in the Table Citizen, you can easily describe the object (i.e. the record) by finding its First Name, Last Name and Hair Style.

    This 2-Dimension data structure (i.e. the Table) perfectly performs the task of describing and coordinating the object in reality, and hence it can be regarded as a competent database.


    Q3: What object types can be found in Barbie Land? And how many instances are there of each object?

    A3: 2 Citizens and 3 Sunglasses can be found in Barbie Land.

    By enumerating all the Tables inside the Database, we can easily answer question Q3.

    This 3-Dimension data structure (i.e. the Database) perfectly performs the task of describing and coordinating the objects in reality, and hence it can be regarded as a competent database.

    You can see that, up to now, the existing 1 to 3-Dimension data structures perfectly answer the 3 questions above, until we start asking another type of question: the 4-Dimension question.


    4-Dimension

    Let’s start the 4-Dimension session with a question and you will realize the limitation of the 3-Dimension data structure.

    Q4: What is the wearing habit of the Citizens on Weekdays?

    While we already have the Tables Citizen and Sunglasses to record each individual citizen and pair of sunglasses, these 2 Tables do not describe the Relationship between the two.

    If we tried to record the mix-and-match wearing observations in the existing Tables, no matter which Table (Citizen or Sunglasses), it would look like this:

    Weekday | Sunglasses# | Sunglasses Style | Sunglasses Frame Color | Citizen
    Monday  | S01         | Cat-Eye Frame    | Orange                 | Stereotypical
    Monday  | S03         | Sporty           | Silver                 | Weird
    Tuesday | S02         | Aviator          | Deep Blue              | Weird
    Tuesday | S03         | Sporty           | Silver                 | Stereotypical

    The Table above sacrifices the fidelity to reality provided in the answer to Q1 (i.e. How many citizens are there in Barbie Land?), as some of the citizens are duplicated in the records, such that we can no longer simply count the records to answer the question (i.e. there are 4 records in total, but in reality there are only 3 sunglasses and 2 citizens).

    In order to solve the problem, instead of appending the additional columns to any of the existing Tables, it is better to create 2 additional tables – the Weekday Table:

    Weekday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday

    Together with the Mix and Match Table that we just created:

    Weekday | Sunglasses# | Sunglasses Style | Sunglasses Frame Color | Citizen
    Monday  | S01         | Cat-Eye Frame    | Orange                 | Stereotypical
    Monday  | S03         | Sporty           | Silver                 | Weird
    Tuesday | S02         | Aviator          | Deep Blue              | Weird
    Tuesday | S03         | Sporty           | Silver                 | Stereotypical

    Now there are 4 Tables in Total inside the Database:

    1. Citizen
    2. Sunglasses
    3. Weekday
    4. Mix and Match

    However, when you observe carefully, you will realize that the term Mix and Match itself does not exist as something observable in reality. Instead, Mix and Match is a concept (i.e. a Relationship) rather than a material substance which can be observed.

    If you still have no idea what I am talking about, let’s recall the table Citizen that we created during the 2-Dimension session:

    First Name | Last Name     | Hair Style
    Barbie     | Stereotypical | Floating
    Barbie     | Weird         | Quirky

    Do you realize that you cannot find any material substance of “Citizen” inside the Citizen Table? It is because the term Citizen is a Class, while the 2 Barbie records are Instances. There is no material substance for the term Citizen.

    The same phenomenon happens in the other Tables: you can only find the Instances, never the Class, inside the records of any table.

    If you think this logic makes sense and apply it to the table Mix and Match, you may now realize that the concept Mix and Match is a Class instead of an Instance, and it cannot be found inside the records of the Table Mix and Match.

    As we have just created a new Class, Mix and Match, to consolidate the 3-Dimension Tables Citizen, Sunglasses and Weekday, that means another new Dimension, the 4th Dimension, has been created.

    If we visualize the relationships among all the concepts mentioned throughout this article via an Entity Relationship Diagram, it becomes the diagram below:

    Barbie Land Mix And Match Entity Relationship Diagram
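
    As a small preview of the next article in this series, the same Mix and Match facts can be expressed in a graph as relationships instead of a bridge table. A minimal Cypher sketch, assuming Citizen and Sunglasses node labels and a hypothetical WEARS relationship type (the property names are illustrative):

    // Nodes for one citizen and one pair of sunglasses
    CREATE (c:Citizen {firstName: "Barbie", lastName: "Stereotypical"})
    CREATE (s:Sunglasses {sunglassesId: "S01", style: "Cat-Eye Frame", frameColor: "Orange"})
    // One WEARS relationship per observation; the weekday lives on the relationship itself
    CREATE (c)-[:WEARS {weekday: "Monday"}]->(s);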

    4-Dimension+

    So, is it possible to build an infinite number of Dimensions of data structure inside the database? Yes, in theory you can. Whenever you consolidate all the instances in the same Dimension and form a list, you create a new Dimension.

    As long as you understand how we use dimensionality in the data structure to describe the real world, we can stop the example at 4-Dimension.


    Conclusion

    In this article we introduced how we observe and describe reality and fit it into a Relational Database in a dimensional way, as well as how we create a new dimension by gathering all the instances in a lower dimension to form a new class in the upper dimension.

    In the next article, we are going to address the problems from which the Relational Database suffers, and how we fit the Relational Database into the Graph Database to compensate for those problems.


    Footnotes

    1. Please refer to the Barbie film (2023) in order to understand why we use the name Barbie as an example. ↩︎
  • What Problem Patterns bGraph Is Going to Solve

    What Problem Patterns bGraph Is Going to Solve

    Introduction

    bGraph is a SaaS developed in-house by Diamond Digital Marketing Group which can be categorised as a GraphRAG web application serving as an Enterprise Knowledge Graph.

    To better understand what GraphRAG exactly is, it is imperative for us to start with a real-world problem pattern.


    Real World Problem Patterns

    The definition of profit is simply Sales Revenue minus Cost. While, from a legal aspect, a cost item such as Labour Cost (e.g. Salary) is good enough to meet the legal duty in terms of financial reporting, it does not reflect how the 40 hours x 4 weeks of working time of a staff member are distributed among different activities throughout his/her daily operation. Instead of presenting the cost in monetary terms, I would like to convert it into Time.

    In almost any industry or business model, we can categorize the types of time cost as below:


    Production Time Cost

    In a service-oriented business, Production Time Cost simply refers to the time a staff member spends rendering a service to a client. For example, a hairdresser spends 30 minutes providing a hair styling service to a client. These 30 minutes are categorized as Production Time Cost.

    In a SKU-oriented business, Production Time Cost covers the time of any kind of labour incurred between planning and the product being delivered to the client. For example, even if you just sell a Clock online, not only does the Product Manager spend Time Cost on designing and manufacturing the clock, the Customer Service Officer also needs to spend time answering enquiries from wholesale or end-user clients.

    Communication Time Cost

    Communication Time Cost is indispensable in the business world. We can easily find Communication Time Cost in the scenarios below:

    Communication between staff and client

    1. Reporting (e.g. Sales Report, Order History)
    2. Documentation (e.g. Invoice , Quotation, Shipping Note)
    3. Enquiries from Clients

    Communication between staff and staff

    1. Reporting (e.g. Monthly Report)
    2. Knowledge Transfer – Meeting
      • e.g. In a brainstorming marketing meeting at a digital marketing agency, the salesperson needs to transmit the requirements and the marketing parameters acquired from the client to the marketer team, so that the marketer team can formulate a digital marketing strategy based on the input from the salesperson.
    3. Knowledge Transfer – Training and Learning
      • e.g. A new staff member comes on board and becomes competent at his/her job duties as time goes by, based on the below:
        • Operational Manual Reading – 10%
        • Advice from supervisor and teammates – 10%
        • Hands-on practising – 30%
        • Trial and Error, feedback and complaints from supervisor or client – 50%
    4. Knowledge Not Transferred
      • e.g. Imagine 1,000 hours had been spent by a staff member on designing and developing a new product, system or skill. When that staff member quits the organisation, the knowledge he/she acquired leaves together with him/her if there is a lack of knowledge management practice in the organisation.
    5. Communication by Optimisation
      • e.g. A client requests that the web designer make the font size on the website “Bigger”. The web designer modifies the font size from “12pt” to “24pt”, and then the client responds that the font size is “Too Big” this time. So the web designer adjusts the font between 12pt and 24pt by trial and error a few times, and finally settles on “18pt”.
    6. Instruction Placing
      • e.g. A Marketing Director orders the Marketing Manager in his/her team to prepare a Consolidated Marketing Report with the following specifications, which are different each time:
        • CTR
        • CPC
        • CPM
        • CPA
        • ROAS

    Searching Time Cost

    Searching Time Cost can be derived from following scenarios:

    1. Bad Naming Convention
      • e.g. a Sales Report file named “Report.pdf“, where the name by itself cannot differentiate it from other reports created at different times, for different purposes and by different people.
    2. Lack of Indexing
      • e.g. Reinventing the wheel – A staff member spent 20 hours creating a comprehensive operational manual covering all his job duties. However, one day this staff member quits, the new staff member does not realize that the operational manual exists, and he spends another 20 hours writing a new one.
    3. Not Semantic Search Friendly
      • e.g. One staff member created a “Client List” and put it in the company drive, while another user searched for “Customer List” in the company drive and no result came out, as he did not realise that he needed to search for “Client” instead of “Customer“. After 15 minutes of back and forth with different staff, he finally realized that he should search for “Client“.
    4. Lack of a centralized data repository
      • e.g. A Prospect discussed a project with the staff via personal WhatsApp, WhatsApp group, Email, Phone Call, MS Teams Video Meeting and Face-to-Face Meeting (with minutes). Having also checked the CRM, Facebook CRM and eDM sending records, the staff member spent 1 hour consolidating all the data into the CRM contact log so that his supervisor could review the whole picture before they decide what to do next to close the deal.

    Error Handling Time Cost

    Error Handling Time Cost can be derived from the following scenarios:

    1. Time Cost of doing the wrong thing
      • e.g. Spending 1 hour going east while it was expected to go west.
    2. Time Cost of undoing the wrong step
      • e.g. Spending 1 hour redoing the error and going back to the original starting point.
    3. Insurance cost of monitoring and addressing the error
      • e.g. In order to address a wrong direction as soon as possible, the driver should report to the head office hourly. Besides, each car should install a GPS system (i.e. cost incurred) to trace the real-time position of the car.
    4. Time Cost Ripple Effect – Error brings New Error
      • e.g. Most of the time in the business world, how to do Step 2 depends on the output of Step 1, Step 3 depends on Step 2, and so on and so forth; going wrong in Step 1 will trigger a ripple effect leading Step 2 and Step 3 to all go wrong.
    5. Time cost of Error Identification
      • Imagine you are using WordPress to build a website in which you have installed 100 plugins to make the WordPress website workable. However, after the 100th installation, an error message comes out which shuts your whole website down. You have no idea which plugin causes the error. The only way you can find the root cause is to deactivate all 100 plugins and start installing and observing the plugins one by one to see when the error occurs. The more plugins you have, and the more error messages you get, the more exponentially the error handling time cost will grow.

    Research and Development Time Cost

    Research and Development Time Cost can be derived from the following scenarios:

    1. System Development
      • e.g. You want to write an operational manual describing a step-by-step guideline on how to run the procedure at a hair-dressing salon, from client walk-in to client leaving after the service is rendered.
    2. No record of “Dark Matter”
      • e.g. Imagine there is a problem for which your staff have spent 1 day testing out all 10 possible solutions. Finally you find out there is 1 and only 1 feasible solution among the 10. However, the staff member quits and he did not record the 9 remaining “Not Solutions“. A new staff member comes on board and, as he wants to improve the existing solution, he starts his research and goes through the other 9 “Not Solutions” again.

    You can imagine that among all these Time Costs, only a very small portion is observable and measurable. The Time Cost which is not observable and measurable can never be cut or minimized.


    Ways of handling the Time Cost

    There are 3 directions for handling the Time Cost:

    Eliminate the Cost Item

    Directly and brutally cut the item that derives the cost. For example, streamline the workflow from 10 steps to 9 steps.

    Minimize the Cost

    1. By systemizing the workflow to cut the communication and training cost
    2. By automating the workflow to cut the labour cost

    Turn Cost from expenses to assets in nature

    1. Build a do-once-use-many-times system. For example, once you write a Sales Script covering all the possible scenarios of a conversational sales meeting, this Sales Script can be applied to many sales meetings of the same type in the future, until the underlying environment changes. This kind of time spending, even though it is still a Cost, can be classified as an Asset instead of an Expense, because this Cost will generate future revenue.

    Applications which can handle all the Problem Patterns

    After years of hands-on experience (this is a black box and don’t ask me how and why I know! This is human intelligence before artificial intelligence dominates this world), you will realize the application can take the following steps (or directions, to be precise) to handle all the time costs mentioned above:


    Enumeration

    By observing and modeling the world, you can address the relevant factors, steps, components and concepts that are related to your business.

    For example, when you are running an e-shop, you will identify the different kinds of transactional emails or reports which reflect the reality. This procedure is called Modeling.

    Modeling of an eshop Purchase Cycle:

    1. New User Registration Email
    2. New Order Email
    3. Invoice Email
    4. Receipt Email
    5. Delivery Note Email

    While the concept is easy to understand, it is extremely difficult to execute, as you have to decouple each procedure, workflow and concept into an executable, encapsulated module which you can reuse or execute systematically.

    On top of that, it is a challenge for a Business Analyst or System Analyst to observe reality and refine the related components which comprehensively describe the model of the business. We call this comprehensive set of scenarios the Sample Space.

    For example, while everyone understands the concept “Client”, a Business Analyst has to decouple the concept “Client” based on the following attributes in order to make it executable and closer to reality:

    1. User Journey – Prospect vs Client or 1st Time Client vs VIP
    2. Individual Client vs Enterprise Client.

    While the comprehensive option value lists of some of the attributes can easily be enumerated, most of the time most of the option value lists for most of the attributes cannot be enumerated at the point in time at which the system is built. Closer to reality is that these option value lists, or even the attributes themselves, “grow” organically over time instead of being addressed at the very beginning.

    For example, even the option values of the attribute Gender were classified as Male and Female in the old days, with an additional option value Transgender nowadays.

    Also, what to observe in reality, and whether you think a component is relevant to your business or not, highly depends on the level of knowledge of the Business Analyst. For example, at 4 years old you regarded water simply as water, but at 14 years old you should have realized that water can in fact be further decoupled into 2 H (Hydrogen) and 1 O (Oxygen).

    While we will not dive into the problem patterns that we suffered during the enumeration process, enumeration by itself is the very beginning of the GraphRAG-based Enterprise Knowledge Graph web application.

    In a technical stack, we normally have the following technical components to execute the Enumeration process:

    1. Web Scraper
    2. Public APIs (e.g. Public Facebook Posts)
    3. Private Databases and APIs (e.g. a company's self-hosted CRM or Inventory System)
    4. UGC – Voice-to-Text conversation logs or user-submitted Documentation
    5. Modeling Blueprint – Meta Data, Data Schema, Business Logic, Compliance and Regulations.

    Indexing

    Indexing is the procedure that makes all the things or concepts in the Sample Space storable and searchable.

    For example, giving a Sales Order an Order Number (e.g. SO20323) is a common and easy way to “Index” a Sales Activity (conceptualized via the document Sales Order).

    However, not everything can be indexed as easily as a Sales Order.

    While I am not going to dive into the problem patterns we suffered in the indexing procedure, we can describe, at a high level, some of the indexing procedures for a solution application (a small Neo4j-flavoured sketch follows the list below):

    1. Text Embedding into a Vector Database – Instead of indexing at the documentation level, index down to the level of every single word written or spoken by a client into a vector database for searchability.
    2. Business Catalog in a Data Lake – Indexing the Meta Data (i.e. the data about data, a Column Name of a Table, for example) across different data sources (e.g. CRM / eDM / Inventory System / Booking System / e-Shop). For example, after you realize that there is a new option value “Transgender” under the attribute Gender, you will need to index this new option value.
    3. Data Streaming – Instead of indexing every single component in a batch (e.g. once per day), we index the data in real time, in-stream.
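
    As one concrete flavour of the Indexing step, here is an illustrative sketch only, assuming the knowledge items are stored as Document nodes in Neo4j with title and body properties (not bGraph's actual schema):

    // Create a full-text index so the Document nodes become searchable
    CREATE FULLTEXT INDEX knowledgeItems IF NOT EXISTS
    FOR (d:Document) ON EACH [d.title, d.body];

    // Query the index; the second argument uses Lucene query syntax
    CALL db.index.fulltext.queryNodes("knowledgeItems", "customer client list")
    YIELD node, score
    RETURN node.title AS title, score;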

    Mapping

    Mapping is simply finding the relationship between 2 concepts. The challenging task is that you need to work out which relationship is relevant among tens of thousands of combinations. For example, when a customer service officer asks a client to provide the Client ID#, the client has forgotten his/her Client ID# and simply provides a mobile phone number for the customer service officer to look up what the Client ID# is.

    In the above example, Client ID# and Client Mobile Phone Number are easy to map, due to the fact that the Client Mobile Phone Number is most likely stored in the Client Table itself. However, this ease does not apply to everything.

    For example, how can you figure out that the Facebook username “BillGates” is in fact referring to the same person as the Instagram username “ThisisBillGates”, given that they use different wording to refer to the same object (person)?

    As usual, while we will not dive into the details, we describe the technical stack that is normally applied to do the mapping (a Cypher sketch of the entity-resolution case follows the list):

    1. Entity Resolution by Graph Database – find the same person referred to with different names or wording across different data sources.
    2. Link Prediction by Graph Database – find the relationship between 2 concepts.
    3. RAG with Human Know-how – while LLMs are well trained on public knowledge domains, some domain-specific knowledge is private, for example the know-how of a 3-star Michelin chef on how to produce a perfect Risotto. Some sort of mechanism should allow a human to manually map that know-how into the Knowledge Graph Database so that the LLM can apply its well-trained intelligence to the human know-how (i.e. this is what RAG is performing).
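
    A minimal Cypher sketch of the entity-resolution outcome from the BillGates example above (the node labels, property names and SAME_AS relationship type are illustrative assumptions, not bGraph's actual schema):

    // Two account nodes coming from different data sources
    CREATE (:FacebookAccount {username: "BillGates"});
    CREATE (:InstagramAccount {username: "ThisisBillGates"});
    
    // After entity resolution decides they refer to the same person,
    // record that decision as an explicit relationship
    MATCH (f:FacebookAccount {username: "BillGates"}),
          (i:InstagramAccount {username: "ThisisBillGates"})
    CREATE (f)-[:SAME_AS]->(i);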

    Searching

    Once all the concepts are enumerated and indexed, and the relationships among the concepts (we call each concept a “Node“) are well defined and connected, we can start the Searching step.

    In fact, regardless of industry, job nature, role, task , business model, anything, as long as you ask a question, you are performing a “Search” activity.

    To execute a “Search” is to “find a Needle (an instance) in a Haystack (a pool of instances)”.

    For example, when the customer service officer receives an enquiry from a client asking “When will my Sales Order be delivered?”, he then carries out the steps below (a Cypher sketch of this lookup chain follows the list):

    1. Get the Client Mobile Phone Number from the Client.
    2. Search (i.e. look up) the Client# by the Client Mobile Phone Number.
    3. Search (i.e. look up) the Sales Order# by the Client#.
    4. Search (i.e. look up) the Shipping Record# by the Sales Order#.
    5. Search the values of the attributes “Shipping Status” and “Expected Delivery Date” inside the Shipping Record.
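
    In a graph, that whole chain collapses into a single path query. A sketch, assuming hypothetical Client, SalesOrder and ShippingRecord labels with PLACED and SHIPPED_BY relationships and a placeholder phone number (not bGraph's actual schema):

    // Walk from the phone number to the shipping record in one pattern
    MATCH (c:Client {mobilePhone: "+85212345678"})-[:PLACED]->(o:SalesOrder)-[:SHIPPED_BY]->(s:ShippingRecord)
    RETURN o.orderNumber AS salesOrder, s.shippingStatus AS status, s.expectedDeliveryDate AS expectedDelivery;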

    While the previous example happened in a customer service scenario, the “Search” pattern also happens in the production team.

    In fact, search theory deserves a whole book to elaborate. We will skip the theory and directly highlight the technical stack that we are going to use to carry out the Search function:

    1. Search Bar or Chat Bubble – A search bar or Chat Bubble in the frontend interface which lets the user communicate with the system by inserting their search queries or questions.
    2. Semantic Search Engine – A search bar in which the user can simply input everyday human language, even if the search query is not 100% correct or precise. The Semantic Search Engine can still output similar results even when the search query is not 100% precise. (A keyword-level sketch of this idea follows the list.)
      • For example, when I search “GA4 Instell Guide”, the Semantic Search Engine will output the “Google Analytics 4 Installation and Configuration Guide” connected to our Enterprise Knowledge Graph, even though there is a typo in the word “install” and “GA4” is a synonym of “Google Analytics 4“. (Although it is the norm in your daily life to use the Google Search Engine in this manner, don’t take this for granted, as you can hardly find this semantic search function in search engines other than Google.)
      • Another point is that people in different roles will use different language to refer to the same concept. For example, in a website design project, 3 different roles will use different wording to discuss the font size of the homepage:
        • Client : Please make the words in the title bigger
        • Marketer : Do you want a Font-Size = 20pt?
        • Programmer : I would prefer 1.1 em in order to cater for both the mobile and desktop versions.
    3. LLM – A highly intelligent “brain” (e.g. ChatGPT) which can comprehend and understand both the “Needle” and the “Haystack”, perform the search and return the related outcome.
    4. LangChain – While an LLM (e.g. ChatGPT) is good at understanding text, in reality we need another AI model to comprehend images (e.g. an AI model such as Llama 3.2). LangChain is used to orchestrate the multiple models and make them work together.
    5. RAG Solution – While renowned LLMs (e.g. ChatGPT) are good at understanding all the public-domain concepts and knowledge available at the time the model was trained, they cannot understand concepts which belong to a specific knowledge domain. For example, if your business invented a new product named “aldis lds”, ChatGPT will not recognize it as a product of your company. RAG orchestrates the knowledge of the real world, as well as the specific knowledge you provide, before it gives you a search outcome. Please understand that while the business operation or business data may change every day, the cost of “fine-tuning” the LLM every day is simply too time consuming.
    6. GraphRAG – While RAG based on a Vector Database is good at similarity search, you may have zero fault tolerance and want a 100% precise search result over an underlying know-how knowledge domain which is probably only known by yourself (e.g. as a Cloud Architect Consultant). You hope that you can provide some of your know-how with 100% precision to the system by manual input, and let the LLM find the result for the user based on your personal know-how together with its understanding of the knowledge in the public domain. A Graph Database performs better than a Vector Database in terms of being hallucination-free.
    7. Adaptive GraphRAG – In the old days we learnt by finding the answer. In the era of AI, we learn by starting with forming a good question, while most of the time the user does not even know what they are asking or searching for. For example, in an interior design consultation meeting, when the client expresses that they want a “Japanese style bedroom”, before the designer gives them the answer, he/she will most probably ask the client “what is your budget?” or “is it ancient Japan or contemporary Japan?”. You can see that the question should be “adaptive” (i.e. kept optimised and adjusted) before a useful question is formed. An answer to a meaningless question is expected to be meaningless.
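
    A very rough, keyword-level sketch of the semantic-search idea, reusing the hypothetical knowledgeItems full-text index from the Indexing section; note that this only approximates semantic search with fuzzy keyword matching, whereas a full implementation would typically also rely on embeddings and synonym handling:

    // Lucene fuzzy matching (~) tolerates small spelling differences
    // between the query terms and the indexed words
    CALL db.index.fulltext.queryNodes("knowledgeItems", "GA4 install~ guide~")
    YIELD node, score
    RETURN node.title AS title, score
    ORDER BY score DESC;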

    In conclusion, the GraphRAG SaaS Enterprise Knowledge Graph Web Application is a solution backed by the Enumeration, Indexing, Mapping and Search functions, which can help any individual or organisation save time on Production, Searching, Error Handling and Communication costs.


    Real World Scenario

    In order to visualize the power of the GraphRAG Enterprise Knowledge Graph, allow me to demonstrate with a real-world, day-to-day example from the digital marketing world.

    While a Customer Service Chatbot is for sure one of the powerful time-saving aspects, I want to put the focus on another, more important point which can be brought by the GraphRAG solution. Therefore I will keep the Customer Service level Chatbot description minimal.

    Besides, I will also skip all the descriptions regarding automation at the programmatic (i.e. non-AI) level. For example, Email auto forwarding with hard-coded conditional logic based on the Email Title via an Email API whenever a new Email is received.

    Background

    An Individual Client, John, owns a WordPress website created by us (DDM Group). He received spam Contact Us Form Emails daily, which made him feel annoyed. As we (DDM Group) are John’s website administrator, he complained to us.


    Trigger – Symptom

    Although John received the email because of a Spam Bot submitting the Contact Us Form on his website, he does not realize this fact, and his complaint email is as below:

    Hi DDM,

    My email keeps receiving rubbish Emails daily. Please help to fix.

    Most of the time, the client or end user, like John, can only use everyday human language, instead of technical jargon, to describe the problem they face.

    And most importantly, the event which triggers the action (e.g. writing a complaint email) normally comes from a Symptom which drives his emotion (e.g. Fear, Annoyance, Despair). Most of the time this Symptom is not the cause of the problem but a consequence of it.

    For easy communication, we name this “Trigger” the Symptom.

    Symptom = Rubbish Email

    Party Involved = Client


    Problem Pattern

    By asking John to submit the rubbish Email to Google Drive, the Technical Support analyzed that the Email is in fact triggered by the Contact Us Form Submission on the existing Website.

    Therefore, the Technical Support classified it as a “Spam Form Submission” Problem Pattern, which had already been reported by different clients many times and is therefore named and indexed as the “Spam Form Submission” Problem Pattern in our system.

    Problem Pattern = Spam Form Submission

    Party Involved = Technical Support


    Solution and Environment

    As stated before, Spam Form Submission is a well-known Problem Pattern, and as an experienced website administrator, DDM Group had already indexed and mapped different kinds of Solutions for the different scenarios sharing the same Problem Pattern.

    The following factors affect the choice of Solution:

    1. Client Contract Amount
      • VIP Client
      • Standard Client
    2. SMTP Sending Server
      • GMAIL API
      • SMTP Sending Server provided by Client itself
    3. Web Server
      • Proxy Server – Cloudflare

    In order to identify the Client Contract Amount, Web Server and SMTP Sending Server specific to John’s case, the Technical Support and the Customer Service Officer have to access the CRM, as well as the Website Development Production Database, to look up the Client Contract Amount, the applied SMTP Sending Server and the Web Server.

    After the lookup, we figured out that John is a VIP Client using the GMAIL API and Cloudflare.

    The Technical Support, based on his years of experience and know-how in the cyber security knowledge domain, realized that the main reason the Contact Us Form is being spammed is that some kind of Spam Form Bot constantly crawls John’s website, detects that the website is using a popular open-source plugin with an exposed vulnerability, and can therefore fill in the form on John’s website automatically. In order to stop the spam, the best way the Technical Support can think of is to prevent the Spam Form Bot from even reaching John’s website. Hence the Solution of a Server Level WAF (Firewall) installation is chosen, since the Cloudflare Proxy Server supports a WAF Firewall.

    Environment = VIP Client , GMAIL API, Cloudflare

    Party Involved = CRM Manager + Technical Support

    Solution = Server Level Firewall (WAF)

    Party Involved = Technical Support


    Deliverable (SKU) and SKU Feature

    Once the Solution is confirmed, the case is passed to the Account Manager (i.e. Salesperson) to follow up and explain to the client.

    As the Technical Support is not quite familiar with the SKU Names in the SKU Library, he suggested that the Account Manager visit the SKU Library in DDM Group and search for the term “Cloudflare WAF”.

    The SKU Library comes up with the following SKUs:

    SKU Name | SKU#
    Cloudflare WAF – Standard | 5232323
    Cloudflare WAF – Premium | 5232345
    Wordfence (WordPress Firewall) | 8475623

    Due to the fact that the SKU name by itself cannot facilitate the decision on which SKU should be chosen to solve the problem, the Technical Team further dives into the SKU Features of the 2 Cloudflare-related SKUs and realizes that only Cloudflare WAF – Premium (#5232345) supports the Legitimate Bot Whitelisting feature.

    SKU = Cloudflare WAF – Premium

    SKU Feature = Legitimate Bot Whitelisting

    Party Involved = Account Manager


    Target Audience Properties and Sales Trigger

    As a seasoned and proactive Account Manager, he realized that it is a good opportunity to upsell another SKU to John, since the Spam Form Submission has alerted John to cyber security concerns.

    The Account Manager googled the topic and figured out that a Login Attempt Attack (i.e. a Problem Pattern) is another common vulnerability suffered by many eshops like the one John is running.

    The Account Manager, based on his experience, believed that Fear is a good sales trigger to create purchase intention. In this sense, he explained the potential risk of a login attempt attack by malicious bots and suggested that John install a 2FA plugin which can effectively protect against unauthorized logins.

    As the Account Manager realized that John had no idea what a Login Attempt Attack is, he visualized the problem pattern by showing the visit report, which logged thousands of visits to the login page of John’s eshop within an hour.

    John felt worried, took the advice and purchased the SKU 2FA Plugin Installation for WordPress, so the Account Manager successfully upsold a SKU related to John’s case.

    Target Audience Property = Eshop owner

    Sales Trigger = Fear

    SKU = 2FA Plugin Installation for WordPress

    Problem Pattern = Login Attempt Attack

    Symptom = Thousands of visits to the Login Page in an hour

    Party Involved = Account Manager + Client


    Plugin for Production

    Once John signs the Sales Contract, the Sales Contract with the involved SKUs is passed to the Production Manager.

    While the Sales Contract enumerated the SKU name , it does not limit which plugin to use in order to deliver the SKU.

    Having checked with the Plugin Library regarding the error and bug reports for each plugin, the Production Manager decided to use the plugin WordFence for the 2FA related SKU and Cloudflare for the Server WAF related SKU.

    SKU = Cloudflare WAF – Premium

    Plugin = Cloudflare

    SKU = 2FA Plugin Installation for WordPress

    Plugin = WordFence


    Putting Everything Together

    When you put everything together, you may realize that in fact you are doing the following steps:

    1. Enumerating and indexing all the factors (i.e. Column Names) observed from the real world, AND
    2. Enumerating and indexing all the option values for each factor
      • e.g. Fear / Hope / Greed in the Sales Trigger Column
    3. Mapping the Option Values between each pair of columns under a Bipartite Data Pattern (a minimal sketch follows this list)
    4. Searching based on the needs of different roles
      • e.g. As a Client or Account Manager role, search by inserting a value in any of the columns (e.g. 1,000 login page visits in an hour in the Symptom Column)
    5. Outputting the result in different columns based on the different roles.
      • As a Production Manager, refer to the SKU and suggest the related Plugin Name.
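    As a rough illustration of steps 1 to 5, here is a small Python sketch. The column names and option values come from the running example above; the simple edge-walking search is an assumption made purely for demonstration, not the actual bGraph implementation.

    ```python
    # Step 1-2: factors (columns) and their option values
    columns = {
        "Symptom": ["Rubbish Email", "Thousands of login-page visits in an hour"],
        "Problem Pattern": ["Spam Form Submission", "Login Attempt Attack"],
        "Solution": ["Server Level Firewall (WAF)"],
        "SKU": ["Cloudflare WAF - Premium", "2FA Plugin Installation for WordPress"],
    }

    # Step 3: bipartite mappings between pairs of columns (edges between option values)
    edges = [
        ("Rubbish Email", "Spam Form Submission"),
        ("Spam Form Submission", "Server Level Firewall (WAF)"),
        ("Server Level Firewall (WAF)", "Cloudflare WAF - Premium"),
        ("Thousands of login-page visits in an hour", "Login Attempt Attack"),
        ("Login Attempt Attack", "2FA Plugin Installation for WordPress"),
    ]

    # Step 4-5: search from any value and walk the mapped values
    def search(start: str) -> list[str]:
        path, current = [start], start
        while True:
            nxt = next((t for s, t in edges if s == current), None)
            if nxt is None:
                return path
            path.append(nxt)
            current = nxt

    print(search("Rubbish Email"))
    # ['Rubbish Email', 'Spam Form Submission', 'Server Level Firewall (WAF)',
    #  'Cloudflare WAF - Premium']
    ```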

    In the real business world, there are thousands of factors (i.e. columns) that can be addressed, and each factor may have thousands of option values involved (e.g. Sales Order Records), forming an almost infinite number of nodes and edges of a Graph, which can only be comprehensively memorized and handled by a machine.


    Conclusion

    I hope you can understand the problem pattern involved in the real world and realise that the learning activity of a human being is in fact based on enumerating , indexing , mapping and searching. 

    By applying the GraphRAG Enterprise Knowledge Graph SaaS Web App (i.e. bGraph), we can automate and speed up the learning of a human being based on the following open-source technical stack:

    1. Retrieve the Data via
      • Web Scraper for Public Domain Knowledge
      • APIs for Public Domain Knowledge (e.g. Weather Condition)
      • APIs for Private Domain Knowledge (e.g. in-house CRM)
      • Domain Specific Know-how provided manually by in-house experts
      • Meta Data (e.g. Data Schema, Business Logic) in different data silos (e.g. CRM or POS) from the Data Lake
    2. Text Embedding of the retrieved Data into a Vector Database
    3. Entity Extraction from the retrieved Data, stored in the Graph Database
    4. LangChain to orchestrate multi AI models (e.g. Image / Text / Voice)
    5. LLMs (e.g. ChatGPT) to comprehend the content stored in Vector or Graph Database
    6. Adaptive RAG Function to
      • Retrieve the Search Query from the Client
      • Refine the query into a useful question
      • Retrieve the data from Step #1
      • Comprehend the Data
    7. Semantic Search Engine (Search Bar or Chat Bubble) to allow users to enter their Search Query in
      • Plain English
      • Imprecise wording
    8. Visualize the Output via Graph Database and Graph Data Science Library to solve the following problems:
      • Shortest Click Path between 2 concepts
      • Centrality
      • Community Detection
    9. Output the result via a Chat Bubble built with Streamlit (a minimal sketch follows this list).
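    As an illustration of step 9, here is a minimal, hedged sketch of a Streamlit chat bubble. The answer_query() helper is a hypothetical placeholder for the whole retrieval pipeline described in steps 1 to 8, not a real library function; Streamlit’s chat widgets require a recent Streamlit version.

    ```python
    # Run with: streamlit run app.py
    import streamlit as st

    def answer_query(query: str) -> str:
        # Placeholder: a real implementation would run the Adaptive RAG pipeline
        # (retrieve, refine, look up the Graph / Vector Database, comprehend).
        return f"Top result for '{query}': Google Analytics 4 installation Guide"

    st.title("bGraph Semantic Search")

    if question := st.chat_input("Ask in plain English..."):
        with st.chat_message("user"):
            st.write(question)
        with st.chat_message("assistant"):
            st.write(answer_query(question))
    ```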

  • bGraph Architecture – Model Data

    bGraph Architecture – Model Data

    Introduction

    The objective of this article is to provide a blueprint which demonstrates and enumerates all the technical stacks used to build the bGraph.

    Although using a Graph Database is a perfect tool to illustrate this kind of blueprint, ironically, we cannot use the Graph Database to demonstrate how to build a Graph Database because the Graph Database is not yet built.


    Definition

    bGraph

    bGraph is a DDM term, assigned as the name of the Enterprise Knowledge Graph (EKG) built on top of a Graph Database. You can regard bGraph as a Knowledge Management System in DDM Group which consolidates all types of data, including Business Data, Meta Data and Model Data, into one place, forming a supreme intelligence to answer any questions raised by either Clients or Staff.

    Architecture

    The Architecture of bGraph refers to all the technical stacks used to build bGraph, as well as the specific tools that we adopted for building it. You can regard it as a blueprint of bGraph.


    Model Data

    While there are many components which can be found in the bGraph Architecture, this article is focused on the component of Model Data. The best way to understand Model Data is to compare the Model Data with Meta Data and Business Data.

    In the Database world, no matter what business is being run, Data can be classified into 3 categories:

    Business Data

    The data which reflects the business activities. For example, if in an eshop a Watch is sold to a Client named “Tony” at the price of USD $42, then “Tony” and “USD $42” are regarded as Business Data.

    In an Excel File, you can regard the Column Names as the Model Data, while each record under the same column is Business Data. For example, if you have a Product Price List in an Excel File as below:

    Product Type | Product Price (USD)
    Watch | 42
    Shoe | 30
    Figure 1 – Product Price List

    The Column Names Product Type and Product Price (USD) are regarded as Model Data, while the records [Watch,42] and [Shoe, 30] are regarded as Business Data.

    Meta Data

    Meta Data also means “Data of Data”; the function of Meta Data is to describe the Model Data. With the same example, as an eshop webmaster, before you can sell the product in the eshop, you must have input the price of the product Watch into the Price Field in the backend of the Eshop. Instead of a Text String Data Type (i.e. “US Dollar Forty-Two”), you expect the Price to be filled in Number format (i.e. 42). In this case, the Number Data Type (instead of Text String) is the Meta Data which describes the Price field.

    Model Data

    In a relational table (e.g. a Sheet in an Excel File), you can regard the Column Name (or a Field Name in a Form) itself as the Model Data, while each record under the same column is Business Data. For example, in Figure 1 – Product Price List in the previous paragraph, the Column Names Product Type and Product Price (USD) are regarded as Model Data, while the records (i.e. the values of the cells) [Watch, 42] and [Shoe, 30] are the Business Data.
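    For readers who prefer a concrete illustration, the three categories from the Product Price List example might be sketched in Python structures like this (the field names are taken from Figure 1; the representation itself is just an assumption for illustration):

    ```python
    # Model Data: the column names themselves
    model_data = ["Product Type", "Product Price (USD)"]

    # Meta Data: data describing the Model Data (e.g. the expected Data Type)
    meta_data = {
        "Product Type": {"data_type": "Text"},
        "Product Price (USD)": {"data_type": "Number"},
    }

    # Business Data: the records under those columns
    business_data = [
        {"Product Type": "Watch", "Product Price (USD)": 42},
        {"Product Type": "Shoe", "Product Price (USD)": 30},
    ]
    ```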


    It is imperative for us to differentiate the 3 categories of data due to the fact that different types of data are intertwined in our communication during the bGraph development cycle. 


    What Problem Patterns the Model Data Solves

    Model Data can narrow the discrepancy between the Reality and Model in following aspects:


    Avoid Duplicated description in the Data Model

    It is very common to find both a CRM and an Accounting System in companies of any scale, which means that if you want to insert a new record of the First Name and Last Name of a Client, you most likely have to record it twice, in both the CRM and the Accounting System.

    While in reality the Client appears only once, in the Model the Client appears twice, in both the CRM and the Accounting System, even though the 2 records in the different systems are in fact referring to the same Client, meaning that the Model Data – the First Name and Last Name of the Client – is duplicated.

    This discrepancy between the Reality and the Model lessens the fidelity of the Model.

    Model Data is here to eliminate the discrepancy caused by this duplication.


    As a Single View of Truth under the DevOps Business Environment

    In a traditional system development cycle, the Reality is observed once, at a particular point in time (most likely in a brainstorming sales meeting), and this observation is transformed into a Model, most likely presented as an Entity-Relationship Diagram, by the System Analyst.

    However, sooner or later this System Analyst realizes that this is not the case. Since in reality the business environment is ever changing, observing the Reality, as well as modeling the observation, become streaming tasks instead of batch tasks, meaning the observation and modeling tasks should be done continuously and with agility, instead of only once at the very beginning of the system development. We call this concept DevOps.

    Let’s illustrate the example by the Table in below:

    Time Period | System Name | Properties (i.e. Fields)
    Year 1 | CRM (built in house) | Client.FirstName, Client.LastName, Client.Birthday (DDMMYYYY)
    Year 2 | Accounting System (3rd party SaaS) | Client.GivenName, Client.FamilyName
    Year 3 | Eshop (built in house) | Client.FirstName, Client.FamilyName, Client.Birthday (YYYYMMDD)
    Figure 2 – All System Development Timeline

    In the infancy stage (i.e. Year 1) of the startup company you are working for, it makes sense to prioritise building a CRM system instead of an Accounting System, in order to generate leads and sales revenue before complying with legal bookkeeping and auditing requirements. In a CRM system, your colleague Anna, as a System Analyst, can easily observe from reality that the properties First Name and Last Name should be attached to a Client. The System Analyst (Anna) therefore puts First Name and Last Name in our Model as below:

    First Name
    Last Name
    Client Form (and Table) in the CRM

    It works perfectly until Year 2 when, after quite a lot of sales orders were made in Year 1, it becomes inevitable for the business to have an Accounting System to cater for both bookkeeping and invoicing tasks.

    Because the System Analyst, Anna, who built the CRM system in Year 1 had already quit, instead of reinventing the wheel, your boss in Year 2 decided to subscribe to a canned SaaS Accounting System, which has comprehensive functions catering for all the bookkeeping and invoicing needs of the company.

    Everything works fine until a fresh-graduate junior Sales Executive, Ann, is instructed by you to find the Client with ID# 302392 from the CRM in the historical sales invoice report in the Accounting System. Ann checked the CRM with ID# 302392 and the system showed the First Name and Last Name of the client as Joan Lee. Ann then tried to enter the First Name and Last Name Joan Lee into the Accounting System to generate the sales order report.

    Unfortunately, after 60 minutes of effort, Ann failed to find the fields First Name and Last Name to filter the sales orders in the Accounting System, so she requested help from her supervisor, which is you.

    After you listened to the question raised by Ann, you are astonished that she did not even realize First Name is a synonym of Given Name and Last Name is a synonym of Family Name. (Please refer to Figure 2 – All System Development Timeline in previous paragraph)

    Although you are frustrated , you still gently explain the truth to Ann, which took you another 15 minutes. 

    Therefore, all in all, the company has spent 75 minutes purely on communication and education, and these communication and education costs will not be incurred for the last time, because any new fresh-graduate employee the company hires in the future is very likely to run into the same misconception.

    Therefore, a centralized library which explains the relationships between the properties of all the systems throughout the company is needed.

    The story does not end there. With great difficulty the company survives into Year 3 and would like to expand the business by running an online Eshop for overseas markets.

    Your company hired another System Analyst, Joanna, to build the Eshop, and she completed the project at lightning speed. After 100 new client registrations in the Eshop, when you want to import these 100 client registrations from the Eshop into the existing CRM, you finally realize that the Birthday field of the CRM is in the format (i.e. the Meta Data) DDMMYYYY, while in the Eshop the format is YYYYMMDD.

    Because all the Date-related fields throughout your company’s existing systems are in DDMMYYYY, the format usually used in your country, while the Eshop uses YYYYMMDD, you have no choice but to request the newly hired System Analyst Joanna to spend another month (i.e. 22 Working Days!) turning all the Date-related Fields in the Eshop from YYYYMMDD to DDMMYYYY.
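    The format conversion itself is trivial with the Python standard library (the sample value below is made up purely for illustration); the real cost lies in finding and regression-testing every Date-related field across the Eshop:

    ```python
    from datetime import datetime

    eshop_birthday = "19901231"   # YYYYMMDD, as used by the Eshop
    crm_birthday = datetime.strptime(eshop_birthday, "%Y%m%d").strftime("%d%m%Y")
    print(crm_birthday)           # 31121990, as expected by the CRM
    ```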

    By studying the example above, we realize that a centralized library (i.e. a repository) which stores all the Model Data (and its associated Meta Data) will definitely help the System Analyst avoid all the mistakes mentioned above, by checking all the existing properties (i.e. the Model Data) of the existing systems in advance, before the System Analyst starts building any new system.


    Fill in the gap between planning and execution

    The following steps and roles are involved during the system development cycle:

    Time | Procedure | Role
    Month 1 | Reality Observation | Salesperson, End User, Business Analyst, Business Owner
    Month 2 | Modeling | Business Analyst, System Analyst
    Month 3 | System Building Execution | System Analyst
    Timeline and Roles during the System Development Cycle

    Consider the following scenario and timeline

    Reality Observation

    1. In Month 1, the End User gives feedback to the Salesperson that the CRM lacks a Salutation field in the Client Form.
    2. Salesperson passes this information to the Business Owner.

    Modeling

    1. The Business Owner instructs the Business Analyst to see if the request is valid.
    2. In Month 2, the Business Analyst updates the System Blueprint (i.e. the Entity-Relationship Model) and passes the newly updated version to the System Analyst.

    System Building Execution

    1. In Month 3, the System Analyst studies the affected radius in the entire system and identifies which Forms and Reports will be affected. For example, the New Client Form, as well as the Lead Report, may need the Salutation Field added to them.

    If you are detail-minded enough, you may realize that, as laymen without any system analysis training background, the End User, Salesperson and Business Owner cannot technically and precisely turn their comments into an Entity-Relationship-Diagram-friendly syntax to communicate with the Business Analyst.

    Imagine if all these 4 different parties involved in the communication chain are communicating with different languages and wordings: how much the fidelity of the reality deteriorates, and how much time is wasted on the redundant communication edges (i.e. A to B, B to C, and C to D).

    This means that if the feedback from the End User does not arrive at the same moment the System Analyst is doing the coding work to update the system, then this piece of feedback from the End User (i.e. the comment on adding a new Salutation Field) should be recorded somewhere where it can easily be found by the System Analyst in their system update request job queue.

    The Model Data can act as a communication protocol during the whole system development cycle.


    Avoid Duplicated Workload between Template and Instance


    Model Data in Relational and Graph Database

    Model Data in Traditional Relational Database

    In a traditional way of building a CRM system, the Object Client may probably be described in the Entity-Relationship Diagrams in different systems as below:

    Column Name
    First Name
    Last Name
    Salutation
    Email Address
    Client Table in ER Diagram of CRM System

    When time goes by, another eDM System is introduced into the company with another Client Object inside the system as below:

    Column Name
    Given Name
    Family Name
    Salutation
    Gender
    Email
    Client Table in ER Diagram of eDM System

    Due to the fact that there are 2 separate systems, you cannot link up the 2 Client Tables of 2 systems in 1 Entity-Relationship Diagram. In fact, you have to draw 2 Entity-Relationship Diagrams, one system per Entity-Relationship Diagram.

    This practice means we can never realize that these 2 Client Objects in 2 separate systems are actually referring to the same concept (i.e. Client) in reality.

    Besides, if you find that the field Gender is valid and useful information in the eDM system, the System Analyst may not realize that this Gender field should also be added to the Client Table in the CRM system.

    Model Data in Graph Database

    On the contrary, if we demonstrate the Model Data via a Graph created by a Graph Database, we can enjoy the following benefits:

    1. Concept Client from CRM and eDM systems can be consolidated and presented in 1 Node. This consolidation can easily be figured out by the Node Labels (i.e. the wording eDM and CRM outside the Node circle)
    2. One can easily find out the relationship of properties from 2 separate systems. For example, the graph explicitly shows that the Given Name in eDM is in fact identical to the First Name in CRM by reading the Relationship Type (i.e. IDENTICAL_TO)
    3. While the property Gender is a useful piece of information which should be (and has been) defined in eDM, it makes sense to infer that this Gender property should also be defined in the CRM. By using Graph Data Science AI tools, this kind of insight (we call it Label Prediction) can easily be achieved. While the AI algorithm is out of the scope of this article, we will discuss it somewhere in the future.
    4. In the future, as the business grows, you will realize that clients can be classified as Individual Clients and Organisation Clients, where only an Individual Client has the properties First Name and Last Name (and so on and so forth). You can easily modify the relationships among the nodes by writing a simple graph query (e.g. a Cypher statement, as sketched after this list).
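    As a hedged illustration of point 4, the reclassification might look like the following Cypher statement run through the official Neo4j Python driver. The label names, property names and connection details are assumptions for illustration, not part of any existing bGraph schema:

    ```python
    from neo4j import GraphDatabase

    # Add the IndividualClient label to every Client node that carries
    # person-style properties (assumed property names for illustration).
    refactor_cypher = """
    MATCH (c:Client)
    WHERE c.`First Name` IS NOT NULL AND c.`Last Name` IS NOT NULL
    SET c:IndividualClient
    """

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        session.run(refactor_cypher)
    driver.close()
    ```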

    Limitation of Model Data via Graph Database

    Although modeling data with a Graph Database provides greater fidelity, the Graph Database in itself is not good for data input. 

    In our daily life most Input Forms and Reports are linked to underlying Tables in a Relational Database, so it is hard for us to build the Input Form and Report directly on top of a Graph Database. Although it is technically feasible to do so, for compatibility with other existing systems, as well as for human user behaviour in both inputting and consuming data, the technical stack has to strike a balance between user experience and Model Fidelity.

    In this sense, we decided to keep the Relational Database as an “Abstraction Layer” between the Frontend Application (used during Input by both human and machine (i.e. API) users) and the Graph Database.

    The toughest trade-off of this method is that we need to periodically synchronise the data between the Relational Database and the Graph Database, either mutually (2-way) or 1-way, although this is still manageable.


    Technical Stacks of the Model Data

    1. Node Table

    Create a Table named “bNode” in the Relational Database to store (as records in a Table) all the Nodes.

    This Node Table can also serve as a Lookup Data List, as a traditional Relational Database does. For example, as the Option List of Gender {Male | Female | Transgender | Unisex} will always be the same no matter which system or knowledge domain it is under, there is no need to define the Option List of the property Gender for each system (or each instance of a system type). This Option List of the property Gender can be looked up by different systems via their Gender Field; this has nothing to do with the Graph Database, as the Relational Database already serves this purpose perfectly.

    2. Relationship Table

    Create a Table named bRelationship in the Relational Database to store (as records in a Table) all the relationships among the Nodes.

    Example Record in a Relationship Table:

    Source Node | Relationship Type | Target Node
    Client | HAS_ONE | First Name
    Client | HAS_MANY | Email Address
    First Name | IDENTICAL_TO | Given Name
    3. Import both the Node and Relationship Tables into the Graph Database (a minimal import sketch follows this list), whereas:
      • Use the Graph Database feature Node Label to classify which Knowledge Domain (a.k.a. Namespace or Context) the Node is under. In the CRM example, the Node First Name is under the Knowledge Domain (i.e. Namespace) of CRM, and therefore you can find the Label outside the Node First Name in Figure 3 – Data Model By Graph.
    4. While there are many ways to import the data from a Relational Table into the Graph Database, we will apply the following ways under different scenarios:
      • By Cypher – Lollipop Model
      • By Data Importer GUI – Sunflower Model
    5. Synchronize (either 1-way or 2-way) between the Relational Database and the Graph Database under CRUD events.
    6. Allow Read/Write with different views (a.k.a. Perspectives) by End Users via a GUI in the Graph Database by coding (e.g. Cypher).
    7. Allow the User to Read/Write with different views (a.k.a. Perspectives) via a GUI in the Graph Database semantically, in plain English (or another language), via the Semantic Search Engine.1
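    As a hedged sketch of step 3, the bRelationship records above might be pushed into the Graph Database through the official Neo4j Python driver as follows. The ModelData label, connection details and hard-coded rows are assumptions for illustration; in practice the rows would be read from the bRelationship Table and the Node Label would carry the Knowledge Domain (e.g. CRM, eDM):

    ```python
    from neo4j import GraphDatabase

    # Example rows mirroring the Relationship Table above.
    b_relationship_rows = [
        {"source": "Client", "rel": "HAS_ONE", "target": "First Name"},
        {"source": "Client", "rel": "HAS_MANY", "target": "Email Address"},
        {"source": "First Name", "rel": "IDENTICAL_TO", "target": "Given Name"},
    ]

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        for row in b_relationship_rows:
            # MERGE keeps each Node unique; the relationship type is interpolated
            # because Cypher cannot parameterise relationship types.
            session.run(
                f"MERGE (s:ModelData {{name: $source}}) "
                f"MERGE (t:ModelData {{name: $target}}) "
                f"MERGE (s)-[:{row['rel']}]->(t)",
                source=row["source"], target=row["target"],
            )
    driver.close()
    ```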

    Footnotes

    1. This will be performed by an LLM algorithm, which is out of the scope of this article. ↩︎
  • bGraph for Business Process Management

    Introduction

    While in the article Build a Business Process Management System – Stage of System Building we defined that the 1st stage of building a system is Modeling, in the article Build a Business Process Management System – BFs-WAITER Pivot Table we further named the content or directions that we should include in the Modeling stage as BFs-WAITER.

    No matter how comprehensively the model reflects the intricacies of the real world, we need a tool to effectively transform the model into an executable system with a human interface, namely bGraph in Diamond Digital Marketing Group. Before we dive into the functionality of bGraph, in order to sharpen its effectiveness, it is always good practice to enumerate the problem patterns that we encountered when using traditional tools.


    Problem Patterns in Modeling Stage of BPM Building

    Polymorphism in communication

    At the very beginning of the Modeling Stage, the Business Analyst (or Consultant, whatever you name it) will conduct an interview with the stakeholders of the target company in order to collect the information relating to the target business process. Any kind of documentation collection, verbal description, or even front-line field observation is carried out by the Business Analyst to become familiar with the target business process.

    After the Business Analyst finished the interview, he/she should spend time on organising the data into information and pass it to the System Analyst (and his/her programmer team) and bring the BPM System Building stage to Stage 2 (Standardization). 

    However, different target businesses, different Business Analysts or different clients, will always use different wording or language to describe the same concept. For example, while the client will refer to the product they are selling as Product, Business Analyst will name the Product as SKU. Another example is that the wording Last Name is a synonym of Surname, which can be used interchangeably.

    Conversely, the Business Analyst and the Client may use the same wording to refer to different business concepts. A typical example is the term “Client“. In a manufacturing industrial chain, no matter whether you are the Manufacturer, the Distributor or the Retailer, you will always call your downstream party your “Client”. During a BPM System Interview, a Business Analyst needs to pay double attention to figure out who (Manufacturer, Distributor or Retailer) the term “Client” refers to. As professional Business Analysts, we name them the Brand, the Merchant, the Retailer and the End User respectively in order to uniquely identify them.

    This polymorphism in communication occurs not only between the Client and the Business Analyst, but also between the Business Analyst and the Programmers. The more different wordings are used, the more friction arises during communication.

    Therefore , a communication protocol which can synchronise the wording is necessary.


    Duplicated Analysis Workload with Different Clients

    No matter which industry , country or business model the client is in, a CRM system will always share some common properties and features. 

    For example, the client will expect a standard CRM to have a Contact module which at least has First Name and Last Name as the properties of the object Contact.

    As a Business Analyst, in a BPM System Interview, you do not want to waste both your and your client’s time going through what common properties a CRM System should have, as those common properties have probably been gone through many times in previous, similar projects.

    On top of it, it is a must for a CRM to have a Country field for the users to fill in the nationality of the client. As a Business Analyst, you may not want to go through the comprehensive list of countries again and again in different projects.

    In this sense, it will be a great time saver if we can have a CRM System Building Template which comprises all the common properties of  a standard CRM.


    No Trigger On Searching Similar Functions in Previous Project.

    Even though you (as a Business Analyst) are public-spirited enough to have already encapsulated a comprehensive Country list as an array for the next project, how can another Business Analyst, or even your future self, remember or realize that you have already created the Country List before?

    Even worse, the concept Country can and will occur not only in a CRM, but also in almost any kind of system, such as a Project Management System, an Eshop or a Booking system. What will make (i.e. trigger) the programmer who is going to build a Booking system think that he can refer to the previously built CRM system to find the Country List? If he/she does not realize that a Country list already exists in some other project blueprint, he/she will probably spend time building it again, which duplicates the cost of development.

    If the next Business Analyst does not realize that you have already done this before, he will not search for the Country List. There is always a gap between searching for the solution and the solution itself.

    Unique Workload with Different Clients

    Although many properties of a CRM system are common, there are different properties too. For example, while a trading company may expect a Contact to be defined as a Company or Organisation which should have a Company Name field, a Beauty Salon may expect all their Contacts to be individuals who should have First Name and Last Name fields.

    It is necessary for us (Diamond Digital Marketing Group) to have a system which stores all the commonalities and differences involved in building different systems for different clients.


    Streaming (vs Batch) BPM Building Process

    To continue the example of CRM system building: as a Business Analyst, even though you have carefully listened to your client and clearly defined the common and different fields of the target CRM system after the 1st interview, it is very unlikely that you can hit a home run and gather 100% of the expected features and properties of the target CRM system in the 1st interview. Since building a system is a lengthy project which often lasts for months or even years, the business environment will probably change from time to time during the build period, which will also affect the features and properties of the target CRM system.

    Imagine a scenario as below:

    On Day 1 the Business Analyst suggested that the fields First Name and Last Name be included in the Contact Module of the target CRM system. On the very next day, Day 2, the programmers had already kicked off the coding work and created a Table in the Database, as well as the First Name and Last Name Fields in the user interface.

    However, on Day 3, because a new Marketing Manager came on board from the client side, she perceived that the fields Maiden Name and Middle Name are common sense and should also be added to the Contact Module. She passed this request to our Business Analyst, and our Business Analyst then passed this request to the programmers on Day 5 by directly appending 2 new columns, Maiden Name and Middle Name, to the CRM System Building Blueprint Spreadsheet.

    This behaviour confuses the programmers because (if you have paid attention to our story) they had already completed the coding work on Day 2; how could they realize that 2 new columns had been appended to the CRM system blueprint spreadsheet which they had just turned into code?

    Even though you may suggest that the Business Analyst should notify the programmers after making any adjustment in the blueprint spreadsheet, the specification of the blueprint is in fact in a streaming state which can and will be changed from time to time, so it is impossible for the programmers to build the system based on an ever-changing blueprint. Do you expect the programmers to check the blueprint spreadsheet for modifications every hour?

    In this sense, a streaming-oriented system blueprint is necessary for the communication between the Business Analyst and the Programmers, instead of a traditional system building blueprint which only reflects a single instant in time.

    This streaming-oriented communication mechanism not only satisfies the need for modification during development, under a DevOps concept, but also in the future after the system is brought to production, because the system is a living organism which adapts to the ever-changing business environment. The traditional Batch (or Versioning) oriented approach cannot satisfy this.


    Mobility of the System Building

    As an experienced Business Analyst, you can imagine that no matter how you ask your client to submit an expected new field or new feature of a system via a submission form, the client will probably not follow your instructions and will simply send that expected new field to you via email or even WhatsApp.

    After you receive the request from the client, instead of only simply forwarding the request to the programmer to handle, as a responsible and professional Business Analyst , it is our duty to validate whether or not the new request is a valid request (most of the time the request is invalid).

    For example, if the Client complains in the Contact module of a CRM that the field Sex is missing in the Form in the user interface,  you should first of all go to the project blueprint to check whether the field Sex should be included in the blueprint. If the Sex field can be found in the blueprint but not in the Form in the user interface, then you should contact the programmer to fix it up. But in the real world, most of the time after you conduct the checking, you will realize that the field Sex is in fact named as Gender in the Form in the user interface of the Contact module. 

    You can imagine that this kind of back-and-forth checking and unproductive communication is the main cause of eroding production time.

    Think about it: if you are handling 10 BPM system building projects at once, how can you quickly open a system (if there is any!) on your mobile device to check whether the complaint from one client is valid or not? If the complaint is valid, how can you quickly send an instruction to the programmer to fix the bug, given that you are not seated in front of a desktop but are instead on the way to the next client meeting?

    If you find that the complaint is valid and lies on the critical path of the project, where not fixing the bug immediately means the error will cascade to the next node of the critical path and in turn lead to an irreversible catastrophe, you cannot afford to wait until after the meeting to notify the programmer.

    A powerful streaming BPM building system is necessary to cater for all the mobility needs of this communication.


  • Marketing Neural Networking Model

    Definition

    A Neural Network Model, also known as an artificial neural network (ANN), is a type of machine learning model inspired by the structure and function of the human brain. 

    When this model is applied in the Marketing domain, it becomes the Marketing Neural Networking Model.

    Instead of diving into the intricacies of the mathematical formulas and operations, we will put the spotlight on the semantic logic behind the calculation.


    What Problem Pattern the Marketing Neural Networking Model Solves

    Formulate Marketing Strategy via A.I.

    In a nutshell, while a Marketing Consultant mainly provides Marketing Strategy, a Marketing Strategy is simply a series of decisions on how to choose among alternatives. For example, if you want to sell a Tattoo Printer to teenagers, will you use Facebook or Instagram to promote your product?

    Choosing between “Facebook” and “Instagram” (i.e. 2 alternatives) is called Marketing Strategy. For sure, in reality, it always takes more than 1 factor (or attribute) to make a decision, and more than 1 decision to formulate a strategy. You can imagine it is in fact a dynamic decision chain in which the outcome of 1 decision will affect not only the final outcome, but even the option values (i.e. the alternatives) of the next decision.

    The Marketing Neural Networking Model is purposed to learn and solve how to make decisions in a scientific way.

    Only after we turn the decision-making process into a scientific one can we automate the decision-making process via A.I. by applying the Marketing Neural Networking Model, which in turn creates an A.I. Marketing Consultant.


    What the Marketing Neural Networking Model Looks Like

    Although the intricacy of the Neural Networking Model is a bit scary, breaking it down piecemeal and demonstrating it with a story will definitely help you comprehend the concept more efficiently. Bear in mind that this is obviously a simplified example; reality will be 1,000 times the scale.

    Before starting the story, allow us to provide you with the legend of the Figure (Marketing Neural Networking Model) above:


    Rectangle ( ▭ ) : An Attribute (or Property, or Layer) of the Object, where the Object is the Marketing Neural Networking Model.

    Circle ( ○ ) : A Node (i.e. any Business Concept)

    Solid Line ( ⎯⎯ ) : A Positive Edge, which has a directional relationship between 2 Nodes

    Dotted Line ( ··· ) : A Negative Edge, which has NO directional relationship between 2 Nodes


    Imagine you are the CEO of a conglomerate which runs a Fashion Retail Store as well as a Diamond Wholesaler business at the same time. You are required by your shareholders to incrementally increase the ROI of the conglomerate by 10X, which is quite an impossible mission. In order to achieve this goal, you start by enumerating all the “Concepts” (i.e. the Nodes) in your mind which relate to the business, as below:

    1. Fashion Retail Store
    2. Diamond Wholesaler
    3. Website
    4. Google Merchant Center
    5. Linkedin Business Page
    6. Ads
    7. Payment Gateway
    8. Feed
    9. Enquiry Form

    In reality, the process of addressing, enumerating and filtering all the Concepts (i.e. the Nodes) relating to the business is almost an impossible task for human beings. The more knowledge Nodes the marketer acquires and manipulates, the more professional he is.

    Back to our story: immediately after you enumerated all the Nodes in your mind which you think are related to your business, you noticed some patterns within these Nodes:

    Causal Relationship

    Having played around with the interface of Google Merchant Center for a day, you realized that Google Merchant Center is mainly designed for listing products at retail prices in the storefront of the Google Shopping Tab, and therefore Google Merchant Center is better applied to a retail rather than a wholesale business, because there is no field in Google Merchant Center for inserting any tiered pricing or bulk discount in the storefront. In this sense, you recognized that which Digital Assets (Attribute) you are using depends on the Business Model (Attribute 1). Therefore you deduce your own business rule (which is called business intelligence in the business world) as below:

    Business Rule 1 : Digital Assets depend on the Business Model

    By applying Business Rule 1 in your business, you decide to adopt Google Merchant Center for your Fashion Retail Store (Edge 2) and meanwhile NOT adopt it in your Diamond Wholesaler business (Edge 5).

    Correlation Coefficient

    Having 10 years of experience using Linkedin Business Pages, you understand that the users who are responsive on Linkedin are mainly seeking business opportunities (i.e. B2B) rather than retail purchases (i.e. B2C). Although you have this “insight”, you still from time to time scroll past some Feeds on Linkedin which are selling to retail customers. As you cannot be 100% sure about your insight, you classify it as a Correlation Coefficient (denoted “r”) relationship, in which the Correlation Coefficient of the responsiveness between a Linkedin Business Page and a Retail Business is low (e.g. r = 0.3), while it is high (e.g. r = 0.9) between a Linkedin Business Page and a Wholesale Business.

    At this stage, you can skip the mathematical operation of the Correlation Coefficient. What you need to know is simply that the higher the value of the Correlation Coefficient (r), the closer the relationship is to a (Positive) Causal Relationship.
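    For readers who want to see the number itself, here is a tiny numeric illustration using numpy; the monthly response figures are invented purely to show the calculation:

    ```python
    import numpy as np

    posts_published  = np.array([ 5, 10, 15, 20, 25])   # Linkedin posts per month
    wholesale_leads  = np.array([ 2,  5,  7,  9, 12])   # responses from wholesale buyers
    retail_purchases = np.array([ 3,  1,  4,  2,  3])   # responses from retail buyers

    r_wholesale = np.corrcoef(posts_published, wholesale_leads)[0, 1]
    r_retail    = np.corrcoef(posts_published, retail_purchases)[0, 1]

    print(round(r_wholesale, 2))   # close to 1: strong positive relationship
    print(round(r_retail, 2))      # close to 0: weak relationship
    ```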

    Now, based on the Correlation Coefficient derived from your empirical study, you deduce another Business Rule as below:

    Business Rule 2 : The responsiveness of the Linkedin Business Page is high for a Wholesale Business and low for a Retail Business.

    By applying Business Rule 2 in your business, you decide to adopt the Linkedin Business Page for your Diamond Wholesaler business (Edge 6) and meanwhile NOT adopt it in your Fashion Retail Store business (Edge 3).


    By continuing to deduce Business Rules based on your experience or other statistics, you figure out the following Business Rules for the Edges, as below:

    Involved Edges | Decision Combination | Business Rule
    Edge #1 and #7 | Fashion Retail Store > Website > Ads | Fashion Retail Store needs a Website as the landing page for placing Ads.
    Edge #1 and #8 | Fashion Retail Store > Website > Payment Gateway | Fashion Retail Store needs a Payment Gateway installed in the Website to receive payment from Clients.
    Edge #1 and #9 | Fashion Retail Store > Website > Feed | Fashion Retail Store needs to put the Feed on the Website for publishing content marketing articles.
    Edge #1 and #10 | Fashion Retail Store > Website > Enquiry Form | Fashion Retail Store needs to put the Enquiry Form on the Website for replying to questions from clients.
    Edge #2 and #11 | Fashion Retail Store > Google Merchant Center > Ads | Fashion Retail Store needs Google Merchant Center to showcase their products in Google Ads Campaigns.
    Edge #2 and #12 | Fashion Retail Store > Google Merchant Center > Payment Gateway | Google Merchant Center does not support a Payment Gateway.
    Edge #2 and #13 | Fashion Retail Store > Google Merchant Center > Feed | Fashion Retail Store needs to turn the Product Pages of the website into Google Merchant Center’s Feed.
    Edge #2 and #14 | Fashion Retail Store > Google Merchant Center > Enquiry Form | Google Merchant Center does not support an Enquiry Form function.
    Edge #3 and #15 | Fashion Retail Store > Linkedin Business Page > Ads | Ads placed on a Linkedin Business Page are not appropriate for a Fashion Retail Store.
    Edge #3 and #16 | Fashion Retail Store > Linkedin Business Page > Payment Gateway | Linkedin Business Page does not support a Payment Gateway.
    Edge #3 and #17 | Fashion Retail Store > Linkedin Business Page > Feed | The audience of a Linkedin Business Page does not expect a Retail Feed from a Fashion Retail Store showing in their Linkedin Personal account.
    Edge #3 and #18 | Fashion Retail Store > Linkedin Business Page > Enquiry Form | There is no Enquiry Form function in Linkedin Business Page.
    Edge #4 and #7 | Diamond Wholesaler > Website > Ads | Diamond Wholesaler needs a Website as the landing page for placing Ads.
    Edge #4 and #8 | Diamond Wholesaler > Website > Payment Gateway | Diamond Wholesaler does not expect the client to place orders on the Website directly, therefore a Payment Gateway is not needed.
    Edge #4 and #9 | Diamond Wholesaler > Website > Feed | Diamond Wholesaler needs to put the Feed on the Website for publishing content marketing articles.
    Edge #4 and #10 | Diamond Wholesaler > Website > Enquiry Form | Diamond Wholesaler definitely needs an Enquiry Form on the Website as the client will ask for product and transactional info before placing an order.
    Edge #5 and #11 | Diamond Wholesaler > Google Merchant Center > Ads | Diamond Wholesaler may not need to place Ads via a Google Merchant Center Campaign because Google Merchant Center does not support tiered-pricing or quantity-pricing functions.
    Edge #5 and #12 | Diamond Wholesaler > Google Merchant Center > Payment Gateway | Google Merchant Center does not support a Payment Gateway.
    Edge #5 and #13 | Diamond Wholesaler > Google Merchant Center > Feed | Diamond Wholesaler may not need to sync the Product Feed from their website to Google Merchant Center because Google Merchant Center does not support tiered-pricing or quantity-pricing functions.
    Edge #5 and #14 | Diamond Wholesaler > Google Merchant Center > Enquiry Form | There is no Enquiry Form function in Google Merchant Center.
    Edge #6 and #15 | Diamond Wholesaler > Linkedin Business Page > Ads | It is appropriate for Diamond Wholesaler to place Ads on the Linkedin Business Page to reach management-level Decision Makers or Merchandisers based on Job Title Ads segmentation.
    Edge #6 and #16 | Diamond Wholesaler > Linkedin Business Page > Payment Gateway | Linkedin Business Page does not support a Payment Gateway.
    Edge #6 and #17 | Diamond Wholesaler > Linkedin Business Page > Feed | It is appropriate for Diamond Wholesaler to publish the Feed on the Linkedin Business Page to reach management-level Decision Makers or Merchandisers.
    Edge #6 and #18 | Diamond Wholesaler > Linkedin Business Page > Enquiry Form | There is no Enquiry Form function in Linkedin Business Page.
    All Decision Combinations Table of the Marketing Neural Networking Model

    Points to note

    1. Although there are only 18 Edges inside the Model, there are in fact 24 Decision Combinations that we need to make, because each time we need to take all 3 Attributes (i.e. Business Model / Digital Assets / Digital Asset Features) into consideration together, instead of only considering 2 Attributes at a time.
    2. 2 (Business Models) x 3 (Digital Assets) x 4 (Digital Asset Features) = 24 Decision Combinations. We call the result of this multiplication the Cartesian Product (a small sketch follows this list).
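    The 24 Decision Combinations can be enumerated mechanically, for example with Python’s standard library (the option values below are exactly the ones listed earlier in this section):

    ```python
    from itertools import product

    business_models        = ["Fashion Retail Store", "Diamond Wholesaler"]
    digital_assets         = ["Website", "Google Merchant Center", "Linkedin Business Page"]
    digital_asset_features = ["Ads", "Payment Gateway", "Feed", "Enquiry Form"]

    combinations = list(product(business_models, digital_assets, digital_asset_features))
    print(len(combinations))   # 2 x 3 x 4 = 24
    print(combinations[0])     # ('Fashion Retail Store', 'Website', 'Ads')
    ```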

    What Problem Patterns the Marketing Neural Networking Model Solves

    Enumerating all Possible Decision Combinations

    The reason why we need to enumerate all the possible decision combinations is that, since Strategy means “decision“, when formulating a Marketing Strategy, covering all possible decisions comprehensively is as important as figuring out the appropriate answer to a single decision.

    The only way to enumerate 100% of the decision combinations is by enumerating all the Attributes and all the Option Values of each Attribute, and multiplying them together to form a Cartesian Product. In turn, no decision combination will be missed within the Model (i.e. exactly ALL possibilities within the Model are figured out, no more and no less), provided that no relevant attribute is missing from the Marketing Neural Networking Model, a “bug” which we will discuss in an upcoming chapter.

    Automating the Decision-Making Procedure by Computer or A.I.

    Remember that in the old days (or even today, without A.I.) you learned digital marketing strategies by listening to the advice the senior digital marketing consultant provided to the client. Every time you participated in a client meeting, you were impressed by how deep an ocean of knowledge the senior digital marketing consultant had acquired; it seemed he could share his knowledge non-stop forever. You jotted down every single piece of know-how into a notebook and dreamed that you might become him some day once you acquired ALL his knowledge, although you never knew the exact quantity of that “ALL” knowledge.

    Even if, luckily, you performed the miracle, learned “all” the knowledge and became another iconic senior digital marketing consultant, your next generation will encounter the same problem you did: he/she needs to take notes and learn piece by piece, starting from a blank sheet of paper.

    This inefficient friction makes the knowledge transmission process extremely slow, just as it has been for human beings over the past 7,000 years of recorded history.

    Bear in mind that the example we made previously in this session only describes 24 decision combinations, which accounts for an extremely tiny portion of reality, which probably has tens of millions of decision combinations, far beyond the processing power of a mortal within his lifespan.

    In order to have a systematic way to record all the Knowledge Nodes and the relationships among them, the Neural Networking Model is a perfect candidate to provide a paradigm which turns reality into a conceptualised mathematical model, so that the job can be done not only by human beings but also by computers, whose compute power can dramatically speed up the pace of learning by decades and make processing ALL decision combinations a mission possible.

  • Relationship between Human Learning and Data Structure

    Abstract

    Human learning is a complex and ongoing process which describes the interaction between human beings and the environment surrounding them, and how they interpret the data and formulate models to represent the world. While it is worth a whole book to explain, in this article we only extract the part related to Data Structure.


    Definition

    Data

    First of all, Data has nothing to do with computers or anything digital. Long before the invention of the computer or any digital device, data existed.

    Allow me to explain Data with an example. One day, 5,000 years ago in Mesopotamia, a Sumerian named Adamen brought a sheep to the market for sale. After he had stood in the street for almost 6 hours, he finally found a rich man who was really going to buy his sheep for 50 Shekels. He was happy and thought that if he could sell all the sheep he possessed, which was 10 sheep, he could have financial freedom. So he left the market and thought about how to execute his plan.

    Immediately after he arrived home, he found it really hard to bring 10 sheep from his home to the market. He was thinking: instead of bringing the entire sheep to the market, is there any way that he can only bring part of each sheep? In turn, he cut off one nail from each of the sheep, and brought these 10 nails to the market to make people believe that he possessed 10 sheep.

    In this story, the nail of the sheep is acting as Data to denote the underlying material object – the sheep.

    You may wonder why he didn’t simply use a piece of paper and write the word “sheep” on it. Please bear in mind that paper and written words had not been invented at that time.

    Of course, as time went by and writing and paper were invented, people like Adamen could simply use a piece of paper to write down the word “Sheep” to denote the underlying material object “Sheep”. Either way, the function of Data, to point a word (or symbol, or glyph, or character, or sound, or pronunciation, you name it) at an underlying material object, is always the same.

    That’s the beginning of the story of Data.

    Data Structure

    A data structure is a concept for running a database: a specialised format for organising, processing, retrieving, and storing data. It defines how data is arranged in a computer so that it can be accessed and updated efficiently. There are mainly 2 types of Data Structures:

    Relational

    In common English, for easy understanding, you can regard a Relational Data Structure as a 2-dimensional table which uses both a Column and a Row to coordinate a Value (i.e. what we call a “Cell” in MS Excel or Google Spreadsheet). It mainly focuses on the relationship between an attribute (i.e. a Column Name, e.g. Born in) and the object it is attached to (i.e. the Ancient Celebrities Table) itself.


    Example of a Relational Data Structure (i.e. a Table)

    Ancient Celebrities # | Name | Born in | Job Title
    201 | Plato | B.C. 429 | Philosopher
    202 | Aristotle | B.C. 384 | Philosopher & Mathematician
    203 | Alexander the Great | B.C. 356 | King of Macedonia
    Ancient Celebrities Table

    Non-relational

    In common English, you can regard a Non-relational Data Structure as a tree (or hierarchical) list which uses Nodes and Edges to coordinate the Value. Unlike a Relational Data Structure, which focuses on the relationship between an attribute and the object it is attached to, a Non-relational Data Structure focuses on the relationship (i.e. the Edge) between one Object (i.e. a Node) and another Object (i.e. another Node).


    Example of a Non-relational Data Structure (i.e. a Tree List)

    • Plato (Node 1)
      • Aristotle (Node 2)
        • Alexander the Great (Node 3)
    Teacher – Student Tree List

    Here, there are 3 Nodes in the Tree List. Although it is tempting to think that there are only 2 relationships (Edges) between the 3 Nodes, in fact there are 4 relationships (Edges) among them:

    1. Plato (Node 1) is the teacher of Aristotle (Node 2)
    2. Aristotle (Node 2) is the student of Plato (Node 1)
    3. Aristotle (Node 2) is the teacher of Alexander the Great (Node 3)
    4. Alexander the Great (Node 3) is the student of Aristotle (Node 2)

    There are 4 Edges instead of 2 because the direction of the relationship (Edge) does matter.
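
    As a rough Python sketch (the relationship labels TEACHER_OF and STUDENT_OF are my own naming, not something defined earlier), the same Tree List can be stored as Nodes plus directed Edges, which makes the count of 4 explicit:

    ```python
    # Nodes of the Teacher-Student Tree List.
    nodes = ["Plato", "Aristotle", "Alexander the Great"]

    # Directed Edges as (source, relationship, target) tuples. Each human
    # relationship appears twice because the direction of the Edge matters.
    edges = [
        ("Plato", "TEACHER_OF", "Aristotle"),
        ("Aristotle", "STUDENT_OF", "Plato"),
        ("Aristotle", "TEACHER_OF", "Alexander the Great"),
        ("Alexander the Great", "STUDENT_OF", "Aristotle"),
    ]

    print(len(nodes), "Nodes,", len(edges), "Edges")  # 3 Nodes, 4 Edges
    ```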


    How Humans Learn Based on Data Structures

    Let’s start this topic with a question asked by your friend:

    Hey, who is Aristotle?

    To answer this question, you may reply to him in English as below:

    Aristotle is an ancient philosopher and mathematician who was born in B.C. 384; he was the student of Plato as well as the teacher of Alexander the Great.

    While the answer above is exactly what we would say in daily English, and the sentence is informative enough for anyone to get a brief understanding of who Aristotle is, there is a catch: even if you are very good at English, you will spend more time reading through the English sentence word by word than you would spend reading the Table and the Tree List.

    Moreover, while you are reading the sentence, what you actually do to comprehend it is identify the attributes of Aristotle (e.g. Born in, Job Title) as well as the hierarchical relationships (i.e. Edges) with Plato (Node 1) and Alexander the Great (Node 3).

    When the information is presented in Table and Tree List format, with only a few hours of practice anyone can comprehend an article much faster than by simply reading it in plain English.
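
    To make that comprehension step concrete, here is a small, purely illustrative Python sketch of the same sentence broken down into the two ingredients just mentioned, attributes and directed Edges (the key and label names are assumptions chosen to mirror the Table and Tree List):

    ```python
    # The plain-English answer about Aristotle, decomposed the way a reader
    # mentally parses it: attributes (table-like) plus Edges (tree/graph-like).
    aristotle = {
        "attributes": {
            "Born in": "B.C. 384",
            "Job Title": "Philosopher & Mathematician",
        },
        "edges": [
            ("STUDENT_OF", "Plato"),
            ("TEACHER_OF", "Alexander the Great"),
        ],
    }

    print(aristotle["edges"][0])  # ('STUDENT_OF', 'Plato')
    ```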


    Human Learning Behavior as an Adaptive Search

    However, the story of human learning does not end here. Back to our example: your friend listened carefully to your reply and realised that Aristotle was the student of Plato, but, to your surprise, he did not know the meaning of “B.C.” and asked you what “B.C.” is.

    “B.C.” is an acronym of “Before Christ”. It is a dating system used to denote any year before the birth of Christ. The counterpart of “B.C.” is “A.D.”, which stands for “Anno Domini”, a Latin phrase meaning “In the Year of Our Lord”. The year 2024 means we are in A.D. 2024, although we normally skip the term “A.D.” as it is the default.

    Having received your reply, your friend now knows the new knowledge regarding the dating systems B.C. and A.D. You can again turn the plain English into Table and Tree List format, as we have done before:


    | Acronym | Word Stem | Language | Denotes |
    |---|---|---|---|
    | B.C. | Before Christ | English | Years before the birth of Christ |
    | A.D. | Anno Domini | Latin | Years after the birth of Christ |
    Dating System Table
    • Dating System
      • B.C.
      • A.D.
    Dating System Tree List

    In fact, every single concept (I call it a Knowledge Node) will always have its own attributes as well as relationships (i.e. Edges) with other Nodes.

    Imagine your friend is a 5-year-old boy who knows very little about what you said (and even about this world!) and is going to ask you about almost every single word in your sentence, like this:

    1. Who is Aristotle
    2. Who is Plato
    3. Who is Alexander the Great
    4. What is B.C.
    5. What is A.D.
    6. What is Latin
    7. Who is Christ
    8. What is Anno Domini
    9. What is Macedonia
    10. What is Philosopher
    11. What is Mathematician

    If you turn all these 11 concepts (i.e. Knowledge Nodes) into Table and Tree List format, you can imagine the Data Structure will resemble the image below:


    This is a typical Adaptive Search pattern, in which someone needs to “search for what he wants to search for”, in turn forming a Knowledge Graph. A smart person like you will quickly realise that you can (or need to) add an almost infinite number of Nodes and Edges to the Graph in order to learn something. The more Nodes you add to the diagram, the more attributes will be derived, and each attribute of a Node can become a new Node.

    And that’s exactly how the data structure behaves during human learning.
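
    Below is a minimal Python sketch of that adaptive pattern. It assumes a toy KNOWLEDGE dictionary standing in for whatever source the learner consults; every attribute value the learner has not seen yet becomes a new Knowledge Node to search for, so the Graph keeps growing:

    ```python
    # Toy knowledge source: each concept exposes a few attributes.
    # The concepts and attribute values are only illustrative.
    KNOWLEDGE = {
        "Aristotle": {"Born in": "B.C.", "Job Title": "Philosopher", "Teacher": "Plato"},
        "B.C.": {"Word Stem": "Before Christ", "Language": "English"},
        "Plato": {"Job Title": "Philosopher"},
        "Philosopher": {"Language": "Greek"},
    }

    def learn(start: str) -> dict:
        """Keep searching for whatever the learner has not yet seen."""
        known, to_ask = {}, [start]
        while to_ask:
            concept = to_ask.pop()
            if concept in known:
                continue
            attributes = KNOWLEDGE.get(concept, {})
            known[concept] = attributes
            # Each attribute value may itself become a new Knowledge Node.
            to_ask.extend(value for value in attributes.values() if value not in known)
        return known

    print(list(learn("Aristotle")))
    ```

    Starting from “Aristotle”, the search keeps pulling in new Nodes (Plato, Philosopher, B.C., …) precisely because every answer introduces attributes the learner has not met before.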

    Remember the previous example where you explained to your friend who Aristotle is. In order to understand who Aristotle is, he needed to acquire foundation knowledge, which made him dive into 4 levels of Nodes, as listed below (see the sketch after the list):

    1. Level 1 Node
      • Ancient Celebrities
    2. Level 2 Node
      • Dating System
    3. Level 3 Node
      • Language
      • Job Title
    4. Level 4 Node
      • Country
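
    Here is the sketch referred to above: a small Python example that walks the Graph breadth-first and reports how many levels deep each Node sits. The prerequisite Edges are hypothetical, chosen only so that the computed depths match the four levels listed above:

    ```python
    from collections import deque

    # Hypothetical prerequisite Edges: "understanding X requires diving into Y".
    # The Node names mirror the four levels listed above.
    REQUIRES = {
        "Ancient Celebrities": ["Dating System"],
        "Dating System": ["Language", "Job Title"],
        "Language": [],
        "Job Title": ["Country"],
        "Country": [],
    }

    def levels(root: str) -> dict:
        """Breadth-first dive: how many levels deep each Knowledge Node sits."""
        depth = {root: 1}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in REQUIRES.get(node, []):
                if child not in depth:
                    depth[child] = depth[node] + 1
                    queue.append(child)
        return depth

    print(levels("Ancient Celebrities"))
    # {'Ancient Celebrities': 1, 'Dating System': 2, 'Language': 3, 'Job Title': 3, 'Country': 4}
    ```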

    You can now sense the challenge of how a human being learns a new concept: he will soon get lost in the maze, since he has no idea how many levels he should dive into in order to comprehensively understand a single concept in a topic (i.e. a Knowledge Domain). And the Knowledge stored in your brain will finally be distributed in this way:

    Nevertheless, don’t be upset by this truth: we should find (and in fact already have found) a “Map” to navigate us through this knowledge maze.

    Finally, let’s go back to Aristotle and end this topic with a quotation attributed to him which describes the problem we suffer during human learning:

    The More You Know , The More You Realize You Don’t Know

