bGraph Engine

What Problem Patterns bGraph Is Going to Solve

Jan Tang Avatar

( modified at

) by

in category

Introduction

bGraph is an SaaS developed in-house by Diamond Digital Marketing Group which can be categorised as a GraphRAG web application serving as an Enterprise Knowledge Graph.

To better understand what GraphRAG exactly is, it is imperative for us to start with a real world problem pattern


Real World Problem Patterns

The definition of profit is simply Sales Revenue Minus Cost. While in the legal aspect, an cost item of Labour Cost (e.g. Salary) is good enough to meet the legal duty in terms of financial report, it cannot be in the real world reflects the problem that how the 40 Hours x 4 Weeks working hour for a staff is distributed among different activities throughout his/her daily operation. Instead of presenting the cost in monetary terms, I would like to convert it into Time.

In almost any industries or any business model we can categorize the types of time cost in below:


Production Time Cost

In a service-oriented business, Production Time Cost is simply referring to the time a staff spent on rendering a service to the client. For example, a hair dresser spent 30 minutes on providing hair styling service to the client. This 30 minutes will be categorized as Production Time Cost.

In a sku-oriented business, Production Time Cost can be categorized as the time on any kinds of labour cost incurred between planning to product delivered to the client. For example, even if you sell a Clock online, not only the Time Cost the Product Manager should be spent on designing and manufacturing the clock, the Customer Service Officer also needs to spend time on answering the enquiries from the wholesale or end-user clients. 

Communication Time Cost

Communication Time Cost is indispensable in business world. We can easily find the Communication Time Cost in scenarios below:

Communication between staff to client

  1. Reporting (e.g. Sales Report, Order History)
  2. Documentation (e.g. Invoice , Quotation, Shipping Note)
  3. Enquiries from Clients

Communication between staff to staff

  1. Reporting (e.g. Monthly Report)
  2. Knowledge TransferMeeting
    • e.g. In a brainstorm marketing meeting in a digital marketing agency which the salesperson needs to transmit the requirement and the marketing parameters acquired from the client to the marketer team such that the marketer team can formulate a digital marketing strategy based on the input from the salesperson.
  3. Knowledge TransferTraining and Learning
    • e.g. A new staff onboard which he/she can be competent to his/her job duties when time goes by based on below:
      • Operational Manual Reading – 10%
      • Advice from supervisior and teammate – 10%
      • Hands-on practicing – 30%
      • Trial and Error , feedback and complain from supervisior or client – 50%
  4. Knowledge Not Transferred
    • e.g. Imagine 1000 hours had been spent by the staff on designing and developing a new product , system or skill. When the staff quit the organisation, the knowledge that he/she acquired will be left together with him/her if there is lack of knowledge management practice in the organisation.
  5. Communication by Optimisation
    • e.g. A client requested to the web designer that he want the font-size in the website “Bigger”. Whenver the web designer modified the font-size from “12pt” to “24” pt, and then the client responsed that the font-size is “Too Big” this time. So the web designer by trial and error to adjust the font between “12pt to 24pt” a few times, and finally optimized in font “18pt”.
  6. Instruction Placing
    • e.g. An Marketing Director ordered the Marketing Manager in his/her team to prepare for the Consolidated Marketing Report with the followng specifications which is different each time:
      • CTR
      • CPC
      • CPM
      • CPA
      • ROAS

Searching Time Cost

Searching Time Cost can be derived from following scenarios:

  1. Bad Naming Convention
    • e.g. a Sales Report File named as “Report.pdf“, which the name in itself cannot differentiate among other reports created in/by different time, purpose and person.
  2. Lack of Indexing
    • e.g. Reinvent a new wheel – A staff spent 20 hours to created a comprehensive operational manual covering all his job duties. However, one day when this staff quit and the new staff does not realize the existence of that operational manual and then he spent another 20 hours to write a new one.
  3. Not Semantic Search Friendly
    • e.g. One staff member created a “Client List” and put it in the company drive , while the user searched “Customer List” in the company drive and then no result came out as he did not realise that he needed to search “Client” instead of “Customer“. After 15 minutes back and forth with different staff, he finally realized that he should search “Customer“.
  4. Lack of centralized data repositary
    • e.g. A Prospect discussed the project with the staff via Personal WhatsApp , WhatsApp group, Email, Phone Call , MS Team Video Meeting and Face to Face Meeting (with minutes). Having also checked with the CRM , Facebook CRM and eDM sending record, the staff spent 1 hour to consolidate all the data into the CRM contact log and let his supervisor review the whole picture before they can decide what to do next for closing a deal.

Error Handling Time Cost

Error Time Cost can be derived from following scenarios:

  1. Time Cost on doing wrong thing
    • e.g. Spent 1 hours to go east while it is expected to go west.
  2. Time Cost on undo the wrong step
    • e.g. Spend 1 hours to redo the error and go back to the original starting point.
  3. Insurance cost on monitoring and addressing the error
    • e.g. In order to address the wrong direction as soon as possible , the driver should report to the head office hourly. Besides, each car should install a GPS system (i.e. cost incurred) to trace the real-time position of the car.
  4. Time Cost Ripple EffectError brings New Error
    • e.g. While most of the time in a business world, how to do Step 2 will be dependent on the output of Step 1, and Step 3 dependent to Step 2 , so on and so forth, going wrong in Step 1 will trigger a ripple effect to lead Step 2 and Step 3 all go wrong.
  5. Time cost on Error Identification
    • Imagine you are using WordPress to build a website in which you have installed 100 plugins to make the WordPress website workable. However, after this 100 times installation, an error message came out which made your whole website shut down. You have no idea which plugins cause the error. The only way you can find out the root cause is to deactivate all the 100 plugins and start installing and observing the plugins one by one to see when the error occurs. The more plugins you have , and the more error messages you get, the exponential the error handling time cost will go.

Research and Development Time Cost

Error Time Cost can be derived from following scenarios:

  1. System Development
    • e.g. You want to write an operational manual to describe a step by step guideline on how to run the procedure at a hair-dressing salon from client walk-in to client leave after service rendered.
  2. No record on “Dark Matter”
    • e.g. Imagine there is a problem in which your staff have spent 1 day testing out all the 10 possible solutions. Finally you find out there is 1 and only 1 solution that is feasible among the 10. However, the staff quit and he did not record all the 9 remaining “Not Solutions“. A new staff on board and as he wants to improve the existing 1 solution, he starts his research and goes through the other 9 “Not Solutions” again. 

You can imagine that among all these Time Cost, only a very small portion of Time Cost is observable and measureable. The Time Cost which is not observable and measureable can never be cut or minimized.


Ways of handling the Time Cost

There are 3 directions on handling the Time Cost

Eliminate the Cost Item

Directly and brutally cut the item derived the cost. For example, streamline the workflow from 10 steps to 9 steps

Minimize the Cost

  1. By systemizing the workflow to cut the communication and training cost
  2. By automating the workflow to cut the labour cost

Turn Cost from expenses to assets in nature

  1. Build a do-once-use-many-times system. For example, once you write a Sales Script covering all the possible scenarios from a conversational sales meeting, this Sales Script can be applied to many sales meetings with same types in the future until the underlying environment changed. This kind of time spending , even though it is still a Cost, can be classified as Assets instead of Expenses because this Cost will generate future revenue.

Applications which can handle all the Problem Patterns

After years of hands-on experience (this is a black box and don’t ask me how and why I know! Thsi is an human intelligence before artificial intelligence dominates this world), you will realize the application can take 3 steps (or directions , to be precise) to handle all the time costs mentioned above:


Enumeration

By observing and modeling the world, you can address the relevant factors , steps, components, concepts that are related to your business.

For example, when you are running an e-shop, you will realize different kinds of transactional emails or reports which will reflect the reality. This procedure is called Modeling.

Modeling of an eshop Purchase Cycle:

  1. New User Registration Email
  2. New Order Email
  3. Invoice Email
  4. Receipt Email
  5. Delivery Note Email

While the concept is easy to understand, it is extremely difficult to execute as you have to decouple each procedure, workflow, and concept into an executable encapsulated module which you can reuse or execute systematically.

On top of that, it is a challenge for a Business Analyst or System Analyst to observe from the reality to refine the related components which comprehensively describes the model of the business. We called this comprehensive scenario Sample Space.

For example, while every one will understand the concept “Client”, a Business Analyst have to decouple the concept “Client” based on following attributes in order to make it executable and more close to reality:

  1. User Journey – Prospect vs Client or 1st Time Client vs VIP
  2. Individual Client vs Enterprise Client.

While the comprehensive option value lists of some of the attributes can easily be enumerated , most of the time most of the option value lists for most of the attributes cannot be enumerated in the time spot in which the system is built. More close to reality is that these option value lists, or even the attributes itselfs, are “growing” organically from time to time instead of being addressed in the very beginning.

For example, even the option value of attributes Gender can be classified as Male , Female in old days and an additional option value Transgender nowadays.

Also , what to observe in reality, and whether you think the component is relevant to your business or not highly depends on the level of knowledge of the Business Analyst. For example, in 4-year old you regarded water as water. But in 14-year old you should have realized the water can in fact be further decoupled by 1 H (Hydrogen) and 2 O (Oxygen).

While we will not dive into the problem patterns that we suffered during the enumration process, enuermation by itself is the very beginning of the GraphRAG based Enterprise Knowledge Graph web application.

In a techncial stack, we normally have the following technical components to execute the Enumeration process:

  1. Web Scrapper
  2. Public APIs (e.g. Public Facebook Post)
  3. Private Database and APIs (e.g. Company self-host CRM or Inventory System)
  4. UGC – Voice-to-Text conversation log or User submitted Documentations
  5. Modeling Blueprint – Meta Data , Data Schema , Business Lgoic, Compliance and Regulations.

Indexing

Indexing is the procedure to facilitate all the things or concepts in the Sample Space to be stored and searchable.

For example, giving a Sales Order an Order Number (e.g. SO20323) is a common and easy way to “Index” an Sales Activities (and conceptualized via the document Sales Order).

However, not anything can be easily indexed as simple as a Sales Order.

While i am not going to dive into the problem patterns we suffered in the indexing procedure, we can in high level describe some of the indexing procedures for a solution application:

  1. Text Embedding to Vector Database – Instead of documentation level, indexing down to the level which is to index every single word written or spoken by a client into a vector database its searchability.
  2. Business Catalog in Data Lake – Indexing the Meta Data (i.e. the data of data, a Column Name of a Table, for example) via different data source (e.g. CRM / eDM / Inventory System / Booking System / e-Shop). For example, after you realized that there is a new option value “Transgender” under the attributes of Gender, you will need to index this new option value.
  3. Data Streaming – instead of indexing every single component in a batch (e.g. once per day), we index the data in real-time instream.

Mapping

Mapping is simply to find the relationship between 2 concepts. The challenge task is that you need to address which relationship is relevant to link up among tens of thousands of combinations. For example, when a customer service office asked the client to provide the Client ID# , the client forgot his/her Client ID# and simply provide a mobile phone number for the customer service office to lookup what the Client ID# is.

In the above example , Client ID# and Client Mobile Phone Number is easy to map due to the fact that most likely that the Client Mobile Phone Number is linked to the Client Table itself. However , this ease does not apply to everything.

For example, how can you figure out an Facebook Username “BillGates” is in fact referring to the same person in Instagram Username “ThisisBillGates” as they are using different wording to refer to the same object (person)?

As usual, while we didn’t dive into details, we describe the techncial stack it normally be applied to do the mapping:

  1. Entity Resolution by Graph Database – find out same person with different name or wording under different data source.
  2. Link Preduction by Graph Database – find out the relationship between 2 concepts.
  3. RAG by Human Know-how – while LLMs is well trained by public knowledge domain, there is some domain specific knowledge which is in private , for example , the know-how of a 3-star Michigan chef on how to produce a perfect Risotto. Some sort of merchanism should allow the human to manually map the know-how into the Knowledge Graph Database so that the LLMs can apply its well-trained intelligent into the human know-how knowledge. (i.e. This is what RAG is performing)

Searching

Once all the concepts are enumerated and indexed , and the relationship among each concepts (we called it “Node“) are well defined and connected, we can start the Searching step.

In fact, regardless of industry, job nature, role, task , business model, anything, as long as you ask a question, you are performing a “Search” activity.

To execute a “Search” is to “Find a Needle (an instance) in Haystack (a pool of instances)”

For example, when the customer service officer received the an enquiry from the client asking “When my Sales Order Delievered”, he then carry out the steps below:

  1. Get the Client Mobile Phone Number from the Client.
  2. Search (i.e. Lookup) the Client# by the Client Mobile Phone Number
  3. Search (i.e. Lookup) the Sales Odrer# by Client#
  4. Search (i.e. Lookup) the Shipping Record # by Sales Order#
  5. Search the value of the attributes “Shipping Status” and “Expected Delivery Date” inside the Shipping Order record.

While the previous example is happened in a customer service scenario, the “Search” pattern also happens in the production team.

In fact the search theory deserves a whole book to elaborate. We will skip the theory and directly highlighted the technical stack that we are going to use to carry out the Search Function:

  1. Search Bar or Chat Bubble – A search bar or Chat Bubble in the frontend interface which let the user to communicate with the system by inserting their search queries or questions.
  2. Semantic Search Engine – A search bar which the user can simply input human daily language , even though the search query is not 100% correct or precise. The Semantic Search Engine can still output some similar result even the search query is not 100% precise.
    • For example, when i search “GA4 Instell Guide” , the Semantic Search Engine will output the “Google Analytics 4 installation and configuration Guide” which connected to our Enterprise Knowledge Graph even thought there is a typo in the word “install” and the “GA4” is the synonyms of “Google Analyitcs 4“. (Although it is a norm in your daily to use Google Search Engine in this manner, don’t take this as granted as you can hardly find this semantic search function the search engine other than Google Search Engine.)
    • Another concept is that people in different role will use different language to refer to the same concept. For example, in a website design project, 3 differents role will use different wording to discuss the font size of the homepage:
      • Client : Please make the word in the title bigger
      • Marketer : Do you want a Font-Size = 20pt?
      • Programmer : I would prefer 1.1 em in order to cater both moble and desktop version.
  3. LLM – An highly intelligent “brain” (e.g. ChatGPT) which can comprehend and understand both the “Needle” and “Haystack” and perform the search and return the related outcome.
  4. LangChain – While LLM (e.g. ChatGPT) is good in understanding the text, in reality we need another AI model to comprehend Image (e.g. a AI model called Llama 3.2). LangChain is used to orchestrate the multimodel to make them work together.
  5. RAG Solution – While renowned LLM (e.g. ChatGPT) is good at understanding all the public domain concepts and knowledge at the time the model was trained, they cannot understand the concepts which are under specific knowledge domain. For example, if your business invented a new product name “aldis lds” , the ChatGPT will not recognize it as a product of your company. RAG is to orchestrate the knowledge of the real world, as well as the specific knowledge you provided before it gives you a search outcome. Please understand that while the business operation or business data may change everyday, the cost of “fine-tune” the LLM every day is simply too time consuming.
  6. GraphRAG – While the RAG which is based on Vector is good in similarity search, if you have zero fault tolerance and want a 100% precise search result which the underlying know-how knowledge domain which is probably only acquainted by yourself (e.g. as a Cloud Architect Consultant). You hope that you can provide some of your know-how in 100% precision to the system by manual input, and let LLM find out the result to the user based on your personal know-how and the understanding of the knowledge in the public domain. Graph Database perform better than Vector Database in terms of hallucination-free.
  7. Adaptive GraphRAG – In old days we learnt from finding the answer. In the era of AI, we learnt by starting at forming a good question. While most of the time the user doesn’t even know what they are asking or searching for. For example, in an interior design consultation meeting, when the client expresses that they want a “japanese style bedroom”, before the designer gives them the answer , he/she will most probably ask the client “what is your budget” or “is it Ancient Japan or contemporary Japan” . You can realize that the question should be “adaptive” (i.e. keep optimising and adjusting) before the useful question is formed. An answer from an meaningless question is expected to be meaningless.

In conclusion, the GraphRAG SaaS Enterprise Knowledge Graph We Application is a solution backed by Enumeration, Indexing and Search functions which can help any individual or organisation to save time on Production , Searching , Error Handling and Communication cost.


Real World Scenario

In order to visualize the power of the GraphRAG Enterprise Knowledge Graph, allow me to demonstrate with a real-world day to day example in digital marketing world.

While Customer Service Chatbot is for sure one of the powerful aspects of saving time cost, I want to put the focus on another more important point which can be bought by the GraphRAG solution. Therefore I will keep the Customer Service level Chatbot description minimal.

Besides, i will also skip all the description regarding some automation in programmatic (i.e. not AI) level. For example, Email auto forwarding with hard coded conditional logic based on the Email Title via Email API whenever an new Email received.

Background

A Individual Client , John, who owned a WordPress website created by us (DDM Group). He received the Spam Contact Us Form Email daily, which he made him feel annoyed. As we (DDM Group) is John’ website adminstrator, he complained to us.


Trigger – Symptom

Although John received the email because of the Spam bot of the Contact Us Form Submission in his website, he does not realize the fact and his complaint email is as below:

Hi DDM,

My email keep receiving rubblish Email daily. Please help to fix.

In most of the time, the client or end-user , like John, can only use human daily language, instead of technical jargon , to describe the problem they faced.

And most important is that, the event which trigger the action (e.g. write a complaint email) is normally come from a Symptom which drive his emotion (e.g. Fear , Annoying , Despair). Most of the time this Symptom is not the cause of the problem and instead is the consequence by itself.

For easy communication, we named this “Trigger” as Symptom.

Symptom = Rubblish Email

Party Involved = Client


Problem Pattern

By asking John to submit the Rubblish Email to the Google Drive, the Technical Support analyzed that the Email is infact triggered by the Contact Us Form Submission in the existing Website.

And therefore, the Technical Support classifed it as a “Spam Form Submission” Problem Pattern which was already reported by different client many times and therefore is named and indexed as “Spam Form Submission Problem Pattern in our system

Problem Pattern = Spam Form Submission

Party Involved = Technical Support


Solution and Environment

As stated before, the Spam Form Submission is a well-knowned Problem Pattern , as an experienced website administrator, DDM Group had already indexed and mapped different kinds of Solution for different scenario with the same Problem Pattern.

Following factors affecting the choice of the Solutions:

  1. Client Contract Amount
    • VIP Client
    • Standard Client
  2. SMTP Sending Server
    • GMAIL API
    • SMTP Sending Server provided by Client itself
  3. Web Server
    • Proxy Server – Cloudflare

In order to identify the Client Contract Amount , Web Server as well as the SMTP Sending Server specific to John’s case, the Technical Support and the Customer Service Officer have to access the CRM , as well as Website Development Production Database to lookup the Client Contract Amount , applied SMTP Sending Server and Web Server.

After lookup, we figureed out that John is a VIP Client and using GMAIL API and Cloudflare.

The Technical Support, based on his years of experienced and know-how in cyber security knowledge domain , realized that the main reason of the Submission Contact Us Form being spammed is due to the fact that some kinds of Spam Form Bot constantly crawled John’s website and realized the website is using some popular open-source plugin which triggered vulnerability exposure, leading the Spam Form Bot can easily fill in the form in John’s website automatically. In order to stop being spammed , the best way which the Technical Support can think of is to let the Spam Form Bot cannot even reach John’s website. And hence the Solution of Server Level WAF (Firewall) installation is chosen due to the fact that the Cloudflare Proxy Server supports WAF Firewall.

Environment = VIP Client , GMAIL API, Cloudflare

Party Involved = CRM Manager + Techncial Support

Solution = Server Level Firewall (WAF)

Party Involved = Technial Support


Deliverable (SKU) and SKU Feature

Once the Solution is confirmed, the case is passed to the Account Manager (i.e. Salesperson) to follow up and explain to the client.

While the Technical Support does not quite familar with the SKU Name in the SKU Library, he suggested to the Account Manager to visit the SKU Library in DDM Group and search for search term “Cloudflare WAF”

The SKU Library come out with the following SKU#

SKU NameSKU#
Cloudflare WAF – Standard5232323
Cloudflare WAF – Premium5232345
Wordfence (WordPress Firewall)8475623

Due to the fact that the SKU name by itself cannot faciliate the decision on which SKU to be chosen to solve the problem, the Technical Team further dives into the SKU Feature of the 2 Cloudflare related SKU and realized that only Cloudflare WAF – Premium (#5232345) supports the Legitimate Bot whitelisting feature.

SKU = Cloudflare WAF – Premium

SKU Feature = Legitimate Bot Whitelisting

Party Involved = Account Manager


Target Audience Properties and Sales Trigger

As an seasoned and proactive Account Manager, he realized that it is a good opportunity to upsell another SKU to John due to the fact that the Spam Submission Form alerted him for the cyber security concern.

The Account Manager googled the knowledge and figured out that Login Attempt Attack (i.e. a Problem Pattern) is another common vulnerability which is suffered by lots of eshop like what John is running.

The Account Manager , based on his experience, believed that Fear is a good sales trigger to have the intention purchase. In this sense, he told the potential risk of being login attempt attack by the malicious bot and suggested John to install another 2FA plugin which can effectively protect the unauthorized login.

As the Account Manager that John have no idea on what a Login Attempt Attack is , he visualized the problem pattern by showing the visiting report which logged thousands of visits of the login page of John’s eshop with an hour.

John felt worry about it and took the advice and purchased the SKU of 2FA Plugin Installation for WordPress, while the Account Manager successfully upsell a SKU related to John case.

Target Audience Property = Eshop owner

Sales Trigger = Fear

SKU = 2FA Plugin Installation for WordPress

Problem Pattern = Login Attempt Attack

Sympton = Thousand of visits in Login Page in an hour

Party Involved = Account Manager + Client


Plugin for Production

Once John signs the Sales Contract, the Sales Contract with the involved SKUs is passed to the Production Manager.

While the Sales Contract enumerated the SKU name , it does not limit which plugin to use in order to deliver the SKU.

Having checked with the Plugin Library regarding the error and bug reports for each plugin, the Production Manager decided to use the plugin WordFence for the 2FA related SKU and Cloudflare for the Server WAF related SKU.

SKU = Cloudflare WAF – Premium

Plugin = WordFence

SKU = 2FA Plugin Installation for WordPress

Plugin = Cloudflare


Putting Everything Together

When you put everything together, you may realize in fact you are doing the following steps:

  1. Enumerating and indexing all the factors (i.e. Column Name) observed from the real world , AND
  2. Enumerating and indexing all the option values for each factor
    • e.g. Fear / Hope / Greed in Sales Trigger Column
  3. Map the Option Value among each 2 columns under a Bipartite Data Pattern.
  4. Searching based on the need of different roles
    • e.g. As a Client or Account Manager role, search by inserting a value in any of the column (e.g. 1,000 login page visited in an hour in Symptom Column)
  5. Output the result in different columns based on different role.
    • As a Production Manager, refer to the SKU and suggested the related Plugin Name.

In the real business world, there are thousands of factors (i.e. columns) that can be addressed, with each factors may have thousands of option value involved (e.g. a Sales Order Records), forming a infinity number of nodes and edges of a Graph, which can be only comprehensively memorized and handed by machine.


Conclusion

I hope you can understand the problem pattern involved in the real world and realise that the learning activity of a human being is in fact based on enumerating , indexing , mapping and searching. 

By applying the GraphRAG Enterprise Knowledge Graph SaaS Web App (i.e. bGraph), it can automate and speed up the learning of a human being based on following open-source technical stack:

  1. Retrieve the Data via
    • Web Scrapper for Public Domain Knowledge
    • APIs for Public Domain Knowledge (e.g. Weather Condition)
    • APIs For Private Domain Knowledge (e.g. in-house CRM)
    • Domain Specific Know-how provided manually by in-house expert
    • Meta Data (e.g. Data Schema , Business Logic) in different data silo (e.g. CRM or POS) from Data Lake
  2. Text Embedding the retrieved Data into Vector Database
  3. Entity Extraction from the retrieved Data and store in Graph Database
  4. LangChain to orchestrate multi AI models (e.g. Image / Text / Voice)
  5. LLMs (e.g. ChatGPT) to comprehend the content stored in Vector or Graph Database
  6. Adaptive RAG Function to
    • Retrieve the Search Query from the Client
    • Refine their query to useful question
    • Retrieved the data from Step#1
    • Comprehend the Data
  7. Semantic Search Engine (Search Bar or Chat Bubble) to allow user to insert their Search Query in
    • Plain English
    • Not precise wording
  8. Visualize the Output via Graph Database and Graph Data Science Library to solve the following problems:
    • Shortest Click Path between 2 concepts
    • Centrality
    • Community Detection
  9. Output the result via a Chat Bubble by Streamlit.



Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Diamond Digital Marketing International