Generative AI and Knowledge Graphs: A Perfect Match

Generative AI shows promise but struggles with accuracy, leading to hallucinations. Combining retrieval augmented generation with knowledge graphs can enhance data relevance.

Jesse Anglen
July 25, 2024


Generative AI has huge potential, but it also faces problems. When a model produces information that is not factually accurate in response to a user request, a so-called hallucination, the impact on users can be significant. Relying on a large language model's (LLM's) training data alone is not enough to prevent hallucinations. According to the Vectara Hallucination Leaderboard, GPT-4 Turbo has a hallucination rate of 2.5%, followed by Snowflake Arctic at 2.6% and Intel Neural Chat 7B at 2.8%.


To deal with this potential issue and improve results, retrieval augmented generation (RAG) allows users to leverage their company data through vector searches. However, RAG is not perfect either. When companies have documents that frequently reference each other, or when the same data is repeated across different documents, a purely vector-search-based approach becomes less effective.


The issue here is that RAG returns results by finding information similar to the question prompt. This makes it harder to answer questions that span multiple topics or require multiple hops: vector search finds results that match the prompt, but it cannot jump from those results to other linked information.


As an example, say that you have a product catalog with files on each product. Some of those products may be very similar, with minor differences in size or additional functionality depending on which version you look at. When a customer asks about a product, you want your LLM to respond with the right information about the product category and about any specific product features. You would not want your LLM to recommend one product that lacks the right features when another in the same line has them. Product documentation may also reference other information, e.g., via a link in the document, which means the chunk returned may not give the end user the full picture.


To overcome the potential problem of including the right level of detail, we can combine RAG with knowledge graphs, so that we can point to more specific files with the right data for a response. A knowledge graph represents distinct entities as nodes, with edges indicating the relationships between those entities. For instance, edges in a knowledge graph can explicitly distinguish facts and conditions that might otherwise seem similar and confuse the LLM.
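To make the nodes-and-edges idea concrete, here is a minimal in-memory sketch. The product names and relation types (`WidgetPro`, `HAS_FEATURE`, and so on) are hypothetical; a production system would use a graph database rather than Python lists.

```python
# A minimal knowledge graph: nodes are distinct entities, and edges are
# (subject, relation, object) triples connecting them.
nodes = {"WidgetPro", "WidgetLite", "Widgets", "BluetoothSupport"}
edges = [
    ("WidgetPro", "IS_A", "Widgets"),
    ("WidgetLite", "IS_A", "Widgets"),
    ("WidgetPro", "HAS_FEATURE", "BluetoothSupport"),
]

def facts_about(entity):
    """Return every edge that touches the given entity."""
    return [e for e in edges if entity in (e[0], e[2])]

print(facts_about("WidgetPro"))
# [('WidgetPro', 'IS_A', 'Widgets'), ('WidgetPro', 'HAS_FEATURE', 'BluetoothSupport')]
```

Because `WidgetPro` and `WidgetLite` are separate nodes, the graph keeps their feature sets distinct even though their descriptions would look nearly identical to a vector search.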


When used for RAG, entities relevant to the question are extracted, and then the knowledge sub-graph containing those entities and the information about them is retrieved. This approach allows you to extract multiple facts from a single source that are associated with a variety of entities within the knowledge graph. It also means you can retrieve just the relevant facts from a given source rather than the whole chunk, which might include irrelevant information.
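The extract-then-retrieve step can be sketched as follows. The entity names are carried over from the hypothetical catalog example above, and the string-matching extraction is a deliberately naive stand-in: a real system would use an LLM or a named-entity-recognition model to identify entities in the question.

```python
# Sketch of graph-based retrieval: find entities named in the question,
# then pull only the edges (facts) that involve those entities.
edges = [
    ("WidgetPro", "IS_A", "Widgets"),
    ("WidgetPro", "HAS_FEATURE", "BluetoothSupport"),
    ("WidgetLite", "IS_A", "Widgets"),
    ("Widgets", "MADE_BY", "AcmeCorp"),
]
known_entities = {s for s, _, _ in edges} | {o for _, _, o in edges}

def retrieve_subgraph(question):
    # Naive extraction: keep any known entity mentioned in the question.
    mentioned = {e for e in known_entities if e.lower() in question.lower()}
    # Retrieve just the relevant facts, not whole document chunks.
    return [e for e in edges if e[0] in mentioned or e[2] in mentioned]

print(retrieve_subgraph("Does WidgetPro support Bluetooth?"))
```

Note that the result contains only the two `WidgetPro` facts, even though the source documents for those facts may also describe `WidgetLite` and other products.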


Alongside this, it means that you can deal with the problem of having multiple sources that include some of the same information. In a knowledge graph, each of these sources would produce the same node or edge. Rather than treating each of these sources as a distinct fact and then retrieving multiple copies of the same data, that repeated data will be treated as one node or edge and thus retrieved only once. In practice, this means that you can then either retrieve a wider variety of facts to include in the response, or allow your search to focus only on facts that appear in multiple sources.
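The deduplication step can be sketched like this. The document names are made up for illustration; the point is that identical triples extracted from different sources collapse into one edge, which also records which sources corroborate it.

```python
# Merging the same fact from multiple sources into one edge.
from collections import defaultdict

extracted = [
    ("WidgetPro", "HAS_FEATURE", "BluetoothSupport", "datasheet.pdf"),
    ("WidgetPro", "HAS_FEATURE", "BluetoothSupport", "faq.md"),
    ("WidgetLite", "IS_A", "Widgets", "catalog.csv"),
]

merged = defaultdict(set)
for s, r, o, source in extracted:
    merged[(s, r, o)].add(source)   # same triple from two docs -> one edge

print(len(merged))                  # 2 distinct edges instead of 3 raw triples

# Optionally, focus on facts that appear in multiple sources.
corroborated = [t for t, srcs in merged.items() if len(srcs) > 1]
print(corroborated)
```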


Knowledge graphs also make it easier to find related information that is relevant for a request, even when it might be two or three steps away from the initial search. In a conventional RAG approach, you would have to carry out multiple rounds of querying to get the same level of response, which is more expensive from a computation standpoint and potentially more expensive in terms of cost too.
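This multi-hop traversal is a plain breadth-first search over the graph's edges, so facts two or three steps from the query entity come back in a single retrieval pass. The entities below are again illustrative.

```python
# Multi-hop retrieval via breadth-first traversal.
from collections import deque

edges = [
    ("WidgetPro", "IS_A", "Widgets"),
    ("Widgets", "MADE_BY", "AcmeCorp"),
    ("AcmeCorp", "HEADQUARTERED_IN", "Berlin"),
]

def neighbors(entity):
    for s, _, o in edges:
        if s == entity:
            yield o
        elif o == entity:
            yield s

def entities_within(start, max_hops):
    """Collect every entity reachable from `start` in at most max_hops edges."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

print(entities_within("WidgetPro", 2))
```

With a vector search, reaching `AcmeCorp` from a question about `WidgetPro` would require a second round of querying; here it falls out of one traversal.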


To create and use a knowledge graph as part of your overall generative AI system, you have several options. For instance, you may want to import an existing set of data that you know is accurate already. Alternatively, you can create your own knowledge graph from your data directly, which can be beneficial when you want to curate your information and check that it is accurate. However, this can be time-intensive and difficult to keep updated when you have a large amount of data, or when you want to add new information quickly.


One interesting approach is to employ your LLM to extract information from your content and summarize the data. This automated approach can make it easier to manage information at scale, while still providing the up-to-date knowledge graph that you need. As an example, you can use LangChain and LLMGraphTransformer to take a set of existing unstructured data, apply a structure, and then organize that data. You can then use prompt engineering and knowledge engineering to refine the automated extraction process so that it produces a relevant knowledge graph.
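The shape of that pipeline can be sketched as below. The LLM call is stubbed out with a hypothetical rule-based `extract_triples` function so the example runs standalone; in practice, that step would be the model itself, for instance via LangChain's `LLMGraphTransformer` converting documents into graph documents.

```python
# Sketch of LLM-driven graph extraction, with the model call stubbed out.
def extract_triples(text):
    """Stand-in for an LLM extraction call; returns (subject, relation, object).

    A real system would prompt the model for structured output here instead
    of this toy pattern match.
    """
    triples = []
    for sentence in text.split("."):
        words = sentence.split()
        if "makes" in words:
            i = words.index("makes")
            triples.append((words[i - 1], "MAKES", words[i + 1]))
    return triples

unstructured = "AcmeCorp makes WidgetPro. AcmeCorp makes WidgetLite."
graph_edges = extract_triples(unstructured)
print(graph_edges)
```

The extracted triples would then be merged into the graph store, with deduplication handling any facts the LLM extracts from more than one document.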


For more insights, visit our Rapid Innovation Blogs.

