AWS debuts advanced RAG features for structured, unstructured data
Join our daily and weekly newsletters for the latest updates and exclusive content on top AI coverage. Find out more
Getting enterprise data into large-scale linguistic models (LLMs) is a critical task for successful enterprise AI deployments.
That’s where Search Augmented Generation (RAG) fits in, an area where many vendors have offered different solutions. Today Fr AWS re:invent 2024 The company announced a series of new services and updates designed to make it easier for businesses to get both structured and unstructured data into RAG feeds. Making structured data available to RAG requires more than just looking up a single row in a table. It involves translating natural language queries into complex SQL queries for filtering, joining tables and aggregating data. The challenges are further multiplied for unstructured data, where by definition there is no structure to the data.
To help address these challenges, AWS announced new services to support structured data retrieval, ETL (extraction, transformation and loading) for unstructured data, data automation and knowledge base support.
“Augmented retrieval generation (RAG) is a very popular technique for personalizing your data, but one of the challenges with augmented retrieval generation is that it’s historically been about text data,” Swami Sivasubramanian, vice president of AI and data at AWS, told VentureBeat. “And if you see enterprises, most of the data, especially operational data, is stored in data lakes and data warehouses that were never ready for RAG as such.”
Improving Structured Data Retrieval Support Using Amazon Bedrock Knowledge Bases
Why is structured data not ready for RAG? Sivasubramanian provided several scripts.
“To build a highly accurate and secure system, you have to really understand the schema, create your own schema embedding, and then really understand the historical query log and then keep up with changes and schemas,” Sivasubramanian said.
During his keynote at re:invent, Sivasubramanian explained that Amazon Bedrock Knowledge Bases is a fully managed RAG feature that allows businesses to tailor responses using contextual and relevant data.
“It automates the entire RAG workflow, eliminating the need to write custom code to integrate data sources and manage queries,” he said.
With support for structured data retrieval in Amazon Bedrock’s knowledge bases, Sivasubramanian said AWS provides a fully managed RAG solution. It enables businesses to natively query all their structured data to generate results for generative AI applications. The knowledge base automatically generates and executes SQL queries to retrieve business data and then enrich the model’s responses.
“The great thing is that it also adapts to your schema and data and learns from your query patterns and provides customization options for greater accuracy,” he said. “Now with easy access to structured data for your RAG, you’ll create more powerful and intelligent gen AI applications across the enterprise.”
GraphRAG: Summarizing everything in a knowledge graph
Another key enterprise AI challenge that AWS is trying to solve for RAG is helping to improve accuracy using multiple data sources. That’s the challenge that GraphRAG’s new capability has to solve.
“One of the big challenges in business is to break down the disparate pieces of data and show how they are connected so you can build explainable RAG systems,” said Sivasubramanian. “This is where knowledge graphs are extremely important.”
Sivasubramanian explained that knowledge graphs create relationships between multiple data sources by connecting different pieces of information.
“When those relationships are converted to graph embedding for your AI applications, the system can simply walk through that graph and get those connections to get a holistic view of your customer data,” he said.
The new GraphRAG features in Amazon Bedrock Knowledge Bases automatically generate graphs using the Amazon Neptune graph database service. Sivasubramanian noted that it connects the relationship between different data sources and creates more complex Gen AI applications without the need for any graph expertise.
Solving Unstructured Data Challenges with Amazon Bedrock Data Automation
Another critical enterprise data challenge is the problem of unstructured data. It’s a problem that many vendors are trying to solve, including startups such as Anomalous.
When data needs to be indexed, be it a pdf, audio or video file for RAG use cases, some understanding of what is in the data is key to making the data useful.
“Unfortunately, unstructured data is difficult to extract and needs to be processed and transformed to be ready,” Sivasubramanian said.
The new Amazon Bedrock Data Automation technology is AWS’s answer to this challenge. Sivasubramanian explained that this feature automatically transforms unstructured multi-model content into structured data that powers AI applications,
“I like to think of it as AI-powered ETL [Extract,Transform and Load] for unstructured data,” he said.
Amazon Bedrock Data Automation automatically extracts, transforms, and processes enterprise multimodal content at scale. He noted that with a single API, an enterprise can generate custom outputs aligned with data schemas and analyze multimodal content for genAI applications.
“With these updates, we’re enabling you to use all your data to build more contextually relevant AI applications,” he said.
Source link