How Databricks is using synthetic data to simplify evaluation of AI agents
December 12, 2024

Companies are going all in on compound AI agents. They want these systems to be able to reason and handle different tasks across domains, but they are often hampered by the complex, time-consuming process of evaluating agent performance. Today, data ecosystem leader Databricks announced a synthetic data generation feature that makes this easier for developers.

The company says the move will allow developers to produce high-quality evaluation datasets within their workflows to measure the performance of agent systems under development. This will save them unnecessary back-and-forth with subject matter experts and get agents into production faster.

While exactly how the synthetic data product will work for businesses using the Databricks Data Intelligence Platform remains to be seen, the Ali Ghodsi-led company says its internal testing shows it can significantly improve agent performance across a variety of metrics.

Databricks’ role in evaluating AI agents

Databricks acquired MosaicML last year and has fully integrated the company’s technology and models into its Data Intelligence Platform, which provides enterprises with everything they need to build, deploy and evaluate machine learning (ML) and generative AI solutions using data hosted in the company’s lakehouse.

Part of this work revolves around helping teams build compound AI systems that can not only reason and respond accurately, but also take actions such as opening and closing support tickets, replying to emails, and making reservations. To this end, the company has launched a suite of new Mosaic AI features this year, including support for fine-tuning base models, a catalog of AI tools, and products for building and evaluating AI agents: the Mosaic AI Agent Framework and Agent Evaluation.

Now, the company is extending Agent Evaluation with a new synthetic data generation API.

Currently, Agent Evaluation provides enterprises with two key capabilities. The first lets users and subject matter experts (SMEs) manually define datasets of relevant questions and answers that serve as the ground truth for judging the quality of an agent’s responses. The second lets SMEs use those datasets to evaluate agents and provide feedback (labels). This is supported by built-in AI judges that automatically log responses and feedback to tables and rate agent quality on metrics such as correctness and harmfulness.
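To make the first capability concrete, below is a minimal sketch of a hand-curated evaluation run in Python. The `request`/`expected_response` column names and the `databricks-agent` model type follow the pattern Databricks documents for Agent Evaluation, but the snippet is an illustration, and `answer_question` is a hypothetical stand-in for the agent under test.

```python
import mlflow
import pandas as pd

# Hand-curated evaluation set: each row pairs an SME-written question
# with a "golden" reference answer.
eval_set = pd.DataFrame([
    {
        "request": "Does Delta Lake support time travel?",
        "expected_response": "Yes. You can query older snapshots with "
                             "VERSION AS OF or TIMESTAMP AS OF.",
    },
    {
        "request": "How do I convert a Parquet table to Delta?",
        "expected_response": "Run CONVERT TO DELTA on the table, or rewrite "
                             "the data with .write.format('delta').",
    },
])

def answer_question(request: str) -> str:
    """Hypothetical stand-in for the agent being evaluated (e.g. a RAG chain)."""
    return "Delta Lake supports time travel via VERSION AS OF queries."

# Agent Evaluation's built-in AI judges score each response against the
# expected answer and log per-question feedback plus aggregate metrics.
results = mlflow.evaluate(
    data=eval_set,
    model=answer_question,
    model_type="databricks-agent",
)
print(results.metrics)
```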

This approach works, but building an evaluation dataset is time-consuming. The reasons are easy to imagine: domain experts are not always available, the process is manual, and users often struggle to identify the most relevant questions and answers to serve as “golden” examples of successful interactions.

This is where the synthetic data generation API comes in, enabling developers to create high-quality evaluation datasets for preliminary assessments in minutes. It reduces SMEs’ involvement to final validation and fast-tracks an iterative development process in which developers can explore on their own how permutations of the system (tweaking models, changing retrieval, or adding tools) change quality.
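In Databricks’ preview documentation, the generation entry point is `generate_evals_df` in the `databricks-agents` package; one call turns parsed documents into evaluation rows. The parameter names below follow that documented pattern, but since the product is in preview, treat this as a hedged sketch rather than a stable API reference.

```python
import pandas as pd
from databricks.agents.evals import generate_evals_df

# Parsed source documents: text content plus a URI identifying each doc.
docs = pd.DataFrame([
    {"content": "Delta Lake is an open table format that ...",
     "doc_uri": "/docs/delta.md"},
    {"content": "Auto Loader incrementally ingests new files ...",
     "doc_uri": "/docs/autoloader.md"},
])

# One call produces synthetic question/answer pairs grounded in the docs,
# steered by a description of the agent and question-style guidelines.
evals = generate_evals_df(
    docs,
    num_evals=25,
    agent_description="A chatbot that answers questions about Databricks docs.",
    question_guidelines="Ask questions a data engineer new to the platform would ask.",
)
```

Because the output is already in Agent Evaluation’s expected format, the generated rows can be passed straight into an evaluation run without a conversion step.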

The company conducted internal testing to understand how datasets produced by the API help evaluate and improve agents, noting that they could lead to significant improvements across various metrics.

“We asked researchers to use synthetic data to evaluate and improve the agent’s performance, and then use human-curated data to evaluate the final agent,” Eric Peter, head of AI platform and product at Databricks, told VentureBeat. “The results showed that across a variety of metrics, the agent’s performance improved significantly. For example, we observed an almost 2x increase in the agent’s ability to find relevant files (measured by recall@10). We also saw an improvement in the overall correctness of the agent’s responses.”

How does it stand out?

Although there are many tools that produce synthetic datasets for evaluation, Databricks’ product stands out for its tight integration with Mosaic AI Agent Evaluation, meaning developers building on the company’s platform don’t have to leave their workflows.

Peter pointed out that creating a dataset with the new API takes four steps: developers simply parse their documents (saving them as Delta tables in the lakehouse), pass the Delta table to the synthetic data API, run an evaluation using the generated data, and view the quality results. A minimal sketch of this flow appears below.
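Put together, those four steps fit in a single notebook cell. This sketch assumes a Databricks notebook (where `spark` is predefined), a hypothetical Delta table name, and the `generate_evals_df` and `mlflow.evaluate` calls from the earlier snippets.

```python
import mlflow
from databricks.agents.evals import generate_evals_df

# Step 1: load parsed documents stored as a Delta table in the lakehouse
# (hypothetical table name; `spark` is provided by the notebook runtime).
docs = spark.read.table("catalog.schema.parsed_docs").toPandas()

# Step 2: pass the Delta table's contents to the synthetic data API.
evals = generate_evals_df(
    docs,
    num_evals=50,
    agent_description="A support agent answering product questions.",
)

# Step 3: run Agent Evaluation against the generated dataset.
results = mlflow.evaluate(
    data=evals,
    model=answer_question,   # the agent under test, as in the earlier sketch
    model_type="databricks-agent",
)

# Step 4: inspect the quality results (also surfaced in the MLflow UI).
print(results.metrics)
```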

Using an external tool, by contrast, means several extra steps: running an extract, transform and load (ETL) pipeline to move the parsed documents into an external environment where the synthetic data generation can run; moving the resulting dataset back into the Databricks platform; and converting it into a format Agent Evaluation accepts. Only then can the evaluation be performed.

“We knew companies needed a turnkey API that was easy to use, one that generates data with a single line of code,” Peter explained. “We’ve also seen many solutions on the market offer simple open-source prompts but not tune for quality. With this in mind, we’ve invested heavily in the quality of the data we produce, while still giving developers a prompt-like interface to tailor pipelines to their unique enterprise needs. Finally, we knew that most existing products would need to be wired into existing workflows, which adds unnecessary complexity to the process. Instead, we built an SDK that is tightly integrated with the Databricks Data Intelligence Platform and Mosaic AI Agent Evaluation.”

Several enterprises using Databricks are already leveraging the synthetic data API as part of a private preview, and they report significant reductions in the time required to improve agent quality and deploy agents to production.

One of those customers is Lippert, where Chris Nishnick said their team was able to use data from the API to improve model response quality by a relative 60%, even before experts were involved.

More agent-centric features in the pipeline

Next, the company plans to expand Mosaic AI Agent Evaluation to help domain experts revise synthetic data for accuracy and to provide tools to manage its lifecycle.

“During the preview, we learned that customers wanted some additional features,” Peter said. “First, they needed a user interface where domain experts can review and edit the synthetic evaluation data. Second, they wanted a way to manage the lifecycle of their evaluation sets so that changes can be tracked and updates based on domain experts’ review of the data are delivered to developers immediately. To address these challenges, we are already testing several features with customers and plan to launch them early next year.”

Overall, these developments are expected to drive adoption of Databricks’ Mosaic AI products, further solidifying the company’s position as a go-to provider for all things data and generative AI.

But Snowflake is also catching up in the category with a slew of product announcements, including a model partnership with Anthropic for its Cortex AI product, which lets enterprises build next-generation AI applications. Snowflake also acquired observability startup TruEra earlier this year to provide AI application monitoring capabilities within Cortex.

