Scaling AI: Platform best practices
This VB Lab Insights article is presented by Capital One.
Enterprises are investing heavily in building and continuously evolving world-class platforms that allow AI use cases to be built, deployed, scaled and improved over time. Historically, many companies have taken a federated platform approach, building functionality and features to support customized needs across different areas of the business.
Today, however, advances such as generative AI create new challenges that demand better approaches to building and scaling enterprise platforms. These include the expertise and graphics processing unit (GPU) resources required to train and host large language models, access to large volumes of high-quality data, close collaboration across multiple teams to deploy agent workflows, and a high level of maturity in the many internal application programming interfaces (APIs) and tools those workflows depend on. Disparate systems and a lack of standardization hold companies back from realizing the full potential of AI.
At Capital One, we believe large enterprises should follow a common set of best practices and platform standards to deploy AI effectively at scale. While the details will vary, four common principles can help companies do so successfully and unlock business value:
1. Everything starts with the user
The goal of any enterprise platform is to support users, so you must start with the needs of those users. You should seek to understand how users interact with your platform, what problems they are trying to solve, and any friction they encounter.
At Capital One, a key principle guiding our AI/ML platform team is to focus on every aspect of the customer experience, even the parts we don't directly oversee. For example, in recent years we have taken many steps to ease data and access management pain points for our users, even though we rely on other enterprise platforms to solve those problems.
When you earn your users' trust and engagement, you can innovate and reimagine what's possible with new ideas. This dedication to our customers is the foundation for building a lasting, sustainable platform.
2. Establish a multi-tenant platform control plane
Multi-tenancy is critical to any enterprise platform, allowing multiple lines of business and distributed teams to use core platform capabilities such as compute, storage, inference services, and workflow orchestration in a shared but well-managed environment. It helps you solve core data access pain points, allows abstraction, supports multiple computing models, and simplifies the configuration and management of compute instances for core services, such as the large fleets of GPUs and central processing units (CPUs) that AI/ML workloads require.
By properly designing a multi-tenant platform control plane, you can integrate best-in-class open source and commercial software components and flexibly scale as the platform continues to evolve. At Capital One, we have developed a powerful platform control plane based on Kubernetes that scales to large computing clusters on AWS and is used by thousands of active AI/ML users across the company.
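To make the multi-tenancy idea concrete, here is a minimal sketch using the official Kubernetes Python client: each tenant team gets its own namespace, and a ResourceQuota caps its GPU, CPU and memory requests against the shared pool. The team name and quota values are illustrative assumptions, not a description of Capital One's actual configuration.

```python
# Minimal sketch: per-tenant isolation on a Kubernetes-based control plane.
# The tenant name and quota values below are illustrative assumptions.
from kubernetes import client, config

TENANT = "team-fraud-ml"  # hypothetical line-of-business tenant

def provision_tenant(core: client.CoreV1Api) -> None:
    # Each tenant gets its own namespace for workload isolation.
    core.create_namespace(
        client.V1Namespace(metadata=client.V1ObjectMeta(name=TENANT))
    )
    # A ResourceQuota caps what the tenant can request from the shared
    # GPU/CPU pool, so one team cannot starve the others.
    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="compute-quota", namespace=TENANT),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.nvidia.com/gpu": "8",  # cap on GPU requests
                "requests.cpu": "256",
                "requests.memory": "1Ti",
            }
        ),
    )
    core.create_namespaced_resource_quota(namespace=TENANT, body=quota)

if __name__ == "__main__":
    config.load_kube_config()  # use the current kubeconfig context
    provision_tenant(client.CoreV1Api())
```

In practice, a control plane layers scheduling, access control and billing on top of primitives like these, but namespaces plus quotas are the basic mechanism that keeps shared GPU capacity well-managed.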
We regularly experiment with and adopt best-in-class open source and commercial software components as plug-ins, and we develop our own proprietary functionality, giving us a competitive advantage. End users get access to the latest technology and enhanced self-service capabilities, allowing teams to build and deploy on our platform without having to seek support from our engineering team.
3. Embed automation and governance
When you build a new platform, it's critical to have mechanisms in place to collect logs and insights from models and features throughout their build, test and deployment lifecycle. Enterprises can automate core tasks such as lineage tracing, compliance with enterprise controls, observability, monitoring and detection across all layers of the platform. By standardizing and automating these tasks, you can shave weeks or even months off the development and deployment of new mission-critical models and AI use cases.
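As a simple illustration of what automated lineage capture could look like, the sketch below wraps a training step in a decorator that records who ran it, which data version it consumed and how long it took, then emits a structured log record that downstream observability and compliance tooling could ingest. The function and field names are hypothetical, not part of any specific platform.

```python
# Minimal sketch of automated lineage capture; field names are hypothetical.
import functools, getpass, hashlib, json, logging, time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("lineage")

def track_lineage(step_name: str):
    """Decorator that emits a structured lineage record for a pipeline step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            started = time.time()
            result = fn(*args, **kwargs)
            record = {
                "step": step_name,
                "user": getpass.getuser(),
                "data_version": kwargs.get("data_version", "unknown"),
                "duration_s": round(time.time() - started, 3),
                # Hash of the result's repr stands in for a real artifact digest.
                "artifact_digest": hashlib.sha256(repr(result).encode()).hexdigest()[:12],
            }
            log.info(json.dumps(record))  # picked up by centralized log collection
            return result
        return wrapper
    return decorator

@track_lineage("train_fraud_model")
def train(data_version: str = "unknown"):
    return {"model": "fraud-v1", "auc": 0.91}  # placeholder training step

if __name__ == "__main__":
    train(data_version="2024-12-01")
```

The point is not the specific fields but that lineage and observability are captured automatically by the platform, rather than left to each team to implement by hand.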
At Capital One, we've taken it a step further and established a marketplace for reusable components and software development kits (SDKs) with built-in observability and governance standards. These enable our employees to confidently find the reusable libraries, workflows, and user-contributed code they need to develop AI models and applications, knowing that the artifacts they build on our enterprise platforms are governed and managed behind the scenes. In fact, at this stage of our journey, we view this level of automation and standardization as a competitive advantage.
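To show what "built-in observability" might mean for a reusable component, here is a hypothetical SDK-style wrapper around a model client: every call is timed and logged automatically, so teams that adopt the component inherit standardized telemetry without writing it themselves. The class and method names are illustrative, not part of any real Capital One SDK.

```python
# Hypothetical SDK-style wrapper that bakes observability into every call.
import logging, time
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("governed-sdk")

class GovernedModelClient:
    """Wraps any predict function and emits latency/outcome metrics per call."""

    def __init__(self, model_name: str, predict_fn: Callable[[Any], Any]):
        self.model_name = model_name
        self._predict_fn = predict_fn

    def predict(self, payload: Any) -> Any:
        started = time.perf_counter()
        try:
            result = self._predict_fn(payload)
            status = "ok"
            return result
        except Exception:
            status = "error"
            raise
        finally:
            latency_ms = (time.perf_counter() - started) * 1000
            # Standardized metric line every consumer of the SDK gets for free.
            log.info("model=%s status=%s latency_ms=%.1f",
                     self.model_name, status, latency_ms)

# Usage: teams supply only their model logic; observability comes built in.
client = GovernedModelClient("fraud-v1", lambda payload: {"score": 0.07})
client.predict({"amount": 125.0})
```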
4. Invest in talent and effective business practices
Building the most advanced AI platforms requires world-class cross-functional teams. An effective AI platform team must be multidisciplinary and diverse, including data scientists, engineers, designers, product managers, and network and model risk experts. Each of these team members brings unique skills and experience and plays a key role in building and iterating on an AI platform that works for all users and scales over time.
At Capital One, we collaborate cross-functionally across the company when building and deploying AI platform capabilities. As we grow our organization and build our AI workforce, we created the Machine Learning Engineer position in 2021 and, more recently, the Artificial Intelligence Engineer position to recruit and retain the technical talent that helps us stay ahead of the curve.
Along the way, establishing and communicating a clear roadmap and change controls for platform users, and incorporating feedback loops into your planning and software delivery processes, are critical to ensuring your users stay informed, can prepare for what's to come, contribute along the way and understand the benefits.
Lay a future-proof foundation for your AI
Building or transforming an enterprise platform for the AI era is no easy task, but it will allow your business to gain greater agility and scalability. At Capital One, we've seen firsthand how these foundations support AI/ML at scale and continue to create value for our business and our more than 100 million customers.
By laying the right technology foundation, establishing governance practices from the start and investing in talent, you can enable users across the entire enterprise to leverage AI in a well-governed way.
Abhijit Bose is Senior Vice President, Head of Enterprise Artificial Intelligence and Machine Learning Platforms at Capital One.
VB Lab Insights content is created in partnership with companies that pay for their posts or have a business relationship with VentureBeat, and they are always clearly marked. For more information please contact