Lambda launches inference-as-a-service API | VentureBeat
December 13, 2024


Lambda is a 12-year-old San Francisco company known for providing on-demand graphics processing unit (GPU) services to machine learning researchers and to teams that build and train artificial intelligence models.

Today, the company is going a step further with the launch of the Lambda Inference API (application programming interface), which it claims is the lowest-cost service of its kind on the market. The API allows enterprises to deploy AI models and applications into production for end users without having to worry about procuring or maintaining compute.

This release complements Lambda’s existing focus on providing GPU clusters for training and fine-tuning machine learning models.

“Our platform is fully vertically integrated, which means we can deliver significant cost savings to end users compared to other vendors like OpenAI,” Robert Brooks, vice president of revenue at Lambda, said in a video interview with VentureBeat. “In addition, there are no rate limits to inhibit scaling, and you don’t need to talk to sales to get started.”

In fact, as Brooks told VentureBeat, developers can go to Lambda’s new inference API web page, generate an API key, and start using it in five minutes.

Lambda’s Inference API supports cutting-edge models such as Meta’s Llama 3.3 and 3.1, Nous’s Hermes-3, and Alibaba’s Qwen 2.5, making it one of the most accessible options for the machine learning community. The full list includes:

  • deepseek-coder-v2-lite-instruct
  • dracarys2-72b-instruct
  • hermes3-405b
  • hermes3-405b-fp8-128k
  • hermes3-70b
  • hermes3-8b
  • lfm-40b
  • llama3.1-405b-instruct-fp8
  • llama3.1-70b-instruct-fp8
  • llama3.1-8b-instruct
  • llama3.2-3b-instruct
  • llama3.1-nemotron-70b-instruct
  • llama3.3-70b

Pricing starts at $0.02 per million tokens for small models such as Llama-3.2-3B-Instruct and rises to $0.90 per million tokens for large state-of-the-art models such as Llama-3.1-405B-Instruct.
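To put those rates in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustrative estimate only: the function name `estimate_cost_usd` is hypothetical, and the sketch assumes a single flat rate per million tokens, since the article does not say whether input and output tokens are priced differently.

```python
# Per-million-token prices quoted in the article, in US dollars.
# Assumption: one flat rate per model, with no input/output split.
PRICE_PER_MILLION_USD = {
    "Llama-3.2-3B-Instruct": 0.02,
    "Llama-3.1-405B-Instruct": 0.90,
}


def estimate_cost_usd(model: str, tokens: int) -> float:
    """Return the estimated charge for `tokens` tokens on `model`."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION_USD[model] * tokens / 1_000_000


if __name__ == "__main__":
    # Example workload: one billion tokens on each model.
    for model in PRICE_PER_MILLION_USD:
        cost = estimate_cost_usd(model, 1_000_000_000)
        print(f"{model}: ${cost:,.2f}")
```

At these rates, a workload of one billion tokens would come to roughly $20 on the small model and roughly $900 on the large one.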

Lambda co-founder and CEO Stephen Balaban recently drew attention to the company’s per-token cost of serving AI models through inference compared with that of its rivals.

Additionally, unlike many other services, Lambda’s pay-as-you-go model ensures customers only pay for the tokens they use, eliminating the need for subscriptions or rate-limiting plans.

Closing the Artificial Intelligence Loop

Lambda has more than a decade of history leveraging GPU-based infrastructure to support advances in artificial intelligence.

From hardware solutions to training and fine-tuning capabilities, the company has become a reliable partner for businesses, research institutions and start-ups.

“You know, Lambda has been deploying GPUs to our user base for over a decade, so we have literally tens of thousands of Nvidia GPUs, some from older life cycles and some from newer ones, which allows us to still get the maximum utility from these AI chips at a lower cost for the broader machine learning community,” Brooks explained. “With the launch of Lambda Inference, we are closing the loop on the full AI development lifecycle.” The new API formalizes what many engineers are already doing on the Lambda platform, using it for inference, but now with a dedicated service to simplify deployment.

Brooks noted that its deep reserve of GPU resources is one of Lambda’s distinguishing features, and reiterated that “Lambda has deployed tens of thousands of GPUs over the past decade, allowing us to provide cost-effective solutions and maximum utility for both older and newer AI chips.”

This GPU advantage enables the platform to support scaling to trillions of tokens per month, providing flexibility to developers and enterprises alike.

Open and flexible

Lambda positions itself as a flexible alternative to the cloud giants, providing unrestricted access to high-performance inference.

“We want to give the machine learning community unlimited access to a rate-limited inference API. You can plug and play, read the documentation, and quickly scale to trillions of tokens,” Brooks explained.

The API supports a range of open source and proprietary models, including the popular instruction-tuned Llama models.

The company also hinted at expansion into multi-modal applications in the near future, including video and image generation.

“Initially, we are focusing on text-based LLMs, but soon we will expand to multimodal and video-text models,” Brooks said.

Privacy and security for developers and enterprises

The Lambda Inference API targets a wide range of users in media, entertainment, and software development, from startups to large enterprises.

These industries are increasingly using artificial intelligence to support applications such as text summarization, code generation and generative content creation.

“User data is not retained or shared on our platform. We act as a conduit to provide data to end users, ensuring privacy,” Brooks said.

As AI adoption continues to rise, Lambda’s new service is expected to attract the attention of enterprises looking for cost-effective solutions to deploy and maintain AI models. By removing common barriers like rate limits and high operating costs, Lambda hopes to help more organizations realize the full potential of artificial intelligence.

The Lambda Inference API is available now, with detailed pricing and documentation on Lambda’s website.


2024-12-12 19:19:14
