
Apple collaborates with NVIDIA to accelerate LLM text generation
In a blog post published today, Apple engineers shared new details about a collaboration with NVIDIA aimed at speeding up text generation with large language models.
Apple published and open-sourced its ReDrafter technique earlier this year. It represents a new approach to LLM text generation that is significantly faster and “achieves state-of-the-art performance.” It combines two techniques: beam search (exploring multiple candidate continuations) and dynamic tree attention (processing those candidates efficiently).
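To make the draft-then-verify idea behind this kind of acceleration concrete, here is a minimal toy sketch of speculative decoding in plain Python. It is not Apple's actual ReDrafter algorithm (which uses a recurrent draft model, beam search, and tree attention); the "draft" and "target" models below are stand-in strings invented purely for illustration. The point is the control flow: a cheap draft proposes several tokens at once, and the expensive model verifies them in a single pass, accepting the matching prefix and correcting the first mismatch.

```python
# Toy illustration of speculative (drafted) decoding.
# TARGET stands in for what the large model would generate;
# DRAFT stands in for a cheap draft model's (imperfect) predictions.
TARGET = "speculative decoding accepts drafted tokens that match"
DRAFT  = "speculative decoding accepts drafted tokens that mutch"

def draft_next(pos, k=4):
    """Cheap draft step: propose the next k tokens (here, characters)."""
    return list(DRAFT[pos:pos + k])

def verify(pos, proposed):
    """One pass of the large model: accept matching tokens,
    and on the first mismatch emit the target model's own token."""
    accepted = []
    for i, ch in enumerate(proposed):
        if TARGET[pos + i] == ch:
            accepted.append(ch)
        else:
            accepted.append(TARGET[pos + i])  # correction from the target model
            break
    return accepted

def generate():
    """Generate the full sequence, counting verification passes."""
    out, steps = [], 0
    while len(out) < len(TARGET):
        out += verify(len(out), draft_next(len(out)))
        steps += 1
    return "".join(out), steps
```

Because each verification pass can accept several drafted tokens at once, `generate()` produces exactly the target sequence in far fewer passes than there are tokens, which is where the speedup in tokens generated per second comes from.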
Building on those research results, Apple worked with NVIDIA to bring ReDrafter into production. As part of the collaboration, ReDrafter was integrated into NVIDIA TensorRT-LLM, a tool that speeds up LLM inference on NVIDIA GPUs.
Here is how the result is described:
To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improves TensorRT-LLM’s ability to accommodate complex models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter’s accelerated token generation for their production LLM applications with TensorRT-LLM.
When benchmarking a production model with tens of billions of parameters on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we saw a 2.7x speedup in tokens generated per second for greedy decoding. These benchmark results show that this technology can significantly reduce the latency users experience, while using fewer GPUs and consuming less power.
“LLMs are increasingly being used to power production applications, where improving inference efficiency can both reduce computational costs and lower latency for users,” Apple’s machine learning researchers concluded. “With ReDrafter’s novel approach to speculative decoding integrated into the NVIDIA TensorRT-LLM framework, developers can now benefit from faster token generation on NVIDIA GPUs for their production LLM applications.”
You can learn more about this work in the blog posts on Apple’s and NVIDIA’s websites.
2024-12-18 21:33:45