Together AI Unveils Inference Engine 2.0 With Turbo And Lite Endpoints

Terrill Dicki Jul 18, 2024 18:41

Together AI launches Inference Engine 2.0, offering Turbo and Lite endpoints for enhanced performance, quality, and cost-efficiency.

Together AI has announced the release of its new Inference Engine 2.0, which includes the highly anticipated Turbo and Lite endpoints. This new inference stack is designed to provide significantly faster decoding throughput and superior performance compared to existing solutions.

Performance Enhancements

According to together.ai, the Together Inference Engine 2.0 offers decoding throughput that is four times faster than the open-source vLLM and outperforms commercial solutions such as Amazon Bedrock, Azure AI, Fireworks, and Octo AI by 1.3x to 2.5x. The engine achieves over 400 tokens per second on Meta Llama 3 8B, thanks to advancements in FlashAttention-3, faster GEMM & MHA kernels, quality-preserving quantization, and speculative decoding.

New Turbo and Lite Endpoints

Together AI has introduced new Turbo and Lite endpoints, starting with Meta Llama 3. These endpoints aim to balance performance, quality, and cost, allowing enterprises to avoid compromises. Together Turbo closely matches the quality of full-precision FP16 models, while Together Lite offers the most cost-efficient and scalable Llama 3 models available.

Together Turbo endpoints provide fast FP8 performance while maintaining quality, matching FP16 reference models and outperforming other FP8 solutions on AlpacaEval 2.0. These Turbo endpoints are priced at $0.88 per million tokens for the 70B model and $0.18 per million tokens for the 8B model, making them significantly more affordable than GPT-4o.
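A back-of-envelope check of those quoted per-million-token rates is straightforward. The sketch below treats each price as a single blended rate; real bills depend on the provider's input/output token split, which this ignores.

```python
# Quoted Turbo prices, in dollars per million tokens (from the article).
PRICE_PER_M = {
    "llama-3-70b-turbo": 0.88,
    "llama-3-8b-turbo": 0.18,
}

def cost(model, tokens):
    """Dollar cost of processing `tokens` tokens at the quoted rate."""
    return PRICE_PER_M[model] / 1_000_000 * tokens

# Example: a 50M-token workload on each Turbo endpoint.
print(f"70B Turbo: ${cost('llama-3-70b-turbo', 50_000_000):.2f}")  # $44.00
print(f" 8B Turbo: ${cost('llama-3-8b-turbo', 50_000_000):.2f}")   # $9.00
```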

Together Lite endpoints use INT4 quantization to deliver high-quality AI models at a lower cost: Llama 3 8B Lite is priced at $0.10 per million tokens, roughly one-sixth the price of GPT-4o-mini.
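For context on the INT4 quantization the Lite endpoints rely on, the minimal sketch below shows plain symmetric INT4 quantization (values mapped to the integer range [-8, 7] with a per-tensor scale). It illustrates the general technique only, not Together AI's quality-preserving scheme.

```python
# Symmetric per-tensor INT4 quantization: each weight is stored as a
# 4-bit integer plus one shared floating-point scale.

def quantize_int4(weights):
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive INT4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.7, -0.02]
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# Each weight now occupies 4 bits instead of 16, trading a small
# rounding error for ~4x lower memory and bandwidth cost.
```

The memory savings are what make INT4 endpoints cheap to serve at scale; the engineering challenge, which the article says Together addresses with quality-preserving techniques, is keeping the rounding error from degrading model output.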

Adoption and Endorsements

Over 100,000 developers and companies, including Zomato, DuckDuckGo, and The Washington Post, are already using the Together Inference Engine for their generative AI applications. Rinshul Chandra, COO of Food Delivery at Zomato, praised the engine for its high quality, speed, and accuracy.

Technical Innovations

The Together Inference Engine 2.0 incorporates several technical advancements, including FlashAttention-3, custom-built speculators, and quality-preserving quantization techniques. These innovations contribute to the engine's superior performance and cost-efficiency.

Future Outlook

Together AI plans to continue pushing the boundaries of AI acceleration. The company aims to extend support for new models, techniques, and kernels, ensuring the Together Inference Engine remains at the forefront of AI technology.

The Turbo and Lite endpoints for Llama 3 models are available starting today, with plans to expand to other models soon. For more information, visit the Together AI pricing page.

Image source: Shutterstock