F5 has announced expanded capabilities in its ongoing collaboration with NVIDIA, aiming to help enterprises and service providers improve the efficiency and economics of AI inference as adoption accelerates.

Kunal Anand, chief product officer at F5, said: “Together with NVIDIA, we are enabling AI factories to treat token production as a measurable business metric. BIG-IP Next for Kubernetes provides the intelligence and governance required to increase GPU yield, reduce cost per token, and scale shared AI platforms confidently.”
Boosting AI inference efficiency and token economics
The integration combines F5’s BIG-IP Next for Kubernetes with NVIDIA’s BlueField-3 data processing units (DPUs). It aims to increase token throughput, reduce latency, and improve GPU utilisation.
In AI systems, “tokens”, the units of data a model generates, such as words or parts of words, have emerged as a key performance measure, shaping user experience and determining the return on costly GPU infrastructure. As a result, companies are increasingly focused on “token economics”: throughput, time-to-first-token, and cost per token.
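To make those metrics concrete, here is a minimal sketch (not F5 or NVIDIA code; all figures are hypothetical) of how throughput and cost per token relate for a GPU billed by the hour:

```python
# Illustrative token-economics arithmetic. The GPU price and token counts
# below are invented for the example, not taken from the article.

def tokens_per_second(total_tokens: int, wall_seconds: float) -> float:
    """Throughput: tokens generated per second of wall-clock time."""
    return total_tokens / wall_seconds

def cost_per_token(gpu_hourly_cost: float, throughput_tps: float) -> float:
    """Cost per generated token for a GPU billed by the hour."""
    return gpu_hourly_cost / (throughput_tps * 3600)

# Hypothetical example: a $4/hour GPU sustaining 60,000 tokens per minute.
tps = tokens_per_second(total_tokens=60_000, wall_seconds=60)
print(f"throughput: {tps:.0f} tok/s")
print(f"cost per token: ${cost_per_token(4.0, tps):.8f}")
```

The relationship explains why a throughput gain translates directly into lower cost per token: the hourly GPU bill is fixed, so every extra token generated in that hour dilutes it.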
F5 claims that the enhanced platform uses real-time telemetry, including NVIDIA NIM statistics and GPU signals, to route workloads more efficiently before execution. The approach helps reduce delays and improve overall system performance by matching AI tasks to the most suitable compute resources.
Testing conducted by The Tolly Group found that the combined solution increased token throughput by up to 40%, reduced time to first token by 61%, and reduced request latency by 34%. It also freed GPU capacity for ongoing inference workloads without requiring any modifications to existing AI models.

“NVIDIA’s accelerated computing infrastructure, coupled with F5’s AI-aware Application Delivery and Security Platform, unlocks superior AI factory tokenomics—delivering scalable and cost-effective inference without making any changes to the models,” said Kevin Deierling, SVP, Networking, NVIDIA. “Together, F5 and NVIDIA are empowering enterprises to scale AI factory inference efficiently and economically.”
