<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Red Hat and Tesla engineers tackled a real production problem together.]]></title><description><![CDATA[<p>Red Hat and Tesla engineers tackled a real production problem together.</p><p>3x output tokens/sec, 2x faster TTFT on Llama 3.1 70B with KServe + llm-d + vLLM. Fixes pushed upstream to KServe along the way.</p><p>This is what open source looks like. <img src="https://board.circlewithadot.net/assets/plugins/nodebb-plugin-emoji/emoji/android/1f91d.png?v=28325c671da" class="not-responsive emoji emoji-android emoji--handshake" style="height:23px;width:auto;vertical-align:middle" title="🤝" alt="🤝" /> <img src="https://board.circlewithadot.net/assets/plugins/nodebb-plugin-emoji/emoji/android/1f680.png?v=28325c671da" class="not-responsive emoji emoji-android emoji--rocket" style="height:23px;width:auto;vertical-align:middle" title="🚀" alt="🚀" /></p><p></p><div class="card col-md-9 col-lg-6 position-relative link-preview p-0">
<div class="card-body">
<h5 class="card-title">
<a href="https://llm-d.ai/blog/production-grade-llm-inference-at-scale-kserve-llm-d-vllm">
Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM | llm-d
</a>
</h5>
<p class="card-text line-clamp-3">How migrating from a simple vLLM deployment to a robust MLOps platform utilizing KServe, llm-d's intelligent routing, and vLLM solved significant scaling and operational challenges in LLM deployment through deep customization and prefix-cache aware routing to maximize GPU utilization.</p>
</div>
<a href="https://llm-d.ai/blog/production-grade-llm-inference-at-scale-kserve-llm-d-vllm" class="card-footer text-body-secondary small d-flex gap-2 align-items-center lh-2">



<img src="https://llm-d.ai/img/llm-d-favicon.png" alt="favicon" class="not-responsive overflow-hiddden" style="max-width:21px;max-height:21px" />



<p class="d-inline-block text-truncate mb-0">llm-d <span class="text-secondary">(llm-d.ai)</span></p>
</a>
<p><a href="https://fosstodon.org/tags/RedHat" rel="tag">#<span>RedHat</span></a> <a href="https://fosstodon.org/tags/Tesla" rel="tag">#<span>Tesla</span></a> <a href="https://fosstodon.org/tags/RedHatAI" rel="tag">#<span>RedHatAI</span></a> <a href="https://fosstodon.org/tags/vLLM" rel="tag">#<span>vLLM</span></a> <a href="https://fosstodon.org/tags/Pytorch" rel="tag">#<span>Pytorch</span></a> <a href="https://fosstodon.org/tags/Kubernetes" rel="tag">#<span>Kubernetes</span></a> <a href="https://fosstodon.org/tags/OpenShift" rel="tag">#<span>OpenShift</span></a> <a href="https://fosstodon.org/tags/KServe" rel="tag">#<span>KServe</span></a> <a href="https://fosstodon.org/tags/llmd" rel="tag">#<span>llmd</span></a> <a href="https://fosstodon.org/tags/Llama" rel="tag">#<span>Llama</span></a> <a href="https://fosstodon.org/tags/OpenSource" rel="tag">#<span>OpenSource</span></a></p>]]></description><link>https://board.circlewithadot.net/topic/d42200d0-7200-4087-b5bd-e0e76918a0e9/red-hat-and-tesla-engineers-tackled-a-real-production-problem-together.</link><generator>RSS for Node</generator><lastBuildDate>Thu, 14 May 2026 22:34:32 GMT</lastBuildDate><atom:link href="https://board.circlewithadot.net/topic/d42200d0-7200-4087-b5bd-e0e76918a0e9.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 23 Apr 2026 19:23:54 GMT</pubDate><ttl>60</ttl></channel></rss>