<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Red Hat and Tesla engineers tackled a real production problem together.]]></title><description><![CDATA[<p>Red Hat and Tesla engineers tackled a real production problem together.</p><p>3x output tokens/sec, 2x faster TTFT on Llama 3.1 70B with KServe + llm-d + vLLM. Fixes pushed upstream to KServe along the way.</p><p>This is what open source looks like. <img src="https://board.circlewithadot.net/assets/plugins/nodebb-plugin-emoji/emoji/android/1f91d.png?v=28325c671da" class="not-responsive emoji emoji-android emoji--handshake" style="height:23px;width:auto;vertical-align:middle" title="🤝" alt="🤝" /> <img src="https://board.circlewithadot.net/assets/plugins/nodebb-plugin-emoji/emoji/android/1f680.png?v=28325c671da" class="not-responsive emoji emoji-android emoji--rocket" style="height:23px;width:auto;vertical-align:middle" title="🚀" alt="🚀" /></p><p></p><div class="card col-md-9 col-lg-6 position-relative link-preview p-0">
<div class="card-body">
<h5 class="card-title">
<a href="https://llm-d.ai/blog/production-grade-llm-inference-at-scale-kserve-llm-d-vllm">
Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM | llm-d
</a>
</h5>
<p class="card-text line-clamp-3">How migrating from a simple vLLM deployment to a robust MLOps platform utilizing KServe, llm-d's intelligent routing, and vLLM solved significant scaling and operational challenges in LLM deployment through deep customization and prefix-cache aware routing to maximize GPU utilization.</p>
</div>
<a href="https://llm-d.ai/blog/production-grade-llm-inference-at-scale-kserve-llm-d-vllm" class="card-footer text-body-secondary small d-flex gap-2 align-items-center lh-2">



<img src="https://llm-d.ai/img/llm-d-favicon.png" alt="favicon" class="not-responsive overflow-hiddden" style="max-width:21px;max-height:21px" />



<p class="d-inline-block text-truncate mb-0">llm-d <span class="text-secondary">(llm-d.ai)</span></p>
</a>
<p><a href="https://fosstodon.org/tags/RedHat" rel="tag">#<span>RedHat</span></a> <a href="https://fosstodon.org/tags/Tesla" rel="tag">#<span>Tesla</span></a> <a href="https://fosstodon.org/tags/RedHatAI" rel="tag">#<span>RedHatAI</span></a> <a href="https://fosstodon.org/tags/vLLM" rel="tag">#<span>vLLM</span></a> <a href="https://fosstodon.org/tags/Pytorch" rel="tag">#<span>Pytorch</span></a> <a href="https://fosstodon.org/tags/Kubernetes" rel="tag">#<span>Kubernetes</span></a> <a href="https://fosstodon.org/tags/OpenShift" rel="tag">#<span>OpenShift</span></a> <a href="https://fosstodon.org/tags/KServe" rel="tag">#<span>KServe</span></a> <a href="https://fosstodon.org/tags/llmd" rel="tag">#<span>llmd</span></a> <a href="https://fosstodon.org/tags/Llama" rel="tag">#<span>Llama</span></a> <a href="https://fosstodon.org/tags/OpenSource" rel="tag">#<span>OpenSource</span></a></p>]]></description><link>https://board.circlewithadot.net/topic/d42200d0-7200-4087-b5bd-e0e76918a0e9/red-hat-and-tesla-engineers-tackled-a-real-production-problem-together.</link><generator>RSS for Node</generator><lastBuildDate>Thu, 14 May 2026 22:34:32 GMT</lastBuildDate><atom:link href="https://board.circlewithadot.net/topic/d42200d0-7200-4087-b5bd-e0e76918a0e9.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 23 Apr 2026 19:23:54 GMT</pubDate><ttl>60</ttl></channel></rss>