Red Hat and Tesla engineers tackled a real production problem together.
The result: 3x output tokens/sec and 2x faster time to first token (TTFT) on Llama 3.1 70B with KServe + llm-d + vLLM. Fixes were pushed upstream to KServe along the way.
This is what open source looks like.

https://llm-d.ai/blog/production-grade-llm-inference-at-scale-kserve-llm-d-vllm
#RedHat #Tesla #RedHatAI #vLLM #Pytorch #Kubernetes #OpenShift #KServe #llmd #Llama #OpenSource