CIRCLE WITH A DOT

Red Hat and Tesla engineers tackled a real production problem together.

maxamillion@fosstodon.org

Red Hat and Tesla engineers tackled a real production problem together.
3x output tokens/sec and 2x faster TTFT (time to first token) on Llama 3.1 70B with KServe + llm-d + vLLM. Fixes were pushed upstream to KServe along the way.
This is what open source looks like.
https://llm-d.ai/blog/production-grade-llm-inference-at-scale-kserve-llm-d-vllm
#RedHat #Tesla #RedHatAI #vLLM #Pytorch #Kubernetes #OpenShift #KServe #llmd #Llama #OpenSource