Blogs
2026
- Apr 16, 2026
- Apr 2, 2026
- Apr 1, 2026
- Mar 16, 2026
- Mar 9, 2026
2025
- Dec 24, 2025
- Nov 26, 2025
- Oct 31, 2025
- Sep 30, 2025
- Sep 23, 2025
- Sep 15, 2025
- Aug 8, 2025
- May 1, 2025
- Apr 30, 2025
- Feb 12, 2025
2024
- Dec 3, 2024
- Nov 24, 2024
- Nov 1, 2024
- Sep 1, 2024
- Jul 2, 2024
- Jun 1, 2024
- Jan 16, 2024
2023
- Jan 1, 2023
2022
- Jun 1, 2022
Conference Talks
A Cross-Industry Benchmarking Tutorial for Distributed LLM Inference on Kubernetes
Samuel Monson (Red Hat), Ganesh Kudleppanavar (NVIDIA), Jason Kramberger (Google), Jing Chen (IBM Research)
Routing Stateful AI Workloads in Kubernetes
Maroon Ayoub (IBM Research), Michey Mehta (Red Hat)
Learn How to Run an LLM Inference Performance Benchmark on NVIDIA GPUs
Samuel Monson (Red Hat), Ashish Kamra (Red Hat)
Multi-Node Finetuning LLMs on Kubernetes: A Practitioner's Guide
Ashish Kamra (Red Hat), Boaz Ben Shabat (Red Hat)
Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kubernetes
David Gray (Red Hat)
Efficiently Deploying and Benchmarking LLMs in Kubernetes
Nikhil Palaskar (Red Hat)
Publications
llm-tuna: Hyperparameter Optimization for LLM Inference
An open-source framework that automates vLLM inference hyperparameter optimization using Bayesian search via Optuna, achieving up to 32.9% throughput improvement on mixture-of-experts models.
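The approach is easy to sketch. The following is a minimal, hypothetical illustration of the idea rather than the paper's actual code: Optuna proposes candidate vLLM engine settings, a stand-in `measure_throughput` function simulates the expensive step of launching and benchmarking a server with those settings, and the study maximizes the measured throughput. The parameter names mirror real vLLM flags, but the search ranges, the scoring function, and the trial budget are all assumptions.

```python
import optuna

def measure_throughput(config: dict) -> float:
    # Stand-in for the expensive step: in the real framework this would
    # launch a vLLM server with `config` and benchmark it (e.g. with
    # GuideLLM), returning tokens/sec. A synthetic score keeps the
    # sketch self-contained and runnable.
    return (
        config["max_num_batched_tokens"] / 1000
        + config["max_num_seqs"] / 100
        - abs(config["gpu_memory_utilization"] - 0.90) * 50
    )

def objective(trial: optuna.Trial) -> float:
    # Candidate engine settings; names follow real vLLM flags,
    # ranges are illustrative assumptions.
    config = {
        "max_num_seqs": trial.suggest_int("max_num_seqs", 64, 1024, log=True),
        "max_num_batched_tokens": trial.suggest_int(
            "max_num_batched_tokens", 2048, 32768, log=True
        ),
        "gpu_memory_utilization": trial.suggest_float(
            "gpu_memory_utilization", 0.80, 0.95
        ),
    }
    return measure_throughput(config)

# Optuna's default TPE sampler provides the Bayesian search behavior.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("best:", study.best_params, study.best_value)
```

Each real trial is far more expensive than a synthetic one, which is why a sample-efficient Bayesian sampler is a better fit here than grid or random search.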
Featured Projects
auto-tuning-vllm
Auto-tuning for vLLM using Optuna and GuideLLM to find optimal hyperparameters and get the best performance out of your LLM deployment.
Creator
performance-dashboard
Interactive performance analysis dashboard for Red Hat AI Inference Server benchmarks across different accelerators, versions, and configurations.
Creator
llm-d-bench
Benchmarking automation for llm-d distributed LLM inference on Kubernetes.
Creator
GuideLLM
SLO-aware benchmarking and evaluation platform for LLM deployments that simulates production workloads against OpenAI-compatible and vLLM-native servers (see the sketch after the project list).
Maintainer
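As a rough illustration of what SLO-aware measurement involves, the sketch below streams one completion from an OpenAI-compatible endpoint, records time-to-first-token (TTFT) and end-to-end latency, and checks both against example thresholds. This is my own simplification, not GuideLLM's implementation; the endpoint URL, model name, and SLO values are assumptions.

```python
import time
import requests

# Assumed local OpenAI-compatible server (e.g. vLLM); adjust as needed.
BASE_URL = "http://localhost:8000/v1"
MODEL = "my-model"     # hypothetical model name
TTFT_SLO_S = 0.5       # example SLO: first token within 500 ms
E2E_SLO_S = 5.0        # example SLO: full response within 5 s

def measure_request(prompt: str) -> dict:
    """Stream one completion, recording TTFT and end-to-end latency."""
    start = time.perf_counter()
    ttft = None
    with requests.post(
        f"{BASE_URL}/completions",
        json={"model": MODEL, "prompt": prompt,
              "max_tokens": 128, "stream": True},
        stream=True,
        timeout=60,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # OpenAI-compatible servers stream server-sent events of the
            # form `data: {...}`, terminated by `data: [DONE]`.
            if not line or not line.startswith(b"data: "):
                continue
            if line[len(b"data: "):] == b"[DONE]":
                break
            if ttft is None:
                ttft = time.perf_counter() - start
    e2e = time.perf_counter() - start
    return {
        "ttft_s": ttft,
        "e2e_s": e2e,
        "meets_slo": ttft is not None
                     and ttft <= TTFT_SLO_S and e2e <= E2E_SLO_S,
    }

if __name__ == "__main__":
    print(measure_request("Explain KV-cache reuse in one sentence."))
```

A real benchmark run issues many such requests concurrently at controlled arrival rates and reports percentile TTFT and latency against the SLOs, rather than judging a single sample.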