LMCache Lab (@lmcache) / X

LMCache Lab

239 posts

@lmcache

🧪 Open-Source Team that maintains LMCache and Production Stack 🤖 Democratizing AI by providing efficient LLM serving for ALL

GitHub, Online

lmcache.ai

Joined September 2024

1,781 Followers

LMCache Lab
@lmcache
Sep 17, 2024
🚀Meet LMCache – Your secret weapon for fast and cost-efficient LLM inference! ⚡With 7x faster access to 100× more KV caches, LMCache accelerates #vLLM for faster multi-turn conversations and RAG. Blog: lmcache.github.io/2024-09-17-rel… Github: github.com/LMCache/LMCache #LLM #LMCache #RAG
11K
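To make the multi-turn and RAG reuse idea concrete, here is a minimal sketch of prefix KV reuse. It assumes the cache splits tokens into fixed-size chunks and keys each chunk by a hash of the whole prefix up to that chunk, so a new turn that resends the conversation history can skip recomputing the matching chunks. `CHUNK_SIZE`, `chunk_keys`, and `PrefixKVCache` are hypothetical names, and the stored values are placeholders rather than real KV tensors. This is not LMCache's API.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical; production caches use far larger chunks


def chunk_keys(tokens: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """One key per full chunk; each key covers the entire prefix up to that chunk."""
    keys: List[str] = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk_size  # ignore a trailing partial chunk
    for start in range(0, full, chunk_size):
        chunk = tokens[start:start + chunk_size]
        h.update((",".join(map(str, chunk)) + ";").encode())
        keys.append(h.copy().hexdigest())  # snapshot of the running prefix hash
    return keys


class PrefixKVCache:
    """Toy prefix cache: values are placeholders for per-chunk KV tensors."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def insert(self, tokens: List[int]) -> None:
        for key in chunk_keys(tokens):
            self.store.setdefault(key, "kv:" + key[:8])

    def lookup(self, tokens: List[int]) -> int:
        """Number of leading tokens whose KV can be reused instead of recomputed."""
        hit = 0
        for i, key in enumerate(chunk_keys(tokens)):
            if key not in self.store:
                break
            hit = (i + 1) * CHUNK_SIZE
        return hit


cache = PrefixKVCache()
turn1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cache.insert(turn1)            # first turn: full chunks computed and stored
turn2 = turn1 + [10, 11, 12]   # next turn resends the history plus new tokens
print(cache.lookup(turn2))     # → 8
```

Chaining the hash over the prefix means a chunk only matches when everything before it also matches. This mirrors how a KV entry depends on all earlier tokens.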
LMCache Lab
@lmcache
Aug 15, 2025
8 KV-Cache Systems You Can’t Afford to Miss in 2025. By 2025, the KV cache has evolved from a “nice-to-have” optimization into a critical layer for high-performance large language model (LLM) serving. From GPU-resident paging tricks to persistent, cross-node cache sharing, the
3.7K
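The "GPU-resident paging tricks" mentioned above refer to storing each sequence's KV cache in fixed-size blocks that need not be contiguous, tracked through a per-sequence block table. The toy allocator below is in the spirit of vLLM's PagedAttention, but it is not vLLM's code. `PagedKVAllocator` and its methods are hypothetical, and the allocator only manages block ids, not actual memory.

```python
from typing import Dict, List


class PagedKVAllocator:
    """Toy block table: each sequence's KV lives in fixed-size, non-contiguous blocks."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free: List[int] = list(range(num_blocks))  # physical block ids
        self.tables: Dict[str, List[int]] = {}           # seq id -> physical blocks
        self.lengths: Dict[str, int] = {}                # seq id -> tokens stored

    def append_tokens(self, seq: str, n: int) -> None:
        table = self.tables.setdefault(seq, [])
        for _ in range(n):
            length = self.lengths.get(seq, 0)
            if length % self.block_size == 0:  # no block yet, or last block is full
                if not self.free:
                    raise MemoryError("out of KV blocks")
                table.append(self.free.pop(0))
            self.lengths[seq] = length + 1

    def release(self, seq: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)


alloc = PagedKVAllocator(num_blocks=8, block_size=4)
alloc.append_tokens("a", 6)   # 6 tokens -> 2 blocks
alloc.append_tokens("b", 3)   # 3 tokens -> 1 block
alloc.append_tokens("a", 3)   # grows to 9 tokens -> needs a 3rd block
print(alloc.tables)           # → {'a': [0, 1, 3], 'b': [2]}
alloc.release("a")
print(len(alloc.free))        # → 7
```

Sequence "a" ends up in blocks 0, 1, and 3, which are not adjacent. Memory is wasted only in each sequence's last, partly filled block.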
LMCache Lab
@lmcache
Jul 21, 2025
Everyone is focused on faster LLM inference engines. But the bigger gains may lie beyond the engine. 🚀 The real frontier could be the orchestration layer above it. Replicating engines with Kubernetes is hitting a wall. We need stateful, LLM-native
4.2K
LMCache Lab
@lmcache
Apr 9, 2025
1K Stars ⭐ for vLLM Production Stack! 🤝 We thank every contributor and user who has supported our journey in building an easy-to-use and high-performance serving stack for vLLM! We're thrilled to have reached this milestone. 😬 Been among the very
4.8K
LMCache Lab
@lmcache
Mar 2, 2025
🚀 We're thrilled to announce vLLM Production Stack, an open-source, enterprise-grade LLM inference solution that is now an official first-party ecosystem project under vLLM! Why does this matter? A handful of companies focus on LLM training, but millions of apps and businesses
7.4K
LMCache Lab
@lmcache
Apr 11, 2025
🚀 LMCache Powers Up vLLM V1: P/D Disaggregation & NIXL Support! vLLM V1 revolutionized LLM serving, but lacked a dedicated KV cache interface for advanced optimizations... until NOW! ⚡ LMCache Lab is thrilled to announce two major updates enhancing vLLM V1's
2.1K
LMCache Lab
@lmcache
Jan 22, 2025
🔥Meet the vLLM Official Production Stack🔥 -⚡️ 3x higher throughput & 3x faster response! -🔧 Easy k8s deployment with helm chart! -📈 Observability dashboard! And it’s open-source under vllm-project! Code: github.com/vllm-project/p… Blog: blog.lmcache.ai/2025-01-21-sta… #LLM #vLLM #k8s
5.4K
LMCache Lab
@lmcache
Jul 8, 2025
🚨 LMCache now turbocharges multimodal models in vLLM! By caching image-token KV pairs, repeated images now get ~100% cache hit rate — cutting latency from 18s to ~1s. Works out of the box. Check the blog: blog.lmcache.ai/2025-07-03-mul… Try it 👉 github.com/LMCache/LMCache #vLLM #MLLM
1.7K
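The multimodal post says that caching image-token KV pairs lets repeated images hit the cache. A minimal sketch of that idea keys the cache on a hash of the raw image bytes, so the same image sent again, even under a different file name or URL, reuses the stored entry. `ImageKVCache` and `fake_vision_prefill` are hypothetical stand-ins. The real system caches KV tensors produced by the model, and this sketch makes no claim about how LMCache computes its keys.

```python
import hashlib
from typing import Callable, Dict


class ImageKVCache:
    """Hypothetical cache mapping image content hashes to precomputed KV entries."""

    def __init__(self, compute_kv: Callable[[bytes], str]) -> None:
        self.compute_kv = compute_kv
        self.store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(image_bytes: bytes) -> str:
        # Identical bytes give an identical key, regardless of file name or URL.
        return hashlib.sha256(image_bytes).hexdigest()

    def get(self, image_bytes: bytes) -> str:
        k = self.key(image_bytes)
        if k in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[k] = self.compute_kv(image_bytes)  # expensive path, run once
        return self.store[k]


def fake_vision_prefill(image_bytes: bytes) -> str:
    # Stand-in for running the vision encoder and prefill over the image tokens.
    return f"kv({len(image_bytes)} bytes)"


cache = ImageKVCache(fake_vision_prefill)
img = b"\x89PNG pretend image data"
for _ in range(5):
    cache.get(img)
print(cache.hits, cache.misses)  # → 4 1
```

Only the first request pays for the image prefill. As the same image recurs, the hit rate approaches 100%.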
LMCache Lab
@lmcache
Aug 3, 2025
🚀 Big news from LMCache Lab! 📝 3 papers accepted at SOSP ’25 & NSDI ’26, pushing the frontier of LLM-inference efficiency: 1️⃣ Cross-agent KV-cache sharing (NSDI) 🔗 arxiv.org/abs/2411.02820 2️⃣ Custom design for LLM prefillers (SOSP) 🔗 arxiv.org/abs/2505.07203 3️⃣
1.3K
LMCache Lab
@lmcache
Aug 6, 2025
LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to vanilla vLLM. Unleash the full power of gpt-oss with vLLM + LMCache. Full deployment tutorial here: blog.lmcache.ai/2025-08-05-gpt… #LMCache #vLLM #OpenAI #LLM
1.8K
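The gpt-oss percentages can be checked directly from the reported timings. Relative change is (after - before) / before, and speedup is before / after. The snippet below uses only the numbers in the post. The finish-time change works out to about -50.76%, which the post reports as -50.7%.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the metric went down."""
    return (after - before) / before * 100.0


ttft = pct_change(1.20, 0.39)      # time to first token, seconds
finish = pct_change(15.70, 7.73)   # end-to-end finish time, seconds

print(f"TTFT: {ttft:.1f}%  ({1.20 / 0.39:.2f}x faster)")        # → TTFT: -67.5%  (3.08x faster)
print(f"Finish: {finish:.2f}%  ({15.70 / 7.73:.2f}x faster)")   # → Finish: -50.76%  (2.03x faster)
```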
LMCache Lab
@lmcache
Jul 22, 2025
Want to create your own LLM Inference Endpoint on Any Cloud in seconds? We're announcing the alpha release of LMIgnite, the one-click high-performance inference stack built for speed and scale. Powered by LMCache, vLLM, and vLLM Production Stack. 🤖 Join the alpha and
1.3K
LMCache Lab
@lmcache
Jul 29, 2025
You might know LMCache Lab for our KV cache optimizations that make LLM prefilling a breeze. But that’s not all! We’re now focused on speeding up decoding too—so your LLM agents can generate new content even faster. In other words: you can save on your LLM serving bills by
1.1K
LMCache Lab
@lmcache
Aug 8, 2025
CacheGen (arxiv.org/abs/2310.07240) lets you store KV caches on disk or in AWS S3 and load them much faster than recomputing them! Modern LLMs use long contexts, but reprocessing them every time is slow and resource-intensive. While engines like vLLM (and LMCache) can cache contexts in
1.3K
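The load-instead-of-recompute pattern from the CacheGen post can be sketched as a disk-backed store keyed by a hash of the context tokens. On a miss, the sketch runs prefill once and persists the result. On later requests, it reads the result back. CacheGen itself also compresses the KV cache before storing and streaming it. This sketch skips compression and S3 and simply pickles raw values to a local directory. `DiskKVStore`, `prefill`, and `get_kv` are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile
from typing import List, Optional

KV = List[List[float]]  # hypothetical stand-in for per-layer K/V tensors
calls = {"prefill": 0}


def prefill(tokens: List[int]) -> KV:
    """Stand-in for the expensive forward pass over a long context."""
    calls["prefill"] += 1
    return [[t * 0.5 for t in tokens]]


class DiskKVStore:
    """Persists KV caches as files named by a hash of the context tokens."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, tokens: List[int]) -> str:
        digest = hashlib.sha256(repr(tokens).encode()).hexdigest()
        return os.path.join(self.root, digest + ".kv")

    def save(self, tokens: List[int], kv: KV) -> None:
        with open(self._path(tokens), "wb") as f:
            pickle.dump(kv, f)

    def load(self, tokens: List[int]) -> Optional[KV]:
        path = self._path(tokens)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:  # only unpickle files this process wrote
            return pickle.load(f)


def get_kv(store: DiskKVStore, tokens: List[int]) -> KV:
    kv = store.load(tokens)
    if kv is None:  # miss: recompute once, then persist for later requests
        kv = prefill(tokens)
        store.save(tokens, kv)
    return kv


with tempfile.TemporaryDirectory() as d:
    store = DiskKVStore(d)
    ctx = list(range(8))
    first = get_kv(store, ctx)   # computed and written to disk
    second = get_kv(store, ctx)  # loaded from disk, no prefill
    print(first == second, calls["prefill"])  # → True 1
```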
LMCache Lab
@lmcache
Apr 27, 2025
Amazing tool! Absolutely a game-changer for understanding open-source projects! @cognition_labs @silasalberti Find out more about LMCache and vLLM Production Stack on DeepWiki: 🚀 LMCache: deepwiki.com/LMCache/LMCache 🚀 vLLM Production Stack: deepwiki.com/vllm-project/p… #DeepWiki
Silas Alberti
@silasalberti
Apr 25, 2025
we built DeepWiki, a free encyclopedia of all GitHub repos some numbers: - 30k repos already indexed - processed 4 billion+ lines of code - the indexing alone cost $300k+ in compute spend
2K