LMCache Lab
239 posts
LMCache Lab
@lmcache
🧪 Open-Source Team that maintains LMCache and Production Stack 🤖 Democratizing AI by providing efficient LLM serving for ALL
Github, Online
lmcache.ai
Joined September 2024
50
Following
1,781
Followers
  • LMCache Lab
    @lmcache
    Sep 17, 2024
    🚀Meet LMCache – Your secret weapon for fast and cost-efficient LLM inference! ⚡With 7x faster access to 100× more KV caches, LMCache accelerates #vLLM for faster multi-turn conversations and RAG. Blog: lmcache.github.io/2024-09-17-rel… Github: github.com/LMCache/LMCache #LLM #LMCache #RAG
    11K
  • LMCache Lab
    @lmcache
    Aug 15, 2025
    8 KV-Cache Systems You Can’t Afford to Miss in 2025 By 2025, KV-cache has evolved from a “nice-to-have” optimization into a critical layer for high-performance large language model (LLM) serving. From GPU-resident paging tricks to persistent, cross-node cache sharing, the
    3.7K
  • LMCache Lab
    @lmcache
    Jul 21, 2025
    Everyone is focused on faster LLM inference engines. But the bigger gains may lie beyond the engine. 🚀 The real frontier could be the orchestration layer above it. Replicating engines with Kubernetes is hitting a wall. We need stateful, LLM-native
    4.2K
  • LMCache Lab
    @lmcache
    Apr 9, 2025
    1K Stars ⭐ for 𝘃𝗟𝗟𝗠 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗦𝘁𝗮𝗰𝗸! 🤝 We thank every contributor and user who has supported our journey in building an easy-to-use and high-performance serving stack for vLLM! We're thrilled to have reached this milestone. 😬 Been among the 𝘃𝗲𝗿𝘆
    4.8K
  • LMCache Lab
    @lmcache
    Mar 2, 2025
    🚀 We're thrilled to announce vLLM Production Stack—an open-source, Enterprise-Grade LLM inference solution that is now an official first-party ecosystem project under vLLM! Why does this matter? A handful of companies focus on LLM training, but millions of apps and businesses
    7.4K
  • LMCache Lab
    @lmcache
    Apr 11, 2025
    🚀 𝗟𝗠𝗖𝗮𝗰𝗵𝗲 Powers Up 𝘃𝗟𝗟𝗠 𝗩𝟭: P/D Disaggregation & NIXL Support! vLLM V1 revolutionized LLM serving, but lacked a dedicated KV cache interface for advanced optimizations... until NOW! ⚡ LMCache Lab is thrilled to announce two major updates enhancing vLLM V1's
    2.1K
  • LMCache Lab
    @lmcache
    Jan 22, 2025
    🔥Meet the vLLM Official Production Stack🔥 -⚡️ 3x higher throughput & 3x faster response! -🔧 Easy k8s deployment with helm chart! -📈 Observability dashboard! And it’s open-source under vllm-project! Code: github.com/vllm-project/p… Blog: blog.lmcache.ai/2025-01-21-sta… #LLM #vLLM #k8s
    5.4K
  • LMCache Lab
    @lmcache
    Jul 8, 2025
    🚨 LMCache now turbocharges multimodal models in vLLM! By caching image-token KV pairs, repeated images now get ~100% cache hit rate — cutting latency from 18s to ~1s. Works out of the box. Check the blog: blog.lmcache.ai/2025-07-03-mul… Try it 👉 github.com/LMCache/LMCache #vLLM #MLLM
    1.7K
  • LMCache Lab
    @lmcache
    Aug 3, 2025
    🚀 Big news from LMCache Lab! 📝 3 papers accepted at SOSP ’25 & NSDI ’26, pushing the frontier of LLM-inference efficiency: 1️⃣ Cross-agent KV-cache sharing (NSDI) 🔗 arxiv.org/abs/2411.02820 2️⃣ Custom design for LLM prefillers (SOSP) 🔗 arxiv.org/abs/2505.07203 3️⃣
    1.3K
  • LMCache Lab
    @lmcache
    Aug 6, 2025
    LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to vanilla vLLM. Release the true power of gpt-oss with vLLM + LMCache. Full deployment tutorial here: blog.lmcache.ai/2025-08-05-gpt… #LMCache #vLLM #OpenAI #LLM
    1.8K
  • LMCache Lab
    @lmcache
    Jul 22, 2025
    Want to create your own LLM Inference Endpoint on Any Cloud in seconds? We're announcing the alpha release of LMIgnite, the one-click high-performance inference stack built for speed and scale. Powered by LMCache, vLLM, and vLLM Production Stack. 🤖 Join the alpha and
    1.3K
  • LMCache Lab
    @lmcache
    Jul 29, 2025
    You might know LMCache Lab for our KV cache optimizations that make LLM prefilling a breeze. But that’s not all! We’re now focused on speeding up decoding too—so your LLM agents can generate new content even faster. In other words: you can save on your LLM serving bills by
    1.1K
  • LMCache Lab
    @lmcache
    Aug 8, 2025
    CacheGen (arxiv.org/abs/2310.07240) lets you store KV caches on disk or AWS S3 and load them far faster than recomputing them! Modern LLMs use long contexts, but reprocessing these every time is slow and resource-intensive. While engines like vLLM (and LMCache) can cache contexts in
    1.3K
  • LMCache Lab
    @lmcache
    Apr 27, 2025
    Amazing tool! Absolutely a game-changer for understanding open-source projects! @cognition_labs @silasalberti Find out more about LMCache and vLLM Production Stack on DeepWiki. 🚀 LMCache: deepwiki.com/LMCache/LMCache 🚀 vLLM Production Stack: deepwiki.com/vllm-project/p… #DeepWiki
    Silas Alberti
    Cognition
    @silasalberti
    Apr 25, 2025
    we built DeepWiki, a free encyclopedia of all GitHub repos some numbers: - 30k repos already indexed - processed 4 billion+ lines of code - the indexing alone cost $300k+ in compute spend
    2K
