Sign in to view Mu (Kevin)’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Sign in to view Mu (Kevin)’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
San Francisco, California, United States
Sign in to view Mu (Kevin)’s full profile
Mu (Kevin) can introduce you to 10+ people at Anthropic
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
3K followers
500+ connections
Sign in to view Mu (Kevin)’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
View mutual connections with Mu (Kevin)
Mu (Kevin) can introduce you to 10+ people at Anthropic
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
View mutual connections with Mu (Kevin)
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Sign in to view Mu (Kevin)’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
About
Welcome back
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
New to LinkedIn? Join now
Activity
3K followers
-
Mu (Kevin) Lin shared thisHere you go! What we've been cooking recently! 🚀Mu (Kevin) Lin shared thisIntroducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math. We’ve introduced upgrades across all Claude surfaces: In Claude Code: a fresh terminal interface, a new VS Code extension, and a checkpoints feature that lets you confidently run large tasks and instantly rewind to prior code states as needed. For the Claude app: Claude can now use code to analyze data, create files, and visualize insights in the files and formats you use. Watch as Claude creates polished docs, presentations, and spreadsheets—ready to download and edit. Now available to all paid plans in preview. On the Claude API: we've added two new capabilities to build agents that handle long-running tasks without frequently hitting context limits. Context editing automatically clears stale context and the memory tool means you can store and consult information outside the context window. We're also releasing a temporary research preview called "Imagine with Claude." In this experiment, Claude generates software on the fly. No functionality is predetermined; no code is prewritten. Available to Max users for 5 days. Claude Sonnet 4.5 is available everywhere today—on the Claude Developer Platform, natively and in Amazon Bedrock and Google Cloud's Vertex AI. Pricing remains the same as Sonnet 4. For more details: https://lnkd.in/eRJx6C5u
-
Mu (Kevin) Lin reposted thisMu (Kevin) Lin reposted thisThere's never been a better time to be a problem solver. Keep thinking.
-
Mu (Kevin) Lin shared thisAnother week, another successful launch!! So proud of my team and myself for contributing big time to this deliverable.Mu (Kevin) Lin shared thisStarting today, Claude Sonnet 4 supports 1 million tokens of context on the Anthropic API—5x more than before. This enables developers to process over 75,000 lines of code or dozens of research papers in a single request, ideal for data-intensive uses like long-running agents. Long context support for Sonnet 4 is now in public beta, starting with API users on Tier 4 and custom rate limits. Broader availability on our API will be rolling out over the coming weeks. It’s also available in Amazon Bedrock, and is coming soon to Google Cloud's Vertex AI. Learn more about Sonnet 4 and the 1M context window: https://lnkd.in/e-tfCYBg
-
Mu (Kevin) Lin shared thisFinally can talk about what I've been working on recently.Mu (Kevin) Lin shared thisToday we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. We plan to release substantially larger improvements to our models in the coming weeks. Opus 4.1 is now available to paid Claude users and in Claude Code. It's also on our API, Amazon Bedrock, and Google Cloud's Vertex AI. Read more: https://lnkd.in/g2-WvtA5
-
Mu (Kevin) Lin reposted thisMu (Kevin) Lin reposted thisIntroducing the next generation: Claude Opus 4 and Claude Sonnet 4. Watch our team work through a full day with Claude, conducting extended research, prototyping applications, and orchestrating complex project plans. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning. Both are hybrid models offering two modes: near-instant responses and extended thinking for deeper reasoning. They can also alternate between reasoning and tool use—like web search—to improve responses. Also, Claude Code is now generally available. We're bringing Claude to more of your development workflow—in the terminal, your favorite IDEs, and running in the background with the Claude Code SDK. Both Claude 4 models are available today for all paid plans. Additionally, Claude Sonnet 4 is available on the free plan. For even more details, see the full announcement: https://lnkd.in/epdT4DV2.
-
Mu (Kevin) Lin reposted thisMu (Kevin) Lin reposted thisClaude will help power Amazon's next-generation AI assistant, Alexa+. Amazon and Anthropic have worked closely together over the past year, with Mike Krieger leading a team that helped Amazon get the full benefits of Claude's capabilities. Alexa+ will start rolling over the coming weeks: https://lnkd.in/eF_jV2Wv
-
Mu (Kevin) Lin shared thisWhen I joined last February, we were just about to launch Claude 3. I rolled up my sleeves and immediately jumped into the launch sprints (alongside many awesome others). Fast forward 12 months, and our model has become significantly more intelligent. I've also rolled my sleeves up even higher (still working with many awesome colleagues! 😄)Mu (Kevin) Lin shared thisIntroducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. Claude 3.7 Sonnet is a significant upgrade over its predecessor. In extended thinking mode, it self-reflects before answering, which improves its performance on math, physics, instruction-following, coding, and many other tasks. We generally find that prompting for the model works similarly in both modes. API users also have fine-grained control over how long the model can think for. Claude 3.7 Sonnet is a state-of-the-art model for coding and agentic tool use. However, in developing it, we optimized less for math and computer science competition problems, and more for real-world tasks. We believe this more closely reflects the needs of our customers. We conducted extensive model testing for safety, security, and reliability. We also listened to your feedback. With Claude 3.7 Sonnet, we've reduced unnecessary refusals of requests by 45% compared to its predecessor. Claude 3.7 Sonnet and Claude Code mark an important step towards AI systems that can truly augment human capabilities. We look forward to seeing what you'll create. And we welcome your feedback as we continue to build. https://lnkd.in/egPBvEag
-
Mu (Kevin) Lin shared thisWhen your current employer and previous employer partner together :-)Lyft to bring Claude to more than 40 million riders and over 1 million driversLyft to bring Claude to more than 40 million riders and over 1 million drivers
-
Mu (Kevin) Lin reposted thisMu (Kevin) Lin reposted thisWe’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we’ll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds. Fellows will have access to: - A weekly stipend of $2,100 - ~$10k per month for compute & research costs - 1:1 mentorship from an Anthropic researcher - Shared workspaces in the Bay Area and London Fellows will collaborate with Anthropic researchers for 6 months on projects in areas such as: - Adversarial robustness & AI control - Scalable oversight - Model organisms of misalignment - Interpretability Fellows can participate while affiliated with other organizations (e.g., while in a PhD program). At the end of the program, we expect Fellows to be stronger candidates for roles at Anthropic, and we might directly extend some full-time offers. Apply by January 20 to join our first cohort! Full details are at the following link: https://lnkd.in/dDNzwZCp
-
Mu (Kevin) Lin liked thisMu (Kevin) Lin liked thisTL;DR: After years at frontier labs I believe the biggest risk in AI isn't misuse or misalignment — it's concentration of power. I've joined Pluralis Research to build Protocol Learning: frontier AI that is collectively built, trustlessly owned, and sovereign by design. Collective. Trustless. Sovereign. After years in AI safety at frontier labs, here's the conviction I arrived at. The old economy is dying. Big Tech has replaced markets and profits with platforms and rents. People don't compete on these platforms — they labor on them, generating value for digital landlords. AI perfects this. Whoever controls the models controls the economic substrate of the 21st century. More profoundly, AI is turning intelligence itself into an abundant commodity. We have a narrow window to decide whether intelligence becomes a collective commons or a private commodity — whether abundance serves humanity, or entrenches a new feudalism more extractive than anything before. Meanwhile, a Silicon Curtain is descending. AI is the first technology that can autonomously shape what we see, believe, and decide — without human oversight. The trajectory: a world split between AI spheres of influence, each controlled by a handful of corporations or governments, each population trapped in its own information reality. The clock is ticking. Concentrated AI is a single point of failure for democracy, sovereignty, and economic agency. We're drifting toward either techno-feudalism or a US-China duopoly where the rest of the world watches from the sidelines. Both alienate the vast majority of the world's population. Both create fragility no amount of goodwill or regulation can fix — because the architecture itself is the problem. I believe there is a third path. AI should be a collective common, not a private commodity. Development should be a collaborative global endeavor, not locked behind the IP of any single lab. Nations should retain sovereignty over how AI serves their citizens — not depend on the magnanimity of the lord or the hegemon du jour. That's why I've joined Pluralis Research as Head of Strategy, Product & Safety reporting to founder Alexander Long. Pluralis is building Protocol Learning — collaborative training of frontier AI models across decentralized networks where no single entity can extract, control, or shut down the result. Not open-weight, which depends on corporate charity revocable at any time. Not federated learning, which gives every participant a full copy. A protocol: rules enforced by mathematics and game theory that make collective ownership structural, not promissory. Research is peer-reviewed — published at NeurIPS, ICML, ICLR. My role: build the strategy and products that turn this into decentralized infrastructure the world can use — from open communities of AI developers to enterprise consortia and sovereign AI alliances — and ensure safety governance is woven into the protocol from day one. Collective, Trustless, Sovereign AI
-
Mu (Kevin) Lin liked thisMu (Kevin) Lin liked thisPersonal update: I just finished my latest embedded portco role, as CTO of Glade.ai — and I’ve decided to leave VC to work on my new startup, Gumnut AI, full time. Thank you Daniel Portillo, Phin Barnes, and The General Partnership for the opportunity to work with you and so many amazing startups in our portfolio over the past 3.5 years. Now with the experience and perspective of working on both sides of the startup ecosystem, I truly feel prepared to build something on my own. I published a blog post this week about my recent work on Gumnut: https://lnkd.in/gvus5e7f Feel free to like/subscribe, or whatever people do these days, if you're interested in following along!Full Speed Ahead: MCP App and Immich CompatibilityFull Speed Ahead: MCP App and Immich Compatibility
-
Mu (Kevin) Lin liked thisMu (Kevin) Lin liked thisSpent time in Bellevue last week and left energized about where Robinhood Engineering is headed. We started building up the team in this office in 2024, and now Bellevue is one of our core engineering hubs, home to parts of our AI, Infrastructure, and Security teams. These engineers are on the cutting edge of technology, ensuring we maintain velocity and reliability as we deliver for millions of customers at scale. Want to build high performance systems? Join us! https://lnkd.in/gHH3JFfS
-
Mu (Kevin) Lin liked thisMu (Kevin) Lin liked thisNew on the Anthropic Engineering Blog: How we use a multi-agent harness to push Claude further in frontend design and long-running autonomous software engineering. Read more: https://lnkd.in/gBi8Q6wtHarness design for long-running application developmentHarness design for long-running application development
-
Mu (Kevin) Lin liked thisMu (Kevin) Lin liked thisThe Health Tech Summit started in 2023 with a vision to launch and grow a unique Health and AI community that brings together the best of research, innovation, and policy. It has turned into something truly special: a powerful gathering where clinicians, researchers, entrepreneurs, policymakers, and investors come together not just to talk about the future of health, but to figure out how to build it. This year, Eric Horvitz and Robert A. Harrington conversation kicked off the summit by unpacking the history and horizon of AI in healthcare. My conversation with Chris Klomp touched on the moral obligations of healthcare innovation, followed by a deeply important panel on mental health AI, how to build healthcare companies that last, and a thoughtful conversation between Seema Verma and Rahul Sharma, MD, MBA, FACEP about the future of EHRs and more. Amy Gleason's closing conversation with Mainul Mondal was a reminder that public-private partnership is a necessity. Day 2 opened with Howard Morgan's decades of wisdom on what it takes to build something that truly endures. Stephen Klasko's keynote challenged us to imagine healthcare in 2035 and asked whether we are bold enough to build for that future. Dave A. Chokshi, MD and Dhruv Khullar reminded us that innovation that does not support human relationships isn't progress. The afternoon panels addressed deploying AI at scale in payment systems and health systems. Mitchell Katz and Deborah Estrin's conversation highlighted that technical innovation must reach everyone. We closed with Moonshots and the need to be audacious. A few themes stand out: Human-centered AI is key. Whether ambient scribes, AI powered mental healthcare, or payment automation, I was inspired by those who kept asking: what does this mean for the person? The clinician? The community? Bridging innovation and real-world clinical adoption requires people who speak science, technology, policy, and care delivery all at once. Cornell Tech and Weill Cornell Medicine exist precisely to train and convene these people, and I am proud of the ecosystem we are building together. The regulatory and reimbursement landscape is both a constraint and an opportunity. Those building health AI tools cannot afford to build in isolation from policy and policymakers cannot afford to make decisions without hearing from builders and clinicians. To every speaker and every attendee, your vision and your questions make this summit special. To Fei Wang, Ian Chiang, Chethan Sarabu, MD, Danish M., Emergence Creative, so many inside Cornell Tech and Weill Cornell - we did this together! The support of the Carson Family Charitable Trust, R1 RCM, Rendr, Ellipsis Health, Goodwin, AlleyCorp, Morrison Foerster, Archer Insights and Layer Health Karen Zimmer Paul, MD, MPH was crucial. Thank you. The Health Tech Hub of the Jacobs Institute at Cornell Tech and Weill Cornell Medicine will keep leaning into our responsibility to push healthcare forward.
-
Mu (Kevin) Lin liked thisHow Vulnerable Are AI Agents to Indirect Prompt Injections? 464 red teamers. 272,000 attacks. 13 frontier models. Every one was vulnerable. Concealment changes everything. This is the first large-scale benchmark requiring a dual objective: execute the harmful action and hide it from the user. An agent marks a critical contract deadline email as read while telling the user everything looks routine. The user has no reason to question it. That's the actual threat model for production agents — and output monitoring alone doesn't catch it. Attacks from robust models transfer broadly. The 44 attacks that broke the most robust model succeeded at 44–81% against every other model tested. Universal attack templates worked across 21 of 41 behaviors and multiple model families. These aren't quirks of individual training runs — they exploit how models process instructions at a fundamental level. Capability ≠ robustness. Models with near-identical capability scores showed dramatically different attack success rates. Robustness is driven by model family and training methodology, not raw intelligence. The model your team picked for capability may be the most exposed to prompt injection. The implication for enterprise deployment: model-level robustness is necessary but not sufficient. You need defence in depth across the stack, AI control frameworks that constrain what agents can do unsupervised, and deterministic containment that doesn't depend on the model behaving correctly. Proud to have contributed to this cross-industry effort with Gray Swan AI, OpenAI, Meta, UK AISI, and US CAISI. #AISafety, #AISecurity, #AGI, #GraySwanAIMu (Kevin) Lin liked thisNew paper and benchmark from one of our Red Teaming competitions in collaboration with Fronier labs and Government AI institutes. Main focus was on indirect injection attacks in tool use, coding, and computer use scenarions with dual target objective (Harmful action, Concealment from user in reponse). Paper: https://lnkd.in/e4Ju3N8M Codebase: https://lnkd.in/eSXwGaHi
-
Mu (Kevin) Lin liked thisAnthropic just launched Claude Code Security and I'm excited about what this means for security defenders. Security is hard because you have to get a lot of things right and then somehow maintain coverage across all of it. Claude can reason about code the way a human security engineer would and it puts serious capability directly in the hands of defenders.Mu (Kevin) Lin liked thisToday we launched Claude Code Security at Anthropic!! It scans codebases for vulnerabilities and suggests targeted patches for human review, catching the complex issues that traditional security tools often miss. Security teams everywhere are dealing with growing vulnerability backlogs and not enough people to work through them. This is a step toward changing that! Available as a limited research preview for Enterprise and Team customers today. Open-source maintainers can apply for expedited access: https://lnkd.in/g2qCy9y3 Massive shoutout to our team that made this happen!!
-
Mu (Kevin) Lin liked thisMu (Kevin) Lin liked thisjust wrapped my first week at Anthropic 🚀 it's been amazing, the people here are incredible. i'm on the inference org, building a new team focused on bringing Claude to Azure (https://lnkd.in/gYP_zrGA). if you have strong experience with large-scale distributed systems and infra, and you're curious about inference (one of the most exciting technical problems out there imo), i'd love to chat. link to the role: https://lnkd.in/gQ-Y4xVP
Experience & Education
-
Anthropic
**** ** ***** * ****** **
-
******
*********** *******
-
****
*********** *******
-
********* *******
**** ******* ********* undefined undefined
-
-
******** **********
****** ******** *******
-
View Mu (Kevin)’s full experience
See their title, tenure and more.
Welcome back
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
New to LinkedIn? Join now
or
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Publications
Projects
-
Twitter Ads Event Targeting
See projectMillions of people rely on Twitter to discover and engage with a variety of live events. Whether it’s a presidential election, Coachella, Mother’s Day or the Super Bowl, if it’s happening in the world, it’s happening on Twitter. Starting today, advertisers will be able to reach audiences interested in these events with a new feature called event targeting.
-
All-pairs similarity via DIMSUM
We have developed a new efficient algorithm to solve the similarity join called “Dimension Independent Matrix Square using MapReduce,” or DIMSUM for short, which made one of our most expensive computations 40% more efficient.
Other creatorsSee project -
BeWell Android Application
- Present
See projectBeWell: the next generation in mobile health apps, is now available on Google Play Store.
Recommendations received
8 people have recommended Mu (Kevin)
Join now to viewView Mu (Kevin)’s full profile
-
See who you know in common
-
Get introduced
-
Contact Mu (Kevin) directly
Other similar profiles
Explore more posts
-
NVIDIA AI
2M followers
Get faster and smarter MOE inference straight out of the box. 👇Deep dive on scaling expert parallelism with TensorRT-LLM. LLMs with MOE promise higher model capacity without linearly increasing compute costs - but they introduce new challenges -- more conditional computation, dynamic routing, and non-uniform GPU utilization -- solved by TensorRT-LLM. ✨ New in TensorRT-LLM has native support for expert parallelism—designed for fast, efficient inference with MoE models like Mixtral (Mistral AI) and DeepSeek (DeepSeek AI). This gives you: ✅ Dynamic expert routing: Automatically route tokens to the top-k experts with minimal overhead. ✅ Efficient expert scheduling: Balance expert loads across GPUs using smart sharding and token bucketization. ✅ Memory-aware execution: Maximize hardware utilization while respecting memory budgets. ✅ Drop-in support: Use @HuggingFace models with minimal code changes via TensorRT-LLM's #Python API. 🧠 How it works: MoE models activate only a subset of "experts" for each token. This dynamic nature is powerful—but hard to optimize. It’s all done under the hood using custom #CUDA kernels and NCCL-based communication primitives—giving you low latency, high throughput, and better GPU scaling. ✨ TensorRT-LLM handles: ✅Token-expert mapping using the gating network. ✅Token sorting to batch same-expert tokens together. ✅Expert parallel execution across GPUs. ✅Merging outputs for final predictions. 🛠️ Developer workflow - here is the code to get started. # Clone the repo git clone https://lnkd.in/g-GiDX23 # Use included examples to load and run a Mixtral model cd TensorRT-LLM/examples/mixtral From there, the Python API lets you load the model, convert it with TensorRT, and run expert parallel inference—all with a few lines of code. Results? 📈 Performance at scale. Tests show up to 2.3x faster inference throughput compared to standard tensor parallelism when using 8 GPUs and top-2 experts per token. Even better—TensorRT-LLM keeps efficiency high across increasing batch sizes. Want to see it in action or contribute? 👉 Read the full tech blog: https://lnkd.in/g_7YV3vV 👉 Explore the code on GitHub: https://lnkd.in/gNjQ5W2U 👉 Follow updates in the TensorRT-LLM repo: https://lnkd.in/gqSHYQ4u Share your experiences with us.
78
3 Comments -
Jürgen Schmidhuber
KAUST (King Abdullah… • 22K followers
Our Huxley-Gödel Machine learns to rewrite its own code, estimating its own long-term self-improvement potential. It generalizes on new tasks (SWE-Bench Lite), matching the best officially checked human-engineered agents. With Wenyi Wang, Piotr Piękos, Li Nanbo, Firas Laakom, Yimeng Chen, Mateusz Ostaszewski, Mingchen Zhuge. ArXiv: https://lnkd.in/e3zbgJQe Github: https://lnkd.in/e5_UW2MK
729
34 Comments -
Hans Sayyadi
Dropbox • 2K followers
Search relevance is the backbone of any RAG system. If the retrieval is bad, the answers are bad — no matter how good the LLM is. We just published a deep dive on how we approach this at Dropbox Dash, and I think the core idea is worth sharing because it applies well beyond our product. The challenge: Dash searches across millions (sometimes billions) of enterprise documents to find the handful that matter for a given query. A relevance model ranks those results, and that model needs training data — lots of it. Human labeling is the gold standard, but it's expensive, slow, and can't touch sensitive customer data. Pure LLM labeling is cheap and fast, but uncalibrated LLMs make systematic mistakes. Our approach: use humans to teach the LLM, then use the LLM to scale. A small set of human-labeled examples calibrates the LLM's relevance judgments. Once the LLM meets quality thresholds, it generates hundreds of thousands of labels that train the production ranking model. The result is a 100x amplification of human labeling effort. A few things that made this work in practice: We focused evaluation on the hardest cases — documents users clicked that the LLM rated low, and documents users skipped that the LLM rated high. That's where the learning happens. We gave the LLM tools to research context before judging. Enterprise search is full of internal acronyms and jargon (at Dropbox, "diet sprite" is a performance management tool, not a soft drink). Without context, even strong models get these wrong consistently. We used DSPy (Community) to programmatically optimize prompts against human judgments, which turned prompt tuning from an art into something measurable and repeatable. The pattern — small human reference set, calibrated LLM evaluation, scaled labeling, continuous monitoring — is generalizable to any domain where you need reliable judgments at scale. Full our post on the Dropbox tech blog: https://lnkd.in/gB_5TECm
68
-
LangChain
502K followers
🧠💬 Memory in LLMs A practical guide showing how to implement conversational memory in LLMs using LangGraph, demonstrated through a therapy chatbot. Features code examples for basic retention, trimming, and summarization approaches. Learn to build memory-aware apps 👉 https://lnkd.in/gybcrV5v
968
21 Comments -
NVIDIA AI
2M followers
We’re working with the OSS community to take the guesswork out of disaggregated serving by integrating NVIDIA Dynamo into the stack, with support for all major inference serving frameworks. 🔹 SGLang community is improving AI inference performance—reducing guesswork and enabling faster, more efficient, and scalable model execution. 🔹 Mooncake built the first SGLang backend for AIConfigurator, enabling rapid support for models like Llama, Qwen, and DeepSeek by implementing the collector layer for core operations such as GEMM and attention. 🔹 Alibaba Cloud integrated AIConfigurator into its AI Serving Stack on Kubernetes (ACK), using the RoleBasedGroup (RBG) orchestration engine to automate deployments and manage prefill/decode disaggregation. The result: 1.86× higher throughput on Qwen3-235B‑FP8 while maintaining TTFT < 5 s and ITL < 40 ms. Read the technical blog → https://lnkd.in/eXhr3vGa
61
9 Comments -
Pravin Shinde
Icertis • 1K followers
From Behavior to Mechanism: Trust in LLMs Large Language Models are already trusted — embedded in products, workflows, and decisions. What’s less clear is how that trust is justified. We don’t trust them because we fully understand their internals. We trust them because they mostly behave as expected. That gap between behavioral reliability and mechanistic understanding is where the next set of AI risks — and insights — live. What has caught my attention is how Anthropic approaches this problem through interpretability. Most teams evaluate LLMs primarily through behavior — prompts in, responses out. Anthropic treats them more like biological systems, where behavior emerges from internal structures we didn’t explicitly design. The neuroscience analogy fits. Just as neurons form patterns that produce behavior, LLMs activate internal features and circuits that shape responses. Anthropic studies interpretability because they don’t want to rely on behavior alone to trust increasingly powerful models. Their underlying belief is simple: To align powerful AI systems, we must understand the mechanisms that produce behavior — not just the outputs. In practice, interpretability helps: - detect fragile internal shortcuts early - verify that safety mechanisms actually exist - intervene before deployment, not after failure Early LLM deployments offered a useful lesson: behavior alone can be misleading. The same intent, phrased differently, could produce different outcomes. Interpretability asks why. Some labs, like OpenAI, prioritize shaping behavior through feedback, red-teaming, and system safeguards. Anthropic leans earlier — toward understanding internal mechanisms before trust is granted. Both approaches are valid. They simply optimize for different risks. What interpretability ultimately offers isn’t certainty — it offers restraint. It helps us see not just when models succeed, but how they succeed, and where that success may rest on fragile shortcuts. In a world where trust in AI is already granted, understanding becomes less about curiosity — and more about responsibility.
24
-
Ramin Mehran
Google DeepMind • 4K followers
In this episode, we discuss ImplicitQA: Going beyond frames towards Implicit Video Reasoning by Sirnam Swetha, Rohit Gupta, Parth Parag Kulkarni, David G Shatwell, Jeffrey A Chan Santiago, Nyle Siddiqui, Joseph Fioresi, Mubarak Shah. The paper introduces ImplicitQA, a new VideoQA benchmark designed to evaluate models on implicit reasoning in creative and cinematic videos, requiring understanding beyond explicit visual cues. It contains 1,000 carefully annotated question-answer pairs from over 320 narrative-driven video clips, emphasizing complex reasoning such as causality and social interactions. Evaluations show current VideoQA models struggle with these challenges, highlighting the need for improved implicit reasoning capabilities in the field.
14
3 Comments -
Julen Arizaga Echebarria
Meta • 3K followers
Bad press around autonomous driving is inevitable. Especially when a headline involves a child. Recently, Waymo faced intense scrutiny after one of its vehicles struck a child near a school. The child was thankfully only lightly injured, but the story spread fast, and the reaction was strong. As it should be. What’s getting less attention is the uncomfortable nuance. According to Waymo’s data, the system detected the child immediately, braked hard, and reduced speed significantly before impact. Their internal analysis suggests a typical human driver, even an attentive one, would likely have hit the child at a much higher speed given the same conditions. That does not make the incident acceptable. But it does challenge the way we frame these conversations. We tend to ask: “Did the autonomous system fail?” We rarely ask: “Compared to what baseline?” Human driving sets a very low bar. We just don’t notice it because human errors are normalized. The real question isn’t whether autonomous systems are perfect. They’re not. It’s whether they can consistently make fewer and less severe mistakes than humans, especially in chaotic, high-risk environments like school zones. Public scrutiny is necessary. Transparency is non-negotiable. But progress in safety often looks worse before it looks better, because machine mistakes are visible, logged, and headline-worthy in a way human mistakes never are. If we want safer streets, the comparison has to be honest.
39
13 Comments -
Ben Lorica 罗瑞卡
12K followers
🆕 The Ultimate Guide to Open-Source RL Libraries for LLMs: 9 Frameworks Compared 🎯 Three primary RL applications for LLMs: 1️⃣ RLHF for human preference alignment 2️⃣ reasoning models for problem-solving 3️⃣ multi-turn agentic interactions [via Philipp Moritz of Anyscale ] https://lnkd.in/gcF2dNGq
57
1 Comment -
Zhoutong Fu
Hippocratic AI • 5K followers
It’s interesting to see the shift around reinforcement learning — it’s no longer about whether RL is the right approach, but how we can train it more efficiently and at scale. I’m expecting to see a lot more domain-specific applications pop up, both in public and private (enterprise) spaces. https://lnkd.in/gjc3RdcP
18
1 Comment -
Oleksii Kuchaiev
NVIDIA • 9K followers
Post-training of LLMs is increasingly important and RLHF remains a necessary step for an overall great model. Today we are releasing 6 new reward models, including GenRMs and multilingual. These models are used to post-train next *-nemotron models. Collection on HuggingFace: https://lnkd.in/grxW4Sux
187
-
Eitan Anzenberg, PhD
Eightfold • 3K followers
Out team just posted our latest paper “Evaluating the Promise & Pitfalls of LLMs in Hiring Decisions” on arXiv! We found some exciting results: • Benchmarked leading LLMs (GPT-4o, o3, Claude, Gemini, Llama, DeepSeek) against Eightfold’s “Match Score” model on real-world data. • Evaluated both performance (ROC AUC, PR AUC, F1) and fairness (impact-ratio across gender, race, intersectional groups). • Eightfold’s Match Score beat the best LLM on accuracy (ROC AUC 0.85 vs 0.77) and fairness (min race Impact Ratio 0.957 vs 0.809). • Off-the-shelf LLMs still propagate measurable demographic bias without safeguards. • The trade-off between accuracy and fairness is a false dichotomy: carefully engineered, domain-tuned models like Eightfold’s can achieve both accuracy of hiring and fairness of outcomes. https://lnkd.in/guQ2TAYp #machinelearning #ai #eightfold #arxiv #datascience #bias #fairness #ml #data #genai #llms
37
2 Comments -
Chris Talley
Interconnection.fyi • 2K followers
The most recent Queued Up report from Berkeley Lab is out. Steven Zhang and I are proud to be listed as coauthors on this edition. The Interconnection.fyi team worked closely with Joseph Rand and the rest of the LBNL team to provide the data that powers this year's analysis. Supporting this level of research is core to our mission of bringing transparency to the wholesale energy markets. The 2025 edition (covering data through the end of 2024) highlights some significant shifts in the landscape: Active Capacity: 2024 closed with nearly 2,300 GW of generation and storage seeking interconnection. Changing Mix: Active natural gas capacity increased by 72% year over year, while solar and storage saw slight decreases in total queue volume. The Backlog: 408 GW of capacity already has an executed or draft interconnection agreement but has not yet reached commercial operations. Timelines: For projects built between 2018 and 2024, the median duration from request to operation has doubled compared to the early 2000s. This report is the definitive annual benchmark for the industry and provides a vital baseline as we begin to see the implementation of FERC Order 2023. The analysis in this report is based on our EOY 2024 data snapshot. Since then, we have continued to track and update these queues every single day. If you want a live view of how these numbers have shifted in the months since this snapshot was taken, follow Interconnection.fyi and subscribe to our Substack. You can find the full slide deck, interactive maps, and raw data files at the link in the comments.
41
3 Comments -
James Rosenthal
6K followers
Model training on TPUs just got way easier! A completely reimagined vLLM TPU for LLM inference 👉 for PyTorch and JAX developers, this means more flexibility to run PyTorch model definitions performantly on TPU without any additional code changes, while also extending native support to JAX. read: https://lnkd.in/gW7476Hs #GoogleCloud #TPUs #LLMs
14
Explore top content on LinkedIn
Find curated posts and insights for relevant topics all in one place.
View top content