Yuandong Tian
1,141 posts
Yuandong Tian
@tydsh
Co-founder of @Recursive_SI. ex-Meta FAIR Director. ex-Google. Reasoning, Optimization and Understanding LLM. Novelist in spare time. PhD in @CMU_Robotics.
California, USA
yuandong-tian.com
Joined December 2009
941
Following
44.5K
Followers
  • Pinned
    Yuandong Tian
    @tydsh
    Jun 11
    Early results from Recursive 🚀🚀 SotA results from our open-ended knowledge discovery system: 1️⃣NanoChat 5min pre-training (0.9372 bpb -> 0.9109 bpb, 2.8% lower Bits-Per-Byte than long-standing community SoTA) 2️⃣NanoGPT SpeedRun (79.7s -> 77.5s, 2.8% faster than long-standing
    Recursive
    @Recursive_SI
    Jun 11
    Article
    First Steps Toward Automated AI Research
    Early results from Recursive’s automated AI research system on model training and GPU kernel benchmarks Today we are releasing early results from Recursive’s automated AI research system. Across three...
    57K
  • Yuandong Tian
    @tydsh
    Oct 23, 2025
    Several of my team members + myself are impacted by this layoff today. Welcome to connect :)
    4.4M
  • Yuandong Tian
    @tydsh
    Oct 23, 2025
    I am really sorry, Jiaxun😓. I wrote a recommendation letter for you to join Meta, and the team you joined promised that you would work on RL. But you ended up not doing anything related to RL, got caught up in never-ending reorgs, and were impacted after a few months as a fresh new hire!
    Jiaxun Cui 🐿️
    @cuijiaxun
    Oct 23, 2025
    Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)
    772K
  • Yuandong Tian
    @tydsh
    Jan 15, 2025
    Our Coconut work (learning continuous latent CoT) has now been open-sourced. Feel free to play with it:
    GitHub - facebookresearch/coconut: Training Large Language Model to Reason in a Continuous Latent...
    From github.com
    161K
  • Yuandong Tian
    @tydsh
    Jun 18, 2025
    📢We show that continuous latent reasoning has a theoretical advantage over discrete token reasoning (arxiv.org/abs/2505.12514): For a graph with n vertices and graph diameter D, a two-layer transformer with D steps of continuous CoTs can solve the directed graph reachability
    arxiv.org
    Reasoning by Superposition: A Theoretical Perspective on Chain of...
    Large Language Models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems via chain-of-thoughts (CoTs) techniques that generate...
    331K
  • Yuandong Tian
    @tydsh
    Oct 23, 2025
    Replying to @daemonzhang6
    That's the problem. The people who are responsible for the issues are not the people who got laid off😅 In January, our team put down all the research we were doing and was (forced?) to move to GenAI <2 months before the llama 4 release deadline to help with all the
    272K
  • Yuandong Tian
    @tydsh
    Nov 12, 2025
    Hats off to @ylecun! FAIR shaped my career, period. I truly thank @AIatMeta and FAIR for providing such a nice place for independent exploration and open research! It is the end of an era, and I will remember it forever.
    Meta chief AI scientist Yann LeCun plans to exit and launch own start-up
    From ft.com
    181K
  • Yuandong Tian
    @tydsh
    Nov 24, 2023
    How likely is the hypothesis that Q* = Q-learning + A*? From my past experience on OpenGo (reproduction of AlphaZero), A* can be regarded as a deterministic version of MCTS with value (i.e., heuristic) function Q only. This should be suitable for tasks in which the state is easy
    359K
  • Yuandong Tian
    @tydsh
    Sep 17, 2024
    While CoT is super useful, I kindly disagree that blindly scaling it up is all we need. The paper proposes a universal approximation theorem by explicitly constructing Transformer weights to fit to the family of tasks. Although the depth can be constant, the length of CoT can
    Denny Zhou
    @denny_zhou
    Sep 16, 2024
    What is the performance limit when scaling LLM inference? Sky's the limit. We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.
    253K
  • Yuandong Tian
    @tydsh
    Jan 2, 2020
    Over the last decade:
    - Got PhD from CMU Robotics.
    - Started an academic career and published the majority of my current publication list.
    - Gave 6 oral talks at top-tier conferences, with 1 Marr Prize Honorable Mention (1st author, ICCV 13).
    - Got married and became a homeowner.
    - Wrote one novel.
  • Yuandong Tian
    @tydsh
    Jun 11, 2024
    I am looking for one fall'24 intern, along the direction of LLM+reasoning/planning. The intern shall start before Nov this year. If you have interest, please contact [email protected]. Thanks!
    136K
  • Yuandong Tian
    @tydsh
    Dec 14, 2024
    Unbelievable... This is explicit racial bias. How could this happen in NeurIPS? How could this be spoken by a top university professor, an invited keynote speaker?
    Jiao Sun
    @sunjiao123sun_
    Dec 14, 2024
    Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
    37K
  • Yuandong Tian
    @tydsh
    Feb 18, 2024
    One interesting component of Sora is that the video is NOT created by next-frame prediction, but by first constructing and refining a latent token sequence and then decoding it back. Doing planning/search in learnable latent space, rather than original space has its unique
    134K
  • Yuandong Tian
    @tydsh
    Feb 14, 2025
    Our new work Spectral Journey arxiv.org/abs/2502.08794 shows a surprising finding: when a 2-layer Transformer is trained to predict the shortest path of a given graph, 1️⃣it first implicitly computes the spectral embedding for each edge, i.e. eigenvectors of Normalized Graph
    arxiv.org
    Spectral Journey: How Transformers Predict the Shortest Path
    Decoder-only transformers lead to a step-change in capability of large language models. However, opinions are mixed as to whether they are really planning or reasoning. A path to making progress...
    55K
