Tao Yu
496 posts
Tao Yu
@taoyds
@XLangNLP lab, asst. prof. @HKUniversity. author of OpenCUA, OSWorld, Aguvis, Spider, OpenAgents, Text2Reward, Instructor.
Seattle
taoyds.github.io
Joined March 2016
912 Following · 6,033 Followers

  • Tao Yu
    @taoyds
    Oct 14, 2022
    A new way to work w. LMs! Binder, an easy neuro-symbolic paradigm: 1.Parse input➡️SQL/Python bound w. GPT3 Codex API calls 2.Codex+PL interpreter execute➡️answer No train&few-shot!➡️SOTA 🆚chain-of-thought: interpretable&robust⬆️ 🆚NL2Code: coverage⬆️ lm-code-binder.github.io
  • Tao Yu
    @taoyds
    Nov 22, 2022
    💥New benchmark💥 ds1000-code-gen.github.io DS-1000, a data science code generation benchmark with 1K questions about 7🐍libraries. Spent ~1200 expert hours! It is the only one that 1⃣ focuses on everyday applications 2⃣ includes natural intents & contexts 3⃣has test cases 1/🧵
  • Tao Yu
    @taoyds
    Oct 12, 2023
    🚀🚀🚀Lots of people working on LM agents recently! Open models like Llama/CodeLlama not quite up to ChatGPT's level? Our 🎉Lemur🎉- SOTA open foundation models for language agents, matching ChatGPT on🤖15 agent tasks🤖! arxiv.org/abs/2310.06830 github.com/OpenLemur/Lemur
    Yiheng Xu
    @yihengxu_
    Oct 12, 2023
    1/ 🧵 🎉 Introducing Lemur-70B & Lemur-70B-Chat: 🚀Open & SOTA Foundation Models for Language Agents! The closest open model to GPT-3.5 on 🤖15 agent tasks🤖! 📄Paper: arxiv.org/abs/2310.06830 🤗Model @huggingface : huggingface.co/OpenLemur More details 👇
    50K
  • Tao Yu
    @taoyds
    Jan 28, 2022
    📣UnifiedSKG: Lots of #NLProc researchers separately study tasks that link text to structured knowledge (Table/DB/KB..). We unify 21 such tasks into a Seq2Seq format with T5 to foster idea sharing&multitasking, performing very competitive! Paper&Code: github.com/hkunlp/unified… 👇
  • Tao Yu
    @taoyds
    Oct 22, 2024
    🍅Excited to see @AnthropicAI using 🚀our OSWorld🚀(NeurIPS'24) to benchmark computer use! 🍋OSWorld will soon support parallel cloud running, much faster! 🍓More multimodal agent open-source big projects coming soon from @XLangNLP in Nov- stay tuned! 👇os-world.github.io
    A benchmark comparison table showing performance metrics for multiple AI models including Claude 3.5 Sonnet (new), Claude 3.5 Haiku, GPT-4o, and Gemini models across different tasks.
    Anthropic
    @AnthropicAI
    Oct 22, 2024
    Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
    28K
  • Tao Yu
    @taoyds
    Mar 30, 2023
    In Memory of My beloved Ph.D. Advisor @dragomir_radev 🕯️R.I.P. 🕯️
    Harlan Krumholz
    @hmkyale
    Mar 30, 2023
    The #AI community, the #computerscience community, the @YaleSEAS community, and humanity have suddenly lost a remarkable person, @dragomir_radev - kind and brilliant, devoted to his family and friends... gone too soon. A sad day @Yale @YINSedge @YaleCompsci #NLP2023
    34K
  • Tao Yu
    @taoyds
    Oct 17, 2023
    Beyond our Lemur: OPEN LMs for language agents Introducing 💥OpenAgents💥: an OPEN platform for language agents in the wild! Analyze data, call plugins, control your browser as ChatGPT Plus, but with OPEN SOURCE code!! 📑: arxiv.org/abs/2310.10634 Code: github.com/xlang-ai/OpenA…
    Zhoujun (Jorge) Cheng
    @ChengZhoujun
    Oct 17, 2023
    💥OpenAgents💥: an OPEN platform for language agents in the wild Analyze data, call plugins, control your browser as ChatGPT Plus, but with OPEN Code for 1⃣Easy deployment 2⃣Full stack 3⃣Chat Web UI 4⃣Agent methods 5⃣… arxiv.org/abs/2310.10634 Code: github.com/xlang-ai/OpenA… 👇
    42K
  • Tao Yu
    @taoyds
    Aug 10, 2023
    After 5 month dedicated work from >15 researchers & developers, we're thrilled to introduce 🚀OPEN-SOURCE language model Agents🚀! Try demos: chat.xlang.ai 🥑 Stay tuned for open-source code, model, framework, evaluation & more at github.com/xlang-ai!
    XLANG NLP Lab
    @XLangNLP
    Aug 10, 2023
    1/6🚀Announcing XLang language model (LM) Agents: 📊Data Agent: LM + code & data tools 🔧Plugins Agent: LM + 200+ API plugins 🌐Web Agent: LM + web control Try demo: chat.xlang.ai Stay tuned for open-source code & models github.com/xlang-ai See more examples!👇
    28K
  • Tao Yu
    @taoyds
    Dec 8, 2024
    Text-to-SQL has been my passion since Yale Spider 1.0! But as LLMs master it, real-world complexity demands more. 🚀After a year of work, Spider 2.0 shows the gap: o1 achieves just 17%! The path to production deployment is still long but exciting! more👉spider2-sql.github.io
    XLANG NLP Lab
    @XLangNLP
    Dec 8, 2024
    🎉Announcing Spider 2.0 Text-to-SQL challenge in the LLM era! 6 years after our Yale Spider 1.0, we're pushing it forward with: 🍊Real complex cloud DBs (3000+ cols) 🍋Multi-dialect SQL complexity 🍎Agentic coding workflows 🧐Best o1 only solves 17%! 👉spider2-sql.github.io
    21K
  • Tao Yu
    @taoyds
    Apr 12, 2024
    🚀Multimodal agents is on rise in 2024! But even building app/domain-specific agent env is hard😰. Our real computer OSWorld env allows you to define agent tasks about arbitrary apps on diff. OS w.o crafting new envs. 🧐Benchmarked #VLMs on 369 OSWorld tasks: #GPT4V >> #Claude3
    Tianbao Xie
    @TianbaoX
    Apr 12, 2024
    🤔Can we assess agents across various apps & OS w.o. crafting new envs? OSWorld🖥️: A unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS. + annotated 369 real-world computer tasks
    35K
  • Tao Yu
    @taoyds
    Feb 16, 2024
    🚀Instructor🚀embeddings recently hit 2M downloads on @huggingface! Now, excited to introduce 🚀GritLM🚀, the first SINGLE LM achieving SoTA in BOTH text embedding (MTEB) & generative tasks (BBH etc)! Great team effort w. @Muennighoff & @hongjin_su! 📰: arxiv.org/abs/2402.09906👇
    Niklas Muennighoff
    @Muennighoff
    Feb 16, 2024
    Introducing GRIT🦾to unify text embedding 🔢& generation 📝. GritLM is open SoTA on embedding (MTEB) & generative tasks (BBH etc.) – Both in 1 model. See 🧵for how GRIT🦾 makes RAG >60% faster & more 📜arxiv.org/abs/2402.09906 💻github.com/ContextualAI/g… 1/12
    12K
  • Tao Yu
    @taoyds
    Jan 23, 2025
    When working on OSWorld (os-world.github.io), we drew inspiration from Universe (openai.com/index/universe). While OpenAI dropped the idea (maybe too early), we persisted despite similar concerns, driven by our passion. 🚀Glad to see OpenAI return (openai.com/index/computer…)!
    OpenAI
    @OpenAI
    Jan 23, 2025
    Introduction to Operator & Agents openai.com/index/introduc…
    14K
  • Tao Yu
    @taoyds
    Oct 15, 2022
    📢📢 Play with our Binder demo: huggingface.co/spaces/hkunlp/…! Binder: an easy but sota neural-symbolic built on GPT-3 Codex & SQL/Python interpreter. Inject GPT-3 Codex prompt API calls in programming languages!
  • Tao Yu
    @taoyds
    Aug 13, 2021
    Life update: Thrilled to join @HKUniversity🇭🇰as an asst. prof. and build the HKU #NLProc lab(nlp.cs.hku.hk) with @ikekong. We have multiple openings for PhD/RA👨‍🔬! Come and visit us if you’re ever in HK🏙! Also, I’ll spend a year at @uwnlp working with @nlpnoah & Mari!