Jens Tuyls (@JensTuyls) / X

Jens Tuyls

@JensTuyls

PhD @PrincetonCS. Previously CS & Eng. @UCIrvine. Studying AI, ML, RL, NLP.

Silicon Valley, CA

Joined June 2016

Pinned
Jens Tuyls
@JensTuyls
Oct 14, 2025
Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!
Jens Tuyls
@JensTuyls
Jul 19, 2023
Imitation learning is one of the most widely used methods in ML, but how does compute affect its performance? We explore this question in the challenging game of NetHack and find our scaled-up agent to outperform prior SOTA by 2x! arxiv.org/abs/2307.09423 [1/6]
Jens Tuyls
@JensTuyls
Feb 14, 2022
How can RL agents deal with both sparse rewards and large, dynamic action spaces – a key challenge in text games? Our method eXploit-Then-eXplore (XTX) tackles these challenges and achieves a more than 2x improvement on Zork! arxiv.org/abs/2201.01251 #ICLR2022 Spotlight 📜[1/5]
Jens Tuyls
@JensTuyls
Dec 11, 2023
I’ll be at @NeurIPSConf this week! Feel free to reach out if you’d like to chat about scale in RL/IL, language agents (or RL + NLP more broadly), or game theory!
Jens Tuyls
@JensTuyls
Aug 30, 2016
Loving the new Alexa Skills Kit SDK for Node.js! github.com/alexa/alexa-sk… @alexadevs @amazonecho @AmazonAlexa #amazonecho
Jens Tuyls
@JensTuyls
Jul 19, 2023
Replying to @JensTuyls
See all of this and more in: Scaling Laws for Imitation Learning in NetHack by @JensTuyls, @DhruvMadeka, Kari Torkkola, Dean Foster, @karthik_r_n, @ShamKakade6 Paper: arxiv.org/abs/2307.09423 Project page: coming soon!
Jens Tuyls
@JensTuyls
Jul 19, 2023
Replying to @JensTuyls
More broadly, our results call for work in the larger IL and RL community to more carefully consider the role of scaling laws, which could provide large improvements in many other domains. Also check out prior work by @openai: arxiv.org/abs/2301.13442. [5/6]
Jens Tuyls
@JensTuyls
Jul 19, 2023
Replying to @JensTuyls
We train a suite of neural NetHack agents with different model sizes using Behavioral Cloning (BC) and analyze the loss and mean return isoFLOP profiles. We find both BC loss and mean return to follow clear power law trends with respect to FLOPs. [3/6]
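As a rough illustration of what a power law trend with respect to FLOPs means, the sketch below fits y = a * x^b by least squares in log-log space. The function name `fit_power_law` and all numbers are made up for illustration; the data is synthetic and is not taken from the paper, and the paper's actual fitting procedure may differ.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope and intercept of the straight line in log-log space.
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


# Synthetic, hypothetical data (NOT results from the paper):
# loss = 5 * FLOPs ** -0.1, with no noise.
flops = [1e14, 1e15, 1e16, 1e17]
loss = [5.0 * f ** -0.1 for f in flops]

a, b = fit_power_law(flops, loss)
# Extrapolate the fitted trend to a larger, hypothetical compute budget.
predicted = a * 1e18 ** b
```

A power law shows up as a straight line on a log-log plot, so the exponent `b` is simply the slope of that line. Forecasting a larger budget then amounts to extending the line.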
Jens Tuyls
@JensTuyls
Jul 19, 2023
Replying to @JensTuyls
Using these power laws, we forecast the model and data size needed to train an agent aimed at recovering the underlying expert. While our agent falls short of expert performance, it sets a new SOTA (2.7K) in the unsolved game of NetHack, surpassing the prior best by 2x! [4/6]
Jens Tuyls
@JensTuyls
Jul 19, 2023
Replying to @JensTuyls
Prior works have found IL to consistently underperform the data-generating policy. However, these works often overlook the role of compute in terms of model and data size. Inspired by work around LLMs, we see if scaling up IL can provide similar performance gains. [2/6]
Jens Tuyls
@JensTuyls
Jul 8, 2016
Black smoke over the bay. What's happening? @ABC @CNN @CBSNews #fireInTheBay
Jens Tuyls
@JensTuyls
Feb 14, 2022
Replying to @JensTuyls
See all of this and more in: Multi-Stage Episodic Control for Strategic Exploration in Text Games by @JensTuyls, @ShunyuYao12, @ShamKakade6, @karthik_r_n Paper: arxiv.org/abs/2201.01251 Project page: sites.google.com/princeton.edu/… Code: github.com/princeton-nlp/…
arxiv.org
Multi-Stage Episodic Control for Strategic Exploration in Text Games
Text adventure games present unique challenges to reinforcement learning methods due to their combinatorially large action spaces and sparse rewards. The interplay of these two factors is...
Jens Tuyls
@JensTuyls
Feb 14, 2022
Replying to @JensTuyls
XTX employs a two-stage rollout in each episode to tackle these: (1) An *exploitation* policy trained on promising past trajectories returns to the frontier. (2) An *exploration* policy that uses past experience and curiosity explores the frontier. [3/5]
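The two-stage idea can be sketched in a toy setting. This is not the XTX implementation: replaying a stored action sequence stands in for the trained exploitation policy, uniform random actions stand in for the curiosity-driven exploration policy, and `two_stage_rollout`, `chain_step`, and the corridor environment are all hypothetical names invented for this example.

```python
import random


def two_stage_rollout(step, start, best_actions, actions, max_steps, rng):
    """Run one episode: replay a stored trajectory, then act randomly.

    `step(state, action) -> (next_state, reward, done)` is a hypothetical
    environment function supplied by the caller.
    """
    state, total, taken = start, 0.0, []
    # Stage 1 (exploitation): follow the best past trajectory to its end,
    # a crude stand-in for a policy trained on promising trajectories.
    for action in best_actions[:max_steps]:
        state, reward, done = step(state, action)
        total += reward
        taken.append(action)
        if done:
            return taken, total
    # Stage 2 (exploration): uniform random actions from the frontier,
    # a stand-in for a learned exploration policy.
    while len(taken) < max_steps:
        action = rng.choice(actions)
        state, reward, done = step(state, action)
        total += reward
        taken.append(action)
        if done:
            break
    return taken, total


def chain_step(state, action):
    """Toy corridor: 'right' moves forward, 'left' moves back; goal at cell 5."""
    nxt = max(0, state + (1 if action == "right" else -1))
    done = nxt == 5
    return nxt, (1.0 if done else 0.0), done


rng = random.Random(0)
taken, total = two_stage_rollout(
    chain_step, 0, ["right"] * 3, ["left", "right"], 20, rng
)
```

In this toy, the replayed prefix brings the agent to cell 3 without spending exploration steps on ground it has already covered, so the random phase starts at the frontier rather than at the beginning.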
Jens Tuyls
@JensTuyls
Feb 14, 2022
Replying to @JensTuyls
XTX outperforms several competitive baselines across 12 games in the Jericho benchmark (avg norm. scores across games in fig) in both the deterministic and stochastic settings, showing the strength of our multi-stage approach with strategic exploration at the frontier. [4/5]