Kianté Brantley (@xkianteb) / X

Kianté Brantley

2,665 posts

@xkianteb

Assistant Professor at Harvard University @KempnerInst and SEAS | Fitness enthusiast | (He/Him/His)

Joined May 2009

Pinned
Kianté Brantley
@xkianteb
Feb 27
Does LLM RL post-training need to be on-policy?
Kianté Brantley
@xkianteb
Nov 15, 2024
I am recruiting PhD students to join my lab at Harvard in Fall 2025! (deadline Dec 15) If you are interested in solving problems at the intersection of reinforcement learning, imitation learning, and NLP, pls consider applying (bit.ly/4fnficx)! @hseas @KempnerInst
Kianté Brantley
@xkianteb
Jun 30, 2023
New paper! Learning to Generate Better Than Your LLM (arxiv.org/abs/2306.11816) RLHF has become a powerful paradigm for fine-tuning LLMs, but we only use general-purpose RL algorithms. We introduce a new algorithmic paradigm that takes advantage of additional feedback for learning.
Kianté Brantley
@xkianteb
Dec 9, 2021
I passed my dissertation defense today - I am officially Dr. Kianté Brantley. Though I officially graduated from @UMD @Clip, I very much consider myself an unofficial graduate and member of the @nyu @CILVRatNYU family. Thank you to those who supported me, including @MSFTResearch
Kianté Brantley
@xkianteb
Jun 17, 2020
I am very grateful for the support. Congrats to all the other awardees!
Microsoft Research
@MSFTResearch
Jun 17, 2020
From reducing sample complexity in RL to making gig platforms more inclusive for people w/ chronic illness and/or disabilities, the research represented by this year’s Microsoft Research Dissertation Grant recipients is cutting-edge. Learn about the work: aka.ms/AA8qpov
Kianté Brantley
@xkianteb
May 28, 2020
New #acl2020nlp paper: "Active Imitation Learning with Noisy Guidance." We reduce the number of expert annotations needed for imitation learning by incorporating a heuristic function (e.g. gazetteers) using the classic active learning "Apple Tasting" framework.
Kianté Brantley
@xkianteb
Mar 5, 2021
What is the "right" embedding space for prediction, reinforcement learning, imitation learning, and planning? We try to tackle this problem in our AAAI paper -- Successor Feature Sets: Generalizing Successor Representations Across Policies (arxiv.org/pdf/2103.02650…)
Kianté Brantley
@xkianteb
Apr 30, 2020
The covariate shift problem has been a fundamental issue in imitation learning. We use disagreement among an ensemble of behaviour cloning policies to reduce covariate shift. Joint work with @HenaffMikael and Wen Sun. Paper: bit.ly/3bTZVaL Talk: bit.ly/2SmGQ9o
Kianté Brantley
@xkianteb
Jun 25, 2019
In RL, there are many ways to inject knowledge into algorithms in order to make training feasible (e.g. reward shaping/hacking, demonstration data, etc.). However, many key aspects of the desired behavior are more naturally expressed as constraints. (arxiv.org/abs/1906.09323)
arxiv.org
Reinforcement Learning with Convex Constraints
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For...
Kianté Brantley
@xkianteb
Apr 16, 2024
(1/N) New paper! Dataset Reset Policy Optimization for RLHF (arxiv.org/pdf/2404.08495…) RLHF is a popular paradigm for fine-tuning generative models. But the question is, can we design algorithms that take advantage of additional properties of the RLHF framework?
Kianté Brantley
@xkianteb
Nov 14, 2023
We are excited to share our new RLHF library - TRIL - which provides tools to train LLMs with reinforcement learning, imitation learning, and inverse reinforcement learning algorithms at scale! TRIL:
Jonathan Chang
@j_nadan_chang
Nov 14, 2023
Announcing 📣 an update to our paper "Learning to Search Better than Your LLM" and our new Transformers Reinforcement and Imitation Learning Library (TRIL)! Paper: arxiv.org/abs/2306.11816 Code: github.com/Cornell-RL/tril
Kianté Brantley
@xkianteb
Feb 4, 2021
New AAAI paper! Successor Feature Sets: Generalizing Successor Representations Across Policies. Motivation: what is the "right" representation of the world for prediction, imitation, and planning? (In terms of our understanding, rather than efficiency or learnability) (1/N)
Kianté Brantley
@xkianteb
Sep 28, 2024
Yisong has some really good tips for CS faculty applications. I used them when I was applying last cycle.
Yisong Yue
@yisongyue
Sep 28, 2024
Just updated my Tips for CS Faculty Applications. Best of luck to everyone applying! yisongyue.medium.com/checklist-of-t…
Kianté Brantley
@xkianteb
Oct 28, 2020
I’m co-organizing Interactive Learning for Natural Language Processing. Please vote for our proposal. Thanks!
NAACL HLT 2027
@naaclmeeting
Oct 22, 2020
Please vote for the workshop proposals for EACL / ACL-IJCNLP / EMNLP / NAACL-HLT 2021 forms.gle/kkfsQZjjs2hFYi… @acl2020 @naacl @allenai_org @uwnlp @ACL_NLP #ACL2020 #naacl #NLP #ACL_NLP -- The EACL / ACL-IJCNLP / EMNLP / NAACL-HLT 2021 Workshop chairs