Everyone's acting like models are ready to replace humans in work settings.
We put that to the test by creating an entire company and having 9 models each act as a customer service agent, handling 150 tickets and requests of increasing complexity.
Verdict: without common sense,
Surge AI
Our mission is to raise AGI with the richness of humanity — curious, witty, imaginative, and full of breathtaking brilliance.
Joined June 2020
- RLHF helps to build state-of-the-art models like ChatGPT. Did you know that training RLHF LLMs involves 4 key steps? Here’s an illustrated guide to the process: 1 of 5
- Awesome RLHF: a nice collection of research papers for RLHF, including code links, datasets, blogs, etc. github.com/opendilab/awes…
- Open-source RLHF implementations are on the rise! DeepSpeed Chat and ColossalChat are two open-source RLHF pipeline implementations announced in just the past couple of weeks. Here’s why they matter:
- Every week we cover key papers in RLHF LLMs. Last week we covered InstructGPT, and it got a lot of interest. We continue this week with DeepMind’s GopherCite paper. Here’s what you need to know in 5 tweets:
- Human feedback is important for training safe and helpful LLMs. Did you know there is now a large taxonomy of methods that leverage human feedback? This recent paper provides a comprehensive overview of them. arxiv.org/abs/2305.00955
- Every week we cover key papers in RLHF and LLMs. Today’s paper explores whether humans + LLMs working together can outperform either alone on difficult tasks. Here’s the paper summary in 5 tweets:
- Every week we cover key papers in RLHF and LLMs. Today’s paper explores how RLHF can help train helpful and harmless assistants. Here’s the paper summary in 5 tweets:
- State of GPT and RLHF LLMs: a great talk by @karpathy on the state of LLMs and the RLHF training pipeline: build.microsoft.com/en-US/sessions… Here are a few additional readings to learn more about RLHF LLMs:
- Every week we cover key papers in RLHF and LLMs. Today’s paper explores how instruction finetuning can help improve the performance and usability of pretrained language models. Here’s the summary in 5 tweets:
- Every week we cover key papers in RLHF and LLMs. Today we explore WebGPT - browser-assisted question-answering with human feedback. Here’s the paper summary in 5 tweets:
- What are the benefits of training LLMs with RLHF? The best way to show this is with examples. Let’s have a look:
- Last week, we covered key papers in RLHF LLMs. It got a lot of interest, so we will do a few paper explainers. This time we discuss InstructGPT:
- Brief History of RLHF LLMs: here are 5 important works to help you learn about RLHF LLMs: