HuggingGPT
-Human requests something
-ChatGPT
1 Plans tasks
2 Selects AI models based on HuggingFace descriptions
3 Manages cooperation of expert models to execute subtasks
4 Summarizes results
Covers many sophisticated tasks across modalities & domains
arxiv.org/abs/2303.17580
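A minimal sketch of the four stages listed above, assuming a toy setup: the registry, the keyword-overlap selection rule, and every function name here are hypothetical stand-ins, not the paper's code or any HuggingFace API. In the system described, ChatGPT does the planning, selection, and summarizing; simple string rules take its place here so the control flow runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExpertModel:
    description: str            # stands in for a HuggingFace model description
    run: Callable[[str], str]   # stands in for calling the model


# Hypothetical registry of expert models (assumption, for illustration only).
REGISTRY: Dict[str, ExpertModel] = {
    "captioner": ExpertModel(
        "image captioning: describe an image in text",
        lambda x: f"caption({x})",
    ),
    "translator": ExpertModel(
        "translation: translate English text to French",
        lambda x: f"french({x})",
    ),
}


def plan_tasks(request: str) -> List[str]:
    """Stage 1: split the request into ordered subtasks (an LLM would do this)."""
    return [part.strip() for part in request.split(" then ") if part.strip()]


def select_model(task: str) -> str:
    """Stage 2: pick the model whose description shares the most words with the task."""
    task_words = set(task.lower().split())

    def overlap(name: str) -> int:
        words = REGISTRY[name].description.lower().replace(":", " ").split()
        return len(task_words & set(words))

    return max(REGISTRY, key=overlap)


def execute(tasks: List[str], payload: str) -> List[str]:
    """Stage 3: run subtasks in order, feeding each output into the next."""
    trace: List[str] = []
    current = payload
    for task in tasks:
        name = select_model(task)
        current = REGISTRY[name].run(current)
        trace.append(f"{task} -> {name} -> {current}")
    return trace


def summarize(trace: List[str]) -> str:
    """Stage 4: fold the execution trace into one response."""
    return "; ".join(trace)


def hugginggpt_sketch(request: str, payload: str) -> str:
    return summarize(execute(plan_tasks(request), payload))


if __name__ == "__main__":
    print(hugginggpt_sketch(
        "describe the image then translate the text to French", "photo.png"
    ))
```

The sequential hand-off in `execute` is the simplest form of cooperation between models; a fuller version would track dependencies between subtasks rather than assume a straight chain.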
John Nay
founder & CEO of Norm Ai //
founding CEO of Brooklyn AI (acquired by TIAA Nuveen) //
more at linkedin.com/in/johnjnay/
- LLMs Are Better Than Human Data Annotators -GPT-3 was helpful but not better than humans (e.g. arxiv.org/abs/2108.13487) -GPT-3.5 is about on par w/ humans (e.g. arxiv.org/abs/2303.16854 w/ self-explanations) -GPT-4 is better than $25/hr humans (e.g. arxiv.org/abs/2304.03279)
- French researchers converted their tax code into computer code Compiles to Python & provides insights about "essence" of France's income tax computations Government is officially transitioning to this for production. Paper: arxiv.org/abs/2011.07966 Code: github.com/MLanguage/mlang
- LLMs as Generative Agents in Social Simulations Augment LLM: -Store record of agent's "life" -Synthesize its memories into reflections -Retrieve memories dynamically & plan Simulations produce human-like individual behavior & emergent social interaction arxiv.org/abs/2304.03442
- AI research output from prolific institutions: - Google - Microsoft - Stanford - Meta - Amazon - DeepMind - OpenAI
- LLMs are exhibiting emergent behaviors at scale (e.g., see @_jasonwei's jasonwei.net/blog/emergence). In this context, revisiting books on the emergence of social, economic & biological phenomena. Complexity science may have a resurgence.
- Forums for LLM Agents to Communicate Can Improve Outputs 1) Human provides task 2) "Decider" Agent produces output 3) "Researcher" & Decider Agents discuss 4) Decider decides Big improvement over base GPT4 on medical summarization & care plan generation arxiv.org/abs/2303.17071
- LLMs Can Iteratively Self-Refine -LLM creates draft -Provides its own feedback -Iteratively refines On all 7 eval tasks (review & code rewriting, toxicity removal, responses, acronyms, stories, etc.), outputs are preferred by humans & by automated metrics arxiv.org/abs/2303.17651
- Gisting: 26x Compression of LLM Prompts -Trains LLM to compress prompts into smaller sets of "gist" tokens to be reused for compute efficiency -Can be easily trained as part of instruction fine-tuning -FLOPs reductions, time speedups & storage savings arxiv.org/abs/2304.08467
- Reflection-Based GPT-4 Agent is State-of-the-Art on Code Gen Iteratively refines code, shifting “accuracy bottleneck” from correct code gen to correct test gen HumanEval accuracy: -Reflexion-based GPT-4 88% -GPT-4 67.0% -CodeT 65.8% -PaLM 26.2% Code: github.com/noahshinn024/r…
  A Self-Reflecting LLM Agent Equips LLM-based agent w/ -dynamic memory -a self-reflective LLM -a method for detecting hallucinations Challenge agent to learn from its own mistakes -Evaluate on knowledge-intensive tasks -Outperforms ReAct agents Paper: arxiv.org/abs/2303.11366
- ChatGPT for Training Data 1) ChatGPT rephrases each training sentence into multiple conceptually similar but semantically different sentences 2) Train smaller model Outperforms SoTA data augmentation methods for few-shot learning text classification Paper: arxiv.org/abs/2302.13007
- LlamaAcademy: Fine-tuning LLMs to Learn How to Talk to APIs Pipeline: -Crawling -GPT-4 data gen -Fine-tuning Vicuna-13B on synthetic data LLM can then read new API docs (Stripe, Notion, etc.) & gen code Instead of hosting API docs, host API implementation github.com/danielgross/Ll…
- Simple Self-Improvement of Code LLMs 1) Pre-train & Fine-tune code LLM, gaining knowledge 2) LLM then generates pseudo outputs 3) Add that to original data & train for next epoch Significantly improves code summarization & code generation performance arxiv.org/abs/2304.01228
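The self-reflecting agent idea in the posts above (dynamic memory plus learning from its own mistakes) can be illustrated with a short retry loop. This is only a sketch under stated assumptions: `reflective_agent`, `toy_attempt`, and `toy_evaluate` are hypothetical names, the "LLM" is a stub that changes its answer once memory holds a reflection, and the paper's hallucination detection is not modeled.

```python
from typing import Callable, List, Optional, Tuple


def reflective_agent(
    attempt: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Optional[str]],
    task: str,
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, carrying reflections on earlier failures in memory."""
    memory: List[str] = []  # dynamic memory of self-reflections
    answer = ""
    for trial in range(1, max_trials + 1):
        answer = attempt(task, memory)
        error = evaluate(answer)  # None means the check passed
        if error is None:
            break
        # Record what went wrong so the next attempt can condition on it.
        memory.append(f"Trial {trial} failed: {error}")
    return answer, memory


def toy_attempt(task: str, memory: List[str]) -> str:
    # Stand-in for an LLM call: wrong until memory contains a reflection.
    return "a - b" if not memory else "a + b"


def toy_evaluate(expr: str) -> Optional[str]:
    # Stand-in for a test: evaluate the expression with a=2, b=3, expect 5.
    result = eval(expr, {"a": 2, "b": 3})
    return None if result == 5 else f"expected 5, got {result}"


if __name__ == "__main__":
    answer, memory = reflective_agent(toy_attempt, toy_evaluate, "add two numbers")
    print(answer)
    print(memory)
```

Because `evaluate` is the only signal the loop receives, the quality of the check bounds the quality of the final answer, which matches the post's point about the bottleneck shifting from code generation to test generation.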