Awesome-Foundation-Model-Papers

A library of foundation models in computer vision, natural language processing, and multi-modal learning. This repo mainly includes pretraining methods, foundation models, fine-tuning methods, and mature projects.

Contributions are welcome!

This project is a repository of vision, language, and multi-modal foundation models, mainly covering pretraining methods, foundation models, fine-tuning methods, and mature projects. Future plans include curating usable open-source models and data resources.

Computer Vision

Pretraining

  1. MAE: Masked Autoencoders Are Scalable Vision Learners. [paper] [code]
  2. EVA: Visual Representation Fantasies from BAAI. [01-paper] [02-paper] [code]
  3. Scaling Vision Transformers. [paper] [code]
  4. Scaling Vision Transformers to 22 Billion Parameters. [paper]
  5. Segment Anything. [paper] [code] [project]
  6. UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer. [paper] [code]

Generation

  1. DeepFloyd IF. [project]
  2. Consistency Models. [paper] [code]
  3. Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise. [paper] [code]
  4. Edit Anything. [code]
  5. GigaGAN: Scaling up GANs for Text-to-Image Synthesis. [paper]
  6. Parti: Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. [paper] [project]

Unified Architecture for Vision

  1. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
  2. Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
  3. SegGPT: Segmenting Everything In Context. [paper] [code]
  4. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. [paper] [code]
  5. SEEM: Segment Everything Everywhere All at Once. [paper] [code]
  6. X-Decoder: Generalized Decoding for Pixel, Image, and Language. [paper] [code]
  7. Unicorn 🦄 : Towards Grand Unification of Object Tracking. [paper] [code]
  8. UniNeXt: Universal Instance Perception as Object Discovery and Retrieval. [paper] [code]
  9. OneFormer: One Transformer to Rule Universal Image Segmentation. [paper] [code]
  10. OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection. [paper] [code]
  11. FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation. [paper] [code]
  12. Pix2seq: A language modeling framework for object detection. [v1-paper] [v2-paper] [code]
  13. TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding. [paper] [supplementary] [code]
  14. Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts. [paper]
  15. Fast Segment Anything. [paper] [code]

NLP Foundation Models

Pretraining

  1. GPT: Improving language understanding by generative pre-training.
  2. GPT-2: Language Models are Unsupervised Multitask Learners. [paper]
  3. GPT-3: Language Models are Few-Shot Learners. [paper]
  4. GPT-4. [paper]
  5. LLaMA: Open and Efficient Foundation Language Models. [paper] [code]
  6. Pythia: Interpreting Autoregressive Transformers Across Time and Scale. [paper] [code]
  7. PaLM: Scaling Language Modeling with Pathways. [paper]
  8. RedPajama. [blog]
  9. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. [paper] [code]
  10. MPT. [blog] [code]
  11. BiLLa: A Bilingual LLaMA with Enhanced Reasoning Ability. [paper]
  12. OpenLLaMA: An Open Reproduction of LLaMA. [code]
  13. InternLM. [code]

Instruction Tuning

  1. InstructGPT: Training language models to follow instructions with human feedback. [paper] [blog]
  2. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. [paper] [code]
  3. Scaling instruction-finetuned language models. [paper]
  4. Self-Instruct: Aligning Language Model with Self Generated Instructions. [paper] [code]
  5. LIMA: Less Is More for Alignment. [paper]
  6. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. [paper]
  7. WizardLM: An Instruction-following LLM Using Evol-Instruct. [paper] [code]
  8. QLoRA: Efficient Finetuning of Quantized LLMs. [paper] [code]
  9. Instruction Tuning with GPT-4. [paper] [code]

RLHF

  1. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. [paper] [code]
  2. RRHF: Rank Responses to Align Language Models with Human Feedback without tears. [paper] [code] [blog]
  3. Beaver. [code]
  4. MOSS-RLHF. [code]

Chat Models

  1. Stanford Alpaca: An Instruction-following LLaMA Model. [code]
  2. Alpaca LoRA. [code]
  3. Vicuna. [code]
  4. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. [code] [paper] [v2-paper]
  5. StableVicuna. [project]
  6. Koala: A Dialogue Model for Academic Research. [paper] [code]
  7. Open-Assistant. [project]
  8. GPT4ALL. [code] [demo]
  9. ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human. [paper] [code]
  10. CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society. [paper] [code]
  11. MPTChat. [blog] [code]
  12. ChatGLM2. [code]

Chinese Support

  1. MOSS. [code]
  2. Luotuo. [code]
  3. Linly. [code] [blog]
  4. FastChat-T5. [code]
  5. ChatGLM-6B. [code]
  6. Chat-RWKV. [code]
  7. baize. [paper] [code]

Multi-Modal Learning

Pretraining

  1. CLIP: Learning Transferable Visual Models From Natural Language Supervision. [paper] [code]
  2. ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. [paper] [code]
  3. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. [paper] [code]
  4. mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality. [paper] [code] [demo] [blog]
  5. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. [code]
  6. Kosmos-1: Language Is Not All You Need: Aligning Perception with Language Models. [paper] [code]
  7. Versatile Diffusion: Text, Images and Variations All in One Diffusion Model. [code]
  8. LLaVA: Large Language and Vision Assistant. [paper] [project] [blog]
  9. PaLM-E: An Embodied Multimodal Language Model. [paper] [code]
  10. BEiT-3: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks. [paper]
  11. X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages. [paper]
  12. IMAGEBIND: One Embedding Space To Bind Them All. [paper] [code]
  13. PaLM 2. [paper]
  14. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. [paper]

Visual Chat Models

  1. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. [paper] [code]
  2. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. [code] [paper] [v2-paper]
  3. MMGPT: MultiModal-GPT: A Vision and Language Model for Dialogue with Humans. [paper] [code]
  4. InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language. [paper] [code]
  5. VideoChat: Chat-Centric Video Understanding. [paper]
  6. Otter: A Multi-Modal Model with In-Context Instruction Tuning. [paper] [code]
  7. DetGPT: Detect What You Need via Reasoning. [paper]
  8. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. [paper]
  9. LLaVA: Large Language and Vision Assistant. [paper] [project] [blog]
  10. VisualGLM. [code]
  11. PandaGPT: One Model to Instruction-Follow Them All. [project]
  12. ChatSpot. [demo]

Datasets

  1. DataComp: In search of the next generation of multimodal datasets. [paper] [project]

Evaluation

  1. MME. [paper]
  2. Multimodal Chatbot Arena. [demo]

Some more influential repositories have also summarized work related to large models:

Contributions

Contributions are welcome! Anyone interested in this project can send a pull request, and I will list you as a contributor in this repo.

Citation

Please cite the repo if you find it useful.

@misc{chunjiang2023tobeawesome,
  author = {Chunjiang Ge},
  title = {Awesome-Foundation-Model-Papers},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/John-Ge/awesome-foundation-models}},
}
