Argilla

1,611 posts

@argilla_io

Making AI data go brrrr (acquired by 🤗 Hugging Face)

World

github.com/argilla-io

Joined August 2021

Following

4,282

Followers

Pinned
Argilla
@argilla_io
Jun 13, 2024
Today is a big day for Argilla and the Open Source AI community: we’re joining @huggingface 🤗! Time to double down on community and data-centric, open-source AI. Read all about it and ask questions:
@dvilasuero on Hugging Face: "Today is a huge day in Argilla’s history. We couldn’t be more excited...
From huggingface.co
41K
Argilla
@argilla_io
Dec 27, 2023
🚀 Open-source AI strikes again! Announcing Notux 8x7B, a fine-tune of Mixtral Instruct with high-quality chat data and DPO. Notux is now the top-ranked MoE on the Open LLM Leaderboard. huggingface.co/argilla/notux-…
71K
Argilla
@argilla_io
Feb 26, 2024
🚀🧙🏼‍♂️ Introducing OpenHermesPreferences: the largest open dataset for RLHF & DPO. Built together with the @huggingface H4 team, it's a dataset of 1M preferences built on top of @teknium's amazing dataset. huggingface.co/datasets/argil… Let's dive in! 🧵
86K
Argilla
@argilla_io
Nov 21, 2021
Build a news classifier from scratch with weak supervision: 1. Programmatically label 38,000 examples with rules and Snorkel. 2. Train a downstream classifier with scikit-learn that reaches a macro-averaged F1 score of 0.81. Tutorial link below 👇 #nlproc #ml #datascience #opensource
Argilla
@argilla_io
Dec 1, 2023
🔥Open-source, open-science, and data curation for the win! Meet Notus 7B, a new LLM tuned with DPO on a new curated UltraFeedback dataset, surpassing Zephyr and Claude 2 on AlpacaEval. Built on the shoulders of giants: 🙌@huggingface Alignment Handbook
Meet Notus-7B: Data Curation and Open Science go a long way in shaping AI's future
From argilla.io
90K
Argilla
@argilla_io
Jan 10, 2024
🔥 More is less for DPO: high quality matters! 📢 Dropping our first open dataset and LLM of the year: 💾 Meet distilabel Orca Pairs DPO, an improved version of the now-famous dataset from @intel. 🏛️ And a new OpenHermes model that outperforms baselines with 54% fewer DPO pairs 🧵
57K
Argilla
@argilla_io
Feb 10, 2024
🔥 Introducing a new open dataset for the Open Source AI community: OpenHermes2.5-dpo-binarized-alpha built atop the amazing dataset by @teknium huggingface.co/datasets/argil… This time we use OSS models for everything, even for the preference step! 🧵
28K
Argilla
@argilla_io
Nov 28, 2021
Training a text classifier without labelled data using @PyTorch: end-to-end weak supervision with Weasel & @huggingface transformers. Guide and more details below 👇 #nlproc #datascience #python #opensource
Argilla
@argilla_io
Feb 7, 2023
🚀Data labeling from the @huggingface Hub is here. No more excuses to build great NLP datasets! argilla.io/blog/launching…
110K
Argilla
@argilla_io
Feb 27, 2024
🤖 Yesterday, we shared a large AI feedback dataset. 👩‍💻 Today, @argilla_io & @huggingface are thrilled to release a high-quality human feedback dataset, built with and for the community! 10K Prompts Ranked: 14K+ human ratings from 300+ contributors! huggingface.co/datasets/DIBT/…
30K
Argilla
@argilla_io
Jan 31, 2024
🚀 The OSS AI community needs more open datasets for improving LLMs: 🎁 Excited to ship a new open DPO dataset for boosting chat models: ⚗️ distilabel capybara-dpo, a multi-turn preference dataset built atop the awesome dataset by @ldjconfirmed huggingface.co/datasets/argil… 🧵
17K
Argilla
@argilla_io
Jan 7, 2024
⚗️ distilabel, our OSS framework for building LLM fine-tuning & alignment datasets, just crossed 300 stars! Let's do a quick overview of its current features.
GitHub - argilla-io/distilabel: Distilabel is a framework for synthetic data and AI feedback for...
From github.com
15K
Argilla
@argilla_io
Feb 14, 2024
We're building a high-quality prompt dataset together with the community. The result will be published under an open, commercial-friendly license that anyone can use to build eval, SFT, and DPO datasets. We need your help! This is how simple it is to contribute: huggingface.co/spaces/DIBT/pr…
21K
Argilla
@argilla_io
Jan 4, 2024
🔦 In this paper, Apple introduced a method for efficient LLM inference on devices with limited memory, reporting inference speedups of 4-5x on CPU and 20-25x on GPU compared with naive loading. Argilla's GitHub: buff.ly/3tmZOmm distilabel: buff.ly/41DzWzk #nlproc #llms
12K