Daniel Vila Suero
@dvilasuero
ML & data @huggingface
Joined January 2011
Posts
  • Pinned
    Introducing @huggingface Sheets: Excel meets AI and unstructured data. 📝 Run prompts and models over your data 🌐 Web search for accuracy and real-time information 🎯 Manual edits are used to improve generation 💯 Hundreds of open models and leading inference providers 1/2
  • Parsr: Transforms PDF, Documents and Images into Enriched Structured Data. Nice Python toolchain for cleaning, parsing and data extraction from documents. github.com/axa-group/Parsr #python #opensource #ocr
  • Yesterday, we released Notus-7B. Today, I want to share the process of building it. This is a great example of what the Open Source AI community can build together 🧵 👇
  • COSINE: Fine-tuning pre-trained LMs without any labeled data. Fine-tune models with weak supervision only (+ unlabeled data), label denoising via contrastive self-training. GitHub: github.com/yueyu1030/COSI… Paper: arxiv.org/abs/2010.07835 #python #opensource #NLProc #transformers
  • Rubrix: Python framework for Data-centric NLP. Build your own data collection and quality workflows mixing weak supervision, active learning & hand labeling. Works with any NLP library. Follow @rubrixml for updates. GitHub: github.com/recognai/rubrix #python #opensource #nlproc #ml
  • Rubrix: Python framework for Data-centric NLP. Build text classifiers directly from user rules & labeling functions. Use any weak supervision model (Snorkel, Flyingsquid, @PyTorch). github.com/recognai/rubrix Guide: rubrix.readthedocs.io/en/stable/guid… #python #opensource #NLProc #datascience
  • Rubrix: Python framework for data-centric NLP. Log model predictions for model monitoring, custom dashboards, and production data collection. Easy integration with the amazing @FastAPI library. GitHub: github.com/recognai/rubrix #python #opensource #MLOps #nlproc
  • skweak: A Python toolkit for weak supervision applied to NLP tasks. Cool library to label data programmatically for NER & text classification. Tightly integrated with @spacy_io. GitHub: github.com/NorskRegnesent… Paper: arxiv.org/abs/2104.09683 #python #OpenSource #NLProc #ML
  • If you are GPU poor, make sure you become data rich. It will pay off. More on this in a few days.
  • Yesterday, I shared how we built Notus with the UltraFeedback dataset. Today I want to share how you can build and curate your own UltraFeedback datasets for fine-tuning custom LLMs. Using distilabel, our open-source framework and @argilla_io 🧵
  • WeightWatcher: Tool for predicting the accuracy of Deep Neural Networks. Is your model over-fitted/parameterized? Data-free model monitoring. Supports @TensorFlow, @PyTorch and 🤗 transformers. Follow @CalcCon for updates. GitHub: github.com/CalculatedCont… #python #Opensource #ml
  • Fine-tuning a text classifier for your own use case? Iteratively build a training set & fine-tune a Hugging Face sentiment classifier with @rubrixml. Tutorial: rubrix.readthedocs.io/en/master/tuto… GitHub: github.com/recognai/rubrix #python #nlproc #ml #datascience #transformers
  • Rubrix: Python framework for data-centric NLP. Leverage zero-shot models for bootstrapping data annotation & weak supervision. Works with any framework (Flair TARS, Hugging Face). GitHub: github.com/recognai/rubrix #python #opensource #nlproc See Flair's zero-shot NER example 👇