Very excited to announce BeyondWeb, @datologyAI’s synthetic pretraining data generation paradigm. BeyondWeb is a rephrasing-based approach that substantially outperforms existing public synthetic pretraining data baselines, and is a core part of our curation pipeline.