Datawizz

Data Infrastructure and Analytics

San Francisco, California · 896 followers

Your agent workforce, coming soon.

About us

Building horizontal AI agents to automate the manual, repetitive prep work behind your job.

Website
http://www.datawizz.ai
Industry
Data Infrastructure and Analytics
Company size
2-10 employees
Headquarters
San Francisco, California
Type
Privately Held


Updates

  • Model training 🤝 runtime

    Today we're launching Continuous Learning in Datawizz.

    The typical specialized model lifecycle is: collect data, fine-tune, eval, deploy, move on. Then a few months later a better base model drops, the use case evolves, or you're sitting on far more production data than you started with. So you start over and rebuild the whole pipeline. The work doesn't compound.

    Continuous Learning makes that cycle compounding instead of episodic. Traditionally, training and runtime are separate environments. That separation makes it hard to connect runtime feedback to training signals or to replicate real-world distributions in synthetic training data. To enable continuous learning, you need a platform that unifies both. Datawizz collapses the boundary between runtime and training time:

    - Runtime experience becomes training data. Requests, traces, and outcomes turn into labels, preference pairs, and reward signals.
    - Failures become gradients, not tickets. Model mistakes and human overrides feed reinforcement learning and fine-tuning loops.
    - Evaluation runs on real distributions. Changes are gated against live traffic patterns instead of static test sets.
    - Fine-tuning is signal-driven. Updates happen when prompting saturates or regressions appear.

    If you're running specialized models in production and this resonates, I'd love to chat. Link to more details in the first comment.
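As a rough illustration of the "failures become gradients" idea above, here is a minimal Python sketch of turning runtime traces with human overrides into preference pairs. The `Trace` structure and field names are hypothetical, not Datawizz's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trace:
    """One runtime request/response, plus any human correction."""
    prompt: str
    model_output: str
    human_override: Optional[str] = None  # set when a reviewer corrected the model

def traces_to_preference_pairs(traces: list[Trace]) -> list[dict]:
    """Turn corrected traces into (chosen, rejected) preference pairs.

    The human correction becomes 'chosen' and the overridden model output
    becomes 'rejected'; uncorrected traces carry no preference signal.
    """
    pairs = []
    for t in traces:
        if t.human_override is not None:
            pairs.append({
                "prompt": t.prompt,
                "chosen": t.human_override,
                "rejected": t.model_output,
            })
    return pairs

traces = [
    Trace("Classify: 'refund not received'", "billing"),
    Trace("Classify: 'app crashes on login'", "billing", "technical"),
]
pairs = traces_to_preference_pairs(traces)
print(pairs)  # only the corrected trace yields a pair
```

Pairs in this shape can feed standard preference-tuning pipelines such as DPO.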

  • 🚀 Datawizz now supports Tinker for LLM fine-tuning!

    Tinker is a training API from Thinking Machines that handles distributed GPU infrastructure at scale. We've integrated it into Datawizz to give you enterprise-grade training power without the usual complexity. What you get:

    ✅ Zero setup - No GPU configuration, no infrastructure headaches. Select Tinker, click train, done.
    ✅ Full control - Configure LoRA parameters, batch sizes, learning rates, warmup steps: everything you need for professional training.
    ✅ Real-time monitoring - Watch your loss curves and custom evaluation metrics update live in your dashboard.
    ✅ Custom evaluators - Add your own evaluation logic directly in the UI. It runs automatically during training so you can track what actually matters.
    ✅ Train to deploy - Once training completes, deploy your fine-tuned model to Datawizz Serverless with one click. No exporting, no uploading, no extra steps.

    Enterprise-grade training infrastructure, consumer-grade simplicity.

    📖 Read the full guide: https://lnkd.in/gDZu6-W4

    Ready to fine-tune? Try it today! 🎯 #AI #MachineLearning #LLM #FineTuning

  • Datawizz reposted this

    Iddo Gino built and exited RapidAPI - a platform that served 10 million developers, with at least one team at 75% of Fortune 500 companies using and paying for it.

    After the exit, he tried retiring. Got bored in weeks. Tried angel investing. Hated being the backseat-driver investor he'd always despised as a founder. But reviewing pitches revealed a pattern that pulled him back in: nearly every AI agent company assumed models would simultaneously get 1-2 orders of magnitude better AND 1-2 orders of magnitude cheaper.

    "Wait, so everybody's basically banking for agents to work and for AI to work at scale on models getting significantly better and cheaper at the same time. And I just don't see there is any data to support that both of these can happen simultaneously."

    That was his "Big Short" moment. So he founded Datawizz to solve the unit economics problem through continuous reinforcement learning infrastructure. But building his second company revealed lessons that contradicted his first success.

    At RapidAPI, he had massive distribution - 10 million developers, 75% of the Fortune 500. The synergy story was compelling: developers discover APIs on the marketplace, then their companies buy the enterprise product. The reality? "There were actually two very different businesses and the buying journey didn't actually go from one to the other." He could always find anecdotes of teams using RapidAPI before doing an enterprise POC. The data looked good. But the critical question he should have asked earlier: "Is self-service really the driver for why we're winning deals, or is it a nice-to-have contributor?" That narrative belief capped how big RapidAPI could scale.

    For Datawizz, he inverted the approach:
    → Spent months doing pure discovery, not sales disguised as research
    → Aimed for 29 of 30 minutes asking questions before explaining what they build
    → Qualified prospects by asking: "If we improved accuracy by 20%, how impactful is that?"
    → Disqualified companies who said "it's good enough" - even if they had budget
    → Focused on companies spending six figures monthly on LLM inference
    → Targeted the first $10M at believers in continuous learning, not skeptics

    The most valuable discovery calls weren't the ones that converted to POCs. They were conversations with people who thought about the problem completely differently. In hot markets with abundant budgets, it's dangerously easy to collect false positive feedback while building the wrong thing.

    His prediction: by 2030, 50-60% of AI tokens will flow through specialized, fine-tuned models instead of frontier models.

    Listen to the full episode of BUILDERS with Iddo Gino to learn how he's approaching category creation differently the second time: https://lnkd.in/ebA75-Gm

  • Datawizz reposted this

    View profile for Iddo Gino


    Every time you call GPT-5 or Claude-4.5, hidden parameters control creativity, reasoning depth, and token selection. These dramatically impact results, yet most people never touch the defaults.

    We've helped companies improve LLM performance by 20%+ just by tuning inference parameters like temperature, top_p, and reasoning_effort. Here's why you should too.

    Each use case demands its own parameter mix: creative writing thrives on higher temperature, while data extraction needs predictability. Complex reasoning requires more reasoning_effort; simple Q&A doesn't - it just adds latency and cost.

    That's why evaluating different parameter configurations is critical. Datawizz lets you test and optimize these settings without touching code. Watch the video below and check our blog post (first comment) to learn more.
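The per-use-case guidance above can be sketched as a table of inference presets. The preset values here are illustrative assumptions in the spirit of the post, not tuned recommendations; real values should come from evaluation.

```python
# Hypothetical per-use-case inference presets: creative work tolerates
# randomness, extraction needs determinism, deep reasoning pays for itself
# only on hard tasks.
PRESETS = {
    "creative_writing":  {"temperature": 0.9, "top_p": 0.95, "reasoning_effort": "low"},
    "data_extraction":   {"temperature": 0.0, "top_p": 1.0,  "reasoning_effort": "none"},
    "complex_reasoning": {"temperature": 0.2, "top_p": 0.9,  "reasoning_effort": "high"},
}

def params_for(use_case: str) -> dict:
    """Look up a preset, falling back to conservative deterministic defaults."""
    return PRESETS.get(
        use_case,
        {"temperature": 0.0, "top_p": 1.0, "reasoning_effort": "none"},
    )

print(params_for("data_extraction"))
```

Centralizing presets like this makes it easy to A/B different configurations per task rather than shipping one global default.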

  • Use Datawizz to determine the GPT-5.1 reasoning effort level you should use!

    OpenAI just released GPT-5.1 in the API, giving you more control over the model's reasoning effort - from the default no-reasoning mode to low, medium, and high reasoning. But when should you use reasoning, and how much? We benchmarked GPT-5.1 across different reasoning levels and discovered key insights.

    Don't overthink it - there are diminishing returns. In our tests, performance jumped initially, then plateaued. On college-level CS questions, accuracy improved from 83% (no reasoning) to 98% (low reasoning), but high reasoning only added 1% more.

    The cost of thinking is steep. Reasoning generates "thinking" tokens that multiply latency and cost. In our CS benchmark:
    - No reasoning: 16 output tokens
    - Low reasoning: ~10x more tokens (125 avg. tokens)
    - High reasoning: ~70x more tokens (1,100 avg. tokens)
    That's a ~70x cost increase from none to high.

    Speed matters too. No reasoning averaged 0.94 seconds. High reasoning took 11.82 seconds - a 12.5x slowdown.

    The takeaway: reasoning helps, but excessive reasoning carries heavy penalties. Test different levels to find your optimal cost-benefit balance. Watch the video below to see how Datawizz helps you test reasoning levels for your task in just 5 minutes! Links to our GPT-5.1 evals are in the first comment.
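The cost-benefit arithmetic in the post can be reproduced directly from the quoted benchmark figures. This is just a sketch over those published numbers (the 99% figure is the stated 98% plus the quoted 1% gain), not new measurements.

```python
# Average output tokens, latency, and accuracy per reasoning level,
# taken from the benchmark figures quoted in the post above.
BENCH = {
    "none": {"tokens": 16,   "latency_s": 0.94,  "accuracy": 0.83},
    "low":  {"tokens": 125,                      "accuracy": 0.98},
    "high": {"tokens": 1100, "latency_s": 11.82, "accuracy": 0.99},
}

token_blowup = BENCH["high"]["tokens"] / BENCH["none"]["tokens"]        # ~69x
slowdown = BENCH["high"]["latency_s"] / BENCH["none"]["latency_s"]      # ~12.6x
accuracy_gain = BENCH["high"]["accuracy"] - BENCH["low"]["accuracy"]    # +1%

print(f"{token_blowup:.0f}x tokens, {slowdown:.1f}x slower, "
      f"+{accuracy_gain:.0%} accuracy from low to high")
```

The ratios match the post's rounded ~70x cost and ~12.5x latency figures: a percentage-point of accuracy at the top end costs nearly two orders of magnitude in tokens.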

  • Level up your prompting with Liquid Templates - now supported on Datawizz.

    Something interesting is happening at the prompt layer of LLM applications. What started as simple variable substitution has evolved into complex conditional flows, dynamic context management, and multi-stage reasoning - all happening in the prompt itself. We're essentially programming in natural language plus template logic now.

    This makes sense when you think about it. Prompts are where your domain knowledge lives. They're where you encode business rules, handle edge cases, and manage context. As agents get more sophisticated, more logic naturally migrates to this layer.

    We just added Liquid template support to complement our Mustache templates at Datawizz, allowing for more intricate prompt and context engineering right inside Datawizz. Check it out here: https://lnkd.in/g4yCJB7E #LLMs #PromptEngineering #AI #TechnicalArchitecture
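To make "conditional flows in the prompt itself" concrete, here is a sketch of the kind of logic Liquid's `{% if %}` and `{% for %}` tags express, written in plain Python for illustration (the field names and prompt text are invented, and the Liquid snippet in the comment is standard Liquid syntax, not a Datawizz-specific dialect).

```python
# Roughly equivalent Liquid template:
#   You are a support agent.
#   {% if user.is_premium %}Prioritize this ticket.{% endif %}
#   {% for doc in context_docs %}{{ doc }}
#   {% endfor %}

def build_prompt(user: dict, context_docs: list[str]) -> str:
    """Assemble a prompt with conditional rules and injected context."""
    lines = ["You are a support agent."]
    if user.get("is_premium"):           # business rule encoded at the prompt layer
        lines.append("Prioritize this ticket.")
    lines.extend(context_docs)           # dynamic context management
    return "\n".join(lines)

prompt = build_prompt({"is_premium": True}, ["Refund policy: 30 days."])
print(prompt)
```

Moving this branching into a template engine like Liquid keeps the logic editable alongside the prompt text instead of buried in application code.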

  • Datawizz reposted this

    View profile for Iddo Gino


    After taking some time off post-Rapid, I'm excited to share what I've been up to since: Datawizz! We've raised a $12.5M Seed led by HUMAN CAPITAL to make AI 10x cheaper, 2x more accurate, and 15x faster by transitioning from LLMs to SLMs.

    AI is eating the world. But unit economics are eating AI. Looking at the fastest-growing AI products, they all share two traits: growing fast, and painful inference bills. General-purpose LLMs are just too expensive to run.

    A big reason for that is that we train LLMs to be good at everything - answer any question, be an expert on any topic. The big labs dub this "generalisation", but for real-world applications it is unnecessary. In reality, many AI applications need models to be experts in one thing - and do that thing extremely well. Your coding model doesn't need to memorize ancient recipes for Garum sauce.

    This is where Datawizz comes in - we sit between the AI applications and the models, and automatically create smaller (100x-1,000x) specialized models to handle specific aspects of your work. By focusing the model and combining industry data in the distillation process, we end up with models that beat SOTA LLMs at a fraction of the cost.

    We created Datawizz to make AI specialized and scalable. We're early in the journey, but have already been able to save companies 90%+ on their inference bills and speed up their apps by 10x.

    Excited to build better AI platforms? Join the Datawizz team (link in first comment). Thanks for the support: 91 Ventures, Mythos Ventures, Valyrian, BGV and others :-)

  • The death of RAG? Do we still need RAG with larger contexts? Short answer: yes, but a lot less than we used to.

    - RAG was broadly necessary when model contexts were very limited, so you had to send only the most relevant info.
    - New models have much larger context windows AND are much better with larger contexts. Further, with prompt caching and other methods, larger contexts are more economically feasible now.
    - This means that for more static content, you can pass it all in the context window, even if there's a lot (100k+ tokens).
    - So for instance, if you are building an agent and want to give it access to platform docs, you can just put everything in the context (rather than use RAG to retrieve specific information).
    - For bots that handle user-specific data, we still technically need "RAG" - we need to retrieve all the user-specific information. But we may not need full-blown RAG with similarity search and vectorized content - just static retrieval. You can just "dump" all the user data into the prompt.

    RAG might not be "dead" yet, but its default status is. In 2025, start with the simplest viable architecture: if your knowledge base fits comfortably in 200k-1M tokens, skip the retrieval stack and lean on prompt caching. Add retrieval only when scale, freshness, latency, or privacy truly require it. The result is less infrastructure, fewer failure modes, and often better answers. Read our full post:
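The "simplest viable architecture" heuristic above can be sketched as a one-function router. The 4-characters-per-token estimate and the 200k budget are rough assumptions for illustration; a real system would use the model's tokenizer and its actual context limit.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English)."""
    return len(text) // 4

def choose_architecture(knowledge_base: str, context_budget: int = 200_000) -> str:
    """Pick the simplest viable architecture: if the corpus fits in the
    context window, skip the retrieval stack and rely on prompt caching;
    otherwise fall back to retrieval."""
    if estimate_tokens(knowledge_base) <= context_budget:
        return "full-context"  # put everything in the prompt
    return "rag"               # too big: retrieve relevant chunks

print(choose_architecture("A small, static docs corpus."))
```

Freshness, latency, and privacy constraints from the post would be extra inputs to this decision; token count alone is only the first gate.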

  • Are newer AI models hallucinating MORE? Hallucinations have plagued AI models since the first research LLMs were introduced, but many assumed this was a technical bug that would get solved in newer versions. New reporting from OpenAI suggests that newer models - especially reasoning models - actually hallucinate more, not less.

    It is apparent that hallucinations are not a "bug" per se, but a byproduct of the way LLMs are built and trained. It's unclear that the bigger-and-better approach to model scaling will have any positive impact on hallucinations in the near future.

    It turns out hallucination reduction is where Specialized Language Models shine. By fine-tuning SLMs with custom, human-labeled and verified data (RLHF), our customers have been able to meaningfully reduce hallucinations in their AI applications. Platforms like Datawizz help companies build up datasets of past LLM interactions. With human labeling, we can identify hallucinations and tune better specialized models to avoid them.

    If you are dealing with hallucinations, check out our blog post on using RLHF - alongside 5 other methods - to reduce LLM hallucinations. Link in the first comment.
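As a minimal sketch of the labeling workflow described above, here is how human hallucination verdicts on past interactions might be filtered into a supervised fine-tuning set. The record shape is hypothetical, not Datawizz's actual data model.

```python
# Hypothetical labeled records: each past LLM interaction plus a human
# verdict on whether the output hallucinated.
interactions = [
    {"prompt": "Capital of France?", "output": "Paris", "hallucinated": False},
    {"prompt": "Cite the 2023 ruling", "output": "Case #99-123", "hallucinated": True},
]

def to_finetune_set(records: list[dict]) -> list[dict]:
    """Keep only human-verified outputs as supervised fine-tuning examples.

    Hallucinated outputs are excluded here; they could instead serve as
    'rejected' responses in a preference-tuning dataset.
    """
    return [
        {"prompt": r["prompt"], "completion": r["output"]}
        for r in records
        if not r["hallucinated"]
    ]

dataset = to_finetune_set(interactions)
print(dataset)
```

The tuning signal comes entirely from the human labels, which is what distinguishes this from training on raw, unverified production logs.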



Funding

Datawizz: 2 total rounds

Last round: Seed, US$12.5M

See more info on Crunchbase