Weighing the Triveritas
A third party puts the Triveritas itself to the test
A gentleman scientist by the name of Keruru put the Triveritas through its paces to see if it holds up under multi-model stress-testing on multiple subjects:
We took ten analytical essays from nine outlets — Phenomenal World, Bruegel, American Affairs, PIIE, Carnegie Endowment, Chartbook, Chatham House, Valdai Discussion Club, and UNCTAD — and scored each on three dimensions derived from Vox Day’s triadic epistemological criterion: logical validity (L), mathematical coherence (M), and empirical anchoring (E). The selection was by recency within each outlet, not by quality or perspective.
One human rater (not a trained economist, though with practical investment experience) scored all ten pieces blind. Then six AI models across four providers ran structured stress tests on each piece: DeepSeek R1 and Mistral Large on Leg 1 (logic), GPT-5.2 with reasoning and Qwen 3.5 on Leg 2 (maths), and GPT-5.2 with deep thinking, Gemini 2.5 Pro, and Perplexity on Leg 3 (empirical). Claude (Anthropic) was excluded from scoring because it served as the study’s composition and analysis interface — a limitation we document rather than hide.
The scoring is binary: Pass or Fail. This is deliberately crude. During the course of this study, Vox Day published the complete Veriphysics reference scales (0–100 per dimension) on his Substack, providing the granular scoring instrument that sits behind the binary framework. Binary scoring was the right choice for Phase II: it reduces coder disagreement and makes human-model comparison tractable for a first empirical test. The reference scales allow finer discrimination once the binary framework has been validated, which is precisely what this study does. The question for Phase II is not “how good is this essay?” but “does this dimension hold or not?” The scales answer the follow-up: “how badly does it fail, and in what specific way?”
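To make the Phase II scheme concrete, here is a minimal sketch of triadic binary scoring in Python. The class, its field names, and the all-three-legs-must-pass rule are illustrative assumptions on my part; the study itself reports only a Pass or Fail verdict per dimension.

```python
from dataclasses import dataclass

@dataclass
class EssayScore:
    """Binary verdicts on the three Triveritas dimensions (illustrative)."""
    logical: bool       # Leg 1: logical validity (L)
    mathematical: bool  # Leg 2: mathematical coherence (M)
    empirical: bool     # Leg 3: empirical anchoring (E)

    def holds(self) -> bool:
        # An essay holds only if all three independent legs pass;
        # a failure on any one leg is invisible from within the other two.
        return self.logical and self.mathematical and self.empirical

# Example: sound logic and well-anchored sources, but arithmetic that
# does not survive Leg 2 verification, so the essay fails overall.
essay = EssayScore(logical=True, mathematical=False, empirical=True)
print(essay.holds())  # False
```

Keeping the three verdicts as separate fields, rather than collapsing them into a single grade, is what preserves the independence the study set out to test.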
Total model runs: approximately 70 across the corpus (two models on Leg 1, two on Leg 2, and three on Leg 3 make seven runs per essay, times ten essays)…
The Triveritas framework does what it claims to do: it separates analytical writing into three independently testable dimensions and detects failures that single-dimension reading misses. The empirical validation confirms that logical validity, mathematical coherence, and empirical anchoring are genuinely independent — each has characteristic failure modes that the other two cannot detect from within their own domain.
The specific value of AI-assisted review is forensic numerical verification on Leg 2 and systematic source-checking on Leg 3. These are capabilities that most human readers do not possess and cannot acquire without tools. The framework makes explicit what good analytical readers do implicitly, and adds machine verification where human attention is structurally limited.
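As an illustration of what “forensic numerical verification” on Leg 2 amounts to, here is a minimal sketch: recompute a derived figure from the raw numbers an essay cites and flag any discrepancy beyond rounding tolerance. The function, the example figures, and the tolerance are hypothetical, not drawn from the study.

```python
def check_percent_change(old: float, new: float,
                         claimed_pct: float, tol: float = 0.5) -> bool:
    """Recompute a percentage change and compare it to the claimed one."""
    actual_pct = (new - old) / old * 100.0
    # Pass only if the claim matches recomputation within rounding tolerance.
    return abs(actual_pct - claimed_pct) <= tol

# Hypothetical example: an essay claims a 3.1% rise from 21.0 to 21.8.
# Recomputation gives roughly 3.8%, so the claim fails the coherence check.
print(check_percent_change(21.0, 21.8, claimed_pct=3.1))  # False
```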
The study also reveals the practical constraints of independent AI-assisted research: model instability, platform restrictions, payment infrastructure barriers, hardware limitations, and undisclosed usage caps on paid subscriptions. These are not minor inconveniences; they shape what research is possible and who can do it. The framework is designed to be replicable at minimal cost — but the cost is time, attention, and the willingness to work within constraints that commercial AI providers do not make transparent.
The data are complete. The patterns are stable. The framework works.
At this point, it should be readily apparent, if not entirely conclusive, that the Triveritas is one of the most significant advances in human epistemology since, well, somewhere between Aristotle and Newton. We don’t actually know how significant it will prove to be, but the fact that it defeated the Agrippan Trilemma, solved multiple problems hitherto believed to be impossible, and has now demonstrated that its framework works to the satisfaction of third parties does tend to indicate that it is of historic significance to science and philosophy alike.


Thank you for the link. Non-mathematical material is still numerical for date sequences and historical economics or logistics, and the question is then for the AI to construct this from first principles. Each AI has its own quirky weaknesses, and I use two per arm for anything serious. Get an AI to write the prompt: they do it better than you or I can. And push back. The final work is better for this internal stress-testing.
Question: I started trying to use the Triveritas to test historical events with Opus. It came back to me that no historical event could be scored higher than 76 because of the inability to prove historical events mathematically, something along those lines. Intuition tells me that something is wrong with that statement. Would you say that the Triveritas is useful when evaluating historical hypotheses, or is it mainly aimed at logical/scientific hypotheses? From what I read, it appears to me that it could be useful in scoring facts in really any field of reality.