How Hallucinations Cause Lawyers to Accidentally Commit Malpractice with AI - and How to Prevent It
Why Prompting for Transparent Reasoning and Verification Is Non‑Negotiable
Used carelessly, AI can mislead lawyers into filing inaccurate briefs, making unsupported arguments, or relying on false factual premises, all of which carry real risks of malpractice and professional discipline.
The danger is not hypothetical. AI does not understand truth, relevance, or professional responsibility. It understands patterns. When those patterns are wrong, the consequences fall not on the developers of AI but on the lawyer who relied on the output.
Therefore, the only defensible way to use AI in your legal practice is to force it to act less like an oracle and more like a Daubert‑compliant expert witness, one required to reveal its methodology, expose its assumptions, and demonstrate that its conclusions rest on reliable reasoning.
This article describes two prompting techniques that do exactly that: chain‑of‑thought prompting and fact‑check prompting. Used together, they convert AI from a hallucinating prediction machine into a transparent and accountable reasoning partner. For lawyers working with forensic evidence, expert analysis, or complex records, these techniques are indispensable risk‑management tools.
Chain-of-Thought Prompting: Forcing AI to Think Out Loud
A chain‑of‑thought (CoT) prompt is a way of instructing an AI to show its reasoning instead of jumping straight to an answer. You’re telling the model, in effect:
“Don’t just give me the conclusion—walk me through how you got there.”
This makes the output more transparent, easier to verify, and more reliable, which is why it’s often compared to requiring an expert to explain their methodology.
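For lawyers who script their AI workflows, the sketch below shows one way a chain‑of‑thought instruction might be wrapped around any legal question. It is a minimal illustration in Python; the function name and the wording of the instruction are my own assumptions, and you should adapt both to your tools, your jurisdiction, and your facts.

```python
# Minimal sketch: wrap a legal question in a chain-of-thought instruction.
# The wording below is illustrative, not a prescribed formula.

def chain_of_thought_prompt(question: str) -> str:
    """Wrap a question so the model must expose its reasoning step by step."""
    return (
        "Do not state a conclusion first. Reason step by step:\n"
        "1. Identify the facts you are relying on.\n"
        "2. Identify the rule or authority each step depends on.\n"
        "3. State each inference and its evidentiary significance.\n"
        "4. Only then state your conclusion and your confidence in it.\n\n"
        f"Question: {question}"
    )

print(chain_of_thought_prompt(
    "Does a 0.027 variance between duplicate breath samples support "
    "a suppression argument under Michigan Administrative Code R 325.2655?"
))
```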
Consider a DUI case involving COBRA breath-test data generated by an Intoxilyzer 9000, Michigan’s exclusive breath testing instrument for DUI arrests. The dataset reflects a reported result of 0.143 g/210 L, with the duplicate breath samples differing by 0.027. A conventional prompt such as “Does this data support a suppression argument?” often yields a tidy answer divorced from a useful explanation. The model may announce that the test is unreliable or inadmissible without exposing the assumptions underlying that conclusion.
A chain-of-thought prompt imposes discipline:
“Explain, step by step, how a 0.027 variance between duplicate breath samples affects reliability under Michigan Administrative Code R 325.2655 and supporting case law. Identify each inference and its evidentiary significance.”
Under this constraint, the AI must begin with the raw numbers, note that the 0.027 variance exceeds Michigan’s ±0.02 tolerance for duplicate samples, identify that one sample logged a zero-volume air blank, recall that R 325.2655(1)(d) requires an air blank before each test, and connect those deviations to Michigan case law holding that procedural noncompliance may support a motion for inadmissibility. Only after laying this groundwork can the model reach a conclusion regarding reliability, expert testimony, or trial strategy.
This methodology scales to more complex forensic questions. When analyzing instrument reliability across an entire operational dataset, a structured chain-of-thought prompt might require:
“Analyze this Intoxilyzer 9000 data and show your reasoning at each step: The dataset shows 150 total tests with 15 unacceptable samples, 14 deficient samples, 4 RFI detections, 7 other interferents, 3 operator-stopped tests, and 1 refusal. First, count the total exceptions and show your addition. Second, calculate the exception rate by dividing total exceptions by 150 and show this math. Third, reason through the implications: if this instrument fails at this rate, what does that tell us about trusting the results it does produce? Explain your complete reasoning process, not just your conclusion.”
Requiring the AI to show its work, literally displaying “15 + 14 + 4 + 7 + 3 + 1 = 44 exceptions; 44/150 = 29.3%”, allows the lawyer to verify each calculation, challenge each logical leap, and understand exactly where judgment enters the analysis. The prompt transforms the AI from an oracle into a collaborator whose reasoning can be inspected, questioned, and refined.
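If you want to confirm the model’s arithmetic without taking its word for it, a few lines of code, or a spreadsheet, will do. The sketch below is a minimal illustration using the hypothetical counts from the example above; it assumes nothing about any particular instrument or dataset.

```python
# Minimal sketch: independently check the arithmetic the prompt asks the
# model to display. The counts are the hypothetical figures from the
# example dataset above, not data from a real instrument.

exceptions = {
    "unacceptable samples": 15,
    "deficient samples": 14,
    "RFI detections": 4,
    "other interferents": 7,
    "operator-stopped tests": 3,
    "refusals": 1,
}
total_tests = 150

total_exceptions = sum(exceptions.values())      # 15 + 14 + 4 + 7 + 3 + 1 = 44
exception_rate = total_exceptions / total_tests  # 44 / 150 = 0.2933...

print(f"Total exceptions: {total_exceptions}")
print(f"Exception rate: {exception_rate:.1%}")   # 29.3%
```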
What matters is that the reasoning is visible. Chain-of-thought prompting creates transparency-based trust. It allows the lawyer to audit the analysis, isolate weak assumptions, and decide where additional evidence or expert input is required. The AI’s reasoning becomes inspectable rather than mystical.
Fact-Check Prompting: Making AI Prove Its Claims
Transparency alone, however, is not enough. A witness who reasons clearly but inaccurately is still unreliable. Fact-check prompting supplies the second safeguard: verification.
A fact‑check prompt tells an AI to verify its own statements before answering, rather than presenting information as if it’s automatically correct. You’re essentially instructing the model:
“Double‑check what you’re saying. If something seems uncertain, say so.”
This reduces hallucinations and makes the answer more cautious, reliable, and transparent. It shifts the model’s role from advocate to auditor. The question becomes not “Does this sound right?” but “What authority supports this?”
Imagine submitting the following paragraph to a stand-alone fact-check prompt:
“Under Michigan’s administrative rules, each test requires a fifteen-minute observation period, duplicate samples within a specified range in g/210 L (see R 325.2655), and automatic air-blank purges before and after each sample. The device is self-calibrating and does not require periodic accuracy checks. A missing purge cycle automatically renders the test inadmissible under Michigan law.”
A properly framed fact-check prompt requires the AI to classify each sentence as accurate, partially accurate, or unsupported, and to cite primary legal authority or peer-reviewed science for every determination. In doing so, the model confirms the instrument used and the duplicate tolerance requirements, rejects the claim that the device requires no periodic calibration, and corrects the assertion that a missing purge cycle results in automatic inadmissibility by citing applicable case law and Michigan administrative rules.
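As with chain-of-thought prompting, the fact-check instruction can be templated so it is applied consistently to every draft passage. The sketch below is a minimal illustration; the function name, category labels, and wording are assumptions to adapt, not a prescribed formula, and the draft text is whatever passage the lawyer wants vetted.

```python
# Minimal sketch: assemble a sentence-by-sentence fact-check prompt using
# the classification categories described above.

def fact_check_prompt(draft: str) -> str:
    """Ask the model to verify a draft passage sentence by sentence."""
    return (
        "Fact-check the passage below one sentence at a time. For each "
        "sentence, classify it as ACCURATE, PARTIALLY ACCURATE, or "
        "UNSUPPORTED, and cite the primary legal authority or peer-reviewed "
        "science that supports your classification. If no authority exists, "
        "say so explicitly rather than guessing.\n\n"
        f"Passage:\n{draft}"
    )

draft_paragraph = (
    "Under Michigan's administrative rules, each test requires a "
    "fifteen-minute observation period and duplicate samples within a "
    "specified range."
)
print(fact_check_prompt(draft_paragraph))
```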
This methodology extends to forensic data analysis. Consider a breath test challenge based on instrument exception rates. A lawyer might receive AI-generated analysis stating: “This instrument’s 29.3% exception rate is three times the industry standard of less than 10%, making results unreliable.” A fact-check prompt demands verification:
“Fact-check this claim sentence by sentence. First: What is the source for the ‘10% industry standard’? Cite the manufacturer specification, peer-reviewed study, or forensic science organization guideline that establishes this benchmark. Second: Is the calculation accurate? Show the math: 44 exceptions divided by 150 tests equals what percentage? Third: What authority supports the conclusion that an elevated exception rate affects admissibility or the weight of the evidence? Cite Michigan case law or administrative rules. Classify each claim as verified, partially verified, or unsupported.”
Under this constraint, the AI must produce citations (perhaps a CMI, Inc. technical manual specifying acceptable performance parameters), show the actual arithmetic (44 ÷ 150 = 0.2933, or 29.3%), and identify case law establishing that evidence of instrument malfunction affects evidentiary weight. If no authoritative source exists for the “10% standard,” the AI must acknowledge the claim as an expert opinion rather than established fact, and the lawyer knows to obtain supporting affidavit testimony.
This process builds evidence-based trust. The AI’s output is no longer persuasive by tone alone; it is constrained by authority. Unsupported assertions are exposed. Overstatements are tempered. Language becomes litigation-ready.
Reason, Then Verify: The Lawyer’s Sequence
Used together, chain‑of‑thought and fact‑check prompts mirror the basic way lawyers analyze problems: explain your reasoning, then verify each step. One shows how the answer was reached. The other checks whether the answer holds up.
This two‑step approach fits naturally with the requirements of Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993) and Michigan Rule of Evidence 702. Courts don’t rely on how confident an expert sounds. They look at whether the expert’s reasoning is reliable and based on sound methods applied to the facts.
AI that is required to reason openly and check its own claims starts to look less like a black box and more like an expert who could survive cross‑examination.
Teaching Machines the Lawyer’s Craft
At a deeper level, these techniques show that trustworthy AI is about accountability. Chain‑of‑thought prompting mirrors legal reasoning by making implicit logic explicit. Fact‑check prompting mirrors legal research by anchoring every claim to verifiable authority. Together, they train the model to operate within the epistemic rules of the legal system.
The irony is that these prompts discipline the lawyer as much as the model. They force us to slow down, surface our own assumptions, and demand from AI the same rigor we expect from ourselves and our experts. In this way, AI sharpens rather than diminishes our professional judgment.
Trust as a Discipline
Trustworthy AI isn’t something software automatically provides—it’s something lawyers create through disciplined interaction. When we require AI to reason aloud and verify its claims, its output becomes far less prone to hallucination and far easier to check. The antidote to unreliable AI isn’t fear or prohibition, but the same discipline the law has always demanded: show your work, then prove it.
About the Author
Patrick Barone is a Michigan criminal defense attorney focused on DUI litigation, forensic evidence, and the responsible use of artificial intelligence in legal practice. He writes for lawyers who want practical, disciplined ways to integrate AI into real cases without compromising professional judgment or ethical obligations.
More about his work can be found at https://www.baronedefensefirm.com.
Invitation to Engage
If you use AI in your practice, how do you test whether it deserves your trust? Do you require it to show its reasoning, verify its sources, both, or neither? If you have developed techniques that increase your confidence in AI-assisted legal analysis—especially in evidentiary or forensic contexts—I invite you to share your experience in the comments.


