Ask a general-purpose LLM about a clinical trial and you will get a fluent paragraph that weaves together registry data, published results, mechanism-of-action commentary, and speculative comparisons. It reads well. The problem is that a reader cannot tell which sentences came from ClinicalTrials.gov, which came from a PubMed abstract, and which the model invented to make the paragraph flow.
In a consumer chatbot, this is fine. In a system used by researchers evaluating competitive landscapes, clinicians reviewing trial options, or analysts writing regulatory submissions, it is dangerous. The failure mode is not "the answer is wrong" but "the answer is right in some parts and wrong in others, and the reader cannot tell which is which."
The answer-typing contract
ClariTrial enforces a structural convention in every response that cites tool data. The system prompt requires the model to organize its reply under three headings:
Facts: direct data pulls from tools. NCT IDs, phases, enrollment numbers, SQL row fields, PubMed PMIDs. Each fact block is tied to a named source (ClinicalTrials.gov, AACT preset name, PubMed PMID).
Summary: aggregates the model computed from those facts. Counts, sorted lists, "3 of 5 trials are Phase 2." No new claims beyond what the facts support.
Interpretation: mechanism hypotheses, tradeoffs, competitive positioning, "what this might mean" language. Clearly labeled so the reader knows they are entering inference territory.
This is not a suggestion in the prompt. It is described as "required for material claims," and the system checks compliance after every response.
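In prompt form, the contract might look something like the following. The wording is an assumption on my part: the article quotes only the phrase "required for material claims" and describes the three headings, not ClariTrial's actual prompt text.

```python
# Hypothetical prompt fragment illustrating the answer-typing contract.
# The heading names and source types come from the article; the exact
# phrasing is invented for illustration.
ANSWER_TYPING_INSTRUCTION = """\
When your reply cites tool data, organize it under three headings.
This structure is required for material claims.

Facts: direct data pulls from tools (NCT IDs, phases, enrollment
  numbers, SQL row fields, PubMed PMIDs). Tie each fact block to a
  named source: ClinicalTrials.gov, an AACT preset name, or a PubMed PMID.

Summary: aggregates computed from those facts (counts, sorted lists).
  Make no new claims beyond what the facts support.

Interpretation: hypotheses, tradeoffs, competitive positioning.
  Label this section clearly as inference.
"""
```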
Post-response compliance checking
After the model finishes generating, a lightweight check scans the response for the expected headings. When tools were called (indicating the model had access to real data) but the Facts heading is absent, the system flags the response as non-compliant. The trace panel displays a warning:
"Facts/Summary/Interpretation headings not detected. Tool data was present; verify claims against the cited sources."
This is a soft guardrail, not a hard block. The response is still delivered. But the user and any downstream reviewer are notified that the model did not follow the separation convention, which means facts and interpretation may be mixed.
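A minimal sketch of such a post-check, assuming a regex scan over the finished response. The function and variable names are illustrative; the article describes the behavior (flag when tools were called but the Facts heading is absent, never block delivery) but not ClariTrial's actual code.

```python
import re

REQUIRED_HEADINGS = ("Facts", "Summary", "Interpretation")
WARNING = ("Facts/Summary/Interpretation headings not detected. "
           "Tool data was present; verify claims against the cited sources.")

def check_compliance(response: str, tools_called: bool):
    """Soft guardrail: returns (compliant, warning) and never blocks delivery."""
    if not tools_called:
        return True, None  # no tool data, nothing to enforce
    # A heading counts if it starts a line, optionally after markdown '#' marks.
    missing = [h for h in REQUIRED_HEADINGS
               if not re.search(rf"^\s*#{{0,6}}\s*{h}\b", response, re.M | re.I)]
    # Per the article, the flag fires when Facts is absent despite tool data.
    if "Facts" in missing:
        return False, WARNING
    return True, None
```

The soft-fail design matters: the return value carries a warning for the trace panel rather than raising, so a non-compliant response is still delivered to the user.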
Draft-analysis detection
A second heuristic catches a subtler problem: the model crossing from data reporting into guidance-like language. Phrases like "you should consider," "this trial may be suitable for patients who," or "the recommended approach" trigger a draft-analysis flag.
When this flag is set, a visible alert banner appears above the response:
"Verify before acting. This response may include guidance-like language. Verify claims against the primary record on ClinicalTrials.gov or PubMed, and consult your care team for personal decisions."
This is not about catching errors. It is about catching a category shift: the model moving from "here is what the data says" to "here is what you should do," which requires a different level of scrutiny in clinical and regulated contexts.
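The phrase heuristic can be sketched as a small pattern list. The three patterns below are the examples the article names; a production list would presumably be longer, and the function name is an assumption.

```python
import re

# Guidance-like phrases that signal a category shift from reporting to advice.
# These three come from the article; real deployments would extend the list.
GUIDANCE_PATTERNS = [
    r"\byou should consider\b",
    r"\bmay be suitable for patients who\b",
    r"\bthe recommended approach\b",
]
GUIDANCE_RE = re.compile("|".join(GUIDANCE_PATTERNS), re.IGNORECASE)

BANNER = ("Verify before acting. This response may include guidance-like "
          "language. Verify claims against the primary record on "
          "ClinicalTrials.gov or PubMed, and consult your care team for "
          "personal decisions.")

def draft_analysis_banner(response: str):
    """Return the alert banner text if guidance-like language is detected, else None."""
    return BANNER if GUIDANCE_RE.search(response) else None
```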
Why structure beats disclaimers
The standard industry approach to this problem is a disclaimer: "This is not medical advice. Consult your physician." Disclaimers are legally useful and practically useless. Nobody reads them, and they do not help a reader distinguish good data from bad inference within the body of the response.
Structural separation works because it is embedded in the content, not appended to it. A reader scanning a ClariTrial response can jump to the Facts section and verify the NCT IDs. They can read the Summary and check the arithmetic. They can read the Interpretation with appropriate skepticism, knowing that the model has been told to label it as such.
The compliance check and draft-analysis banner add a second layer: even if the model occasionally fails to follow the convention, the system detects and flags the failure.
Applying this to other domains
The Facts/Summary/Interpretation pattern maps to any domain where the distinction between data and opinion matters:
- Legal research: Holdings (facts from case law), Summary (aggregation of precedent), Interpretation (application to the current matter).
- Financial analysis: Reported figures (SEC filings, market data), Summary (computed metrics, comparisons), Interpretation (projections, recommendations).
- Scientific literature review: Findings (extracted from papers), Summary (synthesis across studies), Interpretation (implications, gaps, suggested directions).
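One way to see how cheap the port is: the same post-check can be parameterized by a per-domain heading table. The mapping below mirrors the list above; the code itself is a sketch, not any shipped implementation.

```python
import re

# Heading sets taken from the domain mappings above; "clinical" mirrors
# ClariTrial, the others are the article's suggested analogues.
DOMAIN_HEADINGS = {
    "clinical":   ("Facts", "Summary", "Interpretation"),
    "legal":      ("Holdings", "Summary", "Interpretation"),
    "financial":  ("Reported figures", "Summary", "Interpretation"),
    "literature": ("Findings", "Summary", "Interpretation"),
}

def missing_headings(text: str, domain: str):
    """List the required headings absent from a response for the given domain."""
    return [h for h in DOMAIN_HEADINGS[domain]
            if not re.search(rf"^\s*#{{0,6}}\s*{re.escape(h)}\b", text, re.M | re.I)]
```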
The implementation is a system-prompt instruction, a regex-based post-check, and a UI component that renders a warning when the check fails. The technical cost is near zero. The trust benefit is that every response carries its own verifiability scaffold.