Determining the publication, effective, and comment-close dates of regulatory documents, proposed rules, guidelines, and internal policies. A head-to-head on a production run of 55 documents: Archer Evolv's tuned extraction, knowledge base, and Expert-in-the-Loop (EITL) methodology against a raw LLM.
In risk and compliance, the date attached to a regulatory document is the trigger for everything downstream. The effective date sets when an obligation becomes binding. The comment-close date is a hard window that, once missed, cannot be reopened. The publication date anchors version control, supersession, and audit trails. A wrong date doesn't produce a slightly-off answer — it produces a missed filing, an out-of-date control, or an obligation tracked against the wrong calendar.
That is why a "usually right" model is the dangerous case. An answer that is wrong but plausible and confident flows silently into a compliance calendar and is only discovered when the deadline has already passed.
Publication date. Anchors which version of a rule governs, what it replaced, and the audit trail regulators expect. An error here corrupts the system of record.
Effective date. Determines when an obligation becomes binding and when controls must be live. An error means operating out of compliance without knowing it.
Comment-close date. The fixed deadline to influence a proposed rule. Miss it and the only remedy is litigation or living with the outcome.
Every document was run through the raw-LLM determination mechanism and independently adjudicated by an Expert-in-the-Loop. The raw model was correct on fewer than half (44%). Evolv, applying source-specific extraction configs and a tuned knowledge base, holds the error rate below 5%.
Of the 20 answers the raw LLM rated high confidence, 7 were flatly wrong — a 35% false-assurance rate. You cannot filter risk by trusting the model's own confidence score; the failures it hides are precisely the ones a reviewer would have waved through.
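The false-assurance figure is plain arithmetic over the adjudicated results. A minimal sketch follows; the function name is illustrative, and only the two counts (20 high-confidence answers, 7 of them wrong) come from the run described above.

```python
from fractions import Fraction


def false_assurance_rate(high_conf_total: int, high_conf_wrong: int) -> Fraction:
    """Share of high-confidence answers that adjudication found to be wrong."""
    if high_conf_total <= 0:
        raise ValueError("need at least one high-confidence answer")
    if not 0 <= high_conf_wrong <= high_conf_total:
        raise ValueError("wrong count must be between 0 and the total")
    return Fraction(high_conf_wrong, high_conf_total)


# Counts reported for the 55-document run.
rate = false_assurance_rate(high_conf_total=20, high_conf_wrong=7)
print(f"{float(rate):.0%}")  # 35%
```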
These aren't near-misses. The raw model defaulted to tidy, plausible dates — the first of a month, the first of a year — while the true date sat in a statutory citation it never retrieved. Several confident answers were off by years or decades. The correct date in every case is traceable to a specific legal reference.
| Source | Raw LLM date | Confidence | Actual date | Off by | Authority for the correct date |
|---|---|---|---|---|---|
| DE · Gen. Assembly | 1996-02-02 | high | 2024-06-25 | ~28 yrs | 84 Del. Laws, c. 277, §§ 2,3 — latest amendment approved |
| MT · Sec. of State | 2024-10-01 | medium | 2007-03-22 | ~17 yrs | Sec. 1, Ch. 38, L. 2007 — amendment approved |
| UT · Sec. of State | 2025-12-06 | high | 2025-10-14 | ~2 mo | Ch. 17, 2025 Special Session 1 — signed by governor |
| DE · Gen. Assembly | 2023-08-03 | high | 2025-06-30 | ~2 yrs | 85 Del. Laws, c. 44, § 1 — latest amendment approved |
| CA · regulatory | 2007-08-01 | medium | 2014-08-13 | ~7 yrs | CA Regulatory Notice Register — register filing |
| DE · Gen. Assembly | 2024-07-01 | medium | 2026-01-30 | ~1.5 yrs | 85 Del. Laws, c. 233, § 10 — latest amendment approved |
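The "Off by" column can be reproduced from the two date columns. The sketch below copies the date pairs from the table; the helper names and the rounding convention (365.25 days per year, 30.44 days per month) are assumptions made for illustration, not part of Evolv.

```python
from datetime import date

# (source, raw-LLM date, adjudicated date), copied from the table above.
ROWS = [
    ("DE · Gen. Assembly", date(1996, 2, 2), date(2024, 6, 25)),
    ("MT · Sec. of State", date(2024, 10, 1), date(2007, 3, 22)),
    ("UT · Sec. of State", date(2025, 12, 6), date(2025, 10, 14)),
    ("DE · Gen. Assembly", date(2023, 8, 3), date(2025, 6, 30)),
    ("CA · regulatory", date(2007, 8, 1), date(2014, 8, 13)),
    ("DE · Gen. Assembly", date(2024, 7, 1), date(2026, 1, 30)),
]


def gap_in_days(predicted: date, actual: date) -> int:
    """Absolute distance between the two dates, regardless of direction."""
    return abs((actual - predicted).days)


def describe_gap(days: int) -> str:
    """Render a gap in years when it is a year or more, otherwise in months."""
    years = days / 365.25
    if years >= 1:
        return f"~{years:.1f} yrs"
    return f"~{days / 30.44:.0f} mo"


for source, predicted, actual in ROWS:
    days = gap_in_days(predicted, actual)
    print(f"{source}: {days} days ({describe_gap(days)})")
```

Taking the absolute difference matters because the raw model erred in both directions: some answers were too early and others too late.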
Each correction was supplied by EITL adjudication and traces to a source-specific authority — exactly the signal Evolv's extraction configs are tuned to find and that retrieval confirms.
Per request, the raw LLM averaged ~4 seconds against a 5-second timeout; Evolv serves a verified, persisted date in ~0.05 seconds — about 80× faster. But the real divergence is repetition: when an agent or analyst asks for the same document's date again, the raw model re-computes from scratch — re-incurring latency, inference cost, and a fresh, non-deterministic chance of being wrong. Evolv answers from cache: compute once, verify once, serve forever.
* Wrong-answer estimate applies the measured rates to answers actually served: raw LLM 25.5% (incorrect, returned as if valid) on every recompute; Evolv <5%, caught at ingestion and routed to review rather than shipped. Latency assumptions: raw 4.0s/call, Evolv 0.05s/cache read. Inference is incurred once per document at ingestion for Evolv. Figures are illustrative and scale with the document and lookup volumes assumed.
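The estimate can be worked through with the stated assumptions. In this sketch the four constants come from the text; the function name, the dictionary keys, and the `lookups_per_document` input are hypothetical. Evolv's one-time ingestion latency is not stated in the text, so it is left out of the latency total.

```python
RAW_LATENCY_S = 4.0         # stated: ~4 s per raw call
CACHE_LATENCY_S = 0.05      # stated: ~0.05 s per cache read
RAW_WRONG_RATE = 0.255      # stated: incorrect, returned as if valid
EVOLV_ERROR_CEILING = 0.05  # stated upper bound, caught at ingestion


def compare(documents: int, lookups_per_document: int) -> dict:
    """Estimate cost, latency, and wrong answers for repeated date lookups."""
    if documents < 0 or lookups_per_document < 1:
        raise ValueError("documents must be >= 0 and lookups_per_document >= 1")
    lookups = documents * lookups_per_document
    raw = {
        "inference_calls": lookups,  # recomputed on every ask
        "latency_s": lookups * RAW_LATENCY_S,
        "expected_wrong_served": lookups * RAW_WRONG_RATE,
    }
    evolv = {
        "inference_calls": documents,  # once per document at ingestion
        "latency_s": lookups * CACHE_LATENCY_S,  # cache reads only (assumption)
        "max_routed_to_review": documents * EVOLV_ERROR_CEILING,
    }
    return {"raw": raw, "evolv": evolv}


result = compare(documents=55, lookups_per_document=10)
speedup = result["raw"]["latency_s"] / result["evolv"]["latency_s"]
print(result)
print(f"latency ratio: about {speedup:.0f}x")
```

With 55 documents looked up 10 times each, the raw path makes 550 inference calls against 55 for the cached path, and the latency ratio is the same 80× quoted per request.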
Evolv manages reusable, content-source-specific collections of extraction-configs and KB models, each linked to one or many sources. When a new document enters the infer_document_key_date pipeline, the specialized AI operator applies the matching config to drive its determination strategy.
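One way to picture the source-to-config linkage is a small registry keyed by content source. This is an illustrative sketch only, not Evolv's implementation: `ExtractionConfig`, `ConfigRegistry`, `route`, and the config names are all hypothetical, and sending an unmatched source to expert review is an assumption. The authority hints are borrowed from the table of corrected dates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionConfig:
    """Hypothetical config: one determination strategy shared by many sources."""
    name: str
    authority_hint: str          # the kind of citation that carries the date
    sources: frozenset[str]      # content sources this config is linked to


class ConfigRegistry:
    """Maps each content source to the config that should handle it."""

    def __init__(self) -> None:
        self._by_source: dict[str, ExtractionConfig] = {}

    def register(self, config: ExtractionConfig) -> None:
        for source in config.sources:
            self._by_source[source] = config

    def match(self, source: str) -> Optional[ExtractionConfig]:
        return self._by_source.get(source)


def route(source: str, registry: ConfigRegistry) -> str:
    """Pick a strategy for a new document; unmatched sources go to review."""
    config = registry.match(source)
    if config is None:
        return "expert_review"  # assumption: no config means a human decides
    return f"extract_with:{config.name}"


registry = ConfigRegistry()
registry.register(ExtractionConfig(
    name="de-session-laws",
    authority_hint="latest amendment approved",
    sources=frozenset({"DE · Gen. Assembly"}),
))
registry.register(ExtractionConfig(
    name="ca-notice-register",
    authority_hint="register filing",
    sources=frozenset({"CA · regulatory"}),
))

print(route("DE · Gen. Assembly", registry))
print(route("unknown source", registry))
```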
Raw LLM: 44% correct, 56% wrong or failed, and confidently wrong 35% of the time. Evolv: >95% verified, with the residual caught and routed — never silently shipped.
~4s per raw request against a 5s timeout, recomputed on every ask. Evolv serves verified dates from persistence in ~0.05s — roughly 80× faster on repeat.
Raw pays inference, latency, and fresh error risk on every lookup. Evolv computes once at ingestion; every subsequent answer is near-free and already verified.
For a function where a wrong date is a missed deadline, the raw LLM's defining failure isn't that it's wrong — it's that it's confidently wrong, repeatedly, and at full cost each time. Evolv replaces that with a verified, cached, expert-governed answer.