Completed - M2 dissertation

EIMLIA-TEU

Hybrid Simulation Framework for Pre-Clinical Evaluation of AI-Assisted Emergency Triage

Project Description

EIMLIA-TEU is a hybrid simulation framework for the pre-clinical evaluation of AI-assisted triage across four dimensions simultaneously: clinical, organizational, economic, and technical. It combines Discrete Event Simulation (DES), Multi-Agent Systems (MAS), four-dimensional digital twins (structural, behavioral, temporal, technical), process mining, and a graphical systems formalization, calibrated on 600,000 anonymized ED patient records (CHU Lille, 2018–2023).

The three AI triage architectures from the TIAEU proof-of-concept (TRIAGEMASTER: Doc2Vec + MLP; URGENTIAPARSE: FlauBERT + XGBoost; EMERGINET: JEPA + VICReg) were retrained on 340,536 patients and evaluated under a dual-regime framework designed to disentangle algorithmic approximation from clinical validity.

Approach

Hybrid DES + Multi-Agent Systems simulation, 4D digital twins

Calibration data

600,000 ED visits (CHU Lille, 2018–2023)
18.5M event logs (process mining)

Retraining cohort

340,536 patients
(60:20:20 split; test n = 68,108)

Models

TRIAGEMASTER (NLP)
URGENTIAPARSE (LLM)
EMERGINET (JEPA)

Dual-Regime Evaluation Framework

The central methodological contribution is an explicit separation between two evaluation regimes, designed to make visible an artefact the AI-triage literature has tended to leave implicit.

R1 — Algorithmic approximation: training and evaluation aligned on the reconstructed FRENCH-scale ideal label. Measures each architecture's capacity to recover the deterministic algorithm (test set n = 68,108).
R2 — Clinical validity: evaluation against a 5-physician expert consensus on a stratified 3,000-case sub-sample, externally moderated by Dr R. Dewilde (CH Maubeuge, TRIADE co-investigator, unaffiliated with the AI team).
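
Both regimes report a weighted Cohen's kappa (κw) between model output and a reference label. The sketch below is a minimal pure-Python version of that statistic, not the project's implementation. The weighting scheme is not stated on this page, so it is exposed as a parameter (quadratic by default, an assumption). The function name, the 0..k-1 coding of triage levels, and the toy labels are also illustrative.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_levels, power=2):
    """Weighted Cohen's kappa for ordinal labels coded 0..n_levels-1.

    power=1 gives linear weights, power=2 quadratic weights
    (the default here is an assumption, not the project's stated choice).
    """
    if n_levels < 2:
        raise ValueError("need at least two ordinal levels")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    marg_a = Counter(rater_a)               # marginal counts for rater A
    marg_b = Counter(rater_b)               # marginal counts for rater B

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (abs(i - j) / (n_levels - 1)) ** power
            observed += weight * joint[(i, j)] / n
            expected += weight * (marg_a[i] / n) * (marg_b[j] / n)

    if expected == 0:
        raise ValueError("kappa undefined: both raters use one identical level")
    return 1.0 - observed / expected


# Toy example with five hypothetical ordinal levels.
model = [0, 1, 2, 2, 3, 4, 4, 1]
reference = [0, 1, 2, 3, 3, 4, 3, 1]
print(round(weighted_kappa(model, reference, n_levels=5), 3))
```

Under R1 the reference would be the reconstructed FRENCH label and under R2 the expert consensus; the statistic itself is the same in both cases, which is what makes the two columns comparable.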

Key Results — Dual-Regime Benchmarking

Architecture	R1 (κw)	R2 (κw, projected)
URGENTIAPARSE (LLM)	0.9956	0.81 [0.78; 0.84]
EMERGINET (JEPA)	0.9391	0.74 [0.71; 0.77]
TRIAGEMASTER (NLP)	0.8945	0.69 [0.66; 0.72]
Nurse baseline (literature)	0.65	0.65
Deployment threshold	≥ 0.80	≥ 0.80

Dual-regime benchmarking. R1 vs. FRENCH ideal labels (algorithmic approximation, n = 68,108). R2 vs. 5-physician expert consensus (clinical validity, n = 3,000). 95% bootstrap CIs in brackets.
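
The bracketed intervals are described as 95% bootstrap CIs. The bootstrap variant is not specified here; the sketch below assumes a simple percentile bootstrap that resamples cases with replacement. To stay self-contained it uses exact agreement as a stand-in statistic; any agreement function taking a list of (prediction, reference) pairs could be passed instead. All names and the toy data are hypothetical.

```python
import random


def bootstrap_ci(pairs, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(pairs), resampling cases."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates = []
    for _ in range(n_boot):
        # Draw n cases with replacement and recompute the statistic.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * (n_boot - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def exact_agreement(pairs):
    """Stand-in statistic: share of cases where prediction equals reference."""
    return sum(1 for pred, ref in pairs if pred == ref) / len(pairs)


# Toy data: 300 hypothetical cases with roughly 80% agreement.
data_rng = random.Random(42)
toy_pairs = []
for _ in range(300):
    ref = data_rng.randrange(5)
    pred = ref if data_rng.random() < 0.8 else data_rng.randrange(5)
    toy_pairs.append((pred, ref))

point = exact_agreement(toy_pairs)
low, high = bootstrap_ci(toy_pairs, exact_agreement)
print(f"agreement = {point:.3f}, 95% CI [{low:.3f}; {high:.3f}]")
```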

Central finding

Under R1, all three architectures reach κw ≥ 0.89, confirming algorithmic-approximation capacity but not clinical validity. Under R2, all three exceed the nurse baseline (κw ≈ 0.65), but only URGENTIAPARSE (κw = 0.81) clears the κw ≥ 0.80 deployment threshold. The R1→R2 gap of ≈ 0.19 to 0.20 (0.19 for URGENTIAPARSE, 0.20 for EMERGINET and TRIAGEMASTER), replicated across architectures, quantifies the artefact of training on algorithmically reconstructed labels: prior literature evaluating AI triage against such labels likely overstates clinical validity by 0.10–0.20 of κw.

Simulation Results

Hybrid DES-MAS runs (3-day fast mode), under the R2-corrected performance estimates, yield the following change in mean length of stay (∆LOS) versus the manual baseline:

Scenario	∆ Mean LOS	AI concordance
S2b — URGENTIAPARSE	−7.1% [−9.4; −4.8]	76.5%
S2a — TRIAGEMASTER	−3.2% [−5.5; −1.0]	62.3%
S2c — EMERGINET	+1.2% [−0.8; +3.1]	91.0%
S3 — Crisis hybrid (200% load)	+8.8%	90.5%

Scenario comparison (3-day fast-mode runs, R2-corrected error injection). The hybrid crisis scenario preserves clinical safety (concordance 90.5%). Stress tests pass design targets (SURGE n = 220 ≥ 180; availability 99.1% ≥ 99.0%).
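
To illustrate the idea of injecting triage errors into a flow simulation, here is a deliberately small discrete-event sketch. It is not the EIMLIA-TEU model: there are no agents, no digital twin, and no calibration. Every parameter (arrival rate, staffing, service times, five acuity levels, one-level error shifts) is a hypothetical placeholder. Patients are pre-generated so that scenarios with different error rates see the same patient stream.

```python
import heapq
import random

ARRIVAL, DEPARTURE = 0, 1


def simulate_ed(hours=72.0, arrival_rate=5.0, n_physicians=3,
                base_service_h=0.4, triage_error=0.0, seed=0):
    """Toy ED run: priority queue by assigned acuity, non-preemptive service."""
    flow_rng = random.Random(seed)        # arrivals, acuity, service times
    triage_rng = random.Random(seed + 1)  # triage mistakes only

    patients = []  # (arrival_time, true_level, service_time); 1 = most urgent
    t = flow_rng.expovariate(arrival_rate)
    while t < hours:
        level = flow_rng.randint(1, 5)
        mean_service = base_service_h * (1.5 - 0.2 * (level - 1))
        patients.append((t, level, flow_rng.expovariate(1.0 / mean_service)))
        t += flow_rng.expovariate(arrival_rate)
    if not patients:
        raise ValueError("no arrivals generated")

    events = [(arr, ARRIVAL, pid) for pid, (arr, _, _) in enumerate(patients)]
    heapq.heapify(events)
    waiting = []  # min-heap of (assigned_level, arrival_time, pid)
    free = n_physicians
    los, critical_waits, agree = [], [], 0

    while events:
        now, kind, pid = heapq.heappop(events)
        arr, level, _ = patients[pid]
        if kind == ARRIVAL:
            assigned = level
            if triage_rng.random() < triage_error:
                # Shift by one level; clamping makes some shifts no-ops.
                shift = triage_rng.choice((-1, 1))
                assigned = min(5, max(1, level + shift))
            agree += assigned == level
            heapq.heappush(waiting, (assigned, arr, pid))
        else:
            los.append(now - arr)
            free += 1
        # Start care for the most urgent waiting patients.
        while free > 0 and waiting:
            _, arr_w, nxt = heapq.heappop(waiting)
            free -= 1
            if patients[nxt][1] == 1:
                critical_waits.append(now - arr_w)
            heapq.heappush(events, (now + patients[nxt][2], DEPARTURE, nxt))

    n = len(patients)
    mean_crit = sum(critical_waits) / len(critical_waits) if critical_waits else 0.0
    return {"mean_los_h": sum(los) / n,
            "concordance": agree / n,
            "mean_wait_critical_h": mean_crit}


for err in (0.0, 0.3):
    print(err, simulate_ed(triage_error=err))
```

Reporting the wait of truly critical patients next to mean LOS is the point of the sketch: a scenario can look good on flow while degrading on safety, which a single aggregate metric would not show.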

URGENTIAPARSE is the only architecture that simultaneously clears the R2 clinical-validity threshold and produces a convergent ecological flow benefit (∆LOS = −7.1%). The framework also documents the "URGENTIAPARSE simulation paradox": in earlier recorded-label cycles, the model produced an apparent ∆LOS of −13.2% while critical sensitivity collapsed to ≈ 0.002 — a safety hazard invisible to any isolated predictive metric, detectable only through multidimensional simulation.

Medico-Economic Projection (CHEERS 2022 + PSA)

Probabilistic sensitivity analysis (Monte Carlo, 50,000 iterations) under three scenarios (pessimistic, realistic, optimistic):

Scenario	3-yr ROI (95% CrI)	ICER	P(dominant)
Pessimistic	80% [−30; 320]	12,500 €/QALY	8%
Realistic	480% [210; 1,250]	1,840 €/QALY [dominant; 8,300]	32%
Optimistic	2,100% [890; 4,100]	dominant	64%

PSA Monte Carlo results (50,000 iterations) by scenario. The intervention is cost-effective at the HAS 50,000 €/QALY threshold in 99.4% of simulations (Cost-Effectiveness Acceptability Curve).
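
The quantities in this table (ICER, probability of dominance, probability of cost-effectiveness at a threshold) can all be read off the same set of Monte Carlo draws. The sketch below shows the mechanics under stated assumptions: the input distributions and their parameters are invented placeholders, not the project's PSA inputs, and the function names are hypothetical.

```python
import random


def draw_increments(n_iter=50_000, seed=0):
    """Sample (incremental cost in EUR, incremental QALYs) per iteration."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        d_cost = rng.gauss(500.0, 1500.0)     # hypothetical cost distribution
        d_qaly = rng.gammavariate(4.0, 0.05)  # hypothetical, strictly positive
        draws.append((d_cost, d_qaly))
    return draws


def icer(draws):
    """ICER as the ratio of mean incremental cost to mean incremental QALYs."""
    mean_cost = sum(c for c, _ in draws) / len(draws)
    mean_qaly = sum(q for _, q in draws) / len(draws)
    return mean_cost / mean_qaly


def p_dominant(draws):
    """Share of draws where the intervention is cheaper and more effective."""
    return sum(1 for c, q in draws if c < 0 and q > 0) / len(draws)


def ceac(draws, thresholds):
    """Probability of positive net monetary benefit at each threshold."""
    curve = {}
    for wtp in thresholds:
        n_ce = sum(1 for c, q in draws if wtp * q - c > 0)
        curve[wtp] = n_ce / len(draws)
    return curve


draws = draw_increments()
curve = ceac(draws, [0.0, 10_000.0, 30_000.0, 50_000.0])
print(f"ICER = {icer(draws):,.0f} EUR/QALY")
print(f"P(dominant) = {p_dominant(draws):.3f}")
for wtp, prob in curve.items():
    print(f"P(cost-effective at {wtp:,.0f} EUR/QALY) = {prob:.3f}")
```

Because incremental QALYs are strictly positive in this toy setup, the curve at a threshold of zero equals P(dominant) and rises with the threshold; with real inputs that can produce negative QALY gains, neither property is guaranteed.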

The five parameters explaining 87% of ROI variance are, in decreasing order: inappropriate hospitalization avoidance rate (38%), GHS unit cost (21%), imaging avoidance rate (12%), annual ED visit volume (10%), and AI inference latency P95 (6%). An earlier deterministic projection (ROI 10,260%, ICER 93 €/QALY) was explicitly withdrawn in favour of these probabilistic figures.

Evaluation Dimensions

Clinical

R1/R2 weighted kappa (κw), critical sensitivity, under-triage rate

Organizational

Length of stay, waiting time, overcrowding rate

Economic

ROI, ICER, CEAC via Monte Carlo PSA (CHEERS 2022)

Technical

Availability, inference latency, MTTR
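
For readers unfamiliar with the technical indicators, the sketch below shows how two of them are commonly computed: steady-state availability from mean time between failures (MTBF) and mean time to repair (MTTR), and a P95 latency by the nearest-rank method. The formulas are standard, but whether the framework uses exactly these definitions is an assumption, and the numbers are made up.

```python
import math


def availability(mtbf_h, mttr_h):
    """Steady-state availability: uptime share given MTBF and MTTR in hours."""
    return mtbf_h / (mtbf_h + mttr_h)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical figures: one failure every 200 h, 2 h to repair.
print(f"availability = {availability(200.0, 2.0):.4f}")

# Hypothetical inference latencies in milliseconds.
latencies_ms = [120, 95, 110, 430, 105, 98, 101, 250, 99, 115]
print(f"P95 latency = {percentile_nearest_rank(latencies_ms, 95)} ms")
```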

Ethics

Single-center retrospective study (CHU Lille Adult ED, 2018–2023), approved by CESREES and conducted under the CNIL MR-004 reference methodology (declaration N° 27797006 to the Health Data Hub), in compliance with the GDPR.

Conclusion and Next Step

EIMLIA-TEU demonstrates that AI triage models trained on algorithmic labels recover the algorithm rather than clinical judgment; that ecological flow simulation is necessary to detect safety-flow trade-offs invisible to predictive metrics; and that under realistic R2 evaluation only URGENTIAPARSE clears the stringent κw ≥ 0.80 threshold. The TRIADE prospective multicenter cluster-RCT (CHU Lille, CH Maubeuge, CH Dunkerque) is the indispensable next step.

Project Team

Dr Edouard Lansiaux

Author

Lille University Hospital

Pr Hayfa Zgaya-Biau

Research Director

METRICS ULR 2694 & CRIStAL UMR 9189

Pr Mehdi Ammi

Co-supervisor

LIASD, University Paris 8

Pr Emmanuel Chazard

Methodologist

METRICS ULR 2694

Pr Eric Wiel

Clinical Coordinator

METRICS ULR 2694 & Lille University Hospital

Dr R. Dewilde

External R2 Moderator

CH Maubeuge

Dr Ramy Azzouz

AI in Healthcare Expertise

Lille University Hospital