Algorithmic Risk Stratification for Clinical Trial Enrichment
Moving beyond binary inclusion and exclusion criteria with ML-driven continuous risk scoring at population scale.
Key insight: Traditional trial eligibility uses binary, threshold-based criteria—in or out. ML-based risk stratification goes further: it assigns every eligible patient a continuous risk score, enabling trial designers to rank and select those most likely to experience endpoints. The result is enriched cohorts that deliver more events from fewer patients—smaller samples, shorter follow-up, and stronger statistical power.
The limitation of binary eligibility criteria
Trials like DELIVER (Solomon et al., NEJM 2022) define eligibility through binary, threshold-based inclusion and exclusion criteria: LVEF >40%, NT-proBNP above a cutoff, absence of specific comorbidities. These rules determine who can enter the trial—but they do not differentiate which eligible patients are most likely to experience the primary endpoint. Among those who pass the same binary filter, baseline risk can vary by an order of magnitude. ML-based risk stratification adds a continuous scoring layer on top of standard criteria, enabling the selection of trial-eligible patients with the highest probability of contributing endpoint events.
Our approach
Using the UK Biobank (N ≈ 500,000) as a large-scale population proxy, we built time-to-event risk models for cardiovascular death and heart failure hospitalisation, aligning outcomes and exclusions with the DELIVER protocol. Predicted risk scores were used to rank individuals and set thresholds for cohort selection. Models were trained across a range of data modalities, from routinely available clinical data through to richer sources:
- Demographics: age, sex, ethnicity, socioeconomic indicators
- Clinical: routine labs, comorbidities, medications, vital signs
- Genetics: polygenic risk scores for cardiovascular traits
- Proteomics: circulating protein biomarkers
- Imaging: cardiac MRI metrics including LVEF, LA volume, and LV mass index
Outcome and cohort definitions were mapped to DELIVER criteria using ICD-10 code sets, diagnosis-history exclusions, and optional LVEF-based restrictions in the imaging subset.
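As a rough illustration of how ICD-10 code sets and diagnosis-history exclusions can be turned into cohort and outcome flags, consider the minimal sketch below. The code prefixes, the `Participant` structure, and the function names are all hypothetical and chosen for illustration; they are not the DELIVER-aligned definitions used in the analysis.

```python
from dataclasses import dataclass, field

# Illustrative code sets only (assumptions, not the sets used in the analysis).
HF_PREFIXES = ("I50",)         # ICD-10 I50.x: heart failure
EXCLUSION_PREFIXES = ("I21",)  # placeholder exclusion: acute myocardial infarction


@dataclass
class Participant:
    pid: str
    history_codes: list = field(default_factory=list)    # diagnoses before baseline
    admission_codes: list = field(default_factory=list)  # admissions during follow-up


def matches(code, prefixes):
    """True if an ICD-10 code (with or without a dot) starts with any prefix."""
    return code.replace(".", "").upper().startswith(prefixes)


def is_excluded(p):
    """Diagnosis-history exclusion: any excluded code recorded before baseline."""
    return any(matches(c, EXCLUSION_PREFIXES) for c in p.history_codes)


def has_hf_hospitalisation(p):
    """Outcome flag: any follow-up admission coded as heart failure."""
    return any(matches(c, HF_PREFIXES) for c in p.admission_codes)


def build_cohort(participants):
    """Drop excluded participants and return (pid, event_flag) pairs."""
    return [(p.pid, has_hf_hospitalisation(p))
            for p in participants if not is_excluded(p)]


people = [
    Participant("a", [], ["I50.0"]),
    Participant("b", ["I21.9"], ["I50.9"]),
    Participant("c", ["E11.9"], ["J18.9"]),
]
print(build_cohort(people))
```

A real mapping would also need event dates to respect the follow-up horizon and censoring; this sketch only shows the code-set matching step.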
Results: cohort enrichment at a 3-year horizon
Composite outcome (CV death + HF hospitalisation), 3-year horizon. Individuals ranked by predicted risk and selected at different thresholds:
| Cohort | Fraction selected | Recruitment N | Recruitment efficiency | Event rate |
|---|---|---|---|---|
| Unselected (baseline) | 100% | 713 | 1.00x | 12% |
| Top 10% risk | 10% | 72 | 0.10x | 28% |
| Top 20% risk | 20% | 143 | 0.20x | 19% |
| Top 50% risk | 50% | 357 | 0.50x | 15% |
Selecting the top 10% by predicted risk raises the event rate to 28%, about 2.3× the 12% rate in the unselected population.
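The ranking-and-threshold step behind a table like this can be sketched in a few lines. This is a simplified illustration under stated assumptions: each individual has a single predicted risk score and a binary flag for whether the composite event occurred within the horizon, and censoring is ignored. The function names and toy data are hypothetical.

```python
def select_top_fraction(scores, fraction):
    """Return the indices of the highest-scoring `fraction` of individuals."""
    n_select = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_select]


def event_rate(events, indices):
    """Share of the selected individuals who experienced the event."""
    return sum(events[i] for i in indices) / len(indices)


def enrichment_table(scores, events, fractions=(1.0, 0.5, 0.2, 0.1)):
    """Rows of (fraction, n_selected, event_rate, enrichment_vs_baseline)."""
    baseline = event_rate(events, range(len(events)))
    rows = []
    for fraction in fractions:
        idx = select_top_fraction(scores, fraction)
        rate = event_rate(events, idx)
        rows.append((fraction, len(idx), rate, rate / baseline))
    return rows


# Toy data: 10 individuals, 3 events, with higher scores loosely tracking events.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
events = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for fraction, n, rate, enrichment in enrichment_table(scores, events):
    print(f"top {fraction:.0%}: n={n}, event rate={rate:.0%}, enrichment={enrichment:.2f}x")
```

The last column is the ratio of the selected cohort's event rate to the unselected rate, the same quantity as the 28% versus 12% comparison above.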



With a C-index of around 0.75, risk stratification is clinically meaningful: in roughly three of every four comparable patient pairs, the individual predicted as higher risk experiences the cardiovascular event first.
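To make the pairwise interpretation concrete, here is a minimal pure-Python sketch of Harrell's concordance index. It is an illustrative O(n²) implementation that ignores tied event times, not the evaluation code used for the reported results; the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when i has an observed event and i's time
    is strictly earlier than j's time (j may have an event or be censored).
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored individuals cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time in years, event indicator, predicted risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.75 sits halfway between the two.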
Implications for trial design
- Reduced sample sizes: Higher baseline event rates mean fewer participants are needed for equivalent statistical power.
- Shorter follow-up: Events accumulate faster in enriched cohorts, compressing trial timelines.
- Stronger statistical power: In time-to-event analyses, power depends mainly on the number of observed events, so an enriched cohort of a given size is better placed to detect a treatment effect.
- Scalable and modular: Models can be built from routinely available data and adapted across therapeutic areas.
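The link between event rate and sample size can be illustrated with Schoenfeld's approximation for a two-arm log-rank test with 1:1 allocation, where the required number of events is 4(z₁₋α/₂ + z_power)² / (ln HR)². The sketch below is a back-of-the-envelope illustration, not the calculation behind the table above. The hazard ratio of 0.80, two-sided α of 0.05, and 90% power are assumed values, and dividing events by the event rate ignores dropout, staggered entry, and the treatment's own effect on the event rate.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.90):
    """Schoenfeld approximation: events needed for a 1:1 two-arm log-rank test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


def required_participants(hazard_ratio, event_rate, alpha=0.05, power=0.90):
    """Rough total N: required events divided by the expected event rate."""
    return math.ceil(required_events(hazard_ratio, alpha, power) / event_rate)


ASSUMED_HR = 0.80  # hypothetical treatment effect, for illustration only
for label, rate in [("unselected", 0.12), ("top 10% risk", 0.28)]:
    n = required_participants(ASSUMED_HR, rate)
    print(f"{label}: event rate {rate:.0%} -> about {n} participants")
```

Under this approximation the number of events needed is fixed by the assumed effect size, so the required number of participants scales inversely with the event rate.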
Beyond binary: ML as a companion to trial design
Standard eligibility criteria define who is permitted in a trial. Our ML models identify who, among those eligible, is most likely to contribute endpoint events. The two are complementary rather than overlapping: DELIVER established dapagliflozin efficacy; our framework enables scalable prioritisation of the highest-risk patients within that eligible population. Using UK Biobank as a proof-of-concept proxy, we show that this continuous risk layer can be built from routinely available data and adapted to other therapeutic areas and trial designs.
Explore what trial enrichment could look like in your programme
Contact us to discuss how algorithmic stratification can accelerate your next cardiovascular trial.

