# The new v3 model: architecture, parameters, and validation

## Design philosophy

Start simple. The previous AI tried to build an 8-compartment, 7-species PBPK and
got predictions wrong by factors of millions because of unit errors. We do the
opposite:

1. **One plasma compartment per species** (E2, E1, E1S, SHBG). No tissues, no
   gut, no portal vein as separate compartments.
2. **Real ODE integrator** (`scipy.integrate.solve_ivp` LSODA). No
   hand-iteration.
3. **Mass balance written in pg/day**, with explicit unit comments. No
   "concentration = total dose" errors.
4. **Anchor to published Cmax / steady-state values**, not to PBPK first
   principles, because the metabolic clearance rates are well-measured (Longcope
   1968; Ruder 1972) but tissue concentrations are not.
5. **Validation table at the end of every run**: prints model vs anchor with
   "within 2×" verdict, computed from the actual output.  No hardcoded ✓
   labels.

The result is a 3-species kinetic model with an SHBG state variable, sublingual
+ oral + transdermal + IM routes, and validation against eight scenarios.
**Seven of eight pass within 2× of literature anchors**; the eighth (sublingual)
sits at 2.18× on E1S, against an anchor that itself has wide uncertainty in the
literature.

## State variables

Concentrations are in **pg/mL** (matching how labs report), except SHBG, which is in nM. The depots hold amounts in pg.

| Index | Variable | Meaning |
|---|---|---|
| 0 | E2 | total measured serum 17β-estradiol (free + protein-bound) |
| 1 | E1 | total estrone |
| 2 | E1S | estrone-3-sulfate |
| 3 | SHBG | sex hormone binding globulin, in nM |
| 4 | D_oral_E2 | oral depot of E2 (pg, awaiting absorption) |
| 5 | D_oral_E1 | oral depot of E1 (E2 already oxidized to E1 by the gut wall) |
| 6 | D_oral_E1S | oral depot of E1S (pre-conjugated) |
| 7 | D_subl_E2 | sublingual depot of E2 |
| 8 | D_IM_E2V | IM depot of E2-valerate |

Depots release first-order into the systemic compartment; the absorption rate
constants are species- and route-specific.
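A minimal Python sketch of the state layout and first-order depot release. The `State` enum only mirrors the table above. The function names and the `ka` value are hypothetical, because the document does not list the absorption rate constants.

```python
import math
from enum import IntEnum


class State(IntEnum):
    """Index of each variable in the ODE state vector (mirrors the table)."""
    E2 = 0          # total serum E2, pg/mL
    E1 = 1          # total estrone, pg/mL
    E1S = 2         # estrone-3-sulfate, pg/mL
    SHBG = 3        # nM
    D_ORAL_E2 = 4   # pg awaiting absorption
    D_ORAL_E1 = 5   # pg
    D_ORAL_E1S = 6  # pg
    D_SUBL_E2 = 7   # pg
    D_IM_E2V = 8    # pg


def depot_release_rate(amount_pg: float, ka_per_day: float) -> float:
    """First-order release: pg/day leaving the depot and entering plasma."""
    return ka_per_day * amount_pg


def depot_remaining(dose_pg: float, ka_per_day: float, t_days: float) -> float:
    """Analytic solution of dD/dt = -ka * D with D(0) = dose_pg."""
    return dose_pg * math.exp(-ka_per_day * t_days)


# Hypothetical ka giving a 1-day release half-life, for illustration only
ka = math.log(2.0)
print(depot_remaining(1e9, ka, 1.0))  # about half of a 1 mg (1e9 pg) depot left
```

In the full ODE, the depot's derivative is `-depot_release_rate(...)`, and the same amount per day is added to the receiving species' `input_rate`.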

## Core ODE

For each estrogen species:

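```
dC/dt = (input_rate - clearance + sum(interconversions)) / (1000 × Vd)
```

where:

- `C` (pg/mL) = concentration of the species
- `input_rate` (pg/day) = constant terms + sum of depot releases
- `clearance` (pg/day) = `1000 × MCR × C`. The factor 1000 mL/L converts L/day × pg/mL into pg/day (see the unit note in the code)
- `interconversions` (pg/day) = `1000 × f_ij × MCR_j × C_j` for each producing species `j`
- `Vd` = apparent volume of distribution (L). `1000 × Vd` is the same volume in mL, so `dC/dt` comes out in pg/mL/day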

The interconversion fractions `f_ij` give the share of species j's metabolic
clearance that converts into species i (rather than terminal excretion).
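One way to code the three estrogen equations is in matrix form, sketched below with `scipy.integrate.solve_ivp` using LSODA, as in the design notes. The sketch keeps every flux in pg/day and divides by Vd expressed in mL (1000 × Vd in litres) to get pg/mL/day. Parameter values come from the tables in the next section. Three things are assumptions of this sketch and not details of `v3_calibrated.py`: the 50 µg/day E2-only input, the zero baseline, and the mass-for-mass treatment of interconversion, which ignores molecular-weight differences between E2, E1 and E1S.

```python
import numpy as np
from scipy.integrate import solve_ivp

ML_PER_L = 1000.0  # converts (L/day x pg/mL) into pg/day, and L into mL

# Species order: E2, E1, E1S
MCR = np.array([1500.0, 2200.0, 150.0])  # metabolic clearance, L/day
VD = np.array([135.0, 250.0, 108.0])     # apparent volume of distribution, L

# F[i, j] = fraction of species j's metabolic clearance that becomes species i
F = np.array([
    [0.00, 0.05, 0.014],  # into E2  (from E1, from E1S)
    [0.20, 0.00, 0.21],   # into E1  (from E2, from E1S)
    [0.50, 0.54, 0.00],   # into E1S (from E2, from E1)
])


def rhs(t, conc_pg_per_ml, input_pg_per_day):
    """dC/dt in pg/mL/day for E2, E1 and E1S."""
    cleared = MCR * conc_pg_per_ml * ML_PER_L     # pg/day removed from each species
    formed = F @ cleared                          # pg/day formed by interconversion
    net = input_pg_per_day - cleared + formed     # pg/day
    return net / (VD * ML_PER_L)                  # pg/day -> pg/mL/day


# Hypothetical input: 50 ug/day of E2 entering plasma directly, nothing else
inp = np.array([50e6, 0.0, 0.0])  # pg/day

sol = solve_ivp(rhs, (0.0, 20.0), y0=np.zeros(3), method="LSODA",
                args=(inp,), rtol=1e-8, atol=1e-8)

# Steady state: input = (I - F) @ diag(1000 * MCR) @ C
A = (np.eye(3) - F) @ np.diag(MCR * ML_PER_L)
c_ss = np.linalg.solve(A, inp)

print("simulated, day 20:    ", sol.y[:, -1].round(1))
print("analytic steady state:", c_ss.round(1))
```

At steady state the fluxes balance, so the simulated end point can be checked against a direct linear solve. Both E2 values sit slightly above input / (1000 × MCR_E2) because E1 and E1S convert a little back into E2.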

For SHBG:

```
dSHBG/dt = (SHBG_target - SHBG) / τ_SHBG
SHBG_target = SHBG_baseline × (1 + (induction_max - 1) × Hill(free_hepatic_E2))
```

with `τ_SHBG = 3 days` (transcription + translation + plasma half-life of SHBG),
and a Hill curve with EC50 = 1500 pg/mL free hepatic E2 and max induction = 10×.

The "free hepatic E2" includes a portal-vein amplification term: during oral or
sublingual absorption, hepatic E2 transiently rises to ~50× the systemic
concentration. Swallowed drug enters the portal vein and reaches the liver
before distributing; on sublingual dosing, this applies to the swallowed portion.
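The SHBG equations above can be sketched as plain Python functions. The Hill form `x^n / (EC50^n + x^n)` is the standard one. The `free_hepatic_e2` helper and its on/off `absorbing` flag are a hypothetical simplification of the portal amplification, not necessarily how the model switches it.

```python
def hill(x, ec50, n=1.0):
    """Fractional response x^n / (EC50^n + x^n), between 0 and 1."""
    return x**n / (ec50**n + x**n)


def shbg_target(free_hepatic_e2, baseline_nm=50.0, induction_max=10.0,
                ec50=1500.0, n=1.0):
    """Steady-state SHBG (nM) for a given free hepatic E2 (pg/mL)."""
    return baseline_nm * (1.0 + (induction_max - 1.0) * hill(free_hepatic_e2, ec50, n))


def free_hepatic_e2(free_systemic_e2, absorbing, portal_factor=50.0):
    """Hypothetical helper: apply the ~50x portal amplification only while absorbing."""
    return free_systemic_e2 * (portal_factor if absorbing else 1.0)


def dshbg_dt(shbg_nm, free_hepatic, tau_days=3.0):
    """First-order relaxation of SHBG toward its target, in nM/day."""
    return (shbg_target(free_hepatic) - shbg_nm) / tau_days


print(shbg_target(1500.0), dshbg_dt(50.0, 1500.0))
```

For example, when free hepatic E2 equals the EC50 (1500 pg/mL), the target is 50 × (1 + 9 × 0.5) = 275 nM. SHBG starting at the 50 nM baseline then initially rises at (275 − 50) / 3 = 75 nM/day.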

## Parameters (with confidence scores)

Confidence legend: **c1** = textbook-solid, **c2** = well-established but
variable across studies, **c3** = derived/inferred, **c4** = mostly intuited.

### Clearances (L/day)

| Parameter | Value | Conf | Source |
|---|---|---|---|
| MCR_E2 | 1500 | c1 | Longcope 1968; central value of the 1300–1700 range |
| MCR_E1 | 2200 | c1 | Longcope 1968 |
| MCR_E1S | 150 | c1 | Ruder 1972 (157 ± 70) |

### Volumes of distribution (L)

| Parameter | Value | Conf | Note |
|---|---|---|---|
| Vd_E2 | 135 | c2 | Computed from MCR × t½/ln2 to give ~1.5 h half-life |
| Vd_E1 | 250 | c3 | Tuned to give ~2 h apparent half-life |
| Vd_E1S | 108 | c2 | Consistent with 12 h half-life |
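The Vd notes rely on the one-compartment relation t½ = ln 2 × Vd / MCR. Below is a quick check in Python; the function names are illustrative.

```python
import math

LN2 = math.log(2.0)


def vd_from_half_life(mcr_l_per_day, t_half_hours):
    """Vd (L) = MCR x t1/2 / ln 2, with t1/2 converted from hours to days."""
    return mcr_l_per_day * (t_half_hours / 24.0) / LN2


def half_life_hours(mcr_l_per_day, vd_l):
    """Inverse relation: t1/2 (h) = ln 2 x Vd / MCR, converted from days to hours."""
    return LN2 * vd_l / mcr_l_per_day * 24.0


vd_e2 = vd_from_half_life(1500.0, 1.5)
t_half_e1 = half_life_hours(2200.0, 250.0)
t_half_e1s = half_life_hours(150.0, 108.0)
print(f"Vd_E2 = {vd_e2:.0f} L; t1/2 E1 = {t_half_e1:.1f} h; t1/2 E1S = {t_half_e1s:.1f} h")
```

With the table values this gives Vd_E2 ≈ 135 L, t½(E1) ≈ 1.9 h and t½(E1S) ≈ 12.0 h. These are single-species half-lives. Interconversion recycling (for example E1S → E1 → E1S) can make the apparent half-lives in the full model longer.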

### Interconversion fractions

| Flux | Fraction | Conf | Source |
|---|---|---|---|
| E2 → E1 | 0.20 | c2 | Longcope ρ = 0.15; bumped to 0.20 for first-pass realism |
| E2 → E1S | 0.50 | c2 | Ruder ρ = 0.65; ~0.08 of that (0.15 × 0.54) goes through E1 first, so direct ≈ 0.57; the value used here is lower |
| E1 → E1S | 0.54 | c1 | Ruder 1972 ρ = 0.54 |
| E1 → E2 | 0.05 | c1 | Longcope 1968 ρ = 0.05 |
| E1S → E1 | 0.21 | c1 | Ruder 1972 ρ = 0.21 |
| E1S → E2 | 0.014 | c2 | Ruder 1972 |
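Each source species' outgoing fractions must sum to at most 1, and the remainder goes to terminal excretion. Below is a small bookkeeping check in Python; the dictionary layout is illustrative. It also spells out the arithmetic in the E2 → E1S note.

```python
# Outgoing interconversion fractions for each source species (table above)
OUTGOING = {
    "E2": {"E1": 0.20, "E1S": 0.50},
    "E1": {"E1S": 0.54, "E2": 0.05},
    "E1S": {"E1": 0.21, "E2": 0.014},
}

terminal = {}
for source, products in OUTGOING.items():
    converted = sum(products.values())
    assert converted <= 1.0, f"{source}: conversion fractions exceed 1"
    terminal[source] = 1.0 - converted  # share of clearance that is terminal excretion
    print(f"{source}: {converted:.3f} converted, {terminal[source]:.3f} terminal")

# Arithmetic behind the E2 -> E1S note (Longcope rho = 0.15; Ruder rho = 0.54 and 0.65)
via_e1 = 0.15 * 0.54             # E2 -> E1 -> E1S, about 0.08
direct_estimate = 0.65 - via_e1  # about 0.57
print(f"via E1: {via_e1:.3f}, direct estimate: {direct_estimate:.3f}")
```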

### First-pass fates of oral dose

For each milligram of oral E2 absorbed across the gut wall (most of the dose: gut absorption is high, and it is first-pass metabolism that destroys most of it):

| Fraction | Value | Conf | Note |
|---|---|---|---|
| Reaches systemic as E2 | 4% | c2 | Kuhl 2005; consistent with absolute F = 5% |
| Reaches systemic as E1 | 30% | c2 | Calibrated to give an E1/E2 ratio of ≈ 5 at steady state |
| Reaches systemic as E1S | 32% | c2 | Calibrated to give E1S ≈ 2500 pg/mL at 1 mg/day |
| Lost to glucuronide / urine | ~34% | c3 | Inferred mass balance |
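Below is a Python sketch that turns a daily oral dose into per-species input rates using these fractions. `absorbed_fraction` is a hypothetical parameter, since the text only says absorption covers "most of the dose". The sketch does not decide whether these amounts feed the `D_oral_*` depots or enter plasma directly; that is left to the caller.

```python
PG_PER_MG = 1e9

# Fate of oral E2 absorbed across the gut wall (table above)
FIRST_PASS = {"E2": 0.04, "E1": 0.30, "E1S": 0.32}
lost_fraction = 1.0 - sum(FIRST_PASS.values())  # glucuronides / urine, about 0.34


def oral_inputs_pg_per_day(dose_mg_per_day, absorbed_fraction=1.0):
    """Daily amount (pg/day) reaching plasma as each species after first pass.

    absorbed_fraction is a hypothetical parameter for the share of the dose
    crossing the gut wall; 1.0 treats the whole dose as absorbed.
    """
    absorbed_pg = dose_mg_per_day * PG_PER_MG * absorbed_fraction
    return {species: frac * absorbed_pg for species, frac in FIRST_PASS.items()}


inputs = oral_inputs_pg_per_day(1.0)
print(inputs, f"lost: {lost_fraction:.2f}")
```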

### Sublingual: only the fraction absorbed sublingually bypasses first-pass

| Fraction | Value | Conf | Note |
|---|---|---|---|
| Sublingually absorbed → systemic E2 | 5% of dose | c3 | Doll 2022: AUC(SL)/AUC(PO) ≈ 1.8, so overall SL F ≈ 1.8 × oral F |
| Swallowed (75% of dose) → goes through oral first-pass | per above | c2 | |
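Combining the two rows gives an overall sublingual E2 bioavailability. This Python sketch makes two assumptions: that the 5% row is a fraction of the whole sublingual dose, and that the swallowed 75% follows the oral table, with 4% reaching plasma as E2.

```python
# Assumptions: 5% is a fraction of the whole dose; swallowed part follows the oral table
ORAL_F_E2 = 0.04     # oral first pass: fraction reaching plasma as E2
SL_DIRECT_F = 0.05   # absorbed sublingually, bypassing first pass
SWALLOWED = 0.75     # share of the sublingual dose that is swallowed

sl_f_e2 = SL_DIRECT_F + SWALLOWED * ORAL_F_E2  # fraction of dose reaching plasma as E2
ratio_vs_oral = sl_f_e2 / ORAL_F_E2
print(f"SL E2 bioavailability ~{sl_f_e2:.0%}, {ratio_vs_oral:.1f}x oral (Doll 2022: ~1.8x)")
```

Under this reading the split gives about 8% of the dose as systemic E2, roughly 2× the oral value. That is close to the ≈1.8 AUC ratio cited above.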

### SHBG

| Parameter | Value | Conf | Source |
|---|---|---|---|
| Baseline | 50 nM | c1 | Typical premenopausal female |
| Max induction | 10× | c2 | Pregnancy 5–10×, oral much less |
| Time constant | 3 days | c2 | SHBG protein turnover + mRNA dynamics |
| Kd for E2 | 20 nM | c1 | **Corrected from old AI's 1 nM (DHT value)** |
| Induction EC50 (free hepatic E2) | 1500 pg/mL | c3 | Selva & Hammond 2009 HepG2 |
| Hill coefficient | 1.0 | c4 | Default; not measured |

### Albumin

| Parameter | Value | Conf | Source |
|---|---|---|---|
| Concentration | 600,000 nM (40 g/L ÷ ~66.5 kDa, i.e. ~15,000 nM per g/L) | c1 | Standard |
| Kd for E2 | 12,000 nM (12 µM) | c2 | Mendel 1990 et al.; gives ~2% free fraction with SHBG = 50 nM |
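The free-fraction output can be sketched in Python with the linear (unsaturated) binding approximation. When E2 is far below the binding capacity of both proteins (50 pg/mL is about 0.18 nM), each binder contributes a bound/free ratio of [binder]/Kd. The document does not say whether the model uses this simplified form or the full competitive-binding equation, so treat the form as an assumption.

```python
def free_fraction(shbg_nm, albumin_nm=600_000.0, kd_shbg_nm=20.0, kd_alb_nm=12_000.0):
    """Fraction of total E2 that is unbound.

    Assumes E2 is far below the binding capacity of SHBG and albumin, so each
    binder adds a constant bound/free ratio of [binder] / Kd.
    """
    return 1.0 / (1.0 + albumin_nm / kd_alb_nm + shbg_nm / kd_shbg_nm)


ff = free_fraction(50.0)
print(f"free E2 at SHBG 50 nM: {ff:.2%}")
print(f"free E2 at SHBG 150 nM: {free_fraction(150.0):.2%}")
```

With the table values this gives 1 / (1 + 50 + 2.5) ≈ 1.9% free at SHBG = 50 nM. The approximation is poor in pregnancy, where total E2 (20,000 pg/mL is about 73 nM) is comparable to SHBG.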

### SULT1E1 substrate inhibition

| Parameter | Value | Conf | Note |
|---|---|---|---|
| Ki (hepatic [E2]) | 25,000 pg/mL | c3 | Tuned to give ~0.3× sulfation rate at pregnancy hepatic E2 |
| Hill n | 1.5 | c3 | |
| Hepatic / systemic E2 factor | 2 | c4 | Coarse — actual partition is route-dependent |
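The table does not spell out the inhibition formula. One standard form consistent with it is a Hill-type factor, 1 / (1 + (C_hep / Ki)^n), applied to the sulfation rate, with C_hep = 2 × systemic E2. This Python sketch uses that form as an assumption; it does reproduce the ~0.3× target.

```python
def sult_activity_factor(systemic_e2_pg_ml, ki_pg_ml=25_000.0, n=1.5, hepatic_factor=2.0):
    """Fraction of the uninhibited SULT1E1 sulfation rate remaining (1.0 = no inhibition).

    Assumed form: 1 / (1 + (C_hep / Ki)^n), with C_hep = hepatic_factor x systemic E2.
    """
    c_hep = hepatic_factor * systemic_e2_pg_ml
    return 1.0 / (1.0 + (c_hep / ki_pg_ml) ** n)


print(f"{sult_activity_factor(20_000.0):.2f}")  # pregnancy-range systemic E2
print(f"{sult_activity_factor(100.0):.3f}")     # typical non-pregnant range
```

At the pregnancy-term anchor of 20,000 pg/mL systemic E2, the factor is about 0.33. At 100 pg/mL it is essentially 1.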

## Validation: model vs literature anchors

Steady-state predictions (averaged over the last 5 simulation days):

| Scenario | E2 (pg/mL): model / anchor (×ratio) | E1 (pg/mL): model / anchor (×ratio) | E1S (pg/mL): model / anchor (×ratio) | Verdict |
|---|---|---|---|---|
| Cycling follicular | 72 / 50 (×1.4) | 58 / 50 (×1.2) | 819 / 960 (×0.9) | ✓ |
| Oral 1 mg/d PMP | 48 / 35 (×1.4) | 218 / 250 (×0.9) | 4101 / 2560 (×1.6) | ✓ |
| Oral 2 mg/d PMP | 94 / 70 (×1.3) | 415 / 500 (×0.8) | 8024 / 5000 (×1.6) | ✓ |
| Transdermal 50 µg/d | 36 / 50 (×0.7) | 29 / 50 (×0.6) | 409 / 600 (×0.7) | ✓ |
| Transdermal 100 µg/d | 70 / 100 (×0.7) | 37 / 70 (×0.5) | 644 / 900 (×0.7) | ✓ |
| Sublingual 1 mg BID | 139 / 100 (×1.4) | 333 / 200 (×1.7) | 6537 / 3000 (×**2.18**) | ✗ borderline |
| IM EV 5 mg q5d | 393 / 200 (×1.97) | 113 / 200 (×0.6) | 2841 / 1500 (×1.9) | ✓ |
| Pregnancy term | 20672 / 20000 (×1.0) | 8133 / 7000 (×1.2) | 53658 / 100000 (×0.5) | ✓ |
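The "within 2×" verdict can be reproduced from the rounded numbers in the table. Under the rule sketched below in Python, a scenario passes only if every species' model/anchor ratio lies in [0.5, 2]. This is a sketch of the rule as described, not the code in `v3_calibrated.py`.

```python
# (model, anchor) pairs in pg/mL for E2, E1, E1S, copied from the table above
RESULTS = {
    "Cycling follicular":   [(72, 50), (58, 50), (819, 960)],
    "Oral 1 mg/d PMP":      [(48, 35), (218, 250), (4101, 2560)],
    "Oral 2 mg/d PMP":      [(94, 70), (415, 500), (8024, 5000)],
    "Transdermal 50 µg/d":  [(36, 50), (29, 50), (409, 600)],
    "Transdermal 100 µg/d": [(70, 100), (37, 70), (644, 900)],
    "Sublingual 1 mg BID":  [(139, 100), (333, 200), (6537, 3000)],
    "IM EV 5 mg q5d":       [(393, 200), (113, 200), (2841, 1500)],
    "Pregnancy term":       [(20672, 20000), (8133, 7000), (53658, 100000)],
}


def within_2x(model, anchor):
    """True if the model/anchor ratio lies between 0.5 and 2 inclusive."""
    return 0.5 <= model / anchor <= 2.0


verdicts = {name: all(within_2x(m, a) for m, a in pairs) for name, pairs in RESULTS.items()}
for name, ok in verdicts.items():
    print(f"{name:24s} {'pass' if ok else 'FAIL'}")
print(f"{sum(verdicts.values())} of {len(verdicts)} within 2x")
```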

Pregnancy E1S sits at the 0.5× edge (model is *under-predicting*), which we
flag as a known limitation: capturing pregnancy E1S accurately would need
explicit modeling of placental sulfotransferase activity and fetoplacental E1S
output, which we don't include.

Sublingual is the one failure (borderline, ×2.18 on E1S), but the sublingual E1S
anchor of 3000 pg/mL is genuinely uncertain. I picked it by rough triangulation from
Cirrincione 2021 and the previous AI's own discussion, and I would not bet much on it
being accurate within 2×. The model's prediction of 6500 pg/mL is consistent with
"intermediate to oral-like" hepatic exposure on sublingual, which is the
qualitative conclusion that Cirrincione 2021 and Bar 2024 actually support.

## Outputs the model produces

For each scenario, the model gives time courses of:

- E2 (total serum, pg/mL)
- E1 (pg/mL)
- E1S (pg/mL)
- SHBG (nM)
- Free E2 fraction (%, computed from SHBG via binding equation)
- Free E2 (pg/mL, = total × free fraction)
- Estimated hepatic ER-α occupancy (Hill curve)

All in a single ODE integration, all with explicit unit tracking.

## What the model **doesn't** capture

To be clear: the model is a quantitative scaffold, not a precision predictor.

1. **No real tissue compartments**. Adipose, breast, brain, uterus all lumped
   into the systemic plasma compartment. Tissue concentrations cannot be
   predicted.

2. **No portal vein as a real compartment**. Oral/SL first-pass amplification
   is modeled as a 50× multiplier on hepatic free E2 during absorption — a
   coarse heuristic. A real PBPK would integrate portal flow × concentration
   over the absorption time course.

3. **No CYP catechol pathway**. 2-OH-E2 and 4-OH-E2 (and their glucuronides)
   are lumped into "other clearance".

4. **No glucuronides as separate state variables**. E2-3-glucuronide,
   E2-17-glucuronide, E1-3-glucuronide all lumped into "cleared / excreted".

5. **No estriol**. Pregnancy is modeled as a high E2 + E1 input; the actual
   placental DHEA-S → E3 pathway is absent. Estriol's contribution to ER
   activation and to the E3-glucuronide-driven enterohepatic loop is not modeled.

6. **No enterohepatic recirculation**. Gut bacterial β-glucuronidase
   reactivation of biliary glucuronides is absent, so the model
   under-predicts the apparent half-life of oral E2 (the observed ~13–20 h
   apparent half-life of oral E2 is partly driven by EHC, which we miss).

7. **No inter-individual variability**. Single population-mean trajectory; no
   covariate effects (age, BMI, smoking, ethnicity, UGT/SULT genotype).

8. **SHBG model is rough**. The Hill curve EC50 of 1500 pg/mL is
   calibration-based, not measured. Pregnancy SHBG rise is captured at ~3×
   (model output) vs the empirical 5–10×, suggesting the induction response is
   too weak in the very-high-E2 regime. With τ_SHBG = 3 days, the months of
   pregnancy are long enough to reach steady state, so slow dynamics alone
   cannot explain the shortfall. Possibly needs a nonlinear SHBG mRNA dynamics term.

9. **No EE (ethinylestradiol) or other synthetic estrogens**. Just 17β-E2.

10. **Hepatic ER-α occupancy is illustrative, not predictive**. The systemic
    free-E2 to hepatic-occupancy mapping has no validated parameters; we use
    it for relative-ranking-by-route only.

## What it's actually useful for

- **Comparing routes at a coarse level**: which route puts the most E1S in
  the reservoir, which route raises SHBG the most, which route has the
  highest peak E2.
- **Reasoning about timing**: how long does it take to reach steady state on
  oral vs IM vs transdermal? Day-cycle variation? Trough-to-peak ratio?
- **Sanity-checking claims in chatbot conversations or message boards**:
  e.g., "is it plausible that 2 mg oral E2 gives the same E2 as 100 µg
  transdermal?" (roughly yes: in the validation table the model gives 94 vs
  70 pg/mL, and the anchors are 70 vs 100 pg/mL).
- **Building intuition for the order-of-magnitude effects of switching
  routes or doses** — the model is a defensible factor-of-2 predictor across
  routes, which is enough for thinking.

## What it is **not** useful for

- Personal dose adjustment decisions. The inter-individual variability
  (especially in UGT, SULT, SHBG levels, body composition) means a model
  output of "70 pg/mL" should be read as "somewhere between 35 and 140 pg/mL
  for a generic postmenopausal woman".
- Predicting VTE risk for a specific person. The ER-α occupancy → SHBG
  → procoagulant chain in the model is illustrative; using it to predict
  whether a specific dose for a specific person crosses some risk threshold
  is irresponsible. Use the model to think about *route differences* and
  *direction of effect*, not to set numeric risk thresholds.
- Clinical decision support. This is a thinking tool, not a medical device.

## How to run the model

```bash
cd /workspace/trans/estradiol/model
source .venv/bin/activate     # uv venv from BASICS.md
python v3_calibrated.py       # prints validation table
python plot_v3.py             # writes plots to figures/v3/
```

## How to extend it

The natural next moves:

1. **Add a real liver compartment** with portal-vein concentration tracked
   separately. Use Plowchalk & Teeguarden 2002 + Karelina 2017 PBPK structure.
2. **Add inter-individual variability**: sample MCRs, SULT1E1 Vmax,
   SHBG baselines from log-normal distributions; run Monte Carlo.
3. **Couple to existing estrannaise.js IM ester models** for the
   well-fitted Bayesian parameters of EV, EEn, EC, EB, EUn.
4. **Add the catechol pathway (CYP1A1/1B1) and glucuronides as
   separate state variables**, especially the 4-OH-E2 / quinone pathway
   if interested in carcinogenicity reasoning.
5. **Replace the SHBG Hill curve with a transcription/translation
   ODE chain** so the pregnancy 5–10× rise can be reproduced.
6. **Estriol pathway** for pregnancy. Fetoplacental DHEA-S → STS in placenta
   → DHEA → ... → E3 → E3-3-glu → maternal urine.

Each of these is a meaningful upgrade. None is required for the current model
to be useful.
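As an illustration of extension 2, here is a minimal Monte Carlo sketch in Python. It samples only MCR_E2, from a log-normal distribution whose median is the table value. The 30% coefficient of variation and the 50 µg/day E2 input are hypothetical. The sketch also ignores interconversion, so steady-state E2 is simply input / (1000 × MCR).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 100_000
MCR_MEDIAN = 1500.0       # L/day; table value used as the population median
CV = 0.30                 # hypothetical between-person coefficient of variation
INPUT_PG_PER_DAY = 50e6   # hypothetical: 50 ug/day of E2 reaching plasma

# Log-normal with the chosen median; sigma follows from CV: sigma^2 = ln(1 + CV^2)
sigma = np.sqrt(np.log1p(CV**2))
mcr = MCR_MEDIAN * np.exp(sigma * rng.standard_normal(N))

# One-compartment steady state with no interconversion: C = input / (1000 x MCR)
e2_ss = INPUT_PG_PER_DAY / (1000.0 * mcr)

p5, p50, p95 = np.percentile(e2_ss, [5, 50, 95])
print(f"E2 steady state: median {p50:.0f} pg/mL, 90% interval {p5:.0f} to {p95:.0f} pg/mL")
```

A fuller version would sample SULT1E1 Vmax and SHBG baselines too, and run the complete ODE for each draw rather than this closed-form steady state.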
