Improving the worm→Chopin pipeline through Applied Statistics
Goal: the best model we can build that teaches a C. elegans to play Chopin's Nocturne in C♯ minor.
Polimi · AppStat 2026 · Companion to PyANNOW (NAML) · v0.8.0
Engineering doc: docs/STATISTICAL_DIAGNOSTICS.md
Notebook: notebooks/01_appstat_lecture_audit.ipynb
21 PyANNOW issues tagged [@appstat-audit]; 11 resolved as of v0.8.0
Teach a worm to play Chopin. That means two things:
(1) Model the chain ion channels → spikes → muscles → MIDI → audio.
PyANNOW already implements 8 NAML steps along this chain.
(2) Quantify "Chopin-likeness" rigorously enough that optimising it produces music.
PyANNOW reports `onset_loss` and `musical_f1`. Both are too coarse.
PyANNOW ISSUE-016 and ISSUE-017 patched the metric — but the underlying pipeline has 10 structural logic problems that no metric correction can fix. Each AppStat lecture maps to a tool that addresses one of them.
Fixed as of v0.8.0 ✅: (a) 96-cell Boyle model, k≥4 PCs; (b) standardized Procrustes; (c) 100% pitch coverage. Still open: ISSUE-033 (magic threshold).
| # | Problem | Fix by AppStat | Issue |
|---|---|---|---|
| 1 | 302-neuron matrix is synthetic (8 blocks + noise) | L01 PCA biplot reveals rank ≈ 8 | 029 |
| 2 | Step 0 mislabelled "random" | L00 IOI distribution shows it is a deterministic body wave | 030 |
| 3 | 8-muscle pitch bottleneck | L06 pitch-aware F1 ceiling | 031 |
| 4 | Procrustes between unstandardized incommensurate spaces | L04 standardize-first; L01 biplot | 032 |
| 5 | Shared `find_peaks(height=mean)` across all steps | L05 calibrated logistic + Youden | 033 |
| 6 | Chopin features lossily compressed to k=8 | L01 cumvar audit | 034 |
| 7 | Onset-only metric ignores pitch & velocity | L06 pitch-aware F1 + velocity Pearson r | 035 |
| 8 | Step 8 PINN is oscillator-PINN, not ion-channel PINN | L07 RF baseline reveals what physics buys | 036 |
| 9 | Biological ceiling assumes any muscle = any pitch | L06 reachable-pitch confusion | 037 |
| 10 | No cross-validation anywhere | L06 StratifiedKFold + bootstrap CIs | 038 |
The metric paradox is invisible from scalar numbers but obvious from distributions.
Step 0's IOI distribution = sharp peak at one body-wave period (~220 ms).
Chopin's IOI = broad, high kurtosis.
Step 0 is not "random" — it's deterministic and structurally incompatible with Chopin.
descriptive.collect_step_stats() → DataFrame of `{n, ioi_mean, ioi_std, ioi_skew, ioi_kurtosis, ...}`
plot_ioi_distributions() — KDE overlay
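A minimal sketch of the per-step IOI statistics, assuming onsets arrive as a 1-D array of times in seconds. `ioi_stats` is a hypothetical stand-in for what `descriptive.collect_step_stats()` computes, and both onset series are synthetic: a jittered 220 ms grid for Step 0 and gamma-distributed gaps for a Chopin-like rhythm.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew


def ioi_stats(onsets_s):
    """Descriptive stats of inter-onset intervals (IOIs) in ms, from onset times in s."""
    onsets = np.sort(np.asarray(onsets_s, dtype=float))
    ioi_ms = np.diff(onsets) * 1e3                    # consecutive gaps, s -> ms
    return {
        "n": int(onsets.size),
        "ioi_mean": float(ioi_ms.mean()),
        "ioi_std": float(ioi_ms.std(ddof=1)),
        "ioi_skew": float(skew(ioi_ms)),
        "ioi_kurtosis": float(kurtosis(ioi_ms)),      # excess (Fisher) kurtosis
    }


rng = np.random.default_rng(0)
# Hypothetical Step 0: one onset per ~220 ms body-wave period, 2 ms jitter.
step0 = np.arange(40) * 0.220 + rng.normal(0.0, 0.002, 40)
# Hypothetical Chopin-like rhythm: many short IOIs, occasional long gaps.
chopin = np.cumsum(rng.gamma(shape=0.8, scale=0.25, size=40))

# One row per source, one column per statistic.
stats = pd.DataFrame({"step0": ioi_stats(step0), "chopin": ioi_stats(chopin)}).T
print(stats.round(2))
```

On data like this the Step 0 row shows a very small `ioi_std` around a ~220 ms mean, which is the "sharp peak" signature; a KDE of the two IOI arrays shows the same contrast visually.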
ISSUE-022
Run PCA on the 302-neuron matrix; report cumvar at k=8.
Expected: cumvar @ k=8 ≈ 100% → the matrix has effective rank ≈ 8 (8 blocks + noise), not 302.
This closes logic #1 visually.
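The rank audit can be sketched on a hypothetical stand-in for the synthetic matrix: 8 latent traces copied into 302 neurons with random gains plus small noise (sizes, gains, and noise level are assumptions). The same cumulative-variance curve also yields the smallest k that reaches a target share of variance, which is one way to implement a 90% auto-selection rule.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, N, B = 2000, 302, 8        # time samples, neurons, latent blocks (illustrative sizes)

# Each neuron copies one of 8 latent traces with a random gain, plus small noise.
latent = rng.normal(size=(T, B))
block_of = np.arange(N) % B                       # neuron -> block assignment
loadings = np.zeros((B, N))
loadings[block_of, np.arange(N)] = rng.uniform(0.5, 1.5, size=N)
X = latent @ loadings + 0.05 * rng.normal(size=(T, N))

cumvar = np.cumsum(PCA().fit(X).explained_variance_ratio_)
print(f"cumvar @ k=8: {cumvar[7]:.4f}")           # close to 1 -> effective rank ~8


def smallest_k(cumvar, threshold=0.90):
    """Smallest k whose cumulative explained variance reaches `threshold`."""
    return int(np.searchsorted(cumvar, threshold) + 1)


print("k for 90% cumvar:", smallest_k(cumvar))
```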
2-D nonlinear projection (t-SNE/UMAP) coloured by KMeans label.
If colours mix, the motor primitives are arbitrary.
Manifold sanity check for Step 2.
ISSUE-023 — biplot · ISSUE-024 — t-SNE/UMAP · ISSUE-034 — Chopin k=8 audit
| Method | PyANNOW | What it adds |
|---|---|---|
| KMeans | ✅ | (in Step 2) |
| Ward + dendrogram + cophenet | 🔴 | Is the k=4 hierarchy natural? |
| DBSCAN | 🔴 | Outlier worm-behavior detection |
| GMM (soft, BIC) | 🔴 | Smooth biological transitions |
clustering.compare_methods() returns silhouettes + pairwise ARI for all four.
GMM is the most biologically defensible (motor primitives transition gradually, not abruptly).
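A sketch of a four-way comparison in the spirit of `clustering.compare_methods()`, using scikit-learn estimators on made-up 2-D blobs standing in for motor-PC scores. The `eps`, `min_samples`, and k values are illustrative assumptions, and `gmm_bic_k` is a hypothetical helper that picks the GMM component count by BIC.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture


def compare_methods(X, k=4, eps=0.8, seed=0):
    """Silhouette per method and pairwise ARI between methods (illustrative sketch)."""
    labels = {
        "kmeans": KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X),
        "ward": AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X),
        "dbscan": DBSCAN(eps=eps, min_samples=5).fit_predict(X),  # label -1 = outlier
        "gmm": GaussianMixture(n_components=k, random_state=seed).fit(X).predict(X),
    }
    # Silhouette needs at least 2 distinct labels; DBSCAN's -1 counts as one here.
    sil = {m: silhouette_score(X, lab) if len(set(lab)) > 1 else float("nan")
           for m, lab in labels.items()}
    ari = {(a, b): adjusted_rand_score(labels[a], labels[b])
           for a, b in combinations(labels, 2)}
    return sil, ari


def gmm_bic_k(X, k_max=8, seed=0):
    """Number of GMM components with the lowest BIC."""
    bics = [GaussianMixture(n_components=c, random_state=seed).fit(X).bic(X)
            for c in range(1, k_max + 1)]
    return int(np.argmin(bics) + 1)


# Hypothetical stand-in for motor-PC scores: 4 well-separated 2-D blobs.
X, _ = make_blobs(n_samples=400, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                  cluster_std=0.6, random_state=0)
sil, ari = compare_methods(X)
print(sil, ari, "BIC-optimal k:", gmm_bic_k(X), sep="\n")
```

High pairwise ARI means the partition does not depend on the method; low ARI on real PC scores would suggest the k=4 primitives are not a natural structure in the data.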
ISSUE-025
✅ Ridge + RidgeCV; R² vs α curve
🔴 No diagnostics: residuals, QQ plot, Breusch-Pagan, Durbin-Watson, VIF, Cook's distance
🔴 No Lasso: we never check whether all k=4 PCs are needed
regression.diagnose_ridge(): Lab V diagnostics table per Chopin feature dimension
regression.lasso_path_selection(): effective k (number of non-zero LassoCV coefficients)
Standardize before fitting (closes logic #4)
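A sketch of the residual checks listed above on a hypothetical 4-PC design with deliberately unequal scales, where two PCs carry no signal. It assumes observations are in time order (needed for Durbin-Watson), uses Shapiro-Wilk as a numeric stand-in for the QQ plot, computes Breusch-Pagan by hand in its Koenker n·R² form, and omits Cook's distance. The non-zero LassoCV coefficient count plays the role of k_effective; this is not the actual `regression.diagnose_ridge()` code.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 300, 4                                       # samples, motor PCs (illustrative)
X = rng.normal(size=(n, p)) * np.array([10.0, 3.0, 1.0, 0.3])      # unequal PC scales
y = 0.5 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)  # PCs 2 and 4 unused

# Standardize first so the ridge penalty treats every PC alike (logic #4).
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25))).fit(X, y)
resid = y - ridge.predict(X)

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)   # Durbin-Watson, ~2 if no lag-1 autocorr
shapiro_p = stats.shapiro(resid).pvalue                  # numeric companion to the QQ plot
# Breusch-Pagan (Koenker form): n * R^2 of resid^2 on X, ~ chi2(p) if homoscedastic.
bp_lm = n * LinearRegression().fit(X, resid ** 2).score(X, resid ** 2)
bp_p = stats.chi2.sf(bp_lm, df=p)
vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))  # VIF_j = 1 / (1 - R_j^2)

# Lasso on the same standardized inputs: how many PCs survive?
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)).fit(X, y)
k_effective = int(np.sum(lasso[-1].coef_ != 0))
print(f"DW={dw:.2f} Shapiro p={shapiro_p:.3f} BP p={bp_p:.3f} "
      f"VIF={np.round(vif, 2)} k_effective={k_effective}")
```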
ISSUE-026 · ISSUE-032
PyANNOW's onset detection at every step is one line:
```python
from scipy.signal import find_peaks
peaks, _ = find_peaks(activ, distance=int(0.28/0.5e-3), height=activ.mean())
```
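That is a one-feature classifier with a hardcoded threshold (the activation mean) and a fixed minimum peak spacing of 0.28 s (560 samples at a 0.5 ms step). Different step activations have different ideal thresholds, but the same `mean()` rule is used everywhere.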
We swap for a calibrated LogisticRegression(class_weight='balanced') with
Youden's-J or best-F1 threshold tuning. The gap between calibrated
F1 and PyANNOW's reported F1 measures how much each step's poor score was
caused by the hardcoded threshold rather than the activation itself.
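A sketch of the swap on synthetic binned data (about 5% onset bins, with activation shifted up by 2.5 in onset bins; both numbers are assumptions). Thresholds are tuned on one half and scored on the other, so the comparison with the mean rule is out-of-sample.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
onset = (rng.random(n) < 0.05).astype(int)        # ~5% of time bins contain an onset
activ = rng.normal(size=n) + 2.5 * onset          # hypothetical activation per bin

X = activ.reshape(-1, 1)                          # a single feature, as in find_peaks
X_tr, X_te, y_tr, y_te = train_test_split(X, onset, test_size=0.5,
                                          stratify=onset, random_state=0)

# PyANNOW-style rule on the test half: flag every bin above the activation mean.
f1_mean_rule = f1_score(y_te, (X_te[:, 0] > X_te[:, 0].mean()).astype(int))

clf = CalibratedClassifierCV(LogisticRegression(class_weight="balanced"),
                             method="sigmoid", cv=5).fit(X_tr, y_tr)
p_tr, p_te = clf.predict_proba(X_tr)[:, 1], clf.predict_proba(X_te)[:, 1]

# Youden's J = TPR - FPR, maximised on the training half.
fpr, tpr, roc_thr = roc_curve(y_tr, p_tr)
t_youden = roc_thr[np.argmax(tpr - fpr)]

# Best-F1 threshold, also chosen on the training half.
prec, rec, pr_thr = precision_recall_curve(y_tr, p_tr)
f1_curve = 2 * prec[:-1] * rec[:-1] / np.maximum(prec[:-1] + rec[:-1], 1e-12)
t_best_f1 = pr_thr[np.argmax(f1_curve)]

f1_youden = f1_score(y_te, (p_te >= t_youden).astype(int))
f1_best = f1_score(y_te, (p_te >= t_best_f1).astype(int))
print(f"mean rule {f1_mean_rule:.3f} | Youden {f1_youden:.3f} | best-F1 {f1_best:.3f}")
```

Because the classifier has a single input, calibration does not reorder bins; what changes is where the cut falls, which is exactly the part the `mean()` rule hardcodes. Under heavy imbalance the best-F1 cut usually sits higher than the Youden cut.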
ISSUE-027 · ISSUE-033
F1 vs tolerance {10, 25, 50, 100, 200, 400} ms
PR overlay (preferred under heavy imbalance)
ROC overlay + AUC
Bootstrap 95% CI + paired-bootstrap H₀ test
StratifiedKFold over Chopin sub-windows
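The tolerance sweep and the bootstraps can be sketched with plain NumPy. This assumes onsets are matched one-to-one, greedily by time offset, and that resampling is over whole sub-windows so dependence within a window is preserved. The data are synthetic (a "good" system with 40 ms jitter and a "bad" one with 200 ms jitter, each with 5 spurious onsets per window). `paired_bootstrap_p` reports the share of paired resamples in which system a fails to beat b, a common approximation rather than an exact test; all helper names are hypothetical.

```python
import numpy as np


def match_counts(ref, est, tol):
    """Greedy one-to-one onset matching within +/- tol seconds -> (tp, fp, fn)."""
    pairs = sorted((abs(r - e), i, j) for i, r in enumerate(ref)
                   for j, e in enumerate(est) if abs(r - e) <= tol)
    used_r, used_e, tp = set(), set(), 0
    for _, i, j in pairs:                         # closest pairs first
        if i not in used_r and j not in used_e:
            used_r.add(i)
            used_e.add(j)
            tp += 1
    return tp, len(est) - tp, len(ref) - tp


def pooled_f1(counts):
    """F1 from summed (tp, fp, fn) rows, one row per sub-window."""
    tp, fp, fn = counts.sum(axis=0)
    return 2 * tp / max(2 * tp + fp + fn, 1)


def bootstrap_ci(counts, n_boot=2000, seed=0):
    """Percentile 95% CI of pooled F1, resampling whole sub-windows."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(counts), size=(n_boot, len(counts)))
    return np.percentile([pooled_f1(counts[b]) for b in idx], [2.5, 97.5])


def paired_bootstrap_p(counts_a, counts_b, n_boot=2000, seed=0):
    """Fraction of paired resamples with F1(a) <= F1(b); small => a beats b."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(counts_a), size=(n_boot, len(counts_a)))
    return float(np.mean([pooled_f1(counts_a[b]) <= pooled_f1(counts_b[b]) for b in idx]))


# Hypothetical data: 30 ten-second sub-windows of 20 reference onsets each.
rng = np.random.default_rng(1)
windows = []
for _ in range(30):
    ref = np.sort(rng.uniform(0, 10, 20))
    good = np.concatenate([ref + rng.normal(0, 0.04, 20), rng.uniform(0, 10, 5)])
    bad = np.concatenate([ref + rng.normal(0, 0.20, 20), rng.uniform(0, 10, 5)])
    windows.append((ref, good, bad))

f1_by_tol = {}
for tol_ms in (10, 25, 50, 100, 200, 400):
    c = np.array([match_counts(r, g, tol_ms / 1e3) for r, g, _ in windows])
    f1_by_tol[tol_ms] = pooled_f1(c)
    print(tol_ms, "ms:", round(f1_by_tol[tol_ms], 3), bootstrap_ci(c).round(3))

c_good = np.array([match_counts(r, g, 0.05) for r, g, _ in windows])
c_bad = np.array([match_counts(r, b, 0.05) for r, _, b in windows])
print("paired p (good <= bad):", paired_bootstrap_p(c_good, c_bad))
```

Stratifying the sub-windows (for example by note density) before resampling would follow the StratifiedKFold line above; it is left out to keep the sketch short.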
Onset matched in BOTH time AND pitch
Plain F1 ignores wrong notes; pitch-aware F1 doesn't.
Bipartite matching: greedy by time, must share pitch-class.
Closes logic #7
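A minimal sketch of pitch-aware matching, assuming notes are `(onset_s, midi_pitch, velocity)` tuples and a 50 ms tolerance. The helpers are hypothetical. In the toy example one estimated note has the right onset but the wrong pitch class (MIDI 65 = F against 64 = E), so plain onset F1 counts it as a hit and pitch-aware F1 does not; the same matched pairs feed the velocity Pearson r.

```python
from scipy.stats import pearsonr


def match_notes(ref, est, tol=0.050, match_pitch=True):
    """Greedy one-to-one note matching, closest onset first.

    ref, est: lists of (onset_s, midi_pitch, velocity). A pair is eligible if
    |onset difference| <= tol and, when match_pitch, the pitch classes agree.
    """
    pairs = sorted(
        (abs(r[0] - e[0]), i, j)
        for i, r in enumerate(ref)
        for j, e in enumerate(est)
        if abs(r[0] - e[0]) <= tol and (not match_pitch or r[1] % 12 == e[1] % 12)
    )
    used_r, used_e, matches = set(), set(), []
    for _, i, j in pairs:
        if i not in used_r and j not in used_e:
            used_r.add(i)
            used_e.add(j)
            matches.append((i, j))
    return matches


def note_f1(ref, est, **kw):
    tp = len(match_notes(ref, est, **kw))
    return 2 * tp / max(len(ref) + len(est), 1)   # = 2TP / (2TP + FP + FN)


def velocity_r(ref, est, **kw):
    """Pearson r between velocities of matched notes (needs >= 2 matches)."""
    m = match_notes(ref, est, **kw)
    return pearsonr([ref[i][2] for i, _ in m], [est[j][2] for _, j in m])[0]


# Hypothetical notes: (onset in s, MIDI pitch, velocity)
ref = [(0.00, 61, 40), (0.50, 64, 55), (1.00, 68, 70), (1.50, 73, 90)]
est = [(0.01, 49, 45), (0.52, 65, 60), (0.98, 68, 72), (1.53, 61, 85)]
print(note_f1(ref, est, match_pitch=False))   # 1.0: every onset is within 50 ms
print(note_f1(ref, est))                      # 0.75: the F is not an E
print(round(velocity_r(ref, est), 3))
```

Greedy matching is simple, but when several notes crowd into one tolerance window it can miss pairs that a maximum bipartite matching would find.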
ISSUE-018, 019, 020, 021, 035
PyANNOW jumps from Linear (Ridge) to MLP (Adam) to L-BFGS to PINN — a deep-model escalation. AppStat L07 inserts a basic question before any of that:
Does a stock Random Forest already match the MLP?
If yes: Steps 4-6 are buying nothing. The deep model is over-engineering; ship the RF.
If no: the MLP captures genuine nonlinear structure that the RF cannot, which justifies the extra complexity.
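A sketch of the L07 check, assuming regression from 4 motor PCs to one target on synthetic nonlinear data (the target function, hyperparameters, and fold count are all illustrative). Both models are scored on identical folds so their per-fold R² values can be compared pairwise, and permutation importance on held-out data shows which inputs the forest relies on.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))                       # e.g. 4 motor PCs (illustrative)
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=600)  # X[:, 3] unused

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(64, 64),
                                                        max_iter=1000, random_state=0)),
}
# Same folds for both models, so the per-fold scores are paired.
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="r2") for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: R^2 = {s.mean():.3f} +/- {s.std():.3f}")
print("per-fold MLP - RF:", np.round(scores["mlp"] - scores["rf"], 3))

# Permutation importance on held-out data: which inputs does the RF rely on?
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print("permutation importance:", np.round(imp.importances_mean, 3))
```

If the per-fold differences straddle zero, the MLP is not demonstrably better than the forest on that task.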
ISSUE-028 · ISSUE-036 (Step 8 PINN: which physics is it?)
A worm-Chopin model is "Chopin-like" iff every floor below is passed:
| Component | What it measures | Floor |
|---|---|---|
| pitch_aware_f1 @ 50 ms | onsets matched in BOTH time AND pitch | ≥ 0.20 |
| AUC-PR (binned onset classification) | threshold-free quality of the activation | ≥ 0.30 |
| ioi_similarity (already in pyannow) | rhythmic distribution overlap | ≥ 0.30 |
| velocity_correlation | Pearson r over matched-note velocities (dynamics) | ≥ 0.20 |
| bootstrap 95% CI | uncertainty around every claim | CI excludes Step 0's score |
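The "every floor passed" rule is short to state as code. This sketch assumes hypothetical metric keys and reads "CI excludes Step 0" as: the lower 95% bound of the candidate's pitch-aware F1 lies above Step 0's point estimate (0.186 in the per-step results). The candidate values are made up.

```python
# Floors from the table; metric names are hypothetical dictionary keys.
FLOORS = {
    "pitch_aware_f1_50ms": 0.20,
    "auc_pr": 0.30,
    "ioi_similarity": 0.30,
    "velocity_correlation": 0.20,
}


def is_chopin_like(metrics, f1_ci_low, step0_f1):
    """True iff every floor holds and the F1 bootstrap CI lies entirely above Step 0.

    Assumes "CI excludes Step 0" means the lower 95% bound of the candidate's
    pitch-aware F1 is strictly greater than Step 0's point estimate.
    """
    floors_ok = all(metrics[name] >= floor for name, floor in FLOORS.items())
    return floors_ok and f1_ci_low > step0_f1


candidate = {"pitch_aware_f1_50ms": 0.31, "auc_pr": 0.42,
             "ioi_similarity": 0.35, "velocity_correlation": 0.24}   # made-up values
print(is_chopin_like(candidate, f1_ci_low=0.24, step0_f1=0.186))                    # True
print(is_chopin_like({**candidate, "auc_pr": 0.25}, f1_ci_low=0.24, step0_f1=0.186))  # False
```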
Reported by wormuse_analytics.pipeline.ImprovedPipeline.score_all().
None of these is currently in PyANNOW's `losses` dict.
| ID | Title | Lec | Status |
|---|---|---|---|
| ✅ 018 | Builder/notebook desync | L06 (infra) | v0.7.0 |
| 019 | Cell 20 left panel still misleads | L06 | open |
| ✅ 020 | Multi-tol F1 + PR + ROC sweep | L06 | v0.8.0 |
| ✅ 021 | Bootstrap CIs for per-step F1 | L06 | v0.8.0 |
| 022 | Descriptive stats of outputs | L00 | open |
| 023 | PCA biplot + standardization | L01 | open |
| 024 | t-SNE / UMAP manifold view | L02 | open |
| 025 | Compare four clustering methods | L03 | open |
| 026 | Lab V diagnostics on Ridge | L04 | open |
| 027 | Logistic onset detector | L05 | open |
| 028 | RF baseline + permutation importance | L07 | open |
| ✅ 029 | Synthetic 302-neuron matrix | L01 (logic #1) | v0.7.0 |
| 030 | Step 0 mislabelled "random" | L00 (logic #2) | open |
| ✅ 031 | 8-muscle pitch bottleneck | L06 (logic #3) | v0.7.0 |
| ✅ 032 | Procrustes feature mismatch | L04 (logic #4) | v0.8.0 |
| 033 | Magic-threshold peak detector | L05 (logic #5) | open — P1 |
| ✅ 034 | Chopin lossy k=8 compression | L01 (logic #6) | v0.8.0 |
| ✅ 035 | Pitch-aware F1 missing | L06 (logic #7) | v0.8.0 |
| ✅ 036 | Step 8 PINN ≠ ion-channel PINN | L07 (logic #8) | v0.8.0 |
| ✅ 037 | Biological ceiling overstated | L06 (logic #9) | v0.8.0 |
| ✅ 038 | No cross-validation anywhere | L06 (logic #10) | v0.8.0 |
11 resolved ✅ (3 in v0.7.0, 8 in v0.8.0) · 10 open · one .md file per issue under docs/proposed_pyannow_issues/
| Step | Method | F1 (pitch-aware) | Δ vs Step 0 | Notes |
|---|---|---|---|---|
| 0 | Rule-based baseline | 0.186 | n/a | baseline |
| 1 | SVD + Procrustes (standardize=True) | 0.000 | −0.186 | residual 89.8→4.06 ✅; ISSUE-033 open |
| 2 | K-means | 0.110 | −0.076 | |
| 3 | Ridge | 0.000 | −0.186 | pending investigation |
| 4-6 | MLP + Adam + L-BFGS | 0.193 | +0.007 | first step to beat the baseline ↑ |
procrustes_align(standardize=True) — z-score W_k before SVD.
PC scale imbalance was 6.9×. Residual: 89.807 → 4.059.
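A sketch of standardize-then-rotate Procrustes using `scipy.linalg.orthogonal_procrustes`. The synthetic spaces, the 500×4 shape, and the noise level are assumptions, and the first column of A is inflated 6.9× to mimic the scale imbalance reported above. Unlike the slide's absolute residual, this sketch reports the residual relative to ‖B‖_F, so the two settings are compared on the same unit-free scale.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes


def procrustes_align(A, B, standardize=True):
    """Orthogonal R minimising ||A R - B||_F; returns (R, relative residual).

    The residual is divided by ||B||_F so that runs with and without
    standardization are on a comparable, unit-free scale.
    """
    if standardize:
        # z-score each column so no single PC dominates the fit by its scale alone
        A = (A - A.mean(axis=0)) / A.std(axis=0)
        B = (B - B.mean(axis=0)) / B.std(axis=0)
    R, _ = orthogonal_procrustes(A, B)
    return R, float(np.linalg.norm(A @ R - B) / np.linalg.norm(B))


rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 4))                       # shared latent coordinates (hypothetical)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))        # random orthogonal map
A = Z * np.array([6.9, 1.0, 1.0, 1.0])              # first column inflated 6.9x
B = Z @ Q + 0.05 * rng.normal(size=Z.shape)

_, rel_raw = procrustes_align(A, B, standardize=False)
_, rel_std = procrustes_align(A, B, standardize=True)
print(f"relative residual: raw {rel_raw:.3f} -> standardized {rel_std:.3f}")
```

A rotation cannot undo a per-axis stretch, so without standardization the inflated column leaves a large residual no matter which R is chosen.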
build_chopin_features(k_chopin=None) — 90% cumvar auto-selection.
For the 10 s clip: k=5 captures 97.7% of the variance (previously fixed at k=8).
Next priority: ISSUE-033 — logistic onset detector to fix Step 1 regression.
A pipeline that measurably improves toward Chopin, with uncertainty quantified, logic problems explicit, and deep-model claims falsifiable.
The 21 [@appstat-audit] issues form a roadmap: close them in priority order
(P1 first) and PyANNOW's notebook 03 ends up reporting a scalar pitch-aware F1 that
actually correlates with how Chopin-like the audio sounds.
For full details: docs/STATISTICAL_DIAGNOSTICS.md.
For executable demonstration: notebooks/01_appstat_lecture_audit.ipynb.
For drop-in PyANNOW issues: docs/proposed_pyannow_issues/ISSUE-{018..038}.md.