PyANNOW · Politecnico di Milano · NAML 2025-26 · v2.0.0

The Worm Dances
to Chopin

From musical patterns to biological locomotion —
a NAML pipeline: RSVD · K-means · Pearson · Least Squares

Piano-roll M ∈ ℝ^{56×T} → RSVD → K-means (8 patterns) → Pearson excitability (Z_worm ↔ V_p) → LS-reg. → 🐛 Dance (96 muscles → live)

Vahid Ghayoomie · wormuse / PyANNOW · May 2026

Seyed Vahid Ghayoomie

🐛
M.Sc. HPC Eng.
Politecnico di Milano
2025 – present

🔬 Research — C. elegans Systems Neuroscience

OpenWorm Foundation (Delaware non-profit, 2014 – present)
Led ChannelWorm — Python platform for ion-channel modeling in C. elegans: patch-clamp data curation, HH fitting, API + validation suite.

chModeler (2018 – present) — ion-channel "supermodel" + ML predictor for kinetics from amino-acid sequences (~270 channels, Xenopus oocyte dataset).

Phil. Trans. R. Soc. B 2018 F1000Research 2016 ★ Invited Speaker — The Royal Society, London 2018

🎓 Education & Context

B.Sc. IT Engineering · M.Sc. studies in Biology & Mechatronics · Currently: M.Sc. HPC Engineering @ PoliMi (CUDA, OpenMP, NAML, NLA, NMPDE).
Industrial: Big Data @ Sadad/Melli Bank (2015–18); co-founder Neursal (2022–23).

vahidghayoomi@gmail.com · github.com/vahidgh · github.com/VahidGh

What we'll cover today

Context
🔬 v1.0.0 — What we built
HH simulation, 10 NAML steps, F1 progression
Part I
🎹 The Data
Piano-roll M, 96-muscle architecture
Part II
📐 RSVD + Eckart-Young
Best rank-k musical basis
Part III
🔵 K-means + Silhouette
Discovering 8 musical states
Part IV
⚡ Excitability
Pearson r + least-squares mapping
Part V
🐛 The Dance
Body wave, L/R fix, live viz →
Synthesis
🔗 NAML Connections
All 7 methods from the course
Wrapup
💡 Insights + Open questions
What biology taught us about ML

The v1 question: Worm → Music

Approach
HH simulation → note generation
302 neurons, Hodgkin-Huxley ion channels, 96 BWMs
10 NAML steps
SVD → K-means → Ridge → MLP → Adam → L-BFGS
Each step: more NAML, (hopefully) better F1
Best result
F1 = 0.879 (Step 9: worm + Fourier)
vs Step 0 random baseline F1 = 0.186

The problem with v1:
HH simulation reaches a fixed-point attractor — the worm locks into one muscle pattern, making raw muscle data useless for varied visualization.

v2 inversion:
Instead of Worm→Music, we ask:
Music → Patterns → Worm Dance
Extract recurring patterns from Chopin,
drive the worm's 96 muscles from music.

Part I

🎹 The Data

Two matrices — a piano and a worm — connected by linear algebra

The piano-roll: M ∈ ℝ^{P×T}

$$M_{p,t} = \begin{cases}1 & \text{pitch } p \text{ active at time } t \\ 0 & \text{otherwise}\end{cases}$$

Dimensions

P = 56 unique pitches (MIDI bins)
T = 11 711 time frames at 20 ms resolution
Duration: 234 s · BPM: 69
Density: ~9% active

Why binary?

MIDI note-on/off events → binary presence. Velocity and duration encoded separately. This is the pitch × time representation standard in music information retrieval (MIR).
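
A minimal sketch of how such a binary piano-roll could be assembled, assuming the MIDI file has already been parsed into (pitch, onset, offset) tuples in integer milliseconds. The function name `build_piano_roll` and the three toy notes are hypothetical and not taken from the PyANNOW code; only the 20 ms frame length comes from the slide.

```python
import numpy as np

def build_piano_roll(notes, frame_ms=20):
    """Binary pitch x time matrix from (midi_pitch, onset_ms, offset_ms) tuples."""
    pitches = sorted({p for p, _, _ in notes})        # unique pitches become rows
    row = {p: i for i, p in enumerate(pitches)}
    last_off = max(off for _, _, off in notes)
    n_frames = -(-last_off // frame_ms)               # ceiling division
    M = np.zeros((len(pitches), n_frames), dtype=np.uint8)
    for p, on, off in notes:
        start = on // frame_ms
        stop = max(start + 1, -(-off // frame_ms))    # every note covers >= 1 frame
        M[row[p], start:stop] = 1
    return M, pitches

# Hypothetical toy input: three overlapping notes
notes = [(61, 0, 100), (64, 40, 100), (68, 80, 200)]
M, pitches = build_piano_roll(notes)
density = M.mean()                                    # fraction of active cells
```

Rows are indexed by pitch rank rather than raw MIDI number, which is one way to end up with P = 56 rows for a piece that uses 56 distinct pitches.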

[Figure: Chopin Nocturne in C# minor, first 30 s. Piano-roll with pitch rank (0–55) against time (0–30 s), shown with the U[:,0] projection.]

96 body-wall muscles = 96 piano keys

Boyle et al. 2012 — 4 quadrants × 24 segments:
DL (dorsal-left) ≡ MIDI 24–47 (C1–B2, bass)
VL (ventral-left) ≡ MIDI 48–71 (C3–B4, tenor)
DR (dorsal-right) ≡ MIDI 72–95 (C5–B6, alto)
VR (ventral-right) ≡ MIDI 96–119 (C7–B8, treble)
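
The mapping above can be sketched as plain index arithmetic. The helper names are hypothetical, and the ordering within a quadrant (segment index rising with pitch, so s = 0 is the lowest note of each range) is an assumption; the slide only fixes the four MIDI ranges.

```python
QUADRANTS = ("DL", "VL", "DR", "VR")   # order follows the MIDI ranges listed above
MIDI_LOW = 24                          # lowest mapped pitch (C1)
SEGMENTS = 24                          # muscles per quadrant

def midi_to_muscle(pitch):
    """Return (quadrant, segment) for a MIDI pitch in 24..119, else None."""
    idx = pitch - MIDI_LOW
    if not 0 <= idx < len(QUADRANTS) * SEGMENTS:
        return None
    q, s = divmod(idx, SEGMENTS)
    return QUADRANTS[q], s

def muscle_to_midi(quadrant, segment):
    """Inverse map: (quadrant, segment) back to a MIDI pitch."""
    return MIDI_LOW + QUADRANTS.index(quadrant) * SEGMENTS + segment
```

For example, middle C (MIDI 60) falls in the VL range at segment 12 under this assumed ordering.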

Phase structure

DL/DR fire in-phase (dorsal body-wave).
VL/VR fire 180° anti-phase (ventral).
Bilateral pairs have 0.05 rad lateral offset.
Body-wave travels head→tail within each quadrant.

[Figure: C. elegans body-wall muscles, head at s=0 to tail at s=23, body wave travelling head→tail. Quadrants: DL · bass (24–47), DR · alto (72–95), VL · tenor (48–71), VR · treble (96–119).]

Part II

📐 RSVD + Eckart-Young

Finding the best musical basis — the same theorem that compresses images

The best rank-k approximation

$$M = U \Sigma V^T, \quad \hat{M}_k = U_k \Sigma_k V_k^T \;\;\text{minimises}\;\; \|M - \hat{M}_k\|_F$$

Components (piano-roll M)

$U_k \in \mathbb{R}^{56 \times k}$ — pitch profiles: which pitches co-activate in each pattern

$\sigma_i$ — pattern energy: $\sigma_i^2 / \|\sigma\|^2$ = fraction of musical variance

$V_k \in \mathbb{R}^{T \times k}$ — temporal envelopes: when each pattern is active

Course connection (Lab01)

In Lab01 we compressed images by keeping the top-k singular values. Here we compress a musical score the same way.

The k=12 most energetic patterns explain 90% of the piano-roll's Frobenius energy.

Each pattern $U_k[:,i]$ is a pitch chord; $V_k[:,i]$ is its rhythmic envelope.

Larger $\sigma_i$ = more musical energy.
$\sigma_1$ alone captures ~42% of variance
(the nocturne's repeating motif).
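
A small sketch of the energy bookkeeping behind these numbers, run on a synthetic stand-in matrix rather than the real piano-roll (so the k it returns is not the k = 12 of the slides). The function name `rank_for_energy` is hypothetical. The last lines check the Eckart-Young identity numerically: the squared Frobenius error of the rank-k truncation equals the sum of the discarded $\sigma_i^2$.

```python
import numpy as np

def rank_for_energy(singular_values, threshold=0.90):
    """Smallest k whose leading sigma_i^2 reach `threshold` of the total energy."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(energy)), cumulative

rng = np.random.default_rng(0)
M_toy = (rng.random((56, 400)) < 0.09).astype(float)   # synthetic, ~9% active
U, s, Vt = np.linalg.svd(M_toy, full_matrices=False)
k, cumulative = rank_for_energy(s)

# Eckart-Young check: truncation error equals the energy of the dropped modes
M_k = (U[:, :k] * s[:k]) @ Vt[:k]
err_sq = np.linalg.norm(M_toy - M_k, "fro") ** 2
tail_sq = np.sum(s[k:] ** 2)
```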

Why Randomized SVD?

Problem
M is 56 × 11 711 — dense and wide
Full SVD: O(PT·min(P,T)) — wasteful for k≪P
Algorithm (course version)
1. Sketch: Ω ← random (T×(k+p))
2. Sample the range: Y = M Ω, then power iteration Y ← (M M^T)^q Y
3. Orthonormalise and project: Q = orth(Y), B = Q^T M
4. Small SVD: SVD(B) = Û Σ V^T
Final: U = Q Û · Cost: O(PT·k) ≪ O(PT·min(P,T))
Result
k=12, p=10 oversampling, q=2 power iters
~90% variance in <0.5 s · same as Lab01 image compression
[Figure: Piano-roll scree plot. Variance fraction against component k, with the 90% threshold marked at k=12.]

k=12 chosen by 90% cumulative variance threshold
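
The randomized SVD steps can be sketched in NumPy as follows. This is an illustrative version, not the PyANNOW implementation: the function name `rsvd`, the Gaussian test matrix, and the QR re-orthonormalisation inside the power iteration (a common stabilisation) are assumptions. The toy matrix is exactly rank 12, so the sketch should recover it to rounding error.

```python
import numpy as np

def rsvd(M, k, p=10, q=2, seed=0):
    """Randomized SVD: rank-k factors of M with oversampling p and q power iterations."""
    rng = np.random.default_rng(seed)
    T = M.shape[1]
    Omega = rng.standard_normal((T, k + p))   # 1. random sketch
    Y = M @ Omega                             # 2. sample the range of M
    for _ in range(q):                        #    power iteration: Y <- (M M^T) Y
        Y, _ = np.linalg.qr(Y)                #    re-orthonormalise for stability
        Y = M @ (M.T @ Y)
    Q, _ = np.linalg.qr(Y)                    # 3. orthonormal basis of the sample
    B = Q.T @ M                               #    small (k+p) x T matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)   # 4. small SVD
    U = Q @ U_hat                             # lift back to pitch space
    return U[:, :k], s[:k], Vt[:k].T

# Synthetic rank-12 stand-in with the pitch dimension of the slides (P = 56)
rng = np.random.default_rng(1)
M_toy = rng.standard_normal((56, 12)) @ rng.standard_normal((12, 500))
U_k, s_k, V_k = rsvd(M_toy, k=12)
s_exact = np.linalg.svd(M_toy, compute_uv=False)[:12]
```

Here `V_k` has shape (T, k), matching the temporal-envelope convention used in the rest of the deck.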

Part III

🔵 K-means + Silhouette

Discovering recurring musical states from temporal modes

Musical states from Vk temporal modes

Input: $V_k \in \mathbb{R}^{T \times k}$ — each row is a time frame represented as a k-dimensional mode-coordinate vector
K-means clusters the T=11711 frames into K groups. Each cluster = one recurring musical state (same harmony + rhythm profile)
Analogy to Lab02/Lab10: same as clustering MNIST digits in PCA space — here we cluster music frames in SVD mode space
Each cluster centroid $\mu_j \in \mathbb{R}^k$ describes the average modal fingerprint of a musical pattern
$$\min_{\{c_t\},\,\{\mu_j\}} \sum_{t=1}^T \| V_k[t,:] - \mu_{c_t} \|_2^2$$
[Figure: V_k frame clusters (schematic, 2 modes shown). Clusters P1–P4 plotted on axes V[:,0] and V[:,1].]

How many patterns? K* = 8

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \in [-1, 1]$$

$a(i)$ = mean distance from frame $i$ to the other frames in its own cluster · $b(i)$ = mean distance from frame $i$ to the frames of the nearest other cluster

Interpretation

$s \to 1$: tight, well-separated clusters → good K
$s \to 0$: frames on cluster boundaries → ambiguous
$s \to -1$: misclassified → bad K

Result: K*=8 musical states

8 clusters capture the main harmonic transitions of the nocturne: opening motif, bridge, development, recapitulation… Matches human perception of phrase structure.

[Figure: Silhouette score against K (K = 2–12), with K*=8 marked.]

Peak silhouette at K=8 — 8 musical patterns
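
A direct sketch of the silhouette formula on a four-point toy set, assuming Euclidean distance and the usual convention that a singleton cluster scores 0. The function name is hypothetical. It builds the full pairwise distance matrix, so for all T = 11 711 frames one would likely subsample first.

```python
import numpy as np

def silhouette_samples(X, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b) with Euclidean distance."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                              # singleton cluster: s(i) = 0
        a = D[i, own].sum() / (n_own - 1)         # excludes the zero self-distance
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_samples(X, np.array([0, 0, 1, 1]))   # tight, separated clusters
bad = silhouette_samples(X, np.array([0, 1, 0, 1]))    # clusters split the wrong way
```

Choosing K* then amounts to running the clustering for each candidate K and keeping the K with the highest mean silhouette.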

Part IV

⚡ Excitability + Least Squares

Which Chopin patterns make the worm's nervous system resonate?

Ranking patterns by worm excitability

$$r_{ij} = \frac{\langle \tilde V_p[:,i],\, \tilde Z_\text{worm}[:,j] \rangle} {\|\tilde V_p[:,i]\|\,\|\tilde Z_\text{worm}[:,j]\|}, \qquad \tilde x = x - \bar{x} \;\text{(mean-centered column)}$$

$\text{excitability}(i) = \max_j |r_{ij}|$

$V_p[:,i]$ — temporal envelope of Chopin pattern $i$ (from RSVD)
$Z_\text{worm}[:,j]$ — worm neural score $j$ (from forward HH simulation, projected onto mode $j$)
High $|r|$ → the worm's nervous system naturally oscillates in sync with that Chopin pattern
Patterns ranked by excitability → highest-ranked patterns drive the worm's dance poses

Why Pearson (not Euclidean)?

Pearson r is scale-invariant — it measures shape correlation, not amplitude. The worm's neural oscillations may have different amplitude from Chopin's temporal modes but the same rhythmic structure.
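
A sketch of the excitability computation on synthetic signals, assuming both matrices share the same time axis and no column is constant (a constant column would divide by zero). The function name and the toy signals are hypothetical. The first worm score is a scaled and shifted copy of the first pattern, so its correlation should come out at 1 despite the different amplitude and offset.

```python
import numpy as np

def excitability(V_p, Z_worm):
    """Max |Pearson r| of each pattern column against all worm-score columns."""
    Vc = V_p - V_p.mean(axis=0)            # mean-centre every column
    Zc = Z_worm - Z_worm.mean(axis=0)
    Vc /= np.linalg.norm(Vc, axis=0)       # unit norm, so dot products are r values
    Zc /= np.linalg.norm(Zc, axis=0)
    R = Vc.T @ Zc                          # R[i, j] = r between pattern i and score j
    return np.abs(R).max(axis=1), R

t = np.linspace(0, 10, 500)
rng = np.random.default_rng(0)
V_p = np.column_stack([np.sin(t), np.cos(3 * t)])                   # two toy patterns
Z_worm = np.column_stack([5 * np.sin(t) + 2, rng.standard_normal(t.size)])
exc, R = excitability(V_p, Z_worm)
ranking = np.argsort(-exc)                 # most excitable pattern first
```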

Excitability ranking (mock):
Pattern 3: 0.88
Pattern 1: 0.72
Pattern 7: 0.61
Pattern 2: 0.45

Neural scores → 96 muscles via lstsq

$$W_{nm} = \arg\min_W \| Z_\text{worm} W - V_\text{mus} \|_F$$ $$W_{nm} = Z_\text{worm}^+ \, V_\text{mus} = \mathtt{lstsq}(Z_\text{worm}, V_\text{mus})$$

Shapes

$Z_\text{worm} \in \mathbb{R}^{T_p \times k_w}$ — worm neural scores
$V_\text{mus} \in \mathbb{R}^{T_p \times 96}$ — muscle modes
$W_{nm} \in \mathbb{R}^{k_w \times 96}$ — neural→muscle map

What it means

Each column of $W_{nm}$ tells us how neural modes linearly combine to activate one muscle. This is the biological motor program encoded as a matrix.

Why pseudoinverse?
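Z_worm can be rank-deficient
The fit has 96 right-hand sides (one per muscle) but only k_w ≈ 4–8 neural modes, and those modes may be nearly collinear. $Z^+ = V \Sigma^+ U^T$ handles rank-deficiency gracefully by returning the minimum-norm solution.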
Per-pattern pose
avg(Z_worm[cluster==j]) × W_nm → pose_j ∈ ℝ^{96}
Each of the K*=8 musical patterns maps to a 96-muscle activation pose — the worm's dance.

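# step2_clustering/motor_primitives.py
import numpy as np

# Least-squares neural→muscle map, shape (k_w, 96)
W_nm = np.linalg.lstsq(Z_worm_p, V_mus_p, rcond=None)[0]
# One 96-muscle pose per pattern: mean neural score of the cluster, mapped through W_nm
pose = {j: (Z_worm_p[labels == j].mean(0) @ W_nm)
        for j in range(K)}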
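
A self-contained toy run of the same two steps, using synthetic arrays with the shapes from the slide. All sizes other than 96 muscles and K = 8 are made up, and one neural mode is deliberately made redundant to show that `lstsq` and the explicit pseudoinverse agree on the minimum-norm solution when Z is rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(0)
T_p, k_w, n_mus, K = 200, 6, 96, 8          # toy sizes (assumed)
Z = rng.standard_normal((T_p, k_w))
Z[:, 5] = Z[:, 0] + Z[:, 1]                 # redundant mode: rank drops to 5
V_mus = Z @ rng.standard_normal((k_w, n_mus))
labels = rng.integers(0, K, size=T_p)       # stand-in for the K-means labels

# Same cutoff for both routes so they discard the same tiny singular value
W_lstsq = np.linalg.lstsq(Z, V_mus, rcond=1e-10)[0]
W_pinv = np.linalg.pinv(Z, rcond=1e-10) @ V_mus

# One pose per pattern: cluster-mean neural score mapped to 96 muscles
pose = {j: Z[labels == j].mean(axis=0) @ W_lstsq for j in range(K)}
```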
      
Part V

🐛 The Worm Dance

From piano patterns to body waves — watch it happen on the right →

Piano modes → locomotion signals

synthMusFromVp(vp) — 4 mode roles

Mode 0 — amplitude: scales overall muscle contraction
Mode 1 — body-wave phase: $\phi(s) = \phi_0 + s \cdot 2\pi/24$
Mode 2 — D/V bias: dorsal vs ventral imbalance
Mode 3 — L/R offset: lateral turning signal

$$\text{mus}[q,s] = A \cdot \bigl(1 + \sin(\phi(s) + \phi_q)\bigr)/2$$

$\phi_q \in \{0, \pi, \epsilon, \pi+\epsilon\}$ for DL/VL/DR/VR quadrants
Wave travels head→tail as $s$ increases from 0 to 23.
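
The wave formula can be sketched in Python as below. This is not the `synthMusFromVp` function itself: it covers only the amplitude and phase roles (modes 0 and 1), since the slide gives no formula for the D/V bias and L/R offset modes, and the name `synth_muscles` is hypothetical.

```python
import numpy as np

QUADRANTS = ("DL", "VL", "DR", "VR")
EPS = 0.05                                            # bilateral offset in rad
PHI_Q = np.array([0.0, np.pi, EPS, np.pi + EPS])      # phase offset per quadrant

def synth_muscles(A, phi0, n_seg=24):
    """Activation array of shape (4, n_seg): rows follow QUADRANTS, columns s = 0..n_seg-1."""
    s = np.arange(n_seg)
    phi = phi0 + s * 2 * np.pi / n_seg                # body-wave phase along the body
    return A * (1 + np.sin(phi[None, :] + PHI_Q[:, None])) / 2

mus = synth_muscles(A=0.8, phi0=0.3)
```

Each row is one full sine period sampled at 24 segments, bounded between 0 and A.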

From HH to synthetic
Why not use raw HH muscle output?
The HH simulator reaches a fixed-point attractor — every pattern drives the same steady muscle state. No variety. Synthetic wave from piano modes gives rich, pattern-specific motion.
Navigation from muscles
headDL vs headDR → L/R turn
Both dorsal quadrant head segments (s=0..7) — avoids D/V anti-phase cancellation (proved below).
[Figure: Dorsal body-wave propagating from head to tail, indicating direction.]

Why DL + VL never gives L/R signal

Mathematical proof:

DL body wave: $A \cdot \tfrac{1}{2}(1 + \sin\theta)$ where $\theta = \phi_0 + s\omega$
VL body wave: $A \cdot \tfrac{1}{2}(1 + \sin(\theta + \pi)) = A \cdot \tfrac{1}{2}(1 - \sin\theta)$

$\text{leftM} = \frac{\text{DL} + \text{VL}}{2} = \frac{A}{2}\left[\frac{1+\sin\theta}{2} + \frac{1-\sin\theta}{2}\right] = \frac{A}{2}$ ← constant!

The $\sin\theta$ terms cancel exactly. DL+VL is always $A$, so their average is always $A/2$, regardless of body wave phase. No L/R information survives the averaging.
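
A quick numerical check of the cancellation, using the DL and VL waveforms exactly as defined above with an arbitrary amplitude. Evaluating the left-side average over a full sweep of $\theta$ gives the same value, $A/2$, at every phase.

```python
import numpy as np

A = 0.8                                              # arbitrary amplitude
theta = np.linspace(0, 2 * np.pi, 1000)              # sweep the body-wave phase
DL = A * 0.5 * (1 + np.sin(theta))
VL = A * 0.5 * (1 + np.sin(theta + np.pi))           # 180 degrees out of phase
left_mean = (DL + VL) / 2
spread = left_mean.max() - left_mean.min()           # zero up to rounding error
```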

Fix: compare left and right within the dorsal side

headDL = mean(DL[s=0..7]) — dorsal-left head
headDR = mean(DR[s=0..7]) — dorsal-right head

Both dorsal → no D/V cancellation.
Bilateral offset $\varepsilon=0.05$ rad preserved → real L/R signal.
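
A sketch of the resulting L/R signal, assuming the same sine waveform for DL and DR with only the $\varepsilon$ offset between them. The function name is hypothetical, and no turn thresholds are applied because the slides do not state them. The difference is small but varies with the wave phase, which is what makes it usable as a steering signal.

```python
import numpy as np

def head_lr_signal(phi0, A=1.0, eps=0.05, n_seg=24, head=8):
    """headDL - headDR over head segments s = 0..head-1."""
    s = np.arange(head)
    theta = phi0 + s * 2 * np.pi / n_seg
    head_dl = np.mean(A * (1 + np.sin(theta)) / 2)
    head_dr = np.mean(A * (1 + np.sin(theta + eps)) / 2)   # bilateral offset eps
    return head_dl - head_dr

phases = np.linspace(0, 2 * np.pi, 200, endpoint=False)
signal = np.array([head_lr_signal(p) for p in phases])
swing = signal.max() - signal.min()                  # peak-to-peak over one wave cycle
```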

Behavior distribution (verified)
After fix, 800 patterns sampled:
HALT: 51%
FORWARD: 26%
FWD+R: 15%
FWD+L: 8%
Key insight: Anti-phase is not a bug; it is how C. elegans generates propulsive thrust. The L/R signal must be measured within a single dorso-ventral side (dorsal here).

The worm dances while Chopin plays

🐛 Left canvas — Worm body

Body-wave from piano temporal modes. Color = current pattern (P1–P8). Trail = locomotion history (FORWARD / TURN / HALT).

🔌 Right canvas — Neural circuit

302-neuron connectome compressed to 6 MN classes (DA/DB/VA/VB/DD/VD). Connections visible at resting activation. CMD nodes: AVA/AVD/AVB/PVC.

📊 Temporal timeline

Pattern ID bar at bottom. Each color segment = one K-means musical state. Pattern transitions align with harmonic changes in the nocturne.

🎵 Piano patterns

8 circular pattern indicators. Active pattern glows and shows excitability score. Pattern 3 most excitable (r≈0.88).

→ → Look at the right panel → →

Every method comes from NAML lectures

| Method | NAML reference | Applied to |
| --- | --- | --- |
| RSVD (Halko 2011) | L06–L09 · Lab01 | Extract top-k musical basis from piano-roll M |
| Eckart-Young theorem | L06 · core theorem | Guarantees the truncated SVD, which RSVD approximates, is the optimal rank-k approximation |
| PCA equivalence | L08–L10 · Lab02 | V_k rows = PCA coordinates of time frames (up to centering and scaling) |
| K-means | L10 · Lab02 / Lab05 | Cluster T frames in V_k space → 8 musical states |
| Silhouette score | Lab02 / AppStat | Data-driven choice of K* = 8 |
| Least squares / pseudoinverse | L07 · L09 · Lab03 | Neural scores → 96-muscle activation (W_nm) |
| Pearson correlation | Lab02 / AppStat | Rank patterns by biological excitability |

The full pipeline is a chain of NAML primitives: $M \xrightarrow{\text{RSVD}} U_k, \Sigma_k, V_k \xrightarrow{\text{K-means}} \text{labels} \xrightarrow{\text{Pearson}} \text{excitability} \xrightarrow{\text{lstsq}} W_{nm} \xrightarrow{\text{synth}} \text{muscles} \xrightarrow{} \text{dance}$

Insights from building PyANNOW v2

Eckart-Young in practice
SVD is not just compression
The top-k singular vectors of a music matrix are musically interpretable: U[:,0] = the tonic chord, V[:,0] = the repeating rhythmic motif. The theorem tells us they are optimal.
K-means + silhouette
K*=8 matches musical intuition
A nocturne has 4–8 distinct phrase types. The silhouette criterion recovered this without listening to the music — pure geometry on V_k.
Pseudoinverse
Underdetermined systems everywhere in biology
302 neurons driving 96 muscles through a handful of neural modes: the system can be rank-deficient. lstsq / pseudoinverse handles this naturally, with algebra closely related to the Lab03 Ridge regression on California housing.
Anti-phase cancellation
Biology breaks naive signal processing
DL + VL is identically constant — the biological anti-phase design cancels the locomotion signal. You must understand the data-generating process before computing a mean.
HH fixed-point attractor
Simulation ≠ exploration
Without external input, HH neurons settle to a fixed-point attractor, which is not ideal for varied visualization. Synthetic waves from piano modes give richer dynamics.
OpenWorm connection
C. elegans is the perfect ML lab animal
302 neurons, fully mapped connectome, published kinematics. Large enough to be interesting; small enough to simulate in Python.

Where could this go next?

🔬 Real electrophysiology data

Replace synthetic HH with actual patch-clamp data from chModeler (~270 channels, Xenopus oocytes). Can the ML predictor recover Chopin-correlated kinetics?

🧠 PINN for HH dynamics

Use a Physics-Informed Neural Network (NAML L27) to replace the HH ODE solver. Loss = data residual + HH ODE residual. Faster inference, differentiable w.r.t. channel parameters.

🔄 Closed loop

Music → worm → sensor signal → modify music. The worm's movement generates a new piano-roll → re-run the RSVD pipeline → update dance. Biological feedback as an artistic medium.

🎓 NAML extensions

Replace K-means with kernel K-means (L12-13) for non-linear musical clusters. Replace lstsq with Tikhonov regularization when Z_worm is ill-conditioned.
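
A possible shape for the Tikhonov variant, sketched with the normal equations. The function name `ridge_map`, the value of `lam`, and the synthetic nearly collinear data are assumptions for illustration. With two almost identical neural modes, plain least squares produces very large weights, while the regularised map stays small.

```python
import numpy as np

def ridge_map(Z, V, lam=1e-2):
    """Tikhonov-regularised map: argmin_W ||Z W - V||_F^2 + lam * ||W||_F^2."""
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ V)

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 5))
Z[:, 4] = Z[:, 3] + 1e-6 * rng.standard_normal(100)   # two nearly collinear modes
V = rng.standard_normal((100, 96))                    # toy muscle targets

W_ls = np.linalg.lstsq(Z, V, rcond=None)[0]           # unregularised, ill-conditioned
W_ridge = ridge_map(Z, V, lam=1e-2)
norm_ls = np.linalg.norm(W_ls)
norm_ridge = np.linalg.norm(W_ridge)
```

The regularisation strength would normally be chosen by cross-validation rather than fixed by hand.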

Thank you!

What we built
Piano → RSVD → K-means → Pearson → lstsq → Worm Dance
7 NAML methods, 8 musical patterns, 96 muscles, one live visualization →
Eckart-Young
The best rank-k basis is the SVD — in music and biology alike
σ₁ captures 42% of Chopin's variance; the top modes are musically interpretable
Key surprise
Anti-phase cancellation: DL + VL ≡ constant regardless of body wave
A mathematical property of worm locomotion — and a lesson about understanding data before computing means

Code + notebook:

github.com/vahidgh/wormuse · PyANNOW/notebooks/06_chopin_patterns_worm_dance_v2.ipynb

vahidghayoomi@gmail.com

The worm is still
dancing →
PyANNOW v2.0.0 — Live Worm Dance chopin_nocturne_csharp_minor · 96 BWM · 302 neurons