Group B: Typed Neuronal Species
A separate research stream from the main NEAT/neuroevolution work. Started 2026-05-07 after the main stream plateaued at 99.73% MNIST on a [128, 64] architecture and we ran out of obvious architectural levers.
The hypothesis
Real brains don’t have identical neurons. ReLU/Sigmoid/Tanh nodes with scalar I/O are a computational caricature, not a biological reality. What if we let evolution discover typed neuronal species — nodes with non-scalar I/O signatures and specialized computational primitives, not just different scalar activations?
The first species under investigation is the patch matcher: a node that takes a vector of pixels and a vector of weights and emits a scalar. The minimal version is a learnable convolutional filter as a single node.
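A minimal sketch of the patch-matcher node as described above: a window of pixels and a same-shaped weight patch go in, one scalar comes out. The function name `patch_matcher`, the nested-list image layout, and the absence of a bias term or nonlinearity are assumptions made for illustration; this is not the stream's own code from src/bin/.

```python
from typing import Sequence


def patch_matcher(
    image: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    top: int,
    left: int,
) -> float:
    """Dot product of a weight patch with the image window anchored at (top, left).

    Assumption: raw dot product only, with no bias and no activation.
    """
    height = len(weights)
    width = len(weights[0])
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("patch does not fit inside the image")
    total = 0.0
    for r in range(height):
        for c in range(width):
            total += image[top + r][left + c] * weights[r][c]
    return total


# Toy 4x4 image with pixel value = row * 4 + col, matched against a 2x2 patch of ones.
toy_image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
response = patch_matcher(toy_image, [[1.0, 1.0], [1.0, 1.0]], top=1, left=1)
```

Each node holds one weight patch at one position, so a bank of N such nodes yields an N-dimensional feature vector for a downstream classifier.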
The strategy
Prove the inductive bias pays before building evolutionary machinery for typed species. Run stripped-down hand-coded experiments in src/bin/ that share nothing with the NEAT genome/phenotype code. If a result is strong enough to justify the integration cost, the relevant primitive can be lifted into the main genome representation later.
The pages
- Group B Journal — chronological observations and per-experiment narrative for the typed-species stream
- Group B Experiments — structured experiment records (B1, B2, …)
Headline so far
| Experiment | Setup | Best test accuracy or key finding |
|---|---|---|
| B1 | Raw dot-product prototype matcher | 64.99% |
| B2 | Cosine prototype matcher (argmax) | 83.53% |
| B3 | Cosine features → trained linear | 84.85% |
| B4 | 32 random 5×5 patches, frozen | 67.56% |
| B5 | Prototype-slice patches, frozen, sweep N | 93.67% |
| B6 | Random patches, frozen, sweep count | 94.01% |
| B7 | Trained patches, sweep count | 97.27% |
| B8 | Trained patches, sweep size × count | 98.04% |
| B9 | Rectangular patches, single seed | (suggestive, retracted) |
| B9-stats | Same with 5-seed paired stats | 97.38% (5×7 mean) |
| B10 | Rectangular patches on Fashion-MNIST | 86.53% (4×6 mean) |
| B11 | Rectangular patches on rotated MNIST | sign of preference flipped (✓) |
| B12 | Extreme aspect ratios (1:3 → 1:21) | non-monotonic (1:9 null surprise) |
| B13 | Random-index “patches” on MNIST | spatial wins (+0.61pp ***) |
| B14 | Multi-scale (3/5/7) vs single 5×5 | mix wins at low N only (+0.53pp ***) |
| B15 | Multi-layer (hidden ReLU) | depth hurts at fixed budget (−1.7pp) |
| B16 | A3 cross-task on Fashion-MNIST | locality is MNIST-specific (ns) |
| B17 | A3 across patch sizes 3/5/7 (MNIST) | locality robust at every size (***) |
| B18 | Task difficulty calibration (KMNIST/EMNIST/mixes) | KMNIST is best workhorse |
| B19 | Locality on KMNIST | sign FLIP — indexed wins (***) |
| B20 | Locality on EMNIST balanced | spatial wins like MNIST (***) |
| B21 | Multi-scale on KMNIST | doesn’t replicate (all ns) |
| B22 | Locality on MNIST+KMNIST mix | cancels to null (ns) |
| B23 | Locality across sizes on KMNIST | flip robust at every size (***) |
| B24 | Multilayer on KMNIST | hurt at fixed budget (later reversed) |
| B25 | Multilayer KMNIST + LR-decay schedule | +2.78pp *** — B15 conclusion REVERSED |
| B27 | Pixel-correlation probe | null result; simple correlation doesn’t predict locality |
| B28 | Scaling sweep on KMNIST/EMNIST | KMNIST 72→92%, EMNIST 52→80% |
| B29 | Rectangular on KMNIST | 3×9/9×3 sign flips on KMNIST (*) |
| B30 | Multi-scale on EMNIST | also doesn’t replicate (all ns) |
| B31 | Per-pixel discriminability + autocorr d=5 | predicts locality direction perfectly |
| B32 | Multilayer MNIST + schedule | null on saturated MNIST |
| B33 | Rectangular on EMNIST | follows MNIST/Fashion +0.98pp *** |
| B34 | Multilayer EMNIST + schedule | hurts even with schedule (−1.11pp) |
| B35 | Wider multilayer EMNIST (M=128, 256) | bottleneck hypothesis disproved |
The original Group B hypothesis — patch matchers as a learnable typed species — is supported. SGD-trained patches feeding a linear classifier reach 98.04% MNIST with 640 7×7 patches, approaching the main stream’s [128] dense-hidden result (98.7%) but with locality as the inductive bias rather than dense connectivity.
Two methodological inflections worth noting:
- B5’s apparent “meaningful content matters” win was partially an illusion: at sufficient feature count (640), random patches catch up to prototype-slice patches. Without B6 to control for capacity we’d have walked away with the wrong reading.
- B9’s single-seed “5×7 N=320 = 97.66%” headline was a lucky-seed artifact. The 5-seed mean is 97.38%, statistically tied with 6×6 (97.34%). B9-stats stood up reusable stats infrastructure (paired-t, Cohen’s d_z) and corrected several B9 conclusions. Going forward, any quantitative claim about a sub-0.3pp difference in this codebase needs multi-seed paired stats.
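The paired statistics mentioned above can be sketched as follows. This is an illustrative Python version, not the B9-stats infrastructure itself; the function name `paired_stats` is hypothetical, and the per-seed accuracies in the usage lines are made-up placeholder values, not experiment results. The p-value is omitted because it needs a Student-t CDF, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev


def paired_stats(a, b):
    """Paired comparison of two conditions run on the same seeds.

    Returns (mean difference, paired t statistic, Cohen's d_z).
    d_z = mean(diff) / sd(diff), and t = d_z * sqrt(n).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    mean_diff = mean(diffs)
    d_z = mean_diff / sd
    t = d_z * math.sqrt(len(diffs))
    return mean_diff, t, d_z


# Placeholder accuracies for five seeds (illustrative only, not measured values).
condition_a = [97.5, 97.3, 97.4, 97.6, 97.2]
condition_b = [97.0, 97.1, 96.8, 97.2, 96.9]
mean_diff, t_stat, d_z = paired_stats(condition_a, condition_b)
```

The identity t = d_z · √n gives a quick sanity check on reported numbers: with five seeds, the d_z = 1.89 reported on this page for 3×9 vs 9×3 implies t ≈ 4.23, which agrees with the reported t = 4.22 up to rounding.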
The rectangular-patch arc: at extreme aspect ratios (1:3) on MNIST, wide patches significantly outperform tall patches (3×9 vs 9×3: Δ=+0.51pp, t=4.22, d_z=1.89, ***). The effect replicates and strengthens on Fashion-MNIST (Δ=+0.93pp, ***), refuting the original “digit-stroke geometry” guess and pointing at a more general principle: kernels should be perpendicular to the dominant feature orientation. B11 confirmed this mechanistically by rotating MNIST 90° CW — the 3×9 vs 9×3 sign cleanly flipped to −0.40pp *. The pattern moves with the data, ruling out architectural bias or placement-geometry artifact.
B12 added a surprise: the wide-tall gap is not monotonic in aspect ratio. There is a null at 1:9 (1×9 vs 9×1: Δ=+0.01pp, ns) sandwiched between large effects at 1:3 and at 1:15 and beyond. The mechanism turns out to be at least two separate phenomena. Thick rectangular patches (≥3 px perpendicular) win wide because they cut across vertical features. Long single-pixel-thick strips (1×15, 1×21) win wide because horizontal cross-sections are more discriminative for digit width than vertical cross-sections are for digit height. A 1×9 strip has neither property: it is only one pixel thick, so it has no meaningful perpendicular extent, and it is too short to act as a discriminative cross-section sampler, so no preference appears.
B13-B17 explored three architectural levers in parallel — multi-scale (A1), multi-layer (A2), and locality / random-index (A3) — and produced three different shapes of result:
- Multi-scale (B14): conditional positive. Mixing 3/5/7 patch sizes wins at low N (+0.53pp *** at N=240), but the advantage vanishes at high N (N=480, ns). Receptive-field diversity helps when no single size has enough patches to fully exploit it; it doesn’t matter when capacity is abundant.
- Multi-layer (B15): conditional negative. Adding a hidden ReLU layer hurts uniformly across M ∈ {32, 64, 128} (−1.5 to −1.7pp) at our fixed 10-epoch / fixed-LR training budget. This is the opposite of what depth did in the main NEAT stream, likely because the main stream runs 1.8M steps with LR decay vs our 500K fixed-LR steps. Depth here would need retuned hyperparameters before being declared dead.
- Locality (B13/B16/B17): task-specific positive. On MNIST, spatial 5×5 trained patches beat 25-random-pixel-index “patches” by +0.61pp *** (B13), with the effect robust across patch sizes (B17: +0.6 to +0.9pp *** at 3×3, 5×5, 7×7). On Fashion-MNIST the advantage vanishes (B16: −0.12pp ns). Spatial contiguity matters a lot on digit-stroke data where adjacent pixels carry highly correlated information; less so on Fashion where pixel correlations are weaker.
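The two conditions in the locality comparison differ only in which pixels each node reads. A hedged sketch of that difference, assuming 28×28 images flattened row-major; the function names and the use of `random.Random` for sampling are illustrative choices, not the experiment code:

```python
import random

IMAGE_SIDE = 28  # assumption: MNIST-style 28x28 images, flattened row-major


def spatial_patch_indices(top, left, height, width, side=IMAGE_SIDE):
    """Flat pixel indices of a contiguous height x width window."""
    if top < 0 or left < 0 or top + height > side or left + width > side:
        raise ValueError("patch does not fit inside the image")
    return [(top + r) * side + (left + c) for r in range(height) for c in range(width)]


def random_index_patch(k, rng, side=IMAGE_SIDE):
    """k distinct flat pixel indices drawn from anywhere in the image."""
    return sorted(rng.sample(range(side * side), k))


# Both "patches" read 25 pixels and carry 25 weights; only contiguity differs.
spatial = spatial_patch_indices(top=10, left=10, height=5, width=5)
scattered = random_index_patch(25, random.Random(0))
```

Because both variants have the same number of inputs and weights per node, any accuracy gap between them is attributable to spatial contiguity rather than capacity.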
Group B has now produced both transferability profiles: rectangular-patch wide preference is task-general (replicated on Fashion, flipped with rotation); spatial-locality advantage is MNIST-specific (absent on Fashion). For the eventual NEAT integration, that means architectural choices in the typed-species framework will have heterogeneous transferability — useful to know before committing to the integration refactor.
B18-B24 ran the architectural levers on harder tasks (KMNIST, EMNIST balanced, mixed datasets) to escape MNIST saturation. The picture changed substantially:
- Locality finding (A3) flips sign on KMNIST. Spatial wins on handwritten digit and letter data (MNIST +0.6-0.9pp ***, EMNIST +1.03pp ***); spatial loses on cursive Japanese characters (KMNIST −1.16 to −1.38pp *** at every patch size). Texture data (Fashion) is null. Mixed positive+negative tasks cancel to null (MNIST+KMNIST = +0.09pp ns).
- Multi-scale finding (A1) was MNIST-specific. On KMNIST, mixing 3/5/7 patch sizes vs single-5×5 is non-significant at every patch count tested. Whatever benefit receptive-field diversity provided on MNIST didn’t generalize.
- Multilayer hurt (A2) replicates on KMNIST. With 7pp more headroom than MNIST, adding a hidden ReLU layer still hurt by 1.15-1.85pp. The under-training mechanism from the original B15 reading was right; at the fixed training budget this looked like a robust negative (B25 later reversed it).
So of five findings tested across multiple tasks, two are task-general (patch-matchers themselves, multilayer-hurts-at-fixed-budget) and three are task-specific to varying degrees. The most striking is the locality sign flip — what looked like a robust architectural property of patches (B13/B16/B17) turned out to be a property of the data’s spatial correlation structure that varies dramatically across superficially similar tasks.
For the eventual NEAT integration: the genome should support evolving patch geometry, placement strategy, and contiguity per task, not lock in MNIST-derived defaults. KMNIST and EMNIST are now part of the standard task battery.
B25-B35 produced a major correction and the cleanest mechanistic story so far.
B25 reverses B15. The earlier “multilayer hurts” conclusion was a training-budget artifact. With 20 epochs and LR decay 0.05→0.005 on KMNIST, multilayer goes from −1.85pp at fixed budget to +2.78pp *** with a proper schedule. B32 showed that the reversal does not carry over to MNIST: depth is null there because MNIST saturates near 98% for this patch capacity. B34/B35 found depth still hurts on EMNIST even with a proper schedule and wide hidden layers (M=128, M=256), so the simple “headroom” theory fails: depth’s value is task-conditional in a way that depends on the specific structure of class-discriminative features.
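For concreteness, here is one way a 0.05→0.005 decay over 20 epochs could look. The page gives only the endpoints and the epoch count, so the geometric (constant-ratio) per-epoch shape and the function name `lr_at_epoch` are assumptions; the actual B25 schedule may be linear, stepped, or per-step rather than per-epoch.

```python
def lr_at_epoch(epoch, total_epochs=20, lr_start=0.05, lr_end=0.005):
    """Learning rate for a zero-based epoch under geometric decay.

    Assumption: the rate shrinks by a constant ratio each epoch, starting
    at lr_start in epoch 0 and reaching lr_end in the final epoch.
    """
    if total_epochs < 2:
        raise ValueError("need at least two epochs to define a decay")
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch out of range")
    fraction = epoch / (total_epochs - 1)
    return lr_start * (lr_end / lr_start) ** fraction


schedule = [lr_at_epoch(e) for e in range(20)]
```

Under this assumed shape the rate drops by the same factor every epoch, so most of the training budget is spent at rates well below the starting value.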
B31 cracks the locality puzzle. Compute the per-pixel class-discriminability map (F-like ratio of between-class to within-class variance), then take its spatial autocorrelation at the patch scale (d=5). The values rank the four datasets exactly in the order of the locality effect: EMNIST 0.37 and MNIST 0.32 (spatial wins), Fashion 0.13 (null), KMNIST −0.07 (spatial loses). KMNIST’s class-discriminative information is not spatially clustered at the patch scale, so fixed-position 5×5 patches can’t reliably catch concentrated discriminative content, while random-index “patches” sampling 25 pixels from anywhere distribute their sampling more effectively. The locality direction is a measurable data property, discoverable without training.
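The probe described above can be sketched in a few lines. Several details are assumptions, since the page does not specify them: the ratio uses raw between-class and within-class sums of squares without degrees-of-freedom scaling, a small `eps` guards against zero within-class variance, and the lag-d autocorrelation is taken as the Pearson correlation over all horizontal and vertical pixel pairs exactly d apart. All function names are hypothetical.

```python
from statistics import mean


def discriminability_map(images, labels, eps=1e-8):
    """Per-pixel ratio of between-class to within-class sum of squares.

    images: list of flat pixel lists; labels: one class label per image.
    """
    classes = sorted(set(labels))
    by_class = {c: [img for img, y in zip(images, labels) if y == c] for c in classes}
    ratios = []
    for p in range(len(images[0])):
        grand_mean = mean(img[p] for img in images)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [img[p] for img in by_class[c]]
            class_mean = mean(values)
            between += len(values) * (class_mean - grand_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        ratios.append(between / (within + eps))
    return ratios


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def lag_autocorrelation(flat_map, side, d):
    """Correlation between map values at pixels d apart (horizontal and vertical)."""
    xs, ys = [], []
    for r in range(side):
        for c in range(side):
            value = flat_map[r * side + c]
            if c + d < side:
                xs.append(value)
                ys.append(flat_map[r * side + c + d])
            if r + d < side:
                xs.append(value)
                ys.append(flat_map[(r + d) * side + c])
    return pearson(xs, ys)
```

On a real dataset the call would look like `lag_autocorrelation(discriminability_map(train_images, train_labels), side=28, d=5)`, where `train_images` and `train_labels` are placeholders for the flattened training set. No training is involved, which is what makes the statistic usable as a pre-experiment predictor.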
The rectangular wide-preference holds on 3 of 4 tasks (B29/B33). MNIST +0.51pp, Fashion +0.93pp, EMNIST +0.98pp (all ***). KMNIST: −0.40pp *. Cursive Japanese characters apparently have a different dominant feature orientation than handwritten digits and letters or texture data.
Multi-scale is fully MNIST-specific (B30 confirms B21): doesn’t replicate on KMNIST or EMNIST. The B14 finding was a lucky MNIST-specific quirk.
Updated transferability tally: of 5 architectural findings tested across multiple datasets, only the bare patch primitive is fully task-general. Every detail (geometry, placement, depth, training schedule, mixing) is task-conditional. KMNIST is the most frequent outlier — it inverts locality, inverts rectangular preference, and is the only task where multilayer clearly helps with proper schedule. The MNIST-derived Group B story doesn’t transfer; the genome design for typed-species NEAT integration needs to evolve all of these axes per task.
B8 adds a parameter-efficiency picture: smaller patches at high count win at low parameter budgets (coverage beats receptive field when params are scarce), 3×3 has a hard ceiling around 96.5%, and all sizes converge at 96.5%-98% by N=640. Patch size is an axis to evolve, not a hyperparameter to fix.
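The parameter trade-off between patch size and patch count can be made concrete with a small counting sketch. It assumes each fixed-position patch emits one scalar, patches carry no bias, and the linear classifier has one weight per (patch, class) pair plus one bias per class; the page does not spell out these details, and `parameter_count` is a hypothetical helper.

```python
def parameter_count(n_patches, height, width, n_classes=10):
    """Trainable parameters for N patch nodes feeding a linear classifier.

    Assumptions: one scalar output per patch, no patch bias,
    classifier = n_patches * n_classes weights + n_classes biases.
    """
    patch_weights = n_patches * height * width
    classifier_params = n_patches * n_classes + n_classes
    return patch_weights + classifier_params


params_7x7 = parameter_count(640, 7, 7)  # 31,360 patch weights + 6,410 classifier
params_3x3 = parameter_count(640, 3, 3)  # 5,760 patch weights + 6,410 classifier
```

Under these assumptions a 7×7 patch costs about 5.4 times as many weights as a 3×3 patch (49 vs 9), so at a fixed parameter budget the smaller size buys several times more patches, which is the coverage-versus-receptive-field trade the paragraph describes.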