Group B: Typed Neuronal Species

A separate research stream from the main NEAT/neuroevolution work. Started 2026-05-07 after the main stream plateaued at 99.73% MNIST on a [128, 64] architecture and we ran out of obvious architectural levers.

The hypothesis

Real brains don’t have identical neurons. ReLU/Sigmoid/Tanh nodes with scalar I/O are a computational caricature, not a biological reality. What if we let evolution discover typed neuronal species — nodes with non-scalar I/O signatures and specialized computational primitives, not just different scalar activations?

The first species under investigation is the patch matcher: a node that takes a vector of pixels and a vector of weights and emits a scalar. The minimal version is a learnable convolutional filter as a single node.
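
A minimal sketch of the patch matcher as a single node. It assumes a fixed-position rectangular window over a row-major grayscale image and a plain dot product with no bias or activation. The `PatchMatcher` type and its fields are hypothetical illustrations, not names taken from the Group B binaries.

```rust
/// Hypothetical patch-matcher node: a fixed kh x kw window over a
/// row-major grayscale image, dotted with a learnable weight vector.
struct PatchMatcher {
    row: usize,        // top-left row of the window (fixed placement assumed)
    col: usize,        // top-left column of the window
    kh: usize,         // patch height in pixels
    kw: usize,         // patch width in pixels
    weights: Vec<f32>, // kh * kw learnable weights, row-major
}

impl PatchMatcher {
    /// Gather the kh * kw pixels under the window (the node's vector input).
    fn gather(&self, image: &[f32], img_w: usize) -> Vec<f32> {
        let mut pixels = Vec::with_capacity(self.kh * self.kw);
        for r in 0..self.kh {
            let start = (self.row + r) * img_w + self.col;
            pixels.extend_from_slice(&image[start..start + self.kw]);
        }
        pixels
    }

    /// Scalar output: dot product of the gathered pixels with the weights.
    fn forward(&self, image: &[f32], img_w: usize) -> f32 {
        self.gather(image, img_w)
            .iter()
            .zip(&self.weights)
            .map(|(p, w)| p * w)
            .sum()
    }
}

fn main() {
    // 4x4 toy image whose pixel values equal their flat indices.
    let image: Vec<f32> = (0..16).map(|i| i as f32).collect();
    let node = PatchMatcher { row: 1, col: 1, kh: 2, kw: 2, weights: vec![1.0; 4] };
    // The 2x2 window at (1, 1) covers indices 5, 6, 9, 10.
    println!("{}", node.forward(&image, 4)); // prints 30
}
```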

The strategy

Prove the inductive bias pays before building evolutionary machinery for typed species. Run stripped-down hand-coded experiments in src/bin/ that share nothing with the NEAT genome/phenotype code. If a result is strong enough to justify the integration cost, the relevant primitive can be lifted into the main genome representation later.


Headline so far

| Experiment | Setup | Best test acc / finding |
|---|---|---|
| B1 | Raw dot-product prototype matcher | 64.99% |
| B2 | Cosine prototype matcher (argmax) | 83.53% |
| B3 | Cosine features → trained linear | 84.85% |
| B4 | 32 random 5×5 patches, frozen | 67.56% |
| B5 | Prototype-slice patches, frozen, sweep N | 93.67% |
| B6 | Random patches, frozen, sweep count | 94.01% |
| B7 | Trained patches, sweep count | 97.27% |
| B8 | Trained patches, sweep size × count | 98.04% |
| B9 | Rectangular patches, single seed | (suggestive, retracted) |
| B9-stats | Same with 5-seed paired stats | 97.38% (5×7 mean) |
| B10 | Rectangular patches on Fashion-MNIST | 86.53% (4×6 mean) |
| B11 | Rectangular patches on rotated MNIST | sign of preference flipped (✓) |
| B12 | Extreme aspect ratios (1:3 → 1:21) | non-monotonic (1:9 null surprise) |
| B13 | Random-index “patches” on MNIST | spatial wins (+0.61pp ***) |
| B14 | Multi-scale (3/5/7) vs single 5×5 | mix wins at low N only (+0.53pp ***) |
| B15 | Multi-layer (hidden ReLU) | depth hurts at fixed budget (−1.7pp) |
| B16 | A3 cross-task on Fashion-MNIST | locality is MNIST-specific (ns) |
| B17 | A3 across patch sizes 3/5/7 (MNIST) | locality robust at every size (***) |
| B18 | Task difficulty calibration (KMNIST/EMNIST/mixes) | KMNIST is best workhorse |
| B19 | Locality on KMNIST | sign FLIP: indexed wins (***) |
| B20 | Locality on EMNIST balanced | spatial wins like MNIST (***) |
| B21 | Multi-scale on KMNIST | doesn’t replicate (all ns) |
| B22 | Locality on MNIST+KMNIST mix | cancels to null (ns) |
| B23 | Locality across sizes on KMNIST | flip robust at every size (***) |
| B24 | Multilayer on KMNIST | hurt at fixed budget (later reversed) |
| B25 | Multilayer KMNIST + LR-decay schedule | +2.78pp ***; B15 conclusion REVERSED |
| B27 | Pixel-correlation probe | null result; simple correlation doesn’t predict locality |
| B28 | Scaling sweep on KMNIST/EMNIST | KMNIST 72→92%, EMNIST 52→80% |
| B29 | Rectangular on KMNIST | 3×9/9×3 sign flips on KMNIST (*) |
| B30 | Multi-scale on EMNIST | also doesn’t replicate (all ns) |
| B31 | Per-pixel discriminability + autocorr d=5 | predicts locality direction perfectly |
| B32 | Multilayer MNIST + schedule | null on saturated MNIST |
| B33 | Rectangular on EMNIST | follows MNIST/Fashion +0.98pp *** |
| B34 | Multilayer EMNIST + schedule | hurts even with schedule (−1.11pp) |
| B35 | Wider multilayer EMNIST (M=128, 256) | bottleneck hypothesis disproved |

The original Group B hypothesis — patch matchers as a learnable typed species — is supported. SGD-trained patches feeding a linear classifier reach 98.04% MNIST with 640 7×7 patches, approaching the main stream’s [128] dense-hidden result (98.7%) but with locality as the inductive bias rather than dense connectivity.
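
To make “SGD-trained patches feeding a linear classifier” concrete, here is a sketch of one joint update. It rests on several assumptions: a softmax cross-entropy loss, linear patch nodes with no activation or bias, and patches described by flat pixel-index lists, which covers both contiguous and random-index patches. `Model` and `sgd_step` are hypothetical names. The key step is that the gradient reaching patch i is the classifier's backpropagated error for feature i, multiplied by the pixels that patch reads.

```rust
/// Hypothetical sketch: N linear patch nodes feeding a linear classifier,
/// trained jointly by SGD on softmax cross-entropy (assumed loss).
struct Model {
    patch_idx: Vec<Vec<usize>>, // flat pixel indices each patch reads (fixed)
    patch_w: Vec<Vec<f32>>,     // learnable weights, one per index
    lin_w: Vec<Vec<f32>>,       // classifier weights, [classes][N]
    lin_b: Vec<f32>,            // classifier biases, [classes]
}

impl Model {
    /// One scalar feature per patch: dot product of its pixels and weights.
    fn features(&self, x: &[f32]) -> Vec<f32> {
        self.patch_idx
            .iter()
            .zip(&self.patch_w)
            .map(|(idx, w)| idx.iter().zip(w).map(|(&i, wi)| x[i] * wi).sum::<f32>())
            .collect()
    }

    fn logits(&self, f: &[f32]) -> Vec<f32> {
        self.lin_w
            .iter()
            .zip(&self.lin_b)
            .map(|(row, b)| row.iter().zip(f).map(|(w, fi)| w * fi).sum::<f32>() + b)
            .collect()
    }

    /// One SGD step on a single example; returns the loss before the update.
    fn sgd_step(&mut self, x: &[f32], label: usize, lr: f32) -> f32 {
        let f = self.features(x);
        let z = self.logits(&f);
        // Numerically stable softmax.
        let max = z.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exp: Vec<f32> = z.iter().map(|v| (v - max).exp()).collect();
        let total: f32 = exp.iter().sum();
        let mut dz: Vec<f32> = exp.iter().map(|e| e / total).collect();
        let loss = -dz[label].ln();
        dz[label] -= 1.0; // dL/dz = softmax - one_hot(label)

        // Backpropagate to features using the pre-update classifier weights.
        let n = f.len();
        let mut df = vec![0.0f32; n];
        for (k, row) in self.lin_w.iter().enumerate() {
            for i in 0..n {
                df[i] += dz[k] * row[i];
            }
        }
        // Classifier update: dL/dW[k][i] = dz[k] * f[i], dL/db[k] = dz[k].
        for (k, row) in self.lin_w.iter_mut().enumerate() {
            for i in 0..n {
                row[i] -= lr * dz[k] * f[i];
            }
            self.lin_b[k] -= lr * dz[k];
        }
        // Patch update: dL/dw[i][j] = df[i] * x[idx[i][j]] (no activation).
        for ((idx, w), g) in self.patch_idx.iter().zip(self.patch_w.iter_mut()).zip(&df) {
            for (wj, &pix) in w.iter_mut().zip(idx) {
                *wj -= lr * g * x[pix];
            }
        }
        loss
    }
}

fn main() {
    let mut m = Model {
        patch_idx: vec![vec![0, 1], vec![2, 3]],
        patch_w: vec![vec![0.1, 0.1], vec![0.1, 0.1]],
        lin_w: vec![vec![0.0; 2]; 2],
        lin_b: vec![0.0; 2],
    };
    let x = [1.0f32, 0.0, 0.0, 1.0];
    for step in 0..5 {
        println!("step {step}: loss {:.4}", m.sgd_step(&x, 1, 0.5));
    }
}
```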

Two methodological inflections worth noting:

B5’s apparent “meaningful content matters” win was partially an illusion — at sufficient feature count (640), random patches catch up to prototype-slice patches. Without B6 to control for capacity we’d have walked away with the wrong reading.

B9’s single-seed “5×7 N=320 = 97.66%” headline was a lucky-seed artifact. The 5-seed mean is 97.38%, statistically tied with 6×6 (97.34%). B9-stats stood up reusable stats infrastructure (paired-t, Cohen’s d_z) and corrected several B9 conclusions. Going forward, any quantitative claim about a sub-0.3pp difference in this codebase needs multi-seed paired stats.
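
Here is a sketch of the paired statistics. It assumes configs A and B were run on the same seeds, so runs can be paired by seed. Cohen’s d_z is the mean of the per-seed differences divided by their standard deviation. The paired t statistic is d_z·√n with n−1 degrees of freedom. The reported t=4.22 and d_z=1.89 are consistent with n=5 (1.89×√5 ≈ 4.23). The p-value needs the Student-t CDF, which Rust’s standard library does not provide. For df=4, the conventional two-sided critical values are 2.776 (p<0.05), 4.604 (p<0.01) and 8.610 (p<0.001). `paired_stats` is a hypothetical name, not the B9-stats code.

```rust
/// Paired comparison of two configs run on the same seeds.
/// Returns (mean difference, paired t statistic, Cohen's d_z).
fn paired_stats(a: &[f64], b: &[f64]) -> (f64, f64, f64) {
    assert_eq!(a.len(), b.len(), "runs must be paired by seed");
    assert!(a.len() >= 2, "need at least two paired runs");
    let n = a.len() as f64;
    // Per-seed differences carry the paired design.
    let diffs: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
    let mean = diffs.iter().sum::<f64>() / n;
    // Sample standard deviation of the differences (n - 1 denominator).
    let sd = (diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    let d_z = mean / sd; // effect size in units of the difference SD
    let t = d_z * n.sqrt(); // identical to mean / (sd / sqrt(n)); df = n - 1
    (mean, t, d_z)
}

fn main() {
    // Toy inputs only: per-seed scores for two configs on the same five seeds.
    let a = [3.0, 4.0, 5.0, 6.0, 7.0];
    let b = [2.0; 5];
    let (mean, t, d_z) = paired_stats(&a, &b);
    println!("mean diff = {mean:.3}, t = {t:.3} (df = 4), d_z = {d_z:.3}");
}
```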

The rectangular-patch arc: at a 1:3 aspect ratio on MNIST, wide patches significantly outperform tall patches (3×9 vs 9×3: Δ=+0.51pp, t=4.22, d_z=1.89, ***). The effect replicates and strengthens on Fashion-MNIST (Δ=+0.93pp, ***). That refutes the original “digit-stroke geometry” guess and points at a more general principle: a kernel’s long axis should run perpendicular to the dominant feature orientation. B11 tested this directly by rotating MNIST 90° CW, and the 3×9 vs 9×3 sign cleanly flipped to −0.40pp (*). Because the preference moves with the data, an architectural bias or a placement-geometry artifact is ruled out.
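
The rotation control is decisive for a simple reason. Rotating the image 90° turns every 3×9 window in the rotated frame into a 9×3 window over the original pixels. A preference that came from the architecture would stay put; one that came from the data would flip. Below is a sketch of the rotation, assuming row-major grayscale images and h×w (rows×columns) patch notation. `rotate_cw` is a hypothetical helper.

```rust
/// Rotate a row-major h x w image 90 degrees clockwise into a w x h image.
/// Output pixel (r, c) comes from input pixel (h - 1 - c, r), so a kh x kw
/// window on the output reads a kw x kh region of the input.
fn rotate_cw(img: &[f32], h: usize, w: usize) -> Vec<f32> {
    assert_eq!(img.len(), h * w);
    let (out_h, out_w) = (w, h);
    let mut out = vec![0.0; out_h * out_w];
    for r in 0..out_h {
        for c in 0..out_w {
            out[r * out_w + c] = img[(h - 1 - c) * w + r];
        }
    }
    out
}

fn main() {
    let img = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2 rows x 3 cols
    println!("{:?}", rotate_cw(&img, 2, 3)); // [4.0, 1.0, 5.0, 2.0, 6.0, 3.0]
}
```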

B12 added a surprise: the wide-vs-tall gap is not monotonic in aspect ratio. There’s a null at 1:9 (1×9 vs 9×1: Δ=+0.01pp, ns) sandwiched between large effects at 1:3 and at 1:15 and beyond. The pattern appears to reflect at least two phenomena. Thick rectangular patches (≥3 px along the short axis) favor the wide orientation because they cut across vertical features. Long single-pixel strips (1×15, 1×21) favor it because horizontal cross-sections are more discriminative for digit width than vertical cross-sections are for digit height. 1×9 has neither property: it has no perpendicular extent and is too short to act as a discriminative cross-section sampler, so no preference appears.

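B13-B17 explored three architectural levers in parallel (multi-scale A1, multi-layer A2, and locality / random-index A3) and produced three different shapes of result. Multi-scale mixing of 3/5/7 patches beat a single 5×5 size only at low N (B14, +0.53pp ***). Adding a hidden ReLU layer hurt at a fixed training budget (B15, −1.7pp). Spatial patches beat random-index “patches” on MNIST (B13, +0.61pp ***), robustly at patch sizes 3, 5 and 7 (B17, ***), but the advantage vanished on Fashion-MNIST (B16, ns).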

Group B has now produced both transferability profiles: rectangular-patch wide preference is task-general (replicated on Fashion, flipped with rotation); spatial-locality advantage is MNIST-specific (absent on Fashion). For the eventual NEAT integration, that means architectural choices in the typed-species framework will have heterogeneous transferability — useful to know before committing to the integration refactor.

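B18-B24 ran the architectural levers on harder tasks (KMNIST, EMNIST balanced, mixed datasets) to escape MNIST saturation; B18’s difficulty calibration picked KMNIST as the best workhorse. The picture changed substantially:

- Locality flipped sign on KMNIST, where random-index patches win (B19, ***), and the flip held at every patch size (B23).
- EMNIST balanced behaved like MNIST, with spatial patches winning (B20, ***).
- An MNIST+KMNIST mix cancelled to null (B22, ns).
- Multi-scale did not replicate on KMNIST (B21, all ns).
- Multilayer still hurt at a fixed budget on KMNIST (B24).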

So of five findings tested across multiple tasks, two are task-general (patch-matchers themselves, multilayer-hurts-at-fixed-budget) and three are task-specific to varying degrees. The most striking is the locality sign flip — what looked like a robust architectural property of patches (B13/B16/B17) turned out to be a property of the data’s spatial correlation structure that varies dramatically across superficially similar tasks.
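
The locality comparison in B13/B19/B23 pits contiguous windows against random-index “patches” with the same fan-in. Below is a sketch of both index sets. It assumes 25 distinct pixels drawn uniformly per patch (partial Fisher-Yates) and uses a tiny xorshift64 generator so no external crates are needed. The source doesn’t say how B13 sampled its indices, and all names here are hypothetical.

```rust
/// Minimal xorshift64 PRNG (Marsaglia 13/7/17 shifts); seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Value in 0..n (slight modulo bias is acceptable for a sketch).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Flat indices of a contiguous kh x kw window at (row, col) in an img_w-wide image.
fn spatial_indices(row: usize, col: usize, kh: usize, kw: usize, img_w: usize) -> Vec<usize> {
    (0..kh)
        .flat_map(|r| (0..kw).map(move |c| (row + r) * img_w + col + c))
        .collect()
}

/// k distinct flat indices drawn uniformly from 0..n_pixels (partial Fisher-Yates).
fn random_indices(k: usize, n_pixels: usize, rng: &mut XorShift64) -> Vec<usize> {
    assert!(k <= n_pixels);
    let mut pool: Vec<usize> = (0..n_pixels).collect();
    for i in 0..k {
        let j = i + rng.below(n_pixels - i);
        pool.swap(i, j);
    }
    pool.truncate(k);
    pool
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15); // any nonzero seed
    let spatial = spatial_indices(10, 10, 5, 5, 28); // 5x5 window at (10, 10)
    let scattered = random_indices(25, 28 * 28, &mut rng); // same fan-in, no locality
    println!("spatial:   {spatial:?}");
    println!("scattered: {scattered:?}");
}
```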

For the eventual NEAT integration: the genome should support evolving patch geometry, placement strategy, and contiguity per task, not lock in MNIST-derived defaults. KMNIST and EMNIST are now part of the standard task battery.
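
Here is one way a genome could represent these axes, sketched with hypothetical types. A `Placement` is either a contiguous window or an explicit index list. Two example mutations go with it: a transpose that swaps 3×9 for 9×3 and carries the weight grid along, and a de-contiguize step that turns a window into an index list that later mutations could scatter. None of this exists in the NEAT code. It only illustrates evolving geometry, placement and contiguity per task.

```rust
/// Hypothetical placement gene: either a contiguous window or explicit pixels.
#[derive(Clone, Debug, PartialEq)]
enum Placement {
    Contiguous { row: usize, col: usize, kh: usize, kw: usize },
    Indexed(Vec<usize>),
}

/// Hypothetical patch-matcher gene: placement plus one weight per input pixel.
#[derive(Clone, Debug)]
struct PatchGene {
    placement: Placement,
    weights: Vec<f32>,
}

impl PatchGene {
    /// Flat pixel indices this node reads in an img_w-wide image.
    fn input_indices(&self, img_w: usize) -> Vec<usize> {
        match &self.placement {
            Placement::Contiguous { row, col, kh, kw } => (0..*kh)
                .flat_map(|r| (0..*kw).map(move |c| (row + r) * img_w + col + c))
                .collect(),
            Placement::Indexed(idx) => idx.clone(),
        }
    }

    /// Geometry mutation: swap height and width (e.g. 3x9 -> 9x3) and transpose
    /// the weight grid so weight (r, c) moves to (c, r). Returns false, leaving
    /// the gene unchanged, if the new window would not fit inside the image.
    fn transpose(&mut self, img_h: usize, img_w: usize) -> bool {
        if let Placement::Contiguous { row, col, kh, kw } = &mut self.placement {
            let (h, w) = (*kh, *kw);
            if *row + w > img_h || *col + h > img_w {
                return false;
            }
            let mut t = vec![0.0; h * w];
            for r in 0..h {
                for c in 0..w {
                    t[c * h + r] = self.weights[r * w + c];
                }
            }
            *kh = w;
            *kw = h;
            self.weights = t;
            true
        } else {
            false
        }
    }

    /// Contiguity mutation: freeze the window into an explicit index list
    /// that later index-level mutations could scatter. Weights are kept.
    fn decontiguize(&mut self, img_w: usize) {
        self.placement = Placement::Indexed(self.input_indices(img_w));
    }
}

fn main() {
    let mut gene = PatchGene {
        placement: Placement::Contiguous { row: 0, col: 0, kh: 2, kw: 3 },
        weights: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    };
    gene.transpose(28, 28); // 2x3 window becomes 3x2
    println!("{gene:?}");
    gene.decontiguize(28);
    println!("{:?}", gene.placement);
}
```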

B25-B35 produced a major correction and the cleanest mechanistic story so far.

B25 reverses B15. The earlier “multilayer hurts” conclusion was a training-budget artifact. With 20 epochs and LR decay 0.05→0.005 on KMNIST, multilayer goes from −1.85pp at fixed budget to +2.78pp *** with the schedule. B32 found that the benefit does not carry over to MNIST: depth is null there, plausibly because MNIST saturates near 98% at this patch capacity. B34/B35 found that depth still hurts on EMNIST, even with the schedule and wide hidden layers (M=128, M=256). EMNIST is far from saturated (52→80% in the B28 scaling sweep), so the simple “headroom” theory fails. That theory holds that depth helps wherever accuracy has room to improve. Depth’s value instead appears to be task-conditional, in a way that depends on the specific structure of the class-discriminative features.
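
The schedule is described only as 20 epochs with LR decay 0.05→0.005. Below is a sketch that assumes geometric per-epoch decay; the actual shape (linear, step, cosine) isn’t stated. `lr_at` is a hypothetical helper.

```rust
/// Learning rate for `epoch` (0-based) under geometric decay from lr_start
/// at the first epoch to lr_end at the last. The decay shape is an assumption.
fn lr_at(epoch: usize, epochs: usize, lr_start: f64, lr_end: f64) -> f64 {
    if epochs <= 1 {
        return lr_start;
    }
    let frac = epoch as f64 / (epochs - 1) as f64; // 0.0 at start, 1.0 at end
    lr_start * (lr_end / lr_start).powf(frac)
}

fn main() {
    for epoch in 0..20 {
        println!("epoch {epoch:2}: lr = {:.5}", lr_at(epoch, 20, 0.05, 0.005));
    }
}
```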

B31 cracks the locality puzzle. Compute the per-pixel class-discriminability map (an F-like ratio of between-class to within-class variance), then take its spatial autocorrelation at the patch scale (d=5). The values line up with the direction of the locality effect on all four datasets:

- High where spatial patches win: MNIST 0.32, EMNIST 0.37.
- Low where there is no effect: Fashion 0.13.
- Negative where spatial patches lose: KMNIST −0.07.

KMNIST’s class-discriminative information is not spatially clustered at the patch scale. Fixed-position 5×5 patches therefore can’t reliably catch concentrated discriminative content, while random-index “patches” that sample 25 pixels from anywhere spread their sampling more effectively. The locality direction is a measurable data property, discoverable without training.
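
Here is a sketch of the B31 probe under stated assumptions. The F-like ratio is taken as the between-class sum of squares over the within-class sum of squares for each pixel. “Spatial autocorrelation at d=5” is taken as the Pearson correlation between map values 5 px apart, pooling horizontal and vertical pairs. The actual B31 definition may differ (for example Moran’s I or a radial average), and the function names are hypothetical.

```rust
/// Per-pixel F-like discriminability: between-class over within-class sum of squares.
/// `images` are flat row-major pixel vectors; `labels` are in 0..n_classes.
fn discriminability_map(images: &[Vec<f64>], labels: &[usize], n_classes: usize) -> Vec<f64> {
    let n_pix = images[0].len();
    let n = images.len() as f64;
    let mut count = vec![0.0; n_classes];
    let mut class_sum = vec![vec![0.0; n_pix]; n_classes];
    for (img, &y) in images.iter().zip(labels) {
        count[y] += 1.0;
        for (s, &p) in class_sum[y].iter_mut().zip(img) {
            *s += p;
        }
    }
    let mut out = vec![0.0; n_pix];
    for j in 0..n_pix {
        let mean = class_sum.iter().map(|s| s[j]).sum::<f64>() / n;
        let mut between = 0.0;
        for k in 0..n_classes {
            if count[k] > 0.0 {
                between += count[k] * (class_sum[k][j] / count[k] - mean).powi(2);
            }
        }
        let mut within = 0.0;
        for (img, &y) in images.iter().zip(labels) {
            within += (img[j] - class_sum[y][j] / count[y]).powi(2);
        }
        out[j] = between / (within + 1e-12); // epsilon guards constant pixels
    }
    out
}

/// Pearson correlation between map values d pixels apart (horizontal + vertical pairs).
fn spatial_autocorr(map: &[f64], h: usize, w: usize, d: usize) -> f64 {
    let (mut a, mut b) = (Vec::new(), Vec::new());
    for r in 0..h {
        for c in 0..w {
            if c + d < w {
                a.push(map[r * w + c]);
                b.push(map[r * w + c + d]);
            }
            if r + d < h {
                a.push(map[r * w + c]);
                b.push(map[(r + d) * w + c]);
            }
        }
    }
    pearson(&a, &b)
}

fn pearson(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let (ma, mb) = (a.iter().sum::<f64>() / n, b.iter().sum::<f64>() / n);
    let cov: f64 = a.iter().zip(b).map(|(x, y)| (x - ma) * (y - mb)).sum();
    let va: f64 = a.iter().map(|x| (x - ma).powi(2)).sum();
    let vb: f64 = b.iter().map(|y| (y - mb).powi(2)).sum();
    cov / (va.sqrt() * vb.sqrt())
}

fn main() {
    // Checkerboard map: neighbours 1 px apart are anti-correlated (about -1),
    // neighbours 2 px apart are identical (about +1).
    let (h, w) = (28, 28);
    let map: Vec<f64> = (0..h * w)
        .map(|i| if (i / w + i % w) % 2 == 0 { 1.0 } else { -1.0 })
        .collect();
    println!("d=1: {:.3}", spatial_autocorr(&map, h, w, 1));
    println!("d=2: {:.3}", spatial_autocorr(&map, h, w, 2));
}
```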

The rectangular wide-preference holds on 3 of 4 tasks (B29/B33). MNIST +0.51, Fashion +0.93, EMNIST +0.98 (all ***). KMNIST: −0.40 *. Cursive Japanese characters apparently have a different dominant feature orientation than printed-character or texture data.

Multi-scale is fully MNIST-specific (B30 confirms B21): it doesn’t replicate on KMNIST or EMNIST. The B14 finding was an MNIST-specific quirk.

Updated transferability tally: of 5 architectural findings tested across multiple datasets, only the bare patch primitive is fully task-general. Every detail (geometry, placement, depth, training schedule, mixing) is task-conditional. KMNIST is the most frequent outlier — it inverts locality, inverts rectangular preference, and is the only task where multilayer clearly helps with proper schedule. The MNIST-derived Group B story doesn’t transfer; the genome design for typed-species NEAT integration needs to evolve all of these axes per task.

B8 adds a parameter-efficiency picture. Smaller patches at high count win at low parameter budgets, because coverage beats receptive field when params are scarce. 3×3 has a hard ceiling around 96.5%, and all sizes end up in the 96.5–98% band by N=640. Patch size is an axis to evolve, not a hyperparameter to fix.
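
Here is a rough parameter count for the B8 comparison. It assumes one scalar feature per fixed-position patch, no patch bias, and a 10-way linear classifier with biases; `patch_model_params` is hypothetical. Under these assumptions, at N=640 the classifier alone costs 6,410 parameters. A 3×3 model (12,170 total) therefore spends just over half its budget on the classifier. A 7×7 model (37,770 total) is dominated by its 31,360 patch weights.

```rust
/// Parameter count for `n` fixed-position kh x kw patch matchers (no patch
/// bias, assumed) feeding a `classes`-way linear classifier with biases.
fn patch_model_params(n: usize, kh: usize, kw: usize, classes: usize) -> usize {
    let patch_weights = n * kh * kw;
    let classifier = n * classes + classes;
    patch_weights + classifier
}

fn main() {
    for k in [3, 5, 7] {
        let total = patch_model_params(640, k, k, 10);
        let patches = 640 * k * k;
        println!("{k}x{k}, N=640: {total} params ({patches} in patches)");
    }
}
```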