Group B: Typed Neuronal Species

A separate research stream from the main NEAT/neuroevolution work. Started 2026-05-07 after the main stream plateaued at 99.73% MNIST on a [128, 64] architecture and we ran out of obvious architectural levers.

The hypothesis

Real brains don’t have identical neurons. ReLU/Sigmoid/Tanh nodes with scalar I/O are a computational caricature, not a biological reality. What if we let evolution discover typed neuronal species — nodes with non-scalar I/O signatures and specialized computational primitives, not just different scalar activations?

The first species under investigation is the patch matcher: a node that takes a vector of pixels and a vector of weights and emits a scalar. The minimal version is a learnable convolutional filter as a single node.
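A minimal sketch of such a node, in Rust on the assumption that the `src/bin/` experiments are Cargo binaries. The node holds a fixed set of pixel positions and one weight per position. Its output is their dot product, which is what a single learnable convolutional filter evaluated at one location computes. `PatchMatcher`, `spatial` and `forward` are hypothetical names, not taken from the codebase.

```rust
/// Hypothetical patch-matcher node: a vector of pixels in, a vector of weights, a scalar out.
pub struct PatchMatcher {
    /// Flattened (row-major) pixel indices this node reads.
    pub indices: Vec<usize>,
    /// One learnable weight per sampled pixel (zero-initialized here; training sets them).
    pub weights: Vec<f32>,
}

impl PatchMatcher {
    /// Builds a contiguous h×w patch whose top-left corner is (row, col)
    /// in an image that is `img_w` pixels wide.
    pub fn spatial(row: usize, col: usize, h: usize, w: usize, img_w: usize) -> Self {
        let mut indices = Vec::with_capacity(h * w);
        for r in row..row + h {
            for c in col..col + w {
                indices.push(r * img_w + c);
            }
        }
        let weights = vec![0.0; indices.len()];
        PatchMatcher { indices, weights }
    }

    /// Dot product of the sampled pixels with the node's weights.
    pub fn forward(&self, image: &[f32]) -> f32 {
        self.indices
            .iter()
            .zip(&self.weights)
            .map(|(&i, &w)| image[i] * w)
            .sum()
    }
}
```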

The strategy

Prove the inductive bias pays before building evolutionary machinery for typed species. Run stripped-down hand-coded experiments in src/bin/ that share nothing with the NEAT genome/phenotype code. If a result is strong enough to justify the integration cost, the relevant primitive can be lifted into the main genome representation later.

The pages

Group B Journal — chronological observations and per-experiment narrative for the typed-species stream
Group B Experiments — structured experiment records (B1, B2, …)

Headline so far

Experiment	Setup	Best test acc / key result
B1	Raw dot-product prototype matcher	64.99%
B2	Cosine prototype matcher (argmax)	83.53%
B3	Cosine features → trained linear	84.85%
B4	32 random 5×5 patches, frozen	67.56%
B5	Prototype-slice patches, frozen, sweep N	93.67%
B6	Random patches, frozen, sweep count	94.01%
B7	Trained patches, sweep count	97.27%
B8	Trained patches, sweep size × count	98.04%
B9	Rectangular patches, single seed	(suggestive, retracted)
B9-stats	Same with 5-seed paired stats	97.38% (5×7 mean)
B10	Rectangular patches on Fashion-MNIST	86.53% (4×6 mean)
B11	Rectangular patches on rotated MNIST	sign of preference flipped (✓)
B12	Extreme aspect ratios (1:3 → 1:21)	non-monotonic (1:9 null surprise)
B13	Random-index “patches” on MNIST	spatial wins (+0.61pp ***)
B14	Multi-scale (3/5/7) vs single 5×5	mix wins at low N only (+0.53pp ***)
B15	Multi-layer (hidden ReLU)	depth hurts at fixed budget (−1.7pp)
B16	A3 cross-task on Fashion-MNIST	locality is MNIST-specific (ns)
B17	A3 across patch sizes 3/5/7 (MNIST)	locality robust at every size (***)
B18	Task difficulty calibration (KMNIST/EMNIST/mixes)	KMNIST is best workhorse
B19	Locality on KMNIST	sign FLIP — indexed wins (***)
B20	Locality on EMNIST balanced	spatial wins like MNIST (***)
B21	Multi-scale on KMNIST	doesn’t replicate (all ns)
B22	Locality on MNIST+KMNIST mix	cancels to null (ns)
B23	Locality across sizes on KMNIST	flip robust at every size (***)
B24	Multilayer on KMNIST	hurt at fixed budget (later reversed)
B25	Multilayer KMNIST + LR-decay schedule	+2.78pp ***; B15 conclusion REVERSED
B27	Pixel-correlation probe	null result; simple correlation doesn’t predict locality
B28	Scaling sweep on KMNIST/EMNIST	KMNIST 72→92%, EMNIST 52→80%
B29	Rectangular on KMNIST	3×9/9×3 sign flips on KMNIST (*)
B30	Multi-scale on EMNIST	also doesn’t replicate (all ns)
B31	Per-pixel discriminability + autocorr d=5	predicts locality direction perfectly
B32	Multilayer MNIST + schedule	null on saturated MNIST
B33	Rectangular on EMNIST	follows MNIST/Fashion +0.98pp ***
B34	Multilayer EMNIST + schedule	hurts even with schedule (−1.11pp)
B35	Wider multilayer EMNIST (M=128, 256)	bottleneck hypothesis disproved
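B1 and B2 differ in the similarity function used to compare an image against per-class prototypes. The sketch below assumes one flattened prototype image per class (for example a class-mean image; the notes don't say how prototypes were built) and classifies by argmax. The toy data shows one way a raw dot product can mislead: it rewards prototypes that simply carry more total ink, which cosine similarity normalizes away. All names are illustrative.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity; returns 0.0 when either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let na = dot(a, a).sqrt();
    let nb = dot(b, b).sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot(a, b) / (na * nb)
    }
}

/// Index of the prototype with the highest similarity score under `sim`.
pub fn classify(image: &[f32], prototypes: &[Vec<f32>], sim: fn(&[f32], &[f32]) -> f32) -> usize {
    let mut best = 0;
    let mut best_score = f32::NEG_INFINITY;
    for (k, p) in prototypes.iter().enumerate() {
        let s = sim(image, p);
        if s > best_score {
            best = k;
            best_score = s;
        }
    }
    best
}
```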

The original Group B hypothesis — patch matchers as a learnable typed species — is supported. SGD-trained patches feeding a linear classifier reach 98.04% MNIST with 640 7×7 patches, approaching the main stream’s [128] dense-hidden result (98.7%) but with locality as the inductive bias rather than dense connectivity.
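A hedged sketch of the B7/B8-style model: N patch nodes at fixed positions, each producing one scalar feature, feeding a linear readout. The ReLU on each patch output is an assumption; the notes don't record which nonlinearity sits between the patches and the classifier (cosine normalization, as in B3, would be another option). Some nonlinearity is needed, since a linear readout of purely linear patch responses collapses to a single linear map of the pixels. `Patch` and `PatchLinearModel` are hypothetical names.

```rust
/// One patch node: fixed pixel indices plus one weight per index.
pub struct Patch {
    pub indices: Vec<usize>,
    pub weights: Vec<f32>,
}

/// Patch features followed by a linear classifier (one readout row per class).
pub struct PatchLinearModel {
    pub patches: Vec<Patch>,
    pub readout: Vec<Vec<f32>>,
    pub readout_bias: Vec<f32>,
}

impl PatchLinearModel {
    /// One scalar per patch: dot product with its pixels, then an assumed ReLU.
    pub fn features(&self, image: &[f32]) -> Vec<f32> {
        self.patches
            .iter()
            .map(|p| {
                let s: f32 = p.indices.iter().zip(&p.weights).map(|(&i, &w)| image[i] * w).sum();
                s.max(0.0)
            })
            .collect()
    }

    /// Class logits: readout · features + bias.
    pub fn logits(&self, image: &[f32]) -> Vec<f32> {
        let f = self.features(image);
        self.readout
            .iter()
            .zip(&self.readout_bias)
            .map(|(row, &b)| row.iter().zip(&f).map(|(&w, &x)| w * x).sum::<f32>() + b)
            .collect()
    }

    /// Argmax over the logits.
    pub fn predict(&self, image: &[f32]) -> usize {
        let l = self.logits(image);
        (1..l.len()).fold(0, |best, k| if l[k] > l[best] { k } else { best })
    }
}
```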

Two methodological inflections worth noting:

B5’s apparent “meaningful content matters” win was partially an illusion — at sufficient feature count (640), random patches catch up to prototype-slice patches. Without B6 to control for capacity we’d have walked away with the wrong reading.

B9’s single-seed “5×7 N=320 = 97.66%” headline was a lucky-seed artifact. The 5-seed mean is 97.38%, statistically tied with 6×6 (97.34%). B9-stats stood up reusable stats infrastructure (paired-t, Cohen’s d_z) and corrected several B9 conclusions. Going forward, any quantitative claim about a sub-0.3pp difference in this codebase needs multi-seed paired stats.
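The two statistics named above can be sketched as follows, assuming both configurations are run on the same seeds so that per-seed differences are meaningful. Cohen's d_z is the mean of the per-seed differences divided by their sample standard deviation, and the paired t statistic is d_z·√n with n − 1 degrees of freedom. As a consistency check, if the 3×9 vs 9×3 comparison in the next paragraph used n = 5 seeds, its d_z = 1.89 implies t = 1.89 × √5 ≈ 4.23, in line with the reported t = 4.22. The p-value lookup is omitted because the Rust standard library has no t-distribution CDF; `paired_stats` is a hypothetical name.

```rust
/// Summary of a paired comparison across seeds.
pub struct PairedStats {
    pub mean_diff: f64,
    pub t: f64,
    pub d_z: f64,
    pub n: usize,
}

/// Paired t statistic and Cohen's d_z for per-seed accuracies `a` and `b`
/// measured on the same seeds. Needs n >= 2 and non-constant differences.
pub fn paired_stats(a: &[f64], b: &[f64]) -> PairedStats {
    assert_eq!(a.len(), b.len(), "paired samples need equal length");
    let n = a.len();
    assert!(n >= 2, "need at least two seeds");
    let diffs: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
    let mean = diffs.iter().sum::<f64>() / n as f64;
    // Sample variance with the n - 1 denominator.
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let d_z = mean / var.sqrt();
    // t = mean / (sd / sqrt(n)) = d_z * sqrt(n); compare against t with n - 1 df.
    let t = d_z * (n as f64).sqrt();
    PairedStats { mean_diff: mean, t, d_z, n }
}
```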

The rectangular-patch arc: at extreme aspect ratios (1:3) on MNIST, wide patches significantly outperform tall patches (3×9 vs 9×3: Δ=+0.51pp, t=4.22, d_z=1.89, ***). The effect replicates and strengthens on Fashion-MNIST (Δ=+0.93pp, ***), refuting the original “digit-stroke geometry” guess and pointing at a more general principle: kernels should be perpendicular to the dominant feature orientation. B11 confirmed this mechanistically by rotating MNIST 90° CW — the 3×9 vs 9×3 sign cleanly flipped to −0.40pp *. The pattern moves with the data, ruling out architectural bias or placement-geometry artifact.
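The B11 intervention can be sketched as a 90° clockwise rotation of each flattened image, assuming square, row-major 28×28 inputs (how the rotated dataset was actually produced isn't recorded). Under this rotation pixel (r, c) moves to (c, side − 1 − r), so a wide 3×9 window on the original image covers a tall 9×3 window on the rotated one. If the preference tracks the data's dominant orientation, the sign of the 3×9 vs 9×3 gap should therefore flip, which is what B11 observed. `rotate_cw` is a hypothetical helper.

```rust
/// Rotates a square, row-major `side`×`side` image 90° clockwise.
/// Pixel (r, c) of the input lands at (c, side - 1 - r) of the output.
pub fn rotate_cw(img: &[f32], side: usize) -> Vec<f32> {
    assert_eq!(img.len(), side * side, "expected a square image");
    let mut out = vec![0.0; side * side];
    for r in 0..side {
        for c in 0..side {
            out[c * side + (side - 1 - r)] = img[r * side + c];
        }
    }
    out
}
```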

B12 added a surprise: the wide-tall gap is not monotonic with aspect ratio. There’s a null at 1:9 (1×9 vs 9×1: Δ=+0.01pp, ns) sandwiched between large effects at 1:3 and 1:15+. The mechanism turns out to involve at least two phenomena: thick rectangular patches (≥3 px perpendicular) win wide because they cut across vertical features; long single-pixel-thick strips (1×15, 1×21) win wide because horizontal cross-sections are more discriminative for digit width than vertical cross-sections are for digit height. 1×9 has neither property (zero perpendicular extent, and too short to act as a discriminative cross-section sampler), so no preference appears.

B13-B17 explored three architectural levers in parallel — multi-scale (A1), multi-layer (A2), and locality / random-index (A3) — and produced three different shapes of result:

Multi-scale (B14): conditional positive. Mixing 3/5/7 patch sizes wins at low N (+0.53pp *** at N=240), but the advantage vanishes at high N (N=480, ns). Receptive-field diversity helps when no single size has enough patches to fully exploit it; doesn’t matter when capacity is abundant.
Multi-layer (B15): conditional negative. Adding a hidden ReLU layer hurts uniformly across M ∈ {32, 64, 128} (−1.5 to −1.7pp) at our fixed 10-epoch / fixed-LR training budget. Opposite of what depth did in the main NEAT stream — likely because main stream runs 1.8M steps with LR decay vs our 500K fixed-LR. Depth here would need retuned hyperparameters before being declared dead.
Locality (B13/B16/B17): task-specific positive. On MNIST, spatial 5×5 trained patches beat 25-random-pixel-index “patches” by +0.61pp *** (B13), with the effect robust across patch sizes (B17: +0.6 to +0.9pp *** at 3×3, 5×5, 7×7). On Fashion-MNIST the advantage vanishes (B16: −0.12pp ns). Spatial contiguity matters a lot on digit-stroke data where adjacent pixels carry highly correlated information; less so on Fashion where pixel correlations are weaker.
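The B13 control can be expressed by changing only how a node chooses its pixels. This sketch assumes the random-index variant draws k·k distinct pixel indices once per node from the whole 28×28 image and keeps them fixed during training; the notes confirm only that each control node reads 25 random pixels. The small xorshift generator exists solely so the example needs no external crates and is not meant to reflect the experiments' RNG. All function names are hypothetical.

```rust
/// Minimal xorshift64 PRNG (illustrative; state must be non-zero).
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..n (modulo bias ignored for brevity).
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// Indices of a contiguous k×k patch with top-left corner (row, col).
pub fn spatial_indices(row: usize, col: usize, k: usize, img_w: usize) -> Vec<usize> {
    (row..row + k)
        .flat_map(move |r| (col..col + k).map(move |c| r * img_w + c))
        .collect()
}

/// k*k distinct pixel indices drawn from the whole image via a partial Fisher–Yates shuffle.
pub fn random_indices(k: usize, n_pixels: usize, seed: u64) -> Vec<usize> {
    let m = k * k;
    assert!(m <= n_pixels, "cannot draw more distinct pixels than the image has");
    let mut pool: Vec<usize> = (0..n_pixels).collect();
    let mut rng = XorShift64(seed.max(1));
    for i in 0..m {
        let j = i + rng.below(n_pixels - i);
        pool.swap(i, j);
    }
    pool.truncate(m);
    pool
}
```

Both functions return the same kind of index list, so a spatial node and its random-index control can share one forward pass and differ only in which pixels they read.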

Group B has now produced both transferability profiles: rectangular-patch wide preference is task-general (replicated on Fashion, flipped with rotation); spatial-locality advantage is MNIST-specific (absent on Fashion). For the eventual NEAT integration, that means architectural choices in the typed-species framework will have heterogeneous transferability — useful to know before committing to the integration refactor.

B18-B24 ran the architectural levers on harder tasks (KMNIST, EMNIST balanced, mixed datasets) to escape MNIST saturation. The picture changed substantially:

Locality finding (A3) flips sign on KMNIST. Spatial wins on hand-printed character data (MNIST +0.6 to +0.9pp ***, EMNIST +1.03pp ***); spatial loses on cursive Japanese characters (KMNIST −1.16 to −1.38pp *** at every patch size). Texture data (Fashion) is null. Mixing a positive and a negative task cancels to null (MNIST+KMNIST = +0.09pp ns).
Multi-scale finding (A1) was MNIST-specific. On KMNIST, mixing 3/5/7 patch sizes vs single-5×5 is non-significant at every patch count tested. Whatever benefit receptive-field diversity provided on MNIST didn’t generalize.
Multilayer hurt (A2) replicates on KMNIST. With 7pp more headroom than MNIST, adding a hidden ReLU layer still hurt by 1.15-1.85pp. The under-training mechanism from the original B15 reading was right; at this fixed 10-epoch / fixed-LR budget, this is now a robust negative.

So of five findings tested across multiple tasks, two are task-general (patch-matchers themselves, multilayer-hurts-at-fixed-budget) and three are task-specific to varying degrees. The most striking is the locality sign flip — what looked like a robust architectural property of patches (B13/B16/B17) turned out to be a property of the data’s spatial correlation structure that varies dramatically across superficially similar tasks.

For the eventual NEAT integration: the genome should support evolving patch geometry, placement strategy, and contiguity per task, not lock in MNIST-derived defaults. KMNIST and EMNIST are now part of the standard task battery.

B25-B35 produced a major correction and the cleanest mechanistic story so far.

B25 reverses B15. The earlier “multilayer hurts” conclusion was a training-budget artifact. With 20 epochs + LR decay 0.05→0.005 on KMNIST, multilayer goes from −1.85pp at fixed budget to +2.78pp *** with proper schedule. B32 found no such reversal on MNIST: depth is null there because MNIST saturates near 98% for this patch capacity. B34/B35 found depth still hurts on EMNIST even with proper schedule and wide hidden layers (M=128, M=256), so the simple “headroom” theory fails: depth’s value is task-conditional in a way that depends on the specific structure of class-discriminative features.
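The schedule is recorded only by its endpoints (0.05 to 0.005 over 20 epochs), so the decay shape below is an assumption: geometric per-epoch decay, with a linear variant alongside for comparison. Both start at exactly `lr_start` on the first epoch and reach `lr_end` on the last. The function names are illustrative.

```rust
/// Geometric decay: each epoch multiplies the rate by the same factor,
/// so epoch 0 uses `lr_start` and epoch `epochs - 1` uses `lr_end`.
pub fn geometric_lr(epoch: usize, epochs: usize, lr_start: f64, lr_end: f64) -> f64 {
    if epochs <= 1 {
        return lr_start;
    }
    let frac = epoch as f64 / (epochs - 1) as f64;
    lr_start * (lr_end / lr_start).powf(frac)
}

/// Linear decay with the same endpoints.
pub fn linear_lr(epoch: usize, epochs: usize, lr_start: f64, lr_end: f64) -> f64 {
    if epochs <= 1 {
        return lr_start;
    }
    let frac = epoch as f64 / (epochs - 1) as f64;
    lr_start + (lr_end - lr_start) * frac
}
```

The geometric form spends more epochs at small rates than the linear one, which may matter when the point of the schedule is to let a deeper model settle; which shape B25 used would need checking in the run config.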

B31 cracks the locality puzzle. Compute the per-pixel class-discriminability map (an F-like ratio of between-class to within-class variance), then take its spatial autocorrelation at the patch scale (d=5). The values rank the four datasets exactly in the order of the locality effect: EMNIST 0.37 and MNIST 0.32 (spatial wins), Fashion 0.13 (null), KMNIST −0.07 (spatial loses). KMNIST’s class-discriminative information is not spatially clustered at the patch scale, so fixed-position 5×5 patches can’t reliably catch concentrated discriminative content, while random-index “patches” that sample 25 pixels from anywhere spread their sampling more effectively. The locality direction is therefore a measurable data property, discoverable without training.
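A sketch of the B31 probe under stated assumptions. The notes specify only an "F-like" between-/within-class variance ratio per pixel and a spatial autocorrelation at lag d = 5. Here the ratio has no degrees-of-freedom correction, a tiny epsilon keeps always-constant pixels (such as blank borders) finite, and the autocorrelation is the Pearson correlation pooled over all horizontal and vertical pixel pairs d apart. Other reasonable choices could shift the absolute values, so this code should not be expected to reproduce the reported numbers exactly. Function names are hypothetical.

```rust
/// Per-pixel ratio of between-class to within-class sum of squares.
/// `images` are flattened row-major; each label is in 0..n_classes.
pub fn discriminability_map(images: &[Vec<f64>], labels: &[usize], n_classes: usize) -> Vec<f64> {
    let n = images.len();
    let p = images[0].len();
    let mut counts = vec![0usize; n_classes];
    let mut class_sum = vec![vec![0.0; p]; n_classes];
    let mut total = vec![0.0; p];
    for (img, &y) in images.iter().zip(labels) {
        counts[y] += 1;
        for j in 0..p {
            class_sum[y][j] += img[j];
            total[j] += img[j];
        }
    }
    let grand: Vec<f64> = total.iter().map(|t| t / n as f64).collect();
    let class_mean: Vec<Vec<f64>> = class_sum
        .iter()
        .zip(&counts)
        .map(|(s, &c)| s.iter().map(|v| if c > 0 { v / c as f64 } else { 0.0 }).collect())
        .collect();
    let mut between = vec![0.0; p];
    let mut within = vec![0.0; p];
    for k in 0..n_classes {
        for j in 0..p {
            between[j] += counts[k] as f64 * (class_mean[k][j] - grand[j]).powi(2);
        }
    }
    for (img, &y) in images.iter().zip(labels) {
        for j in 0..p {
            within[j] += (img[j] - class_mean[y][j]).powi(2);
        }
    }
    between.iter().zip(&within).map(|(b, w)| b / (w + 1e-12)).collect()
}

/// Pearson correlation between map values `d` pixels apart (square `side`×`side` map),
/// pooling horizontal and vertical pairs.
pub fn lag_autocorr(map: &[f64], side: usize, d: usize) -> f64 {
    let (mut xs, mut ys) = (Vec::new(), Vec::new());
    for r in 0..side {
        for c in 0..side {
            let v = map[r * side + c];
            if c + d < side { xs.push(v); ys.push(map[r * side + c + d]); }
            if r + d < side { xs.push(v); ys.push(map[(r + d) * side + c]); }
        }
    }
    let n = xs.len() as f64;
    let mx = xs.iter().sum::<f64>() / n;
    let my = ys.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in xs.iter().zip(&ys) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    sxy / (sxx.sqrt() * syy.sqrt())
}
```

Usage under these assumptions: `lag_autocorr(&discriminability_map(&train_images, &train_labels, 10), 28, 5)` gives one number per dataset, computed without training any model.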

The rectangular wide preference holds on 3 of 4 tasks (B29/B33): MNIST +0.51, Fashion +0.93, EMNIST +0.98 (all ***). KMNIST: −0.40 *. Cursive Japanese characters apparently have a different dominant feature orientation than hand-printed characters or texture data.

Multi-scale is fully MNIST-specific (B30 confirms B21): it doesn’t replicate on KMNIST or EMNIST. The B14 finding was an MNIST-specific quirk.

Updated transferability tally: of 5 architectural findings tested across multiple datasets, only the bare patch primitive is fully task-general. Every detail (geometry, placement, depth, training schedule, mixing) is task-conditional. KMNIST is the most frequent outlier — it inverts locality, inverts rectangular preference, and is the only task where multilayer clearly helps with proper schedule. The MNIST-derived Group B story doesn’t transfer; the genome design for typed-species NEAT integration needs to evolve all of these axes per task.
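One hypothetical shape for such a genome, sketched only to make the axes concrete. Each patch gene carries its own sampling (a contiguous rectangle of any h×w, or arbitrary indices), which covers geometry, placement, contiguity and size mixing, while depth and the LR schedule live at the genome level. None of these types exist in the NEAT codebase; the names and the validity rule are illustrative.

```rust
/// How a patch gene picks its pixels.
#[derive(Clone, Debug, PartialEq)]
pub enum Sampling {
    /// Contiguous h×w rectangle anchored at (row, col).
    Spatial { row: usize, col: usize, h: usize, w: usize },
    /// Arbitrary pixel indices (the B13-style random-index control).
    Indexed { indices: Vec<usize> },
}

#[derive(Clone, Debug, PartialEq)]
pub struct PatchGene {
    pub sampling: Sampling,
    pub weights: Vec<f32>,
}

impl PatchGene {
    /// Number of pixels the node reads.
    pub fn arity(&self) -> usize {
        match &self.sampling {
            Sampling::Spatial { h, w, .. } => h * w,
            Sampling::Indexed { indices } => indices.len(),
        }
    }

    /// Valid when the gene stays inside a `side`×`side` image and has one weight per pixel.
    pub fn is_valid(&self, side: usize) -> bool {
        let in_bounds = match &self.sampling {
            Sampling::Spatial { row, col, h, w } => {
                *h > 0 && *w > 0 && row + h <= side && col + w <= side
            }
            Sampling::Indexed { indices } => {
                !indices.is_empty() && indices.iter().all(|&i| i < side * side)
            }
        };
        in_bounds && self.weights.len() == self.arity()
    }
}

/// Genome-level axes that the notes suggest evolving per task.
pub struct TypedGenome {
    pub patches: Vec<PatchGene>,
    /// `None`: patches feed the readout directly; `Some(m)`: one m-wide hidden ReLU layer.
    pub hidden_width: Option<usize>,
    /// Learning-rate schedule endpoints.
    pub lr_start: f32,
    pub lr_end: f32,
}
```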

B8 adds a parameter-efficiency picture: smaller patches at high count win at low parameter budgets (coverage beats receptive field when params are scarce), 3×3 has a hard ceiling around 96.5%, and all sizes land between roughly 96.5% and 98% by N=640. Patch size is an axis to evolve, not a hyperparameter to fix.
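Parameter counts make the budget comparison concrete. This sketch assumes each patch is a bare h×w weight vector (a per-patch bias is optional) feeding a 10-class linear readout with biases. It also assumes the main stream's [128] network is a 784-128-10 fully connected net with biases. Neither assumption is confirmed by the notes.

```rust
/// Trainable parameters of N h×w patches feeding a linear readout over `classes` outputs.
pub fn patch_model_params(n: usize, h: usize, w: usize, patch_bias: bool, classes: usize) -> usize {
    let per_patch = h * w + usize::from(patch_bias);
    n * per_patch + n * classes + classes
}

/// Parameters of a fully connected net with the given layer widths (weights + biases).
pub fn dense_params(widths: &[usize]) -> usize {
    widths.windows(2).map(|p| p[0] * p[1] + p[1]).sum()
}
```

Under these assumptions, 640 7×7 patches come to 37,770 parameters and 640 3×3 patches to 12,170, against 101,770 for a 784-128-10 dense network.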