Raw, unedited Group C journal — typed-species NEAT integration stream. Reproduced exactly as produced.

2026-05-13: Stream opens

Group B closed at commit e70bffd with a clear directive for integration: the genome should be able to evolve patch geometry, placement strategy, depth, and training schedule per-task. Group C exists to do that work in the main NEAT system.

Plan: baselines first, then design, then implement, then evolve.

Survey of integration surfaces

Mapped the current code:

The integration is mostly mechanical once the data layout is settled. The risk is in the phenotype: the forward inner loop has been tuned hard (Phase 1-3 perf work) and a naive branch on NodeKind per inner iteration would tank the cache story.

Working answer: store patch-matcher fan-in in a parallel structure to HotConnection/conn_ranges, indexed by node topo-position. Forward pass adds a “patch contribution” pass before the connection sum, only iterating over the (small) set of patch nodes. Backward, symmetrically.

Full design written up in design.md.

NodeGene: additive change landed

Step 1 of the integration plan (add an Option<PatchParams> slot on NodeGene, default None at every construction site) ships before anything else, since it’s strictly additive and lets later steps land in smaller commits. With patch: None everywhere, the existing system is byte-identical. Builds clean.

Experiment C1 (in progress): 4-way joint MLP baselines

While the integration work was being scoped, kicked off baselines on the full 4-way joint task (77 classes: MNIST + Fashion + KMNIST + EMNIST-balanced). Three architectures × 3 seeds, 3 epochs, lr 0.01→0.001 linear, online SGD. Train fraction 5/6, test fraction 1/6 of each dataset.

[64] results (3 seeds): overall ~75.8%. Per-dataset: MNIST ~90%, KMNIST ~80%, Fashion ~77%, EMNIST ~65%. Difficulty ordering matches Group B’s per-task headroom intuition — MNIST easiest, EMNIST hardest.

Experiment C2: integrated patch-matcher works end-to-end

While the baselines were still running, landed the mechanical core of the integration in three additive steps:

  1. Added Option<PatchParams> to NodeGene. Defaulted None at every construction site. No behavior change.
  2. Extended Network with PatchTopo (parallel to HotConnection/conn_ranges), is_patch_node/patch_slot lookups, a patch branch in forward, a patch branch in backward, and patch-weight write-back in write_weights_to_genome. With zero patch nodes in any genome the new code paths are inert — byte-identical behavior.
  3. Added Genome::new_with_patches(input, output, img_width, n_patches, patch_side, PatchInit::{Spatial,Random}, …) plus InnovationTracker::allocate_patch_node_id.

Verifier binary (group_c_patch_verify, C2): construct a Group-B-shaped genome with 320 spatial 5×5 patches feeding a linear 10-class classifier on MNIST, train through the integrated forward/backward for 3 epochs at lr=0.05, single seed.

Result: 96.28% / 96.49% / 96.64% MNIST test accuracy across the three epochs. Matches Group B’s proto_patches_random_idx.rs spatial result (~95-96%) and confirms:

Mutations and crossover for patch nodes remain to do. The architectural integration is proven.

Mutations and crossover for patches

Added two new mutation operators, both default-off in MutationConfig so existing experiments are unaffected:

Crossover (crossover.rs) extended: when both parents have a node with the same id and both have patch.is_some(), the child takes the patch params (indices, weights, bias) from either parent 50/50. Same convention NEAT uses for matching Connection genes.

Experiment C1 (continued): [128, 64] reveals init quality issue

The [128, 64] arm started at 7% (epoch 1) and recovered slowly: 40% (epoch 2), 59% (epoch 3) for the first seed. Compare [128] which reached 78% by epoch 3.

The cause is initialization. Genome::new_seeded uses U(-1, 1) for connection weights, which was calibrated for the existing system’s sparse start (hidden_input_fraction = 0.10). The baselines binary uses dense layers (hidden_input_fraction = 1.0), so the first hidden layer’s pre-activation has σ ≈ √(784/3 × 0.13) ≈ 5.8 — much larger than what He init would give. With two hidden layers and U(-1, 1) inter-layer weights, output logits have σ ≈ 74 at init, so the softmax saturates and gradient flow through the first epoch is very small.

This is a baseline characterization, not a Group C blocker — the existing system runs fine at hidden_input_fraction = 0.10. Recording it as a data point: the dense-MLP “baseline” is hostage to the existing init, and the floor for two-hidden-layer dense MLPs on this task is below what either depth or width naively predicts. The right follow-up baseline is either (a) He init in new_seeded, (b) sparse-start MLPs at 0.10 like the NEAT system, or (c) a lower LR for deeper nets. For the Group C comparison, the sparse-start NEAT system is the right baseline since that’s the apples-to-apples comparison, so this is mostly a note for future “what’s the textbook MLP floor?” experiments.

Next session

Mutations and crossover are integrated but untested in evolution. The next experiment (C3) should be:

  1. Construct a patch-seeded population (Genome::new_with_patches, say 64 spatial patches per individual, no downstream hidden layer).
  2. Set add_patch_prob = 0.05, mutate_patch_indices_prob = 0.5, per_patch_index_swap_prob = 0.02.
  3. Train on the 4-way joint task, single niche, no ecological speciation yet.
  4. Compare to the scalar NEAT baseline and the dense MLP baseline.

Open questions for next session:

Experiment C3: first patch-evolved population (4-way joint)

Built bin/group_c_evolve.rs: population 50, seeded 64 patches/individual (half spatial / half random-index), 500K steps with periodic 10K-step evolution windows, 25/25/25/25 four-way joint stream.

Result: 75.9% overall test, with 5,005 connections.

Method Test Conn Conn ratio
C1 [64] MLP 0.758 55,245
C1 [128] MLP 0.774 110,413
C3 patches 0.759 5,005 0.09×

The patch primitive matches the [64] dense MLP at 11× fewer parameters and trails [128] by 1.5pp at 22× fewer parameters. The Group B compression result generalizes from MNIST to the 4-way joint task.

add_patch is hostile to its own additions

best_patches stayed pinned at 64 for the entire 50-generation run, despite add_patch_prob=0.05 firing ~1 add per individual per generation. The mechanism: new patches enter with random weights and a N(0, 0.1) head weight, so they immediately add noise to the output and look worse than mature patches. The 10K-step generation window is too short for a new patch to train up enough to outweigh that cost, so it gets culled before it matures.

NEAT-classic solves the equivalent problem (new structural mutations look worse short-term) by behavior-preserving insertion — add_node splits an edge with weights 1.0 + original so the network’s function is unchanged at insertion time. The equivalent for add_patch_matcher is head_weight = 0.0: the new patch contributes 0 to its target’s pre-activation, fitness doesn’t drop, and SGD can then train the head weight upward if and only if the patch’s feature is useful.

This is the obvious C4 fix and the path forward for actually testing patch count evolution. C3 has demonstrated that patch index evolution works (fitness climbed 0.66 → 0.81 over 50 generations); patch count is the next axis.

Where Group C stands

C4-C7: patch count is not evolvable through any direct mechanism

C4 (head_weight=0), C5a (add_patch_prob=0.20), C7 (add_patch_burst of 8) all failed to grow patches at the top of the population. Top individuals stay at the initial seed size in every case. Average patches drift up modestly (64 → 65-77 depending on add rate) but the macro-mutants don’t reach top fitness because their new patches need training time and selection happens before they catch up.

C6 (per_conn_remove_prob=0.005) prunes ~4% of connections and loses ~0.5pp accuracy — pruning is mechanistically real but partially undone by NEAT crossover’s enabled-state inheritance.

Conclusion of the count thread: initial seed size dominates everything. The “evolve architecture through fitness selection” path is structurally blocked by the timing mismatch between cohort training and selection.

C5: capacity scaling cleanly log-linear

Sweep N_SEED_PATCHES = {64, 128, 256, 512} with otherwise-identical config. Test accuracy 75.9 → 82.4 → 85.7 → 87.1. Each doubling halves the gain. Patches beat the [128] MLP baseline (77.4%) at 256 patches with 6× fewer connections, and at 512 patches with 2.8× fewer connections and +10pp acc.

The 4-way joint task’s asymptotic ceiling at this LR/budget is ~88% with patches alone. To break that ceiling, the obvious next steps are: depth (Group B B25 finding: KMNIST gains a lot from a hidden layer with proper schedule), and per-task niching.

C8: ecological speciation works — Group B confirmed inside the evolved genome

Five niches: pure MNIST, pure Fashion, pure KMNIST, pure EMNIST, plus 25/25/25/25 mixed. Each starts with the same 128-patch seed (50% spatial / 50% random-index) and evolves independently for 300K steps.

Per-task best individual accuracy (test set):

Niche Own task vs C5b joint
mnist 96.8% +1.8pp
fashion 86.9% +4.1pp
kmnist 90.2% +3.8pp
emnist 78.3% +4.8pp

Pure niches beat joint training by 2-5pp on their own task. Zero cross-task transfer (consistent with main-stream Experiment 16).

The headline geometry result. Each niche evolves visibly different patch distributions:

Niche edge_frac (final) row_std col_std Reading
mnist 0.700 6.53 7.03 Spatial patches dominant — selection purged random-index
kmnist 1.000 8.08 8.14 Random-index dominant — selection purged spatial
fashion 1.000 8.12 8.11 Same as KMNIST
emnist 0.857 7.17 6.78 Mostly spatial, with row > col anisotropy
mixed 1.000 7.92 8.10 Random-index (averaged across tasks, pulled by majority)

Initial conditions (50/50 spatial vs random) put edge_frac ≈ 0.83. MNIST drifted down (spatial preference). KMNIST drifted to ≈ 1.0 (random-index preference). This independently rediscovers Group B’s per-task locality directions: MNIST/EMNIST keep spatial, KMNIST inverts, Fashion goes distributed.

The KMNIST inversion that took Group B 35 experiments to map is reproduced by selection in 300K training steps, without anyone telling the system what KMNIST is. That’s the Group C charter satisfied: lifting the patch-matcher primitive into the genome + ecological speciation is enough for evolution to discover task-conditional architectures.

EMNIST’s row_std > col_std anisotropy is interesting — possibly reflects the vertical-stroke bias of printed letters/digits. Group B’s B33 found EMNIST follows the rectangular wide-preference (+0.98 ***); both findings consistent.

Group C, current shape of the conclusion

  1. Integration works. Patch-matcher node compiles end-to-end through genome → phenotype → forward → backward, parity with Group B fixed-architecture results (C2: 96.6% MNIST).
  2. Compression is dramatic. Patches at 11× fewer parameters than the dense MLP baseline match its accuracy; capacity scales log-linearly with halving returns.
  3. Direct patch-count evolution is structurally blocked by training-vs-selection timing. Initial seed is the right knob, not mutation rate.
  4. Ecological speciation does the discovery work. Different niches → different evolved geometries, reproducing Group B’s per-task findings.

The remaining live questions are about what each patch is computing (per-patch visualization), whether patch-count evolution becomes possible inside niches (where competitors have similar topology), and whether the integration’s compression advantage holds with depth (multi-layer patch genomes).

Phase D: introspection, in-niche growth, depth

D1 (per-patch introspection) dumped PGM weight-maps and pixel-coverage heatmaps for the top individual of each niche. The coverage numbers sharpen C8’s edge_frac result: MNIST has 37% of patch-mass in the central 14×14 region (vs 25% uniform), fashion/kmnist have ~24% (at uniform), emnist has 38% with a clear top-left and horizontal-tight anisotropy (col_std=6.48 < row_std=7.19) consistent with Group B B33’s rectangular wide-preference. The qualitative “stroke detector vs scattered” intuition becomes quantifiable.

D2 (add_patch_prob=0.10 + add_patch_burst_prob=0.05 inside niches) is a clean negative. Average patches drifted from 128 to 128.3-129.0 across niches over 30 generations; top individuals stayed at 128-129. EMNIST had one rank-2 individual at 132 patches — better than C7’s outcome (where macro-mutants got dragged below the seed-count individuals), but still not the best in its niche. Niching softens the “macro-mutants get culled” problem but doesn’t unblock count evolution. The deep blocker remains training-vs-selection timing.

D2 also surfaced a real bug. The original runs (v1-v3) panicked at the KMNIST niche during phenotype compilation. The cause: NEAT crossover can combine matching connection genes whose enabled/disabled patterns are individually acyclic but together close a cycle in the enabled subgraph. Kahn’s topo sort then leaves the cycle-trapped nodes out of topo_order, and a connection referencing one of them indexes into the incomplete node_index and trips a no entry found for key panic. Fixed by Genome::sanitize(), which iteratively disables one inter-cycle edge per pass. Counter shows 1 cycle break across the entire 1.5M-step D2 v5 run — rare but catastrophic.

Three other follow-up invariant fixes landed in the same pass:

D3 (depth + niching) is the cleanest Group B replication so far. Adding a 32-node ReLU hidden layer between patches and outputs gives:

Three out of four per-task signs match Group B’s depth findings exactly. The mixed-niche regression is the ecological-speciation argument in concrete form: one hidden layer can help KMNIST or avoid hurting EMNIST but not both. Niching captures task-conditional architectural value that single-network training cannot.

Bonus: D3’s depth=32 network has fewer connections than D1’s (6,669 vs 9,933), because the 32-wide hidden bottleneck is narrower than 77 outputs. So KMNIST gets +3.3pp accuracy at 33% fewer connections — a Pareto win on that task.

Where Phase D leaves Group C

Two Group B cross-task findings now reproduce inside the integrated, niched system as emergent niche-level behaviors:

  1. Per-task locality direction (D1/C8): MNIST→spatial, KMNIST→distributed, EMNIST→spatial-with-anisotropy, Fashion→distributed.
  2. Per-task depth direction (D3): MNIST null, KMNIST +, EMNIST −.

Two structural blockers persist:

  1. Patch-count evolution at the current gen budget (C3-C7, D2): blocked by training-vs-selection timing regardless of insertion rate, magnitude, niching, or behavior preservation.
  2. Rare crossover-induced cycles (one per ~1.5M training steps with patch-add enabled): handled by sanitize. Identifying the precise gene combinations that close cycles is open work.

The Group C charter is now firmly satisfied: lifting the typed-species patch primitive into the genome and combining it with ecological speciation is enough for evolution to discover task-conditional architectures (both placement and depth), reproducing Group B’s manually-mapped findings.