Group G: Ecological Speciation and the Neural Ecosystem
The sixth research stream, opened 2026-05-18. Direct response to a question that hovered across all five prior streams: does evolutionary pressure from differing data distributions actually create speciation per se, or are observed inter-niche differences just isolated-population drift? And if speciation is real — can we route across the resulting “neural ecosystem” to outperform any single network?
The pages
- Group G Journal — chronological narrative.
- Group G Experiments — structured records (G1, G3–G8, G9 trichotomy).
Headline results
Yes, mix-pressure creates speciation per se. The variance ratio between varied-mix and isolated-drift conditions on MNIST accuracy is 199×; on Fashion accuracy, 94×. The five varied-mix niches are genuine ecological species — networkae mnistia (100/0) and networkae fashionmnistia (0/100) at the extremes, with smoothly interpolated intermediate forms.
The static ecosystem works: collectively, the specialists beat any individual. Naive softmax averaging across 5 specialists gives 88.42% joint accuracy, versus 85.76% for the strongest single specialist (+2.66pp). The oracle upper bound is 93.55%, which is +7.79pp above that specialist and leaves about 5pp of headroom above the ensemble.
Under temporal regime shift, the ecosystem adapts in two distinct ways. G4 shows existing species can adapt via dead-time training on a failure buffer (becomes generalist). G4b shows that with frozen species, a new species emerges in response to the novel task (true speciation event) — spawn fires automatically when rolling accuracy collapses. G5 adds knowledge-aware self-abstention and demonstrates multi-speciation under graded environmental pressure: two new species emerged at the two regime shifts (KMNIST introduction, then KMNIST-dominant phase).
G1: speciation null test
The clean control experiment that prior streams had only approached indirectly. Two conditions, same 30-individual 64-patch seed population, same evolution config, same training budget:
- A (varied): 5 niches at MNIST/Fashion ratios [100/0, 75/25, 50/50, 25/75, 0/100].
- B (uniform): 5 niches all at 50/50, otherwise identical.
After 150K steps per niche, evaluate each best individual on held-out MNIST and Fashion test sets.
Varied condition — extreme specialization
| niche | MNIST | Fashion | edge_frac |
|---|---|---|---|
| 100/0 | 93.07% | 0.00% | 0.707 |
| 75/25 | 92.26% | 80.33% | 0.704 |
| 50/50 | 91.37% | 82.09% | 0.665 |
| 25/75 | 90.01% | 82.94% | 0.701 |
| 0/100 | 0.00% | 83.85% | 1.000 |
The pure-task niches physically cannot classify the other task — they’ve never received gradient signal on the unseen output classes. Mixed niches sit on the trade-off curve.
Uniform control — near-zero divergence
All 5 niches landed at 91.4-91.9% MNIST and 81.0-82.0% Fashion. Connection count varies within a ±20 range. They are functionally identical. Isolated training on identical data does not produce meaningful divergence.
Variance ratios
| metric | σ_varied / σ_uniform |
|---|---|
| MNIST accuracy | 199× |
| Fashion accuracy | 94× |
| edge_frac | 2.66× |
| avg connections | 1.69× |
| avg patches | 1.27× |
| row_std | 1.52× |
Functional speciation is the dominant signal: orders of magnitude beyond what isolated-population drift produces. Architectural divergence (connections, patches, geometry) is real but modest — consistent with prior findings that selection drives divergence, not mutation.
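The ratios above compare the across-niche spread of each metric between the two conditions. A minimal sketch of that calculation follows, using the varied-condition MNIST column from the G1 table. The uniform-condition values are hypothetical placeholders chosen inside the reported 91.4-91.9% range, because the per-niche uniform numbers are not listed on this page. Using the population standard deviation (rather than the sample one) is also an assumption.

```python
from statistics import pstdev

def spread_ratio(varied, uniform):
    """Ratio of across-niche standard deviations: sigma_varied / sigma_uniform."""
    return pstdev(varied) / pstdev(uniform)

# Varied-condition MNIST accuracies (%) taken from the G1 table.
varied_mnist = [93.07, 92.26, 91.37, 90.01, 0.00]

# HYPOTHETICAL uniform-condition values inside the reported 91.4-91.9% range.
# The real per-niche numbers are not given on this page.
uniform_mnist = [91.4, 91.5, 91.6, 91.8, 91.9]

ratio = spread_ratio(varied_mnist, uniform_mnist)
print(f"sigma_varied / sigma_uniform = {ratio:.0f}x")
```

Almost all of the varied-condition spread comes from the single 0.00% entry, which is why the functional ratios dwarf the architectural ones.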
G3: the neural ecosystem
Take the 5 G1A specialists. Build a joint test set (10K MNIST + 10K Fashion = 20K examples with labels in [0..20)). Evaluate routing strategies.
Single-specialist baselines
| specialist | joint | MNIST | Fashion |
|---|---|---|---|
| 100/0 | 46.74% | 93.49% | 0.00% |
| 75/25 | 85.76% | 92.94% | 78.59% |
| 50/50 | 85.76% | 89.98% | 81.55% |
| 25/75 | 85.73% | 88.68% | 82.79% |
| 0/100 | 41.74% | 0.00% | 83.48% |
Best single specialist on the joint task: 85.76% (mixed niches).
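Because the joint test set is balanced (10K + 10K), each joint figure in the table should equal the unweighted mean of the per-task accuracies. The sketch below checks that. The label mapping is an assumption: the text only says labels live in [0..20), so placing MNIST at 0-9 and Fashion at 10-19 is illustrative.

```python
def joint_label(task, label):
    """Map a per-task label (0-9) into the joint [0, 20) label space.

    ASSUMPTION: MNIST keeps 0-9 and Fashion is shifted to 10-19.
    """
    offsets = {"mnist": 0, "fashion": 10}
    return offsets[task] + label

def joint_accuracy(mnist_acc, fashion_acc, n_mnist=10_000, n_fashion=10_000):
    """Example-weighted accuracy (%) over the combined test set."""
    correct = mnist_acc * n_mnist + fashion_acc * n_fashion
    return correct / (n_mnist + n_fashion)

# (MNIST %, Fashion %) per specialist, from the baseline table.
specialists = {
    "100/0": (93.49, 0.00),
    "75/25": (92.94, 78.59),
    "50/50": (89.98, 81.55),
    "25/75": (88.68, 82.79),
    "0/100": (0.00, 83.48),
}

for name, (m, f) in specialists.items():
    print(f"{name}: joint = {joint_accuracy(m, f):.3f}%")
```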
Routing strategies
| strategy | joint | MNIST | Fashion |
|---|---|---|---|
| oracle (upper bound) | 93.55% | 96.85% | 90.24% |
| confidence (max softmax) | 76.41% | 90.21% | 62.60% |
| entropy (min entropy) | 76.68% | 90.41% | 62.96% |
| naive ensemble (avg softmax) | 88.42% | 93.48% | 83.36% |
| masked ensemble (class-aware) | 88.42% | 93.48% | 83.36% |
Three findings
- Naive ensemble: +2.66pp over the best single specialist. Collective beats individual. The neural-ecosystem framing pays off in practice.
- Oracle ceiling: +7.79pp above the best single specialist. A 5pp gap remains between naive ensemble and oracle; better routing is the unsolved problem.
- Confidence routing falls 9pp below the best single specialist. The pure-task 100/0 specialist was picked for 2973 of the 10000 Fashion images, confidently misclassifying them as digits. Pure-task specialists are overconfident on out-of-distribution inputs because they’ve never been told to abstain.
Why naive ensemble works despite confidence failing
Averaging softmaxes is robust against minority overconfidence. For a Fashion coat image shown to the ecosystem:
- 100/0 confidently says “digit 1: 0.92” — wrong, but loud
- 0/100 says “Coat: 0.83” — right, also loud
- Mixed specialists distribute across both halves
Averaged: the wrong-class “digit 1” gets (0.92+0+0+0+0)/5 = 0.184. The right-class “Coat” gets contributions from the four Fashion-trained specialists and ends around 0.27. Argmax picks Coat.
Pick-the-most-confident routing picks 100/0 and reports digit 1. Averaging implicitly weighs consensus, not the loudest single vote.
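The worked example can be reproduced with toy softmax vectors. In this sketch only the 0.92 and 0.83 peaks come from the text. The three mixed-specialist Coat probabilities (0.20, 0.17, 0.15) are hypothetical, chosen so the average lands at the quoted 0.27. The joint label layout (digits 0-9, Fashion 10-19, Coat = 14) is an assumption.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def make_dist(peak_class, peak_prob, support, n_classes=20):
    """Put peak_prob on peak_class and spread the remainder evenly over the
    other classes in support; classes outside support get probability 0."""
    others = [c for c in support if c != peak_class]
    dist = [0.0] * n_classes
    for c in others:
        dist[c] = (1.0 - peak_prob) / len(others)
    dist[peak_class] = peak_prob
    return dist

def confidence_route(outputs):
    """Trust the specialist with the single highest softmax peak."""
    loudest = max(outputs, key=max)
    return argmax(loudest)

def naive_ensemble(outputs):
    """Average the softmax vectors elementwise, then take the argmax."""
    avg = [sum(col) / len(outputs) for col in zip(*outputs)]
    return argmax(avg), avg

DIGIT_1, COAT = 1, 14  # ASSUMPTION: Fashion class k sits at joint label 10 + k
digits, fashion = range(0, 10), range(10, 20)
all_but_digit_1 = [c for c in range(20) if c != DIGIT_1]

outputs = [
    make_dist(DIGIT_1, 0.92, digits),       # 100/0: wrong, but loud
    make_dist(COAT, 0.20, all_but_digit_1),  # 75/25 (hypothetical value)
    make_dist(COAT, 0.17, all_but_digit_1),  # 50/50 (hypothetical value)
    make_dist(COAT, 0.15, all_but_digit_1),  # 25/75 (hypothetical value)
    make_dist(COAT, 0.83, fashion),          # 0/100: right, also loud
]

conf_class = confidence_route(outputs)
avg_class, avg = naive_ensemble(outputs)
print("confidence routing picks class", conf_class)
print("naive ensemble picks class", avg_class)
```

Confidence routing returns the digit class because one vector holds the highest peak. Averaging returns Coat because four vectors put mass on it.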
G4: ecological routing with dead-time adaptation
At the user’s direction, the stream moved from G1/G3 to a richer “ecosystem under regime shift” question: what happens when novel data arrives that no specialist was trained for? The mechanism design: per-population liveness state with exponential-backoff death on failure, dead-time training on a failure buffer, and automatic spawning of new species when the ensemble fails for a sustained window.
G4 pre-trains two populations on MNIST and Fashion, then runs a 3-phase online stream:
- Phase A (30K): MNIST/Fashion 50/50 — steady state
- Phase B (30K): KMNIST introduced, 1/3 of the stream per task
- Phase C (200K): KMNIST-heavy 60% K + 20/20 M/F
Result: no new species spawned, but both pre-trained populations adapted via failure-buffer training during their dead-time intervals. By Phase C end:
| species | MNIST acc | Fashion acc | KMNIST acc |
|---|---|---|---|
| mnist (pretrained) | 83.4% | 71.8% | 71.7% |
| fashion (pretrained) | 79.2% | 72.6% | 72.5% |
Both became generalists. The anteater learned to eat capuchin food. Rolling ensemble accuracy 78-82%. This is adaptation, not speciation — the framework provides online continual learning via implicit replay (failure buffer + dead-time training), but doesn’t demonstrate new-species emergence because at least one species was usually correct.
G4b v2: frozen specialists force speciation
To isolate the speciation question, G4b freezes the pre-trained populations — no online weight updates allowed. The only way to handle KMNIST is for a new species to emerge from the ecosystem’s failure buffer.
Spawn fired at step 20,103 — only 103 steps into Phase B (KMNIST introduction). The rolling-100-example ensemble accuracy dropped below 55% for 30 consecutive steps as both frozen specialists failed every KMNIST example. species2 (parent=mnist) was spawned with mnist’s genome population, inheriting the failure buffer, trained silently for 2000 steps, then joined ensemble voting.
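The spawn criterion described above (rolling 100-example accuracy under 55% for 30 consecutive steps) can be sketched as a small state machine. The class and method names are hypothetical. Waiting for a full window before evaluating, and resetting the streak after a spawn, are assumptions rather than details confirmed by the source.

```python
from collections import deque

class SpawnTrigger:
    """Fire when rolling ensemble accuracy stays below a threshold."""

    def __init__(self, window=100, threshold=0.55, patience=30):
        self.results = deque(maxlen=window)  # 1 = ensemble correct, 0 = wrong
        self.threshold = threshold
        self.patience = patience
        self.low_streak = 0

    def update(self, ensemble_correct):
        """Record one example; return True if a spawn should fire now."""
        self.results.append(1 if ensemble_correct else 0)
        if len(self.results) < self.results.maxlen:
            return False  # ASSUMPTION: no decision until the window is full
        rolling_acc = sum(self.results) / len(self.results)
        if rolling_acc < self.threshold:
            self.low_streak += 1
        else:
            self.low_streak = 0
        if self.low_streak >= self.patience:
            self.low_streak = 0  # ASSUMPTION: start counting afresh after a spawn
            return True
        return False

# Toy stream: a healthy stretch, then every example fails.
trigger = SpawnTrigger()
for _ in range(100):
    trigger.update(True)
fired_at = None
for k in range(1, 201):
    if trigger.update(False):
        fired_at = k
        break
print("spawn fired after", fired_at, "consecutive failures")
```

In this toy stream the window first drops under 55% at the 46th failure, and the 30th consecutive low reading arrives at the 75th. The real stream mixes successes and failures, so its timing will differ.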
End of Phase C:
| species | MNIST | Fashion | KMNIST | conn |
|---|---|---|---|---|
| mnist (frozen) | 92.4% | 0.0% | 0.0% | 1999 |
| fashion (frozen) | 0.0% | 83.6% | 0.0% | 1953 |
| species2 (new) | 65.0% | 57.2% | 77.2% | 2599 |
But ensemble Phase C KMNIST accuracy was only 66% — 11pp below species2’s individual KMNIST competence. The G3 confidence-wrong-vote problem in temporal form: frozen specialists’ overconfident wrong votes on MNIST/Fashion classes dilute species2’s correct KMNIST predictions in the averaging.
G5 v2: knowledge-aware self-abstention + multi-speciation
G5 adds per-species “class diet” tracking. Each frozen species suppresses its softmax outputs on classes outside its training diet (×0.1, then renormalize). New species use raw softmax (their diet is still being built).
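The suppression step is simple enough to show directly. This is a minimal sketch under the stated rule (scale out-of-diet probabilities by 0.1, then renormalize). The function name and the four-class toy vector are illustrative, not taken from the framework.

```python
def diet_suppress(probs, diet, factor=0.1):
    """Scale probabilities of classes outside `diet` by `factor`, then
    renormalize so the vector sums to 1 again."""
    scaled = [p if c in diet else p * factor for c, p in enumerate(probs)]
    total = sum(scaled)
    return [p / total for p in scaled]

# Toy 4-class example: the species was only ever trained on classes 0 and 1,
# yet puts most of its mass on class 2 (an overconfident out-of-diet vote).
raw = [0.1, 0.1, 0.7, 0.1]
suppressed = diet_suppress(raw, diet={0, 1})
print([round(p, 3) for p in suppressed])
```

The out-of-diet vote is damped rather than zeroed, so a frozen species still contributes a little evidence on unfamiliar classes while in-diet votes regain most of the weight.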
Two new species spawned across the run in response to graded environmental pressure:
- species2 at step 20,087 — Phase B KMNIST introduction
- species3 at step 56,906 — Phase B → C transition (KMNIST becomes dominant)
| condition | Phase A rolling | Phase C KMNIST | n_species spawned |
|---|---|---|---|
| G4 (adaptive) | 80% | 72% | 0 (both pops became generalists) |
| G4b v2 (frozen + spawn) | 70% | 66% | 1 |
| G5 v2 (frozen + diet + spawn) | 88% | 79.5% | 2 |
Final per-species accuracies (run completed at step 200K):
| species | parent | M acc | F acc | K acc |
|---|---|---|---|---|
| mnist (frozen) | — | 91.7% | 0% | 0% |
| fashion (frozen) | — | 0% | 83.7% | 0% |
| species2 | mnist | 62.4% | 57.2% | 72.8% |
| species3 | mnist | 62.7% | 56.2% | 70.9% |
Phase C ensemble: 88% rolling, M=89%, F=82%, K=79.5% — a +13.5pp improvement on KMNIST over G4b v2’s 66% plateau. The ecosystem is now compositional: multiple species can co-emerge in response to distinct stress events, sharing ancestral lineage (both spawned from MNIST parent) but training under different selection pressures and converging to similar KMNIST specializations through parallel evolution.
G4c: single-niche replay baseline (the fair comparison)
Does ecosystem partitioning actually buy anything over a monolithic network with the same failure-buffer replay? G4c runs a single niche of 60 individuals (matching G5 v2’s 2×30 specialists in total compute), pre-trained on 50/50 MNIST+Fashion, with the same 1000-example failure-buffer replay on the same 3-phase stream.
| condition | Phase C rolling | Phase C K |
|---|---|---|
| G4 (2 species, adapt) | 79% | 72% |
| G4b v2 (frozen + spawn) | 74% | 66% |
| G4c (single niche + replay) | 75% | 75% |
| G5 v2 (frozen + diet + multi-spawn) | 88% | 79.5% |
Single-niche replay matches the simpler ecosystem variants (G4 and G4b) at the same compute. Only G5 v2’s full mechanism (frozen specialists + diet-aware suppression + multi-spawn) beats the baseline by a meaningful margin (+13pp rolling, +4.5pp K). The ecosystem framework earns its keep with the full design; simpler partitionings are roughly equivalent to a single niche with replay.
G6: hybrid adapt + speciate (counter-intuitive result)
What if pre-trained species are allowed to adapt AND new species can spawn? The hybrid should get the best of both worlds. Instead, it underperforms G5 v2:
| condition | Phase C rolling | Phase C K | mnist specialty | fashion specialty |
|---|---|---|---|---|
| G5 v2 (frozen + spawn) | 88% | 79.5% | 91.7% (kept) | 83.7% (kept) |
| G6 (adapt + spawn) | 82% | 80% | 78% (lost) | 68% (lost) |
Letting pre-trained species adapt erodes their specialization. The mnist species’ MNIST accuracy dropped from 92% to 78% as it absorbed Fashion and KMNIST training. Net effect: −6pp rolling for a +0.5pp KMNIST gain. Specialization is precious; preserve it. Only one new species spawned in G6 (vs G5 v2’s two) because adapting species don’t fail sustainedly.
G7: cross-niche transfer is NEGATIVE
Can an evolved MNIST specialist’s architecture transfer as a useful prior to KMNIST? Group B established the two tasks have opposite locality preferences (MNIST → spatial, KMNIST → distributed). G7 tests transfer directly:
| step | warm-start (clone MNIST specialist) | fresh-init | delta |
|---|---|---|---|
| 10K | 0.720 | 0.739 | −0.020 |
| 50K | 0.801 | 0.817 | −0.016 |
| 100K | 0.822 | 0.833 | −0.011 |
Warm-start trails fresh-init by 1-2pp throughout. The MNIST specialist’s spatial-bias patches are actively wrong for KMNIST, and evolution has to fight uphill to undo the inductive bias. Architectural specialization is task-conditional, and a wrong specialization actively interferes with learning a new task. This is why G5 v2’s “spawn fresh new species” approach works better than G6’s “let existing species adapt” — the parent’s inductive bias is net-harmful for a sufficiently different new task.
G8: longer sequences with EMNIST — one species per novel task
Does the ecosystem keep speciating as more novel tasks arrive? Extended G5 v2’s 3-phase stream to 5 phases, adding EMNIST (filtered to labels 0-9) after KMNIST. Same mechanics.
Two spawn events fired, one per novel-task introduction:
- species2 at step 20,103 (Phase A→B, KMNIST appears)
- species3 at step 130,447 (Phase C→D, EMNIST appears)
Final per-species lifetime accuracies:
| species | parent | M | F | K | E |
|---|---|---|---|---|---|
| mnist (frozen) | — | 91.7% | 0% | 0% | 0% |
| fashion (frozen) | — | 0% | 83.9% | 0% | 0% |
| species2 (KMNIST intro) | mnist | 64% | 57% | 74.9% | 85% |
| species3 (EMNIST intro) | mnist | 63% | 57% | 61% | 85.5% |
Phase E ensemble: 87% rolling, M=85% F=83% K=73% E=92%.
Each new species specialized in the task that was novel when it spawned. species2 (Phase B) is a KMNIST specialist; species3 (Phase D) is an EMNIST specialist. The mechanism is self-regulating: spawn events fire only at novel-task introductions, so the ecosystem doesn’t accumulate species without bound.
This is the most direct experimental confirmation of the biological pattern: pre-existing species preserve their specializations for the entire run, novel tasks trigger fresh speciation events, and new species specialize in the task that triggered their emergence.
What Group G establishes (full battery)
After G1, G3, G4, G4b, G5, G4c, G6, G7, G8 — the user’s “neural ecosystem” hypothesis is now thoroughly characterized:
- G1: speciation is real (199× variance ratio vs same-data control).
- G3: static ecosystem beats single networks via naive averaging (+2.66pp; oracle ceiling +7.79pp).
- G4: existing species can adapt to novel tasks via dead-time training (no new species needed if existing ones can generalize).
- G4b: frozen species + spawn mechanism produces speciation; but ensemble averaging dilutes new species’ votes.
- G5 v2: diet-aware self-abstention + multi-spawn beats all variants and the single-niche baseline. Best design.
- G4c: single niche + replay matches G4/G4b but G5 v2 wins by +13pp — the framework earns its keep with the full mechanism.
- G6: adapt + speciate is worse than frozen + speciate by 6pp — preserving specialists matters more than letting them generalize.
- G7: cross-task warm-start transfers negatively (−1 to −2pp across 100K steps); evolved geometry is task-specific.
- G8: one new species per novel task introduction; ecosystem grows compositionally and self-regulates.
The design principle: preserve specialists, spawn-on-demand for novel tasks, knowledge-aware suppression on out-of-diet votes. This is the working “neural ecosystem” recipe.
Still open
- Multi-source warm-start: clone from multiple specialists for a more general prior (G7 follow-up).
- Lateral gene transfer: migration/crossover across species. Currently species don’t interact.
- Routing-time efficiency: each example currently requires N forward passes (one per alive species). A learned router could pick a subset.
G9 trichotomy: rederiving classical ecology from reward-and-training rules
After the G4–G8 sequence established speciation-under-regime-shift, the user asked a deeper question: what if the environment isn’t a phased sequence of novel tasks but a stationary heterogeneous mix — say 60% MNIST, 25% Fashion, 10% KMNIST, 5% EMNIST — and species have to survive on energy gained from solving puzzles against attempt costs and metabolic costs? Does carrying capacity emerge? Do niches partition by frequency? Do generalists invade or specialists dominate?
G9 implements ecological economics on top of the speciation framework:
- Attempt cost (paid every attempt) and metabolic cost (∝ connection count, paid per step regardless).
- Rarity-weighted reward (1 / task_frequency), so the 5% EMNIST niche pays 20 energy per solve while the 60% MNIST niche pays 1.67.
- Permanent death below an energy threshold.
- Niche-underservice spawning: a new species spawns when per-task ensemble accuracy collapses on some task, using a D+C hybrid parent (50% clone-richest / 50% fresh-init).
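The energy rules in this list can be sketched as per-step bookkeeping. Only the 1 / task_frequency reward comes from the text. The cost constants, the starting energy, the death threshold, and the `Species` class itself are hypothetical values and names chosen for illustration.

```python
from dataclasses import dataclass

ATTEMPT_COST = 0.5      # HYPOTHETICAL: energy paid on every attempt
METABOLIC_RATE = 1e-4   # HYPOTHETICAL: energy per connection per step
DEATH_THRESHOLD = 0.0   # HYPOTHETICAL: death when energy falls below this

def rarity_reward(task_frequency):
    """Reward per solve, inversely proportional to how common the task is."""
    return 1.0 / task_frequency

@dataclass
class Species:
    name: str
    connections: int
    energy: float = 100.0
    alive: bool = True

    def step(self, attempted, reward=0.0):
        """One step of bookkeeping: metabolism always, attempt cost and
        any reward only when the species attempted the example."""
        if not self.alive:
            return
        self.energy -= METABOLIC_RATE * self.connections
        if attempted:
            self.energy += reward - ATTEMPT_COST
        if self.energy < DEATH_THRESHOLD:
            self.alive = False  # permanent: dead species never step again

print("EMNIST (5%) reward per solve:", rarity_reward(0.05))
print("MNIST (60%) reward per solve:", round(rarity_reward(0.60), 2))

starving = Species("probe", connections=2000, energy=0.1)
starving.step(attempted=True, reward=0.0)  # a failed attempt on low reserves
print("probe alive:", starving.alive)
```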
The three variants differ in one rule each:
| variant | training rule | reward rule |
|---|---|---|
| G9 baseline | full failure-buffer | split-the-kill (correct attempters share reward) |
| G9b | niche-bound (target task only) | split-the-kill |
| G9d | full failure-buffer | winner-take-all (most-confident correct gets all) |
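The two reward rules in the table differ only in how one example's reward is divided among attempters. The sketch below makes that difference concrete. The function names, the dictionary layout, and the equal split among correct attempters are assumptions; the text says only that correct attempters share the reward and that under WTA the most-confident correct attempter takes all of it.

```python
def split_the_kill(attempts, reward):
    """attempts maps species name -> (is_correct, confidence).
    Every correct attempter gets an equal share (ASSUMPTION: equal split)."""
    winners = [name for name, (ok, _) in attempts.items() if ok]
    if not winners:
        return {}
    return {name: reward / len(winners) for name in winners}

def winner_take_all(attempts, reward):
    """Only the most confident correct attempter is paid."""
    correct = {name: conf for name, (ok, conf) in attempts.items() if ok}
    if not correct:
        return {}
    top = max(correct, key=correct.get)
    return {top: reward}

# One MNIST example (reward 1 / 0.60). Confidences are hypothetical.
attempts = {
    "mnist": (True, 0.70),     # correct, calibrated
    "species2": (True, 0.99),  # correct, very peaked
    "fashion": (False, 0.95),  # wrong, so never paid under either rule
}
reward = 1 / 0.60
shared = split_the_kill(attempts, reward)
wta = winner_take_all(attempts, reward)
print("split-the-kill:", shared)
print("winner-take-all:", wta)
```

Under split-the-kill the calibrated specialist still earns half the reward. Under winner-take-all it earns nothing on any example where a peakier species is also correct.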
Headline results
| variant | dynamic | survivors | extinctions | M | F | K | E | ensemble |
|---|---|---|---|---|---|---|---|---|
| G9 baseline | generalist invasion | 3 | MNIST | 80% | 84% | 65% | 75% | 82% |
| G9b | carrying capacity | 4 (all specialists) | 0 | 92% | 84% | 71% | 86% | 87% |
| G9d | runaway confidence | 3 | MNIST + species3 | 87% | 81% | 65% | 72% | 84% |
G9b: Galapagos finch isolation
Niche-bound training prevents diet expansion. Each spawned species trains exclusively on its target niche’s failures, so its diet never broadens, and it never attempts examples outside its niche. Result: four specialists, one per niche, zero competition, zero extinctions. Per-niche accuracy is at least as high as in either other variant (Fashion ties the G9 baseline at 84%). This is the textbook carrying-capacity result. Each specialist’s reward stream sustains it without overlap; the Lotka-Volterra math works out:
| species | per-task attempts | per-task acc | final energy |
|---|---|---|---|
| MNIST (frozen) | M:240K only | 92% | +143K |
| Fashion (frozen) | F:100K only | 84% | +179K |
| species2 (K specialist) | K:39K only | 71% | +151K |
| species3 (E specialist) | E:19K only | 86% | +221K |
G9d: Fisher’s runaway display selection
The user predicted this one before the run: “I’ll be interested to see if you rederive the biological basis for arrogance.” Under winner-take-all, reward goes only to the species with the highest softmax peak on the truth class. Calibrated uncertainty loses to loud overconfidence.
The MNIST specialist had 91.6% accuracy, higher than any survivor, but went extinct at step 154,119 anyway. Two surviving generalists with concentrated full-buffer training had peakier softmax outputs (from cross-entropy gradient descent saturating on a small training set), and under WTA, peakiness beat accuracy. species2: 77% accurate, +258K energy. MNIST: 92% accurate, dead. species3’s fast extinction at step 10,850 is consistent with the founder-advantage corollary: the first species to develop loud signaling locks the niche; later species starve before they evolve competitive peaks.
This is the dynamic that produces peacock tails, mating displays, and status hierarchies in biology — sexually selected display traits that win competitions regardless of underlying fitness. In our system the “display” is softmax peak height; the “mate choice” is the WTA reward gate; the runaway is gradient descent + selection compounding the peakedness across generations. Honest accuracy loses to confident display.
What G9 establishes
Three biologically-distinct evolutionary regimes from the same neuroevolution substrate, distinguished only by reward-and-training rules:
- G9b ≈ Galapagos finches. Niche isolation produces species specialized to their food source. No inter-specialist competition; selection acts on raw fitness in each niche separately.
- G9d ≈ peacocks. Same-niche competition + display-based reward selects for loud display traits on top of accuracy. Arrogance pays.
- G9 ≈ raccoon ecology. Laissez-faire mixed niches let generalists invade and out-compete specialists on each individual niche through superior attack surface.
We’ve now rederived three classical ecological mechanisms (allopatric speciation in G5/G8, niche partitioning in G9b, sexual selection in G9d) from a single neuroevolution framework. Each requires only a different rule for who eats what and how they train.
The “neural ecosystem” framing started as expressive language for a system that happens to use neural networks. By the end of the battery, it had earned a different status: it made specific experimental predictions that the data confirmed, and the predictions matched the metaphor’s mechanism rather than its surface imagery.
G6 (hybrid adapt + speciate). Biology predicted it would fail. Anteaters don’t gradually become omnivores when ants get scarce. Letting the pre-trained MNIST species “learn fruit” via the shared failure buffer should erode its anteater-ness without producing a meaningfully better generalist. Result: G6 dropped MNIST accuracy from 91.7% (G5 v2 frozen) to 78.2% (G6 adapted), and Phase C rolling accuracy fell from 88% to 82% in exchange for a +0.5pp KMNIST gain. Specialization is precious; mixing pressures within a lineage degrades it.
G7 (cross-task transfer) — biology predicted it would be negative. A desert-adapted lizard in a swamp has the wrong adaptations. Group B established MNIST and KMNIST have opposite locality preferences, so the MNIST specialist’s spatial-bias patches should interfere with KMNIST learning. Result: warm-start trailed fresh-init by 1-2pp throughout 100K steps. Evolution had to undo the wrong inductive bias before finding the right one.
G8 (one species per novel task) — biology predicted the pattern exactly. This is allopatric speciation in textbook form: new environment, sustained selection pressure, reproductive isolation, distinct specialization. The G8 result: two spawn events, exactly one per novel-task introduction, each new species specialized in the task that triggered it. The system doesn’t accumulate species without bound — the spawn trigger fires only at regime shifts.
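Parallel evolution in G5 v2. species2 and species3 spawned from the same parent and converged to similar KMNIST specializations. The outcome resembles the anteater/pangolin pattern (similar selection pressure, similar specialization), with one difference: anteaters and pangolins are unrelated lineages, which makes theirs convergent evolution, whereas species2 and species3 share a parent, which is what makes this parallel evolution.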
The mechanism is small
The working ecosystem recipe is three rules:
- Preserve specialists. Don’t let pre-trained species adapt to new tasks; they get worse at their original task without becoming better at new ones.
- Suppress out-of-diet votes. Each species only votes within the classes it trained on. Knowledge-aware self-abstention prevents overconfident wrong votes from diluting correct ones.
- Spawn on demand. When the ensemble’s rolling accuracy collapses for a sustained window, clone a parent and train it on the accumulated failure buffer.
All the interesting behavior — one species per novel task, parallel evolution, self-regulating species count, the +13pp improvement over single-niche replay — emerges from those three rules interacting with environmental pressure. Biology isn’t complicated, it’s compositional — simple rules running for a long time on lots of substrate. The framing isn’t decorative; it’s a working theory.
(Full discussion in notes/group_g/biology_notes.md.)
G9 hexalogy: six evolutionary regimes from one framework
Extending the G9/G9b/G9d trichotomy with three one-variable variants asks targeted questions:
- G9bd (G9b + WTA reward): do niche-binding and WTA compose?
- G9e (G9d + per-attempt calibration penalty): can the runaway be bounded by costly confidence?
- G9f (G9b + environment shift mid-run): are specialists adaptively sustained or historically sustained?
Results:
| variant | one-variable change | regime | survivors | extinctions |
|---|---|---|---|---|
| G9 | baseline | generalist invasion | 3 | MNIST |
| G9b | niche-bound training | carrying capacity | 4 | 0 |
| G9d | WTA reward | Fisher’s runaway | 3 | MNIST + species3 |
| G9bd | niche-bound + WTA | niche-binding dominates | 4 | 0 |
| G9e | WTA + per-attempt calibration penalty | mass extinction monoculture | 1 | 8 |
| G9f | niche-bound + environment flip | frequency-invariant sustainability | 4 | 0 |
G9bd: niche-binding dominates over WTA
Nearly identical to G9b’s result: 4 alive specialists, zero extinctions, M=92% / F=84% / K=70% / E=86% (G9b reached K=71%). The WTA reward distribution is silenced because niche-binding pre-empts the layer where WTA would act: multiple species never attempt the same example, so the “who wins the reward” question never arises. The mechanisms compose by one dominating the other, not by both contributing.
G9e: per-attempt calibration penalty produces monoculture
8 of 9 species went extinct, with the only survivor (species4) at +331K energy. The MNIST specialist died at step 1,854 with 92.7% accuracy. Even Fashion, with its narrow diet and pre-trained calibrated peaks, was driven extinct at step 389,858 by slow erosion of its income.
Why the prediction failed: the cost is symmetric across attempters (winners and losers both pay), but income is winner-take-all (only the winner is rewarded). Net for the winning species: +reward − cost > 0. Net for every other attempter, whether wrong or correct but less peaked: 0 − cost < 0. The penalty burden is uniform; the income asymmetry compounds dominance instead of bounding it.
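The asymmetry can be written as an expected value per attempt: a species that wins the confidence contest with probability p earns p × reward − cost, so it breaks even only when p reaches cost / reward. The sketch below uses the MNIST niche reward (1 / 0.60) from the rarity rule; the cost value and both win rates are hypothetical.

```python
def expected_net_per_attempt(p_win, reward, cost):
    """Expected energy change per attempt when only the winner is paid
    but every attempter pays the same cost."""
    return p_win * reward - cost

def break_even_win_rate(reward, cost):
    """Win probability at which income exactly covers the per-attempt cost."""
    return cost / reward

reward = 1 / 0.60  # MNIST niche reward under the rarity rule
cost = 0.5         # HYPOTHETICAL: attempt cost plus calibration penalty

# HYPOTHETICAL win rates: a peaky species that usually wins the contest,
# and an accurate but calibrated species that is usually out-shouted.
peaky = expected_net_per_attempt(0.80, reward, cost)
calibrated = expected_net_per_attempt(0.10, reward, cost)

print("break-even win rate:", round(break_even_win_rate(reward, cost), 2))
print("peaky species net per attempt:", round(peaky, 3))
print("calibrated species net per attempt:", round(calibrated, 3))
```

Accuracy does not appear in the formula at all. Only the win rate does, which is how a 92%-accurate specialist can still run a deficit.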
Why real peacock tails do bound the runaway: real tails are costly to maintain (per-step survival cost), not costly to display (per-attempt cost). The tail imposes constant metabolic burden whether or not it’s currently displaying. Our calibration penalty modeled the wrong cost type. G9g follow-up: per-step metabolic cost proportional to softmax peakedness. Predict: produces the equilibrium I originally expected here.
G9f: rarity-weighted rewards are frequency-invariant by construction
Environment shifts 60/25/10/5 → 5/10/25/60 at step 200K. Prediction: MNIST starves when its abundant niche becomes rare. Reality: MNIST has the highest final energy of any species (+200K). Zero extinctions.
The math, doing it carefully: for a specialist with accuracy A on a niche of frequency f, income per step is f × A × (reward_per_solve) = f × A × (K/f) = A × K. The frequency cancels. Income is independent of environment composition.
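The cancellation is easy to confirm numerically. The sketch below evaluates the gross income formula before and after the G9f flip, using the G9b per-task accuracies from the headline table. It assumes K = 1 (reward = 1 / task_frequency) and covers gross income only; attempt and metabolic costs are left out.

```python
def income_per_step(freq, accuracy, k=1.0):
    """Expected gross income per step for a niche-bound specialist:
    it meets its task with probability freq, solves it with probability
    accuracy, and is paid k / freq per solve."""
    reward_per_solve = k / freq
    return freq * accuracy * reward_per_solve

before = {"M": 0.60, "F": 0.25, "K": 0.10, "E": 0.05}
after = {"M": 0.05, "F": 0.10, "K": 0.25, "E": 0.60}
accuracy = {"M": 0.92, "F": 0.84, "K": 0.71, "E": 0.86}  # G9b per-task accuracy

for task in before:
    a = income_per_step(before[task], accuracy[task])
    b = income_per_step(after[task], accuracy[task])
    print(f"{task}: before={a:.3f}  after={b:.3f}")
```

Each specialist's gross income equals its accuracy in both environments, which matches the zero-extinction outcome.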
Real biology: obligate specialists like pandas thrive on bamboo whether bamboo is rare or abundant — what matters is whether bamboo exists, not how much. The framework reproduces this exactly because rarity-weighting encodes “rare = valuable” by construction. Specialists are frequency-invariant.
To force extinction via environment in this framework, we’d need a niche to go to zero frequency (complete food loss), not just become rare. G9i territory.
Prediction scorecard
| variant | prediction | result | score |
|---|---|---|---|
| G9 | carrying capacity | generalist invasion | wrong |
| G9b | clean carrying capacity | matched | right |
| G9d | arrogance runaway | matched (user predicted) | right |
| G9bd | “cleanest yet” | niche-binding dominates, WTA silent | partial |
| G9e | bounded equilibrium | mass extinction monoculture | wrong |
| G9f | MNIST extinction | frequency-invariant, no extinctions | wrong |
Two right, one partial, three wrong. The wrong predictions are more informative than the right ones. Each surfaced a non-obvious dynamic of the framework that careful prior math would have revealed:
- G9: diet expansion via training defeats specialization
- G9e: per-attempt cost ≠ per-existence cost; only the latter bounds runaways
- G9f: rarity-weighting × frequency = constant by construction
Six classical biology mechanisms from one framework
| variant | classical analog |
|---|---|
| G9 (laissez-faire) | invasive species, r-strategist generalists |
| G9b (niche-bound) | allopatric speciation, Galapagos finches |
| G9d (WTA) | Fisher’s runaway, peacock sexual selection |
| G9bd (niche-bound + WTA) | reproductive isolation pre-empting mate choice |
| G9e (WTA + per-attempt calibration) | competitive exclusion principle (Gause), monoculture |
| G9f (niche-bound + env shift) | obligate specialist robustness (panda/bamboo) |
Six textbook ecological mechanisms recovered from one ~1100-line Rust framework by varying which rule applies to whom. The metaphor isn’t decorative — it’s a working theory. Each rule change produces the biological outcome the metaphor implies, and when the metaphor implies something subtle (G9f’s frequency invariance, G9e’s monoculture collapse), the framework actually produces it.
Compute and methodology
Full Group G battery: ~2.5 hours wall time on a 16-thread i9-9900K. G1 + G3 + G4 + G4b v2 + G5 v2 + G4c + G6 + G7 + F4 (Group F follow-up) + G8 covered the speciation question end-to-end. The load-bearing work was experimental design — particularly the same-data control (G1), the spawn-trigger criterion (G4b/G5), and the fair single-niche baseline (G4c).