
An empirical investigation · 2026

The “random” seed is not random in the way the name suggests.

Every diffusion image generator takes a seed — a number tutorials call “just for reproducibility.” We measured what it actually does. In some architectures the seed deterministically controls roughly half of where things go in the picture: the framing, the layout, the vertical placement of the subject. In others that grip is almost gone, and the prompt takes over. The role is not fixed across models — it is a graded property of the architecture and how the model was trained.

The problem is not that a seed exists. Every generative process needs randomness. The problem is that this number is one of the heaviest weights on the output — co-equal with the prompt you actually wrote — and at the same time completely opaque to the author: labelled “random,” surfaced as a throwaway integer, with no readout of what it is deciding. A dominant input you cannot see or reason about is not a tool; it is an uncredited co-author. The aim here is to make its weight visible and controllable — not to remove it.

SDXL U-Net · DDPM NoobAI XL 1.1 U-Net family
about 50%

point 54.4% · 95% CI [42.8 – 66.0]

The seed does about half the work of choosing where the subject lands.

MMDiT · rect-flow SD3.5-Large intermediate
about 30%

point 30.2% · 95% CI [16.7 – 50.5]

Same rectified-flow MMDiT as Flux, but un-distilled (real CFG) and smaller — keeps part of the seed’s grip.

MMDiT · rect-flow Flux.1-dev MMDiT family
about 5%

point 5.0% · 95% CI [1.0 – 16.0]

The compositional role of the seed nearly vanishes; the prompt takes over.

Figures are the seed’s share of variance in centroid_y — the vertical placement of the Otsu-thresholded foreground mass — the cleanest single composition axis. Bootstrap means with 95% block-bootstrap CIs (B = 5000, blocked on prompts; 64 seeds × 10 prompts per model). Source: MATRIX_RESULTS.md §1.
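The centroid_y feature can be sketched in a few lines. This is a minimal illustration, not the project's extraction code: the function names `otsu_threshold` and `centroid_y`, the choice to treat the brighter side of the threshold as foreground, and the normalisation to [0, 1] (0 = top row, 1 = bottom row) are all assumptions made for the example.

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Otsu threshold for a grayscale image with values in [0, 1]."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    w_lo = np.cumsum(hist).astype(float)      # pixel count in bins 0..k
    w_hi = w_lo[-1] - w_lo                    # pixel count in bins k+1..end
    mass = np.cumsum(hist * centers)
    valid = (w_lo > 0) & (w_hi > 0)
    mean_lo = mass / np.maximum(w_lo, 1.0)
    mean_hi = (mass[-1] - mass) / np.maximum(w_hi, 1.0)
    # Otsu picks the split that maximises the between-class variance.
    between = np.where(valid, w_lo * w_hi * (mean_lo - mean_hi) ** 2, -1.0)
    return float(edges[np.argmax(between) + 1])

def centroid_y(gray):
    """Normalised vertical centre of the foreground mass (assumed: bright = foreground)."""
    gray = np.asarray(gray, dtype=float)
    mask = gray >= otsu_threshold(gray)
    rows = np.nonzero(mask)[0]
    if rows.size == 0:                        # no foreground found: report the middle
        return 0.5
    return float(rows.mean() / (gray.shape[0] - 1))

# A bright block in the lower part of a dark frame sits below the midline.
frame = np.zeros((100, 100))
frame[70:90, 20:60] = 1.0
print(centroid_y(frame))
```

One such number per generated image fills the 64 × 10 seed-by-prompt grid that the variance figures are computed from.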


Five cells, one matched protocol

The architecture matrix

Seed contribution to centroid_y composition variance across five diffusion architectures, with 95% block-bootstrap confidence intervals.
| Model | Family | Params | Training objective | Res. | CFG | centroid_y seed % (95% CI) | Regime vs Flux |
|---|---|---|---|---|---|---|---|
| NoobAI | SDXL U-Net | 2.6 B | DDPM | 1024 | 4.0 | 54.4 [42.8–66.0] | inverted (strong) |
| Animagine | SDXL U-Net | 2.6 B | DDPM | 1024 | 5.0 | 48.1 [36.1–60.1] | inverted (strong) |
| SD1.5 | SD U-Net | 0.9 B | DDPM | 512 | 7.0 | 35.9 [24.6–50.8] | inverted (weaker) |
| SD3.5 | MMDiT | 8 B | rectified flow | 1024 | 4.0 | 30.2 [16.7–50.5] | intermediate (graded) |
| Flux | MMDiT | 12 B | rectified flow | 1024 | 4.0 | 5.0 [1.0–16.0] | baseline (reference) |
| PixArt-Sigma | DiT + cross-attn | 0.6 B | DDPM | 1024 | 4.5 | pending | in flight |
Each row reports the share of vertical composition variance (centroid_y, the cleanest single axis) attributable to the random seed rather than the prompt. The U-Net family is the seed-dominant constant: NoobAI, Animagine and SD1.5 hold the seed at 36–54% regardless of fine-tune, scale or vintage. The un-distilled rectified-flow MMDiT cell (SD3.5) sits in the middle at 30%, and the guidance-distilled Flux cell collapses it to 5%. The point estimates descend monotonically: 54 → 48 → 36 → 30 → 5. The step from the U-Net family to SD3.5 changes backbone (U-Net → MMDiT) and training objective (DDPM → rectified flow) together; the step from SD3.5 to Flux adds guidance distillation and scale. Each step removes roughly half of the inversion, so this is a graded inversion, not a binary flip. The solid bar marks the bootstrap mean and the lighter band its 95% interval; all five measured cells use the matched protocol of 64 seeds × 10 prompts (640 cells per architecture) with a B = 5000 block bootstrap resampling on prompts. PixArt-Sigma (a non-MMDiT transformer with cross-attention) is in flight; the dashed marker shows its pre-registered 25–40% band.

S03 · Variance decomposition

Where the vertical layout comes from

Each model's centroid_y variance splits three ways: the share fixed by the seed, the share fixed by the prompt, and the seed×prompt interaction. Reading top to bottom — NoobAI, Animagine, SD1.5, SD3.5, Flux — the orange seed band shrinks from roughly half to almost nothing while the cyan prompt band grows to dominate. That handoff is the finding.

Figure: variance decomposition of centroid_y by model. Horizontal stacked bars, one per model, each split into seed, prompt and interaction shares of centroid_y variance (%). From NoobAI to Flux the seed fraction falls from 0.498 to 0.026 while the prompt fraction rises from 0.232 to 0.620.

| Model | Seed % | Prompt % | Interaction % |
|---|---|---|---|
| NoobAI | 49.8 | 23.2 | 27.0 |
| Animagine | 44.0 | 30.0 | 26.0 |
| SD1.5 | 30.4 | 36.0 | 33.6 |
| SD3.5 | 24.1 | 46.1 | 29.8 |
| Flux | 2.6 | 62.0 | 35.4 |
Per-feature variance decomposition of centroid_y from variance.json (64 seeds × 10 prompts per model). Seed band in orange (--accent-unet), prompt band in cyan (--accent-mmdit), seed×prompt interaction in muted grey. The seed share collapses from 49.8% on NoobAI to 2.6% on Flux while the prompt share rises from 23.2% to 62.0%. These are point estimates from the per-feature variance file and differ slightly from the headline figures in S01/S02 (NoobAI 54.4, SD3.5 30.2, Flux 5.0), which are block-bootstrap means with 95% intervals (B = 5000); the small offset between point estimate and bootstrap mean is expected.
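A three-way split of this kind can be sketched as a two-way sum-of-squares decomposition over the seed × prompt grid. The exact estimator behind variance.json is not given here, so treat this as one plausible reading: the function name `decompose` is hypothetical, and with a single image per (seed, prompt) cell the interaction term also absorbs any residual noise.

```python
import numpy as np

def decompose(grid):
    """Split the variance of a (seeds x prompts) feature grid into three shares.

    grid[s, p] holds one feature value (e.g. centroid_y) for seed s and prompt p.
    Returns sum-of-squares fractions that add up to 1.
    """
    y = np.asarray(grid, dtype=float)
    n_seeds, n_prompts = y.shape
    grand = y.mean()
    seed_eff = y.mean(axis=1) - grand          # one main effect per seed
    prompt_eff = y.mean(axis=0) - grand        # one main effect per prompt
    resid = y - grand - seed_eff[:, None] - prompt_eff[None, :]
    ss_seed = n_prompts * np.sum(seed_eff ** 2)
    ss_prompt = n_seeds * np.sum(prompt_eff ** 2)
    ss_inter = np.sum(resid ** 2)              # interaction (plus noise)
    total = ss_seed + ss_prompt + ss_inter
    if total == 0.0:                           # constant grid: nothing to split
        return {"seed": 0.0, "prompt": 0.0, "interaction": 0.0}
    return {"seed": ss_seed / total,
            "prompt": ss_prompt / total,
            "interaction": ss_inter / total}

# Synthetic 64 x 10 grid in which the seed matters more than the prompt.
rng = np.random.default_rng(0)
grid = (2.0 * rng.normal(size=(64, 1))         # seed effect
        + 1.0 * rng.normal(size=(1, 10))       # prompt effect
        + 0.5 * rng.normal(size=(64, 10)))     # interaction / noise
print(decompose(grid))
```

Raw sum-of-squares shares are the simplest version; a variance-components estimator that corrects for degrees of freedom would give slightly different numbers on a grid this small.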

The graded inversion

Not a switch — a slope

The first framing of this work was binary: the seed governs vertical composition in U-Net models and surrenders that control in MMDiT models. SD3.5 broke the binary. It lands at 30% seed control of centroid_y — squarely between NoobAI’s 54% and Flux’s 5%. The inversion is not a switch that one design choice flips. It is a slope that two independent choices descend, roughly half each.

NoobAI and Flux differ on five confounded axes at once: backbone (U-Net vs MMDiT), training objective (DDPM vs rectified flow), parameter count, text encoder, and training data. With only those two models, every causal story is observationally identical. You cannot say whether the seed lost its grip because the architecture changed or because the loss changed — the two moved together.

SD3.5 was the critical experiment because it narrows the confound. It shares its backbone with Flux (MMDiT), its training objective (rectified flow), and its text-encoder family (T5 + CLIP), and it sits in the same parameter scale class (8 B vs 12 B). What it does not share is guidance distillation: SD3.5 runs real CFG, while Flux.1-dev has the guidance baked in. So against NoobAI, SD3.5 changes backbone and objective together; against Flux, it isolates distillation and scale.

The decomposition

Reading the three measured cells as two successive steps gives a partition on centroid_y:

  • U-Net + DDPM: NoobAI baseline, 54%
  • MMDiT + rect-flow, real CFG: SD3.5, backbone and objective swap
    −24 pp from changing the backbone and the objective together
    30%
  • MMDiT + rect-flow + distill: Flux, distillation and scale swap
    −25 pp from guidance distillation and scale
    5%

Each step removes roughly half the seed’s compositional control, and neither alone explains the full inversion. Moving from NoobAI to SD3.5 (U-Net + DDPM → MMDiT + rectified flow) costs about 24 percentage points (54 → 30); moving from SD3.5 to Flux (adding guidance distillation and scale) costs about 25 more (30 → 5). Because the first step changes backbone and objective at once, it cannot by itself say how those 24 points split between the two. The permutation tests confirm both steps independently: SD3.5 sits significantly below its U-Net neighbour Animagine (Δ ≈ −20 pp, BH-adjusted p = 0.0002) and significantly above Flux (Δ = +21.5 pp, BH-adjusted p = 0.0002). SD3.5 carves out statistically distinct ground from both families.

Figure: three-point graded inversion of centroid_y seed control. A line descending from NoobAI at 54% to SD3.5 at 30% to Flux at 5%, with the two drops labelled −24 pp and −25 pp. Vertical axis: seed control of centroid_y (%).
Block-bootstrap point estimates (B = 5000, blocked on prompt) of the seed’s share of centroid_y variance, at matched 64 × 10 protocol. The slope is read in two steps: NoobAI → SD3.5 swaps the backbone and the training objective together (SD3.5, like Flux, is a rectified-flow model), then SD3.5 → Flux adds guidance distillation and scale. Each removes about half the control.
Pre-registration win — not a post-hoc story

Before the SD3.5 sweep ran, three outcomes were written down with probabilities: Scenario A (looks like Flux, p = 0.55), Scenario B (looks like NoobAI, p = 0.15), and Scenario C — intermediate, seed ≈ 20–35%, p = 0.30. The observed 30.2% with CI [16.7 – 50.5] landed inside Scenario C’s point-estimate band. The graded reading was the predicted outcome of a frozen forecast, not a narrative assembled after seeing the number.

What this opens — the 2×2 design

Three of the four cells are now filled. The missing cell is a U-Net trained with rectified flow — the opposite corner from SD3.5. The additive partition predicts it should land near ≈30%, reached from the other direction: U-Net’s +24 pp of architecture offsetting rectified flow’s −25 pp of objective. A clean ~30% there would confirm the two axes are genuinely additive rather than interacting.

A second axis is under test now. PixArt-Sigma, currently running, is a DiT: a transformer diffusion model, but not an MMDiT. It injects text through cross-attention, more like a U-Net at the conditioning level. If it sits near the U-Net family (~40–50%), the operative split is the multimodal-attention pattern specific to MMDiT, not transformers in general. The pre-registered prediction is 25–40% (“matches transformer, not MMDiT”, p = 0.55).

Section 05 — the panel

The seven-model panel

The matrix is not a convenience sample. Every model earns its place by holding some variables fixed while one changes, so that any single observed difference can always be cross-checked against another pair. The headline number on each card is the vertical-centroid seed contribution — the share of variance in where the subject sits in the frame that is set by the seed rather than the prompt. It runs from 54.4% on NoobAI down to 5.0% on Flux.

Four twin pairs hold the panel honest, each isolating one confound:

  • NoobAI ↔ Animagine
    holds architecture, varies training corpus → confirms replication
  • NoobAI ↔ SD1.5
    holds architecture family + training, varies scale → tests the size axis
  • Flux ↔ SD3.5
    holds architecture (MMDiT) and objective (rectified flow), varies guidance distillation + scale → distilled vs real CFG
  • PixArt ↔ Flux
    holds transformer-ness, varies MMDiT vs cross-attention → isolates the attention pattern

SD1.5 · SD U-Net
Backbone: U-Net · Params: 0.9 B · Training: DDPM
centroid_y seed share: 35.9%, 95% CI [24.6 – 50.8] (measured)
Tests whether smaller, older U-Nets show the effect: they do.

NoobAI XL 1.1 · SDXL U-Net
Backbone: U-Net · Params: 2.6 B · Training: DDPM
centroid_y seed share: 54.4%, 95% CI [42.8 – 66.0] (measured)
The headline cell, measured first.

Animagine XL 4.0 · SDXL U-Net
Backbone: U-Net · Params: 2.6 B · Training: DDPM
centroid_y seed share: 48.1%, 95% CI [36.1 – 60.1] (measured)
Independent-lineage replication of NoobAI.

Illustrious XL · SDXL U-Net
Backbone: U-Net · Params: 2.6 B · Training: DDPM
centroid_y seed share: failed to load (missing diffusers config); not run
Optional second replication.

PixArt-Sigma · DiT + cross-attention
Backbone: DiT · Params: 0.6 B · Training: DDPM
centroid_y seed share: in flight (sweep running, value pending)
DiT but not MMDiT: isolates transformer-vs-MMDiT.

Flux.1-dev · MMDiT
Backbone: MMDiT · Params: 12 B · Training: rectified flow + distilled
centroid_y seed share: 5.0%, 95% CI [1.0 – 16.0] (measured)
The inversion cell.

SD3.5-Large · MMDiT
Backbone: MMDiT · Params: 8 B · Training: rectified flow · real CFG
centroid_y seed share: 30.2%, 95% CI [16.7 – 50.5] (measured)
The disentanglement cell: same arch and objective as Flux, but real CFG instead of guidance distillation.

Reference

Model families & training methods

The open text-to-image ecosystem is a handful of architectural lineages, each with a different way of injecting text, a different training objective, and a different age. The seed-composition behaviour we measure tracks two of these axes — the backbone (how text and image interact) and the training objective (how the model learns to denoise). This table lays the families side by side so you can see where each measured model sits.

Backbone axis: U-Net (text via cross-attention at fixed resolutions) → DiT (transformer, cross-attention) → MMDiT (transformer, text+image tokens mix at every block).
Training axis: DDPM / score-matching (predict the noise, many steps) vs rectified flow / flow-matching (learn a straight path noise→image).
Guidance axis: real classifier-free guidance (two forward passes) vs guidance-distilled (the CFG behaviour baked in, one pass; Flux is the example).
| Family | Backbone | Text enters via | Training objective | Text encoder | Params | Released | Reception & adoption | Seed→centroid_y |
|---|---|---|---|---|---|---|---|---|
| Stable Diffusion 1.x (SD 1.4 / 1.5 · ToonYou, etc.) | U-Net | cross-attention | DDPM · ε-prediction | CLIP ViT-L/14 | ~0.9 B | Aug–Oct 2022 | Foundational: launched the open ecosystem; thousands of fine-tunes/LoRAs, still used | 35.9% |
| Stable Diffusion 2.x (SD 2.0 / 2.1) | U-Net | cross-attention | DDPM · ε/v-prediction | OpenCLIP ViT-H | ~0.9 B | Nov 2022–23 | Rejected: encoder swap + dataset filtering broke prompts; community stayed on 1.5 | not run |
| SDXL (SDXL 1.0 · NoobAI · Animagine · Pony · Illustrious) | U-Net | cross-attention | DDPM · ε/v-prediction | CLIP-L + OpenCLIP bigG | 2.6 B | Jul 2023 | Dominant: the open standard; ~70–80% of open-weight T2I use, vast fine-tune base | 48–54% |
| PixArt (PixArt-α / PixArt-Σ) | DiT | cross-attention | DDPM-style · often step-distilled | T5-XXL | 0.6 B | 2024 | Niche: research-respected (efficient, T5); small consumer/fine-tune community | 10.0% |
| Stable Diffusion 3 / 3.5 (SD3.5-Large) | MMDiT | text+image tokens, every block | rectified flow · real CFG | T5-XXL + dual CLIP | 8 B | 2024 | Rocky: SD3 launch hit license + quality backlash; 3.5 recovered some, lost ground to Flux | 30.2% |
| Flux (Flux.1-dev / schnell) | MMDiT | text+image tokens, every block | rectified flow · guidance-distilled | T5-XXL + CLIP-L | 12 B | Aug 2024 | Ascendant: new high-end open standard; fast-growing fine-tune momentum | 5.0% |
| Video, Phase 2 (Wan 2.2 · HunyuanVideo) | MMDiT (video) | text+image+time tokens | rectified flow | T5 / umT5 | | 2024–25 | Rising: leading open video models; adoption climbing fast | planned |

Training objective colour key: DDPM / score-matching; rectified flow / flow-matching. Seed→centroid_y is the measured share of vertical-composition variance the seed explains (bootstrap mean).

Read carefully: SD3.5 and Flux are both rectified-flow MMDiT models — they do not differ in training objective. What separates them is guidance distillation (Flux bakes CFG in; SD3.5 uses real CFG) and scale (12 B vs 8 B). So the NoobAI → SD3.5 → Flux descent conflates backbone (U-Net → MMDiT) and objective (DDPM → rectified flow) in its first step, then isolates distillation + scale in its second. The clean way to separate backbone from objective is the InstaFlow cell on the roadmap — a U-Net trained with rectified flow — which is exactly why it is the missing corner of the 2×2.

The research program

Experiment roadmap

Every experiment, grouped by what it does for the argument — with what it measures, where it runs on the GH200, how long it takes, its priority, and what each outcome would imply and where it points next. Generated from the live registry at tools/experiments.json; run tools/inquisition for live status.

8 done
9 queued
1 future
4.7 h queued GPU wall-time
64×10 seeds × prompts / sweep

Core architecture matrix

The measured cells that establish the effect and its graded shape across architectures.

| Pri | What | What it measures | Where to run | Duration | Importance · implication · direction |
|---|---|---|---|---|---|
| P0 · done | NoobAI XL: headline cell | SDXL U-Net + DDPM (baseline of the seed-dominant family) | GH200 · ~11 GB | 27 min | Proof the effect exists: the seed fixes ~half of vertical composition in SDXL. Anchors the whole program. |
| P0 · done | Animagine XL: fine-tune replication | holds architecture, varies training corpus | GH200 · ~11 GB | 58 min | Rules out a single-checkpoint artifact → lets us say 'SDXL family', not 'one model'. |
| P0 · done | SD1.5: scale/vintage cell | holds U-Net+DDPM, varies scale (0.9B) and base training | GH200 · ~5 GB | 28 min | Effect predates SDXL → it is a U-Net+DDPM property, not SDXL-specific or scale-specific. Pushes the cause toward backbone/objective. |
| P0 · done | Flux.1-dev: inversion cell | MMDiT + rectified flow + guidance distillation | GH200 · ~24 GB | 1.4 h | The inversion. Opens the central question: is it the MMDiT backbone or the rectified-flow objective? |
| P0 · done | SD3.5-Large: disentanglement cell | MMDiT + rectified flow, real CFG (un-distilled); narrows the architecture/training confound | GH200 · ~20 GB | 1.9 h | Graded middle (30%) → the inversion is NOT binary; both axes contribute. Forces the 2×2 design. (Note: SD3.5 is rectified-flow, like Flux, so vs Flux it isolates distillation+scale, not objective.) |

Confound-closing cells (the 2×2)

The decisive cells that separate backbone from training objective from distillation. Highest research leverage.

| Pri | What | What it measures | Where to run | Duration | Importance · implication · direction |
|---|---|---|---|---|---|
| P1 · queued | SDXL-Lightning: distilled U-Net (breaks distillation confound) | distilled U-Net; if it collapses to single digits, distillation dominates even architecture | GH200 · ~11 GB | 6 min | Distilled U-Net. Collapses ⇒ distillation alone suppresses the seed even on a U-Net (distillation is the dominant cause). Stays ~50% ⇒ distillation is not the cause. Isolates distillation from architecture. If distillation drives it: ~5-15%; if architecture drives it: stays ~50%. |
| P1 · queued | PixArt-alpha @ real CFG: non-distilled DiT-x-attn | non-distilled DiT cross-attention; the other half of the distillation test | GH200 · ~18 GB | 20 min | Non-distilled DiT at real CFG. Rises toward U-Net ⇒ PixArt's collapse was distillation. Stays ~10% ⇒ it is the DiT backbone. Removes the distillation confound from the PixArt result. |
| P1 · done | InstaFlow-0.9B: the missing 2x2 cell | U-Net + RECTIFIED FLOW; the empty quadrant of the 2x2 | GH200 · ~6 GB | 10 min | THE missing corner: U-Net + rectified flow. ~30% ⇒ objective is ~half the effect (additive model confirmed); ~50% ⇒ objective irrelevant, backbone rules; ~5% ⇒ rectified flow alone kills it. This single cell decides the causal decomposition. centroid_y seed ~30% if architecture and training contribute additively. |

Within-family replication

Extra SDXL fine-tunes that tighten the generalization claim. Low new information, high robustness.

| Pri | What | What it measures | Where to run | Duration | Importance · implication · direction |
|---|---|---|---|---|---|
| P2 · queued | Pony Diffusion XL: SDXL replication #3 | third independent SDXL fine-tune lineage | GH200 · ~11 GB | 58 min | Third independent SDXL fine-tune. Tightens within-family CIs; strengthens 'SDXL family' generalization. Little new direction. |
| P2 · queued | Illustrious XL: retry | SDXL U-Net replication | GH200 · ~11 GB | 58 min | Fourth SDXL fine-tune (blocked on a diffusers config). Same role as Pony; unblock then run. |

Validity ablations

Referee defenses: show the effect is not an artifact of sampler, prompt set, or guidance regime.

| Pri | What | What it measures | Where to run | Duration | Importance · implication · direction |
|---|---|---|---|---|---|
| P1 · queued | Sampler ablation (NoobAI: Euler-a vs DDIM) | does stochastic (ancestral) sampling dissolve the seed-composition coupling? | GH200 · ~11 GB | 27 min | Euler-a vs DDIM on NoobAI. Holds ⇒ the effect is not a DDIM-determinism artifact. A standard reviewer challenge; answer it pre-emptively. |
| P1 · queued | Prompt-set sensitivity (2 alt sets x NoobAI, Flux) | is the effect specific to our 10 prompts? | GH200 · ~11 GB | 27 min | Two alternate prompt sets × NoobAI, Flux. Split stays ⇒ the seed%/prompt% partition is not an artifact of one prompt set. Controls prompt-set sampling uncertainty. |
| P2 · queued | CFG dose-response on Flux + SD3.5 | can ANY CFG on MMDiT recover seed control? (U-Net showed monotonic decline) | GH200 · ~24 GB | 41 min | CFG dose-response on MMDiT, mirroring the NoobAI sweep. MMDiT seed% staying low across all CFG ⇒ the inversion is not a guidance-regime artifact. Completes the CFG story across architectures. |

Rigor & complementary evidence

Orthogonal evidence and statistical backbone: inversion arm, learned features, Bayesian model.

| Pri | What | What it measures | Where to run | Duration | Importance · implication · direction |
|---|---|---|---|---|---|
| done | PixArt-Sigma: DiT-not-MMDiT control | transformer with CROSS-ATTENTION text injection (not multimodal mixing) | GH200 · ~18 GB | 20 min | REFUTED prediction: landed 10% (Flux regime), not 25-40%. Raises guidance-distillation as the driver. PRE-REG 25-40%, prob 0.55: REFUTED; actual 10.0 [4.0-22.3]. |
| P2 · done | DINOv2 patch-token features (re-measure existing sweeps) | richer composition features than Otsu centroid | GH200 GPU | ~min | Learned DINOv2 patch features vs hand-built centroid/palette. Inversion holding in DINOv2 space ⇒ not an artifact of our feature choice. Generalizes the feature basis. (Done on SD3.5 + PixArt.) |
| P1 · queued | DDIM inversion arm (bug fixed): NoobAI + Flux | independent evidence: invert real image -> noise -> regen, measure composition recovery | GH200 · ~11 GB | 43 min | Recover the seed from a real image, regenerate, measure. Forward and inverse agreeing ⇒ composition really lives in the noise (U-Net) or really does not (MMDiT). Orthogonal evidence for the causal claim. |
| P2 · queued | Beta GLMM across all cells (PyMC, NUTS) | bounded-[0,1] hierarchical model of seed fraction | CPU · this box | ~min | Bounded-proportion Bayesian model across all cells → one coherent posterior on the seed fraction with partial pooling. The statistical backbone reviewers will ask for. |

Expansion

Where the program goes if the matrix holds: the fastest-growing modality.

| Pri | What | What it measures | Where to run | Duration | Importance · implication · direction |
|---|---|---|---|---|---|
| P3 · future | Phase 2: video models (Wan I2V + Hunyuan) | does the inversion propagate to video models that inherit T2I backbones? | GH200 · ~30 GB | 1.3 h | Do video models inherit the seed-composition behaviour of their T2I backbones? Extends the finding to the fastest-growing modality (Wan I2V, HunyuanVideo). Phase 2 of the program. |

Priority: P0 done · P1 referee-critical / resolves the core causal question · P2 strengthens rigor or replication · P3 expansion. Where-to-run is the GH200 at this box; VRAM is observed or estimated. Duration uses measured seconds-per-image where a sweep has completed, else a documented estimate.

# live status of the same program, from the terminal:
research/inquisition/tools/inquisition # full dashboard
research/inquisition/tools/inquisition queue # upcoming + time estimates

S06 · Statistical rigor

Four methods that hold the claim up

The headline finding is a claim about variance: the seed explains roughly half of NoobAI’s vertical composition and almost none of Flux’s. A variance-fraction is a ratio of estimated quantities, bounded to [0, 1], computed over a small, hand-curated prompt set, and then compared across many model pairs. Each of those four properties is a way the claim could be an artefact rather than a fact — sampling noise, the wrong noise model, no formal test, or sheer multiplicity of comparisons. The four methods below each close one of those gaps. Together they convert “the numbers look very different” into “the difference is +47.1 percentage points, permutation p ≤ 0.0002, surviving FDR correction across the whole family of tests.”

METHOD 01

Block bootstrap B = 5000, resample on prompts

What it does

Resamples the 10 prompts with replacement 5000 times, recomputing the variance decomposition each time, and reads the 2.5th / 97.5th percentiles as a 95% confidence interval on every fraction.

Why we used it

A point estimate needs error bars; blocking on prompts respects the seed×prompt grid structure rather than treating cells as exchangeable.

Figure: resampling prompt columns with replacement. Prompts 1–5 become, for example, 1, 2, 2, 3, 5 in one of 5000 resamples; the point estimate sits within the 95% CI.
Each resample reshuffles which prompts are present; the spread of 5000 decompositions gives the interval.

Result it produced

NoobAI centroid_y seed% = 54.4 [42.8–66.0]; Flux = 5.0 [1.0–16.0] (block-bootstrap means; the raw point estimates are 49.8 and 2.6).
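The resampling loop can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `seed_fraction` is a hypothetical helper that uses a plain sum-of-squares share, and `block_bootstrap_ci` is a name chosen for the example. Only the prompt columns are resampled, so every resample keeps all seeds for each prompt it draws.

```python
import numpy as np

def seed_fraction(grid):
    """Seed share of variance in a (seeds x prompts) grid (sum-of-squares version)."""
    y = np.asarray(grid, dtype=float)
    grand = y.mean()
    ss_seed = y.shape[1] * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    return float(ss_seed / ss_total) if ss_total > 0 else 0.0

def block_bootstrap_ci(grid, n_boot=5000, seed=0):
    """Bootstrap mean and 95% percentile interval, resampling whole prompt columns."""
    y = np.asarray(grid, dtype=float)
    rng = np.random.default_rng(seed)
    n_prompts = y.shape[1]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        cols = rng.integers(0, n_prompts, size=n_prompts)   # prompts, with replacement
        stats[b] = seed_fraction(y[:, cols])
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return float(stats.mean()), float(lo), float(hi)

# Synthetic 64 x 10 grid standing in for one model's centroid_y values.
rng = np.random.default_rng(42)
grid = (rng.normal(size=(64, 1)) + rng.normal(size=(1, 10))
        + 0.5 * rng.normal(size=(64, 10)))
mean, lo, hi = block_bootstrap_ci(grid, n_boot=2000)
print(f"seed share: {mean:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

With only 10 prompts the set of distinct resamples is limited, which is one reason the reported intervals are wide.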

METHOD 02

Permutation test stratified by prompt, B = 5000

What it does

Shuffles the model labels within each prompt 5000 times to build a null distribution for the between-model difference, then asks how often a shuffle matches or beats the observed gap.

Why we used it

It yields a non-parametric p-value for “is NoobAI’s seed% really different from Flux’s?” with no distributional assumptions on a bounded ratio.

Figure: shuffling model labels within a prompt. The observed labels (NNN FFF) are shuffled to build the null distribution of T; the observed T_obs lies in its tail, p ≤ 0.0002.
The observed difference T_obs sits far in the tail of the shuffle-null; the tail mass is the p-value.

Result it produced

NoobAI vs Flux centroid_y: Δ = 0.471, p ≤ 0.0002 (Monte Carlo floor).
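A stratified permutation test of this shape can be sketched as below. The source does not spell out the exchange unit, so this sketch assumes that, within each prompt, the images from both models are pooled and their model labels reassigned at random. The names `seed_fraction` and `permutation_test` are hypothetical, and the statistic is the difference in a plain sum-of-squares seed share.

```python
import numpy as np

def seed_fraction(grid):
    """Seed share of variance in a (seeds x prompts) grid (sum-of-squares version)."""
    y = np.asarray(grid, dtype=float)
    grand = y.mean()
    ss_seed = y.shape[1] * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    return float(ss_seed / ss_total) if ss_total > 0 else 0.0

def permutation_test(grid_a, grid_b, n_perm=5000, seed=0):
    """Two-sided p-value for the difference in seed share between two models."""
    a = np.asarray(grid_a, dtype=float)
    b = np.asarray(grid_b, dtype=float)
    n_seeds = a.shape[0]
    t_obs = seed_fraction(a) - seed_fraction(b)
    pooled = np.concatenate([a, b], axis=0)        # (2 * seeds, prompts)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # permuted(axis=0) shuffles every prompt column independently,
        # i.e. labels are exchanged only within a prompt stratum.
        shuffled = rng.permuted(pooled, axis=0)
        t = seed_fraction(shuffled[:n_seeds]) - seed_fraction(shuffled[n_seeds:])
        hits += int(abs(t) >= abs(t_obs))
    # Add-one correction: the smallest attainable p is 1 / (n_perm + 1).
    return t_obs, (hits + 1) / (n_perm + 1)

# Synthetic example: model A is seed-driven, model B is prompt-driven.
rng = np.random.default_rng(3)
grid_a = rng.normal(size=(64, 1)) + 0.2 * rng.normal(size=(64, 10))
grid_b = rng.normal(size=(1, 10)) + 0.2 * rng.normal(size=(64, 10))
t_obs, p = permutation_test(grid_a, grid_b, n_perm=2000)
print(f"T_obs = {t_obs:.3f}, p = {p:.4f}")
```

With B = 5000 and the add-one correction, the smallest p-value the procedure can return is 1/5001 ≈ 0.0002, which is the Monte Carlo floor quoted above.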

METHOD 03

Beta GLMM PyMC, NUTS sampler

What it does

A Bayesian generalized linear mixed model with a Beta likelihood on the logit scale and crossed random effects for seed, prompt, and interaction; NUTS draws the posterior over each variance component.

Why we used it

Variance fractions are bounded proportions in [0, 1]; a Beta model respects that support — where Gaussian ANOVA would not — and returns posterior credible intervals directly.

Figure: Beta posterior densities of the seed-variance fraction on the bounded interval [0, 1], one for Flux and one for NoobAI.
Posteriors live entirely inside [0, 1]; their separation is the credible-interval form of the inversion.

Role

Cross-checks the bootstrap CIs with partial pooling, robust to noisy small cells (S = 64, P = 10).

METHOD 04

Benjamini–Hochberg FDR q = 0.05 across the test family

What it does

Sorts the raw p-values from the ~18–30 pairwise comparisons and adjusts each by its rank, rejecting only those that clear the stepped threshold so the expected false-discovery share stays at 5%.

Why we used it

With dozens of comparisons some will look significant by chance; BH controls the false-discovery rate without the brutal power loss of Bonferroni on correlated tests.

Figure: sorted p-values plotted by rank k against the Benjamini-Hochberg line k·q/m; points below the line are rejected.
Each dot is one sorted p-value; those below the BH line are declared discoveries at q = 0.05.

Result it produced

12 of 18 composition comparisons reject at the 0.001 level after BH.
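The rank adjustment is short enough to write out. This is a generic Benjamini–Hochberg sketch with a hypothetical function name `bh_adjust`, not the project's implementation; the p-values in the example are invented for illustration.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)          # p_(k) * m / k
    # Step-up rule: each adjusted value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]                          # made-up raw p-values
adj = bh_adjust(raw)
print(adj)                                               # [0.02 0.04 0.04 0.02]
print(adj <= 0.05)                                       # discoveries at q = 0.05
```

A comparison is declared a discovery when its adjusted p-value is at or below q, which is equivalent to the sorted raw p-value falling under the line k·q/m.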

Why all four, not one. The bootstrap puts error bars on the estimate but cannot test a difference; the permutation test tests the difference but assumes nothing about the estimate’s shape; the Beta GLMM honours the bounded support that both of those approximate; and BH keeps the whole family of comparisons honest. Remove any one and the claim reverts to “the numbers look different,” which is exactly the gap this layer was built to close. Full protocol in 02-method/STATISTICAL_RIGOR.md and the pairwise table in MATRIX_RESULTS.md §7.

S07 · NoobAI · CFG dose-response

Turning up the guidance loosens the seed’s grip — but never lets go

How much of the U-Net seed-dominance is a guidance-regime effect rather than an irreducible property of the architecture? We sweep NoobAI across five classifier-free-guidance scales and re-measure how much of each composition feature’s variance the seed explains. The seed’s compositional grip declines monotonically as guidance climbs — but even at CFG = 10, vertical placement stays ~43% seed-driven, roughly nine times Flux’s 5%. The inversion is therefore not just “MMDiT operates at high effective CFG.” Half the gap survives any guidance choice.

Figure: CFG dose-response of seed contribution on NoobAI. A multi-series line chart. The horizontal axis is classifier-free guidance scale at 1, 3, 4, 7 and 10. The vertical axis is the percentage of composition variance explained by the seed, from 0 to 90 percent. Three descending lines are shown: vertical centroid falls from 82.3 to 43.0 percent, horizontal centroid falls from 83.2 to 32.0 percent, and foreground fraction falls from 52.4 to 13.7 percent. A vertical guide marks CFG = 4, the typical practitioner setting. All three lines decline monotonically as guidance increases, yet vertical centroid never drops below 43 percent, far above the Flux MMDiT baseline of about 5 percent.
  • centroid_yvertical placement
  • centroid_xhorizontal placement
  • fg_fractionsubject area in frame
Seed contribution to composition variance on NoobAI (SDXL U-Net) as a function of classifier-free guidance, swept at CFG = {1, 3, 4, 7, 10} with 32 seeds × 10 prompts (320 cells) per scale. Each value is the seed_frac point estimate from a two-way variance decomposition; the percentage tells you how much of where the subject lands is dictated by the random latent rather than the prompt. The dashed amber guide marks CFG = 4, the setting most practitioners use; the dashed cyan line marks the Flux MMDiT reference (centroid_y ≈ 5%). At CFG = 1 — no guidance, pure prompt-conditional sampling — the seed governs 82.3% of vertical and 83.2% of horizontal placement: the folklore “the seed is the composition” is most true here. As guidance pulls harder toward the prompt the seed’s grip falls monotonically across all three features. The takeaway: NoobAI at the strongest feasible guidance (CFG = 10) still sits at 43.0% on centroid_y — with a 95% bootstrap lower bound of 30.8%, roughly twice Flux’s upper bound of 16.0%. The guidance regime erases about half of the U-Net → Flux gap; the other half is irreducible architecture. “MMDiT is just high effective CFG” is ruled out — if it were true, NoobAI at CFG = 10 would reach Flux, and it does not. Values traced to NoobAI cfg-20260524-231325/cfg_*/bootstrap_variance.json (MATRIX_RESULTS.md §5).

S08 · complementary test

Run the diffusion backwards

The variance decomposition is a forward test: vary the seed across a fixed prompt, then measure how much the output composition spreads. It answers “does changing the noise move the subject?” The DDIM-inversion arm asks the mirror-image question. Instead of starting from random noise, it takes a real image, runs the sampler in reverse to recover the latent noise that would have produced it, and then checks whether that recovered noise still carries the compositional fingerprint. Forward: fix the prompt, vary the seed, watch the output. Inversion: fix the output, recover the seed, ask whether composition was ever in the noise to begin with.

This is a supporting arm, not the headline. The headline rests on the variance decomposition: the NoobAI–vs–Flux composition difference is significant at p < 0.0002 (permutation, B = 5000 — the Monte-Carlo floor, meaning zero of 5000 label shuffles beat the observed gap) on all three compositional axes, centroid_y, centroid_x, and fg_fraction. Inversion does not add to that p-value. It corroborates the same finding from an orthogonal direction: a positive result would mean the seed basins are intrinsic to the model rather than an artifact of where Gaussian sampling happens to start.

Why invert at all

A skeptic can dismiss the forward result as an artifact of sampling from a Gaussian: maybe random integer seeds happen to land in structured regions, and the “seed controls composition” effect is a quirk of how we draw noise rather than a property of the model. Inversion removes that escape hatch. The starting latents are no longer random draws — they are reconstructed from genuine photographs by reversing the deterministic DDIM trajectory at a null prompt.

If the same per-feature pattern holds for inverted seeds the way it holds for random ones — composition seed-driven on U-Net, prompt-driven on MMDiT — then the basin structure is intrinsic to the model, not an accident of Gaussian sampling. If it dissolves, the forward effect was a sampling artifact. That is the whole point: the two arms can disagree, and a clean replication under inversion is a much stronger claim than either arm alone.

What it measures

The arm builds an N × P grid: 32 inverted real images crossed with the same curated 32-prompt set (1024 generations), each combination generated forward at cfg = 4.0 over 28 DDIM steps. From every generated image it extracts the 18-dimensional feature vector and decomposes variance over the (inverted_seed × prompt) design, exactly mirroring the forward sweep. The N is deliberately small — inversion is expensive — and acceptable only because the effect, if present, is expected to be large.

  • Composition axes: centroid_x, centroid_y, fg_fraction — expected seed-driven on U-Net.
  • Palette axes: 15 LAB k-means channels — expected prompt-driven across all families.
  • The test statistic is the seed-variance fraction per feature, compared against the forward sweep’s bootstrap CIs.
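The per-feature decomposition described above can be sketched as a two-way sum-of-squares split over the grid. This is a minimal illustration, not the project's estimator: the function name `variance_fractions`, the toy grid, and the choice to fold interaction and noise into one remainder term (unavoidable with one image per cell) are all assumptions made for the example.

```python
def variance_fractions(grid):
    """Split one feature's variance over a seeds x prompts grid.

    grid[i][j] is the feature value (e.g. centroid_y) for seed i, prompt j.
    Returns the share of the total sum of squares carried by the seed main
    effect, the prompt main effect, and the remainder (interaction + noise).
    """
    n_seeds, n_prompts = len(grid), len(grid[0])
    grand = sum(sum(row) for row in grid) / (n_seeds * n_prompts)

    seed_means = [sum(row) / n_prompts for row in grid]
    prompt_means = [
        sum(grid[i][j] for i in range(n_seeds)) / n_seeds
        for j in range(n_prompts)
    ]

    ss_total = sum((v - grand) ** 2 for row in grid for v in row)
    if ss_total == 0:
        raise ValueError("feature is constant over the grid")

    ss_seed = n_prompts * sum((m - grand) ** 2 for m in seed_means)
    ss_prompt = n_seeds * sum((m - grand) ** 2 for m in prompt_means)
    ss_rest = ss_total - ss_seed - ss_prompt

    return {
        "seed": ss_seed / ss_total,
        "prompt": ss_prompt / ss_total,
        "interaction": ss_rest / ss_total,
    }


# Hypothetical toy grid: 3 seeds x 2 prompts, purely additive, seed-dominated.
toy = [[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]]
fractions = variance_fractions(toy)
```

On this toy grid almost all of the variance lands on the seed term and the remainder is zero, because the values are an exact sum of a seed offset and a prompt offset. The same function applied per feature to the inverted-seed grid would give the quantity the third bullet compares against the forward sweep.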

NoobAI inversion arm · dtype bug fixed, sweep re-queued

Figure: DDIM inversion encode and decode flow. real image (flickr30k) → DDIM invert (reverse sampler, null prompt, 28 steps) → recovered noise ε (latent, C×H×W; composition fingerprint? is “where things go” encoded here?) → forward generate (prompt-conditioned, cfg 4.0, 28 steps) → reconstruction × prompt grid → decompose variance over (inverted seed × prompt)
The inversion pipeline. A real photograph is encoded by reversing the DDIM sampler at a null prompt to recover the noise latent ε that would have produced it. That latent is the unit under test: the dashed region asks whether the compositional fingerprint — where the subject sits in the frame — is written into the noise. Each recovered ε is then re-run forward across the prompt grid, and the spread of the resulting compositions is decomposed exactly as in the forward sweep. If the U-Net seed-fraction survives inversion, composition lives in the noise; on MMDiT it is expected not to.
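The encode and decode loop in the figure can be sketched on a scalar toy. This is an illustration under stated assumptions, not the arm's code: the cosine-angle schedule in `make_schedule`, the analytic noise predictor `toy_eps` (exact only when the data are unit Gaussian), and all function names are hypothetical. What the sketch does show is the standard deterministic DDIM update run in both directions, and the fact that inversion is only approximately the inverse of generation because each inversion step evaluates the noise prediction at the current point rather than the next one.

```python
import math


def ddim_step(x, eps, a_from, a_to):
    """One deterministic DDIM update between two cumulative-alpha levels."""
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def make_schedule(num_steps, theta_min=0.05, theta_max=1.5):
    """Hypothetical schedule: alpha_bar = cos^2(theta), theta evenly spaced.

    Index 0 is nearly clean, index num_steps is nearly pure noise.
    """
    span = theta_max - theta_min
    return [
        math.cos(theta_min + span * k / num_steps) ** 2
        for k in range(num_steps + 1)
    ]


def toy_eps(x, alpha_bar):
    """Noise predictor for unit-Gaussian data: E[eps | x_t] = sqrt(1 - a) * x_t."""
    return math.sqrt(1.0 - alpha_bar) * x


def invert(x0, abar, eps_model=toy_eps):
    """Walk from the clean end to the noisy end (image -> recovered noise)."""
    x = x0
    for k in range(len(abar) - 1):
        x = ddim_step(x, eps_model(x, abar[k]), abar[k], abar[k + 1])
    return x


def generate(x_noisy, abar, eps_model=toy_eps):
    """Walk from the noisy end back to the clean end (noise -> image)."""
    x = x_noisy
    for k in range(len(abar) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, abar[k]), abar[k], abar[k - 1])
    return x


def round_trip_error(x0, num_steps):
    abar = make_schedule(num_steps)
    return abs(generate(invert(x0, abar), abar) - x0)


err_28 = round_trip_error(1.0, 28)
err_280 = round_trip_error(1.0, 280)
```

In this toy the 28-step round trip misses the input by several percent, and ten times as many steps shrink the miss roughly tenfold. The error here is pure discretization error; reduced numerical precision in a real pipeline would add to it.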
Methodological rigor · numerical stability

The VAE must run in fp32 during inversion

DDIM inversion is far more sensitive to numerical precision than forward sampling, because it integrates the trajectory backwards and accumulates error over every step. The SDXL VAE in particular produces unstable encodings in fp16 — small rounding errors in the latent feed directly into the recovered ε and corrupt the fingerprint we are trying to measure. The encode step therefore casts the VAE to float32 before touching the image, so the latent that anchors the whole inversion is computed at full precision.

That fix introduced a second, opposite trap. The VAE was promoted to fp32 for encoding but never restored, so the forward generation pass then mixed an fp32 VAE with fp16 latents and crashed with “Input type (c10::Half) and bias type (float) should be the same.” The arm now explicitly restores the VAE to the pipeline’s compute dtype after inversion and before any forward pass. We flag this not as trivia but as a guard rail: a precision bug in the encoder would have silently degraded the inverted seeds and biased the comparison against the very effect the arm exists to test.
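One way to make the restore step hard to forget is to scope the upcast with a context manager. The sketch below is an assumption about how such a guard could look, not the arm's actual code: `temporarily_cast` and `StubVAE` are hypothetical names, and the only things assumed about the real VAE object are a readable `dtype` attribute and a `to(dtype)` method. The stub stands in for it so the pattern can be run without a GPU stack.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_cast(module, dtype):
    """Cast `module` to `dtype` inside the block, then restore the original."""
    original = module.dtype
    module.to(dtype)
    try:
        yield module
    finally:
        # Runs on normal exit and on exceptions, so the forward pass never
        # sees a VAE left behind at the wrong precision.
        module.to(original)


class StubVAE:
    """Hypothetical stand-in exposing only `dtype` and `to(dtype)`."""

    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        self.dtype = dtype
        return self


vae = StubVAE("float16")

with temporarily_cast(vae, "float32"):
    dtype_during_encode = vae.dtype  # the inversion encode would happen here

dtype_after = vae.dtype

# The restore also happens if the encode step raises.
try:
    with temporarily_cast(vae, "float32"):
        raise RuntimeError("simulated failure during encode")
except RuntimeError:
    pass
dtype_after_failure = vae.dtype
```

Putting the restore in `finally` means a failed encode cannot leave the pipeline in the mixed-precision state that produced the crash described above.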

Frequently asked questions

Questions, answered plainly

Drawn from the project's working FAQ, calibrated for researchers, practitioners, and curious readers alike. Every number here traces to a measured cell or a stated confidence interval.

What is a “seed” in image generation?

The seed is an integer that selects the specific pattern of random noise the model starts from. The same seed, prompt, and settings reproduce the same final image, bit for bit, provided the hardware and software stack are also unchanged. A different seed means different starting static, and so a different output even with an identical prompt.

Practitioners use seeds two ways: as a reproducibility tool (regenerate exactly this image later) and, on SDXL, as a hidden compositional control (“seed 47 gives a tight portrait; seed 109 gives a wider shot”). The randomness lives at the step where the seed is chosen, not in what the seed then does.
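The point that the seed is deterministic once chosen can be shown with a few lines of standard-library Python. This is a stand-in, not a diffusion pipeline: `initial_noise` is a hypothetical helper, and a real generator would draw a full latent tensor from its framework's seeded random generator rather than a short list of floats.

```python
import random


def initial_noise(seed, size=8):
    """Draw `size` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_seed_matches = initial_noise(47) == initial_noise(47)
other_seed_differs = initial_noise(47) != initial_noise(109)
```

The integer picks one noise pattern out of many; everything downstream of that choice is a fixed function of it.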

Why does it matter that the seed controls composition?

Because the number has been hiding a control surface that was never labeled as one. Every tutorial since 2022 described the seed as “a number to fix for reproducibility — random otherwise.” We measured that on SDXL the seed is doing about half the work of placing the subject in the frame (NoobAI centroid_y seed fraction 54.4%, 95% CI [42.8–66.0]) — comparable in influence to the prompt itself.

That makes the seed a covert compositional knob practitioners have been intuitively exploiting (“seed banks”, “lucky seeds”) for years without anyone formally measuring it. On Flux the same knob has moved — from the seed to the prompt — which has real consequences for any workflow that depended on it.

Is this peer-reviewed?

Not yet. It is currently a preprint — a public manuscript released before formal peer review. The predictions are pre-registered (outcome probabilities committed before the data lands), and the headline inversion is defensible at workshop scope today: p ≤ 0.0002 on the permutation test, replicated across two independent feature extractors, with non-overlapping 95% confidence intervals on the two architectures.

The plan is a workshop submission soon, then a main-track submission after the disentanglement experiments complete. It has not yet been submitted. The preprint format lets the result circulate and get critiqued in the open; everything reported is backed by public data, so anyone can reproduce or contest the numbers.

Can I use this in my own workflow today?

Yes, on SDXL-family models. Fix the seed to lock composition — seed banks are exactly the right practice on SDXL, Illustrious, NoobAI, Animagine, and Pony, which are all the same U-Net architecture. Our measurement is the explanation for why that workflow works: the seed carries roughly half the compositional variance.

On Flux the seed will not do that. If you move to a Flux pipeline, the seed bank stops carrying composition and you switch to a canonical prompt prefix (encode the framing in the first tokens), IPAdapter at higher weight for character lock, and image-to-video for animation. The finding is descriptive, not prescriptive — if your current pipeline ships images that look right, it validates that choice. It becomes relevant when you migrate between model families.

What is MMDiT, and why does it behave differently from U-Net?

MMDiT (multimodal diffusion transformer) is the architecture used by Flux and SD3/SD3.5, where text tokens and image tokens mix at every transformer block. A U-Net (SD1.5, SDXL and its fine-tunes) instead injects text at specific resolutions via cross-attention. That structural difference changes how compositional information is routed.

In our measurement, the locus of compositional control moves: on the U-Net it lives largely on the seed (the noise fiber); on MMDiT it lives on the prompt path. We hypothesize MMDiT's text-everywhere attention is what routes composition through the prompt, but we have not yet measured attention attribution — so we say “the relationship inverts between these two models,” not “MMDiT causes the inversion.” The wording matters and we hold it until SD3.5 lands.

Does this apply to video models?

Indirectly. Many video models inherit a text-to-image backbone: an MMDiT video model (Wan 2.2, Hunyuan Image-Video) extends the MMDiT family, and older U-Net video models such as AnimateDiff extend the U-Net family. The architectural prediction carries over: a video model built on an MMDiT backbone should sit in the prompt-dominant regime, one built on a U-Net backbone closer to the seed-dominant regime.

We have not directly measured video variance decomposition. In practice the recommended pattern is to set composition once in a still keyframe — where the seed gives you that control on SDXL — then animate forward with an image-to-video model that takes the composition as given.

How is this different from “All Seeds Are Not Equal” (Li et al., ICLR 2025)?

Li et al. documented initial-seed effects on object placement within individual models (Stable Diffusion and PixArt-α). Their paper establishes that the seed influences where things go, but it does not decompose that effect into a variance fraction or compare U-Net against MMDiT.

Our contribution is the cross-architecture contrast: we measure the variance fraction explicitly and show that the locus of compositional control moves from the seed to the prompt, with the seed's share of centroid_y variance falling from ~50% on the U-Net to ~5% on MMDiT. Single-architecture findings show the seed matters in SDXL; ours is the first to show that whether it matters is itself an architectural property.

Have you done a permutation test, not just bootstrap CIs?

Yes. The headline inversion on centroid_y is backed by a permutation test with B = 5000 shuffles of the model label. Observed Δ = 0.472 (NoobAI 0.498 − Flux 0.026); the 99th percentile of the null was 0.040, and none of the 5000 shuffles matched or exceeded the observed Δ, which puts the result at the Monte-Carlo floor: p ≤ 0.0002.

Bootstrap CIs alone are necessary but not sufficient for a difference claim. The permutation test is the formal test: it makes no distributional assumptions, and at p ≤ 0.0002 it survives Bonferroni correction for ~50 comparable tests at the conventional 0.05 level (0.0002 × 50 = 0.01).
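A label-shuffling test of this kind can be sketched generically. The sketch makes several assumptions that this page does not confirm: the unit being shuffled (here one value per unit, labelled by model), the statistic (absolute difference of group means), the add-one p-value convention, and every name and number in the toy data are illustrative only.

```python
import random


def permutation_p_value(group_a, group_b, num_shuffles=5000, seed=0):
    """Two-sample permutation test on the absolute difference of means.

    Returns (observed_gap, p_value). The p-value uses the add-one convention
    (exceedances + 1) / (shuffles + 1), so it can never be exactly zero.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def gap(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = gap(pooled)
    exceedances = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)  # reassign model labels at random
        if gap(pooled) >= observed:
            exceedances += 1
    return observed, (exceedances + 1) / (num_shuffles + 1)


# Hypothetical, fully separated toy groups (20 values per "model").
model_a = [0.40 + 0.01 * i for i in range(20)]
model_b = [0.01 * i for i in range(20)]
observed_gap, p_value = permutation_p_value(model_a, model_b)
```

With 5000 shuffles and no exceedances, the smallest value this convention can report is 1/5001, which is where the figure of about 0.0002 comes from: it is a floor set by the number of shuffles, not a measurement of how extreme the gap is.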

Couldn't the centroid result just be an Otsu segmentation artifact?

No, because we replicated it with an independent feature extractor. DINOv2-large semantic features from the same NoobAI sweep give the top principal components as 75–91% prompt-driven, while centroid_y is 54% seed-driven. That dissociation — “where the subject is” is seed-driven, “what the subject is” is prompt-driven — comes from two extractors trained on different objectives.

On Flux the same DINOv2 measure gives 1.4% seed contribution, agreeing with the 5% centroid number. If centroid were an Otsu artifact, DINOv2 would not have to agree — and it does.
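For readers unfamiliar with the centroid measure being questioned, the sketch below shows one way to compute it from an Otsu threshold. It is an illustration under assumptions, not the project's extractor: it assumes an 8-bit grayscale image given as a list of rows, treats pixels brighter than the threshold as foreground, and normalizes coordinates by image size. The function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the 0..255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]          # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0        # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def composition_features(gray):
    """Compute centroid_x, centroid_y and fg_fraction for a grayscale image."""
    height, width = len(gray), len(gray[0])
    t = otsu_threshold([v for row in gray for v in row])
    # Assumption: foreground is the brighter class.
    fg = [(y, x) for y in range(height) for x in range(width) if gray[y][x] > t]
    if not fg:
        raise ValueError("no foreground pixels found")
    mean_y = sum(y for y, _ in fg) / len(fg)
    mean_x = sum(x for _, x in fg) / len(fg)
    return {
        "centroid_x": (mean_x + 0.5) / width,   # 0 = left edge, 1 = right edge
        "centroid_y": (mean_y + 0.5) / height,  # 0 = top edge, 1 = bottom edge
        "fg_fraction": len(fg) / (height * width),
    }


# Hypothetical 4x4 image: dark top half, bright bottom half.
image = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
features = composition_features(image)
```

The threshold only decides which pixels count as subject; the centroid then summarizes where that mass sits. That is why an extractor with no thresholding step at all, such as the DINOv2 features cited above, is a meaningful cross-check.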

Doesn't the U-Net–vs–Flux comparison confound architecture with training, scale, and text encoder?

Yes — and we say so explicitly. NoobAI and Flux differ on six axes (architecture, training objective, parameter count, training data, text encoder, distillation). Until the disentanglement cells land we report “the relationship inverts between these two models,” not “the architecture causes it.”

The critical control is SD3.5-Large (the same rectified-flow MMDiT family as Flux, but un-distilled, with real CFG, and smaller), which holds the architecture and training objective fixed while removing distillation. It is pre-registered with three outcome probabilities: architecture-wins 0.55, intermediate 0.30, training-wins 0.15. An intermediate result is one we committed to interpreting in advance.

Will you release the code and data?

Yes. All raw artifacts are already public at github.com/quivent/lambda — image grids, feature tensors, h-space activations, manifests, variance decompositions, and bootstrap CIs, at the specific commit hashes that produced each reported number.

The lambda topology CLI provides one-command reproduction; total wall time from scratch on a single 96 GB GH200 is about three hours. The companion documents are public at github.com/quivent/anime.productions and updated as data lands. There is no gatekept release.

What's next?

Five experiments are queued in priority order:

  1. SD3.5-Large: an un-distilled rectified-flow MMDiT with real CFG, the disentanglement cell that holds Flux's architecture and training objective fixed while removing distillation.
  2. PixArt-Sigma — a DiT that is not MMDiT and not distilled; the cleanest control for “is it the transformer, or MMDiT specifically?” (cell currently running).
  3. Larger NoobAI scale — 256 × 32 = 8192 images, tightening every reported CI by about 3.5×.
  4. DINOv2 per-patch features — richer spatial measurement than the centroid we currently use.
  5. Sampler ablation (deterministic vs ancestral), plus the missing U-Net rectified-flow cell — the absent corner of the 2×2 architecture-by-training grid — and the mechanism work on attention attribution.

S10 / Document index

The corpus

The full Inquisition corpus, organized by directory in reading order. Every entry links to its source on GitHub. Path prefix: research/inquisition/<dir>/<file>.

00-overview

Orientation · 9 documents
  • READ_FIRST.md The short version in plain language: the seed is not random in the way the name suggests, and what that means architecture by architecture.
  • THE_CORE_QUESTION.md The one-sentence finding and the seed-versus-prompt role split, written so any reader can restate it in a breath.
  • FINDINGS.md The audit trail: every claim the investigation currently supports, each tagged with an explicit confidence level.
  • SIGNIFICANCE.md What is defensible right now and why: the claims that crossed from preliminary observation to rigor-backed, with the test behind each.
  • FAQ.md Anticipated questions answered for three audiences: skeptical reviewers, working practitioners, and curious laypeople.
  • IMPACT_AND_MARKET.md A calibrated estimate of how much of the text-to-image space the finding touches and what the work is worth across academic, practical, and speculative layers.
  • HISTORY.md The after-action narrative of how the investigation unfolded: what happened in what order, what was tried that failed, and where the inflection points were.
  • CONVERSATIONS.md Substantive question-and-answer exchanges preserved from the working session that produced the corpus.
  • research_log.md The running journal kept alongside the experimental queue, newest entries first, with kickoff, midpoint, and result notes per phase.

01-papers

The papers · 6 documents
  • paper.md The synthesis document: an investigation into the fiber structure of diffusion models, or why what the seed encodes depends on which model is encoding it.
  • preprint.md Working preprint v0.2: how the seed encodes ~50% of image layout in SDXL and why that inverts in Flux, with bootstrap confidence intervals.
  • MATRIX_RESULTS.md The five-cell architecture matrix: NoobAI, Animagine, SD1.5, Flux, SD3.5 plus the NoobAI CFG dose-response, all at matched protocol with block-bootstrap CIs.
  • NARRATIVE.md The long-form essay, "The Number That Wasn't Random": how a forgotten dial on the most popular image generators turned out to be doing half the work.
  • related_work.md A four-axis survey of prior art on seed effects and compositional control, positioning the U-Net-versus-MMDiT variance decomposition as novel.
  • seed-composition-v1.md The v1 draft: a variance decomposition of diffusion fibers on a 64-seed by 10-prompt NoobAI sweep, testing the compositional-template folklore.

02-method

Methodology · 6 documents
  • METHODOLOGY.md The per-feature variance decomposition method: running a model over a factorial seed-by-prompt grid to quantify seed, prompt, and interaction contributions.
  • STATISTICAL_RIGOR.md An honest accounting of the inferential gap and a concrete plan to close it: bootstrap CIs, permutation tests, Beta GLMM, and FDR correction.
  • EXPERIMENT_DESIGN.md The properly-set-up protocol that converts the exploratory work into inferential statistics, written before the next round of sweeps.
  • DATA_DICTIONARY.md A reference for every artifact file the topology pipeline writes, for re-analyzing published runs or building a schema-compatible pipeline.
  • THEORY.md The position paper "Diffusion Models Are Bundles, Not Maps," arguing the map view makes the seed channel invisible.
  • LEXICON.md A reference glossary defining every technical term used across the corpus, grouped by category and cross-referenced.

03-execution

How it was run · 7 documents
  • DISPATCH_PLAN.md The execution layer for the experiment design: sequence, resource budget, dependencies, decision points, and commit discipline for the next round.
  • CLUSTER_STUDY.md The full architecture matrix designed to fit in ~24 hours on an 8-GPU box: 7 models by 5 CFG values plus an ablation cube and paper-scale runs.
  • PRE_REGISTRATION.md Numerical predictions and interpretations recorded before each scheduled experiment's results land, so they cannot be retconned post-hoc.
  • ARCHITECTURE_ROADMAP.md The 6-to-12-month research program: turning a single inversion finding into a published matrix, a mechanism story, and an open-source toolkit.
  • REVIEW_PREP.md Anticipated peer-review concerns with drafted responses and evidence; doubles as a rebuttal draft and flags where the work cannot yet rebut.
  • CONTINUATION.md A handoff and resumption protocol for picking the investigation back up from any state, for a collaborator, a returning author, or a future agent.
  • ROADMAP.md Superseded by DISPATCH_PLAN. The pre-pivot near-term plan, preserved as the running ledger of completed work and future experiment ideas.

04-frontier

Open questions · 1 document
  • OPEN_PROBLEMS.md A standalone problem catalogue for the field: questions raised by the variance decomposition that the current method, scale, or compute cannot answer.

05-practice

Practitioner-facing · 2 documents
  • PRACTITIONER_GUIDE.md For animators, prompt engineers, and stylists: what changes about how you work with seeds and prompts once the rules depend on the architecture.
  • TEACHING.md A pedagogical companion that lands the headline result for four audiences: CS undergrads, working animators, researchers, and policy readers.

06-impact

Impact essays · 6 documents
  • 00_completeness_value_credentials.md The strategic-state document: how complete the plan is, what a discovery like this is worth, and what credentials it needs to be taken seriously.
  • 01_folk_knowledge_to_science.md How the variance decomposition reframes community folk knowledge as citable, falsifiable science, and what that transition means structurally and culturally.
  • 04_multibillion_company_path.md The asymmetric-upside document: precedents, company shapes, and honest probability calibration for the multibillion-dollar path opening on the finding.
  • 05_what_could_pass_stable_diffusion.md The paradigm-risk document: what could displace diffusion entirely within 2-5 years, and honestly how much of this work survives that displacement.
  • 06_design_dimension_value.md A history-anchored valuation of "compositional locus configurability" as a new feature axis along which next-generation diffusion models could compete.
  • 08a_products_b2b_enterprise.md B2B and enterprise product concepts built on the finding, such as brand-locked generation that shares a compositional skeleton across asset variations.

/ (root)

Navigation hub · 1 document
  • README.md The entry point: an overview of the investigation, the folklore claim it tested, and a guide to the directory structure of the corpus.