Architecture

zkmcu ships as a family of parallel crates: three verifier library crates, one per supported proof system, plus shared infrastructure for test vectors, firmware bench runs, and the deterministic-timing allocator experiments.

Library crates

Crate	Role
`zkmcu-verifier`	`no_std` Groth16 verify on BN254, EIP-197 wire format. Backed by `substrate-bn`.
`zkmcu-verifier-bls12`	`no_std` Groth16 verify on BLS12-381, EIP-2537 wire format. Backed by zkcrypto `bls12_381`.
`zkmcu-verifier-stark`	`no_std` STARK verify on Goldilocks + Blake3, winterfell 0.13 wire format. Ships with a Fibonacci AIR reference; user AIRs wire in via the same `Air` trait winterfell uses.
`zkmcu-vectors`	`no_std` loader for committed binary test vectors via `include_bytes!`. All three proof systems, plus a real Semaphore vector.
`zkmcu-bump-alloc`	`no_std` bump-style `GlobalAlloc` with watermark save/restore. Used for the deterministic-timing benchmark runs; also publishable as a standalone primitive.
`zkmcu-host-gen`	Host-only CLI. Uses `arkworks` to generate Groth16 proofs on either curve (with cross-check via the second independent implementation), uses `winter-prover` for STARK proofs, imports the vendored Semaphore VK + externally-generated proofs.

The three verifier crates share the same API shape. The two Groth16 verifiers expose a five-function form (parse_vk, parse_proof, parse_public, verify, verify_bytes). The STARK verifier is a bit different because there’s no VK (the AIR is the invariant), so it collapses to a single verify(proof, public) -> Result<()> function, but the rest looks the same.

No generic trait unifying them though. I kept the implementations independent on purpose, so you can read each crate top to bottom without chasing generic traits across six layers of abstraction.

Firmware crates

Crate	Target	Curve / System
`bench-rp2350-m33`	ARM Cortex-M33, `thumbv8m.main-none-eabihf`	BN254 Groth16
`bench-rp2350-m33-bls12`	same	BLS12-381 Groth16
`bench-rp2350-m33-stark`	same	Winterfell STARK
`bench-rp2350-rv32`	RISC-V Hazard3, `riscv32imac-unknown-none-elf`	BN254 Groth16
`bench-rp2350-rv32-bls12`	same	BLS12-381 Groth16
`bench-rp2350-rv32-stark`	same	Winterfell STARK

All six import the verifier + vectors crates. The crypto source is byte-for-byte identical between the M33 and RV32 variants of the same system. Only entry macros, linker scripts, and cycle-counter reads change. That’s the portability claim this whole layout is designed to prove out.

Verification algorithm (Groth16 paths)

Given a verifying key (α, β, γ, δ, IC), a proof (A, B, C), and public inputs x[0..n]:

Compute vk_x = IC[0] + Σ x[i] · IC[i+1] in G1
Check that the product e(-A, B) · e(α, β) · e(vk_x, γ) · e(C, δ) equals the identity in the target group Gt

For BN254, the four pairings go through substrate_bn::pairing_batch, one multi_miller_loop + one final exponentiation shared across the four pairs.

For BLS12-381, same algebraic shape, through pairing::MultiMillerLoop::multi_miller_loop on zkcrypto’s bls12_381::Bls12.

Same algorithm, different backends. The API surface you write against in both cases is identical.
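
To make the algebra of that check concrete, here is a minimal Rust sketch of a toy model. It is not the zkmcu code path and it is not secure. It assumes every group element is stored as its discrete log modulo a small prime, so a "pairing" collapses to a multiplication in the exponent and the Gt identity becomes 0. The names Vk, Proof, compute_vk_x, verify and forge_c are hypothetical. The point is only the shape: accumulate all four pairs, then compare once against the identity.

```rust
// Toy model only: group elements are stored as their discrete logs mod P,
// so e(g1, g2) becomes g1 * g2 and the Gt identity becomes 0.
// Nothing here is cryptographically secure.
const P: u128 = (1u128 << 61) - 1; // Mersenne prime 2^61 - 1

fn add(a: u128, b: u128) -> u128 { (a % P + b % P) % P }
fn sub(a: u128, b: u128) -> u128 { (a % P + P - b % P) % P }
fn mul(a: u128, b: u128) -> u128 { (a % P) * (b % P) % P }

// Modular inverse via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u128) -> u128 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u128);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

pub struct Vk { pub alpha: u128, pub beta: u128, pub gamma: u128, pub delta: u128, pub ic: Vec<u128> }
pub struct Proof { pub a: u128, pub b: u128, pub c: u128 }

// vk_x = IC[0] + sum of x[i] * IC[i+1]; None if the input count is wrong.
fn compute_vk_x(vk: &Vk, public: &[u128]) -> Option<u128> {
    if public.len() + 1 != vk.ic.len() { return None; }
    Some(public.iter().zip(&vk.ic[1..])
        .fold(vk.ic[0], |acc, (x, ic)| add(acc, mul(*x, *ic))))
}

pub fn verify(vk: &Vk, proof: &Proof, public: &[u128]) -> bool {
    let vk_x = match compute_vk_x(vk, public) {
        Some(v) => v,
        None => return false,
    };
    let pairs = [
        (sub(0, proof.a), proof.b), // e(-A, B)
        (vk.alpha, vk.beta),        // e(alpha, beta)
        (vk_x, vk.gamma),           // e(vk_x, gamma)
        (proof.c, vk.delta),        // e(C, delta)
    ];
    // Accumulate all four pairs, then do a single comparison with the identity.
    pairs.iter().fold(0, |acc, &(g1, g2)| add(acc, mul(g1, g2))) == 0
}

// Test helper: solve the check for C given A and B. This only works in the
// toy model, where discrete logs are known.
pub fn forge_c(vk: &Vk, a: u128, b: u128, public: &[u128]) -> Option<u128> {
    let vk_x = compute_vk_x(vk, public)?;
    let rhs = sub(sub(mul(a, b), mul(vk.alpha, vk.beta)), mul(vk_x, vk.gamma));
    Some(mul(rhs, inv(vk.delta)))
}
```

In the real crates the accumulation step is the shared Miller loop and the single comparison follows the one final exponentiation. Changing any public input shifts vk_x, so the accumulated value no longer lands on the identity.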

Verification algorithm (STARK path)

Different shape entirely. There is no VK in the Groth16 sense; the AIR definition (transition constraints, boundary assertions, trace width) is the verifier-side invariant. Given:

A winterfell::Proof (FRI-based, Merkle-committed, Blake3-hashed)
PublicInputs for the AIR (Fibonacci’s is just the claimed result)

Verify does:

Parse the proof into winterfell’s structured form
Verify the trace commitment Merkle tree
Verify the constraint composition polynomial commitment
Verify the FRI folding chain (13 layers at blowup 8, trace length 1024)
Check that the queries hit consistent values across all of the above
Run the final DEEP composition consistency check at the out-of-domain point
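
The FRI folding step in that list can be sketched on its own. The Rust below is a minimal illustration over the Goldilocks prime (2^64 - 2^32 + 1), assuming a folding factor of 2; the folding factor and every function name here are my assumptions, not something read out of winterfell. It shows the one consistency relation a verifier checks per query and per layer: the values f(x) and f(-x) from one layer must combine into the next layer's value at x^2. As a side note, 1024 × 8 = 8192 = 2^13, so the 13 layers quoted above would be consistent with halving the domain down to a single point, but that reading is also an assumption.

```rust
// Goldilocks prime: 2^64 - 2^32 + 1.
const P: u128 = 0xFFFF_FFFF_0000_0001;

fn add(a: u128, b: u128) -> u128 { (a % P + b % P) % P }
fn sub(a: u128, b: u128) -> u128 { (a % P + P - b % P) % P }
fn mul(a: u128, b: u128) -> u128 { (a % P) * (b % P) % P }

// Modular inverse via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u128) -> u128 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u128);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

// Evaluate a polynomial (coefficients low to high) at x using Horner's rule.
pub fn eval(coeffs: &[u128], x: u128) -> u128 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

// Verifier side: fold the queried pair f(x), f(-x) into the value the next
// layer must hold at x^2, using the random challenge beta.
pub fn fold_pair(x: u128, fx: u128, f_neg_x: u128, beta: u128) -> u128 {
    let even = mul(add(fx, f_neg_x), inv(2));         // f_even(x^2)
    let odd = mul(sub(fx, f_neg_x), inv(mul(2, x)));  // f_odd(x^2)
    add(even, mul(beta, odd))
}

// Prover side: the next layer's polynomial is f_even + beta * f_odd,
// which halves the number of coefficients.
pub fn fold_coeffs(coeffs: &[u128], beta: u128) -> Vec<u128> {
    coeffs
        .chunks(2)
        .map(|c| add(c[0], mul(beta, c.get(1).copied().unwrap_or(0))))
        .collect()
}
```

An honest pair folds to exactly the next layer's evaluation at x^2. A tampered value breaks the equality for all but a negligible fraction of challenges, which is what the verifier relies on layer after layer.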

Winterfell wraps all of this behind winterfell::verify::<Air, HashFn, RandomCoin, VectorCommitment>(proof, pub_inputs, acceptable_options). zkmcu-verifier-stark is a thin wrapper that pins the hash (Blake3-256), vector commitment (binary Merkle tree), and minimum-security threshold (95-bit conjectured) for a specific AIR.

Cross-library consistency

Every committed test vector must be accepted by at least two independent crypto stacks before its bytes are trusted. If any of them disagree, the tooling aborts instead of writing broken vectors to disk:

BN254 path:

Vector is generated by ark-groth16 0.5 on the host
Natively verified by arkworks before being written to disk
Re-verified on the embedded side by zkmcu-verifier (which uses substrate-bn)

BLS12-381 path:

Vector is generated by ark-groth16 on ark-bls12-381
zkmcu-host-gen serialises to EIP-2537 bytes and immediately cross-checks by parsing those bytes with zkcrypto bls12_381 and running a hand-rolled Groth16 pairing check. If arkworks and zkcrypto disagree, the tool aborts before writing the .bin files.
On the embedded side, zkmcu-verifier-bls12 re-verifies the same bytes.

Semaphore path (real-world BN254 vector):

VK extracted from the vendored Semaphore 4.14.2 trusted setup
Proof generated by snarkjs via @semaphore-protocol/proof under the production snark-artifacts
Verified back through zkmcu-verifier on host, then again on-device through the firmware

STARK path:

Vector is produced by winter-prover from a host-side Fibonacci trace, under fixed ProofOptions with FieldExtension::Quadratic
zkmcu-host-gen runs zkmcu_verifier_stark::fibonacci::verify on the proof bytes with MinConjecturedSecurity(95) before writing them to disk. If the prover’s configured options don’t meet 95-bit, generation aborts, so the committed bytes are guaranteed to pass the production verifier.
On the embedded side, zkmcu-verifier-stark re-verifies the same bytes.

If all the stacks agree, wire format, arithmetic, and security parameters are consistent across implementations. This is the main safety net against a silent encoding drift slipping into a committed vector.
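
The abort-before-write rule boils down to a small gate. Here is a minimal Rust sketch of that pattern, with the verifiers modelled as plain closures over the candidate bytes. The names GateError and commit_if_all_agree are hypothetical and this is not the zkmcu-host-gen code; in the real tool the closures would be the arkworks, zkcrypto or winterfell checks described above.

```rust
use std::io::Write;

#[derive(Debug, PartialEq)]
pub enum GateError {
    /// The verifier at this index rejected the candidate bytes.
    Rejected(usize),
    /// The bytes passed every check but could not be written.
    Io,
}

/// Write `bytes` to `sink` only if every verifier accepts them.
/// Nothing is written when any verifier disagrees.
pub fn commit_if_all_agree<W: Write>(
    bytes: &[u8],
    verifiers: &[&dyn Fn(&[u8]) -> bool],
    sink: &mut W,
) -> Result<(), GateError> {
    for (index, verifier) in verifiers.iter().enumerate() {
        if !verifier(bytes) {
            return Err(GateError::Rejected(index));
        }
    }
    sink.write_all(bytes).map_err(|_| GateError::Io)
}
```

Running every check before the first write is the design choice that matters: a rejected vector never reaches disk, so a stale or half-written .bin cannot be committed by accident.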

Why `substrate-bn` (BN254)

Already no_std-compatible out of the box
Pure Rust, no C or assembly dependencies to cross-compile
Ethereum-compatible, used in production Ethereum client implementations
Smaller binary footprint than ark-bn254 for a single-curve verifier

Alternative I considered: ark-bn254. Works fine of course, but it pulls in the whole arkworks generic-programming circus. Heavier compile times, bigger binary for a narrow verify-only use case. substrate-bn is the leaner pick, no regrets.

Why zkcrypto `bls12_381` (BLS12-381)

no_std-clean out of the gate with default-features = false, features = ["groups", "pairings", "alloc"]
All-zkcrypto dep closure (ff, group, pairing, subtle, rand_core). No getrandom, no std leakage
Stack-allocated G2Prepared keeps pairing-workspace heap usage down vs heap-allocated Fq12 polynomial workspaces
Scalar-oblivious G1 scalar mul, informally constant-time with respect to scalar Hamming weight (0.09 % variance across random scalars)

I did a dep-fit spike first, documented at research/notebook/2026-04-22-bls12-381-dep-fit.md in the repo. First build, zero warnings, zero patches needed to get it no_std-clean. Honestly amazing for a crypto crate.

Why winterfell (STARK)

no_std out of the gate with default-features = false. concurrent / rayon / async all gated behind non-default features that stay off
Builds clean on both thumbv8m.main-none-eabihf and riscv32imac-unknown-none-elf with zero patches. Dep-fit confirmed in research/notebook/2026-04-23-stark-prior-art.md
Separate winter-prover (std) and winter-verifier (no_std) halves let me pull only the verify side into firmware, so prover-only code stays out of the flash budget
Pure-Rust Blake3 fallback on both embedded targets, no SIMD cross-compile wrangling

Alternatives I looked at: ax-stark (too early, not no_std-ready), stone (C++ only, doesn’t help Rust), rolling my own (absolutely not for phase 3). Winterfell’s Fibonacci reference was the shortest path to a measurable first number so that’s the one I went with.

Why a custom bump allocator

zkmcu-bump-alloc exists because of a specific finding documented on the Deterministic timing page. Winterfell’s verify path allocates ~400 Vecs internally, and a stock general-purpose allocator’s free-list evolution introduces enough timing jitter (~0.25 % on Cortex-M33, ~0.46 % on Hazard3) to obscure the underlying crypto’s own timing determinism, which is a problem if you care about side channels.

The bump allocator gives byte-identical allocator state between iterations (reset the watermark after every verify), which brings variance down to the 0.08 % silicon noise floor. It’s three things at once:

a benchmark tool: it confirms the crypto itself is deterministic
a diagnostic: it isolates which fraction of total variance is allocator-induced
a generally useful no_std primitive (atomic CAS bump pointer, in-place realloc on top of the bump, watermark save/restore, ~200 lines)

Not a production allocator, of course. dealloc is a no-op, so memory leaks up to the watermark on every iteration, which pushes the STARK bench firmware’s heap to 384 KB. Production firmware should use embedded_alloc::TlsfHeap instead; that’s the same allocator that gives you deterministic timing at the 128 KB tier.
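
The watermark idea fits in a few lines. This is a minimal Rust sketch, not the crate's code: it hands out offsets into a fixed-capacity arena instead of implementing GlobalAlloc, it leaves out in-place realloc, and the names BumpArena, watermark and reset_to are hypothetical. What it does show is the CAS bump pointer, the no-op dealloc, and why resetting to a saved watermark makes every iteration see the same allocator state.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Offset-only bump arena: tracks where the next allocation would start.
pub struct BumpArena {
    capacity: usize,
    next: AtomicUsize,
}

impl BumpArena {
    pub const fn new(capacity: usize) -> Self {
        Self { capacity, next: AtomicUsize::new(0) }
    }

    /// Reserve `size` bytes aligned to `align` (a power of two).
    /// Returns the start offset, or None if the arena is exhausted.
    pub fn alloc(&self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the bump pointer up to the requested alignment.
            let start = current.checked_add(align - 1)? & !(align - 1);
            let end = start.checked_add(size)?;
            if end > self.capacity {
                return None;
            }
            // Publish the new bump pointer; retry if another caller raced us.
            match self.next.compare_exchange_weak(
                current, end, Ordering::AcqRel, Ordering::Relaxed,
            ) {
                Ok(_) => return Some(start),
                Err(observed) => current = observed,
            }
        }
    }

    /// Freeing is a no-op: space is only reclaimed by `reset_to`.
    pub fn dealloc(&self, _offset: usize, _size: usize) {}

    /// Save the current bump pointer.
    pub fn watermark(&self) -> usize {
        self.next.load(Ordering::Acquire)
    }

    /// Restore a previously saved bump pointer, discarding later allocations.
    pub fn reset_to(&self, mark: usize) {
        self.next.store(mark, Ordering::Release);
    }
}
```

Saving the watermark before a verify call and restoring it afterwards means the same sequence of allocation requests yields the same sequence of offsets on every iteration, which is the property the benchmark runs depend on.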

Workspace layout at the repo root

crates/
├── zkmcu-verifier/           ← no_std BN254 verify
├── zkmcu-verifier-bls12/     ← no_std BLS12-381 verify
├── zkmcu-verifier-stark/     ← no_std winterfell STARK verify
├── zkmcu-verifier-plonky3/   ← no_std Plonky3 STARK verify (PQ-Semaphore)
├── zkmcu-poseidon-audit/     ← in-tree audit of Poseidon2-BabyBear constants
├── zkmcu-vectors/            ← no_std test-vector loader (all systems)
├── zkmcu-host-gen/           ← host-only arkworks + winter-prover CLI
├── zkmcu-bump-alloc/         ← no_std bump allocator with watermark reset
└── bench-rp2350-{m33,rv32}-* ← per-ISA, per-proof-system firmware crates
benchmarks/runs/<date>-<slug>/    ← raw.log + result.toml per run
research/                     ← Typst sources → PDFs (whitepaper, survey, reports)
scripts/gen-semaphore-proof/  ← Bun + TS tool that produces the real Semaphore proof
vendor/                       ← submodules (Plonky3, substrate-bn fork, winterfell, semaphore)

The docs site lives in a sibling repo at ../zkmcu-web/, with separate git history and a separate deploy.