Architecture
zkmcu ships as a family of parallel crates: three verifier library crates, one per supported proof system, plus shared infrastructure for test vectors, firmware bench runs, and the deterministic-timing allocator experiments.
Library crates
| Crate | Role |
|---|---|
| zkmcu-verifier | no_std Groth16 verify on BN254, EIP-197 wire format. Backed by substrate-bn. |
| zkmcu-verifier-bls12 | no_std Groth16 verify on BLS12-381, EIP-2537 wire format. Backed by zkcrypto bls12_381. |
| zkmcu-verifier-stark | no_std STARK verify on Goldilocks + Blake3, winterfell 0.13 wire format. Ships with a Fibonacci AIR reference; user AIRs wire in via the same Air trait winterfell uses. |
| zkmcu-vectors | no_std loader for committed binary test vectors via include_bytes!. All three proof systems, plus a real Semaphore vector. |
| zkmcu-bump-alloc | no_std bump-style GlobalAlloc with watermark save/restore. Used for the deterministic-timing benchmark runs; also publishable as a standalone primitive. |
| zkmcu-host-gen | Host-only CLI. Uses arkworks to generate Groth16 proofs on either curve (with cross-check via the second independent implementation), uses winter-prover for STARK proofs, imports the vendored Semaphore VK + externally-generated proofs. |
The three verifier crates share the same API shape. The Groth16 verifiers expose a five-function form (parse_vk, parse_proof, parse_public, verify, verify_bytes). STARK is a bit different because there’s no VK (the AIR is the invariant), but the rest looks the same: one verify(proof, public) -> Result<()> function.
No generic trait unifying them though. I kept the implementations independent on purpose, so you can read each crate top to bottom without chasing generic traits across six layers of abstraction.
Firmware crates
| Crate | Target | Curve / System |
|---|---|---|
| bench-rp2350-m33 | ARM Cortex-M33, thumbv8m.main-none-eabihf | BN254 Groth16 |
| bench-rp2350-m33-bls12 | same | BLS12-381 Groth16 |
| bench-rp2350-m33-stark | same | Winterfell STARK |
| bench-rp2350-rv32 | RISC-V Hazard3, riscv32imac-unknown-none-elf | BN254 Groth16 |
| bench-rp2350-rv32-bls12 | same | BLS12-381 Groth16 |
| bench-rp2350-rv32-stark | same | Winterfell STARK |
All six import the verifier + vectors crates. The crypto source is byte-for-byte identical between the M33 and RV32 variants of the same system. Only entry macros, linker scripts, and cycle-counter reads change. That’s the portability claim this whole layout is designed to prove out.
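As a rough illustration of how small the per-target surface can be, the sketch below hides a cycle-counter read behind `cfg` gates while the measurement wrapper stays shared. It is not the firmware's actual code: the names `read_cycles`, `elapsed_cycles`, and `measure` are hypothetical, the Cortex-M33 arm assumes the DWT cycle counter has already been enabled, the Hazard3 arm assumes machine mode with `mcycle` running, and the host arm is only a stand-in so the snippet builds off-target.

```rust
/// Cortex-M33: read DWT CYCCNT (assumes the counter was enabled during init).
#[cfg(target_arch = "arm")]
fn read_cycles() -> u32 {
    const DWT_CYCCNT: *const u32 = 0xE000_1004 as *const u32;
    // SAFETY: volatile read of a memory-mapped system register, no side effects.
    unsafe { core::ptr::read_volatile(DWT_CYCCNT) }
}

/// RISC-V: read the low 32 bits of the machine cycle counter CSR.
#[cfg(target_arch = "riscv32")]
fn read_cycles() -> u32 {
    let cycles: u32;
    // SAFETY: CSR read only; requires machine mode.
    unsafe { core::arch::asm!("csrr {0}, mcycle", out(reg) cycles) };
    cycles
}

/// Host stand-in so the sketch builds off-target; the value is not a cycle count.
#[cfg(not(any(target_arch = "arm", target_arch = "riscv32")))]
fn read_cycles() -> u32 {
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0)
}

/// The counters are 32-bit and wrap, so subtract with wrapping arithmetic.
fn elapsed_cycles(start: u32, end: u32) -> u32 {
    end.wrapping_sub(start)
}

/// Target-independent wrapper: everything passed in as `work` is shared code.
fn measure<T>(work: impl FnOnce() -> T) -> (T, u32) {
    let start = read_cycles();
    let out = work();
    let end = read_cycles();
    (out, elapsed_cycles(start, end))
}
```

Only `read_cycles` differs per target in this sketch; a call such as `measure(|| verify(...))` would be identical in the M33 and RV32 firmware.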
Verification algorithm (Groth16 paths)
Given a verifying key (α, β, γ, δ, IC), a proof (A, B, C), and public inputs x[0..n]:
- Compute `vk_x = IC[0] + Σ x[i] · IC[i+1]` in `G1`
- Check that the product `e(-A, B) · e(α, β) · e(vk_x, γ) · e(C, δ)` equals the identity in the target group `Gt`
For BN254, the four pairings go through substrate_bn::pairing_batch, one multi_miller_loop + one final exponentiation shared across the four pairs.
For BLS12-381, same algebraic shape, through pairing::MultiMillerLoop::multi_miller_loop on zkcrypto’s bls12_381::Bls12.
Same algorithm, different backends. The API surface you write against in both cases is identical.
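The two steps of the Groth16 check can be traced end to end in a toy model where every group element is replaced by its discrete logarithm modulo a small prime. A pairing e(P, Q) then becomes a multiplication, and the product in Gt becomes a sum that must equal zero. This is only a sketch for following the algebra: it has no cryptographic meaning, and the names (`Vk`, `Proof`, `compute_vk_x`, `verify`) and the modulus are made up for illustration, not taken from either verifier crate.

```rust
/// Toy group order. Elements are stored as discrete logs mod R, so this
/// model offers no security whatsoever.
const R: u64 = 101;

fn add(a: u64, b: u64) -> u64 { (a % R + b % R) % R }
fn mul(a: u64, b: u64) -> u64 { (a % R) * (b % R) % R }
fn neg(a: u64) -> u64 { (R - a % R) % R }

struct Vk { alpha: u64, beta: u64, gamma: u64, delta: u64, ic: Vec<u64> }
struct Proof { a: u64, b: u64, c: u64 }

/// Step 1: vk_x = IC[0] + sum over i of x[i] * IC[i+1].
/// Returns None when the number of public inputs does not match the key.
fn compute_vk_x(vk: &Vk, x: &[u64]) -> Option<u64> {
    if vk.ic.len() != x.len() + 1 {
        return None;
    }
    let mut acc = vk.ic[0] % R;
    for (xi, ic) in x.iter().zip(&vk.ic[1..]) {
        acc = add(acc, mul(*xi, *ic));
    }
    Some(acc)
}

/// Step 2: e(-A, B) * e(alpha, beta) * e(vk_x, gamma) * e(C, delta) == identity.
/// In the toy model each pairing is `mul` and the identity of Gt is 0.
fn verify(vk: &Vk, proof: &Proof, x: &[u64]) -> bool {
    let vk_x = match compute_vk_x(vk, x) {
        Some(v) => v,
        None => return false,
    };
    let pairings = [
        mul(neg(proof.a), proof.b),
        mul(vk.alpha, vk.beta),
        mul(vk_x, vk.gamma),
        mul(proof.c, vk.delta),
    ];
    pairings.iter().fold(0, |acc, p| add(acc, *p)) == 0
}
```

With α = 3, β = 5, γ = 7, δ = 11, IC = [2, 4, 6] and x = [1, 2], `vk_x` is 2 + 1·4 + 2·6 = 18, and the toy proof (A, B, C) = (9, 13, 7) satisfies the check. Changing any public input breaks it.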
Verification algorithm (STARK path)
Different shape entirely. There is no VK in the Groth16 sense; the AIR definition (transition constraints, boundary assertions, trace width) is the verifier-side invariant. Given:
- A `winterfell::Proof` (FRI-based, Merkle-committed, Blake3-hashed)
- `PublicInputs` for the AIR (Fibonacci’s is just the claimed result)
Verify does:
- Parse the proof into winterfell’s structured form
- Verify the trace commitment Merkle tree
- Verify the constraint composition polynomial commitment
- Verify the FRI folding chain (13 layers at blowup 8, trace length 1024)
- Check the queries hit consistent values across all the above
- Final DEEP composition consistency check at the out-of-domain point
Winterfell wraps all of this behind winterfell::verify::<Air, HashFn, RandomCoin, VectorCommitment>(proof, pub_inputs, acceptable_options). zkmcu-verifier-stark is a thin wrapper that pins the hash (Blake3-256), vector commitment (binary Merkle tree), and minimum-security threshold (95-bit conjectured) for a specific AIR.
Cross-library consistency
Every committed test vector gets at least two independent crypto stacks to agree on it before the bytes are trusted. If any of them disagree, the tooling aborts instead of writing broken vectors to disk:
BN254 path:
- Vector is generated by `ark-groth16` 0.5 on the host
- Natively verified by `arkworks` before being written to disk
- Re-verified on the embedded side by `zkmcu-verifier` (which uses `substrate-bn`)
BLS12-381 path:
- Vector is generated by `ark-groth16` on `ark-bls12-381`
- `zkmcu-host-gen` serialises to EIP-2537 bytes and immediately cross-checks by parsing those bytes with zkcrypto `bls12_381` and running a hand-rolled Groth16 pairing check. If arkworks and zkcrypto disagree, the tool aborts before writing the `.bin` files.
- On the embedded side, `zkmcu-verifier-bls12` re-verifies the same bytes.
Semaphore path (real-world BN254 vector):
- VK extracted from the vendored Semaphore 4.14.2 trusted setup
- Proof generated by snarkjs via `@semaphore-protocol/proof` under the production snark-artifacts
- Verified back through `zkmcu-verifier` on host, then again on-device through the firmware
STARK path:
- Vector is produced by `winter-prover` from a host-side Fibonacci trace, under fixed `ProofOptions` with `FieldExtension::Quadratic`
- `zkmcu-host-gen` runs `zkmcu_verifier_stark::fibonacci::verify` on the proof bytes with `MinConjecturedSecurity(95)` before writing them to disk. If the prover’s configured options don’t meet 95-bit, generation aborts, so the committed bytes are guaranteed to pass the production verifier.
- On the embedded side, `zkmcu-verifier-stark` re-verifies the same bytes.
If all the stacks agree, wire format, arithmetic, and security parameters are consistent across implementations. This is the main safety net against a silent encoding drift slipping into a committed vector.
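The "agree or abort" rule shared by all four paths can be sketched as a small gate that only commits bytes once every stack has accepted them. This is a hypothetical stand-in, not the `zkmcu-host-gen` implementation: `Verifier`, `GateError`, and `commit_if_all_agree` are invented names, the verifiers are plain function pointers, and a `Vec<u8>` plays the role of the file on disk.

```rust
/// A named verifier stack: takes the serialized vector, returns accept or reject.
type Verifier<'a> = (&'a str, fn(&[u8]) -> bool);

#[derive(Debug, PartialEq)]
enum GateError {
    /// Fewer than two independent stacks were supplied.
    TooFewStacks,
    /// The named stack rejected the bytes.
    Rejected(String),
}

/// Writes `bytes` to `sink` only if at least two stacks are given and all accept.
/// On any rejection nothing is written, mirroring "abort before writing".
fn commit_if_all_agree(
    bytes: &[u8],
    stacks: &[Verifier],
    sink: &mut Vec<u8>,
) -> Result<(), GateError> {
    if stacks.len() < 2 {
        return Err(GateError::TooFewStacks);
    }
    for &(name, verify) in stacks {
        if !verify(bytes) {
            return Err(GateError::Rejected(name.to_string()));
        }
    }
    sink.extend_from_slice(bytes);
    Ok(())
}
```

The design point is the ordering: every verifier runs before the first byte is written, so a disagreement can never leave a half-trusted vector behind.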
Why substrate-bn (BN254)
- Already `no_std`-compatible out of the box
- Pure Rust, no C or assembly dependencies to cross-compile
- Ethereum-compatible, used in production Ethereum client implementations
- Smaller binary footprint than `ark-bn254` for a single-curve verifier
Alternative I considered: ark-bn254. Works fine of course, but it pulls in the whole arkworks generic-programming circus. Heavier compile times, bigger binary for a narrow verify-only use case. substrate-bn is the leaner pick, no regrets.
Why zkcrypto bls12_381 (BLS12-381)
- `no_std`-clean out of the gate with `default-features = false, features = ["groups", "pairings", "alloc"]`
- All-zkcrypto dep closure (`ff`, `group`, `pairing`, `subtle`, `rand_core`). No `getrandom`, no `std` leakage
- Stack-allocated `G2Prepared` keeps pairing-workspace heap usage down vs heap-allocated Fq12 polynomial workspaces
- Scalar-oblivious G1 scalar mul, informally constant-time with respect to scalar Hamming weight (0.09 % variance across random scalars)
I did a dep-fit spike first, documented at research/notebook/2026-04-22-bls12-381-dep-fit.md in the repo. First build, zero warnings, zero patches needed to get it no_std-clean. Honestly amazing for a crypto crate.
Why winterfell (STARK)
- `no_std` out of the gate with `default-features = false`. `concurrent`/`rayon`/`async` all gated behind non-default features that stay off
- Builds clean on both `thumbv8m.main-none-eabihf` and `riscv32imac-unknown-none-elf` with zero patches. Dep-fit confirmed in `research/notebook/2026-04-23-stark-prior-art.md`
- Separate `winter-prover` (std) and `winter-verifier` (no_std) halves let me pull only the verify side into firmware, so prover-only code stays out of the flash budget
- Pure-Rust Blake3 fallback on both embedded targets, no SIMD cross-compile wrangling
Alternatives I looked at: ax-stark (too early, not no_std-ready), stone (C++ only, doesn’t help Rust), rolling my own (absolutely not for phase 3). Winterfell’s Fibonacci reference was the shortest path to a measurable first number so that’s the one I went with.
Why a custom bump allocator
zkmcu-bump-alloc exists because of a specific finding documented on Deterministic timing. Winterfell’s verify path allocates ~400 Vecs internally, and a stock general-purpose allocator’s free-list evolution introduces enough timing jitter (~0.25 % on Cortex-M33, ~0.46 % on Hazard3) to obscure the underlying crypto’s own timing determinism. Which is a problem if you care about side channels.
The bump allocator gives byte-identical allocator state between iterations (reset the watermark after every verify), which brings variance down to the 0.08 % silicon noise floor. It’s three things at once:
- a benchmark tool, confirms the crypto itself is deterministic
- a diagnostic, isolates which fraction of total variance is allocator-induced
- a generally useful `no_std` primitive (atomic CAS bump pointer, in-place realloc on top of the bump, watermark save/restore, ~200 lines)
Not a production allocator of course: no-op dealloc means memory leaks up to the watermark every iteration, which pushes the STARK bench firmware’s heap to 384 KB. Production firmware should use embedded_alloc::TlsfHeap instead; that’s the same allocator that gives you deterministic timing at the 128 KB tier.
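To make the watermark idea concrete, here is a minimal sketch of a bump allocator with an atomic CAS bump pointer, a no-op `dealloc`, and save/restore of the watermark. It is not the `zkmcu-bump-alloc` source: the type `Bump` and the methods `watermark` and `reset_to` are hypothetical names, the arena is a fixed-size array, and the in-place realloc mentioned above is left out to keep the sketch short.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical fixed-size bump arena; `next` is the offset of the first free byte.
pub struct Bump<const N: usize> {
    heap: UnsafeCell<[u8; N]>,
    next: AtomicUsize,
}

// SAFETY: the only shared mutable state is `next`, which is updated atomically,
// and every successful CAS hands out a disjoint byte range of `heap`.
unsafe impl<const N: usize> Sync for Bump<N> {}

impl<const N: usize> Bump<N> {
    pub const fn new() -> Self {
        Self { heap: UnsafeCell::new([0; N]), next: AtomicUsize::new(0) }
    }

    /// Save the current watermark (offset of the first free byte).
    pub fn watermark(&self) -> usize {
        self.next.load(Ordering::Acquire)
    }

    /// Restore a saved watermark.
    ///
    /// # Safety
    /// Nothing allocated after `mark` was taken may still be in use.
    pub unsafe fn reset_to(&self, mark: usize) {
        self.next.store(mark, Ordering::Release);
    }
}

unsafe impl<const N: usize> GlobalAlloc for Bump<N> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.heap.get() as *mut u8;
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Padding needed to round the absolute address up to the alignment.
            let addr = base as usize + cur;
            let pad = addr.wrapping_neg() & (layout.align() - 1);
            let end = match cur.checked_add(pad).and_then(|s| s.checked_add(layout.size())) {
                Some(end) if end <= N => end,
                _ => return ptr::null_mut(), // arena exhausted
            };
            match self.next.compare_exchange_weak(cur, end, Ordering::AcqRel, Ordering::Relaxed) {
                // SAFETY: cur + pad <= end <= N, so the pointer stays inside `heap`.
                Ok(_) => return unsafe { base.add(cur + pad) },
                Err(seen) => cur = seen, // lost the race or spurious failure; retry
            }
        }
    }

    /// No-op: memory is only reclaimed by resetting the watermark.
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}
```

A benchmark loop under these assumptions would take `let mark = heap.watermark()` once, run one verify, then call `heap.reset_to(mark)`, so each iteration starts from the same allocator state and receives the same addresses.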
Workspace layout at the repo root
bindings/
├── crates/
│   ├── zkmcu-verifier/ ← no_std BN254 verify
│   ├── zkmcu-verifier-bls12/ ← no_std BLS12-381 verify
│   ├── zkmcu-verifier-stark/ ← no_std winterfell STARK verify
│   ├── zkmcu-vectors/ ← no_std test-vector loader (all systems)
│   ├── zkmcu-host-gen/ ← host-only arkworks + winter-prover CLI
│   ├── zkmcu-bump-alloc/ ← no_std bump allocator with watermark reset
│   ├── bench-rp2350-m33/ ← Cortex-M33 firmware, BN254
│   ├── bench-rp2350-m33-bls12/ ← Cortex-M33 firmware, BLS12-381
│   ├── bench-rp2350-m33-stark/ ← Cortex-M33 firmware, STARK
│   ├── bench-rp2350-rv32/ ← Hazard3 firmware, BN254
│   ├── bench-rp2350-rv32-bls12/ ← Hazard3 firmware, BLS12-381
│   └── bench-rp2350-rv32-stark/ ← Hazard3 firmware, STARK
├── benchmarks/runs/<date>-<slug>/ ← raw.log + result.toml per run
├── research/ ← Typst sources → PDFs (whitepaper, survey, reports)
├── scripts/gen-semaphore-proof/ ← Bun + TS tool that produces the real Semaphore proof
├── vendor/semaphore/ ← submodule with the upstream Semaphore source
└── web/ ← this site