Architecture

zkmcu ships as a family of parallel crates: three library crates, one per supported proof system, plus shared infrastructure for test vectors, firmware bench runs, and the deterministic-timing allocator experiments.

| Crate | Role |
| --- | --- |
| zkmcu-verifier | no_std Groth16 verify on BN254, EIP-197 wire format. Backed by substrate-bn. |
| zkmcu-verifier-bls12 | no_std Groth16 verify on BLS12-381, EIP-2537 wire format. Backed by zkcrypto bls12_381. |
| zkmcu-verifier-stark | no_std STARK verify on Goldilocks + Blake3, winterfell 0.13 wire format. Ships with a Fibonacci AIR reference; user AIRs wire in via the same Air trait winterfell uses. |
| zkmcu-vectors | no_std loader for committed binary test vectors via include_bytes!. All three proof systems, plus a real Semaphore vector. |
| zkmcu-bump-alloc | no_std bump-style GlobalAlloc with watermark save/restore. Used for the deterministic-timing benchmark runs; also publishable as a standalone primitive. |
| zkmcu-host-gen | Host-only CLI. Uses arkworks to generate Groth16 proofs on either curve (with cross-check via the second independent implementation), uses winter-prover for STARK proofs, imports the vendored Semaphore VK + externally-generated proofs. |

The three verifier crates share the same API shape. The Groth16 verifiers expose a five-function form (parse_vk, parse_proof, parse_public, verify, verify_bytes). STARK is a bit different because there’s no VK (the AIR is the invariant), but the rest looks the same: a single verify(proof, public) -> Result<()> function.

No generic trait unifying them though. I kept the implementations independent on purpose, so you can read each crate top to bottom without chasing generic traits across six layers of abstraction.
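To make the parse step concrete, here is a minimal sketch of what a parse_proof for the EIP-197 encoding could look like. EIP-197 encodes a G1 point as two 32-byte big-endian field elements (x, y) and a G2 point as four (imaginary part before real part for each coordinate). Everything else is an assumption for illustration: the type names, the error type, and the A ‖ B ‖ C proof layout of 256 bytes are hypothetical and not taken from zkmcu-verifier.

```rust
/// Raw EIP-197 coordinates, still unvalidated big-endian bytes.
#[derive(Debug, PartialEq)]
pub struct G1Bytes { pub x: [u8; 32], pub y: [u8; 32] }

/// EIP-197 puts the imaginary part of each Fp2 coordinate first.
#[derive(Debug, PartialEq)]
pub struct G2Bytes {
    pub x_im: [u8; 32], pub x_re: [u8; 32],
    pub y_im: [u8; 32], pub y_re: [u8; 32],
}

/// Hypothetical proof container: A in G1, B in G2, C in G1.
#[derive(Debug, PartialEq)]
pub struct ProofBytes { pub a: G1Bytes, pub b: G2Bytes, pub c: G1Bytes }

#[derive(Debug, PartialEq)]
pub enum ParseError { BadLength { expected: usize, got: usize } }

/// Assumed layout: A (64 bytes) ‖ B (128 bytes) ‖ C (64 bytes).
pub const PROOF_LEN: usize = 64 + 128 + 64;

/// Copy the 32-byte word starting at `off`.
fn word(bytes: &[u8], off: usize) -> [u8; 32] {
    let mut out = [0u8; 32];
    out.copy_from_slice(&bytes[off..off + 32]);
    out
}

/// Split a proof into coordinates. A real parser would go on to check that
/// each coordinate is below the field modulus and that the points lie on the
/// curve (and, for G2, in the right subgroup); this sketch stops at framing.
pub fn parse_proof(bytes: &[u8]) -> Result<ProofBytes, ParseError> {
    if bytes.len() != PROOF_LEN {
        return Err(ParseError::BadLength { expected: PROOF_LEN, got: bytes.len() });
    }
    Ok(ProofBytes {
        a: G1Bytes { x: word(bytes, 0), y: word(bytes, 32) },
        b: G2Bytes {
            x_im: word(bytes, 64), x_re: word(bytes, 96),
            y_im: word(bytes, 128), y_re: word(bytes, 160),
        },
        c: G1Bytes { x: word(bytes, 192), y: word(bytes, 224) },
    })
}
```

Rejecting on exact length first keeps every later slice index in bounds, so the function cannot panic on attacker-supplied input.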

| Crate | Target | Curve / System |
| --- | --- | --- |
| bench-rp2350-m33 | ARM Cortex-M33, thumbv8m.main-none-eabihf | BN254 Groth16 |
| bench-rp2350-m33-bls12 | same | BLS12-381 Groth16 |
| bench-rp2350-m33-stark | same | Winterfell STARK |
| bench-rp2350-rv32 | RISC-V Hazard3, riscv32imac-unknown-none-elf | BN254 Groth16 |
| bench-rp2350-rv32-bls12 | same | BLS12-381 Groth16 |
| bench-rp2350-rv32-stark | same | Winterfell STARK |

All six import the verifier + vectors crates. The crypto source is byte-for-byte identical between the M33 and RV32 variants of the same system. Only entry macros, linker scripts, and cycle-counter reads change. That’s the portability claim this whole layout is designed to prove out.
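A rough sketch of what "only the cycle-counter read changes" can look like. The firmware crates here are separate per target, so the cfg-gated single file below is an illustrative assumption, not the repo's layout. The register address (DWT_CYCCNT at 0xE000_1004) and the mcycle CSR are standard for Cortex-M and RISC-V respectively, but both reads assume the counter was enabled earlier in startup code. The function names and the host fallback are hypothetical.

```rust
/// Cortex-M33: read the DWT cycle counter.
/// Assumes trace and the cycle counter were enabled during init.
#[cfg(target_arch = "arm")]
fn cycles() -> u32 {
    const DWT_CYCCNT: *const u32 = 0xE000_1004 as *const u32;
    unsafe { core::ptr::read_volatile(DWT_CYCCNT) }
}

/// Hazard3 (RV32): read the low 32 bits of the machine cycle counter.
/// Assumes the counter is not inhibited.
#[cfg(target_arch = "riscv32")]
fn cycles() -> u32 {
    let c: u32;
    unsafe { core::arch::asm!("csrr {0}, mcycle", out(reg) c) };
    c
}

/// Host fallback so the sketch runs on a desktop: nanoseconds since first
/// call, truncated to 32 bits. Not a cycle count.
#[cfg(not(any(target_arch = "arm", target_arch = "riscv32")))]
fn cycles() -> u32 {
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u32
}

/// Target-independent part: time any closure. wrapping_sub keeps the
/// difference correct across a single 32-bit counter wrap.
fn measure<T>(work: impl FnOnce() -> T) -> (T, u32) {
    let start = cycles();
    let out = work();
    (out, cycles().wrapping_sub(start))
}
```

The closure passed to measure is where the shared verify call would go; nothing inside it needs to know which core it is running on.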

For Groth16, given a verifying key (α, β, γ, δ, IC), a proof (A, B, C), and public inputs x[0..n]:

  1. Compute vk_x = IC[0] + Σ x[i] · IC[i+1] in G1
  2. Check that the product e(-A, B) · e(α, β) · e(vk_x, γ) · e(C, δ) equals the identity in the target group Gt

For BN254, the four pairings go through substrate_bn::pairing_batch, one multi_miller_loop + one final exponentiation shared across the four pairs.

For BLS12-381, same algebraic shape, through pairing::MultiMillerLoop::multi_miller_loop on zkcrypto’s bls12_381::Bls12.

Same algorithm, different backends. The API surface you write against in both cases is identical.
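The two-step check above can be sketched with a toy model that has the same algebraic shape. This is not real cryptography and uses neither backend: each group element is stood in for by its discrete log modulo a small prime, so the "pairing" collapses to a multiplication and the Gt identity is 0 in additive notation. All names (ToyVk, ToyProof, verify) are hypothetical.

```rust
// Toy model only: elements are discrete logs mod R, e(p, q) = p * q mod R,
// and "product of pairings equals identity" becomes "sum equals 0".
const R: u64 = 1_000_000_007;

fn add(a: u64, b: u64) -> u64 { (a % R + b % R) % R }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % R as u128) as u64 }
fn neg(a: u64) -> u64 { (R - a % R) % R }

pub struct ToyVk { pub alpha: u64, pub beta: u64, pub gamma: u64, pub delta: u64, pub ic: Vec<u64> }
pub struct ToyProof { pub a: u64, pub b: u64, pub c: u64 }

pub fn verify(vk: &ToyVk, proof: &ToyProof, public: &[u64]) -> bool {
    // IC needs exactly one more entry than there are public inputs.
    if vk.ic.len() != public.len() + 1 {
        return false;
    }

    // Step 1: vk_x = IC[0] + sum of x[i] * IC[i+1].
    let vk_x = public
        .iter()
        .zip(&vk.ic[1..])
        .fold(vk.ic[0], |acc, (x, ic)| add(acc, mul(*x, *ic)));

    // Step 2: accumulate all four pairs, then compare against the identity
    // once, mirroring the batched multi-pairing structure.
    let pairs = [
        (neg(proof.a), proof.b),
        (vk.alpha, vk.beta),
        (vk_x, vk.gamma),
        (proof.c, vk.delta),
    ];
    pairs.iter().fold(0, |acc, (p, q)| add(acc, mul(*p, *q))) == 0
}
```

With alpha = 2, beta = 3, gamma = 5, delta = 3, IC = [7, 11] and public input x = [4], vk_x is 51, and the proof (A, B, C) = (501, 2, 247) satisfies the equation: -1002 + 6 + 255 + 741 = 0. Changing the public input breaks the sum, which is the whole point of step 1.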

STARK verification has a different shape entirely. There is no VK in the Groth16 sense; the AIR definition (transition constraints, boundary assertions, trace width) is the verifier-side invariant. Given:

  • A winterfell::Proof (FRI-based, Merkle-committed, Blake3-hashed)
  • PublicInputs for the AIR (Fibonacci’s is just the claimed result)

Verify does:

  1. Parse the proof into winterfell’s structured form
  2. Verify the trace commitment Merkle tree
  3. Verify the constraint composition polynomial commitment
  4. Verify the FRI folding chain (13 layers at blowup 8, trace length 1024)
  5. Check the queries hit consistent values across all the above
  6. Final DEEP composition consistency check at the out-of-domain point

Winterfell wraps all of this behind winterfell::verify::<Air, HashFn, RandomCoin, VectorCommitment>(proof, pub_inputs, acceptable_options). zkmcu-verifier-stark is a thin wrapper that pins the hash (Blake3-256), vector commitment (binary Merkle tree), and minimum-security threshold (95-bit conjectured) for a specific AIR.
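The commitment checks in the list above rest on Merkle openings: the verifier recomputes a root from a queried leaf and its sibling hashes, and compares it to the committed root. A minimal sketch of that one idea follows. It is not winterfell's code: the hash is std's DefaultHasher as a stand-in for Blake3 (it is not collision-resistant), digests are 64-bit, winterfell's batched openings are not modelled, and all function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = u64;

// Domain-separate leaves (tag 0) from internal nodes (tag 1).
fn hash_leaf(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    h.write_u8(0);
    h.write(data);
    h.finish()
}

fn hash_node(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write_u8(1);
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Prover side: all tree levels, leaf hashes first, root last.
/// The number of leaves must be a power of two.
fn build_tree(leaves: &[&[u8]]) -> Vec<Vec<Digest>> {
    assert!(leaves.len().is_power_of_two());
    let mut levels = vec![leaves.iter().map(|l| hash_leaf(*l)).collect::<Vec<_>>()];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<Digest> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| hash_node(pair[0], pair[1]))
            .collect();
        levels.push(next);
    }
    levels
}

/// Prover side: sibling digests from the leaf level up to just below the root.
fn open(levels: &[Vec<Digest>], mut index: usize) -> Vec<Digest> {
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        path.push(level[index ^ 1]);
        index >>= 1;
    }
    path
}

/// Verifier side: fold the path back up and compare with the committed root.
/// The low bit of `index` says whether the running hash is a left or right child.
fn verify_path(root: Digest, leaf: &[u8], mut index: usize, path: &[Digest]) -> bool {
    let mut acc = hash_leaf(leaf);
    for sibling in path {
        acc = if index & 1 == 0 { hash_node(acc, *sibling) } else { hash_node(*sibling, acc) };
        index >>= 1;
    }
    acc == root
}
```

The verifier only ever runs verify_path, which needs no heap allocation of its own; build_tree and open are there so the sketch can produce something to check.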

Every committed test vector gets at least two independent crypto stacks to agree on it before the bytes are trusted. If any of them disagree, the tooling aborts instead of writing broken vectors to disk:

BN254 path:

  1. Vector is generated by ark-groth16 0.5 on the host
  2. Natively verified by arkworks before being written to disk
  3. Re-verified on the embedded side by zkmcu-verifier (which uses substrate-bn)

BLS12-381 path:

  1. Vector is generated by ark-groth16 on ark-bls12-381
  2. zkmcu-host-gen serialises to EIP-2537 bytes and immediately cross-checks by parsing those bytes with zkcrypto bls12_381 and running a hand-rolled Groth16 pairing check. If arkworks and zkcrypto disagree, the tool aborts before writing the .bin files.
  3. On the embedded side, zkmcu-verifier-bls12 re-verifies the same bytes.

Semaphore path (real-world BN254 vector):

  1. VK extracted from the vendored Semaphore 4.14.2 trusted setup
  2. Proof generated by snarkjs via @semaphore-protocol/proof under the production snark-artifacts
  3. Verified back through zkmcu-verifier on host, then again on-device through the firmware

STARK path:

  1. Vector is produced by winter-prover from a host-side Fibonacci trace, under fixed ProofOptions with FieldExtension::Quadratic
  2. zkmcu-host-gen runs zkmcu_verifier_stark::fibonacci::verify on the proof bytes with MinConjecturedSecurity(95) before writing them to disk. If the prover’s configured options don’t meet 95-bit, generation aborts, so the committed bytes are guaranteed to pass the production verifier.
  3. On the embedded side, zkmcu-verifier-stark re-verifies the same bytes.

If all the stacks agree, wire format, arithmetic, and security parameters are consistent across implementations. This is the main safety net against a silent encoding drift slipping into a committed vector.
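The control flow shared by all four paths, "every stack must accept before any bytes are written", can be sketched as a gate function. This is a hedged illustration, not zkmcu-host-gen's code: the Verifier alias, GenError, gate_vector, and the two toy stacks are hypothetical stand-ins for the real arkworks, zkcrypto, substrate-bn, and winterfell checks.

```rust
/// Anything that accepts or rejects serialized vector bytes.
pub type Verifier = fn(&[u8]) -> bool;

#[derive(Debug, PartialEq)]
pub enum GenError {
    /// Names the first stack that rejected the bytes.
    Disagreement { stack: &'static str },
}

/// Hand the bytes back only if every stack accepts them. The caller writes
/// to disk only on Ok, so a rejected vector never reaches the repo.
pub fn gate_vector(
    bytes: Vec<u8>,
    stacks: &[(&'static str, Verifier)],
) -> Result<Vec<u8>, GenError> {
    for &(name, verify) in stacks {
        if !verify(&bytes) {
            return Err(GenError::Disagreement { stack: name });
        }
    }
    Ok(bytes)
}

// Toy stand-ins for two independent implementations.
pub fn toy_stack_a(bytes: &[u8]) -> bool { bytes.len() == 4 }
pub fn toy_stack_b(bytes: &[u8]) -> bool { bytes.first() == Some(&0x01) }
```

Taking the bytes by value and returning them on success means the only way to get writable bytes is through the gate.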

Why substrate-bn for BN254:

  • Already no_std-compatible out of the box
  • Pure Rust, no C or assembly dependencies to cross-compile
  • Ethereum-compatible, used in production Ethereum client implementations
  • Smaller binary footprint than ark-bn254 for a single-curve verifier

Alternative I considered: ark-bn254. Works fine of course, but it pulls in the whole arkworks generic-programming circus. Heavier compile times, bigger binary for a narrow verify-only use case. substrate-bn is the leaner pick, no regrets.

Why zkcrypto bls12_381 for BLS12-381:

  • no_std-clean out of the gate with default-features = false, features = ["groups", "pairings", "alloc"]
  • All-zkcrypto dep closure (ff, group, pairing, subtle, rand_core). No getrandom, no std leakage
  • Stack-allocated G2Prepared keeps pairing-workspace heap usage down vs heap-allocated Fq12 polynomial workspaces
  • Scalar-oblivious G1 scalar mul, informally constant-time with respect to scalar Hamming weight (0.09 % variance across random scalars)

I did a dep-fit spike first, documented at research/notebook/2026-04-22-bls12-381-dep-fit.md in the repo. First build, zero warnings, zero patches needed to get it no_std-clean. Honestly amazing for a crypto crate.

Why winterfell for STARK:

  • no_std out of the gate with default-features = false. concurrent / rayon / async all gated behind non-default features that stay off
  • Builds clean on both thumbv8m.main-none-eabihf and riscv32imac-unknown-none-elf with zero patches. Dep-fit confirmed in research/notebook/2026-04-23-stark-prior-art.md
  • Separate winter-prover (std) and winter-verifier (no_std) halves let me pull only the verify side into firmware, so prover-only code stays out of the flash budget
  • Pure-Rust Blake3 fallback on both embedded targets, no SIMD cross-compile wrangling

Alternatives I looked at: ax-stark (too early, not no_std-ready), stone (C++ only, doesn’t help Rust), rolling my own (absolutely not for phase 3). Winterfell’s Fibonacci reference was the shortest path to a measurable first number so that’s the one I went with.

zkmcu-bump-alloc exists because of a specific finding documented on the Deterministic timing page. Winterfell’s verify path allocates ~400 Vecs internally, and a stock general-purpose allocator’s free-list evolution introduces enough timing jitter (~0.25 % on Cortex-M33, ~0.46 % on Hazard3) to obscure the underlying crypto’s own timing determinism. Which is a problem if you care about side channels.

The bump allocator gives byte-identical allocator state between iterations (reset the watermark after every verify), which brings variance down to the 0.08 % silicon noise floor. It’s three things at once:

  • a benchmark tool, confirms the crypto itself is deterministic
  • a diagnostic, isolates which fraction of total variance is allocator-induced
  • a generally useful no_std primitive (atomic CAS bump pointer, in-place realloc on top of the bump, watermark save/restore, ~200 lines)

Not a production allocator of course: the no-op dealloc means memory leaks up to the watermark every iteration, which pushes the STARK bench firmware’s heap to 384 KB. Production firmware should use embedded-alloc’s TlsfHeap instead; that’s the same allocator that gives you deterministic timing at the 128 KB tier.
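For readers who have not seen the pattern, here is a minimal sketch of a CAS bump pointer with watermark save/restore. It is not the zkmcu-bump-alloc source: it hands out offsets into an imaginary arena rather than implementing GlobalAlloc, it leaves out in-place realloc, and the BumpArena name and its methods are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bump arena over `capacity` bytes; `next` is the first free offset.
pub struct BumpArena {
    next: AtomicUsize,
    capacity: usize,
}

impl BumpArena {
    pub const fn new(capacity: usize) -> Self {
        Self { next: AtomicUsize::new(0), capacity }
    }

    /// Reserve `size` bytes at `align` (a power of two). Returns the offset
    /// of the block, or None if the arena is exhausted.
    pub fn alloc(&self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the current pointer up to the requested alignment.
            let start = cur.checked_add(align - 1)? & !(align - 1);
            let end = start.checked_add(size)?;
            if end > self.capacity {
                return None;
            }
            // Publish the new bump pointer; retry if someone else moved it.
            match self.next.compare_exchange_weak(cur, end, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return Some(start),
                Err(seen) => cur = seen,
            }
        }
    }

    /// No-op: memory comes back only through `reset_to`.
    pub fn dealloc(&self, _offset: usize) {}

    /// Save the current bump pointer.
    pub fn watermark(&self) -> usize {
        self.next.load(Ordering::Acquire)
    }

    /// Restore a saved bump pointer. The caller must guarantee that nothing
    /// allocated after `mark` is still in use.
    pub fn reset_to(&self, mark: usize) {
        self.next.store(mark, Ordering::Release);
    }
}
```

Saving the watermark before a verify and restoring it afterwards means the next iteration sees exactly the same sequence of offsets, which is the property the timing runs depend on.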

bindings/
├── crates/
│ ├── zkmcu-verifier/ ← no_std BN254 verify
│ ├── zkmcu-verifier-bls12/ ← no_std BLS12-381 verify
│ ├── zkmcu-verifier-stark/ ← no_std winterfell STARK verify
│ ├── zkmcu-vectors/ ← no_std test-vector loader (all systems)
│ ├── zkmcu-host-gen/ ← host-only arkworks + winter-prover CLI
│ ├── zkmcu-bump-alloc/ ← no_std bump allocator with watermark reset
│ ├── bench-rp2350-m33/ ← Cortex-M33 firmware, BN254
│ ├── bench-rp2350-m33-bls12/ ← Cortex-M33 firmware, BLS12-381
│ ├── bench-rp2350-m33-stark/ ← Cortex-M33 firmware, STARK
│ ├── bench-rp2350-rv32/ ← Hazard3 firmware, BN254
│ ├── bench-rp2350-rv32-bls12/ ← Hazard3 firmware, BLS12-381
│ └── bench-rp2350-rv32-stark/ ← Hazard3 firmware, STARK
├── benchmarks/runs/<date>-<slug>/ ← raw.log + result.toml per run
├── research/ ← Typst sources → PDFs (whitepaper, survey, reports)
├── scripts/gen-semaphore-proof/ ← Bun + TS tool that produces the real Semaphore proof
├── vendor/semaphore/ ← submodule with the upstream Semaphore source
└── web/ ← this site