75 ms
STARK verify at 95-bit security. Winterfell Fibonacci AIR over Goldilocks, quadratic extension, TlsfHeap allocator. Variance sits at 0.08 %, which is basically the silicon noise floor. More
The API is pretty much three crates, one function each. Pull in whichever one you need, hand it bytes, check the bool.
use zkmcu_verifier::verify_bytes;            // BN254 / EIP-197
// or
use zkmcu_verifier_bls12::verify_bytes;      // BLS12-381 / EIP-2537
// or
use zkmcu_verifier_stark::fibonacci::verify; // winterfell STARK, Fibonacci AIR

let ok = verify_bytes(&vk_bytes, &proof_bytes, &public_bytes)?;

All three: no_std, pure Rust, one function call. Every number on this page comes from a real bench on a real RP2350, not a simulator.
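"Check the bool" is worth doing fail-closed, so that an error is never mistaken for a valid proof. Here is a minimal sketch of that pattern. The `Decision` type and the `gate` helper are hypothetical names invented for illustration, not part of the zkmcu crates, and the sketch assumes only what the snippet above shows: a verifier that takes three byte slices and returns a `Result` wrapping a bool.

```rust
/// Hypothetical outcome type for gating an action on a proof.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Accept,
    Reject,
}

/// Fail-closed wrapper around any verifier with the assumed
/// `(vk, proof, public) -> Result<bool, E>` shape.
/// Only `Ok(true)` is accepted; `Ok(false)` and every error are rejected.
pub fn gate<E>(
    verify: impl Fn(&[u8], &[u8], &[u8]) -> Result<bool, E>,
    vk: &[u8],
    proof: &[u8],
    public: &[u8],
) -> Decision {
    match verify(vk, proof, public) {
        Ok(true) => Decision::Accept,
        Ok(false) | Err(_) => Decision::Reject,
    }
}
```

The helper uses nothing outside `core`, so it should drop into a no_std build unchanged. If `verify_bytes` really has the slice-based signature assumed here, a call would look like `gate(verify_bytes, &vk_bytes, &proof_bytes, &public_bytes)`.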
BN254 Groth16, 4 public inputs, the actual Ethereum-Semaphore trusted setup. The same VK that the 0x08 precompile accepts. No simplifications, no off-device preprocessing, no cheating.
2,015 ms
BLS12-381 Groth16, same $7 silicon. First public no_std BLS12 verifier on Cortex-M, as far as I can find. If anyone knows of an earlier one, tell me and I’ll update.
≈ 97 KB
Total RAM during verify, for all three families. Fits the 128 KB SRAM tier, which is what actual hardware-wallet silicon (ST33, STM32F405, nRF52) ships with.
Hardware wallets
ZK verify on the secure element itself, so a compromised phone can’t feed the wallet a bogus Semaphore or Tornado action and get it rubber-stamped. Under TlsfHeap the timing is deterministic down to the silicon noise floor, so timing side-channel resistance comes mostly for free.
Offline credentials
Transit turnstiles, festivals, borders. Verify a privacy-preserving credential with no network and no server round-trip. Groth16 and STARK both work, so pick whichever your issuer uses.
IoT attestation
Small devices checking each other’s SNARK-attested provenance without a cloud relay in the middle. At 75 ms per STARK, you can actually verify per-packet without falling over.
Cross-chain / zkVM proofs
BLS12-381 sync-committee proofs, Filecoin PoSt, Aleo, winterfell / Miden VM traces. If it ships on BN254, BLS12-381, or Goldilocks STARK, the verifier fits on your MCU.
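To put the per-packet claim from the IoT attestation case into numbers, here is a back-of-the-envelope budget. It assumes the 150 MHz core clock quoted elsewhere on this page and a single core doing nothing but verifying. The function names are made up for illustration.

```rust
/// Core clock assumed for the budget, in Hz (the RP2350 figure quoted on this page).
pub const CORE_HZ: u64 = 150_000_000;

/// Convert a wall-clock duration in milliseconds to core cycles at CORE_HZ.
pub fn ms_to_cycles(ms: u64) -> u64 {
    ms * (CORE_HZ / 1_000)
}

/// Upper bound on whole verifications per second if the core does nothing else.
/// Returns 0 when a single verify takes longer than one second.
pub fn max_verifies_per_second(verify_ms: u64) -> u64 {
    if verify_ms == 0 {
        0
    } else {
        1_000 / verify_ms
    }
}
```

At 75 ms this gives 11,250,000 cycles per verify and a ceiling of 13 verifications per second, so "per-packet" holds for low-rate links such as sensor telemetry rather than bulk traffic.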
Same repo shape
Three crates, one per proof system, all with the same API shape. Same test-vector loader, same firmware bench template. Write against zkmcu_verifier::verify_bytes today, swap to BLS12 or STARK tomorrow without rewriting your embedded code.
Same 128 KB silicon tier
BN254 Groth16 uses 97 KB RAM, BLS12-381 uses 116 KB, STARK uses 100 KB. All measured on the same RP2350 Cortex-M33 at 150 MHz. Any hardware-wallet-class chip (nRF52832, STM32F405, Ledger ST33, Infineon SLE78) runs any of the three.
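A quick way to see what is left for the rest of the firmware on a 128 KB part is to subtract each peak figure from the SRAM size. The sketch below uses only the numbers quoted above; the constant and function names are hypothetical.

```rust
/// SRAM tier named in the text, in KB.
pub const SRAM_KB: u32 = 128;

/// Peak RAM per verifier family, in KB, as quoted in the text.
pub const PEAK_RAM_KB: [(&str, u32); 3] = [
    ("BN254 Groth16", 97),
    ("BLS12-381 Groth16", 116),
    ("STARK", 100),
];

/// KB left over for stack, buffers and application state,
/// or `None` if the verifier does not fit at all.
pub fn headroom_kb(peak_kb: u32) -> Option<u32> {
    SRAM_KB.checked_sub(peak_kb)
}
```

That works out to 31 KB of headroom for BN254, 28 KB for STARK and 12 KB for BLS12-381, so BLS12-381 is the family where the surrounding firmware has to be tightest.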
Deterministic timing
With the production TlsfHeap allocator, STARK verify variance hits 0.08 %, which is the silicon noise floor. That’s timing-side-channel resistance without writing constant-time code by hand, which is nice because constant-time crypto is a pain in the ass to review. How
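For anyone reproducing a figure like this from their own runs, one common way to express timing spread as a percentage is standard deviation divided by mean over repeated cycle-count samples. Whether the 0.08 % above uses exactly this definition is an assumption, and the function name is made up. This is host-side analysis code and uses `std` for the square root.

```rust
/// Relative spread of a set of cycle counts, as a percentage of the mean:
/// (population standard deviation / mean) * 100.
/// Returns `None` for an empty slice or a zero mean.
pub fn relative_spread_percent(cycles: &[u64]) -> Option<f64> {
    if cycles.is_empty() {
        return None;
    }
    let n = cycles.len() as f64;
    let mean = cycles.iter().map(|&c| c as f64).sum::<f64>() / n;
    if mean == 0.0 {
        return None;
    }
    // Population variance: mean of squared deviations from the mean.
    let variance = cycles
        .iter()
        .map(|&c| {
            let d = c as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    Some(variance.sqrt() / mean * 100.0)
}
```

Identical samples give 0 %, and two samples of 90 and 110 cycles give 10 %. A low number only says that repeated runs take the same time; it does not by itself show that the time is independent of the inputs.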
Two ISAs, one source
ARM Cortex-M33 and RISC-V Hazard3 running the exact same crypto source, same chip. Only the linker script and the cycle-counter read differ. Measured cross-ISA ratios: 1.21× for STARK, 1.33× for BN254 Groth16, 2.56× for BLS12-381. M33 wins every time, but the gap depends wildly on workload.
STARK verify in 75 ms
Winterfell Fibonacci AIR, 95-bit conjectured security, 100 KB RAM. The numbers
Semaphore real-world proof
Production Ethereum-Semaphore VK, snarkjs-generated proof, verified on a $7 MCU in 1.18 s. No simplifications. See it
Deterministic timing
How picking the right allocator knocked timing variance from 0.45 % to 0.08 %. Matters a lot if you care about side channels. The methodology
Benchmarks
Numbers straight from the silicon. Three verifier families, both cores, three allocators. Full data