zkmcu

no_std Rust ZK verifiers for microcontrollers. Current focus: a post-quantum Semaphore-style identity proof, verified on a $7 Pico 2 W in 1.6 s under dual-hash (Poseidon2 + Blake3) STARK composition. The earlier BN254 / BLS12 / winterfell-STARK family still ships under 128 KB SRAM. Nobody had published numbers for this before, so I had to measure them myself.
no_std Rust · post-quantum, dual-hash · 128 KB SRAM tier

The API is three crates, one function each. Pull in whichever one you need, hand it bytes, check the bool.

use zkmcu_verifier::verify_bytes; // BN254 / EIP-197
// or
use zkmcu_verifier_bls12::verify_bytes; // BLS12-381 / EIP-2537
// or
use zkmcu_verifier_stark::fibonacci::verify; // winterfell STARK, Fibonacci AIR
let ok = verify_bytes(&vk_bytes, &proof_bytes, &public_bytes)?;

All three: no_std, pure Rust, one function call. Every number on this page comes from a real bench on a real RP2350, not a simulator.
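A minimal sketch of the "hand it bytes, check the bool" pattern, written against a stand-in callable rather than the real crates so it compiles on its own. The `Decision` enum and the `gate` helper are hypothetical, and the error type is left generic because this page does not show what `verify_bytes` actually returns on failure. The only point is that a malformed input and a well-formed but invalid proof should both fail closed.

```rust
/// Hypothetical caller-side outcome; not part of zkmcu.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Accept,
    Reject,
}

/// Runs an assumed verifier entry point (three byte slices in, bool out)
/// and collapses its result into a single accept/reject decision.
pub fn gate<E>(
    verify: impl Fn(&[u8], &[u8], &[u8]) -> Result<bool, E>,
    vk: &[u8],
    proof: &[u8],
    public: &[u8],
) -> Decision {
    match verify(vk, proof, public) {
        Ok(true) => Decision::Accept,
        // An error (e.g. undecodable bytes) and an invalid proof are both
        // treated as rejection, so nothing is accepted by accident.
        Ok(false) | Err(_) => Decision::Reject,
    }
}
```

Because `gate` only depends on the call shape, the same wrapper would work with whichever of the three crates is imported, assuming their entry points keep that shape.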

1,611 ms
Dual-hash PQ-Semaphore depth-10 verify, Cortex-M33

Two independent FRI proofs over the same statement: Poseidon2-BabyBear-16 (audited round constants) and Blake3 1.8. Both must accept. Soundness composes across the two hash families, so a cryptanalytic surprise on either doesn’t collapse the verifier. Hazard3 RV32 lands at 2,042 ms on the same die.
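The "both must accept" rule can be sketched as a plain AND over the two legs. This is an illustration of the composition logic only, not the verifier's real internals; the `Leg` type and `compose` function are hypothetical names.

```rust
/// Outcome of one FRI verification leg (illustrative type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Leg {
    Accepted,
    Rejected,
}

/// Accept the statement only when the Poseidon2 leg and the Blake3 leg
/// both accept. Both results are passed in already computed, so neither
/// leg is skipped when the other one fails.
pub fn compose(poseidon2: Leg, blake3: Leg) -> bool {
    matches!((poseidon2, blake3), (Leg::Accepted, Leg::Accepted))
}
```

Under this rule a forger has to break both hash families for the same statement, which is why a surprise on one of them alone does not make the verifier accept.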

762 ms BN254 Semaphore

Classical baseline. Real Ethereum-Semaphore v4 depth-10 Groth16 proof, the same VK that the 0x08 precompile accepts. Verifies on the same $7 MCU.

49 ms STARK prove

First STARK prover on bare metal. Threshold-check circuit, prove value < threshold, N=64, BabyBear+Quartic, 64 KB heap peak. FibRace needs 3 GB for the same predicate.

73 ms STARK verify

Deterministic to silicon noise floor. Winterfell Fibonacci AIR over Goldilocks, quadratic extension, TlsfHeap allocator. Variance sits at 0.08 %.

2,015 ms BLS12-381

Sibling Groth16 verifier on the same $7 silicon. First public no_std BLS12-381 verifier on Cortex-M, as far as I can find. If anyone knows earlier work, tell me and I’ll update.

Hardware wallets

ZK verify on the secure element itself, so a compromised phone can’t feed the wallet a bogus Semaphore or Tornado action and get rubber-stamped. Under TlsfHeap the timing is deterministic to silicon noise floor, so timing side-channel resistance comes mostly for free.

Offline credentials

Transit turnstiles, festivals, borders. Verify a privacy-preserving credential with no network, no server round-trip. Groth16 and STARK both work, pick whichever your issuer uses.

IoT attestation

Small devices checking each other’s SNARK-attested provenance without a cloud relay in the middle. At 73 ms per STARK verify, you can actually verify per-packet without falling over.

Cross-chain / zkVM proofs

BLS12-381 sync-committee proofs, Filecoin PoSt, Aleo, winterfell / Miden VM traces. If it ships on BN254, BLS12-381, or Goldilocks STARK, the verifier fits on your MCU.

Same repo shape

Three crates, one per proof system, all with the same API shape. Same test-vector loader, same firmware bench template. Write against zkmcu_verifier::verify_bytes today, swap to BLS12 or STARK tomorrow without rewriting your embedded code.

Same 128 KB silicon tier

BN254 Groth16 uses 97 KB RAM, BLS12-381 uses 116 KB, STARK uses 100 KB. All measured on the same RP2350 Cortex-M33 at 150 MHz. Any hardware-wallet-class chip (nRF52832, STM32F405, Ledger ST33, Infineon SLE78) runs any of the three.
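As a quick arithmetic check on the tier claim, the sketch below computes the headroom each verifier leaves under 128 KB. The figures are the ones quoted above; the constant and function names are illustrative, and it assumes the quoted numbers are total RAM use, which the page does not break down.

```rust
/// SRAM tier in KB, as quoted above.
pub const TIER_KB: u32 = 128;

/// RAM figures quoted above, in KB.
pub const USAGE_KB: [(&str, u32); 3] = [
    ("BN254 Groth16", 97),
    ("BLS12-381", 116),
    ("STARK", 100),
];

/// KB left under the tier, or None if the verifier does not fit.
pub fn headroom_kb(used_kb: u32) -> Option<u32> {
    TIER_KB.checked_sub(used_kb)
}
```

That leaves 31 KB, 12 KB and 28 KB respectively for everything else the firmware needs, so BLS12-381 is the tight one.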

Deterministic timing

With the production TlsfHeap allocator, STARK verify variance hits 0.08 %, which is the silicon noise floor. That’s timing-side-channel resistance without writing constant-time code by hand, which is nice because constant-time crypto is a pain in the ass to review.
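To put a percentage like that in concrete terms, here is a small sketch that turns a set of cycle-counter readings into a relative spread. It assumes "variance" means (max − min) / mean, which this page does not actually define; a standard-deviation-based figure would need a different formula. The function name is hypothetical.

```rust
/// Relative spread of cycle counts in percent: (max - min) / mean * 100.
/// Returns None for an empty slice or when every sample is zero.
pub fn spread_percent(cycles: &[u32]) -> Option<f64> {
    let min = *cycles.iter().min()?;
    let max = *cycles.iter().max()?;
    // Sum in u64 so many 32-bit samples cannot overflow.
    let sum: u64 = cycles.iter().map(|&c| u64::from(c)).sum();
    if sum == 0 {
        return None;
    }
    let mean = sum as f64 / cycles.len() as f64;
    Some(f64::from(max - min) / mean * 100.0)
}
```

Under that assumed definition, a 73 ms verify at 150 MHz is about 10.95 million cycles, so 0.08 % corresponds to a spread of roughly 8,760 cycles between the fastest and slowest run.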

Two ISAs, one source

ARM Cortex-M33 and RISC-V Hazard3 running the exact same crypto source, same chip. Only the linker script and the cycle-counter read differ. Measured cross-ISA ratios: 1.21× for STARK, 1.33× for BN254 Groth16, 2.56× for BLS12-381. M33 wins every time, but the gap depends wildly on workload.
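The ratios are plain time-over-time quotients. As a worked example, the sketch below applies the same arithmetic to the dual-hash PQ-Semaphore figures quoted on this page (1,611 ms on the M33, 2,042 ms on Hazard3), a workload not in the list above. The function names are illustrative, and the cycle conversion assumes the 150 MHz clock quoted for the M33.

```rust
/// Slowdown of `other_ms` relative to `base_ms`,
/// e.g. Hazard3 time divided by Cortex-M33 time.
pub fn ratio(base_ms: u32, other_ms: u32) -> f64 {
    f64::from(other_ms) / f64::from(base_ms)
}

/// Cycle count for a duration in milliseconds at a clock given in MHz.
pub fn cycles(ms: u32, clock_mhz: u32) -> u64 {
    // 1 MHz is 1,000 cycles per millisecond.
    u64::from(ms) * u64::from(clock_mhz) * 1_000
}
```

With those inputs, `ratio(1611, 2042)` comes out at about 1.27×, and `cycles(1611, 150)` is 241,650,000 cycles for the M33 run.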

PQ-Semaphore writeup

The six-phase 128-bit security build. Each phase tested one knob, predicted the cost, then either confirmed or rejected the hypothesis on real silicon. One phase rejected its own hypothesis; the final phase landed exactly on its predicted floor.

Semaphore real-world proof

Production Ethereum-Semaphore VK, snarkjs-generated proof, verified on a $7 MCU in 1.18 s. No simplifications. See it

Deterministic timing

How picking the right allocator knocked timing variance from 0.45 % to 0.08 %. Matters a lot if you care about side channels.

Benchmarks

Numbers straight from the silicon. Four verifier families, both cores, three allocators.