Semaphore on a microcontroller

Semaphore v4.14.2 · snarkjs production setup · 762 ms on Cortex-M33 · variance 0.030 %

Synthetic x² = y benchmarks are fine for wiring things up, but they don’t answer the question that actually matters: can this verify the proofs people use in production? So I took a real Semaphore v4.14.2 Groth16 proof (Merkle tree depth 10, 4 public inputs), produced by snarkjs under the CDN-hosted production trusted setup, and ran it through zkmcu-verifier on a $7 Pi Pico 2 W.

762 ms on Cortex-M33, 1,634 ms on Hazard3 RV32, every iteration returning ok=true, iteration-to-iteration variance 0.030 %. That’s the tightest measurement in the whole project. These are the same proof bytes that Semaphore’s on-chain verifier contract accepts, through Ethereum’s EIP-197 pairing precompile, at the other end of the wire.

What “real” means here

The proof that lands on the MCU traces through this pipeline:

VK: extracted from vendor/semaphore/packages/proof/src/verification-keys.json at depth 10. The actual trusted-setup artifact the Semaphore team shipped for v4.14.2.
Prover side: @semaphore-protocol/proof 4.14.2 → snarkjs → production snark-artifacts (tag 4.13.0) fetched from the Semaphore CDN. Same code paths the JavaScript SDK uses when generating a Semaphore proof for on-chain submission.
Wire format: snarkjs emits the proof as JSON. zkmcu-host-gen converts that JSON to Ethereum’s EIP-197 byte order and writes the .bin files that land in crates/zkmcu-vectors/data/semaphore-depth-10/.
On-device verify: zkmcu-verifier parses those bytes and runs the Groth16 pairing check with substrate-bn.
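
To make the wire-format step concrete, here is a minimal sketch of the snarkjs-JSON-to-EIP-197 conversion. It is not the zkmcu-host-gen source: the function names (dec_to_be32, encode_g1, encode_g2) are hypothetical, and it assumes the usual snarkjs convention of base-10 strings with Fp2 coordinates listed real part first. EIP-197 wants 32-byte big-endian field elements with the imaginary Fp2 coefficient first, so the G2 point needs a swap.

```rust
/// Convert a base-10 string (as found in snarkjs JSON) into 32 big-endian bytes.
/// Returns None on a non-digit character or if the value does not fit in 256 bits.
/// Note: this does NOT check that the value is below the BN254 field modulus.
fn dec_to_be32(dec: &str) -> Option<[u8; 32]> {
    if dec.is_empty() {
        return None;
    }
    let mut out = [0u8; 32];
    for ch in dec.chars() {
        let digit = ch.to_digit(10)?;
        // out = out * 10 + digit, starting from the least significant byte.
        let mut carry = digit;
        for byte in out.iter_mut().rev() {
            let v = u32::from(*byte) * 10 + carry;
            *byte = (v & 0xff) as u8;
            carry = v >> 8;
        }
        if carry != 0 {
            return None; // value overflowed 256 bits
        }
    }
    Some(out)
}

/// G1 point: x || y, 64 bytes total.
fn encode_g1(x: &str, y: &str) -> Option<[u8; 64]> {
    let mut out = [0u8; 64];
    out[..32].copy_from_slice(&dec_to_be32(x)?);
    out[32..].copy_from_slice(&dec_to_be32(y)?);
    Some(out)
}

/// G2 point, 128 bytes. Input pairs are assumed to be [c0, c1] (real, imaginary);
/// the output order is x.c1 || x.c0 || y.c1 || y.c0 as EIP-197 specifies.
fn encode_g2(x: [&str; 2], y: [&str; 2]) -> Option<[u8; 128]> {
    let mut out = [0u8; 128];
    out[..32].copy_from_slice(&dec_to_be32(x[1])?);
    out[32..64].copy_from_slice(&dec_to_be32(x[0])?);
    out[64..96].copy_from_slice(&dec_to_be32(y[1])?);
    out[96..].copy_from_slice(&dec_to_be32(y[0])?);
    Some(out)
}
```

Under those assumptions a Groth16 proof would serialize as encode_g1(A) followed by encode_g2(B) and encode_g1(C), 256 bytes in total. The exact layout of the committed .bin files may differ, so treat the repo as the reference.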

The numbers

	Cortex-M33	Hazard3 RV32
Verify time (median, N=5-6)	762 ms	1,634 ms
Iteration-to-iteration variance	0.030 %	0.030 %
Predicted ahead of measurement	~750 ms	~1,620 ms
Measured Δ vs prediction	+1.4 %	−3.5 %
All iterations return `ok=true`	yes	yes

Both predictions landed within a few percent of the first measurement. It’s the closest prediction-vs-reality match in the project’s phase-2 arc; honestly, I was a bit surprised. Full raw data: benchmarks/runs/2026-04-22-m33-semaphore-depth10/ and -rv32-semaphore-depth10/.

Full analysis in the prediction report: 2026-04-22-semaphore-baseline.typ.
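
This page does not spell out how the "variance" figure is computed. One plausible reading, and it is only an assumption, is the spread between the slowest and fastest iteration relative to the median. A small sketch of that calculation, with hypothetical helper names, is below; the prediction report remains the authority on the actual definition.

```rust
/// Median of a non-empty sample set (mean of the two middle values when the count is even).
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}

/// (max - min) / median, expressed in percent.
/// ASSUMPTION: this is one possible definition of "iteration-to-iteration variance".
fn relative_spread_pct(samples: &[f64]) -> f64 {
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    (max - min) / median(samples) * 100.0
}
```

Feeding it the per-iteration timings from a raw-data directory would give a single percentage per target that can be compared with the table.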

What this unlocks

The Groth16 check that Semaphore’s on-chain verifier performs through Ethereum’s pairing precompile for every anonymous group message now runs on a $7 MCU, against the same unmodified proof bytes. Practical applications that open up once the verify is on-device:

Hardware wallets that verify before signing

Today your phone runs the Semaphore verify and your Ledger / Trezor just signs the resulting transaction. A compromised phone can trick the hw wallet into authorising a bogus Semaphore action. zkmcu lets the hw wallet’s secure element run the verify itself, so the signing step can refuse if the proof doesn’t actually hold up.

Offline Semaphore gates

A turnstile, door lock, or voting booth that accepts a Semaphore-style “I’m in this group without revealing which member” proof without any server call. Hardware + zkmcu firmware + a Semaphore VK baked in at provisioning. That’s the whole stack.
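
A gate like this needs a little policy around the pairing check. The sketch below is one hedged way to structure it, not zkmcu firmware: Gate, Decision, and admit are hypothetical names, the verify closure stands in for whatever the real verifier exposes, and the public-input order (Merkle root, nullifier, message hash, scope hash) is my reading of Semaphore v4 that should be checked against the circuit. It uses std's HashSet for brevity; a no_std build would need a fixed-capacity store instead.

```rust
use std::collections::HashSet;

type Field = [u8; 32];

#[derive(Debug, PartialEq)]
enum Decision {
    Open,
    UnknownGroup,
    Replayed,
    BadProof,
}

struct Gate {
    group_root: Field, // Merkle root baked in at provisioning
    scope: Field,      // scope hash this gate accepts
    seen_nullifiers: HashSet<Field>,
}

impl Gate {
    fn new(group_root: Field, scope: Field) -> Self {
        Gate { group_root, scope, seen_nullifiers: HashSet::new() }
    }

    /// `verify` is a stand-in for the on-device Groth16 check.
    fn admit<V>(&mut self, public: &[Field; 4], proof: &[u8], verify: V) -> Decision
    where
        V: Fn(&[Field; 4], &[u8]) -> bool,
    {
        // ASSUMED order: [merkle_root, nullifier, message_hash, scope_hash].
        let [root, nullifier, _message, scope] = *public;
        if root != self.group_root || scope != self.scope {
            return Decision::UnknownGroup;
        }
        if self.seen_nullifiers.contains(&nullifier) {
            return Decision::Replayed;
        }
        if !verify(public, proof) {
            return Decision::BadProof;
        }
        // Record the nullifier only after the proof checks out.
        self.seen_nullifiers.insert(nullifier);
        Decision::Open
    }
}
```

The cheap comparisons run before the expensive pairing check, and a nullifier is stored only once its proof has verified, so a bogus proof cannot burn someone else's entry.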

Peer-to-peer private vouchers

Person A hands Person B a ZK-proven payment note or access credential. Person B’s device verifies locally. No on-chain step needed for the verification side of the exchange (settlement can happen separately).

Mid-transit attestation

IoT devices forwarding SNARK-attested sensor readings or identity claims downstream, with each hop verifying the previous hop’s ZK proof on-MCU rather than trusting the network.

Reproducing end-to-end

# One-time: pull the submodule and generate a fresh proof.
git submodule update --init
cd scripts/gen-semaphore-proof
bun install                        # fetches the Semaphore npm packages
bun run gen                        # writes proof.json, deterministic under the hardcoded seed

# Convert snarkjs JSON to EIP-197 bytes and commit them.
cd ../..
cargo run -p zkmcu-host-gen --release -- semaphore \
    --depth 10 \
    --proof scripts/gen-semaphore-proof/proof.json

# Verify on the host (parse_vk + parse_proof + parse_public + verify).
cargo test -p zkmcu-verifier --release --test parse_semaphore

# Flash + bench on hardware.
cargo build -p bench-rp2350-m33 --release
scp target/thumbv8m.main-none-eabihf/release/bench-rp2350-m33 \
    pid-admin@10.42.0.30:/tmp/bench-m33.elf
# On the Pi 5 with the Pico in BOOTSEL:
picotool load -v -x -t elf /tmp/bench-m33.elf
cat /dev/ttyACM0

The hardcoded inputs in gen.ts (identity seed, message, scope, tree depth) make the whole pipeline byte-deterministic. Rerunning produces the same .bin files, which can be checked against the SHA-256 hashes committed in the repo.

What this does not claim

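Not every Semaphore-shaped circuit will verify in exactly 762 ms. Groth16 verify cost scales with the number of public inputs, not with circuit size. Tree depth affects witness size, not VK size or public-input count, so a different depth changes the proving cost on the host, not the verify cost on the MCU, and a firmware implementing pairing_batch the same way lands on the same timing profile. A circuit with a different public-input count will not.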
Not a constant-time implementation. Verify duration varies observably with public-input Hamming weight (see benchmarks) because substrate-bn uses a sliding-window NAF. Acceptable for verify-only threat models where proof + public inputs are already public, not acceptable if secret data flows into the verify path.
Not a performance lower bound. The hand-written ARMv8-M UMAAL Montgomery reduction in vendor/bn/src/arith.rs already covers most of the gap (988 ms → 551 ms square baseline = 1.79× on the simple circuit). Further hand-asm on the squaring path or Karatsuba on the BN254 Fp2 / Fp12 layers would squeeze more, but a 2-3× claim is uncalibrated. That’s future work.
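
For reference, the standard Groth16 verification equation shows why tree depth drops out of the verify cost. This is the textbook form, not a transcription of the zkmcu source; the symbol names (IC for the VK's input-commitment points, x_i for the public inputs) are the common convention.

```latex
\[
  \mathrm{vk}_x \;=\; \mathrm{IC}_0 + \sum_{i=1}^{\ell} x_i \cdot \mathrm{IC}_i ,
  \qquad
  e(A, B) \;=\; e(\alpha, \beta)\, \cdot\, e(\mathrm{vk}_x, \gamma)\, \cdot\, e(C, \delta)
\]
```

Here (A, B, C) is the proof, (α, β, γ, δ, IC) is the verification key, and ℓ is the number of public inputs (4 for this Semaphore proof). The work is ℓ scalar multiplications in G1 plus a fixed product of pairings. The scalars x_i are the public inputs themselves, which is also where the Hamming-weight dependence mentioned above enters.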

See the Security page for the full threat model.