zkVM Security: What Could Go Wrong?

Written by Suneal Gong on Nov 21, 2024

Introduction

A zkVM (Zero-Knowledge Virtual Machine) uses zero-knowledge proofs (ZKPs) to prove and verify computations that run on a specific ISA (Instruction Set Architecture). Existing zkVMs (e.g., risc0, sp1, jolt, valida, zkm) allow developers to write programs in high-level languages like Rust or C++ without needing to worry about the complexities of ZKPs. They abstract away the underlying cryptographic details, enabling developers to focus on their application logic and ship faster. As long as the zkVM itself is secure, it provides “out-of-the-box” ZKP functionality to any program running on it, giving developers the benefit of zero-knowledge proofs without additional effort.

However, while zkVMs simplify development, they are still complex systems that integrate many technical components—from compilers to proof systems and on-chain verification. A single bug in any of these components can lead to catastrophic security failures. Understanding the tech stack behind zkVMs is crucial for builders to create secure systems and for users to evaluate the reliability and security of the applications they rely on.

In this blog post, we will walk through the typical zkVM workflow, highlight common vulnerabilities at each stage, and explore how bugs in these areas can compromise the security of zero-knowledge applications. By gaining a deeper understanding of the technologies involved, we can better safeguard against potential risks and build more robust zk-powered applications.

How does a zkVM work?

The figure below illustrates the general workflow of a zkVM, which we introduce step-by-step.

[Figure: the general zkVM workflow]

Compilation. The zkVM user writes a program in a high-level language (e.g., Rust, C, C++) and compiles it down to assembly code for the specific ISA that the zkVM is built for. For example, risc0, sp1, and Jolt use RISC-V, while zkm uses MIPS. Because these ISAs are already widely used, compilation can reuse much of the existing compiler toolchain.

From a user’s perspective, the process doesn’t change much, as they can still use familiar tools and workflows, such as standard compilers. However, many libraries are not directly usable in zkVMs. This limitation is akin to writing programs for bare-metal devices without the support of an operating system, as many libraries depend on features like I/O operations that aren’t supported in the zkVM environment.

Execute. Given the assembly code and input, the VM executes the program to generate the execution trace. Execution essentially involves interpreting instructions until the program terminates or panics. The trace is a sequence of execution steps, where each step represents an instruction with operands and read/write actions to registers and memory.
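
To make the idea of an execution trace concrete, here is a minimal sketch of an interpreter for a made-up three-instruction machine. The instruction names, the `Step` record, and the register layout are all assumptions for illustration; real zkVMs interpret RISC-V or MIPS and record far more detail per step.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    pc: int            # program counter when the instruction was fetched
    instr: tuple       # the fetched instruction
    regs_after: tuple  # register file after the instruction ran

def execute(program, regs, max_steps=1000):
    """Interpret `program` until it halts; return (trace, final registers)."""
    regs = list(regs)
    pc = 0
    trace = []
    for _ in range(max_steps):
        instr = program[pc]
        op = instr[0]
        if op == "halt":
            trace.append(Step(pc, instr, tuple(regs)))
            return trace, regs
        if op == "addi":                      # rd = rs + imm, 32-bit wraparound
            _, rd, rs, imm = instr
            regs[rd] = (regs[rs] + imm) & 0xFFFFFFFF
            next_pc = pc + 1
        elif op == "add":                     # rd = rs1 + rs2, 32-bit wraparound
            _, rd, rs1, rs2 = instr
            regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
            next_pc = pc + 1
        elif op == "bne":                     # jump to target if rs1 != rs2
            _, rs1, rs2, target = instr
            next_pc = target if regs[rs1] != regs[rs2] else pc + 1
        else:
            raise ValueError(f"unknown instruction: {instr}")
        trace.append(Step(pc, instr, tuple(regs)))
        pc = next_pc
    raise RuntimeError("step limit exceeded")

# Sum 1..n, with r0 = accumulator, r1 = counter, r2 = n.
PROGRAM = [
    ("addi", 1, 1, 1),   # 0: r1 += 1
    ("add", 0, 0, 1),    # 1: r0 += r1
    ("bne", 1, 2, 0),    # 2: loop while r1 != r2
    ("halt",),           # 3: stop
]

trace, regs = execute(PROGRAM, [0, 0, 3])
print(regs[0], len(trace))  # 6 10
```

Each `Step` captures what the prover later has to justify: which instruction was fetched at which program counter, and what the registers looked like afterwards.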

VM Constraint and Prove. The core of a zkVM is to prove that the execution is valid. This means “given the assembly program and the input, I know a valid execution trace that generates this output.” To validate the trace, each step of the fetch-decode-execute cycle must be verified. This primarily involves proving these three points:

Instruction Execution: The instruction executes correctly. For example, if add 1 2 is run, the result should be 3.
Memory Consistency: The read/write operations in memory and registers are consistent. For example, if 1 is written to the ra register, the next read from that register should yield 1.
Control Flow: The program counter is updated correctly. For example, after executing the current instruction, the next instruction should be fetched from the right address.

To improve prover speed, some zkVMs split the trace into smaller segments, proving each segment in parallel and then combining them to form a complete proof.
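
Splitting the trace only works if the segments are stitched back together correctly. The sketch below is a toy model of that stitching step, under the assumption that each per-segment proof exposes the machine state at its start and end. The function names and the dictionary format are hypothetical and do not reflect any particular zkVM's API.

```python
def split(rows, size):
    """Cut a trace (a list of (state_before, state_after) rows) into segments."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def prove_segment(rows):
    """Stand-in for a per-segment proof: it only exposes the boundary states."""
    return {"start": rows[0][0], "end": rows[-1][1]}

def combine(segment_proofs, initial_state, final_state):
    """Accept only if the segments chain from initial_state to final_state without gaps."""
    if not segment_proofs or segment_proofs[0]["start"] != initial_state:
        return False
    for prev, nxt in zip(segment_proofs, segment_proofs[1:]):
        if prev["end"] != nxt["start"]:
            return False
    return segment_proofs[-1]["end"] == final_state

# A toy trace whose state is a single counter going from 0 to 6.
rows = [(i, i + 1) for i in range(6)]
proofs = [prove_segment(seg) for seg in split(rows, 2)]

print(combine(proofs, 0, 6))                     # True
print(combine([proofs[0], proofs[2]], 0, 6))     # False: the middle segment is missing
```

If the combining step skipped the boundary comparison, a prover could drop or reorder segments and still present individually valid proofs.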

Verify. The verifier receives the proof, which includes the commitment to the trace and the program input/output data. It essentially checks the three points above and confirms whether the execution terminated or panicked as expected.

To reduce the verifier’s cost, recursion is often used. For example, risc0 uses a STARK-to-SNARK circuit that verifies the STARK proof inside a Groth16 proof, so only the small Groth16 proof has to be checked on-chain, significantly reducing the gas required for verification on Ethereum.

What can go wrong in a zkVM

Here, we’ll look at potential vulnerabilities at each workflow stage in zkVMs, from compilation to verification.

Compilation

Targeting existing ISAs allows zkVMs to benefit from mature compiler toolchains, but compiler bugs remain a non-negligible risk. If the compiled assembly code does not reflect the intended source code logic, this can result in catastrophic security issues. According to recent estimates, “every month, there should be at least a dozen new known vulnerabilities likely to lead to an exploit in a formally verified RISC-V ZKVM.” This highlights the possible issues inherent in relying on compiler correctness.

That said, this risk is not exclusive to zkVMs—compiler bugs can affect all types of applications. It’s also understudied how often these issues lead to exploits specifically targeting zkVM systems. While the risks remain theoretical in many cases, ensuring compiler reliability is still essential for building secure zkVMs.

Beyond standard compilation, zkVMs often involve custom preprocessing, such as specialized libraries and precompiles, to optimize performance:

Instruction Replacement: For instance, Jolt replaces instructions like DIV with a sequence of virtual instructions that are more suitable for lookup operations.
Precompiles: A precompile is a custom circuit that accelerates a specific operation. zkVMs like risc0 and sp1 support precompiles for some frequently used cryptographic functions, such as sha256, to enhance performance.
Custom Libraries: Some zkVMs provide optimized custom libraries for the high-level language. For example, sp1 implements custom versions of memcpy and memset to increase efficiency.

These preprocessing steps, essential to zkVM functionality, introduce complexity. It’s crucial that each step is secure and that the original program logic is preserved accurately to avoid introducing vulnerabilities.
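
As an illustration of what "preserving the original logic" means for instruction replacement, consider one common way to avoid proving a division directly: the prover supplies the quotient and remainder as untrusted advice, and cheaper checks confirm them. The sketch below shows that general idea for unsigned 32-bit division. It is an assumption-laden toy, not Jolt's actual virtual instruction sequence, and `check_divu` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def check_divu(a, b, q, r):
    """Accept (q, r) as the result of unsigned a / b only if it is the unique valid pair."""
    if not all(0 <= v <= MASK32 for v in (a, b, q, r)):
        return False
    if b == 0:
        # RISC-V defines division by zero: quotient is all ones, remainder is the dividend.
        return q == MASK32 and r == a
    # Both conditions are needed: the equation alone admits many (q, r) pairs.
    return q * b + r == a and r < b

print(check_divu(17, 5, 3, 2))   # True: the honest result
print(check_divu(17, 5, 2, 7))   # False: 2*5 + 7 == 17, but the remainder is too large
```

Dropping the `r < b` check would leave the replacement sequence under-constrained even though every individual step looks reasonable, which is exactly the kind of subtle gap these preprocessing steps can introduce.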

Execute

The execution phase of a zkVM should strictly adhere to the specification of the underlying VM, ensuring consistent behavior. Given the same assembly code and input, the execution trace and output must be identical for both the prover and the verifier. Some programs rely on randomness provided by the operating system; hashmaps, for example, are often randomized to prevent DoS attacks. However, any randomness used within the program needs to be generated in a predetermined, deterministic way to keep execution reproducible. For instance, in sp1, sys_rand is used to generate deterministic randomness, preventing any discrepancies between prover and verifier.
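
A minimal sketch of the idea, assuming the VM replaces OS entropy with a pseudorandom generator seeded by a fixed constant. `FIXED_SEED` and `run_guest` are hypothetical names, and this is not how sp1 implements sys_rand; it only shows why a fixed seed makes a "randomized" program reproducible.

```python
import random

FIXED_SEED = 0  # hypothetical: a constant fixed as part of the VM specification

def run_guest(inputs, seed=FIXED_SEED):
    """Toy guest program that salts a hash-bucket computation with 'randomness'."""
    rng = random.Random(seed)       # deterministic PRNG instead of OS entropy
    salt = rng.getrandbits(64)
    return [(x ^ salt) % 16 for x in inputs]

# Two independent runs produce the same bucket assignment.
print(run_guest([1, 2, 3]) == run_guest([1, 2, 3]))  # True
```

With real OS entropy, two runs of the same program on the same input could diverge, and a trace produced by one run would not match a re-execution by anyone else.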

VM Constraint and Prove

In a zkVM, users no longer write ZK circuits. This lets developers dodge common ZK circuit pitfalls by design: the ones that arise from mixing out-of-circuit and in-circuit code and from misunderstanding how to properly constrain witnessed/hinted data. That said, the complexity and the circuitry are still there! They have just been moved behind a nicely built abstraction: the circuitry of the VM itself. This means ZK circuit bugs move to that layer, where they can still exist and need to be found. Just as in other zero-knowledge proof projects, under-constraining any part of a zkVM circuit can lead to severe security vulnerabilities.

Usually, each instruction in the VM needs its own circuit. In RISC-V, for instance, dozens of instructions must each be constrained carefully to prevent errors. Missing even one constraint can allow the prover to generate an incorrect trace. For example, a missing constraint in a single instruction allowed an incorrect trace to pass verification. Jolt instead uses lookup tables for instruction handling, which can reduce the number of circuits it needs to define.
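
To show how small a missing constraint can be, here is a toy model of a 32-bit ADD expressed over a prime field. The modulus, the function names, and the constraint shape are assumptions for illustration and are not taken from any specific zkVM. The weak version forgets to range-check the result, so a second, bogus result satisfies the same equation.

```python
P = 2**64 - 2**32 + 1  # a 64-bit prime modulus, chosen only for illustration

def add_constraint_weak(a, b, c, carry):
    """a + b == c + carry * 2^32 (mod P) with a boolean carry, but c is NOT range-checked."""
    equation_holds = (a + b - c - carry * 2**32) % P == 0
    carry_is_bit = (carry * (carry - 1)) % P == 0
    return equation_holds and carry_is_bit

def add_constraint_sound(a, b, c, carry):
    """Same as above, plus the range check that pins c to a 32-bit value."""
    # In a real circuit the range check would be a byte decomposition or a lookup.
    return add_constraint_weak(a, b, c, carry) and 0 <= c < 2**32

a, b = 1, 2
honest = (3, 0)                    # c = 3, carry = 0
forged = ((3 - 2**32) % P, 1)      # a huge field element paired with carry = 1

print(add_constraint_weak(a, b, *honest), add_constraint_weak(a, b, *forged))    # True True
print(add_constraint_sound(a, b, *honest), add_constraint_sound(a, b, *forged))  # True False
```

Both constraint systems accept the honest execution, so ordinary testing would not reveal the difference; only an adversarially chosen witness does.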

For zkVMs with precompiled functions, the circuits must also cover these precompiles. Operations like sha256 need custom proof circuits to ensure they function correctly. An audit of RISC Zero’s circuits shows how precompiled functions require extra attention to avoid errors.

Consistency in memory and register accesses is also essential. Each time a memory address is read, the circuit must check that the value matches what was last written to that address. Missing this check, as shown in this audit, can let the prover use incorrect values in memory.
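
The property being enforced can be stated as a simple reference check over the access log. The sketch below replays the log in timestamp order and compares each read against the last write. The tuple format and the default value of 0 for untouched memory are assumptions; real zkVMs enforce the same property inside the circuit with techniques such as permutation or offline memory-checking arguments rather than a direct replay.

```python
def memory_consistent(accesses, initial=0):
    """accesses: list of (timestamp, addr, kind, value), kind is "r" or "w".

    Assumes timestamps are unique, so sorting gives the execution order.
    """
    last_written = {}
    for _, addr, kind, value in sorted(accesses):
        if kind == "w":
            last_written[addr] = value
        elif value != last_written.get(addr, initial):
            return False   # a read returned something other than the last write
    return True

good = [(0, 8, "w", 1), (1, 8, "r", 1), (2, 8, "w", 5), (3, 8, "r", 5)]
bad = [(0, 8, "w", 1), (1, 8, "r", 2)]   # the read claims a value never written

print(memory_consistent(good), memory_consistent(bad))  # True False
```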

Finally, the circuit must mirror the VM’s handling of specific operations. One example is RISC-V’s handling of division-by-zero, which doesn’t result in a panic but instead returns a specified value that depends on the instruction and its operands: the division instructions return a result with all bits set, and the remainder instructions return the dividend. Failing to implement this behavior correctly could allow discrepancies between the trace and expected behavior.
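
For reference, here is a sketch of the signed DIV and REM semantics on 32-bit registers as defined by the RISC-V "M" extension, including the division-by-zero and signed-overflow cases. The helper names are our own; a circuit for these instructions has to accept exactly these results and nothing else.

```python
XLEN = 32
MASK = (1 << XLEN) - 1
INT_MIN = -(1 << (XLEN - 1))

def to_signed(x):
    """Interpret an XLEN-bit register value as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def div(a, b):
    """Signed DIV; takes and returns unsigned register values."""
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK                       # -1, i.e. all bits set
    if sa == INT_MIN and sb == -1:
        return sa & MASK                  # overflow: the result is the dividend
    q = abs(sa) // abs(sb)                # round toward zero
    return (q if (sa < 0) == (sb < 0) else -q) & MASK

def rem(a, b):
    """Signed REM; takes and returns unsigned register values."""
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return a & MASK                   # the remainder is the dividend
    if sa == INT_MIN and sb == -1:
        return 0
    r = abs(sa) % abs(sb)
    return (r if sa >= 0 else -r) & MASK  # sign follows the dividend

print(hex(div(7, 0)), rem(7, 0))                      # 0xffffffff 7
print(hex(div(0x80000000, 0xFFFFFFFF)))               # 0x80000000
```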

Verify

The verifier checks the proof against its verification key. The verification key is specific to the program: it contains commitments to the assembly code and to configuration data that is independent of any particular execution. The proof data the verifier receives usually includes the program’s public inputs, outputs, and the ZK proof itself. It’s important not to mix these elements up. For example, if a verifier takes the memory layout directly from proof data, as shown in this Jolt issue we reported, a malicious prover could alter the memory layout to overwrite the output.
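
The following toy model illustrates the distinction between program-fixed data and prover-supplied data. It is not Jolt's code: the class names, the `proven_memory` dictionary (standing in for final memory contents that the ZK proof vouches for), and the addresses are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layout:
    output_start: int
    output_len: int

@dataclass(frozen=True)
class VerificationKey:
    program_commitment: bytes
    layout: Layout           # fixed per program, never supplied by the prover

@dataclass(frozen=True)
class ProofData:
    claimed_output: bytes
    layout: Layout           # prover-controlled copy; must not be trusted

def read_output(memory, layout):
    """Read the output region from a dict mapping addresses to byte values."""
    return bytes(memory.get(layout.output_start + i, 0) for i in range(layout.output_len))

def verify_unsafe(vk, proof, proven_memory):
    # BUG: the output region is located using prover-supplied data.
    return read_output(proven_memory, proof.layout) == proof.claimed_output

def verify(vk, proof, proven_memory):
    # The output region comes from the verification key only.
    return read_output(proven_memory, vk.layout) == proof.claimed_output

vk = VerificationKey(b"program", Layout(output_start=0x200, output_len=1))
memory = {0x100: 42, 0x200: 7}   # 0x100: prover-chosen input byte, 0x200: real output
forged = ProofData(claimed_output=bytes([42]), layout=Layout(0x100, 1))

print(verify_unsafe(vk, forged, memory), verify(vk, forged, memory))  # True False
```

In the unsafe variant the prover simply points the "output region" at memory it controls, so the claimed output passes even though the program never produced it.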

The verifier should also ensure that the program has terminated correctly. Without this check, a prover could submit a partial trace and still pass verification, potentially resulting in incorrect outputs. An example of this risk appears in another Jolt issue, where a malicious prover can forge a proof for a truncated trace with incorrect output.
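
A toy model of the truncated-trace problem is sketched below. Here the verifier sees the trace directly, which a real verifier does not, and the step rule, `halt_pc`, and the `check_termination` flag are hypothetical. The point is only that every step of a truncated trace can be valid while the final state is not the program's real output.

```python
def steps_valid(trace):
    """Rows are (pc, acc); each row must follow from the previous: pc + 1, acc + pc + 1."""
    return all(nxt == (pc + 1, acc + pc + 1)
               for (pc, acc), nxt in zip(trace, trace[1:]))

def verify(trace, claimed_output, halt_pc, check_termination=True):
    if not trace or trace[0] != (0, 0) or not steps_valid(trace):
        return False
    final_pc, final_acc = trace[-1]
    if check_termination and final_pc != halt_pc:
        return False   # the trace stopped before the program halted
    return final_acc == claimed_output

full = [(0, 0), (1, 1), (2, 3), (3, 6), (4, 10)]   # the program halts at pc = 4
truncated = full[:3]                                # stops early with acc = 3

print(verify(full, 10, halt_pc=4))                                 # True
print(verify(truncated, 3, halt_pc=4, check_termination=False))    # True: wrong output accepted
print(verify(truncated, 3, halt_pc=4))                             # False
```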

Summary

zkVMs offer a promising way to enable secure, zero-knowledge proofs for computations across various platforms. However, their complexity arises from the integration of multiple components, including compilers, instruction set architectures, zero-knowledge proofs, and verifiers. Ensuring the security of each part is vital to maintaining the overall integrity of the system.

As zkVMs continue to evolve, it is important to focus not only on performance but also on security. Each component, from compilation to verification, must be rigorously examined to prevent vulnerabilities. With the rapidly developing nature of zkVM technology, staying vigilant about potential risks and flaws will be essential for creating robust and secure zero-knowledge applications. As the space grows, we will continue to keep an eye on its developments and ensure that security remains a top priority.