Twist and Shout: Fast, Simple Memory Arguments

Name: Twist and Shout: Fast, Simple Memory Arguments
Uploaded: 2026-05-09T14:28:42.994Z
Description: This seminar introduces Twist and Shout, cryptographic protocols designed to accelerate Snark provers, which are currently a limiting factor in Snark applicability. A Snark (Succinct Non-interactive A

a16z crypto·May 9, 2026

AI Summary

This seminar introduces Twist and Shout, cryptographic protocols designed to accelerate Snark provers, which are currently a limiting factor in Snark applicability. A Snark (Succinct Non-interactive ARgument of Knowledge) allows an untrusted prover to demonstrate knowledge of a witness satisfying a property without revealing the witness itself. Snarks are typically used in a two-step process: converting a high-level computer program into a circuit (front-end) and then applying a proof system to the circuit (back-end). While Snark verifiers are generally efficient, provers often perform a million times more work than native execution, although this overhead is highly parallelizable. The goal of Twist and Shout is to significantly reduce this prover overhead, aiming for a 10,000 to 100,000-fold overhead rather than a million-fold. Snark back-ends are designed in three stages: a polynomial IOP (Interactive Oracle Proof), a polynomial commitment scheme, and finally, applying the Fiat-Shamir transformation to make the interactive argument non-interactive. A polynomial IOP involves the prover sending a large polynomial encoding the witness, which the verifier can only query at specific points. There are two classes of polynomial IOPs: univariate, which are good for verifier costs, and multivariate (multilinear), which are better for prover costs. This talk focuses on multivariate polynomials due to the emphasis on prover speed. While multivariate polynomials have higher verifier costs than univariate ones, they are still relatively low (handful to dozens of kilobytes), making them suitable for many applications, especially those not requiring sub-kilobyte proofs or a trusted setup. A polynomial commitment scheme allows the prover to cryptographically commit to a polynomial, typically a small 32-byte commitment, such that the prover cannot later change its mind about the polynomial. Later, the prover can provide an evaluation of the committed polynomial at a specific point and prove its correctness. Different commitment schemes exist, using elliptic curve groups, cryptographic hash functions, or lattices, each with trade-offs in prover time, proof size, verifier time, and cryptographic security basis. The key technical takeaways of this work are centered on the Sumcheck protocol and exploiting repeated structure in computations. The Sumcheck protocol, a result from 1990, is crucial for designing fast provers, especially with multivariate polynomials and extensive interaction in the polynomial IOP. However, not all uses of Sumcheck are equally effective. To achieve the simplest and fastest Snarks, Sumcheck should be used in a way that leverages repeated structure in the statement being proven. The intuition is that proving a simple procedure run many times is significantly easier than proving an unstructured procedure of the same size. To exploit this, the approach advocates embracing "sparse polynomials," which have a high degree but mostly zero coefficients. The goal is to ensure the prover never pays for these zeros, neither in the commitment scheme nor in the polynomial IOP messages. Elliptic curve-based commitment schemes are best for committing to many zeros, and Sumcheck-based polynomial IOPs are best for avoiding payment for zeros. This contrasts with previous approaches that tried to eliminate zeros by reducing sparse problems to dense ones. Shout is a batch evaluation argument. It addresses scenarios where a prover claims to have correctly evaluated a function `F` (taking inputs `x` and `y`) `T` times for `T` different input triples. A common example is proving `F` is a primitive instruction of a simple CPU. The traditional approach would involve creating a circuit for `F` and then `T` copies of that circuit, which is expensive. The fastest Snarks for circuit evaluation still require about six field multiplications per gate for the prover. Shout aims to do better for large batch sizes (thousands to billions). It introduces a fixed cost, independent of the batch size, which is amortized over the entire batch, leading to a lower per-evaluation cost. This fixed cost can be a small power of `K`, where `K` is the size of the input space of `F`. For instance, in the Jolt ZKVM, `K` could be 2^64, and the fixed cost might be 2^16, which is small compared to a billion evaluations. This approach is beneficial when `F` has specific structure, as is the case for primitive CPU instructions. The per-evaluation cost using Shout can be equivalent to applying circuit-based provers to a 10-gate circuit, meaning it's more efficient for operations that are not solely field multiplications. The advantage of Shout over circuit-based methods comes from two factors: an economy of scale where a fixed cost enables lower per-evaluation costs, and the fact that the Shout prover is not constrained to proving *how* it evaluated `F` (e.g., gate by gate through a circuit), only that the evaluation is correct. This gives the prover more freedom, leading to faster proofs. Batch evaluation arguments can also be viewed as "lookup arguments," where evaluations of `F` are seen as lookups into a read-only memory storing `F`'s evaluation table. The fixed cost then relates to the memory size, and there's a per-read cost. Twist generalizes Shout to read-write memory. In a CPU context, this means proving that values returned by read operations are correct, i.e., they reflect the most recently written value to that memory cell. Twist achieves this with a prover that is roughly three times faster than previous arguments for read-write memory while having significantly shorter proofs. Compared to prior work with similar proof sizes, Twist can be over an order of magnitude faster. Both Twist and Shout also offer a major benefit in "streaming prover" capabilities, allowing prover memory usage to be bounded more efficiently than traditional methods that rely on recursive proof aggregation. Twist and Shout are particularly relevant for ZKVMs (Zero-Knowledge Virtual Machines), which allow a prover to demonstrate correct execution of a computer program cycle by cycle. ZKVMs mirror CPU operation (fetch, decode, execute, memory reads/writes). Jolt is a ZKVM specifically designed around the Sumcheck protocol and batch evaluation arguments (Shout for read-only memory, Twist for read-write memory) rather than circuits. This design makes Jolt simpler and faster than other ZKVMs. It leverages the inherently repeated structure of VM execution (fetch-decode-execute loops, repeated memory operations). Jolt focuses on the Risk-V CPU architecture because it's an open-source, relatively simple architecture that, by accident, is friendly to Snark proving. Its primitive instructions are directly amenable to Shout. The existing tooling for Risk-V is also a significant advantage. Proving one Ethereum block currently requires about a billion Risk-V cycles. In Jolt, RAM and registers are handled by Twist, while program bytecode lookups and primitive instruction execution are handled by Shout. A small circuit component (R1CS) acts as "glue" between these components, ensuring consistency and correctness. This circuit, with about 26 constraints per Risk-V cycle, also exploits repeated structure for a 2x speedup. A key simplicity benefit is that the verifier only needs to know how to evaluate the multilinear extension of each primitive instruction, rather than needing to verify a complex circuit for each. The implementation of Twist and Shout within Jolt is nearing completion. Proofs are expected to be around 50 kilobytes. The prover is anticipated to achieve about a million Risk-V cycles per second on a high-end, parallelized laptop for large computations (100 million cycles and up). The total prover work for Jolt is predicted to be about 40,000 times the native execution, a significant reduction from the typical million-fold overhead. This prediction is supported by both empirical observation (comparing CPU clock speed to cycles proven per second) and analytical calculation (approximately 500 field multiplications per cycle, with each field multiplication taking about 80 CPU cycles). This overhead is relatively workload-independent. An interesting point is that Jolt, a ZKVM without extensive circuits, is expected to be faster than some popular Snarks for circuits, even for hand-optimized circuits. This is because the repeated structure in VM execution, leveraged by Jolt, more than offsets the overhead that comes from imposing this structure. While circuit-based Snarks force the prover to follow a prescribed path (gate by gate), Jolt's batch evaluation arguments give the prover more freedom, leading to greater efficiency. Regarding prover memory, current Snarks often break large computations into smaller pieces, prove each piece, and then recursively aggregate these proofs. This "proofs of proofs" approach is complicated and introduces significant overhead. Twist and Shout, by exploiting sparse polynomials and fast proving techniques, enable bounding prover memory usage without needing this recursive aggregation, thus enhancing simplicity and security. The underlying mechanism for handling sparse polynomials efficiently with elliptic curve-based commitment schemes is that commitments to zero coefficients are "free" because they don't change the group product. The Sumcheck protocol further ensures that the prover only pays for non-zero terms when summing evaluations of a polynomial, effectively ignoring zeros. Shout's design is inspired by an old interactive proof for counting triangles. Both protocols involve repeated inner products. In Shout, read addresses are committed in a "one-hot" format, meaning a huge but sparse vector. The sumcheck protocol is used to "virtualize" the read values, meaning the verifier can obtain an evaluation of the read value polynomial by invoking sumcheck, which reduces it to evaluating polynomials it *does* have access to. While the `RA` (read address) polynomial is huge (e.g., 2^94 for a billion reads into 2^64 memory), its sparsity allows for fast commitment and sumcheck computation, as only non-zero entries are processed. A separate sumcheck also verifies that `RA` is indeed one-hot. In summary, the key insight is to embrace sparsity: the Sumcheck protocol ensures the prover never pays for zeros in computation, and specific commitment schemes ensure the prover never pays for zeros in commitment. This approach makes handling large but sparse structures like `RA` efficient, simplifying Snark design and significantly accelerating prover times, especially for computations with highly repeated structure like VM execution.