curdleproofs 0.0.1

An implementation of the Curdleproofs shuffle zero-knowledge argument
Documentation
# Curdleproofs optimizations

Curdleproofs is optimized for quick verification. This section goes into more details on the optimizations deployed.

## MSM Accumulator

Throughout the protocol there are checks of the form
\\[C  \\stackrel{?}{=}  \\bm{x} \\times ( \\bm{g} \\ || \\ \\bm{h} \\ || \\ G_T \\ || \\ G_U \\ || \\ H \\ || \\ \\bm{R} \\ || \\ \\bm{S} \\ || \\ \\bm{T} \\ || \\ \\bm{U}) \\]
These checks form the bottleneck of the verifiers computation and we can save significant amounts of work by accumulating these checks into a single multiscalar multiplication that is checked at the end of the protocol.

We implement the optimization in the [`crate::msm_accumulator`] crate.

## IPA Verification Scalars

We use an optimization from [Bulletproofs](https://eprint.iacr.org/2017/1066.pdf) to reduce the verifier overhead in
inner product arguments that is also used in the [Dalek
implementation](https://doc-internal.dalek.rs/bulletproofs/inner_product_proof/index.html). We will demonstrate the
optimization for the $SameMsm$ argument, but same reasoning applies for the $DLinner$ argument as well.

The verifier computes only three checks in the entire $SameMsm$ argument: namely in Step 3 it checks that $A = x G_1$,
$Z_T = x T_1$, and $Z_U = x U_1$.  This means that although the prover needs to compute the intermediate vectors
$\bm{G}, \bm{T}, \bm{U}$ at each step in order to compute the $\pi_j$ values, the verifier does not and it can compute
the final $A, Z_T, Z_U, G_1, T_1, U_1$ directly from the initial $A, Z_T, Z_U, \bm{G}, \bm{T}, \bm{U}$ and the $B_A,
B_T, B_U, \bm{\gamma}$ values.

Using a simple example where the starting $|\bm{G}| = 8$, we see that $\bm{G}$ gets changed as follows:
\\[
\\begin{array}{c c c c c c c c c c c c c c c c c}
 & \\bm{G} = & (G_1'     & || &  G_2' & ||  & G_3'  & ||  & G_4'  & ||  & G_5'  & ||  & G_6'  & ||  & G_7'  & ||   & G_8') \\\\
 & \\bm{G} = & ( G_1'  & + &  \\gamma_1 G_5'  & ||  & G_2' &  + & \\gamma_1 G_6'  & ||  & G_3' & + &  \\gamma_1 G_7'  & ||  & G_4' &   + & \\gamma_1 G_8' )
	\\\\
 & \\bm{G} = & ( G_1'  & + &   \\gamma_2  G_3'  & +  & \\gamma_1 G_5'&  + & \\gamma_1 \\gamma_2 G_7'  & ||  & G_2' & + &  \\gamma_2 G_4'  & +  & \\gamma_1 G_6' &   + & \\gamma_1 \\gamma_2 G_8')
	\\\\
 & \\bm{G} = & ( G_1'  & + &   \\gamma_3  G_2'  & +  & \\gamma_2  G_3' &  + &  \\gamma_2 \\gamma_3 G_4'  & +  & \\gamma_1 G_5' & + &   \\gamma_1 \\gamma_3 G_6'  & +  &  \\gamma_1 \\gamma_2 G_7' &   + & \\gamma_1 \\gamma_2 \\gamma_3 G_8')
\\end{array}
\\]
such that the final $G_1$ value is equal to
\\[
( 1, \\ \\gamma_3, \\ \\gamma_2, \\ \\gamma_{2} \\gamma_3,  \\ \\gamma_1, \\ \\gamma_1 \\gamma_3, \\ \\gamma_1 \\gamma_2, \\ \\gamma_1 \\gamma_2 \\gamma_3 ) \\times \\bm{G}
\\]
If we set $\\bm{\\delta} = (\\gamma_m, \\ldots, \\gamma_1)$ to be the reverse of $\\bm{\\gamma}$ then we see a useful structure
\\[
G_1 = ( 1, \\ \\delta_1, \\ \\delta_2, \\ \\delta_{1} \\delta_2,  \\ \\delta_3, \\ \\delta_1 \\delta_3, \\ \\delta_2 \\delta_3, \\ \\delta_1 \\delta_2 \\delta_3 ) \\times \\bm{G}
\\]
namely that
$G_1 = \\bm{s} \\times \\bm{G}$ where
$s_i = \\sum_{ j = 1 }^{m} \\delta_j^{b_{i,j}}$ for $b_{i,j}$ such that $i = \\sum_{j = 1}^m b_{i,j} 2^j$ is the binary decomposition of $i$.

## Grandproduct Verifier Optimizations

The non-optimized grandproduct verifier is required to compute a vector $\\bm{G}' = \\bm{u} \\circ \\bm{G}$
for some public vector $\\bm{u}$.
Then $\\bm{G}'$ is used as input to the $DLInner$ common reference string.
Computing $\\bm{G}' = \\bm{u} \\circ \\bm{G}$ would cost $n$ scalar multiplications that cannot be accumulated efficiently.
However, when we look into how the vector $\\bm{G}'$ is used in $DLInner$, it is used only once during
\\[
AccumulateCheck( \\bm{\\gamma} \\times \\bm{L}\_{D} +  (B\_D + \\alpha D) + \\bm{\\gamma}^{-1} \\times \\bm{R}\_{D}  \\stackrel{?}{=} d \\bm{s}' \\times \\bm{G}' )
\\]
This check is equivalent to accumulating the check
\\[
AccumulateCheck( \\bm{\\gamma} \\times \\bm{L}\_{D} +  (B\_D + \\alpha D) + \\bm{\\gamma}^{-1} \\times \\bm{R}\_{D}  \\stackrel{?}{=} (d \\bm{s} \\circ \\bm{u} ) \\times \\bm{G} )
\\]
We thus edit the $DLInner$ verifier to only take the original generators as input in $crs_{DLInner} = (\\bm{G}, H)$, however to take $\\bm{u}$ one of the public inputs $\\phi_{DLInner} = (C, D, z, \\bm{u})$.
The accumulated check can then be run efficiently.

A second optimization we run is that the non-optimized grandproduct verifier is required to compute a group element
\\[
D \\gets B  -  (1, \\beta, \\ldots, \\beta^{\\ell - 1} ) \\times \\bm{g}' + \\alpha \\beta^{\\ell + 1}  \\bm{1}  \\times \\bm{h}'
\\]
for 
\\[
\\bm{g}' \\gets ( (\\beta^{-1} g_2 , \\beta^{-2} g_3, \\ldots, \\beta^{-(\\ell-1)} g_{\\ell} ) \\ || \\ \\beta^{-\\ell} g_1 ) \\ \\land \\ \\bm{h}' \\gets \\beta^{-(\\ell + 1)} \\bm{h}
\\]
Expanding we see that
\\[
\\begin{align*}
(1, \\beta, \\ldots, \\beta^{\\ell - 1} ) \\times \\bm{g}' & = (1, \\beta, \\ldots, \\beta^{\\ell - 1} ) \\times ( (\\beta^{-1} g_2 , \\beta^{-2} g_3, \\ldots, \\beta^{-(\\ell-1)} g_{\\ell} ) \\ || \\ \\beta^{-\\ell} g_1 ) \\\\
& = ( (\\beta^{-1} g_2 , \\beta^{-1} g_3, \\ldots, \\beta^{-1} g_{\\ell} ) \\ || \\ \\beta^{-1} g_1 )  \\\\
& = \\beta^{-1} \\sum_{i=1}^{\\ell} g_{i}
\\end{align*}
\\]
Similarly
\\[
\\begin{align*}
	\\alpha \\beta^{\\ell + 1}  \\bm{1}  \\times \\bm{h}' & = 	\\alpha \\beta^{\\ell + 1}  \\bm{1}  \\times \\beta^{-(\\ell + 1)} \\bm{h} \\\\
	& = \\alpha \\sum_{i = 1}^{n_{bl}} h_i
\\end{align*}
\\]
If we store $g_{\\mathsf{sum}} =  \\sum_{i=1}^{\\ell} g_{i}$ and $h_{\\mathsf{sum}} =  \\sum_{i=1}^{\\ell} h_{i}$ in the CRS then we can compute
\\[
D \\gets B  - \\beta^{-1} g_{\\mathsf{sum}} + \\alpha h_{\\mathsf{sum}}
\\]
in just $2$ scalar multiplications.