Code dissection: secret sharing

| 1 Comment | No TrackBacks

To kick off the code dissections, I've chosen one of the more complicated GolfScript programs I've written to date. It turns out that I've learnt so much in the 6 months since I wrote it that this writeup has taken many many hours and I've reduced the program in size by 40%.

The challenge was "to implement Shamir's Secret Sharing reconstruction over the Finite Field defined by the prime 1928049029".

Input is done using stdin. First comes an integer k, then k lines follow. Each of these lines contains a pair of integers x y that represent a secret. In other words f(x) = y in the original polynomial that was used to construct the secrets.

The first thing to do is to understand the mathematical background. If you're completely new to number theory, this will probably be a bit heavy-going. I'm going to assume basic understanding of arithmetic modulo a prime number P.

Shamir's secret sharing works by picking a random polynomial p(x) of order n. The secret is the y-intercept p(0). Given n+1 points on the polynomial it's possible to interpolate / extrapolate to any other point. I've gone with the technique suggested by the Wikipedia article on SSS: Lagrange interpolation.

Given n pairs (xi, yi) where yi = p(xi), we can evaluate p(x) as
p(x) = \sum_{i=0}^{n-1} \left( y_i \prod_{j \ne i} \frac{x_j - x}{x_j - x_i} \right)
which in the special case x=0 reduces to
p(0) = \sum_{i=0}^{n-1} \left( y_i \prod_{j \ne i} \frac{x_j}{x_j - x_i} \right)

Note that this is going to require us to do division modulo P. Dividing by a number is equivalent to multiplying by its reciprocal, and we can find the reciprocal in a prime field using the extended version of Euclid's algorithm.* It's most elegantly expressed in Pythonesque pseudocode: to find the reciprocal of a (mod b) where gcd(a, b) = 0, a > 0, b > 0 comes down to:

(c, d) = (1, 0)
while (b > 0): (a, b, c, d) = (b, a%b, d, c-(a/b)*d)

We might find it useful to rewrite this slightly:

(c, d) = (1, 0)
while (b > 0): (a, b, c, d) = (b, a-(a/b)*b, d, c-(a/b)*d)

The result in either case is c. If you look closer you can see that we do in fact have an easy way to do modular division: we can calculate C / A (mod B) by simply multiplying the original values of c and d by C.

But experienced GolfScripters will already have spotted a big problem: we have easy access to the top three elements of the stack with \ and @, but manipulating four elements is hard. We're going to have to use arrays in some form. I have spent a lot of time trying different combinations, and I'm not certain that there still isn't a slightly shorter way to do it. (And, of course, there's a character or two difference according to the order in which the arguments arrive). Suggestions gratefully received to improve on:

2,[a b]{
    .~/{\.(;@@~@*-+\}+2*.1=
}do;0=

This is fairly quick to dissect: we have to use a loop without knowing a priori how many times it will run, and do is the shortest such loop. (Well, maybe I should think about how to do it with unfold...). Going into the loop we have [d c][b a] on the stack; therefore at the end of the loop body we want [d' c'][b' a']b'. When b' is zero we exit the loop, and to get the final value of c from that we pop [b a] and extract d. Hmm, I've messed up somewhere, but I've tested and this is the right way round. TODO Come back and fix the derivation / explanation.

The inside of the loop body is only slightly tricky. We copy the top array, expand it to its (two) elements, divide, and then use + to push that value inside a block. That block transforms the top array and flips; by repeating it twice we transform both arrays and leave them where they were. The block itself is straightforward stack manipulation.


Ok, we're ready to look at the program as a whole. The first thing to do is to process the input into a suitable format. We can simply discard the first line, because we don't need to know the number of points. So we start by discarding the first line and turning the input into an array [[x0 y0] ... [xn-1 yn-1]]:

~](;2/

We're going to have a double loop: the outer loop for the sum and the inner loop for the product of the interpolation. It's one character better to do a sum over the result of a map as 0\{...+}/ than {...}%{+}*, so the next step is

0\

We're going to do such complicated things with the array we currently have that it's worth storing in a variable, and it turns out that we don't lose anything by using an alphabetic name. So we store it (remembering that storing doesn't pop, rather counterintuitively) and then lift the inner loop and the summation (remembering that it's actually summation mod P):

:X{ ... +P%}/

Note: we need to define P inside the loop body. But it's such a long value it's certain to be worth not duplicating, and storing it when we first use it will be two characters for a later one character lookup: any other method would involve a dup and some stack manipulation to keep it out of the way until we need it, so there's unlikely to be a better one.

So: inner loop. We have subtotal [xi yi] on the stack. To get the xj≠i we'll have to remove xi from the array. We expand the array on the top, flip xi to the top and duplicate, grab the full array, map each element in it to the x value, remove xi by symmetric set difference (the raw value is promoted to a set automatically), flip again:

~\.X{0=}%^\
# Stack: subtotal y_i [x_{j != i}] x_i

The inner loop is a product with base case yi – no need to bring a constant one into this. So we want to map over the xj≠i a function mapping to xj(xj-xi)-1 (mod P). We can repeat the trick already seen in the GCD discussion above of adding an integer to a block to pull it inside before mapping, so it pretty much comes down to

{\[.0](@-[1928049029:P%P] GCD *}+/

Notes: this initialises c to xj to do division; the % could equally be a + and is necessary because the subtraction might give a negative number, and our GCD only works with non-negative numbers.

The final code, with whitespace and minimal commenting, is

~](;2/0\:X
# subtotal [[x_0 y_0] ... [x_{n-1} y_{n-1}]]
{
	~\.X{0=}%^\
	# subtotal y_i [x_{j != i}] x_i
	{
		# subtotal subproduct x_j x_i
		\[.0](@-[1928049029:P%P]
		# [x_j 0] [(x_j-x_i)%P P]
		{.~/{\.(;@@~@*-+\}+2*.1=}do;0=
		*
	}+/
	+P%
}/

and fully golfed is 86 chars:

~](;2/0\:X{~\.X{0=}%^\{\[.0](@-[1928049029:P%P]{.~/{\.(;@@~@*-+\}+2*.1=}do;0=*}+/+P%}/

* Well, the shortest way of finding the reciprocal x-1 (mod P) is probably x P.,\@{@*\%(!}++?, which uses ? to search through 0 to P-1 looking for a y for which y P x @*\%(! is true. But this is very very slow - don't expect the test cases from the original problem to complete in less than a few days.

Actually Nabb points out that via Fermat's Little Theorem it's x P((?P% - still pretty slow, and very heavy on memory consumption.

No TrackBacks

TrackBack URL: http://www.cheddarmonk.org/golfscriptblog/cgi-bin/mt/mt-tb.cgi/4

1 Comment


Note: we need to define P inside the loop body. But it's such a long value it's certain to be worth not duplicating, and storing it when we first use it will be two characters for a later one character lookup: any other method would involve a dup and some stack manipulation to keep it out of the way until we need it, so there's unlikely to be a better one.

So: inner loop. We have subtotal [xi yi] on the stack. To get the xj≠i we'll have to remove xi from the array. We expand the array on the top, flip xi to the top and duplicate, grab the full array, map each element in it to the x value, remove xi by symmetric set difference (the raw value is promoted to a set automatically), flip again:

~\.X{0=}%^\
# Stack: subtotal y_i [x_{j != i}] x_i
The inner loop is a product with base case yi – no need to bring a constant one
il lottery

Leave a comment

About this Entry

This page contains a single entry by Peter published on October 7, 2011 1:00 AM.

Debugging GolfScript was the previous entry in this blog.

Ruby eval and debugging revisited is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.