Today the terms ‘user privacy’ and ‘data confidentiality’ are very important if you are planning to provide a service to people other than yourself. The big question is how do you go about to enforce these concepts? Everybody has heard about ‘encryption’ and that it is apparently the way to go. Unfortunately most people haven’t heard about encryption standards and end up building their own ‘secure’ systems. It is a situation that all too much reminds me of the following scene from the movie Groundhog Day (excellent movie, you should watch it. From a legal source of course):
People think that “it’s showtime” by building their own cryptosystem and end up driving their users off a cliff – killing them in a firey inferno.
To illustrate how not following standards breaks your security I will show an example brought to me by one of my (unnamed) colleagues just the other day. The problem: we have some values A, B, and C we want to hide from prying eyes. These values can either be -1 or +1. Their proposed scheme was to choose another three values X, Y, and Z that can also only be -1 or +1 rather store the values AX, BY and CZ. Up to this point what they are effectively describing is a One-time Pad, which is an unbreakable encryption scheme. If they have this unbreakable scheme, what can go wrong?
The thing is, they didn’t just want to store A, B, and C they want to hide D, E, and F as well. By using the same values X, Y, and Z. And this is where it breaks down. With a One-time Pad you cannot use the same X, Y, and Z more than once. Let’s see why:
Say we get
AX = 1
BY = 1
CZ = -1
For any choices of A, B and C we can find X, Y and Z that will satisfy these equations. For example:
A = 1, B = -1, C = -1 means that X = 1, Y = -1, Z = 1.
A = -1, B = 1, C = 1 means X = -1, Y = 1, Z = -1.
And you can do this for all 8 possible combinations of A, B, and C. There is no way of knowing which values of A, B and C gave us the output (1, 1, -1). But look what happens as soon as we learn the following:
DX = -1
EY = 1
FZ = -1
Since our variables can only take the values -1 and +1 we know that the square of any variable is equal to 1. Multiplying the corresponding terms we get:
ADXX = AD = -1
BEYY = BE = 1
CFZZ = CF = 1
From this we learn that if A = 1 then D = -1, B = E and C = F. If the One-time Pad was with different pads for each message then learning one of the messages would tell you nothing of the other one, but now if we find out what one message is we can immediately figure out the other one. Another way of leaking information is if we know that these are encryptions of bids for an auction (binary encoded with the 0s set to -1) and A, B, C was sent first we know the second person just bid 4 more than the first guy.
There exists and interesting interchange between the One-time Pad and ASCII encoding. I found out about this from the excellent online cryptography course I am following Below I have encrypted the two phrases ‘a cat mat’ and ‘THE LOW Q’ with a One-time Pad using the same random pad:
c1 = `Rí!UX}
c2 = Uptﾬ#NoX
Now from these ciphertexts it doesn’t seem like you can figure out anything about the phrases and if you look at each in isolation you can’t. But if you take the two ciphertexts and XOR them together byte by byte you get 5h&A8o:A% and when you compare them to the original phrases:
a cat mat
THE LOW Q
I’m pretty sure you can see the pattern here. We now know where the spaces occur in the strings as well as some of the letters of the original phrases! Which is quite a lot of information already that we did not want to give away.
Just remember kids, save your users from the inferno by following the standards!