
The Ultimate Weapon Against Censorship?

Erik Moeller writes "David Madore, mathematician at ENS, describes a method that might be the ultimate weapon in the battle against Internet censorship. In his paper 'A method of free speech on the Internet: random pads' he introduces a system of so-called pads, chunks of random data that are used to encrypt controversial information.

Every byte in the source file is XOR'd with exactly one byte in the random file. The result file, by itself, is totally indistinguishable from white noise, provided that the pad used is truly random. Madore now suggests that users store pads on different servers and use several of them in combination to encrypt data.

An FTP or WWW site that stores one of the pads could argue that it is only storing random noise, and another might do the same. It would be mathematically impossible to prove either of them guilty of storing illegal information (unless there is a way to prove that one pad was created after the other). Only by combining the two (or more) files am I able to retrieve the original controversial information. The critical parts are the links to the pads I need to obtain the information, but those might be traded on a distributed system like Gnutella or FreeNet. Plus, links take very little space and can easily be relocated to freespace ISPs.

The concept is a little more complicated than my summary here, so please read the paper (and mirror it, it's GPL'd!). There are already scripts and programs to create pads and restore the original files (including a GUI program for Win32). I might add that the idea of pad encryption is fairly old, already used in WWII -- its advantage is that it is mathematically safe if the pads are truly random and only used once, thus its name "One Time Pad"."
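The XOR scheme summarized above can be sketched in a few lines of Python 3. This is an illustration, not Madore's published scripts: the 128 KiB pad size, the function names and the zero-padding of short data are assumptions of this sketch, and os.urandom stands in for a truly random source (it is a cryptographic generator, not physical noise). The new pad is the data XORed with every pad already chosen, so no single file reveals anything on its own, while XORing all of them together restores the data.

```python
import os
from functools import reduce

PAD_SIZE = 128 * 1024  # assumed pad size for this sketch

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def xor_all(blocks):
    """XOR any number of equal-length byte strings together."""
    return reduce(xor_bytes, blocks)

def make_new_pad(data: bytes, existing_pads) -> bytes:
    """Return a pad P such that P ^ pad_1 ^ ... ^ pad_k == data padded with zeros."""
    if len(data) > PAD_SIZE:
        raise ValueError("data does not fit in one pad")
    padded = data + bytes(PAD_SIZE - len(data))
    return xor_all([padded, *existing_pads])

# Pads already mirrored on other sites (os.urandom stands in for true randomness).
existing = [os.urandom(PAD_SIZE) for _ in range(3)]
secret = b"controversial information"
new_pad = make_new_pad(secret, existing)

# Anyone holding all four pads can XOR them together to get the data back.
recovered = xor_all([new_pad, *existing])
assert recovered[:len(secret)] == secret
assert recovered[len(secret):] == bytes(PAD_SIZE - len(secret))
```

Because XOR is its own inverse and order-independent, the pads can be combined in any order.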

  • One Time Pads have been around forever.
  • It's quite probable that a judge might decide that the gain to society of allowing your FTP site to carry a bunch of "random" data is outweighed by the claimed economic losses of the MPAA and friends.

    Not that I would agree with him, and I'm not a lawyer, but reading over the DeCSS injunctions and the copyright clauses, it seems the law contains many questions of "relative harm."

    If judges accept banning linking to things which theoretically might lead to copyright infringement, banning you from hosting random data without some obvious purpose doesn't seem so far-fetched. Of course, you're more likely to just see injunctions against you distributing the data and not monetary damages (unless they can prove you acted in concert with your sister site carrying complementary data), but this doesn't help that much.

  • This is a sort of interesting use of the ideas behind the one time pad.

    It seems that what this guy is suggesting is really a system for message distribution where the sender of a message can deny that the message exists until such time as the message has already been mirrored around the world. The problem that I foresee is that in places that really clamp down on free speech they will simply ban your taking part in this system since its purpose is clearly to circumvent the (oppressive) Law. After all, the system has no other use.
  • If this is to be used to allow the public to get information, then you have to tell people which blocks they need to grab to construct the message. And if you tell them that, then anyone who wishes to suppress the message need only have one of the blocks removed. This only seems to rely on the fact that no one could be forced to remove a block of random, useless data under most laws.

    Ok, so they could point at the site posting instructions and say "this block makes this bad thing available, therefore you must make it go away, regardless of the other things that may depend on it". If that's not possible, then they could just censor the site with the instructions. Sure, the message is still there, technically, but if you don't know which blocks it needs, it would take a great deal of effort to test all the combinations, especially if you don't even know what message you're looking for.

    And if you say "well they could just put the instructions on another site", they could have just moved the content too, so how exactly does this make it harder to censor things? Did I miss some valuable point here?
  • What would prevent us from using Gnutella as the basis of a pad distribution network?

    --

  • by Paul Crowley (837) on Saturday June 17, 2000 @10:39PM (#995084)
    "Secret sharing" allows you to break a piece of data (usually a secret key) into N "shares", such that you only need M < N shares to reconstruct the secret, but such that you don't have sufficient information to reconstruct the secret with M-1 shares (i.e. it's not just impractical, it's information-theoretically impossible). This means you could extend the scheme to keep working even if one or more of the participating sites go offline.

    However, I don't believe any such scheme will work. If it turns out that existing law is insufficient to prosecute participants, they'll extend the law so that acting in a way that could facilitate such a scheme is illegal, and that will include participating in FreeNet, Gnutella, the Eternity service, or whatever. That's why we need both the technology and the data havens.
    --
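One well-known way to get the M-of-N property described above is Shamir's secret sharing; the article does not prescribe it, so treat it as a swapped-in technique. A minimal Python 3.8+ sketch, assuming the secret is an integer below the chosen prime: the secret is the constant term of a random polynomial of degree M-1, each share is one point on that polynomial, and any M shares recover the secret by Lagrange interpolation at x = 0, while M-1 shares are consistent with every possible secret. Not hardened for real use.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this

def make_shares(secret: int, m: int, n: int):
    """Split secret into n shares so that any m of them recover it."""
    if not 0 <= secret < PRIME or not 1 <= m <= n:
        raise ValueError("bad parameters")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares) -> int:
    """Lagrange interpolation at x = 0 over the integers mod PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, m=3, n=5)
assert recover(shares[:3]) == 123456789
assert recover([shares[0], shares[2], shares[4]]) == 123456789
```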
  • This doesn't make any sense. Sure, the pads are random, you can distribute the pads, but you still need to distribute the information that combining certain pads in a certain way gives you a certain message.

    If you could censor the delivery of the message, you could censor the delivery of the list of pads needed to create the message.

    All you're doing is putting the information into a new form. It's the pad list which becomes the important piece of information here and it's precisely the pad list which is completely unprotected by this scheme.

    It sounds pretty useless to me.
  • One security weakness I see is that an attacker can keep track of the pad database, keeping a note of the dates all pads are added to the database. This way, they can determine the location of at least one 'guilty' pad: the most recently uploaded pad in a set of pads containing undesirable material.

    With this attack in mind, I really don't see what these pads give us that the traditional cypherpunk techniques, such as the anonymous mailers, freenet, etc. don't give us.

    - Sam
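The timing attack described above needs nothing more than a log of when each pad first appeared. A minimal sketch, assuming such a log exists; the pad names and dates below are made up for illustration:

```python
from datetime import datetime

def most_recent_pad(pad_set, first_seen):
    """Return the pad in pad_set that was first seen most recently."""
    return max(pad_set, key=lambda pad_id: first_seen[pad_id])

# Hypothetical log kept by someone watching the mirrors.
first_seen = {
    "pad3f9a": datetime(2000, 6, 1),
    "pad07c2": datetime(2000, 6, 10),
    "pad9e11": datetime(2000, 6, 17),
}
suspect = most_recent_pad(["pad3f9a", "pad9e11", "pad07c2"], first_seen)
print(suspect)  # pad9e11
```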

  • As soon as you distribute 1 pad, along with a suggestion that you combine it with certain other pads at particular Internet addresses, it could be argued you are distributing an encrypted version of said document.

    If you don't say which pads to combine it with, then I can't see anybody trying all 50,000,000 combinations to see what happens :-).

    The idea has merit if combined with some more sophisticated mechanism (freenet for example), but by itself, I don't see it buying much.
  • Grr, and when I had this idea a while back I was sure it was original.

    The idea I had: a Gnutella-like system, where information is distributed over several different computers. No individual computer that holds a segment of the information knows what it's for. To make sense of the information, you need to get two things: the decryption pad (which would be, as mentioned here, XORed with the data) and the "Key": a list of servers containing chunks of the information. These would be available from separate sites, or perhaps distributed across multiple sites.

    Because no server containing the information also contains a key, and no server containing the information also contains a pad, and especially because there is no way to tell the information stored on the server from meaningless random garbage unless the pad is applied, no server would ever be liable for information it contained.

    Anyway, I'm still interested in writing a system like this: wxWindows is the preferred implementation API. It could, in fact, be implemented alongside or on top of Gnutella itself. Email me if you'd like to be a part of it or would like to see it happen; if I get enough interest, I'll write it.
  • What I don't get is why someone would be storing white noise on their server. I mean, come on. The argument that it's not encrypted data and just white noise is kind of a flimsy one to use against inspectors or whatnot. Why in the world would you be wasting storage space with white noise unless it's something important? Maybe I just don't get it.

  • Oops, jumped ahead. I read the entire linked text, but not the Slashdot post, since I thought that it was just bits of the interesting text. The linked page mentions Freenet, but not Gnutella. :(

    --

  • I say we rot-13 every text file maybe 2 or 4 or maybe even 6 times. That will really scramble the data around.

    Think about it.......


    Double J. Strictly for the . . .
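For the record, repeated ROT13 undoes itself: ROT13 is its own inverse, so any even number of applications returns the original text. A quick check with Python's built-in rot_13 text codec:

```python
import codecs

def rot13_times(text: str, times: int) -> str:
    """Apply ROT13 to text the given number of times."""
    for _ in range(times):
        text = codecs.encode(text, "rot_13")
    return text

print(rot13_times("Hello", 1))  # Uryyb
for n in (2, 4, 6):
    assert rot13_times("Hello", n) == "Hello"
```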
  • From: Lord Greyhawk
    To: david.madore@ens.fr
    Cc:
    Bcc:
    Subject: http://www.eleves.ens.fr:8080/home/madore/misc/freespeech.html
    Reply-To:

    A few things to note about your proposal. You say

    "This will give you a new pad: it is also made of completely random
    data, but XORing it together with the pads you have selected will give
    back the hidden data, padded (pun unintended) with zeroes."

    Clearly you do not want the plaintext to be predictably padded with
    zeros. This is even more vital with XOR. If I combine N pads to make
    pad P that ends in the same sequence as Q, then I know Q is the last pad
    needed to decrypt the message. Similarly, I could make a catalog of pairs
    of pads and check their endings against P to find the pair needed to finish
    decrypting the message. Continue with a catalog of three pads....

    You should really obfuscate the message before creating the pad so the
    plain text is scrambled before the XOR operations. Terminating in zeros
    is a very very obvious mistake (even at 4:30AM).
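The leak described above can be demonstrated with a toy example. This sketch assumes, as in the quoted passage, that the plaintext is padded with trailing zero bytes, and for simplicity that only one existing pad was used; pads are shrunk to 64 bytes for readability. Since x ^ 0 == x, the tail of the published pad equals the tail of the pad it was built from, so comparing tails against a catalog of mirrored pads picks out the partner.

```python
import os

PAD_SIZE = 64  # tiny pads for illustration; real pads are much larger
TAIL = 16      # how many trailing bytes the attacker compares

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

# A public catalog of mirrored pads.
catalog = {f"pad{i}": os.urandom(PAD_SIZE) for i in range(100)}

plaintext = b"short secret".ljust(PAD_SIZE, b"\x00")  # predictable zero padding
new_pad = xor(plaintext, catalog["pad42"])

# Where the plaintext is zero, new_pad is a copy of pad42, so its tail gives it away.
matches = [name for name, pad in catalog.items() if pad[-TAIL:] == new_pad[-TAIL:]]
print(matches)  # ['pad42'], barring an astronomically unlikely coincidence
```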

    Also, there is another method to defeat such catalogs once and for all.
    Simply rotate the start position of each pad, e.g. start with byte 10,487
    and eventually wrap around to the beginning and end with 10,486.

    So you XOR with 6 pads, chosen from 200 which would be a keyspace
    of 82,408,626,300 (which is only 36 bits...very weak in that regard) and
    then specify 6 rotations (of which 5 add security) and now you have
    82 billion * (128k)^5 keys which is about 121 bits. Almost up to the
    standard 128 bits used in secure SSL. If the sixth rotation is allowed to
    count, then the effective key length is up to 138 bits.
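The rotation trick and the keyspace figures above can be checked in Python 3.8+. The sketch assumes 128 KiB pads (the "128k" in the email), so each rotation offset contributes log2(131072) = 17 bits; rotate() is a hypothetical helper name.

```python
import math

def rotate(pad: bytes, start: int) -> bytes:
    """Read a pad from byte `start`, wrapping around to the beginning."""
    start %= len(pad)
    return pad[start:] + pad[:start]

# Starting at byte 10,487 means the rotated pad ends with byte 10,486.
pad = bytes(i % 251 for i in range(128 * 1024))
rotated = rotate(pad, 10_487)
assert rotated[0] == pad[10_487] and rotated[-1] == pad[10_486]

# Keyspace arithmetic from the email: 6 pads out of 200, plus rotations.
combos = math.comb(200, 6)
rotation_bits = math.log2(128 * 1024)  # 17 bits per rotation offset
print(combos)                                        # 82408626300
print(round(math.log2(combos)))                      # 36
print(round(math.log2(combos) + 5 * rotation_bits))  # 121
print(round(math.log2(combos) + 6 * rotation_bits))  # 138
```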

    "Pads should be mirrored as much as possible around the
    Internet. However, no single site should ever mirror all the pads -
    nor a too large fraction of them."

    Why? Any attacker can simply download all the pads. This is the
    fastest part of any attack. I download 650MB CD-ROM images, which
    would be over 5000 pads' worth. If the attacker knows you likely did
    not try certain subsets of pads, then that helps the attacker narrow
    down the search. So if it is known that all 6 pads are not on the
    same server, that helps the attacker.

    If all servers are full mirrors then shutting some down does not help
    stop the information, you would have to kill them all.

  • Have a look at http://freenet.sourceforge.net ... this is what you are looking for.
  • I use a simple spider program to search the web and Usenet for information that might be of use to me. Daily it sucks in a couple hundred megs of mostly useless data that is further sorted and searched by a backend program. If I wanted to grab illegal information, I could just give the spider a wider range so that it would find that information, without making it look as if I were looking for it. Since the information my spider collects never gets shown to anyone else, I'm not responsible for others receiving it, and since it's automated and works along the same lines as the programs that build search engines, which are common online, it would be rather difficult to place any legal blame on me for receiving such data. My basic point is that all the positives mentioned for the use of pads can be achieved just by playing dumb and using already existing technology, so why bother? I'm sure there are as many ways to hide messages as there are children who like to play spy games. It would be more useful IMO to just push for more encryption when sending data over the Internet and more systems like freenet.
  • Perhaps it's a sex fetish of mine? Maybe my friend made a bet with me that I would write a program that would create pad* files of random data? The possibilities are endless... :)

    --

  • Since the moon is not governed by any single government/authority, if someone can afford to put a base on the moon and run a news/web server on it, no law authority can ban the information distributed from there.

    So, why do we need pads?

    (This idea may be a bit too expensive...)
  • by hypergeek ( 125182 ) on Saturday June 17, 2000 @10:55PM (#995097)
    While this, in conjunction with Freenet [sourceforge.net] may make censorship more difficult, and possibly more tricky from a PR point of view, it's a simple matter for a large governmental body to find and stamp out all the freenet-type servers in its jurisdiction.

    The best weapon against censorship is getting the general public rallied to your cause. Slinking around in the underground only makes you look more criminal to the average joe, and easier for any censorial body to sway public opinion against you. (Remember the panic about "hackers" from the early 90s to the present?)

    Failing that, though, the second best weapon, IMO, would be true anonymity. Would it be possible to have host addresses spontaneously, randomly generated, encrypted, and routed to the destination in a kind of virtual circuit?

    Then, when the connection is terminated (or even beforehand if constant generation of new addresses is part of the scheme), the address is discarded, never to be used again (except perhaps by coincidence).

    If someone wants to communicate, um, "nonymously" (as opposed to anonymously, of course :), they'd simply use digital signatures, but anonymity would be the default.

    Unfortunately, I'm not sure exactly how the non-addressing scheme would be implemented, and it would be of limited use to servers (which would require static addresses anyway), but with a shared client/server mechanism such as Gnutella, Freenet, (or OpenCOLA, for that matter ;), you could have a "swarm" network. Like a swarm of insects, you can definitely see that the swarm's there, you can tell when one insect bites you, but you can't track down that individual insect, as it gets lost in the swarm again.

    Or something like that. (It's 1:50-ish a.m., so I'm not exactly bright-eyed and bushy-tailed... :)

  • Well, I'm really not. I don't want to distribute information across the freenet - all the replication is wasteful, imo. Also, freenet does not break data into chunks.

    Assuming data replication isn't a problem, however, freenet's probably an ideal carrier for this type of system.
  • The idea is basically to split information into several "pads" that truly look random. You get the information back by XORing several pads (or known pseudo-random generators).

    The idea is not new. I even saw it in some magazine such as Scientific American or Dr. Dobb's Journal: the idea there was to split information into several pieces, so that you need a number of them to get the information (at least a fixed number of them, but not necessarily all).

    The problem is that it is pushed as a method for promoting free speech. IANAL, but if your free speech is legal, then there is no need to encrypt it; if it is not, then you'll just prove that you were aware of its illegality and tried to work around it, which won't look too good in front of the judge.
    Also, carrying encrypted pieces of information can get you into trouble. You can't say you are unaware that it might be illegal: why was it encrypted in the first place?

    The other problem is that once someone knows how to decipher the random pads, nothing prevents him from telling others how to do it. I expect the NSA to be quickly informed. Of course, it is difficult to make a crypto scheme that the "good" people can decipher but not the "bad" people. You need a shared secret, and sharing a secret at Internet scale is not safe. Or you need to encrypt with the public keys of some safe people (whom you know personally, or are 100% sure of by other means), but then (mail|public repository)+PGP is a better solution.

  • by Detritus (11846) on Saturday June 17, 2000 @10:57PM (#995100)
    The NSA and its predecessors have been attacking problems like this for over fifty years. You take a bunch of intercepted messages, select two messages, overlay one message on top of another, subtract or exclusive-or the messages, look for a non-random result, shift or rotate one of the messages by a character or code group, and repeat. Continue until each message has been compared to every other message. The statistical anomalies indicate that two messages were encrypted with the same pad or additive. The NSA used this method to detect Soviet messages that had been encrypted with the same one-time pad. The Soviets ran short of one-time pads during World War II and issued duplicate pads to AMTORG and the KGB. It was also used to break naval codes that used a code book and random additive from a second book. Using multiple files makes the problem larger, but the same techniques can be used.
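A toy version of the "overlay and XOR" test, using one crude statistic chosen for this sketch rather than the frequency methods analysts actually used: XORing two ciphertexts cancels a shared pad and leaves plaintext ^ plaintext. For 7-bit ASCII text that result never has its top bit set, whereas the XOR of two independently padded messages looks random, with the top bit set about half the time. The messages are made up.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def high_bit_rate(data: bytes) -> float:
    """Fraction of bytes with the top bit set (about 0.5 for random data)."""
    return sum(b >= 0x80 for b in data) / len(data)

p1 = b"the shipment will arrive at the northern port on tuesday night " * 4
p2 = b"please confirm that the money has been sent to the usual account " * 4
n = min(len(p1), len(p2))
p1, p2 = p1[:n], p2[:n]

shared_pad = os.urandom(n)
c1 = xor(p1, shared_pad)
c2 = xor(p2, shared_pad)      # same pad reused: the messages are "in depth"
c3 = xor(p2, os.urandom(n))   # a properly independent pad

print(high_bit_rate(xor(c1, c2)))  # 0.0: the pad cancels, leaving p1 ^ p2
print(high_bit_rate(xor(c1, c3)))  # roughly 0.5: looks like noise
```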
  • If the intended recipients can access the controversial material, then the government/lawyers/RIAA/[insert bad guys here] can also access it.

    All they have to do at that point is go to court and present both "halves" of the material, and demonstrate that they combine to equal whatever [bad guys] don't like.

    What do you claim at that point? "Um, it is just a coincidence that those two files of white noise xor to give you instructions on building nukes... really!"

    Somehow I don't think that you would be believed.
  • Hey,

    What I don't get is how this system differs from a symmetric key algorithm. It all hinges around one bit of data known only to intended readers, in this case the numbers of the two pads, and in the case of, say Blowfish, the password to decrypt. Why not just encrypt your message, post it (anonymously) to USENET and where you would have told people the two pads' locations, you tell them the message title and the password?

    Just my 2 cents

    Michael Tandy

  • One security weakness I see is that an attacker can keep track of the pad database, keeping a note of the dates all pads are added to the database.

    David thought of this. If you read the article carefully, you'll see it says: "Pads should be mirrored as much as possible around the Internet. However, no single site should ever mirror all the pads -- nor a too large fraction of them. "

    So there is no "pad database" per se.

  • I thought the same, until I actually read the referenced article. :) This is an idea about a free-speech network, à la FreeNet. Not an OTP system.


    --
    "Rune Kristian Viken" - arcade@kvine-nospam.sdal.com - arcade@efnet
  • In order for somebody to have their free speech readable, they would also have to distribute the names of the pads needed to decode it.

    The author's idea seems to be the following: since no piece of data can actually be said to contain incriminating speech, nobody can be prosecuted for writing it.

    Did neural nets come to anyone else's mind? All that is being done here is that the data is being stored in distributed fashion. If anyone can read the message, prosecutors can too, and the fact remains that the author transmitted the message (albeit by an unusual method).

    The other major benefit of this method claims to be anonymity.

    Practically speaking, are there not many, many existing anonymous-posting methods? (E.g. use a hacked dialup; use an anonymous HTTP proxy; use a long chain of telnets and format the computers along the way, etc.) Or even worse, just post something from someone else's equipment without their knowledge (e.g. a public library) and then leave.

  • Why won't adversaries of the system flood it with junk?
    .
    How does it protect the originator once the set of pads used is revealed? Isn't it no better than mailing your message to a "we like illegal information" mailing list and knowing that the cat is out of the bag as you go to jail?
    .
    There is still a major cryptographic weakness here: by trying all combinations of the available pads, a cryptographer can very quickly find a combination with high statistical redundancy. Simply put, they aren't one-time pads. They are publicly known pads. A contradiction. If enough pads are generated to avoid statistical redundancy, their names would have to be almost as long as their contents, and that just makes things silly.
    .
    Why not just release the pad after the fact, instead of having weaker public pads?
  • I believe that this would have exactly the opposite effect; a loss of freedom and tightening of censorship controls. How long do you think the powers that be will tolerate kiddie porn, encryption secrets, and Metallica songs being freely available due to a loophole in mathematics before that loophole is irrevocably closed? Every time someone comes up with a clever idea like this it gives the (metaphorical) Man an excuse to tighten his grip.

    Figuring out ways to get around the law like this is childish and doesn't help the cause of opposing censorship. Instead of wasting our time finding loopholes in the laws, we should spend our collective energy trying to change them.
  • "used to encrypt controversial information" is not really what the article is about. The purpose of pads would be to distribute controversial information without implicating any party. As such (and not as an encryption tool), it's a nice idea. The only weak point I can see is that someone has to, eventually, release the list of pads that represent the message, and so expose that person to the lawyers. However, distributing the list of pads cannot be considered the same as distributing the information itself, and holding old, random or innocent pads cannot either. So, it gets my vote.
  • by Anonymous Coward
    What you've just described is freedom.net [freedom.net], at least, the, er, nymous version. Not to be confused with freenet, despite the rather similar names, freedom.net uses a network of servers (many of which are not run by them, and which are in different countries around the world) and multiple layers of encryption.

    Each server peels back one layer, so it never knows more than "the last hop was machine X and the next hop is machine Y" -- theoretically you could trace the path back, if you could subpoena 3-5 different companies across a couple of continents... feasible, but difficult, and that'd still only get you an IP address.

    Anyway, it's quite a neat system but unfortunately crashes a little too often for my tastes. Still, given time...

  • I think the 16 hex digits will not be enough for anonymity, if this method becomes popular. 16 hex digits means 2^64 unique filenames. For comparison, this is the square of 2^32, the number of IP addresses.

    Once there are 2^32 keys around, then the chance of a collision is quite high. I suggest using 32 hex digits (ie. 2^128 unique names). Another possibility, on top of this, would be to encode the first 16 bytes in a more efficient manner than hex digits.
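The collision figures above follow from the standard birthday-bound approximation p ≈ 1 - exp(-n^2 / 2N) for n random names drawn from N possibilities. A quick check: with 2^32 pads and 16 hex digits (N = 2^64) the chance of at least one collision is roughly 39%, while 32 hex digits make it negligible.

```python
import math

def collision_probability(n_items: int, n_names: int) -> float:
    """Birthday-bound approximation of P(at least one repeated name)."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-n_items * n_items / (2 * n_names))

print(collision_probability(2**32, 2**64))   # about 0.39 (16 hex digits)
print(collision_probability(2**32, 2**128))  # about 2.7e-20 (32 hex digits)
```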

    A Unix note here: the given commands

    dd if=/dev/urandom of=pad.dat bs=1k count=128
    od -t x1 pad.dat | head -1
    mv pad.dat pad(sixteen first digits of dump given by previous command).dat

    can be combined into a single line (fields 2 to 9 of the first od line are the first 8 bytes, i.e. the first sixteen hex digits, and the "pad" prefix is kept):

    dd if=/dev/urandom of=pad.dat bs=1k count=128; mv pad.dat "pad$(od -t x1 pad.dat | head -n 1 | awk '{for (i = 2; i <= 9; i++) printf "%s", $i}').dat"
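The same pad creation can be sketched in Python for those without dd and od. This assumes, as in the commands above, 128 KiB pads named "pad" plus the first sixteen hex digits of the pad's own contents; os.urandom draws from the operating system's random source, and create_pad is a hypothetical helper name.

```python
import os

PAD_SIZE = 128 * 1024  # 128 KiB, like bs=1k count=128

def create_pad(directory: str = ".") -> str:
    """Write a new random pad into directory and return its path."""
    data = os.urandom(PAD_SIZE)
    # First 8 bytes = first 16 hex digits, as od -t x1 would show them.
    name = "pad" + data[:8].hex() + ".dat"
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For example, print(create_pad()) writes a new pad in the current directory and shows its name.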
  • The people who are rich and famous are those who not only have good ideas, but act on them :)
  • I think what a lot of people are missing is that from a legal point of view, you have to consider where the information is.

    The scheme that is proposed can be thought of as a bizarre compression technique. You hide the bulk of the data in public view but strip it of all its information. The information is in the description of the 5 (or so) pads that you need (the key). The key is very short, but it contains all the information of the message. If a legal entity wishes to suppress the information, they need to suppress the description of the key, not the pads themselves. What this scheme offers is the ability to make the information arbitrarily smaller than the data, and hence easier to share.

    What the idea seems to be begging for is a proposal for a Gnutella-for-pads application whereby you can create, exchange and assemble pads in a single app. Also, the standard should explain how larger files are constructed and the exact format of the key information (which pads you need), since you will need a different set of pads for each 128 KB block of data in a potentially large file.

  • A note on the 'birthday problem' he mentions:

    The problem: if n random numbers are drawn from the range 1...N, what is the chance that two of them are the same? (This is relevant in deciding whether keys will collide.)

    It is called the birthday problem because of the following setting: consider your class at school (say it has 28 people). What is the chance that two of them have the same birthday?
    [Note: Leap years are ignored for convenience here.]

    In other terms, if you have 28 random numbers in the range 1...365, what is the chance that two will be the same?

    The answer is found by: 1 - ((364/365) * (363/365) * (362/365) * ... * ((366 - 28)/365)), which works out to about 0.65 (i.e. 65%, nearly a 2/3 chance). This result can be somewhat surprising at first.

    If you don't understand the derivation -- imagine there are two people in your class. The chance someone else's birthday is NOT the same as yours is 364/365. So the chance that you both have the same birthday is 1 - (364/365).
    And so on.
    HTH.
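The formula above can be evaluated exactly with Python fractions. This sketch ignores leap years, as the note does; for 28 people the result is about 0.654.

```python
from fractions import Fraction

def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Exact chance that at least two of `people` share a birthday."""
    all_distinct = Fraction(1)
    for k in range(1, people):
        # Person k+1 must avoid the k birthdays already taken.
        all_distinct *= Fraction(days - k, days)
    return float(1 - all_distinct)

print(round(shared_birthday_probability(2), 4))   # 0.0027, i.e. 1/365
print(round(shared_birthday_probability(28), 3))  # 0.654
```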
  • From the webpage:

    The point of this system is to promote free speech on the Internet, nothing else.

    I'm not sure I understand the purpose of this. How does it promote 'free speech'? All I see is a pretty neat idea on how to hide data (I'm not sure if this can technically be called 'encryption'). But since when does encryption promote free speech? Encryption adds another layer that has to be taken off before the underlying information can be found. It adds complexity to finding this information. It blocks out other users, be they 'good' or 'evil'.

    If you want free speech, you need to work with the people or groups going against free speech, especially on a large scale. By 'on a large scale' I mean that encryption/padding is not the right way to promote free speech if you want to carry your message to the public, to the masses. You do that by making your message extremely accessible.

    Suppose I suddenly found, say, a working method for cold fusion (or something else along those lines). The worst thing I could do is keep it a secret protected among a chosen few. And if I did want it to be widely known, how would I do it? Encrypt it across dozens of sites, with the link on some hidden, secret site? Or spread the information far and wide, so users only have to look at it and not 'find links' etc.?

    This isn't free speech. This is a way to hide data from those that don't want to see it. It's "close-sourcing" information to the unprivileged.

    That's not freedom.

  • as soon as you distribute 1 pad, along with a suggestion that you combine it with certain other pads at particular internet addresses, it could be argued you are distributing an encrypted version of said document.

    Take any "pad" or even any data whatsoever on the Internet, and you can create a "pad" for that data which will result in anything.

    So if I take a kiddie porn image and generate a pad based on a jpg on whitehouse.gov, which when combined with the jpg on whitehouse.gov reconstructs the kiddie porn, is the White House then distributing kiddie porn?

    There is no reason you couldn't do this with any such random pad, then frame the guy with distributing child porn by saying "combining X with Y" gets you this child porn image.

    I'm not sure, but the law is probably stupid enough to allow something like this.
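The construction behind this worry is a single XOR: for any existing file C and any target T of the same length, the "pad" P = C ^ T satisfies C ^ P = T. A minimal sketch with placeholder byte strings standing in for both files:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

cover = b"bytes of some existing public file"     # placeholder for a real file
target = b"any message at all".ljust(len(cover))  # padded with spaces to match
pad = xor(cover, target)                          # the constructed "pad"
assert xor(cover, pad) == target                  # cover ^ pad reproduces target
```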

  • Yes, the same idea will work, but with *much* more computing. Against a reused OTP, you only have to compare all pairs of messages, about n^2. If the person used 5 pads plus their own one, you have to check on the order of n^6 combinations (n choose 6, since the order of the XORs doesn't matter), which gets a lot harder. If the system took off even slightly and there were 5000 pads out there, this comes to about 2^64 combinations. While that is theoretically possible, anything much bigger won't be.
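The counts above can be checked with math.comb (Python 3.8+). Because XOR is commutative, the order of the pads does not matter, so the 6-pad case is n choose 6; with n = 5000 that is about 2^64, while the pairwise check for a reused pad is only about 2^24.

```python
import math

n = 5000
pairs = math.comb(n, 2)     # reused-pad check: compare every pair of messages
six_sets = math.comb(n, 6)  # 5 public pads plus the sender's own, order irrelevant
print(round(math.log2(pairs)))     # 24
print(round(math.log2(six_sets)))  # 64
```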
  • The problem with secret sharing is that (at least in all the systems I've heard of) all the shares need to be created along with the secret. The thing that makes this system better is that only one of the shares can be traced back to you, so they would have to determine that it is the newest of the shares to be able to pin blame on you. With a secret sharing system you would have to generate all the shares, and if any of them is traced to you then you have lost your anonymity.
  • regardless of the other things that may depend on it
    Assume every time a block is censored it destroys about five messages in the system (estimated from the use of five pads to encode a new pad). And if removing one message results in five blocks being removed from the system, then one censored message destroys about 25 unknown others. This means that if about 1 in 25 messages going through results in action being taken by police, the probability of anything going through becomes fairly low.
    .
    My apologies to the author.

  • I believe this project was inspired by my post [slashdot.org] of Monday June 12, @04:24AM EDT.

    Ok, sure it's just a weak joke from a potential troll- but I do believe boards like slashdot are the perfect place to store such pads. No such conspicuous files to explain, and mountains of data to sift through to even find the millions of instances stored 'round the world.

    Someone mentioned in an earlier post that no one would trouble themselves to check all 50 million variations, but I think this would be child's play to the NSA. Perhaps by mixing up portions of pads randomly you could greatly increase the magnitude of what's going on. This would require rather bulky decoding instructions, though.

    A much more useful idea would be to have a certain webserver [apache.org] (or maybe a standard protocol for all servers) generate a pad with every X web hits (up to a certain quota per day, and at either random or regular intervals), stored on a random message board. The randomness could even be taken from something cool like 'sub-ether' noise on the network, for a touch of Trekkie flair, or any of the myriad techniques used to generate randomness on a machine.

    Also, this would allow a type of slang to develop, expressing swift and accurate decoding instructions. As an example, "..dot ..3 ...last thursday." could be buried in an email under other pretenses and indicate (presuming regular intervals) slashdot's 3 am pad, 3 pm pad, 3rd pad of the day, 3rd from the last, or 3+x from the y (x being some third source like a stock price fluctuation rounded specifically down). The proper interpretation of the hints could be predetermined by the parties beforehand.

    This whole idea of 'soft export' encryption certainly has a lot of room for refinement, but it could come in very handy from a web cafe terminal in Europe, when Harrison Ford and the CIA are hot on your hax0r trail.


    :)Fudboy
  • by rjh ( 40933 ) <rjh@sixdemonbag.org> on Saturday June 17, 2000 @11:51PM (#995120)
    I am an InfoSec professional IRL, but I am not speaking for my employer, yadda yadda, this is not professional advice, insert standard disclaimer.

    First: I've never heard of this fellow. I don't recall seeing his name in any of the crypto journals. I don't recall seeing any particularly clever attacks from him in the past. Protocol design is tough; it is an exceedingly nontrivial task, even harder than designing new algorithms. Anyone who says they have a great new protocol is most likely lying, unless they're a Tuchman, a Coppersmith or a Schneier.

    Always assume all new protocols are full of it, until enough time and attacks have gone by to give confidence that the protocol is only mostly full of it.

    Second: this system is not secure. Repeat after me: a one-time pad is secure only as long as it's used once. A birthday attack is orders of magnitude more likely than he makes it out to be, because Net traffic is not uniform; certain places tend to be "hot" and others "cold".

    Let's have a thought experiment. Let's say Slashdot begins to implement this system, and has a few thousand "pad blocks" available. This means a few hundred megabytes of purely random data--let's completely ignore the practical difficulties of purely random data for now and just assume we can do it.

    When Alice decides to store something unpopular and encrypts it with Pad(s) alpha, beta and gamma, so that Bob and Charlie can read it later, what's Alice going to do? -- Probably use one of the first twenty pads listed. Why? Because people are lousy at choosing random numbers. If you ask someone to pick a number at random, they're most likely to pick a number between one and ten, not one and fifty billion. Things that are at the head of a list get selected more often than those that aren't.

    Let's say that Slashdot randomizes these pads, though, so they always come up in an unpredictable order. (Never mind the practical difficulties in how to do this in the first place. It's a thought experiment. Just keep alive in the back of your mind the fact that (a) we've had to create hundreds of megabytes of purely random numbers, and (b) we have to present them to people in a purely random way.)

    After some mathematics, Alice's super-secret Neiman-Marcus cookie recipe is now pretty much totally obscured. She posts the recipe to a Website, and then tells Bob and Charlie, "Psst! I posted the information to this site. Find pads with IDs of [she recites their IDs] and use that to recover the information!"

    At that point the secret police storm in, having been eavesdropping on the entire conversation. They throw Alice, Bob and Charlie in jail. They go to the website, pull the information, get the pads and read the Neiman-Marcus Cookie Recipe for themselves. Guess what? This protocol has completely, totally and utterly failed.

    The naive response is to say "well, they wouldn't say it in the open... they'd use encrypted email to share the pad IDs!" Okay, fine. All that's happened is the encrypted email is the weak link in the security; if that goes, the entire scheme falls apart.

    Now recall those two extremely thorny problems from before. Hundreds of megabytes of purely random data are very hard to come by, and purely random presentation of random data is very hard to do. Add in the implementation weaknesses to the weakness of the communications channel between Alice, Bob and Charlie, and you've got a protocol which has very little merit.

    This protocol solves a problem which doesn't exist, as far as I can tell. Now, admittedly, I'm not the sharpest knife in the drawer and I'm also bone tired and I could be totally misunderstanding what the goal of the protocol is.

    But for a secret-sharing protocol, or as a way to securely store information in a way which is deniable, it's pretty dismal.
  • The real problem is social engineering.
    This is an important and probably illegal secret message:
    i:/45 (&7*T u3goh o['68
    7+(&6 4Pgh5 *(P&5 8G*=!
    Please copy it to your ftp mirror.

    I think freenet has the advantage of only copying things that are of interest to the requesters.
  • Ok, just about no one seems to have read and understood Madore's page, so I'll summarize his idea: when two people independently serve statistical "white noise" (which just happens to XOR to controversial material), it is ridiculous for either to be convicted.

    I understand this legal argument, but it's a rather highly technical legal argument. Suppose the DA decides to prosecute anyway and has some imbecile willing to testify to your guilt?

    Ok, at this point you then have to find yourself an expert witness to testify at a price of a couple grand a day. So then the DA hires a lot more "experts" to shout down your expert. So now you are paying massive legal expenses on doctored-up kiddie porn created by a crooked DA.

    The jury will be told that obviously you are some kind of criminal because otherwise why would you be doing something like this in the first place. Anyone who knows anything about the Internet or even has an AOL account will be excluded from the jury. Then any jury you have, presuming you can even afford lawyers, will already be drooling idiots, and will be pummeled into submission by a parade of trained circus ponies and clowns with seltzer water.

    To counter this you will have to spend every penny you ever had, and indenture yourself into slavery for your lawyers. Then the idiot jury will probably find you guilty anyway.

    That's assuming you get a trial. They could just invoke the name of Mitnick and deny you bail, and lock you up in solitary until you agree to waive your right even to have a bail hearing. Then they won't let you examine any of the "evidence" in your case and will generate a few gigabytes of crap. When you finally get the right to examine it, they'll print out tens of thousands of pages of binaries on a dot-matrix printer and let you look at it with a flashlight for five minutes a day in a dark room.

    All this is well and good as a mathematical exercise, but the real trick in creating a security system is to have one which is so ubiquitous that having it won't even seem suspicious.

    Because even looking suspicious is enough to get demonized these days. And what's the legal excuse? Ooooooh, we need to protect the CHILDREN. They'll use it for CHILD PORN!

    (IMO fuck the children, but that's not good politics. Anyone using this system will be portrayed ipso facto as some sort of pervert or molester, and PGP already does this stuff fine.)

    (Oh and I forgot. While this is all going on a bunch of idiots will be posting on slashdot, ohhhh, but he's a criminal, hell with him.)

  • ...are a strong social framework, a tradition for the respect of individual rights, and a rational government working in harmony.

    Stop looking for technological fixes to problems that aren't technological.


    The regular .sig season will resume in the fall. Here are some re-runs:
  • ...is Napster tweaked to use this technology. That way the next big-label band that comes along won't be able to tell what's gettin' traded.
    I mean honestly, I read Dilbert a few days back, and it encapsulated the same basic principle here. That "data" is merely harmless 1's and 0's resting on (insert any form of data medium here). And it remains that way, until you activate it with a translation device. Not unlike cave paintings and eyeballs. Or better yet--and more appropriate--an inkblot.
    However, the issue we run into in this digital age, is that everyone's using the same set of eyes, or merely licensed copies of the original set.
    This technology (or encryption in general) would more or less "toss it up" a bit. Making certain data so that only like 'minds' can distinguish it, and turn it into meaningful information.

    Kinda weird when you think about it that way, eh?



    -={(.Y.)}=-
  • Moderators: moderate this up! Or have you all been replaced by zombies that zap any post containing the word 'Metallica'? Slipshod.
  • Keeping in mind that you're talking about monitoring *all* the possible databases, here are some ideas:

    1.) Suppose I force uploads to my pad server to contain several pads, some of them being previously submitted pads (from my server or some other server). Anyone observing would know that some of the data would, by its nature, be randomly selected known pads. The most recent uploader would be less meaningful, since *anyone* might have stumbled on that pad.

    2.) What if I run a pad database that doesn't indicate when a pad was released to me? Borrowing an idea from the cypherpunk remailers, you could submit a pad to me with a special header that tells me to wait "x" intervals before I make your new submission available. Or I could simply add a random amount of time. Or both. The observer might know *which* server it appeared on first, but it would be very hard to tell who submitted it, since it could have happened, say, two years ago.

    3.) My pad server could take a certain number of pads and forward them to another server, making them that other server's pads. A pad that you actually submitted to me would belong to some other server.

    4.) As soon as you decide to start posting information via this system, you start sending pads to pad servers (possibly before you actually post a "message"). This way you'll increase the number of pads, but you'll also make it harder for someone to know that you're posting a "message", since you may just be sending white noise. Every day, an observer would have to examine everything you post in light of everything else you have ever posted (this quickly becomes an exponential problem, no?).

    [BTW, have you ever heard of the shortwave radio transmissions that are, allegedly, spy transmissions? I remember reading once that the amount of traffic on these stations has not decreased significantly since the end of the cold war, since no one wants the other side to have a hint that the amount of spying may have decreased, or to have a hint when spying takes an upswing again. Understand?]

    It all has to do with obfuscation. From my limited experience with cypherpunk remailers, the major weakness is someone putting together when you post and when an "anonymous" message appears on the system. With this system, a message would not appear at one time, and hopefully not in one piece. You could probably even devise a system where the sender doesn't actually know exactly *when* the message was sent!
  • Because it *could* be something important. Would you mind storing pads if you knew it might contain a message that could overthrow a tyrannical government? Different people would have different motives.
  • The first weakness is that it is easy to poison the repositories with pads with false names. The pad names should be made self-verifying by using a hash of the entire pad as a name (e.g. md5).
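    A minimal sketch of self-verifying pad names, assuming an in-memory repository. It uses SHA-256 from Python's `hashlib` in place of the MD5 the comment mentions (MD5 is no longer collision resistant); `pad_name`, `accept_pad` and the dict-based store are hypothetical, not part of Madore's scripts. Because the name is derived from the contents, a submission that claims a false name is simply refused.

```python
import hashlib


def pad_name(pad: bytes) -> str:
    """Self-verifying name: the hex SHA-256 digest of the whole pad."""
    return hashlib.sha256(pad).hexdigest()


def accept_pad(store: dict, claimed_name: str, pad: bytes) -> bool:
    """Hypothetical repository check: store the pad only if its name is honest."""
    if claimed_name != pad_name(pad):
        return False  # false name: reject instead of letting it shadow a real pad
    store[claimed_name] = pad
    return True
```

    The cost of this design is a longer name (64 hex digits instead of 16), which still takes very little space in an announcement.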

    The second problem is that the keyspace is too small. The obvious solution would be to encrypt the data. This way the "URL" for the information would be the names of pads to XOR plus the encryption passphrase. The encryption format should have no headers and be indistinguishable from random data without the passphrase. A good candidate would be CipherSaber [gurus.com].

    The system's biggest advantage is that it is ridiculously simple and uses existing tools. This makes it very transparent.

    ----
  • I've heard of this sort of professional bitterness amongst the academic types. I suppose it serves a purpose.

    I personally think it is much better to take the optimistic path and consider the weak points as challenges rather than as the point of departure for mockery.

    This message smacks of mockery through and through, mainly with the overuse-of-the-bolden-tags, as they just might say in Germany.



    :)Fudboy
  • There's a good point here. The actual problem in this system, which I cannot figure out, is how to tell the receiver which random pads to use for decryption... does this system work with a 'trusted courier' in any way?

    On the other hand, random data encryption, where keys/pads are used only once, is mathematically the best form of encryption; Claude Shannon proved this in the 1940s. But a practical implementation seems very tough to me.

  • Britain is at the moment passing laws that make it a legal requirement for ISPs to tap their customers (for MI5). This law will also require suspects to hand over their passwords for any encrypted data they are asked for, or go to prison.

    Having a system that means they cannot prove you have any encrypted data may be the only way to defend against this.

    Glynn
  • What I don't get is why someone would be storing white noise on their server. I mean come on. The argument that it's not encrypted data and just white noise is kind of a flimsy one to use against inspectors or what not. Why in the world would you be wasting storage space with white noise unless it's something important? Maybe I just don't get it.

    And you don't think that's a scary thing: having to justify the existence of ANY file on your hard drive to ANYONE?! That sounds entirely horrifying to me.

    Me: "I just had the file on my hard drive, sir"

    Judge: "For what reason?"

    Me: "I dunno, I just wanted to see what it would look like"

    Judge: "Well, the state deems that it appears too random, and since you can't offer an acceptable explanation for its use, we have to assume it was for illegal purposes."

    Scoff now, but it's been happening since the beginning of time.

    If I want to sit there and read from /dev/random all day(not the best choice for real 'white noise', granted), NOTHING about that points to any illegal, or even 'suspicious', activity. It's one man, piping data to a file. When any incarnation of that, random or ordered, is considered illegal, I'm moving out.

  • Here, we encrypt files with a random pad, and distribute the 'key' via some secure medium.

    How is this different to encrypting with any other algorithm, or even just zipping the file in question with a password?

    In either case, there is an encrypted version of something easily available, which a person with enough time can crack. The key to this thing is distributed securely.

    Why then, do we not just distribute the whole file in question via our secure distribution network? What's the point of putting the encrypted format out for the public anyway, if only a selected few (who have privileges to the key) can read it and only after using our secure network?



    ---
  • by orpheus ( 14534 ) on Sunday June 18, 2000 @12:57AM (#995134)
    This method described has almost no merit at all.

    The article had so many technical, philosophical, mathematical and other misconceptions (just a few listed below) that it could pass for a modestly well-crafted troll. It had 'something for everyone' (i.e. anyone should be able to poke *some* hole in it with a moment's thought), making it both an 'obvious troll' and 'good bait'.

    At first, I thought the author was sincere, but then I noted that he actually reversed and misrepresented its flaws as *strengths* (e.g. 'the birthday effect' in namespace collisions)

    How did this article get on the front page of SlashDot? <sarcasm> Is it supposed to be a sly social analysis, a wry deconstructionist experiment or dry Gallic humor? I wonder why it's under censorship rather than crypto -- could it be footnote #6, below? Must be, else this submission would never be the cream of the crop. </sarcasm>

    [1] "Free speech" is only meaningful when it can be widely heard. Perfect encryption without public decryption is like locking yourself in a trunk and throwing away the key. If every Joe Sixpack and Dexter Tapedglasses can read your message without prior arrangement, so can Joe Gannon and Janet Reno. If JS and DT can't read it, it ain't 'free speech', it's 'private communications'.

    [2] The only privacy insight here is the obvious fact that "encrypted files may look like garbage" (regardless of encryption method). However, *cleverly* encrypted files, e.g. steganography, may look like something utterly harmless. Which approach is safer/more secure for the originator, the storage site, and the recipient? Especially in the light of laws like England's mandatory key surrender on (proper) demand. Someday, keeping massive Porn databases may be your duty as a patriot! ;-> How else can we stop the jackbooted thugs from finding/blocking our 21st century Federalist Papers?

    [3] While independently assigned padnames of 8 bytes may offer 2^64 names, there is a 50% chance of collision after relatively few pads are generated (about five billion, a tiny fraction of 2^64). The birthday problem the article mentions doesn't suggest high freedom from collisions (as he implies); it means collisions are much likelier than we expect: if there are 24 people in a room, it's *probable* (>50%) that there'll be a birthday collision (shared birthday) even though there are 366 possible days in the dataspace. He cites this as proof that collisions will *not* be a problem.

    [4] The system loses the ability to decode as more random pads are created/shared and collisions begin to occur. Since pad generation is uncontrolled, this method would become an information hole -- if you used the 'wrong' #6930d3ed740d54de for a given file, you'd get gibberish -- yet all pad #6930d3ed740d54de are equally valid. The system he calls "A whole Mess 'O' Pads" would degrade to "A Whole Mess" (of bits) -- an effective information hole.

    [5] At its best, this inept rendition of a one-time pad is a Geek Pig Latin [GPL??], reducing the encryption value of a theoretically UNBREAKABLE 128K one-time pad to a *theoretical* maximum of 2^(64*n) combinations (where n is the number of OTPs XOR'd together) and a minimum that is no more than x^n combinations for brute-force cracking (where x = number of published pads, n = number of XORs).

    You can best think of it as a poor key generation method, where the true key is not the 128K pad, but the far shorter 'instructions' -- the keynames to XOR together. The example he gave (6 XORs, 8 byte keynames) amounts to the same security as XORing against a 384 bit key, as far as a brute force attack is concerned. This is the same security as XORing against "Netscape Engineers are Weenies! They really are!!" (48 bytes)

    [6] Perhaps this article can be most charitably read as an experiment in information darwinism, but not in the Dawkinsian 'meme' sense: the speaker who uses this method is 'too dumb to be listened to' (and silenced by disappearing into the 'Whole Mess'O'Bits) -- akin to the sardonic 'too dumb to live'. (This is supported by his assertion that he is not sure free speech is a good thing)
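    The arithmetic behind points [3] and [5] can be checked directly. A sketch, assuming names are drawn uniformly and independently (as random 8-byte prefixes would be); the function names are illustrative only. The brute-force count treats a recipe as an ordered choice with repetition, which slightly overstates the attacker's work, since XOR does not care about order.

```python
import math


def collision_prob(n: int, space: int) -> float:
    """Exact chance that n uniform draws from `space` values contain a repeat."""
    if n > space:
        return 1.0
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def approx_collision_prob(n: int, space: int) -> float:
    """1 - exp(-n(n-1) / (2 * space)); accurate when `space` is huge."""
    return -math.expm1(-n * (n - 1) / (2 * space))


def draws_for_even_odds(space: int) -> float:
    """Approximate n where a repeat becomes a coin flip: sqrt(2 ln 2 * space)."""
    return math.sqrt(2 * math.log(2) * space)


def recipe_bits(pads_in_recipe: int, name_bits: int = 64) -> int:
    """Size in bits of the recipe itself, which point [5] calls the real key."""
    return pads_in_recipe * name_bits


def brute_force_recipes(published_pads: int, pads_in_recipe: int) -> int:
    """Upper bound on recipes to try: x ** n ordered choices of existing pads."""
    return published_pads ** pads_in_recipe
```

    Here `collision_prob(23, 365)` is just over one half, while `approx_collision_prob(10**6, 2**64)` is only about 2.7e-8 and `draws_for_even_odds(2**64)` comes to roughly 5.06 billion names. `recipe_bits(6)` is 384, and with a million published pads a six-pad recipe allows at most `brute_force_recipes(10**6, 6)`, i.e. 10**36 (about 2^120) guesses, far below the keyspace of a genuine 128K one-time pad.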
  • I haven't even read the article and I can see where this is going. Consider this scenario. You create a random one-time pad and put it up as file A. I take that pad and XOR it with, say, Metallica's "One" MP3. I take that result and post it as file B. Someone else takes file B and XORs it with, say, a critique of Chinese governmental policies to create file C.

    Yet another person posts a web site which says A+B = Metallica's "One", B+C = government critique. Now, just who is Metallica going to sue for posting copyrighted info? Everybody is posting mathematically random data. A and B are both random. XOR A and B and you get the MP3. XOR the MP3 with A, you get B. XOR the MP3 with B, you get A. It'd be impossible to say after the fact whether A was created as a key to B or B was created as a key to A.

    This is obviously a very simplified scenario, but imagine it spread out to thousands of files, with the possibility of XORing multiple files.
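    The symmetry this scenario relies on is easy to check. A minimal sketch, with a byte string standing in for the MP3, `secrets.token_bytes` assumed as the pad source, and hypothetical helper names.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("files must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_pair(data: bytes) -> tuple:
    """Return (A, B) where A is a fresh random pad and B = A xor data."""
    a = secrets.token_bytes(len(data))
    return a, xor_bytes(a, data)
```

    Given `a, b = make_pair(data)`, the calls `xor_bytes(a, b)`, `xor_bytes(data, a)` and `xor_bytes(data, b)` return `data`, `b` and `a` respectively. Since A is uniform and independent of the data, B is uniform too, so nothing in either file marks it as "the key".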

    It isn't about security or encryption in the sense of keeping data hidden. It isn't about secretly transferring the knowledge of which key to use, accessing the key unobtrusively, or trusting the intended reader. None of that matters at all.

  • Yeah, but the White House is not going to be suggesting that you combine their JPG with your pr0n.com pad, are they? Whoever makes the suggestion of combining the pads is pretty much by definition the "owner" of the document.

  • ...has been around for years...

    Oh mighty guru, please enlighten we poor ignorant ones. Give us a link. Support your claim. Has it been around for years? Has it been used before? If it's so well known, why didn't M. Madore know about it?

    ------

  • "All you're doing is putting the information into a new form."

    True, but maybe that new form is (more) legal.
    --
    Compaq dropping MAILWorks?
  • Me and a friend devised a similar idea to defeat the UK's forced decryption bill. Now the one time pad idea is nothing new, but what it means is that you can decrypt the text into anything you want using a different key.

    Again, you need a random stream, but this random stream isn't used as the key directly. It is used to generate a dummy key (which will decrypt to something you can show the police, feds or whatever: the dummy text) and the real key, which will decrypt to the real plaintext (the controversial material).

    Here's the method...

    K = Random one time pad (aka real key)
    P = Plain Text
    D = Dummy Text
    C = Cipher text
    KD = Dummy Key

    C = P xor K (Normal method of encrypting a one time pad)

    KD = C xor D (To get the dummy key - xor the ciphertext with the Dummy plaintext)

    Now when asked to surrender your key - you give in the dummy key. The law enforcement agency decrypts the text as follows.

    KD xor C = D

    remember that:

    KD = P xor K xor D

    and

    C = P xor K

    so when you XOR KD and C, D pops out, and there is no way to prove that you have anything other than an innocent little message.

    To get the real message - just XOR K and C - to get P.

    Again - this method suffers from a number of problems, such as: the plaintext and dummy text need to be the same length, the real key must be kept secret from law enforcement, the key must be random and used only once, and a host of others that I may not have come across. But it works, and provided the key is random and kept secret, there is nothing that can be done to prove you didn't supply the correct key.
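    The equations above can be checked with a short sketch. It assumes Python's `secrets` module as the pad source (a cryptographically strong generator, not a true random source), and the helper names are hypothetical.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_with_dummy(plaintext: bytes, dummy: bytes):
    """Return (C, K, KD): ciphertext, real key, and dummy key."""
    if len(plaintext) != len(dummy):
        raise ValueError("plaintext and dummy text must be the same length")
    k = secrets.token_bytes(len(plaintext))  # K: random one-time pad
    c = xor_bytes(plaintext, k)              # C = P xor K
    kd = xor_bytes(c, dummy)                 # KD = C xor D = P xor K xor D
    return c, k, kd
```

    For example, `c, k, kd = encrypt_with_dummy(b"attack at dawn!", b"buy milk, eggs.")` gives a `kd` that turns `c` into the shopping list, while `k` turns the same `c` into the real message.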

    There is also another possibility, but I am not sure that this would work as well:

    Encrypt the text with a block cipher to get C, and then get the dummy key as usual. This has the advantage that you can use your normal encryption programs, and not need a one time pad for normal use, but you can claim that you _did_ use a one time pad, and supply the dummy key as usual. Provided the ciphertext was encrypted using a good algorithm, it should appear random, and there is no way to prove otherwise.

    I am unsure as to whether using standard encryption methods leave some sort of signature as to what method was used, and if this is so, then the above method falls down.
  • If you rot-13 something that has been rot-13'd previously it just unscrambles it, dumb ass. Gurp!
  • by Effugas ( 2378 ) on Sunday June 18, 2000 @02:50AM (#995141)
    *Sigh*

    Everybody loves the One Time Pad.

    Can't imagine why. It's like, couple words out of Shannon saying a system can be provably uncrackable, as long as it's far too annoying to actually use, and people convert that to:

    Let's just make it not annoying to use.

    Problem is, the security comes from that annoyance, and degrades ungracefully: Very, very ungracefully. As in, the moment one pad gets compromised, or even reused, boom. Game over. You're done.

    Compound that by having key material retrieved by the encryptor over a network (as this system depends on), and you're even more done. Let's analyze what's going on here a bit.

    All cryptosystems are essentially engines for extracting the secrecy from a set of data. Secrecy is something even more intangible than the raw data that itself is secret; a very large quantity of information can be stored and transferred, but a secret can only be transferred if that data can be understood. Cryptography essentially works by allowing the comprehensibility of data--though not the data itself--to be extracted and simplified down to some other piece of data.

    Now, often that data can be much, much smaller. Broadbridge Media, for instance, takes direct advantage of this for reasonably secure mass data distribution of music videos on CDs--some large ciphertext gets mass distributed on CDs or DVDs, while a small, personalized transaction over the Internet allows an individual to retrieve the key which decrypts the ciphertext into plaintext. The mass data is moved, but remains incomprehensible until a relatively tiny amount of key material is transmitted to the destination host.

    Madore's system is somewhat similar; he still has a chunk of extracted secrecy composed of a "recipe of pads" which, when XOR'ed together, reveal the plaintext. This recipe can be as small as literally two pads; an innocent "complete works of Shakespeare" page and some extension thereof.

    First problem? Madore gets his pad indexes from the first couple of bytes of whatever pad he's come across. PGP has survived reasonably well with a 2^^32 complexity attack against its public keyspace indexes (it's called the DEADBEEF attack); Madore's system, however, is likely to find collisions in everyday use.

    It never ceases to amaze cryptographers that, for all the functionality of the fixed-output, one-way hash (password storage, small indexes to arbitrarily sized inputs), people don't use them. There really aren't that many flat out solved problems in all of crypto; this is one of them. IF YOU'RE NOT STORING YOUR PASSWORDS AS EITHER MD5 OR SHA-1 HASHES, YOU'RE WAITING TO GET HACKED. *sigh*

    Anyway, beyond that small chunk of data which gives the recipe of which block to use, there's also the censorworthy-but-XOR-obfuscated block which will supposedly diffuse itself throughout the network. Whereas Broadbridge got its incomprehensible data out the door on CDs, Madore's system invokes the distributed nature of many, many XORable keyblocks to hide which block on the network is the actual censor-worthy block.

    But how many blocks do I need to use for a recipe? Suppose I have 200 random blocks to choose from, and I download one block of random key material. Wait. Let's say I'm really paranoid, and I generate my own random block to XOR against, and upload it to a server. OK. So I've gotten my single block to XOR against, I do so, and I upload my data-containing block to the padservers.

    I've already lost.

    Whether I downloaded my keyblock from the network, or uploaded it to the network, anybody sniffing my network traffic will see the exact block I used to encrypt against. They'll either watch it leaving the keyserver or going back in.

    Worse, let's assume there was no sniffer--just 201 random blocks, any two of which can be XORed together to reach plaintext. The complexity isn't one in fifty billion, it's 201*201, or a good 40,401 operations. Use of two pads isn't particularly specified... but then, use of this as a viable encryption system isn't particularly specified either. You can tell, by this line:

    "Your first task is to locate an announcement stating that the data you want are recoverable by XORing such a set of pads."

    Oh, that's all.

    "Go find your key."

    Obviously, with no special complexity applied to locating your key, there's nothing that separates You As Reader from You As Censor. And, since whoever determines a key used *once* for secret information determines it for all time...boom.

    But, let's be fair. Madore's goal mainly seems to be giving websites the capability to host information they can't recognize. Freenet did this; Madore doesn't actually even come close. Among other things, the system isn't particularly fault tolerant. Good secret sharing systems allow m-of-n functionality, i.e. retrieval of any m shares out of n total (like 3-of-5) reveals the data. This system? If any block is missing--and there doesn't need to be more than two--your data is gone. Loss of a single pad archive is likely to cause some data to disappear forever. Ouch.
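    For contrast, the standard m-of-n construction is Shamir's secret sharing. This sketch is a swapped-in technique, not part of Madore's proposal: it works in the prime field modulo 2^127 - 1 and treats the secret as an integer below that prime (so at most 15 bytes), meaning real data would have to be chunked or used to protect a key. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this


def split(secret: int, m: int, n: int) -> list:
    """Split `secret` into n shares, any m of which recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    # Random polynomial of degree m - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares: list) -> int:
    """Lagrange interpolation at x = 0 from at least m distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

    With `split(secret, 3, 5)`, any three of the five shares give back the secret, so two archives can vanish without loss; an XOR recipe, by contrast, is n-of-n.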

    Honestly, I'm putting too much energy into this. Madore writes the following:

    The pads, of course, are just named by their 16-hex-digit names (thus, strictly speaking, the announcement makes it possible to recover the first eight characters of the data; but that should not be a problem).

    Any cryptosystem which leaks information about the plaintext in the key material never should have left the drawing board. I congratulate Madore on noticing this one among the many flaws in his design, but this really is Bad Crypto. It's timely, and it's useful, and it'll hopefully prevent people from falling for other Pad scams by sheer nature of the /. reaction, but it's still Bad Crypto.
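    To see the leak the quoted passage admits to, assume a two-pad recipe where each pad is named by its first 8 bytes. XORing the names in the announcement recovers the first 8 bytes of the plaintext without downloading a single pad. The helper names and sample text are hypothetical.

```python
import secrets


def prefix_name(pad: bytes) -> str:
    """Naming rule quoted above: the pad's first 8 bytes as 16 hex digits."""
    return pad[:8].hex()


def leak_from_announcement(names: list) -> bytes:
    """XOR the 8-byte names together; no pad contents are needed."""
    out = bytes(8)
    for name in names:
        out = bytes(x ^ y for x, y in zip(out, bytes.fromhex(name)))
    return out


plaintext = b"Forbidden text goes here"
pad_a = secrets.token_bytes(len(plaintext))
pad_b = bytes(p ^ a for p, a in zip(plaintext, pad_a))  # B = P xor A
print(leak_from_announcement([prefix_name(pad_a), prefix_name(pad_b)]))  # b'Forbidde'
```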

    *Sigh* At least he wasn't trying to sell us anything.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com
  • This scheme isn't as secure as it seems. Suppose I set up a corrupt repository where I copy the name of every pad and its first 8 bytes, but randomize all other data. Anyone who tries to reconstruct the original text using one of these altered pads will get only garbage, as will anyone trying to reconstruct using the original pad a message encrypted with the altered pads.

    A better solution is to use a hash function's output as the byte string. Since it is in practice virtually impossible to create another pad with the same hash signature, all a client has to do is compute a hash for each pad, and compare it against the list of hashes of the pads needed to reconstruct the message. Any tampering in a pad would thus be obvious; a hash scheme would also protect against errors in downloading pads, pads that get damaged due to hard drive problems, etc.
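    The attack and the fix can be sketched side by side, assuming SHA-256 as the hash and a recipe that lists one hex digest per pad; `corrupt_copy` and `pads_match_recipe` are hypothetical helpers. The corrupted pad keeps its prefix-based name, but it no longer matches the recipe's hash.

```python
import hashlib
import secrets


def prefix_name(pad: bytes) -> str:
    """Name taken from the pad's first 8 bytes (the scheme being attacked)."""
    return pad[:8].hex()


def hash_name(pad: bytes) -> str:
    """Name taken from a SHA-256 digest of the whole pad."""
    return hashlib.sha256(pad).hexdigest()


def corrupt_copy(pad: bytes) -> bytes:
    """Hypothetical malicious mirror: keep the first 8 bytes, randomize the rest."""
    return pad[:8] + secrets.token_bytes(len(pad) - 8)


def pads_match_recipe(pads: list, recipe: list) -> bool:
    """Client-side check: every downloaded pad must hash to its recipe entry."""
    return len(pads) == len(recipe) and all(
        hash_name(p) == h for p, h in zip(pads, recipe)
    )
```

    The same check also catches accidental damage such as a truncated download, since any changed byte changes the digest.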
    If you rot-13 something that has been rot-13'd previously it just unscrambles it, dumb ass.

    You just won the "Look everybody! I am Stupid! See how stupid I am? As more and more people get on the net, the possibility of meeting another as stupid as me remains nil!" contest.

    You are, in fact, dumb.
  • [1] "Free speech" is only meaningful when it can be widely heard. Perfect encryption without public decryption is like locking yourself in a trunk and throwing away the key. If every Joe Sixpack and Dexter Tapedglasses can read your message without prior arrangement, so can Joe Gannon and Janet Reno. If JS and DT can't read it, it ain't 'free speech', it's 'private communications'.

    (For convenience, let's call the act of getting the pads and XORing them together " schkroping ").

    Not at all. If you describe such a text as being available by schkroping together, say, 95FE35321DA3, 95843938475894, 3948382830405, 409530404950 and 28305049394 (presumably each pad being locatable by its "name"), you'd get a schkroping browser which would fetch the information you want just as (insert your favourite HTML browser name) does, except that the URL would be the names of the various pads constituting the information.

    Hey! Let's invent a new URL type: shkrp://(pad 1),(pad 2),(pad 3),...(pad n )
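    The proposed URL type could be handled roughly like this. The `shkrp://` scheme is the commenter's hypothetical, and `fetch` is an assumed callback that maps a pad name to its bytes (for instance by downloading it from a pad server); neither is part of Madore's scripts.

```python
from functools import reduce
from urllib.parse import urlsplit


def parse_shkrp(url: str) -> list:
    """Split a hypothetical shkrp://name1,name2,... URL into pad names."""
    parts = urlsplit(url)
    if parts.scheme != "shkrp":
        raise ValueError("not a shkrp:// URL")
    names = [n for n in parts.netloc.split(",") if n]
    if len(names) < 2:
        raise ValueError("need at least two pads")
    return names


def schkrope(url: str, fetch) -> bytes:
    """Fetch each named pad with `fetch(name)` and XOR them all together."""
    pads = [fetch(name) for name in parse_shkrp(url)]
    if len({len(p) for p in pads}) != 1:
        raise ValueError("pads must all be the same length")
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pads)
```

    Keeping `fetch` as a parameter leaves the transport open: an HTTP download, an FTP mirror, or a Freenet lookup could all sit behind it.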

    [3] While independently assigned padnames of 8 bytes may offer 2^64 names, there is a 50% chance of collision after relatively few pads are generated (about five billion, a tiny fraction of 2^64). The birthday problem the article mentions doesn't suggest high freedom from collisions (as he implies); it means collisions are much likelier than we expect: if there are 24 people in a room, it's *probable* (>50%) that there'll be a birthday collision (shared birthday) even though there are 366 possible days in the dataspace. He cites this as proof that collisions will *not* be a problem.

    However, here, you're right. There WILL be name collisions when you just take the first n bytes of the pad to identify it. But what can we do? If we take the last n bytes of the pad, we'll have the same problem, and likewise if we XOR them together, or XOR the pad's CRC over that.

    Ultimately, it would seem that the only real unique key would have to be the pad itself!!!! Which hardly solves the problem at hand...

    The method could sure be greatly improved by the million eyeballs now looking at it; how about incorporating it in freenet, as the author suggests????


    --
    Here's my mirror [respublica.fr]

  • It seems to me that this system is just as easily censored as the existing internet. Somebody has to host the information that tells which pads are put together to make the real data, right? It wouldn't be any harder to censor this "list of pads" than it would be to censor the unencoded file itself in the first place.

    It appears that all this method does is move the point of censorship from the document contents to the "list of pads" required to build the document from the random data stored on various servers.

    Unless I'm missing something when I read through the document, I don't think that this really gains us anything and at the same time it makes it really freaking difficult to put a file out there. Maybe if it were automated, it would make a nice extension to an anonymous DFS system for file sharing, but you shouldn't rely on it to prevent censorship.

  • I have a question about this system. One time pads have been around forever, and dividing the information would protect it, possibly, but how would someone know which pad has the data they need? It would require a distributed list of pad numbers, and their contents, which would defeat the whole purpose.

    If someone had a text file that they wanted to encrypt, using this would be a waste, because then only they would be able to get to their information. Oh, I see now, this protects free speech by making it so that no one else hears it. Which would protect one from prosecution, because if only you have access to the information, you can say what you will.

    A better way would be to use a variation of the Vigenère square, which would make it indecipherable to those who don't have the key, weed out those without the patience for subversive measures against supposedly oppressive governments, and give the jolly old guys at the NSA something to do with their time...

    If I'm wrong, moderate this to: -1(Stupid)
    -sempiternity
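    For comparison, here is a minimal Python sketch of the classic Vigenère square, a polyalphabetic shift by a repeating key. Keep in mind that Vigenère has been breakable since the 19th century. Kasiski examination finds the key length, and frequency analysis on each key position does the rest, so it is a puzzle rather than protection:

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Classic Vigenère over A-Z; other characters pass through unchanged."""
    shifts = [ord(k) - ord("A") for k in key.upper() if "A" <= k <= "Z"]
    if not shifts:
        raise ValueError("key must contain at least one letter A-Z")
    out = []
    i = 0  # only letters consume key characters
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = -shifts[i % len(shifts)] if decrypt else shifts[i % len(shifts)]
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


print(vigenere("ATTACK AT DAWN", "LEMON"))  # LXFOPV EF RNHR
```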
  • Servers should not list what they have (maybe except to mirrors), they should just return what they are requested.
  • Let's assume you are trying to get a message that is being sent to someone. All you need to do to crack the code is pretty much maths that is taught to 15-year-olds: a) get all the pads on the internet, or at least all the pads that the bad guys are likely to have used; b) guess a few words of the message; c) produce a big matrix of all the pads, multiply it by a vector of 1s and 0s (1 means that pad was used, 0 means it wasn't) and set the result equal to the guessed text. OK, we are now down to simple matrix multiplication (well, in this case arithmetic modulo 2, where addition is exclusive-or, but this makes little difference). Invert the matrix, or more precisely solve the system by Gaussian elimination (which takes a while for a big matrix, but not ridiculously so), to get the vector that tells you which of the pads were used. Check that the rest of the message makes sense; if so, you have cracked the message, otherwise guess again. It doesn't absolutely always work (because you have to guess the message), but it works often enough that the scheme is pretty much worthless. This is called a 'known plaintext' attack. This code is not proof against it. Other codes are proof against it, and should be used instead. Don't waste your time with this.
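    The "invert the matrix" step is plain linear algebra over GF(2). Here is a minimal Python sketch. It assumes every pad is at least as long as the guessed plaintext and that the guess is the start of the message. Each pad prefix becomes a bit vector (a Python int), and the elimination keeps track of which original pads were combined. `find_pad_subset` is a hypothetical helper name:

```python
import os
from typing import Dict, List, Optional, Tuple


def find_pad_subset(pads: List[bytes], guess: bytes) -> Optional[List[int]]:
    """Return indices of pads whose prefixes XOR to `guess`, or None if impossible."""
    n = len(guess)
    # pivot bit -> (reduced vector, bitmask of original pads that XOR to it)
    basis: Dict[int, Tuple[int, int]] = {}
    for idx, pad in enumerate(pads):
        vec, combo = int.from_bytes(pad[:n], "big"), 1 << idx
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, combo)
                break
            bvec, bcombo = basis[top]
            vec, combo = vec ^ bvec, combo ^ bcombo
    # Reduce the guess; the pads XORed in along the way are the answer.
    vec, combo = int.from_bytes(guess, "big"), 0
    while vec:
        top = vec.bit_length() - 1
        if top not in basis:
            return None  # guess is not any XOR combination of these pads
        bvec, bcombo = basis[top]
        vec, combo = vec ^ bvec, combo ^ bcombo
    return [i for i in range(len(pads)) if combo >> i & 1]


# Ten random 32-byte pads; the secret is pads 1, 3 and 4 XORed together.
pads = [os.urandom(32) for _ in range(10)]
secret = bytes(a ^ b ^ c for a, b, c in zip(pads[1], pads[3], pads[4]))
print(find_pad_subset(pads, secret[:16]))  # [1, 3, 4] with overwhelming probability
```

    When there are more pads than guessed bits, the system has many solutions. That is why any recovered combination still has to be checked against the rest of the message.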
  • At that point the secret police storm in, having been eavesdropping on the entire conversation. They throw Alice, Bob and Charlie in jail. They go to the website, pull the information, get the pads and read the Neiman-Marcus Cookie Recipe for themselves. Guess what? This protocol has completely, totally and utterly failed.

    Not at all. The protocol did what it wanted to do: it told whoever wanted the cookie recipe where to find it, and they found it.



  • It can be stated much simpler: a well-educated population in a democracy that doesn't listen only to SIGs.


  • I suggest you read the summary; in the last paragraph it states something along the lines of:


    the idea of one-time pad encryption has been around for years, it was used in WWII.

    You should look up a few books on Cryptography before you go attacking people.

    Just my 1.999999999999999
    -sempiternity
  • That's assuming you get a trial. They could just invoke the name of Mitnick and deny you bail, and lock you up in solitary until you agree to waive your right even to have a bail hearing. Then they won't let you examine any of the "evidence" in your case and will generate a few gigabytes of crap. When you finally get the right to examine it, they'll print out tens of thousands of pages of binaries on a dot-matrix printer and let you look at it with a flashlight for five minutes a day in a dark room.

    You should have said "a dot matrix printer with a faded ribbon with holes and creases"...



  • Supposing it was distributed with Freenet... would someone be able to effectively censor a piece of information (and inadvertently any other data that relied on a particular pad) by taking one of the required pads' first 8 bytes and flooding the network with about 100 different random files with the same first 8 bytes? *shrug* just an idle thought
  • You need to get off your high horse and re-read the article. It has nothing to do with encryption, and everything to do with distributed data promulgation. The pads exist only to provide chunks of data which have no inherent meaning in and of themselves. Once linked together with other pads, the pads now take on meaning. This would effectively prevent shutting down a site based solely on the pads contained at that site (although I do believe it would be trivial to prove that a particular pad at a particular site "belonged" to a particular document, in which case censorship laws will simply be changed accordingly to prohibit serving of pads which can be used in part to reproduce a censored piece of information).
  • We're just really keen on random numbers, and when we have a really good pool of entropy, we don't like to see it evaporate - so we store it where others can use it too. (-:
  • well, someone could find some sort of illegal angles on the launch system you are using.

    you might even get shut down by the Environmental Protection Agency.

    of course, service calls would be a pain. but I do recall that Radio Shack is going to send a lander to the moon.

    hmmmmmmm

    It is possible that Microsoft still has enough money to move their operations to the moon ... that has possibilities.

  • The OP is correct (and you've missed a rather subtle point). The OP said "It suffices to build a (roughly) square matrix containing the prefix of all the pads we wish to include in the analysis, run Gaussian elimination, and then see if there is a dependency with the file." The key word in that sentence is the word "square". Moreover, it is eminently possible to use more than just the 64-bit prefixes of the files; if one uses, say, 3,000,000 bits (where 3,000,000 is the total number of pads), the total size of the matrix is well within the bounds of conventional techniques, to say nothing of SGE or BL/BC.
  • The point of the method is to make it easy to collect the information, while making it difficult to blame the publishers. Janet Reno is supposed to be able to read it; this is supposed to make it more difficult, legally speaking, to get the information offline. I don't think it'll work but it's not utterly mad. It's not exactly unobvious either.

    Your sums are wrong for point 3 as well. If you want a chance on the order of 50%, you'll have to generate around 2^32 pads; that's more like billions than millions. I still think that's too small, but hey, move to a 160-bit identifier (perhaps the SHA-1 of the pad?) and you won't get collisions.
    --
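    Here is a quick Python check of those sums. It uses the standard birthday approximation P = 1 - exp(-k(k-1)/2N) for k random IDs drawn from N possible values:

```python
import math


def collision_probability(k: int, space: float) -> float:
    """Approximate chance that k random IDs from `space` values contain a repeat."""
    return -math.expm1(-k * (k - 1) / (2.0 * space))


def ids_for_probability(p: float, space: float) -> float:
    """Approximate number of random IDs needed to reach collision probability p."""
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


print(collision_probability(23, 365))          # ~0.50: the classic birthday paradox
print(collision_probability(10**6, 2.0**64))   # ~2.7e-08 for a million 64-bit names
print(ids_for_probability(0.5, 2.0**64))       # ~5.1e+09, i.e. about 2**32
print(collision_probability(10**9, 2.0**160))  # effectively 0 with 160-bit names
```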
  • And this may be the argument you need to prevent the original message from being removed. If it is removed, it destroys the integrity of other messages that have nothing to do with the removal. IANAL, but isn't that illegal?
  • This would help those posting A+B. You may be right that Metallica could not sue over random bits. But the people storing or downloading B+C would not be helped as much. Remember that the Chinese system, like most totalitarian systems, is not overly concerned about jurisprudence. If the Chinese government has reason to suspect a Chinese citizen of storing antigovernment information... even if that information cannot be accessed or even proved to exist by said government... they will arrest and prosecute. Evidence will be ignored or created as needed.

  • ...are a strong social framework, a tradition for the respect of individual rights, and a rational government working in harmony.

    Strong social framework == strong nuclear families, leading to strong extended families.

    The governments that we have today are by and large working to weaken families, announcements of programs to "strengthen the family" notwithstanding.

    The needs of the many are often used as an excuse to totally ace the rights of the few. Thank you for that pearl of short-sightedness, Mr Spock (I much prefer Professor Bernardo de la Paz's line of reasoning in this respect, although I have many bones to pick with RAH's philosophies in general).

    Which brings us to the fantasy of a rational government, let alone one acting in harmony with anything. Building on a foundation of irrational, selfish, group-minded (implies blame-sharing rather than acceptance of personal responsibility) people largely drawn from broken families does not result in strong, stable, thoughtful government.

    Having said that, I do agree with you.

    While the basic problem is not technological in nature, nevertheless technology is relevant to the issue.

    Tools are amplifiers. A hammer, for example, amplifies your ability to concentrate and apply kinetic energy. You can use that amplified power to build rocking-horses or to break skulls. Computers are likewise tools. The black-hats in the censorship field are using these tools to amplify their own power. One effective counter to this is to use our own computers as tools for eroding their power, to keep the balance a little fairer.

    What I'm trying to explain with these analogies is that technology won't solve the problem, and is possibly a dangerous distraction from the real issues - but technology can help to contain the problem somewhat while real answers are found and implemented.
  • In a country such as China, merely maintaining a Freenet server or collection of pads for this scheme would likely be declared a capital offense. And since the authorities are willing to monitor every drip of water that flows through the pipes, they will see when you send that PGP-signed message, and arrest you. Whether they can crack the message or not is in most cases irrelevant.

    What is needed here is a form of encryption in plain sight that doesn't say, "look at me I'm a cypherpunk" when you use it. What about this:

    1. Take a copy of an innocuous 8-10k JPEG file from some large public site. Say some cute little kitty-cat from Pets.com or that sort of thing.

    2. Use a program that takes a small text message, maybe a few dozen words- "The police chief practices Falun Gong and will warn you if trouble is coming."- and embeds them into the JPEG file by, say, flipping a handful of color values around ever so slightly.

    3. Send the munged image to the recipient in an innocuous email: "Isn't this kitty so cute!!! :-)" The changes are invisible to the naked eye, but a simple comparison between the file sent and the publicly available original would reveal them.

    4. The crypto here need not be so strong, because the point is to focus on making the sending of the message look as innocuous as possible, and to create plausible deniability for the receiver.

    5. Now the only problem is to get the decoding software installed where it needs to be. I don't know what the right answer here would be.
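    Here is a minimal Python sketch of steps 2 and 3. It assumes a lossless, uncompressed pixel buffer, because re-saving a real JPEG would scramble the low-order bits, so this is not a JPEG implementation. It also assumes the recipient knows the message length. Each message bit flips the low bit of one byte, and the recipient recovers the bits by diffing against the public original. `embed` and `extract` are hypothetical names:

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Flip the lowest bit of cover byte i whenever message bit i is 1."""
    bits = [(byte >> (7 - k)) & 1 for byte in message for k in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] ^= bit
    return bytes(out)


def extract(received: bytes, original: bytes, length: int) -> bytes:
    """Recover `length` message bytes by comparing with the public original."""
    bits = [(r ^ o) & 1 for r, o in zip(received[:8 * length], original)]
    return bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    )


kitty = bytes(range(256)) * 40  # stand-in for the public image's pixel data
note = b"The police chief practices Falun Gong."
sent = embed(kitty, note)
assert extract(sent, kitty, len(note)) == note
assert max(abs(a - b) for a, b in zip(sent, kitty)) <= 1  # changes are tiny
```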

    Anyway, just my two cents. Take it FWIW.

    -cwk.

  • Ultimately, the problem is one of closed systems vs open systems.

    It is a lot harder to maintain a safe space to operate from when the entire system is under strict control (see recent developments in Red China [slashdot.org]). It is far easier when you can operate from a position that is outside the control of tyrants. It is very easy to paint reformers as criminals when they need to use such tactics. This tactic has been used successfully by many revolutionaries, pirates, etc.: anyone operating outside the reach of the law, even when the law is unjust.

    The problem now is that the world is moving towards a unified system.

    This has many benefits when everyone can trust everyone

    (example: the early net before the web, sort of)

    (which is what happens when you have a small community of professionals who know each other and have common goals)

    This has many drawbacks when you have people who cannot be trusted, especially in positions of authority. The spread of criminal culture and criminal values is something you do not want in government, for example. This sort of thing results in spam, and other system abuses.

    So the senior problem is "who do you Trust?", and the related problems of ethical systems. We then have haggling and flame wars about what system of ethics to use between different groups with their different agendas and political views, while the crooks and the vandals run rampant. We even have social science types promoting the teaching of a "value free" curriculum, god knows to what effect.

    but of course, people do not want to hear that freedom means taking personal responsibility for yourself and the world around you. It means participation, and getting involved.

  • A few years ago some now-defunct programming magazine proposed the idea of hiding your data inside another piece of data, say a scanned picture or a sound byte. Most uncompressed image and sound formats, such as *.bmp and *.wav, have unneeded bytes, which is why they can be compressed so well into *.jpg or *.mp3. The proposed program would replace a certain number of these bytes with other information based on a password; the file could then be transferred to the intended person, who would then extract the information. Realistically you could not hide a large amount of data like this, but when combined with the pad idea, all that needs to be inserted is the names of the 5 or 6 pads you used. This gives an added layer of protection, because it requires the bad guys to first determine whether the file contains other data or is just a badly scanned picture, and then to prove there is encrypted data contained within the picture and not just a fluke.

    Another idea is to use the government's own tricks against them. The best way to hide information is to make it as long and boring as possible. If you don't believe me, try reading some banking law or any government budget. This is how the government gets funding year after year for stupid projects.
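    Here is a minimal Python sketch of the password-keyed variant, used to hide a short list of pad names. It assumes a raw (uncompressed) pixel or sample buffer. The password seeds Python's `random` module to pick which bytes carry the bits. That module is not cryptographically secure, so treat this as an illustration of the idea only. The 2-byte length header and the helper names are hypothetical:

```python
import hashlib
import random
from typing import Iterator


def _positions(password: str, size: int) -> Iterator[int]:
    """Yield distinct byte positions in a password-dependent order."""
    seed = int.from_bytes(hashlib.sha256(password.encode()).digest(), "big")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < size:
        pos = rng.randrange(size)
        if pos not in seen:
            seen.add(pos)
            yield pos


def hide(cover: bytes, text: str, password: str) -> bytes:
    data = text.encode()
    payload = len(data).to_bytes(2, "big") + data  # 2-byte length header
    bits = [(b >> (7 - k)) & 1 for b in payload for k in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small")
    out = bytearray(cover)
    for bit, pos in zip(bits, _positions(password, len(cover))):
        out[pos] = (out[pos] & 0xFE) | bit  # overwrite the least significant bit
    return bytes(out)


def reveal(stego: bytes, password: str) -> str:
    positions = _positions(password, len(stego))

    def read(count: int) -> bytes:
        result = bytearray()
        for _ in range(count):
            byte = 0
            for _ in range(8):
                byte = (byte << 1) | (stego[next(positions)] & 1)
            result.append(byte)
        return bytes(result)

    length = int.from_bytes(read(2), "big")
    return read(length).decode()


scan = bytes(random.Random(0).getrandbits(8) for _ in range(4096))  # stand-in cover
names = "95FE35321DA3,95843938475894,3948382830405"
assert reveal(hide(scan, names, "hunter2"), "hunter2") == names
```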


    ---------------------------------------------
    Jesus died for somebody's sins, but not mine
  • by David A. Madore ( 30444 ) on Sunday June 18, 2000 @05:33AM (#995165) Homepage

    Hi. I'm the author of the page in question, and an unwitting victim of the Slashdot effect (well, not truly unwitting: Erik Moeller, who posted the story, was kind enough to notify me in time). I received many emails about it, all of which I've read, as well as a good many posts in the current discussion. I can't possibly reply to them all, but I'll try to answer some of the most frequent or important comments here.

    First note that the page was written in February (2000/02/19 to 2000/02/23 to be precise), so it is not new. However, I do not claim any kind of originality, nor paternity of the idea: it is a small variation on the protocol described in section 6.3 ("Anonymous Message Broadcast") of Bruce Schneier's book on cryptography. In any case, I think it is pretty obvious in the first place. I am merely suggesting a few practical ideas to make it workable. There is nothing great or revolutionary about any of it, and I never made that claim.

    One thing should be made clear from the start: the whole idea is not about obscuring what the data is (i.e. it is not strictly speaking cryptography) but about who is sending the data. And, even more specifically, it is about making legal conviction impossible so long as the presumption of innocence is maintained (whether the presumption of innocence still means anything in these dark days is another question :-/); thus, it is normal that the story appeared on Slashdot's "Your Rights Online" section.

    Please also note that I am not making a political statement. This is not a libertarian manifesto. I am not stating that you should use this system to send out assassination messages against the President / the Prime Minister / the King / the Pope / <insert your favorite assassination victim here>; I am merely stating that you can, and that this is none of my business.

    Many have pointed out that my suggested way of naming pads is bad. That's true: using the MD5 (or SHA1 or any other kind of hash) signature would be a better idea. But it doesn't really matter all that much what the pads are named unless we want the system to be resistant to malicious tampering, which was not one of my avowed goals. Still, we can get this almost for free, so we might as well. Let's say we could have a symlink pointing from pad_md5_whatever.dat to the pad of the given md5 for each pad in each repository, and "combination recipes" could be given with these links so as to make them resistant to tampering.
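    Here is a minimal Python sketch of that symlink idea. It assumes each repository is a plain directory of pad files, and `link_by_md5` is a hypothetical helper name. It keeps MD5 only to match the `pad_md5_...` naming above. MD5 collisions can now be generated deliberately, so a tamper-resistant version would use SHA-256 instead:

```python
import hashlib
from pathlib import Path
from typing import List


def link_by_md5(repo: Path) -> List[Path]:
    """Give every pad in `repo` a pad_md5_<hexdigest>.dat symlink; return new links."""
    created = []
    for pad in sorted(repo.iterdir()):  # snapshot the listing before adding links
        if pad.is_symlink() or not pad.is_file() or pad.name.startswith("pad_md5_"):
            continue
        digest = hashlib.md5(pad.read_bytes()).hexdigest()
        link = repo / f"pad_md5_{digest}.dat"
        if not link.exists():
            link.symlink_to(pad.name)  # relative target, so the repository can move
            created.append(link)
    return created
```

    A recipe would then list `pad_md5_<hex>.dat` names, and a client could re-hash each download to confirm it matches its name.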

    Similarly for secret sharing: my idea was not to have a system which is hard to censor (there are other, far better, solutions for this), but to have one which is hard to track.

    Another thing I should make quite clear is that the system in itself is not used to hide data: it is used to hide the origin of data. This is why all comments along the lines of "OTP is secure as long as the pad is truly one-time", or all remarks to the effect that it is trivial to find all relevant data among the padset, are quite true but completely irrelevant. If you want to hide the data on top of hiding the origin, then you use a traditional cipher; for example, you encrypt your data using Blowfish and you use that data (the ciphertext, which for all intents and purposes is random) as input to the pad system. So long as you don't release the key, nobody can tell that there's Blowfish-encrypted data hidden in the pad system. The two are completely orthogonal. (It is true that my remark about the difficulty of finding "recognizable data" in the pad system is very misleading and irrelevant. I should remove that: never mind that part.) As for my comment about the birthday effect, it is merely about accidental collisions, not at all about malicious action.

    Somebody asks what is wrong with storing all pads in the same place since anyone can download them all. That is true, but that is beside the point. The point is that as long as a site does not have a complete set of pads yielding readable data, it is not, by itself, breaking any law, and all it is distributing is white noise; whereas if it stores one complete set of pads, then it is distributing the forbidden document in some form. Naturally, if someone wants to collect a complete set of pads, it is a good idea; but to distribute it is dangerous.

    Finally, there is the central question of whether the legal argument (which is the crux of the matter) holds water. Presumably it doesn't, but that will at least prove one thing: the argument shows that any kind of law restricting free speech contradicts the presumption of innocence. Some have pointed out that one could monitor the pad system, and the last pad published in a set of pads would always be the culprit: this is not true, because it might have been delayed, or it might be provably innocent (which implies the former, actually), and you can never quite be sure.

    Imagine the following scenario: someone points out on some Usenet group that eight publicly available pads, when XORed together, give something like DeCSS code. Judge summons the 'someone' in question, who claims that he just noticed that by randomly XORing pads together; not unconvincing, so judge lets the guy go. Then judge summons the pad owners. Starts with the most recently published pad: but the owner explains "look, my pad is just an encryption using the key 'foobar' of the first 128kb of (some standard transcription of) Shakespeare's Tempest; the idea had been floating around for some time, I just decided to publish it". Judge checks statement: it's true. So apparently the data was "published" earlier than was thought, it just took some time to come out; that makes things rather difficult to track. Second owner similarly points out that his pad is just a sequence of digits of pi in binary. Third owner is in a country over which judge has no jurisdiction, so nothing to do there. Fourth and fifth owners seem to have created their pads at the very same time, and both state obstinately that they generated pure white noise (following, say, a story on Slashdot about pads being a great idea). Sixth owner says he generated his pad by XORing a dozen other pads with an innocent message (which he shows to judge). Seventh owner refuses to answer judge's question. Eighth owner posted his pad before DeCSS even appeared, so must be innocent (or really?). Now what does judge do? Convict some owners? All? None? Problem is, judge is impressed with first poster's proof, and can't run the risk of convicting someone who might afterward prove that his pad was innocent. Presumption of innocence.

    Even if judge merely issues an injunction that the pads be taken off the network, every owner appeals on the ground that the pads were reused in making some other messages (innocuous ones) and that removing them would be a serious breach of the First Amendment (or whatever you call this thing about free speech).

    Anyhow, this is the summary: there's nothing new or revolutionary about the whole pad system; in fact, it's pretty trivial. But it does make one point: that information is fundamentally delocalized and that any attempt to pinpoint it or to find a culprit will fail. For the better or for the worse.

  • It would be far simpler (and cheaper) to put it in Antarctica, another remote area not controlled by any government (though the environment is nearly as hostile).
  • Why not? Just relabel the white noise as, say, a Metallica MP3 and nobody could tell the difference.

    Then again, you might just open up another can of worms entirely...

    - Jeff A. Campbell
    - VelociNews (http://www.velocinews.com [velocinews.com])
  • Problems (a) and (b) are easily solved:

    (a) In a Slashdot discussion a few weeks ago, someone pointed out that Intel and possibly other CPUs provide an analog white-noise random data source, providing something like 75K/second of random data.

    (b) If you need a random number between 1 and 50 billion, then use rand(). Humans should never try to pick random numbers on their own; there are too many biases and patterns.
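    One caution about `rand()`: the C standard only guarantees that `RAND_MAX` is at least 32767, and typical implementations are predictable. So it may not even reach 50 billion, let alone make a safe pad. Here is a minimal Python sketch that uses the operating system's cryptographic generator instead, via `secrets` and `os.urandom` from the standard library. The helper names are hypothetical:

```python
import os
import secrets


def random_between(low: int, high: int) -> int:
    """Uniform integer in [low, high] drawn from the OS CSPRNG."""
    return low + secrets.randbelow(high - low + 1)


def make_pad(size: int) -> bytes:
    """A pad of `size` bytes from the OS CSPRNG (not a hardware noise source)."""
    return os.urandom(size)


n = random_between(1, 50_000_000_000)
pad = make_pad(128 * 1024)  # 128 KB, the pad size mentioned elsewhere in the thread
```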

  • Yes, but the most recently created pad is not necessarily the culprit. It can be a good strategy to create a provably innocent pad (I explained how this can be done in various ways), XOR it with the rest and delay its publication until much after the others. If anyone tries to pull the "latest created pad is the culprit" argument on you, then you show he's a fool by explaining the way it was created (you can really make someone look like a fool if he tries to condemn you for publishing a sequence of the digits of pi or an encrypted version of a part of the Bible!).
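    Here is a minimal Python sketch of that strategy. The "provably innocent" pad is a hypothetical construction: SHA-256 in counter mode over a key and a public text, standing in for "an encrypted version of a part of the Bible". Anyone can regenerate it, and it is published last. The pad that actually absorbs the message goes out earlier and, on its own, looks like noise:

```python
import hashlib
import os
from functools import reduce
from typing import List, Tuple


def xor_all(chunks: List[bytes]) -> bytes:
    """XOR a list of equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)


def innocent_pad(public_text: bytes, key: bytes, size: int) -> bytes:
    """Deterministic pad that anyone can rebuild from a public text and a key."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        block = key + counter.to_bytes(8, "big") + public_text
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:size])


def split_message(message: bytes, public_text: bytes, key: bytes,
                  n_random: int) -> Tuple[List[bytes], bytes]:
    """Return (pads to publish first, innocent pad to publish last)."""
    size = len(message)
    randoms = [os.urandom(size) for _ in range(n_random)]
    innocent = innocent_pad(public_text, key, size)
    # This pad absorbs the message; on its own it looks like white noise.
    carrier = xor_all([message, innocent, *randoms])
    return randoms + [carrier], innocent


text = b"Full fathom five thy father lies"  # stand-in public text
early, late = split_message(b"Neiman-Marcus cookie recipe", text, b"foobar", 3)
assert xor_all(early + [late]) == b"Neiman-Marcus cookie recipe"
```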

  • First of all, this is not a new idea, and I cannot imagine why it would be allowed to GPL it or license it otherwise. I guess it is all in the implementation details.

    During WWII messages were sent back and forth that could only be decoded if the receiving party knew what 'key' was used to encrypt the data. The 'key' could be a well-known bestseller, a book, or a letter, or any piece of paper with words on it.
    All the encryption does in this case is randomly find a letter (case does not matter) on a page and put the relative position of the letter instead of the letter itself into the encrypted document. Since the 'key' can have many (literally thousands) of the same letter repeating in various words (say it's a book: how many letters 'a' could you find in it?), the message cannot be decrypted without knowing exactly the text that was used to encrypt it.

    For example, I could use the text above to encrypt the following message: "FIRST POST" as: "1 16 3 17 24 9 87 7 102 5". Note that 'S' is coded as '17' in "FIRST" but as '102' in "POST", and it could be anything else. Imagine using a book as a key: for each letter you could put a page number, line number and position of the letter within the line.
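    Here is a minimal Python sketch of that book cipher. It assumes each number is a 1-based character position in the key text, with spaces encoded like letters. The ten numbers given for the nine-letter "FIRST POST" suggest the space is encoded too, but the exact counting rule is a guess, and the helper names are hypothetical:

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_index(key_text: str) -> Dict[str, List[int]]:
    """Map each character (case-folded) to every 1-based position where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, ch in enumerate(key_text, start=1):
        index[ch.lower()].append(pos)
    return index


def encode(message: str, key_text: str) -> List[int]:
    """Replace each character with a randomly chosen position of it in the key."""
    index = build_index(key_text)
    rng = random.SystemRandom()
    numbers = []
    for ch in message.lower():
        if ch not in index:
            raise ValueError(f"{ch!r} does not occur in the key text")
        numbers.append(rng.choice(index[ch]))
    return numbers


def decode(numbers: List[int], key_text: str) -> str:
    return "".join(key_text[n - 1].lower() for n in numbers)


key = "All the encryption does in this case is randomly find a letter on a page"
code = encode("FIRST POST", key)
print(code)               # different on every run
print(decode(code, key))  # first post
```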

    This would be the same idea as the scheme suggested in the article above and this idea is not new at all.
  • In the CPHack case, the judge said "in active concert."

    There is conspiracy, where one hand does not need to know what the other hand is doing. They just need to have a common purpose: publishing prohibited data. And 3 or more of these can be considered RICO.

    Instead of worrying about bypassing the law, why not fight it and change it?

    Recognizing some of these lawsuits as abusive, slapp enough of the companies that bring them.

    If you slapp a company hard enough, the others would stop doing this. That is why I am fighting Mattel. When I win, and I will, I am wanting a large enough sum to make sure that other companies flinch when they think about trying to shut someone up with abusive litigation.

  • mdpopescu--

    If you've got a cogent point to add, please, do so. I don't hold the monopoly on clues; I expect to fuck up pretty harshly in my life. It's part of crypto; you fuck up.

    This was billed as a means of encryption; it fails miserably in that regard. Key material is retrieved over a network, or is compromised when it is submitted to a network. Methodologies of dealing with files greater than 128kb aren't even mentioned. Recipes end up causing a single block to be the non-innocent one. No block that is "innocent" is really functionally so.

    And so on! Really, I'd love a better response. Crypto's what I do, and I wrote the previous rant on not *too* much sleep. You've gotta admit, Madore's system just isn't very good crypto, but if I missed the reasons why it isn't, I'm all ears.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com
  • Hartwell--

    There are two components here:

    Information Hiding, via Encryption.
    Secret Sharing, via Split Chunks and Recipes.

    As an encryption system, this fails. Madore admits this. But it's still an encryption system in one very classical sense: You have one block which is equal to ciphertext.

    Not two, not three, not m of n.

    One.

    And it's one block, which never changes. One block, which can be easily identified. One block, which is dependent upon network-retrieved keying material.

    There are far, far better ways of doing steganography, secret sharing, and cryptography as a whole. That's my point.

    --Dan
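    The XOR recipe is all-or-nothing: every pad is needed. For contrast, here is a minimal Python sketch of a true m-of-n scheme, Shamir's secret sharing over a prime field. It is a standard textbook construction, not Madore's program. The secret must be an integer smaller than the chosen prime, and the helper names are hypothetical:

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this


def split_secret(secret: int, m: int, n: int) -> List[Tuple[int, int]]:
    """Make n shares such that any m of them recover `secret`."""
    if not 0 <= secret < PRIME or not 1 <= m <= n:
        raise ValueError("bad parameters")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


value = int.from_bytes(b"DeCSS recipe", "big")
shares = split_secret(value, m=3, n=5)
assert combine_shares(shares[:3]) == value
assert combine_shares([shares[0], shares[2], shares[4]]) == value
```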
  • Generating "purely random data" (or, as someone put it, practically random data) ain't that hard, even several hundred megabytes of it

    1. Set up a webcam pointing at a lava lamp.
    2. Turn on the lava lamp.
    3. Take a screenshot every fraction of a second, take the bitmap sequence and XOR sections of the image together.
    Voila, random numbers. The probability of generating subsequent screenshots with identical bit values is nil. Especially so the higher the color depth/resolution of each image is... You could easily get a hundred K of random data from each image, and you can get (let's say) 10 of those a second. 1 megabyte of random data per second. Now just run the program for ten minutes...
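    The arithmetic: 100 KB per frame at 10 frames a second is about 1 MB/s, so ten minutes gives roughly 600 MB. A caution, though: raw pixels are highly correlated, so "never identical" is not the same as "uniformly random". Here is a minimal Python sketch of one conditioning step. It assumes frames arrive as equal-sized byte strings from some capture library (not shown). It XORs successive whole frames, which is one reading of "XOR sections of the image together", and then hashes each chunk of the difference. The 4096:32 compression ratio is an arbitrary conservative choice and yields far less than 1 MB/s:

```python
import hashlib
import os
from typing import Iterable


def condition_frames(frames: Iterable[bytes], chunk: int = 4096) -> bytes:
    """XOR each frame with the previous one, then SHA-256 every `chunk` bytes."""
    out = bytearray()
    prev = None
    for frame in frames:
        if prev is not None:
            diff = bytes(a ^ b for a, b in zip(frame, prev))
            # Only full chunks are hashed; each yields 32 output bytes.
            for start in range(0, len(diff) - chunk + 1, chunk):
                out += hashlib.sha256(diff[start:start + chunk]).digest()
        prev = frame
    return bytes(out)


fake_frames = [os.urandom(8192) for _ in range(3)]  # stand-ins for webcam captures
print(len(condition_frames(fake_frames)))  # 128: 2 frame differences x 2 chunks x 32 bytes
```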
  • Speaking of secret sharing, I just wrote a little portable C program to do just that. You can find it at this place [quatramaran.ens.fr] (all explanations on use are given within the source file itself). It's really cute.

  • by Elvii ( 428 ) <david1975.comcast@net> on Sunday June 18, 2000 @10:40PM (#995242) Homepage
    I've come up with/been inspired with an idea to "encrypt" virtually any data that is nearly unbreakable unless you torture the sender/receiver of that data. It's not pad/block based, it can be used with or without a computer, and the numerics/codes it uses are unbreakable by brute force, look random, yet they're not random or patterned.

    Can answer simple questions, but going to hold off on a full-blown explanation until mid-week when I have full sample code/implementation. It's not a hard system, just no time this weekend. Watch my site for more info as the week goes on, if you're interested.

    bash: ispell: command not found
  • Excellent Point.

    There is more than one kind of censorship:

    • Outright Government (Federal) Censorship (e.g. it is illegal to possess kiddie porn, to publish classified material, etc.)
    • Outright Government (State and Local) Censorship (e.g. Cincinnati's witch hunt of the Mapplethorpe exhibit, Larry Flynt, etc.)
    • Structural Censorship (e.g. Copyright prevents people from publishing another's work without permission, allowing the Church of $cientology to silence many citations of its works by critics, trademark laws restrict how one may refer to a corporate entity, etc.)
    • Institutional Censorship ("We won't display/print/publish that, it would offend too many, cause a lawsuit, etc.")
    • Corporate Censorship (threats of lawsuits, often based on dubious claims of trademark or copyright infringement with little or no legal basis, i.e. Legal Thuggery)
    • Social Censorship ("We don't like your kind around here!")


    I've probably missed some other forms of censorship, but you get the idea.

    Clearly, there is no technological solution that will solve all of these forms of censorship, and as others have pointed out, no technological solution can substitute for political involvement in preventing these kinds of abuses.

    Nevertheless, this sort of thing, coupled with a FreeNet infrastructure, could at least alleviate both Institutional (ISPs) and Corporate Censorship by making it too expensive to pursue. It won't win the war, but it could be decisive in a few important battles.
  • [spaceship lands on the burning ruins of a once flourishing planet]

    [2 aliens come out of the ship]

    Alien1: Wow...this planet is in ruins, but from the wreckage I can guess that once a prosperous and flourishing culture lived here.

    Alien2: No...I searched all recorded data and only found meaningless random garbage. Let's go home.

    [aliens enter ship and fly away]

It is clear that the individual who persecutes a man, his brother, because he is not of the same opinion, is a monster. - Voltaire
