When Information Theory and Copyright Law Collide

Take a copyrighted piece of digitally stored information. Take another comparable piece of digitally stored information that's in the public domain. Take the two binaries of these files and XOR them together. The resultant binary output will bear no statistically significant resemblance to either of the two original inputs.

Now, who owns the intellectual property rights to the newly created file? If the answer is "no-one" (or even "the person who created the file"), then copyright has a big problem. For an XOR gate, if you know the output and all but one of its inputs then you can reconstruct the other input by deductive inference, and it's a relatively trivial matter to write a program that does this. If the output file and one of the original source files are freely distributable, then file swappers can simply share these files completely legally and reconstruct the copyrighted file with ease.
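To make the trick concrete, here is a minimal Python sketch of the munge-and-recover round trip. The function name `xor_bytes` and the sample byte strings are made up for illustration; it assumes both inputs are the same length, which real files rarely are.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-ins for the two files (hypothetical contents).
copyrighted = b"Some copyrighted bytes here."
public_domain = b"Shall I compare thee to a summer's day?"[:len(copyrighted)]

# The "munged" file that would be shared.
munged = xor_bytes(copyrighted, public_domain)

# Anyone holding the munged file and the public-domain file
# gets the copyrighted bytes back with a second XOR.
recovered = xor_bytes(munged, public_domain)
assert recovered == copyrighted
```

The second XOR works because XORing the same bytes in twice cancels them out, so the public-domain file acts as a key.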

This isn't just a thought experiment. It's real. I'm going to be watching intently for the first time that this comes up in court.

(Edit: Corrected an embarrassing but unimportant thinko.)

Yeah, I think they missed

Yeah, I think they missed the jurisprudence bus. The problem with this is that the only use for the munged files is to reconstruct copyrighted works in almost all cases. A pragmatically minded judge would say the same and slap you with a huge penalty.

Legally, why should this

Legally, why should this sort of munging be considered any different from, say, MP3 encoding? If you try to interpret an MP3 file as a WAV file, you'll just get noise. It seems to me that this is just another codec (or, rather, a family of codecs), with the distinguishing characteristic being that it serves no practical purpose. It's a cute trick, but I don't see any real significance.

The Monolith guys argue that

The Monolith guys argue that it's different from an MP3 because there is no way to directly interpret the munged file and get something resembling the copyrighted work. So the munged file doesn't meet the legal definition of a derivative or representation of the copyrighted work because it really doesn't represent anything at all. Of course, this is really just an extremely weak form of encryption, so it's not exactly anything new. They're just the first people I've seen make the connection with copyright issues. Unfortunately I think Steve's right that this is more a problem in theory than it is in practice.

This doesn't stand a chance

This doesn't stand a chance in court. The "mono" file is a derived work and subject to the copyright of the original.

Bruce Schneier covered this at his blog.

Also see this analysis.

While I agree that it

While I agree that it probably wouldn't hold up, I don't see how one could make a coherent case that the mono is a derivative work.

For any given logic gate, if

For any given logic gate, if you know the output and all but one of its inputs then you can reconstruct the other input by deductive inference, and it’s a relatively trivial matter to write a program that does this.

Actually, this isn't true. This trick only works for XOR (and its complement, XNOR) since it effectively calculates the difference between the inputs. As a simple example, take a 2-input logic OR gate with a known input of 1 and known output of 1 (leaving one input unknown). In this case, it is impossible to determine the unknown input, which could be either 1 or 0.
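The OR counterexample can be checked by brute force. The sketch below is only an illustration (the `GATES` table and `is_recoverable` helper are names invented here): for each 2-input gate it asks whether the pair (known input, output) always pins down the unknown input.

```python
from itertools import product

# A few 2-input gates on single bits.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def is_recoverable(gate) -> bool:
    """True if (known input, output) always determines the unknown input."""
    seen = {}
    for known, unknown in product((0, 1), repeat=2):
        key = (known, gate(known, unknown))
        if key in seen and seen[key] != unknown:
            # Two different unknown inputs produced the same observation.
            return False
        seen[key] = unknown
    return True


results = {name: is_recoverable(gate) for name, gate in GATES.items()}
```

AND and OR fail because a known input of 0 (for AND) or 1 (for OR) forces the output regardless of the other input. XOR passes, as does XNOR, which is just XOR with the output inverted.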

Also, while I do agree that the XOR output probably does not meet the legal definition of infringing on the original copyrighted work, wouldn't the violation still occur at the point of performing a second XOR to reconstruct the original file? Taking any single excerpt from a book or image from a film wouldn't violate a copyrighted work, but is there any reason to think that doing this repeatedly until one had the complete original work would somehow not be violating the copyright? Or to put it another way, are you not violating the copyright of a work because in downloading it you are only dealing with individual bytes of the work (none of which can meaningfully be considered derivative of the original work)?

Mineavatar, thanks for

Mineavatar, thanks for spotting the howler. Corrected. My only defense is that I was smoking crack when I wrote it.

And yes, I was thinking something along those lines too. This makes it so that I don't think someone could be nailed for distributing a copyrighted piece of data, but they'd still be able to get nailed for intent to circumvent the law and for possession at the end-point.

Huh... I guess we'll just

Huh... I guess we'll just have to wait for licenses that explicitly prohibit this; i.e. "Don't open this CD if you intend to do this-and-that."

The mono guys seem hung up on

The mono guys seem hung up on the problem of determining which numbers are copyrighted, and which not. Very hard in concept, yes, because math is infinite!! Oh no! But actually it's easy to determine. The world of algorithms is finite. If you have a number in question, just plug it into all common number-decoder programs, and see if any copyrighted information pops out.

There aren't an infinity of decoders? Not in the real world. It may have escaped notice in the ivory tower, but down here people use just a handful of formats for copied information. There's a good reason for that, if you care to think about it for more than about 1 second.

It's a derivative work

It's a derivative work because of the process by which it is created, not because of anything it "represents." It's basically just a (weakly) encrypted copy of the original. Obfuscating the mechanism by which the copyrighted work is being transmitted between two individuals isn't going to help.

It may work if you can set it up so that no individual is distributing any chunk of data that can be proven to "belong" to the original file. Basically you have a huge pool of random blocks and to produce any given file you need to get some subset of those blocks. But you set it up so that there is no block that belongs to only one or even only two files, that a given file can be produced using a number of different subsets of blocks, and any given subset of blocks could produce multiple different groups of files. Then both the sites offering up the blocks for download and the people downloading them would have plausible deniability. The sites offering information on how to reassemble the blocks would not, but said information would be *much* smaller than the original files. Note that using this technique would almost certainly mean one would need to download several times the amount of information that's in the copyrighted work they were trying to recover.
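One possible reading of the block-pool idea, sketched in Python. Everything here is an assumption made for illustration: the block size, the `store` and `fetch` helpers, and the choice of XOR as the way blocks are combined. Each stored chunk is recovered by XORing several pool blocks together, and the "recipe" listing which blocks to combine is the small piece of reassembly information.

```python
import random
from functools import reduce

BLOCK_SIZE = 16  # hypothetical block size in bytes


def xor_blocks(blocks):
    """XOR any number of equal-length blocks together."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def store(pool, data, rng, mixers=2):
    """Add one block to the pool so that `data` equals the XOR of that block
    and `mixers` blocks already in the pool. Returns the recipe (indices)."""
    padded = data.ljust(BLOCK_SIZE, b"\0")
    picks = rng.sample(range(len(pool)), mixers)
    new_block = xor_blocks([padded] + [pool[i] for i in picks])
    pool.append(new_block)
    return picks + [len(pool) - 1]


def fetch(pool, recipe):
    """Rebuild a (zero-padded) chunk from the blocks named in its recipe."""
    return xor_blocks([pool[i] for i in recipe])


rng = random.Random(0)  # fixed seed so the example is repeatable
pool = [bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE)) for _ in range(4)]

recipe_a = store(pool, b"first file", rng)
recipe_b = store(pool, b"second file", rng)

assert fetch(pool, recipe_a).rstrip(b"\0") == b"first file"
assert fetch(pool, recipe_b).rstrip(b"\0") == b"second file"
```

In this sketch every chunk costs three block downloads, which matches the point about needing to fetch several times the size of the original. It does not enforce the stronger property described above (that no block belongs to only one or two files); a real design would have to choose blocks so that every block is reused across many recipes.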

On the other hand, it's very easy to avoid getting prosecuted for distributing copyrighted works: trade with your friends, and do it in private.

"Now, who owns the

"Now, who owns the intellectual property rights to the newly created file?"

The owners of the source IP.

You've violated both

You've violated both copyrights by creating the file. "Munging" the two files together is just the first step in a copying process once you reproduce either of the originals. You were not sold the right to do that.

Using my factor of production theory, both original IP holders are co-owners of the munged file and can both ask for the removal of the information that belongs to them.

From the link: Therefore, if

From the link:

Therefore, if a copyright holder claims that she owns the information in all Mono files that are munged from her work, she is also claiming copyright over all possible binary files that are the same length as her work.

This statement is false for copyrights as interpreted under my theory of factor of production. The copyright holder would only be claiming ownership of munged files that were physically reproduced from the copyrighted work, not all possible files of that length.

When it comes to going to court the issue then becomes one of proof. The odds against your having created the exact munge of MS Word and Lotus Notes without actually using the copyrighted works are astronomical, and therefore it is most likely that the munged file is a derived work. In fact, it comes so close to impossible to produce it in any other way that one can say with certainty that this is what was done. So the munger has actually provided proof of his own guilt.

It is also rarely the case that two files are exactly the same length. So this will also provide evidence against the munger. He will either have to repeat the shorter file's pattern or go with all zeros or ones for the remaining bits. This will show that his procedure was not to make the file from scratch, but to produce it as a derivative product.

Testing for munging is rather easy too. You just take your original copyrighted work and "unmunge" against the suspected files. If your original is spit out or a repetition of it then you have a violator.
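A rough Python sketch of that test, under some loud assumptions: the shorter file was stretched by cyclic repetition (one of the padding options mentioned above), both files are non-empty, and the person testing also has the partner file the munge was made with. The names `stretch`, `munge`, and `is_munge_of` and the sample byte strings are hypothetical.

```python
from itertools import cycle, islice


def stretch(data: bytes, length: int) -> bytes:
    """Repeat non-empty data cyclically (or truncate it) to exactly `length` bytes."""
    return bytes(islice(cycle(data), length))


def munge(a: bytes, b: bytes) -> bytes:
    """XOR two files, repeating the shorter one to match the longer."""
    n = max(len(a), len(b))
    return bytes(x ^ y for x, y in zip(stretch(a, n), stretch(b, n)))


def is_munge_of(suspect: bytes, original: bytes, partner: bytes) -> bool:
    """Unmunge `suspect` with `partner` and check whether the original,
    or a repetition of it, comes out."""
    n = len(suspect)
    unmunged = bytes(x ^ y for x, y in zip(suspect, stretch(partner, n)))
    return unmunged == stretch(original, n)


original = b"proprietary program bytes"  # stand-in for the copyrighted file
partner = b"public text"                 # stand-in for the public-domain file
suspect = munge(original, partner)

assert is_munge_of(suspect, original, partner)
assert not is_munge_of(b"x" * len(suspect), original, partner)
```

The comparison is against the original stretched to the suspect's length, which covers both outcomes named above: the original itself, or a repetition of it.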

On the other hand, in the unlikely case that I have created an original work that is the same length as another work, it is even more unlikely that when I "unmunge" it with that work I will get another file that is useful. I can thus prove my original work is not a munge. Suppose that a digital version of Shakespeare's "Romeo and Juliet" was the same length as the MS Word exe file by happenstance. The odds against XORing that with any repetitive file and getting MS Word are astronomical. The odds against XORing it with any useful file and getting MS Word are also astronomical.

Sorry, I didn't proofread.

Sorry, I didn't proofread. That's length not lenght. I also put an "If fact" for an "In fact,", etc. Sorry about that.