Idea: Executable bytecode as data compression algorithm?

Could you implement a compression algorithm by defining a byte-code interpreter, then writing software to generate byte-code for that interpreter whose output is the data to be compressed?

Principle

Let:
  • X = data to be compressed
  • I = byte-code instructions
  • p(i) = interpreter that executes byte-code, producing output data
  • c(x) = the "compression algorithm", a form of compiler that generates byte-code instructions

Compression

I = c(X)


Generate byte-code instructions, I, from the original data, X, which when run through the interpreter p(i), produces X

Decompression

X = p(I)


Execute byte-code instructions, I, using the interpreter p(i), producing the original data, X, as output

Method

Decompression for this concept is trivial and so doesn't require much discussion: take a program I and run it through the interpreter p(i), the output is the decompressed data.

The hard, and possibly impossible, part of this system is the compiler c(x). This mythical component has to turn data, X, into valid byte-code, I, in fewer bytes than it takes to just send X.

A naive (and possibly impractical) solution is to implement the compiler as a genetic algorithm: randomly generate byte-code and run it through the interpreter, then score each attempt based on a function of how closely their outputs match the data to be compressed and the length of the byte-code.

Obviously it would be better to find an approach less reliant on brute force!

In Practice?

I have no idea if this is a dumb idea, I just thought of it today and got too excited about it. I suspect that if this worked it'd be used by now, or there'd at least be literature about it. Perhaps there is and I just don't know what to call it in Google.

It seems likely that programs won't be able to magically encode entropy better than any other coding scheme, but I don't know anything about that kind of theory.

Maybe I'll just give it a go and see what happens! :P

Comments

Popular posts from this blog

IMU-3000 and Hello World!

Repetitive Strain Injury