Idea: Executable bytecode as data compression algorithm?

Could you implement a compression algorithm by defining a byte-code interpreter, then writing software to generate byte-code for that interpreter whose output is the data to be compressed?

Principle

Let:
  • X = data to be compressed
  • I = byte-code instructions
  • p(i) = interpreter that executes byte-code, producing output data
  • c(x) = the "compression algorithm", a form of compiler that generates byte-code instructions

Compression

I = c(X)

Generate byte-code instructions, I, from the original data, X, which when run through the interpreter p(i), produces X

Decompression

X = p(I)

Execute byte-code instructions, I, using the interpreter p(i), producing the original data, X, as output

Method

Decompression for this concept is trivial and so doesn't require much discussion: take a program I and run it through the interpreter p(i), the output is the decompressed data.

The hard, and possibly impossible, part of this system is the compiler c(x). This mythical component has to turn data, X, into valid byte-code, I, in fewer bytes than it takes to just send X.

A naive (and possibly impractical) solution is to implement the compiler as a genetic algorithm: randomly generate byte-code and run it through the interpreter, then score each attempt based on a function of how closely their outputs match the data to be compressed and the length of the byte-code.

Obviously it would be better to find an approach less reliant on brute force!

In Practice? Astonishingly improbable.

I strongly suspect this is a fun but ultimately impossible idea. I can't imagine that you could reliably find simple programs that describe complex data. Ah well, fun thought.

Popular posts from this blog

Virtualised Build & Test Environments for Embedded Software

IMU-3000 and Hello World!