Computer: Compression
Containerized Self-Extracting Archives
Unprivileged byte code for decompression

The 7-Zip executable and DLLs on my Windows machine are two megabytes in total, and the 7-Zip algorithms provide some of the best compression and decompression out there. Not everyone has 7-Zip installed, but that's not a big deal: in most compression applications, carrying this 2MB of code would be a negligible cost, so two or three decades ago someone came up with the brilliant idea of a self-extracting archive. With that technology, you can give someone an archive that uses arbitrarily good algorithms without requiring the end user to install a particular decompression utility. The critical problem with a self-extracting archive is that opening an executable file from an unknown source carries the risk of malware infection.

Taking a lesson from web browser scripting, I propose a universal compression container wherein an arbitrary executable is given access ONLY to a target folder, forbidding access to memory, devices, and files outside of the necessary scope. Someone more clever than me might even be able to write a wrapper for an existing self-extracting archive to provide this containerization. (You COULD already do this with a type 2 hypervisor such as VirtualBox if you'd like, but that's not convenient.) The key advantage is that an AI engine would be able to use a genetic algorithm to produce an optimally reduced archive and then include within the archive an arbitrary executable for decompression, not relying on any existing format. No further improvements in compression technology would render this new format obsolete, because such improvements would simply be represented within this format. (How long has the EXE file been around?)
-- kevinthenerd, Dec 21 2017
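
One small piece of the "access ONLY to a target folder" idea can be sketched in Python. This is a minimal illustration, not a real sandbox: it assumes the untrusted decompressor never touches the file system itself and instead hands (relative path, bytes) pairs to a trusted host, which refuses any path that resolves outside the target folder. The names confined_path and write_output are hypothetical, and real containment of memory and devices would still need OS-level isolation.

```python
import os


def confined_path(target_dir: str, requested: str) -> str:
    """Resolve `requested` under `target_dir`; raise if it would escape."""
    root = os.path.realpath(target_dir)
    # realpath collapses ".." components and follows symlinks, so the
    # containment check below sees where the write would really land.
    candidate = os.path.realpath(os.path.join(root, requested))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        inside = False
    if not inside:
        raise PermissionError(f"refusing to write outside target: {requested!r}")
    return candidate


def write_output(target_dir: str, requested: str, data: bytes) -> str:
    """Write one extracted file, but only inside the target folder."""
    path = confined_path(target_dir, requested)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With this check, a request for "docs/readme.txt" is written under the target folder, while "../evil.txt" or an absolute path elsewhere raises PermissionError before anything is written.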

The contained decompression program can fail to terminate, or it can "blow up" and attempt to allocate infinite memory. In a Turing-complete language it is impossible, in general, to prove whether or not an arbitrary program will do this without actually running it and seeing what happens.

You may be able to get rid of this possibility by defining a non-Turing-complete bytecode language, but I expect this will limit the scope of the decompression algorithms that it can implement.

"Proof-carrying code" techniques might help, too, though I'm not aware of them being used for anything outside academia.
-- Wrongfellow, Dec 21 2017
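
The non-Turing-complete option might look something like the sketch below. Everything here is invented for illustration: the two opcodes ("lit" and "copy", loosely in the spirit of LZ77 back-references) and the names run_program and LimitExceeded do not correspond to any existing archive format. Because the program is a finite, straight-line list with no jumps, and every instruction is checked against an output cap before it runs, the interpreter always terminates and never allocates more than the cap.

```python
class LimitExceeded(Exception):
    """Raised when a program would produce more output than allowed."""


def run_program(program, max_output=1 << 20):
    """Interpret a straight-line list of decompression instructions.

    ("lit", data)               append the literal bytes `data`
    ("copy", distance, length)  append `length` bytes, starting `distance`
                                bytes back from the current end of output
    """
    out = bytearray()
    for op in program:
        if op[0] == "lit":
            chunk = op[1]
            if len(out) + len(chunk) > max_output:
                raise LimitExceeded("literal would exceed output cap")
            out += chunk
        elif op[0] == "copy":
            _, distance, length = op
            if distance <= 0 or distance > len(out) or length < 0:
                raise ValueError(f"bad copy instruction: {op!r}")
            # Check the cap before copying, so a huge length costs nothing.
            if len(out) + length > max_output:
                raise LimitExceeded("copy would exceed output cap")
            start = len(out) - distance
            # Byte-by-byte so a copy may overlap the bytes it is producing.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"unknown opcode: {op[0]!r}")
    return bytes(out)
```

For example, run_program([("lit", b"ab"), ("copy", 2, 6)]) returns b"abababab", while a would-be bomb such as [("lit", b"a"), ("copy", 1, 10**12)] is rejected with LimitExceeded before any copying happens. The cost is the one noted above: an instruction set this restricted cannot express arbitrary decompression algorithms.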


I don't understand the technicalities well enough, but I do applaud the notion of having a "quarantine folder" for suspect files, such that they can be contained whilst they're assessed.
-- MaxwellBuchanan, Dec 21 2017


[wrongfellow] In applications like video, which has frames, wouldn't it be fairly easy to see if the potentially "blown up" nonterminating data was still a video frame?

This could bring continuously improving AI algorithms to video on demand. I like that, since just making file compression twice as good and computers 10x better could fit most things people do into the "cheap data" rung of a post-network-neutrality internet.
-- beanangel, Dec 21 2017


[Max]: I think [kevinthenerd] was talking about quarantining the decompression process rather than the newly downloaded files themselves, but yes, this is a very valuable way to treat untrusted files - in the extreme, you could use a physically separate computer, but modern computers can simulate other computers (per Turing-completeness) and it's silly not to look for security benefits from doing this.

[beanie]: //the potentially "blown up" nonterminating data// cannot in general be interpreted without running the program to completion and allowing it to produce its output - which is sometimes impossible.
-- Wrongfellow, Dec 21 2017


So what you want is a program which is universally installed, for the case when you can't arrange for a standard program to be present?
-- Loris, Dec 22 2017


