Halfbakery: Decompress on the Fly

Computer: Compression
Decompress on the Fly (+4, -1)
Decompress ZIP/RAR/etc files on the fly as they're downloaded

Sometimes when you're downloading a relatively large compressed file, it would be nice to not have to wait for the file to decompress when you're done. Since comparatively little CPU time is used during the actual download, I propose that a scheme be put in place to allow the file to decompress on the fly as it is downloaded. I'm willing to bet this idea is baked in some software that updates itself, but I've yet to see it for custom user downloads.
-- kevinthenerd, Jun 22 2010

Downloading & Un-tarring in one step http://www.howtogee...arring-in-one-step/
"Curl" and "Pipe" the download into the tar command on a linux system. [Jinbish, Jun 24 2010]

This may have less to do with compression, more with application architecture. You'll want applications to deal with documents arriving piece-by-piece. But a program that just displays some document is much easier to write than one that displays some document, then receives notifications of new parts having arrived and augments its display.

(For a first try at this, the document is a file tree, and the application is a file browser - but from there, you quickly get to wanting to start compiling the tree, or watching the movie, etc.)

There are all kinds of issues here from document format design (put an index on top; put thumbnails in front; sign and encrypt blocks, not the whole thing...) to UI design (make sure things don't move around too much as new ones arrive).
-- jutta, Jun 22 2010

I dig [jutta]'s explanation - this tyro likes this kind of talk. But I was hoping for something I could use while scuba diving.
-- normzone, Jun 22 2010

Given that it's possible to compress data on the fly (eg, to zip format), it must be possible to decompress that way too.
-- Loris, Jun 22 2010
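
A minimal sketch of the point above, assuming the data is a plain zlib (DEFLATE) stream rather than a full ZIP container. DEFLATE is the method ZIP most commonly uses, and Python's zlib.decompressobj() accepts the compressed bytes piece by piece. The helper name decompress_chunks and the 64-byte piece size are made up for illustration.

```python
import zlib


def decompress_chunks(chunks):
    """Yield decompressed bytes for each compressed chunk as it arrives."""
    d = zlib.decompressobj()
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    # Hand back anything still buffered once the input ends.
    tail = d.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    original = b"decompress on the fly " * 1000
    packed = zlib.compress(original)
    # Pretend the network delivers 64 bytes at a time (arbitrary size).
    pieces = [packed[i:i + 64] for i in range(0, len(packed), 64)]
    restored = b"".join(decompress_chunks(pieces))
    print(restored == original)
```

The decompressor never needs the whole input at once, so output can be written to disk while later pieces are still in transit.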

Isn't the method used to transmit DAB signals compression and decompression on the fly? Come to think of it, if someone were to write some kind of lossy speech compression algorithm for some kind of cellular radio system, might it not count as decompression on the fly, and might it not therefore be baked by, i dunno, someone or other?
-- nineteenthly, Jun 23 2010

//...might it not therefore be baked by, i dunno, someone or other?//

Yes. Many 'streaming' formats are decompressed as they are received. But kevinthenerd wants decompression specifically of zip archives (etc), presumably on his web-browser.
-- Loris, Jun 23 2010

[jutta] - your argument seems to be based in part on the assumption that the document is to be displayed as it's decompressed. But from my reading, Mr. Thenerd is just proposing to decompress on the fly, rather than waiting until the whole thing is downloaded. Then the document would, presumably, be opened if requested.

I appreciate that there are still issues with decompressing an unfinished file, but presumably there are at least some file formats for which this would be possible?

(On the other hand, decompression is so much faster than downloading that perhaps this wouldn't save much time.)
-- MaxwellBuchanan, Jun 23 2010

The thing about that, [Loris], is that DAB stuff is supposedly MP2, isn't it? MP3s are maybe not decompressible on the fly because of, er, something. So one's the kind of compression which can be handled like that and the other not. Since i know nothing as usual, ZIP would presumably imply that the file is structured with certain essential parts in the wrong place to do this, just as MP3 presumably also is. However, what would happen if instead of being one big ZIP file, it was in lots of smaller portions? Would that not then be decompressible and presumably also bigger? Say they're divided into pieces which each take about a second to download, depending on the bandwidth. Would that make them a whole lot bigger?
-- nineteenthly, Jun 24 2010

Whoa, there.

Firstly, the idea is about bundling the download process with decompression (by unzipping, or untarring, or whatever). You can do this, pretty much, in a Linux system (link).

Now, to do this we have to establish whether we can start to apply the decompression with only a portion of the received file. This may or may not be possible. This isn't just about unravelling the compression algorithm, this is about the way that the archive is organised. This is the underlying point that [jutta] makes. It depends on how the compression/decompression algorithm works and how the archiving has been done. Some files can be accessed and used directly from a ZIP archive...

Secondly, [nineteenthly] points out that compression/decompression happens on the fly anyway - and this is true. However, the example given here is continuous *streams* of data (DAB radio). For any sensible communications stream, you have to use coding schemes for compression and error protection that you can decode as the stream comes in. This is different from downloading a discrete file.
{Incidentally, DAB uses the MP2 128kbps audio codec}

If a big ZIP file were, instead, compressed as a series of smaller files then the result would be bigger - probably by "quite a lot". The larger the source to be compressed, the greater the potential for compression. Not to mention that each separate file would require its own file format gubbins like formatting and some kind of checksum or CRC.
-- Jinbish, Jun 24 2010
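
A rough way to see the size penalty described above, as a sketch only: compress some data once as a whole, then again as independent fixed-size pieces, and compare the totals. The helper names, the 256-byte piece size and the sample text are all made up for illustration. The sample is highly repetitive, so it exaggerates the gap compared with typical real files.

```python
import zlib


def whole_size(data):
    """Compressed size when the data is compressed in one go."""
    return len(zlib.compress(data))


def pieces_size(data, piece_len):
    """Total compressed size when each piece is compressed separately.

    Every piece starts with an empty dictionary and carries its own
    zlib header and checksum, so redundancy across pieces is lost.
    """
    return sum(
        len(zlib.compress(data[i:i + piece_len]))
        for i in range(0, len(data), piece_len)
    )


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog. " * 2000
    print("original bytes:  ", len(sample))
    print("one archive:     ", whole_size(sample))
    print("256-byte pieces: ", pieces_size(sample, 256))
```

How much bigger the pieced version comes out depends heavily on the data and on the piece size; larger pieces give the compressor more context and narrow the gap.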

So [Jinbish], what's needed then is for some kind of compression algorithm for which this can be done on the fly, and the question is then, are there useful compression algorithms for which this can be done? For instance, some kind of video codec used for streaming could be used for a series of still images, but it'd be quite inefficient, and an audio file could also be done, but for it to be done losslessly i think a new algorithm would be needed, though as i said i know nothing.
-- nineteenthly, Jun 24 2010

What I am saying is that this can be done - you can start to decode files as they arrive.
I'm also saying that a continuous stream with an undefined end is a different challenge than a distinct file. There is a time issue - a stream has to be dealt with quickly enough to make it useful, with accuracy not an absolute necessity. A file has to be completely accurate, but a couple of seconds here or there isn't important.
-- Jinbish, Jun 24 2010
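
For the tar case from the link, the same idea can be sketched in Python, assuming the archive is a gzip-compressed tar. The tarfile module's "r|gz" mode reads strictly forward, so it can consume a download as it arrives. The helper names read_tar_stream and make_demo_archive are hypothetical, and the demo builds a tiny archive in memory to stand in for a real download.

```python
import io
import tarfile


def read_tar_stream(fileobj):
    """Yield (name, data) for each regular file in a gzip'd tar stream.

    Mode "r|gz" makes tarfile read strictly forward, so fileobj only
    needs a read() method; it never has to be seekable.
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                # Read now: a stream cannot be rewound once we move on
                # to the next member.
                yield member.name, tar.extractfile(member).read()


def make_demo_archive():
    """Build a small .tar.gz in memory to stand in for a download."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in [("a.txt", b"hello"), ("b.txt", b"world")]:
            info = tarfile.TarInfo(name)
            info.size = len(text)
            tar.addfile(info, io.BytesIO(text))
    return buf.getvalue()


if __name__ == "__main__":
    stream = io.BytesIO(make_demo_archive())
    for name, data in read_tar_stream(stream):
        print(name, data)
```

With a real download, the response object returned by urllib.request.urlopen() could be passed in place of the BytesIO, since it also provides read(). This works for tar because each file's header comes immediately before its data, which ties in with the earlier point that the archive's organisation matters as much as the compression algorithm.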

Not all files need to be accurate. Text benefits from accuracy, source code almost has to be accurate and it's very unlikely that machine code files would work at all if they were inaccurate, but audio, video and photographic images are often not accurate and it doesn't matter.

The scenario in my head for this is something like a big PDF or Postscript file with either lots of scanned pages or photo-style images, and i can imagine a situation where for some reason the data for an illustration taking up a quarter of the first page is at the end of the file, so you get the text on the first page with a blank space which eventually gets filled in with that image.

I can see it'd be possible in the way you describe. To be picky though, even a stream can be useful if it buffers a lot, provided you're patient and you pause it.
-- nineteenthly, Jun 24 2010
