halfbakery
A riddle wrapped in a mystery inside a rich, flaky crust
Sometimes when you're downloading a relatively large
compressed file, it would be nice not to have to wait for the
file to decompress once the download finishes. Since comparatively
little CPU time is used during the actual download, I propose
that a scheme be put in place to let the file decompress
on the fly as it is downloaded. I'm willing to bet this idea is
baked in some software that updates itself, but I've yet to
see it for custom user downloads.
Downloading & Un-tarring in one step
http://www.howtogee...arring-in-one-step/ "Curl" and "Pipe" the download into the tar command on a Linux system. [Jinbish, Jun 24 2010]
|
|
This may have less to do with compression, more with application architecture. You'll want applications to deal with documents arriving piece-by-piece. But a program that just displays some document is much easier to write than one that displays some document, then receives notifications of new parts having arrived and augments its display. |
|
|
(For a first try at this, the document is a file tree, and the application is a file browser - but from there, you quickly
get to wanting to start compiling the tree, or watching the movie, etc.) |
|
|
There are all kinds of issues here from document format design (put an index on top; put thumbnails in front; sign and encrypt blocks, not the whole thing...) to UI design (make sure things don't move around too much as new ones arrive). |
|
|
I dig [jutta]'s explanation - this tyro likes this kind of talk. But I was hoping for something I could use while scuba diving. |
|
|
Given that it's possible to compress data on the fly (e.g. to zip format), it must be possible to decompress that way too. |
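For what it's worth, here is a minimal sketch of that symmetry using Python's standard zlib module, which implements DEFLATE (the method zip files commonly use). The function names, the sample data and the 64-byte piece size are illustrative assumptions, not anything proposed in the idea itself:

```python
import zlib


def compress_on_the_fly(chunks):
    """Compress an iterable of byte chunks without holding them all at once."""
    compressor = zlib.compressobj()
    pieces = []
    for chunk in chunks:
        # compress() may return b"" while zlib buffers input internally.
        pieces.append(compressor.compress(chunk))
    pieces.append(compressor.flush())
    return b"".join(pieces)


def decompress_as_received(compressed, piece_size=64):
    """Feed the compressed bytes to zlib in small slices, as a download would."""
    decompressor = zlib.decompressobj()
    recovered = []
    for start in range(0, len(compressed), piece_size):
        piece = compressed[start:start + piece_size]
        # Output is produced as soon as enough input has arrived to decode it.
        recovered.append(decompressor.decompress(piece))
    recovered.append(decompressor.flush())
    return b"".join(recovered)


if __name__ == "__main__":
    parts = [b"half", b"bakery ", b"croissant " * 1000]
    packed = compress_on_the_fly(parts)
    assert decompress_as_received(packed) == b"".join(parts)
```

This only shows that the compression algorithm itself can be decoded incrementally; how the archive around it is laid out is a separate question.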
|
|
Isn't the method used to transmit DAB signals compression and decompression on the fly? Come to think of it, if someone were to write some kind of lossy speech compression algorithm for some kind of cellular radio system, might it not count as decompression on the fly, and might it not therefore be baked by, i dunno, someone or other? |
|
|
//...might it not therefore be baked by, i dunno, someone or other?// |
|
|
Yes. Many 'streaming' formats are decompressed as they are received. But kevinthenerd wants decompression specifically of zip archives (etc), presumably on his web-browser. |
|
|
[jutta] - your argument seems to be based in part on the
assumption that the document is to be displayed as it's
decompressed. But from my reading, Mr. Thenerd is just
proposing to decompress on the fly, rather than waiting
until the whole thing is downloaded. Then the document
would, presumably, be opened if requested. |
|
|
I appreciate that there are still issues with decompressing
an unfinished file, but presumably there are at least some
file formats for which this would be possible? |
|
|
(On the other hand, decompression is so much faster than
downloading that perhaps this wouldn't save much time.) |
|
|
The thing about that, [Loris], is that DAB stuff is supposedly MP2, isn't it? MP3s are maybe not decompressible on the fly because of, er, something. So one's the kind of compression which can be handled like that and the other not. Since i know nothing as usual, ZIP would presumably imply that the file is structured with certain essential parts in the wrong place to do this, just as MP3 presumably also is. However, what would happen if instead of being one big ZIP file, it was in lots of smaller portions? Would that not then be decompressible and presumably also bigger? Say they're divided into pieces which each take about a second to download, depending on the bandwidth. Would that make them a whole lot bigger? |
|
|
Firstly, the idea is about bundling the download process with decompression (by unzipping, or untarring, or whatever). You can do this, pretty much, in a Linux system (link). |
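The same curl-into-tar trick can be sketched in Python, assuming a gzip-compressed tar archive. The helper names `untar_stream` and `make_sample_archive` are made up for illustration, and writing every file flat into one directory is a simplification of this sketch, not what tar itself does:

```python
import io
import os
import shutil
import tarfile
import tempfile


def untar_stream(fileobj, dest_dir):
    """Unpack a .tar.gz while reading fileobj strictly front to back."""
    names = []
    # Mode "r|gz" tells tarfile to treat the input as a non-seekable stream.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # this sketch skips directories, links, etc.
            # Simplification: drop any directory part of the member name.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            names.append(member.name)
    return names


def make_sample_archive():
    """Build a tiny .tar.gz in memory to stand in for a download."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        payload = b"flaky crust\n"
        info = tarfile.TarInfo(name="idea.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        print(untar_stream(io.BytesIO(make_sample_archive()), out_dir))
```

For a real download, the object returned by `urllib.request.urlopen(url)` should be usable as `fileobj`, since stream mode only needs a `read()` method; the URL would be whatever archive you are fetching.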
|
|
Now, to do this we have to establish whether we can start to apply the decompression with only a portion of the received file. This may or may not be possible. This isn't just about unravelling the compression algorithm; it is about the way that the archive is organised. This is the underlying point that [jutta] makes. It depends on how the compression/decompression algorithm works and how the archiving has been done. Some files can be accessed and used directly from a ZIP archive... |
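To make the "how the archive is organised" point concrete: a ZIP file keeps its index (the central directory) at the very end, after all the file data, although each file's data is also preceded by a local header. A rough sketch that reads the end record of a small ZIP built in memory; the helper names are hypothetical, and it assumes an archive with no trailing comment and no ZIP64 extensions:

```python
import io
import struct
import zipfile

LOCAL_SIGNATURE = b"PK\x03\x04"    # local file header, precedes each file's data
CENTRAL_SIGNATURE = b"PK\x01\x02"  # central directory entry
END_SIGNATURE = b"PK\x05\x06"      # end-of-central-directory record
END_RECORD_SIZE = 22               # fixed size when there is no archive comment


def zip_layout(data):
    """Report where the central directory sits inside a simple ZIP file."""
    fields = struct.unpack("<4sHHHHIIH", data[-END_RECORD_SIZE:])
    signature, _, _, _, entries, cd_size, cd_offset, comment_len = fields
    if signature != END_SIGNATURE or comment_len != 0:
        raise ValueError("sketch only handles ZIPs without a trailing comment")
    return {
        "archive_size": len(data),
        "entries": entries,
        "central_directory_offset": cd_offset,
        "central_directory_size": cd_size,
    }


def make_sample_zip():
    """Build a two-file ZIP in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("a.txt", "alpha " * 200)
        archive.writestr("b.txt", "beta " * 200)
    return buffer.getvalue()


if __name__ == "__main__":
    print(zip_layout(make_sample_zip()))
```

So a downloader that wants the authoritative file list has to wait for the last bytes, whereas one that trusts the local headers could, in principle, start unpacking from the front.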
|
|
Secondly, [nineteenthly] points out that compression/decompression happens on the fly anyway - and this is true. However, the example given here is continuous *streams* of data (DAB radio). For any sensible communications stream, you have to use coding schemes for compression and error protection that you can decode as the stream comes in. This is different from downloading a discrete file. {Incidentally, DAB uses the MP2 128kbps audio codec} |
|
|
If a big ZIP file were, instead, compressed as a series of smaller files then the result would be bigger - probably by "quite a lot". The larger the source to be compressed, the greater the potential for compression. Not to mention that each separate file would require its own file format gubbins like formatting and some kind of checksum or CRC. |
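The size penalty is easy to measure for a given file. A small sketch, assuming zlib as the compressor and some made-up repetitive text as the input (both are stand-ins; real downloads will behave differently depending on content and piece size):

```python
import zlib


def whole_vs_pieces(data, piece_size):
    """Return (size compressed in one go, total size compressed piece by piece)."""
    whole = len(zlib.compress(data))
    pieces = sum(
        # Each piece is compressed independently, so it pays for its own
        # header and checksum and cannot refer back to earlier pieces.
        len(zlib.compress(data[start:start + piece_size]))
        for start in range(0, len(data), piece_size)
    )
    return whole, pieces


def sample_text(lines=5000):
    """Made-up, fairly repetitive text to compress."""
    return "".join(
        f"annotation {i}: a riddle wrapped in a mystery inside a rich, flaky crust\n"
        for i in range(lines)
    ).encode("ascii")


if __name__ == "__main__":
    whole, pieces = whole_vs_pieces(sample_text(), piece_size=1024)
    print(f"one stream: {whole} bytes, 1 KB pieces: {pieces} bytes")
```

Trying a few values of `piece_size` shows how the overhead shrinks as the pieces get larger, which is the trade-off [nineteenthly] is asking about.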
|
|
So [Jinbish], what's needed then is for some kind of
compression algorithm for which this can be done on the fly,
and the question is then, are there useful compression
algorithms for which this can be done? For instance, some
kind of video codec used for streaming could be used for a
series of still images, but it'd be quite inefficient, and an audio
file could also be done, but for it to be done losslessly i think a
new algorithm would be needed, though as i said i know
nothing. |
|
|
What I am saying is that this can be done - you can start to decode files as they arrive.
I'm also saying that a continuous stream with an undefined end is a different challenge than a distinct file. There is a time issue - a stream has to be dealt with quickly enough to make it useful, with accuracy not an absolute necessity. A file has to be completely accurate, but a couple of seconds here or there isn't important. |
|
|
Not all files need to be accurate. Text benefits from
accuracy, source code almost has to be accurate and it's
very unlikely that machine code files would work at all if
they were inaccurate, but audio, video and photographic
images are often not accurate and it doesn't matter. |
|
|
The scenario in my head for this is something like a big
PDF or Postscript file with either lots of scanned pages or
photo-style images, and i can imagine a situation where for
some reason the data for an illustration taking up a quarter
of the first page is at the end of the file, so you get the text
on the first page with a blank space which eventually gets
filled in with that image. |
|
|
I can see it'd be possible in the way you describe. To be
picky though, even a stream can be useful if it buffers a lot,
provided you're patient and you pause it. |
|