h a l f b a k e r y"Bun is such a sad word, is it not?" -- Watt, "Waiting for Godot"
add, search, annotate, link, view, overview, recent, by name, random
news, help, about, links, report a problem
browse anonymously,
or get an account
and write.
register,
|
|
|
As the demand for virtual machines grow, so will the
need
to be able to easily send them across the internet. The
virtual disks(VMDs) which hold the virtual machine(VM)
data are always huge, and current compression
techniques
are insufficient.
If you think about it, about 1GB of data in
a VMD is
taken
up with the OS data. This data should be abstracted and
compiled into a dictionary file which everyone would
need
to download once. Then the compressed VMD would
simply
be the VMD data minus the common data leaving the
user
data to be compressed using common methods.
EG. You have windows XP installed in a virtual
machine. The VMD is 10GB, windows takes up 900MB,
the page file is 2048MB, user data takes up 100MB,
there is 7GB of free space. The transmitted VM file
should be close to 100MB, depending on fragmentation
and with a dictionary block size of 10MB, the over head
would be about 8 (long pointer) * (900 / 10) bytes = 720
bytes.
Data is often defragmented over the VMD, so the
dictionary would need to hold the common data in data
blocks and the VMD should be defragmented to optimise
the process (allow large blocks).
There would need to be different dictionaries for
different
OSes, OS versions and patch levels. Dictionaries could
be
built and updated incrementally with such heirachy. So
for
example there would be a root Windows dictionary which
contains blocks common to all windows versions after
windows 2000, then a Windows XP dictionary subset,
then
a SP2 dictionary subset, then an Update KB123456
subset.
The dictionary could also hold blocks from common
applications such as SQL Server, Visual Studio which all
take up a lot of space.
Dictionaries could also be built from an installation CD,
rather than downloaded. For example, you would use a
program plus the Windows XP SP3 CD to build a
dictionary, allowing you to download smaller Windows XP
SP3 virtual machines.
Such compression would also be a plus for backups. Full
system backups are much easier to produce with VMs,
but you still have to put the backup files somewhere. By
pruning off the OS blocks from the VM, a generous
amount of space is saved daily. Of course, in businesses
the OS blocks would account for a smaller percentage of
space, (especially with installed databases) than with
say VM Slipstreamed applications.
[link]
|
|
Just to clarify, we are talking about an OS/HW system so poorly designed that you have to run several instances at the same time in case one crashes ? |
|
|
I can do something similar with CompFUSEd and VMware Server (both free, in the case of VMware, non-commercial use only) on Linux. |
|
|
You can choose between different compression algos, I find LZO2 works good, it doesn't compress very well but it compresses/decompresses very fast and that's what matters. Unfortunately there's a bug with CompFUSEd where it crashes when a file gets around 3 gigs or so, (it's SUPPOSED to support large files, but the author doesn't test thoroughly enough) and that's a problem. Hopefully that will be fixed soon. |
|
|
It wouldn't be a good idea to try and have them all use the same OS data, that stuff is somewhat volatile (you can only share stuff that doesn't change, otherwise problems happen) and some guest operating systems have enough stability problems as it is. |
|
|
Why is unused space included in your disk image?
Who is it you propose be responsible for maintaining these OS images?
Do you really trust the image to be untainted?
Given the branching you're proposing, how many different images do you suppose will exist? What if you limit that question to just Windows XP?
If I have Windows XP SP3 on CD, and your VM only takes up 100MB, why not just include the VM on the CD?
What if I don't have Internet access?
Welcome back - long time no see! |
|
|
([FlyingToaster] - No, I believe he's talking about Windows.) |
|
|
This would make creating a new machine using a VM SOE very quick in a server farm instead of copying 2+GB and then using something like NewSID.
Also backing up a snapshot could use far less space and time. [+] |
|
|
Now remember that this is half bakery. I might have been able to develop it, but I don't have time, so I'm not going to post a full working design. |
|
|
- Flyingtoaster: I don't know what you're trying to say |
|
|
- Spacecoyote: I don't think compression speed is preferred when your transmitting a VM over the internet. Especially when your building one VM distro to be downloaded many times. Volatility isn't a problem - it's not lossless and blocks which don't match a dictionary are simply compressed using standard compression methods. I'll explain later below. |
|
|
- Phoenix: Unused space is usually found on a virtual disk, I was simply acknowloding the fact - I show that it is not a problem with this method. Responsability - who ever builds the compression software, open source community or OS vendors. Untainted - I didn't go too much into specifics, but I will clear it up below. Branching - An implementation may typically only limit branching to service packs. Why not CD? - Well the whole point is to transmit over the internet - if the receiver has the windows XP sp3 cd on the other side they are saving the time to download data over the internet which they potentially already have got. No internet - Great! You can mail it instead by CD. |
|
|
OK, now some more specifics to help you understand the mechanics better. |
|
|
- The entire file is hashed to ensure integrity on decompression. |
|
|
- The target VMD would be read syncronously (in order). |
|
|
- Blocks of data would be read (possibly in smaller block sizes than the dictionary block sizes) and hashed (say with MD5). |
|
|
- The hash would be searched for in the Shared Dictionary set. |
|
|
- When a hash matches a Dictionary record, the bytes would be matched one to one and the Dictionary record index would be written to the compressed output stream. |
|
|
- Remaining data which didn't match a Dictionary record is compressed using other techniques (i know there are more, but let's just say we use rar). The regular compression technique type is indicated in the compresssed file header. |
|
|
Building the dictionary (CD):
- You may have a Ubuntu Linux distro iso |
|
|
- After the iso is mounted, using a UDF filesystem parser, and then looking inside compressed archives, individual files can be found. |
|
|
- If the file is smaller than the block size (lets use a smaller size - 1MB) (A multi-block size mode could also be empoyed), the whole file is only used as a dictionary record (the hash is calculated and stored) |
|
|
- If the file is larger than the block size (multiple blocks are extracted, if the remaining data is smaller than the block size it is treated as before). |
|
|
- The whole ISO is hashed and labelled with the UDF volume name. The user may also decide to give the Shared Dictionary a name and distribute it. |
|
|
- Anyone with the same ISO will get the exact same Shared Dictionary. |
|
|
Decompression:
- Using the target Shared Dictionary MD5 hash stored in the compressed file header, the correct Shared Dictionary will be sought and used. If the target dicionary doesn't exist, an error will be displayed to the user (STOP). |
|
|
- The compressed file will probably be read syncronously. |
|
|
- As the body is read, two different sections will be enountered a dictionary section (DS) or regular compression section (RCS). |
|
|
- In an RCS section, regular decompression techniques is used (according to indicated type found in the compressed file header) |
|
|
- In a DS section, the dictionary record index is read and the data is retrieved from the target Shared Dictionary and written to the decompression stream |
|
|
- The uncompressed file is hashed (MD5) and compared to the original hash in the compressed file header to ascertain success. |
|
|
I could go in to a lot more details, and there could be many more enhancements, and techniques used, but i'm focusing on the mechnics of the Shared Dictionary. |
|
|
If understand this right, then you are talking using a 'diff' of a VM to a 'well-known' VM (or well known 'parts' of a 'VM'), rather than holding the entire VM and storing these (much compressed) VMs in a giant database. |
|
|
As an aside, I wonder whether it could be used to reduce n/w traffic in general: say you want to download (say) a movie: could this movie be 'diffed' with a well-known s/w package (say Microsoft Word), which both you and the sender have ? |
|
|
Could this be smaller than sending just the original file ? |
|
|
I'm probably talking crap here...but in (pub)_theory_ it possible for whatever reason that strings of bytes might well appear in structured data in programs which could be used as words in a dictionary. The 'diff' could happen across more than one 'well-known' block-of-data (Word, linux kernel etc). |
|
|
The dictionary would also be relatively huge. I doubt that bloatware is going to survive the arrival of cloud computing. Instead of large programs with largely unused tools and resources, small programs that can look up resources (like that damn korean font that i will never use) on demand will be the heart of all personal computing VM or otherwise. Software companies would raise holy hell if you turn their programs into raw ingredients and make them available directly to anyone who claims to "have" license. Furthermore making huge swaths of code public isn't going to make any money. I could go on.... |
|
|
Compression speed is important if this is to be used at run-time. You could send it over in, say, a bz2 file, and then decompress it, and recompress it to a readable/writeable LZO compressed partition, and run it from there. |
|
|
Remember the most applicable target market is for
VM transmission. |
|
|
I was hoping that people would embrace the idea
and throw in some more suggestions (like
monojohnny), however I find that there are just
hopeless critics out there. Instead of critisizing
my design, why not make suggestions which
would help it work? Isn't it what this site is for? To
bake the idea, not burn it? |
|
|
-Monojonny: Yes it's like diff. The best comparison
is to a differential backup, but the initial backup
(the Dictionary) is static. Also, there are network
permutations of this, Microsofts DFS only sends
the changes of files to minimise network traffic.
Unfortunately you would not be able to diff a
movie with say a s/w package - with a large block
size
(preferred), it is highly unprobable to match a
block from a movie to a block in software. When
compressing using my method you would need to
select the most appropriate dictionary (for a
Ubuntu VM, you would specify the Ubuntu 8.10
x86 ISO). Also, if a computer was able to hold TBs
of general dictionary data, there would be so
many dictionary records, that the record index
would be too large to be useful - and not
everyone is expected to all have that much space
with all the same data. |
|
|
-WcW - I don't know what you mean by bloatware,
the compression software would be fairly small -
say <10MB. It's the dictionaries that would be
large - but you only need as many dictionaries as
are used and these can be built on demand.
Software companies (eg. Microsoft)
wouldn't care if you used their installation CD to
compress Windows VMs, it adds value for them,
you're not breaching a license as it's the
installation of the software which requires
licensing. Don't know where cloud computing
came from, how is that relevant? We're talking
about making it easier to transmit a VM over the
internet - that's the tool. |
|
|
- Spacecoyote: No, compression size is the
priority and then speed to compress. But, this
compression method would still be fast, if not
faster than most compression technology available
today. Traditional compression, includes a
dictionary of repeating bytes (in rar,zip). The
dictionary is created by scanning the file and
determining which data is repeating (which is
computationally expensive), higher compression
takes more time to find such repeating data,
lower compression takes less care to give more
speed. With a static shared dictionary, the
dictionary building step is eliminated,
furthermore, the dictionary may be built with
efficient data structures to improve hash
matching performance (maybe a balanced graph? -
but who cares, as I said before there can be many
enhancements to my idea to make it viable, but I
want to just focus on the idea not the
implementation). So run-time performance should
be acheived. Yes, I'm sure traditional compression
is still viable, it's a consumer choice which one
gets used. |
|
|
Dictionary Space: The ubuntu dictionary would
probably take up about 1GB of space, but that's
insignificant when TB hard disks are becoming
prevelant. However there are ways to improve
here too. When building a Dictionary say from CD
or ISO, the resulting file need not include the data
from the CD/ISO, rather it would contain the
hashtables and meta data required to use the
CD/ISO directly as a Dictionary function. This
way, when decompressing, the receiving user
simply needs to put in a CD/DVD containing the
data. Also, the Dictionary Meta file may be
supplied with the Compressed Data, as the meta
file would be small enough to do so (say <50MB). |
|
|
Marketing / Commercialisation: The software
doesn't have to be developed as open source,
that was just one of my suggestions. Here are
some more for those limited in imagination. |
|
|
1. A company builds the software (it'd prob. take a
week), and sells the compressor for $20, and gives
away the decompressor and dictionary builder. |
|
|
2. University students build it and write a thesis
about it, get their honours degree. Then sell it
and become... thosandaires. |
|
|
Applications: Here are some scenarios for my the
intended use of the compression. |
|
|
1. Pre-Installed Applicances (One to Many) - These
are out there and are mostly linux based VMs. A
software company will have software which they
sell (say a linux based proxy server), and they will
install it on the most appropriate linux flavor (say
debian), they will configure it well and then make
the VM downloadable from their website. They
could offer the regular compressed download as
well as a download which features the said
compression method. |
|
|
2. Multi-National Company (One-One, One-Many) -
You have one single template VM (say Windows
Server), which is maintained. You have many
branches in a multi-national company - you have 4
data centers around the world. The template is
only for VM creation. You often move and
replicate VMs between datacentres. Each data
center would have the Windows Server Dictionary
- simple. |
|
|
And yes you could substitute Windows Server for
Linux or Unix or Solaris or BSD, it doesn't really
matter it's just an example :) |
|
|
Parts of this are becoming baked, for instance, in Vista when you get updates the server only sends diffs, not full updates like apt does in Debian-based Linuxes (Debian and co. are working on it, though, there is software to do that but it hasn't been deployed yet because it needs more testing). |
|
|
Once that's off the ground all you have to do to bake this is have preinstalled images of base VMs (baked), and then deliver the "appliance diff" in the same form as an update package. |
|
|
I've done some calculations to work out the best
block size and the resulting overhead. Having the
block size the same as the VMD file systems
cluster size is ideal, as fragmentation would not
be an issue. |
|
|
- Block size is 512B for a Windows XP VM (NTFS
minimum cluster size) |
|
|
- For 1GB (in disk space) of reference data (eg.
Windows XP cd is reference data), there would be
up to 2,097,152 individual records. |
|
|
- With a pointer size of 4B, there can be 2 ^ 32
(records) x 512B (record size) = 2TB (max. ref data
size). If all dictionary records are used, there
would be 8MB of data in the compressed file. |
|
|
- 4 : 512 = 1 : 128 (compression ratio). As a guide,
the best traditional lossless compressor does so at
about 1:7 this is (PAQ - see wiki) and that's on
text only data! |
|
|
- An index would be needed in the Dictionary to
quickly find block matches. 16B (MD5) + 4B (record
index) = 20B / record |
|
|
- While decompressing the file, the decompressor
needs to know whether a decomp block is
decompressed using our method or traditional.
This can be indicated with a bit / block. If a 1GB
reference file is used and every reference block is
used the max. overhead in the compressed file is
raised by (2097152 / 8)B = 128KB. |
|
|
- The dictionary would need to hold a hash of the
ref data source (MD5 of all the data) (16B) plus a
label (string < 128). |
|
|
- For a meta dictionary (one which references the
reference data from a CD or DVD etc.) record
numbers need to be resolved to file positions.
This will simply be an array of positions, with the
index of the array relating to the dictionary
record. If the reference data file is smaller than
4GB, an integer array will be used, otherwise a
long array will be used (supporting up to
16,777,216TB). Also the original file name or
source name would be indicated (string < 128B) |
|
|
So using a 1GB reference file and assuming that all
blocks are used in the compressed file: |
|
|
- A complete dictionary would be 1GB + [2097152
(Record Count) * 20B (Index overhead)] + 16B
(MD5) + 128B (Label) = 1GB 40MB 144B |
|
|
- A meta dictionary would be [2097152 (Record
Count) * 24B (Index and Record Lookup overhead)]
+ 16B (MD5) + 128B (Label) + 128B (Reference) =
48MB 272B |
|
|
The meta dictionary is useful for distribution of a
common shared dictionary. The user would
download the meta dictionary, download the
static compressed file and then be prompted for
the external reference data (either the file, CD,
DVD etc..) |
|
|
- An actual freshly installed VMD would be the
best target for creating a Shared Dictionary -
because not only is there data copied from the
installation CD but data is generated from the
setup application (like registry entries etc.), it is
unknown how much data is generated. Freshly
installed OSes in a VM are not identical and so
when a Shared Dictionary is created, it would
need to be distributed (not ideal). |
|
|
- The hash index would need to be sorted for
binary searching. |
|
|
- Multiple Shared Dictionaries may be an option.
Eg. The compressed file may use the Ubuntu 8.10
x86 Dictionary plus the mySQL vX.X Dictionary. |
|
| |