As the demand for virtual machines grow, so will the need to be able to easily send them across the internet. The virtual disks(VMDs) which hold the virtual machine(VM) data are always huge, and current compression techniques are insufficient.
If you think about it, about 1GB of data in a VMD is taken up with the OS data. This data should be abstracted and compiled into a dictionary file which everyone would need to download once. Then the compressed VMD would simply be the VMD data minus the common data leaving the user data to be compressed using common methods.
EG. You have windows XP installed in a virtual machine. The VMD is 10GB, windows takes up 900MB, the page file is 2048MB, user data takes up 100MB, there is 7GB of free space. The transmitted VM file should be close to 100MB, depending on fragmentation and with a dictionary block size of 10MB, the over head would be about 8 (long pointer) * (900 / 10) bytes = 720 bytes.
Data is often defragmented over the VMD, so the dictionary would need to hold the common data in data blocks and the VMD should be defragmented to optimise the process (allow large blocks).
There would need to be different dictionaries for different OSes, OS versions and patch levels. Dictionaries could be built and updated incrementally with such heirachy. So for example there would be a root Windows dictionary which contains blocks common to all windows versions after windows 2000, then a Windows XP dictionary subset, then a SP2 dictionary subset, then an Update KB123456 subset.
The dictionary could also hold blocks from common applications such as SQL Server, Visual Studio which all take up a lot of space.
Dictionaries could also be built from an installation CD, rather than downloaded. For example, you would use a program plus the Windows XP SP3 CD to build a dictionary, allowing you to download smaller Windows XP SP3 virtual machines.
Such compression would also be a plus for backups. Full system backups are much easier to produce with VMs, but you still have to put the backup files somewhere. By pruning off the OS blocks from the VM, a generous amount of space is saved daily. Of course, in businesses the OS blocks would account for a smaller percentage of space, (especially with installed databases) than with say VM Slipstreamed applications.-- toadth, Oct 26 2008 Just to clarify, we are talking about an OS/HW system so poorly designed that you have to run several instances at the same time in case one crashes ?-- FlyingToaster, Oct 26 2008 I can do something similar with CompFUSEd and VMware Server (both free, in the case of VMware, non-commercial use only) on Linux.
You can choose between different compression algos, I find LZO2 works good, it doesn't compress very well but it compresses/decompresses very fast and that's what matters. Unfortunately there's a bug with CompFUSEd where it crashes when a file gets around 3 gigs or so, (it's SUPPOSED to support large files, but the author doesn't test thoroughly enough) and that's a problem. Hopefully that will be fixed soon.
It wouldn't be a good idea to try and have them all use the same OS data, that stuff is somewhat volatile (you can only share stuff that doesn't change, otherwise problems happen) and some guest operating systems have enough stability problems as it is.-- Spacecoyote, Oct 26 2008 Why is unused space included in your disk image? Who is it you propose be responsible for maintaining these OS images? Do you really trust the image to be untainted? Given the branching you're proposing, how many different images do you suppose will exist? What if you limit that question to just Windows XP? If I have Windows XP SP3 on CD, and your VM only takes up 100MB, why not just include the VM on the CD? What if I don't have Internet access? Welcome back - long time no see!
([FlyingToaster] - No, I believe he's talking about Windows.)-- phoenix, Oct 26 2008 This would make creating a new machine using a VM SOE very quick in a server farm instead of copying 2+GB and then using something like NewSID. Also backing up a snapshot could use far less space and time. [+]-- superjohn, Oct 27 2008 Now remember that this is half bakery. I might have been able to develop it, but I don't have time, so I'm not going to post a full working design.
Responses:
- Flyingtoaster: I don't know what you're trying to say
- Spacecoyote: I don't think compression speed is preferred when your transmitting a VM over the internet. Especially when your building one VM distro to be downloaded many times. Volatility isn't a problem - it's not lossless and blocks which don't match a dictionary are simply compressed using standard compression methods. I'll explain later below.
- Phoenix: Unused space is usually found on a virtual disk, I was simply acknowloding the fact - I show that it is not a problem with this method. Responsability - who ever builds the compression software, open source community or OS vendors. Untainted - I didn't go too much into specifics, but I will clear it up below. Branching - An implementation may typically only limit branching to service packs. Why not CD? - Well the whole point is to transmit over the internet - if the receiver has the windows XP sp3 cd on the other side they are saving the time to download data over the internet which they potentially already have got. No internet - Great! You can mail it instead by CD.
OK, now some more specifics to help you understand the mechanics better.
Rough Compression Steps:
- The entire file is hashed to ensure integrity on decompression.
- The target VMD would be read syncronously (in order).
- Blocks of data would be read (possibly in smaller block sizes than the dictionary block sizes) and hashed (say with MD5).
- The hash would be searched for in the Shared Dictionary set.
- When a hash matches a Dictionary record, the bytes would be matched one to one and the Dictionary record index would be written to the compressed output stream.
- Remaining data which didn't match a Dictionary record is compressed using other techniques (i know there are more, but let's just say we use rar). The regular compression technique type is indicated in the compresssed file header.
Building the dictionary (CD): - You may have a Ubuntu Linux distro iso
- After the iso is mounted, using a UDF filesystem parser, and then looking inside compressed archives, individual files can be found.
- If the file is smaller than the block size (lets use a smaller size - 1MB) (A multi-block size mode could also be empoyed), the whole file is only used as a dictionary record (the hash is calculated and stored)
- If the file is larger than the block size (multiple blocks are extracted, if the remaining data is smaller than the block size it is treated as before).
- The whole ISO is hashed and labelled with the UDF volume name. The user may also decide to give the Shared Dictionary a name and distribute it.
- Anyone with the same ISO will get the exact same Shared Dictionary.
Decompression: - Using the target Shared Dictionary MD5 hash stored in the compressed file header, the correct Shared Dictionary will be sought and used. If the target dicionary doesn't exist, an error will be displayed to the user (STOP).
- The compressed file will probably be read syncronously.
- As the body is read, two different sections will be enountered a dictionary section (DS) or regular compression section (RCS).
- In an RCS section, regular decompression techniques is used (according to indicated type found in the compressed file header)
- In a DS section, the dictionary record index is read and the data is retrieved from the target Shared Dictionary and written to the decompression stream
- The uncompressed file is hashed (MD5) and compared to the original hash in the compressed file header to ascertain success.
I could go in to a lot more details, and there could be many more enhancements, and techniques used, but i'm focusing on the mechnics of the Shared Dictionary.-- toadth, Oct 27 2008 If understand this right, then you are talking using a 'diff' of a VM to a 'well-known' VM (or well known 'parts' of a 'VM'), rather than holding the entire VM and storing these (much compressed) VMs in a giant database.
Great idea. [+]
As an aside, I wonder whether it could be used to reduce n/w traffic in general: say you want to download (say) a movie: could this movie be 'diffed' with a well-known s/w package (say Microsoft Word), which both you and the sender have ?
Could this be smaller than sending just the original file ?
I'm probably talking crap here...but in (pub)_theory_ it possible for whatever reason that strings of bytes might well appear in structured data in programs which could be used as words in a dictionary. The 'diff' could happen across more than one 'well-known' block-of-data (Word, linux kernel etc).-- monojohnny, Oct 27 2008 The dictionary would also be relatively huge. I doubt that bloatware is going to survive the arrival of cloud computing. Instead of large programs with largely unused tools and resources, small programs that can look up resources (like that damn korean font that i will never use) on demand will be the heart of all personal computing VM or otherwise. Software companies would raise holy hell if you turn their programs into raw ingredients and make them available directly to anyone who claims to "have" license. Furthermore making huge swaths of code public isn't going to make any money. I could go on....-- WcW, Oct 27 2008 Compression speed is important if this is to be used at run-time. You could send it over in, say, a bz2 file, and then decompress it, and recompress it to a readable/writeable LZO compressed partition, and run it from there.-- Spacecoyote, Oct 27 2008 Remember the most applicable target market is for VM transmission.
I was hoping that people would embrace the idea and throw in some more suggestions (like monojohnny), however I find that there are just hopeless critics out there. Instead of critisizing my design, why not make suggestions which would help it work? Isn't it what this site is for? To bake the idea, not burn it?
Reponses:
-Monojonny: Yes it's like diff. The best comparison is to a differential backup, but the initial backup (the Dictionary) is static. Also, there are network permutations of this, Microsofts DFS only sends the changes of files to minimise network traffic. Unfortunately you would not be able to diff a movie with say a s/w package - with a large block size (preferred), it is highly unprobable to match a block from a movie to a block in software. When compressing using my method you would need to select the most appropriate dictionary (for a Ubuntu VM, you would specify the Ubuntu 8.10 x86 ISO). Also, if a computer was able to hold TBs of general dictionary data, there would be so many dictionary records, that the record index would be too large to be useful - and not everyone is expected to all have that much space with all the same data.
-WcW - I don't know what you mean by bloatware, the compression software would be fairly small - say <10MB. It's the dictionaries that would be large - but you only need as many dictionaries as are used and these can be built on demand. Software companies (eg. Microsoft) wouldn't care if you used their installation CD to compress Windows VMs, it adds value for them, you're not breaching a license as it's the installation of the software which requires licensing. Don't know where cloud computing came from, how is that relevant? We're talking about making it easier to transmit a VM over the internet - that's the tool.
- Spacecoyote: No, compression size is the priority and then speed to compress. But, this compression method would still be fast, if not faster than most compression technology available today. Traditional compression, includes a dictionary of repeating bytes (in rar,zip). The dictionary is created by scanning the file and determining which data is repeating (which is computationally expensive), higher compression takes more time to find such repeating data, lower compression takes less care to give more speed. With a static shared dictionary, the dictionary building step is eliminated, furthermore, the dictionary may be built with efficient data structures to improve hash matching performance (maybe a balanced graph? - but who cares, as I said before there can be many enhancements to my idea to make it viable, but I want to just focus on the idea not the implementation). So run-time performance should be acheived. Yes, I'm sure traditional compression is still viable, it's a consumer choice which one gets used.
Some more specifics:
Dictionary Space: The ubuntu dictionary would probably take up about 1GB of space, but that's insignificant when TB hard disks are becoming prevelant. However there are ways to improve here too. When building a Dictionary say from CD or ISO, the resulting file need not include the data from the CD/ISO, rather it would contain the hashtables and meta data required to use the CD/ISO directly as a Dictionary function. This way, when decompressing, the receiving user simply needs to put in a CD/DVD containing the data. Also, the Dictionary Meta file may be supplied with the Compressed Data, as the meta file would be small enough to do so (say <50MB).
Marketing / Commercialisation: The software doesn't have to be developed as open source, that was just one of my suggestions. Here are some more for those limited in imagination.
1. A company builds the software (it'd prob. take a week), and sells the compressor for $20, and gives away the decompressor and dictionary builder.
2. University students build it and write a thesis about it, get their honours degree. Then sell it and become... thosandaires.
etc..
Applications: Here are some scenarios for my the intended use of the compression.
1. Pre-Installed Applicances (One to Many) - These are out there and are mostly linux based VMs. A software company will have software which they sell (say a linux based proxy server), and they will install it on the most appropriate linux flavor (say debian), they will configure it well and then make the VM downloadable from their website. They could offer the regular compressed download as well as a download which features the said compression method.
2. Multi-National Company (One-One, One-Many) - You have one single template VM (say Windows Server), which is maintained. You have many branches in a multi-national company - you have 4 data centers around the world. The template is only for VM creation. You often move and replicate VMs between datacentres. Each data center would have the Windows Server Dictionary - simple.
etc...
And yes you could substitute Windows Server for Linux or Unix or Solaris or BSD, it doesn't really matter it's just an example :)-- toadth, Oct 27 2008 Parts of this are becoming baked, for instance, in Vista when you get updates the server only sends diffs, not full updates like apt does in Debian-based Linuxes (Debian and co. are working on it, though, there is software to do that but it hasn't been deployed yet because it needs more testing).
Once that's off the ground all you have to do to bake this is have preinstalled images of base VMs (baked), and then deliver the "appliance diff" in the same form as an update package.-- Spacecoyote, Oct 27 2008 I've done some calculations to work out the best block size and the resulting overhead. Having the block size the same as the VMD file systems cluster size is ideal, as fragmentation would not be an issue.
Here's some numbers:
- Block size is 512B for a Windows XP VM (NTFS minimum cluster size)
- For 1GB (in disk space) of reference data (eg. Windows XP cd is reference data), there would be up to 2,097,152 individual records.
- With a pointer size of 4B, there can be 2 ^ 32 (records) x 512B (record size) = 2TB (max. ref data size). If all dictionary records are used, there would be 8MB of data in the compressed file.
- 4 : 512 = 1 : 128 (compression ratio). As a guide, the best traditional lossless compressor does so at about 1:7 this is (PAQ - see wiki) and that's on text only data!
- An index would be needed in the Dictionary to quickly find block matches. 16B (MD5) + 4B (record index) = 20B / record
- While decompressing the file, the decompressor needs to know whether a decomp block is decompressed using our method or traditional. This can be indicated with a bit / block. If a 1GB reference file is used and every reference block is used the max. overhead in the compressed file is raised by (2097152 / 8)B = 128KB.
- The dictionary would need to hold a hash of the ref data source (MD5 of all the data) (16B) plus a label (string < 128).
- For a meta dictionary (one which references the reference data from a CD or DVD etc.) record numbers need to be resolved to file positions. This will simply be an array of positions, with the index of the array relating to the dictionary record. If the reference data file is smaller than 4GB, an integer array will be used, otherwise a long array will be used (supporting up to 16,777,216TB). Also the original file name or source name would be indicated (string < 128B)
So using a 1GB reference file and assuming that all blocks are used in the compressed file:
- A complete dictionary would be 1GB + [2097152 (Record Count) * 20B (Index overhead)] + 16B (MD5) + 128B (Label) = 1GB 40MB 144B
- A meta dictionary would be [2097152 (Record Count) * 24B (Index and Record Lookup overhead)] + 16B (MD5) + 128B (Label) + 128B (Reference) = 48MB 272B
The meta dictionary is useful for distribution of a common shared dictionary. The user would download the meta dictionary, download the static compressed file and then be prompted for the external reference data (either the file, CD, DVD etc..)-- toadth, Oct 28 2008 Various random thoughts:
- An actual freshly installed VMD would be the best target for creating a Shared Dictionary - because not only is there data copied from the installation CD but data is generated from the setup application (like registry entries etc.), it is unknown how much data is generated. Freshly installed OSes in a VM are not identical and so when a Shared Dictionary is created, it would need to be distributed (not ideal).
- The hash index would need to be sorted for binary searching.
- Multiple Shared Dictionaries may be an option. Eg. The compressed file may use the Ubuntu 8.10 x86 Dictionary plus the mySQL vX.X Dictionary.-- toadth, Oct 28 2008 random, halfbakery