Non-Fragging OS
Always keeps files contiguous, to maximize system performance
I just spent 3 days dealing with a problem on my most-frequently-used computer. A message appeared saying that I was running dangerously low on disk space. It was true. So I erased a bunch of unnecessary files and moved some other things off that drive onto a different one. But, a few hours later, the same message popped up again!

Something more permanent was needed. The Operating System was willing to compress all the files on the drive, which would approximately halve the amount of space used. OK. But before I did that, I decided to "defrag" the drive first. Those who know what I'm about to describe can skip the next three paragraphs.
There is a rather standardized/generic method that computer Operating Systems use with respect to organizing files on a storage device. The total available storage space is divided into equal-sized segments, and a special table is used to record which segments are being used to hold which files. Large files often need many storage segments, and this is where "fragmentation" enters the picture.

The process for saving a file starts by looking for the first available storage-segment, as recorded in the special table. If multiple adjacent segments are available, fine. But if not, the table is able to indicate that the file located in this storage-segment continues in another, distant storage-segment. It works well, and efficiently uses the total storage-space on a drive, which is why most computer Operating Systems use that method.
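For concreteness, here is a toy sketch of that kind of "special table": a cluster-chain structure with first-fit allocation. The names (FREE, EOF, save_file) and the 16-segment disk are invented for illustration and don't correspond to any real on-disk format.

FREE, EOF = -1, -2            # sentinel values stored in the table
N_SEGMENTS = 16

table = [FREE] * N_SEGMENTS   # the "special table": one entry per storage-segment
directory = {}                # file name -> index of its first segment

def save_file(name, n_segments_needed):
    """First-fit: take the lowest-numbered free segments, chaining them
    together even when they are not adjacent (which is where fragments come from)."""
    free = [i for i, e in enumerate(table) if e == FREE]
    if len(free) < n_segments_needed:
        raise OSError("disk full")
    used = free[:n_segments_needed]
    for here, nxt in zip(used, used[1:]):
        table[here] = nxt                 # "the file continues in segment nxt"
    table[used[-1]] = EOF                 # marks the last segment of the file
    directory[name] = used[0]

def segments_of(name):
    """Walk the chain recorded in the table."""
    i, chain = directory[name], []
    while i != EOF:
        chain.append(i)
        i = table[i]
    return chain

save_file("a.txt", 3)
save_file("b.txt", 2)
print(segments_of("a.txt"))   # [0, 1, 2] -- contiguous only because nothing was in the way
print(segments_of("b.txt"))   # [3, 4]

If "a.txt" were later deleted and a larger file saved, first-fit would reuse segments 0..2 and continue the chain somewhere past "b.txt"; that jump is a fragment.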
However, just because storage-space is used efficiently, that doesn't mean it is ACCESSED efficiently. If the read/write "head" of a disk drive has to go here and then there and then 'way over there to access all the "fragments" of a total file, that is NOT efficient. "Defrag" programs exist to re-write the data on the disk, so that most files end up using contiguous storage-segments. It can take a while to do this (I just spent more than 2 days running defrag programs over and over again, because of limited available space for shuffling files about).
So, what if the Operating System used a modified filing method? All it really needs is a second special table. This table keeps track of the sizes of all the available contiguous storage-segments (basically the same thing as the file sizes). If some file is modified such that it becomes larger than the space into which it originally fit, the OS does NOT put most of the file here and the rest of it over there. It puts ALL of the file over there, into a space big enough to hold it, and marks the space where it used to sit as "available contiguous storage-segments", to be used for the next file that can fit in that space.
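A minimal sketch of that rule, assuming a simplified model where every file occupies one contiguous run of segments; the dictionaries below stand in for the first and second special tables, and the first-fit choice of hole is an assumption:

files = {"a": (0, 4), "b": (4, 2), "c": (6, 3)}   # name -> (start, allocated length)
holes = [(9, 7)]                                   # the "second special table": free runs

def grow_file(name, new_length):
    """If the file no longer fits in its run, move ALL of it into a hole that is
    big enough and turn its old run into a new hole -- never split the file."""
    start, length = files[name]
    if new_length <= length:                       # still fits where it is
        return
    for i, (hstart, hlen) in enumerate(holes):
        if hlen >= new_length:                     # first hole big enough
            files[name] = (hstart, new_length)
            leftover = hlen - new_length
            if leftover:
                holes[i] = (hstart + new_length, leftover)
            else:
                del holes[i]
            holes.append((start, length))          # old location becomes a hole
            return
    raise OSError("no contiguous run large enough")

grow_file("b", 5)
print(files)   # "b" now lives at (9, 5); its old run starting at 4 is a hole
print(holes)   # [(14, 2), (4, 2)]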
The short-term result will of course be a lot of "holes" in the first special table, and so one could argue that this is not very space-efficient in terms of overall disk usage, even if it is very efficient in terms of accessing the files. However, a modest modification to the preceding could help reduce the problem, involving some more special data....
Using the same second special table, it would also record the "distance" (in terms of storage-segments) each file is located from a "hole" (it doesn't matter whether the hole precedes or follows the file; distances can be negative or positive). Upon moving the too-big-to-fit file to a place where it can fit, the OS now does a simple search for another file --the nearer a hole the better-- that can fit in the just-available space, and moves it to there.
The net effect of that is that holes tend to be consolidated. If the Operating System did the preceding thing just twice for every file that got too big and was moved 'way over there, In The Long Run the total number of holes in the first special table will, statistically, probably seldom exceed a certain number (to be determined). If the OS did the thing described in the previous paragraph three times instead of twice (moving a third file into the space opened up when the second file was moved), the statistical number of holes will go down. Move 4 files, and more hole-consolidation happens, and again the total number of holes tends to go down. Possibly, that first file, moved 'way over there, can be moved "back", into the space where one of those other files (plus-adjacent-hole!) used to be.
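One way the consolidation pass might be sketched, under the same simplified extent model; treating the largest hole as "the hole that just opened" and picking any hole-adjacent file that fits are simplifying assumptions, not part of the proposal itself:

def merged(holes):
    """Coalesce adjacent free runs into single holes."""
    out = []
    for start, length in sorted(holes):
        if out and out[-1][0] + out[-1][1] == start:
            out[-1] = (out[-1][0], out[-1][1] + length)
        else:
            out.append((start, length))
    return out

def consolidate(files, holes, rounds=2):
    """Move a file that fits the newly opened hole AND already sits next to
    some other hole, so that freeing its old run merges two holes into one.
    Repeat up to `rounds` times."""
    for _ in range(rounds):
        holes[:] = merged(holes)
        target = max(holes, key=lambda h: h[1])   # stand-in for "the hole just opened"
        def next_to_a_hole(name):
            s, l = files[name]
            return any(hs + hl == s or s + l == hs for hs, hl in holes)
        candidates = [n for n, (s, l) in files.items()
                      if l <= target[1] and next_to_a_hole(n)]
        if not candidates:
            break
        name = candidates[0]                      # "the nearer a hole the better" is
        s, l = files[name]                        # simplified to "any hole-adjacent file"
        files[name] = (target[0], l)              # move it into the target hole
        holes.remove(target)
        if target[1] > l:
            holes.append((target[0] + l, target[1] - l))
        holes.append((s, l))                      # its old run is now free
    holes[:] = merged(holes)

files = {"a": (0, 3), "b": (5, 2), "c": (9, 4)}
holes = [(3, 2), (7, 2), (13, 3)]                 # three scattered holes
consolidate(files, holes, rounds=2)
print(holes)                                      # [(2, 7)] -- one big hole instead of three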
All we need to decide is what sort of trade-off we want, between minimizing the number of holes in the first special table, and how many files we want to shuffle any time one file gets too big, when the goal is to keep files non-fragmented.
|
|
1) It's not the OS that determines how files are stored on
the drive. That's the job of the file system. |
|
|
2) Fragmentation isn't a problem on most modern file
systems. There are techniques used to effectively prevent
it, at least until the
drive becomes almost full. The only reason you have to
defrag regularly on Windows is because NTFS is crap. |
|
|
3) Fragmentation is irrelevant on an SSD, and such drives should never be defragmented, because it only serves to put undue wear on the drive. Not everyone is using SSDs of course, but that's the way things are going, at a minimum for OS and application storage, which is what benefits most from defragging anyway. |
|
|
4) Your proposed method, of consolidating data and filling holes, is just a fancy way of saying "continuously defrag on the fly". This is what Windows currently does, and it's an acceptable Band-Aid fix for the problems with its broken file system, but it's not that great an idea in general. Holes
are good. You want as many holes as possible, so files can
grow. The strategy then is to scatter files all around the
drive, with the expectation that some will grow and this
leaves them room to do so. Eventually the drive will fill up,
of course, but the solution is to always leave a certain
percentage of free space on the drive to ensure best
performance. |
|
|
[ytk], regarding your #1, note that "DOS" was an acronym for "Disk Operating System". The filing system was built into the operating system. I'm not aware that that fact is significantly different for other operating systems, even though all of them do have special "file manager" programs that INVOKE the stuff built into the operating system, to actually move files about... |
|
|
Regarding it being a Good Thing to keep holes near
files, this is ONLY true for files that are likely to
grow. A huge percentage of files don't do that!
You would be wasting disk space unnecessarily, for
files that never grow. |
|
|
Perhaps a disk could be "divided" into two
sections, one reserved for files that change size,
and one reserved for files that don't. All new files
are installed in the second region. As files are
accessed, the OS can mark them as fixed or
growing, and eventually move the fixed files to
the other section of the disk, where there are
never any holes.... |
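A rough sketch of that two-region layout, with invented names and sizes; how the OS learns which files actually grow (the "history" set below) and what happens when a region overflows are left out:

REGION_SPLIT = 100        # segments 0..99 = "growing" region, 100.. = "fixed" region
SLACK = 4                 # breathing room left after each file that grows

def lay_out(files, history):
    """files: name -> size in segments; history: names the OS has seen grow."""
    layout, g_cursor, f_cursor = {}, 0, REGION_SPLIT
    for name, size in files.items():
        if name in history:                 # growing file: leave a hole after it
            layout[name] = (g_cursor, size)
            g_cursor += size + SLACK
        else:                               # fixed file: pack with no holes at all
            layout[name] = (f_cursor, size)
            f_cursor += size
    return layout

print(lay_out({"log.txt": 6, "movie.mp4": 40, "db.sqlite": 10},
              history={"log.txt", "db.sqlite"}))
# {'log.txt': (0, 6), 'movie.mp4': (100, 40), 'db.sqlite': (10, 10)}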
|
|
And yes, I am proposing a defrag-on-the-fly system. However, I am not proposing that it be TOTALLY THOROUGH --because that obviously means the system will spend lots of time doing defrag stuff every time any file grows too big for its existing storage space, which counts against overall system performance. I'm only proposing that the defrag stuff be done just enough to keep the holes down to a reasonable number. |
|
|
// I'm not aware that that fact is significantly different for other operating systems// |
|
|
Nowadays, "other operating systems" basically means the UNIX-ish family, and they're *not* basically extensions of a file system. Typically, each one is able to handle a range of different file systems, (even NTFS, if they're feeling indulgent). So, now you *are* aware of it. |
|
|
//note that "DOS" was an acronym for "Disk Operating
System". The filing system was built into the operating
system// |
|
|
The file system was called FAT. The drivers for the FS were built into the OS, but there was no reason why you couldn't write drivers for another FS for any operating system, including DOS. Think about network shares, for example. When you write to a networked drive, the OS treats it like any other file write, but it's the remote system that actually is responsible for writing the file to disk. The OS and the FS are completely different things, and it's not correct to think of them as inseparable by any means. |
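To illustrate the separation being described: the OS can code against an abstract file-system interface, and a FAT-like local implementation or a network share can both sit behind it. The interface and class names below are invented for the sketch:

from abc import ABC, abstractmethod

class FileSystem(ABC):
    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...
    @abstractmethod
    def read(self, path: str) -> bytes: ...

class FatLikeFS(FileSystem):
    """Stores data locally, deciding itself which segments hold which file."""
    def __init__(self):
        self.blobs = {}
    def write(self, path, data):
        self.blobs[path] = bytes(data)
    def read(self, path):
        return self.blobs[path]

class NetworkShareFS(FileSystem):
    """Forwards the write; the *remote* machine decides how it lands on disk."""
    def __init__(self, remote):
        self.remote = remote              # e.g. another FileSystem on another host
    def write(self, path, data):
        self.remote.write(path, data)
    def read(self, path):
        return self.remote.read(path)

# The "OS" only ever sees the FileSystem interface:
def os_save(fs: FileSystem, path: str, data: bytes):
    fs.write(path, data)

os_save(FatLikeFS(), "a.txt", b"hello")
os_save(NetworkShareFS(remote=FatLikeFS()), "b.txt", b"hello")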
|
|
//Regarding it being a Good Thing to keep holes near files, this is ONLY true for files that are likely to grow. A huge percentage of files don't do that! You would be wasting disk space unnecessarily, for files that never grow.// |
|
|
Please explain how you're wasting disk space by doing this. You're not losing any additional space over writing them contiguously. You have exactly the same number of bytes available either way. Nothing is wasted. |
|
|
In fact, if you insist on writing files contiguously to the drive, I'd call that wasting space, because you have all this extra space on the drive that you're not putting to good use as breathing room! It'd be like having a huge warehouse, but nevertheless cramming your merchandise into a single corner as tightly as possible. |
|
|
Let me describe a simple algorithm for storing files on a disk (sketched in code below): |
|
|
1) Find the biggest hole on the drive.
2) Put the file halfway into that hole.
3) If there's no place big enough for the file, split it into multiple chunks and restart at step 1 for each chunk.
|
|
|
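A rough sketch of that three-step placement strategy, treating holes as (start, length) runs of segments; the recursion and names are illustrative assumptions:

def place(size, holes, placements=None):
    """Put a file of `size` segments in the middle of the biggest hole,
    splitting the file into chunks only if no hole is big enough."""
    placements = [] if placements is None else placements
    if not holes:
        raise OSError("disk full")
    i, (start, length) = max(enumerate(holes), key=lambda h: h[1][1])
    if size <= length:
        # centre the file inside the hole, leaving slack on both sides
        offset = start + (length - size) // 2
        placements.append((offset, size))
        left  = (start, offset - start)
        right = (offset + size, start + length - (offset + size))
        holes[i:i + 1] = [h for h in (left, right) if h[1] > 0]
    else:
        # no hole is big enough: fill the biggest hole with one chunk,
        # then restart at step 1 for the rest
        placements.append((start, length))
        del holes[i]
        place(size - length, holes, placements)
    return placements

holes = [(0, 10), (30, 6)]
print(place(4, holes))    # [(3, 4)] -- dropped into the middle of the 10-segment hole
print(holes)              # [(0, 3), (7, 3), (30, 6)]

Placing each file at the midpoint of the biggest hole is what leaves growing room on both sides; splitting only happens once no single hole is large enough.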
Even if most files don't grow, that's not a problem, because some files might, and those files have room to. With this algorithm, you won't have to start fragmenting files until the disk is substantially filled up, even if some files do grow in size. But this way, the average amount of growing room for each file decreases as files are added to the disk. This works well unless you have a ton of files that are likely to grow from very small sizes to very large sizes. Fortunately, most files don't do that. |
|
|
[pertinax], OK, thanks. It doesn't bother me if the
filing system, with its own appropriate special
tables, qualifies as an "application" in a *nix
environment. |
|
|
[ytk], FAT stands for "file allocation table", and is exactly the "first special table" mentioned in the main text. The overall filing system is more than just the FAT, because code has to be executed that makes use of the FAT (and keeps it updated). And that code was part of the Operating System. |
|
|
Next, you may have misinterpreted something. I'm
quite aware that the last "storage segment"
associated with a file can be partially filled. This
Idea is NOT about trying to fill it completely, using
part of another file! The "holes" I'm talking about
are "completely empty storage segments" (and are
directly recorded in the ordinary FAT), and
nothing else. So, if a region of a disk can be
reserved for files that don't change in size, they
CAN be packed into that region without any of
THOSE kind of holes --and none of those files
would NEED any "breathing room", since they
DON'T change in size! |
|
|
Finally, you have specifically detailed
fragmentation as a solution to holes in the FAT.
DUH! The main reason for my writing the main
text here was to prevent fragmentation of files! |
|
|
Yes, I realize that "fragmentation of available
empty space" is the result of what I proposed --
but note that it is FILES that are read by the
computer, not the empty space, so the existence
of holes doesn't affect performance in the way
that fragmentation affects performance. |
|
|
Anyway, I also proposed a way of minimizing holes
without greatly affecting ordinary performance
(because most of the time, most files are not
growing in size, and the things I proposed only
"kick in" when one file has grown to use up the
last of the space in the last storage-segment it
occupies, and needs to be moved to a different
place on the disk). |
|
|
Sure, it could be helpful if there was a hole
following the file, so it could grow into that hole
without being moved. But even most files that
grow don't do it every day. Some only grow when
a software update is done, for example. Why
should those files be associated with holes if
updates only get done once a year? |
|
|
The whole point of not having holes in the FAT is
to be able to say that the disk is being efficiently
used to store stuff. Nobody wants to buy a new
hard disk if 10% of it is still available as scattered
holes, none big enough for a single unfragmented
file. Fragmentation specifically exists to finish
using that space. I do NOT deny its usefulness. |
|
|
I'm just saying that fragmentation isn't necessary
UNTIL the disk is almost full! And if the quantity
of holes can be kept to a reasonably small number,
very few files will be fragmented when the "disk
full" error happens. |
|
|
This is the opposite of what you want to do. You want to maximize
holes. Do a bit of research into file system design and you'll see
why this is the case. |
|
|
The main point, however, is this: You are trying to solve a
problem that no modern file system has, with the exception of
ones used by Microsoft operating systems. There is no such thing
as maintenance defragging on Macs or Linux systems. The
requirement to defrag is 100% because of bad design choices on
the part of Microsoft, combined with their continued use of legacy
technology for reasons of backwards compatibility. |
|
|
This problem has been solved for *decades*. Your attempt to
design a filesystem that requires no defragging is roughly
equivalent to designing a way of interacting with your computer
that doesn't require you to write programs in assembly. |
|
|
What [ytk] said, with brass knobs on. |
|
|
[suggested-for-deletion], not a new idea. |
|
|
//The requirement to defrag is 100% because of bad
design choices on the part of Microsoft, combined
with their continued use of legacy technology for
reasons of backwards compatibility// Do you know if
this is still true in Windows 8.1? |
|
|
Windows. Microsoft. Bad design choices. Legacy technology. |
|
|
What part of this don't you understand ? |
|
|
You forgot "Ironic names" |
|
|
If he has to ask the question, best not to overload him with
information; he's clearly only got a little brane. |
|
|
//Do you know if this is still true in Windows 8.1?// |
|
|
Yes. Windows 8.1 just automatically defrags your drive
for you once every week or so. |
|
|
Waste of time totally useless idea. Not needed and never will be, because: |
|
|
1. Drives are super dirt cheap now. And you can get terabytes of cloud storage for nothing. There is no excuse for running out of space. Go buy another drive, cheapskate. |
|
|
2. While you're at it, make it a solid state drive, which eliminates the fragmentation problem. |
|
|
I was disappointed that frag grenades were not involved in some way. |
|
|
What everybody said, as long as it was about how Unix/Linux is so much better than Microsmurf stuff.... |
|
|
Same thing happened to me. It was an out of control virus checker gobbling space no matter how much was freed. You are not alone. |
|
|
//You are trying to solve a problem that no modern file system has, with the exception of ones used by Microsoft operating systems.// |
|
|
You are trying to solve a problem that every modern file system has already solved, with the exception of ones used by Microsoft operating systems. |
|
|
[ytk], I misstated what I meant in a couple places.
I was talking about minimizing QUANTITIES of
holes, by consolidating a small number of them
each time some file outgrew its existing location.
Also, I mentioned something about getting a new
hard disk, but neglected to specify that it would
be the OLD disk that still had 10% of its space
available. |
|
|
[8th of 7] and others, OK, so the TITLE of this Idea is WKTE. But if the way those existing systems do that thing is different from the proposal here, then there is no reason to delete this. |
|
|
//[ytk], I misstated what I meant in a couple places. I was talking about
minimizing QUANTITIES of holes, by consolidating a small number of
them each time some file outgrew its existing location.// |
|
|
I got that part. I'm saying that it's exactly the wrong strategy. The ideal
situation is to have a hole after each file. The notion that you're
wasting disk space by doing so is incorrect. The amount of disk space is
fixed. If you're not using the extra empty space on the disk as
breathing room, you're wasting it. |
|
|
Look at it this way. If we're trying to minimize the number of holes on
the drive, the optimum quantity for holes is obviously 1. So let's say we
have a bunch of files arranged contiguously that fill up the first 10% of a
1TB drive, leaving a single gap of 900GB at the end. Now, let's say a file
in the first 5% of the drive grows beyond the capacity assigned to it. In
order to optimize the drive according to your scheme, you need to:
1) Read the entire contents of the file
2) Write the modified file to a scratch area at the end of the disk, and
mark the original file location as empty
3) Find a file (or group of files) that fits into the gap, and read its
contents
4) Write those files to the empty hole, and mark the old location of the
files as empty
5) Read the contents of the original file, which is now in the scratch
space
6) Write the contents of that file to the end of the contiguous group of
files, and mark the scratch space empty.
Okay, so far just growing a single file involves a MINIMUM of three read
and three write operations. But wait! Unless the file that was moved
into the gap left by the original file happened to be the last file on the
disk, we now have a gap open where that file previously was. Okay, loop
back to step 3, and repeat until all of the files are optimized. So a write
operation to a single file at the head of the disk could potentially
involve moving *every single file on the disk*, and on average will
require moving roughly *half* of them. The only alternative is to ignore
these small gaps, but then you're back to your original problem of
increasing the likelihood of fragmentation as more files are added to the
disk. Or you could just not write to these small gaps, but then you really
are wasting space.
And what if a file shrinks, or is deleted? Then you have the same
problem. You're left with a tiny gap, and you have to start juggling files
around to fill it. If you delete the first file on the drive, hope you didn't
have anything to do for the next 6 hours or so while we consolidate gaps
for you.
Compare this to the process of growing a single file where files are
spread out evenly on the drive, so every file has a maximum amount of
space to grow:
1) Expand the file.
2) If the file needs to expand beyond the available gap, read the file that's in the way, and put it in the middle of the next largest gap. Repeat from step 1 as necessary (a rough sketch of this policy follows below).
|
|
|
The minimum number of operations is a single write, and the average
number of operations will depend on the size of the files you're
generally writing, but for the vast majority of use cases will be fairly
low until the drive is almost full. |
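That two-step policy might be sketched like this, modelling the disk as a plain array of segment owners (None = free). Only the rule itself -- grow in place, and push whatever is in the way into the middle of the biggest gap -- comes from the steps above; the array model, names, and the give-up conditions are assumptions:

def gaps(disk):
    """Return (start, length) runs of free (None) segments."""
    out, i = [], 0
    while i < len(disk):
        if disk[i] is None:
            j = i
            while j < len(disk) and disk[j] is None:
                j += 1
            out.append((i, j - i))
            i = j
        else:
            i += 1
    return out

def extent(disk, name):
    """(start, length) of a file; files are never split in this model."""
    idxs = [i for i, owner in enumerate(disk) if owner == name]
    return idxs[0], len(idxs)

def grow(disk, name, new_length):
    while True:
        start, length = extent(disk, name)
        end = start + length
        # Step 1: expand into whatever free space immediately follows the file.
        while length < new_length and end < len(disk) and disk[end] is None:
            disk[end] = name
            end, length = end + 1, length + 1
        if length == new_length:
            return
        if end == len(disk):
            raise OSError("no room to grow")
        # Step 2: a file is in the way -- move it to the middle of the biggest
        # gap (nudged along if that middle is right back on the file's doorstep).
        blocker = disk[end]
        b_start, b_len = extent(disk, blocker)          # b_start == end here
        for i in range(b_start, b_start + b_len):
            disk[i] = None
        gap = max(gaps(disk), key=lambda g: g[1])
        new_start = gap[0] + (gap[1] - b_len) // 2
        if new_start == end:
            new_start = gap[0] + gap[1] - b_len
        if new_start == end:                            # nowhere else to put it
            for i in range(b_start, b_start + b_len):
                disk[i] = blocker                       # put it back and give up
            raise OSError("no room to grow")
        for i in range(new_start, new_start + b_len):
            disk[i] = blocker

disk = ["a"] * 3 + [None] * 2 + ["b"] * 2 + [None] * 9
grow(disk, "a", 7)
print(disk)   # "a" now fills segments 0..6; "b" was pushed into the middle of the big gap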
|
|
permanent vs ephemeral (existence), static vs dynamic (size). |
|
|
Basically you want 4 partitions/drives to roughly separate chaff from wheat. |
|
|
C: OS, Program files.
D: Ephemerals: swap, spooler, internet cache, downloads, system restore points, logfiles, etc.
E: Personal: personal files. You can move "My Documents" here.
F: Media: static files which aren't being changed or deleted. This could include the mp3 collection, Shakespeare's works in .pdf, etc. etc. |
|
|
M$ won't let you do it properly; there will still be loose dynamic files in C:, but it keeps fragmentation down to almost nothing. |
|
|
D: doesn't need to be defragmented - all the files there are only going to be read a couple times then deleted anyways. |
|
|
E: won't be too fragmented; depends on what you do. |
|
|
F: will never be fragmented so it never needs to be defragmented. |
|
|
If you have enough RAM you can ditch the swapfile and internet-cache completely and move all the per-session temp files to a ramdisk. |
|
|
[ytk], the main text here only specified moving 3
or 4 different files around, when it happens that
the first one outgrew its space. I also specified a
special table to record file-sizes and such, so no
need to scan the whole disk every time some file
gets too big; all the needed data is already in that
special table. I was relying on statistics to keep
the total number of holes to a minimum. So, sure,
I knew that some performance would be lost in
moving the 3 or 4 files around. But ONLY when
one file gets too big. Currently, with lots of
fragmented files, you almost always have a
performance hit. I think it is a fair trade, to
accept an occasional performance hit in exchange
for having no performance hit most of the time. |
|
|
Also, you seem to have missed the point that
some files almost never change their size. If you
download a .PDF for reference purposes, you are
not planning on editing it. Therefore its size
should never change. What does it need a hole in
the filing system for? Lots of files are in that
category, of not needing any extra space (beyond
the typical bit of space in a single storage-
segment, only partly used by the last part of the
file). |
|
|
So, in the scheme I proposed, files that never
need to grow will STAY contiguous and stay
unassociated with special holes in the filing
system. Files that do need to grow will probably
be associated with holes, one way or another. |
|
|
So what do you do when you're downloading a file that you've no clue what size it is?
|
|
The whole discussion is blinkered by the physical limitations of your
current rotating magnetic media, which is not "direct access" but
"pseudo-direct semi-sequential indexed access with variable access
time". |
|
|
And as such, is about as useful as a detailed dissertation on improved
methods of flint-knapping*, and therefore of interest only to
experimental archaeologists, survivalists, and owners of muzzle-
loading flintlock firearms. |
|
|
*Officially, flint-knapping is the world's oldest profession, which has
led to some rather embarrassing misunderstandings. |
|
|
// you can get terabytes of cloud storage for nothing // |
|
|
If you live in a place with high speed internet access. Too
often these days I run up against tech that has clearly been
designed by people who assume the entire world has
overlapping 4G wireless bubbles and fiberoptic landlines. |
|
|
I'm going back to storing the zeros and ones perpendicular to the surface of whatever the storage medium is; it's the most logical solution. |
|
|
[8th of 7] I do think that if you look into zoology far enough, you can find a number of examples of males offering females tasty morsels in exchange for mating privileges. No flint-knapping needed. |
|
|
[Flying Toaster], at least one browser appears to
download a "temporary file" and doesn't save the
file to the official download location until it is fully
downloaded. That makes some sense, in case the
connection breaks during the download. And of
course if the download succeeds, then its full file-
size would be known at that time.... |
|
|
//[ytk], the main text here only specified moving 3 or 4 different files around, when it
happens that the first one outgrew its space.// |
|
|
That doesn't make any sense. You can't have it both ways: either you're moving all the files around to fill all the gaps, or you're leaving the gaps in place, leading eventually to fragmentation. |
|
|
//I also specified a special table to record file-sizes and such, so no need to scan the
whole disk every time some file gets too big; all the needed data is already in that
special table.// |
|
|
This doesn't mean anything. Of course there's a special table that specifies where the holes are; that's how the file system finds the data it's looking for. You don't need a special table just to list the holes; you look at the table of contents for where there's empty space. The notion of a special table for keeping track of where there *isn't* data is completely redundant. |
|
|
I never said anything about scanning the whole disk. Scanning the disk isn't an issue.
The problem has nothing to do with finding where the empty space is on the disk, and
everything to do with consolidating the empty space. The only way to do that is by
rewriting entire files. |
|
|
//So, sure, I knew that some performance would be lost in moving the 3 or 4 files
around. But ONLY when one file gets too big.// |
|
|
Or when you delete a file. Guess what's left in that space? A gap. And how do you fill that gap? With another file. And what do you put in the gap left by THAT file? Yet another file. And so on until the disk is completely optimized. Repeat for the entire disk any time a gap opens up, i.e., whenever a file grows, shrinks, or is deleted. |
|
|
//Currently, with lots of fragmented files, you almost always have a performance hit. I
think it is a fair trade, to accept an occasional performance hit in exchange for having
no performance hit most of the time.// |
|
|
Except you'll have a *huge* performance hit every time you expand a file, or delete a
file, or do anything that makes a gap. Your disk will be thrashing like crazy trying to
move stuff around in an obsessive-compulsive attempt to stuff all of the data
contiguously at the head of the disk. |
|
|
//Also, you seem to have missed the point that some files almost never change their
size.// |
|
|
But some do, and the system has no real way of knowing in advance which ones will
and which ones won't. Better to leave as much space as possible for every file, in case
it does grow. Then you only need to start moving data around for files that grow larger
than their allotted space. And as you add files to the disk, the allotted space for each
file decreases. |
|
|
//If you download a .PDF for reference purposes, you are not planning on editing it.
Therefore its size should never change.// |
|
|
Unless you delete it, then its size changes to zero. |
|
|
//What does it need a hole in the filing system for?// |
|
|
Because if you don't need the space, eventually you'll be able to stick another file in
there. And if you delete the file, you don't need to move anything around to replace it
with a larger file without fragmenting. |
|
|
//Files that do need to grow will probably be associated with holes, one way or
another.// |
|
|
You never specified any way to determine which files will need to grow in the future.
How is the system supposed to know that? |
|
|
Anyway, why is it important to keep free space contiguous? If you have a 1TB disk
that's 10% full, you're not going to suddenly write an 800 GB file to it. So there's no
reason to have your free space contiguous. But you might well write a bunch of smaller
files, and if your 100GB of data is spread around the drive, you'll have no trouble
finding places to put these files. And as the drive fills up, if you have a bunch of gaps
it's easy to move smaller files into the gaps between other files to clear up a
contiguous zone of free space on an as-needed basis. |
|
|
I'm sorry, but this is just a completely unworkable idea. The only reason it's not MFD
WKTE is because it's so demonstrably flawed that nobody would ever implement this.
Even Microsoft wouldn't do this, and that's saying something. |
|
|
[V] Regarding downloading: if both Temp (where the temp file is located) and the end-location (where you told it you want it to go) directories are on the same disk, the physical location remains in the same place: the filesystem simply jiggers the FAT so the file appears in the new directory while disappearing from the temp directory. |
|
|
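A tiny sketch of that point: when both directories live on the same file system, the "move" is just a directory-entry update, and none of the file's data segments are rewritten. The dict below stands in for the directory:

directory = {"/temp/part.tmp": 7}            # name -> first segment of its chain

def move(old_path, new_path):
    directory[new_path] = directory.pop(old_path)   # data stays at segment 7

move("/temp/part.tmp", "/downloads/file.pdf")
print(directory)   # {'/downloads/file.pdf': 7}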
By contrast your system would download the file into the biggest hole then move it to the best fit location after the download was finished. |
|
|
I wonder if a pattern would appear on-disk after a while of using your system. |
|
|
[ytk], what I'm talking about DOES make sense. When I last looked at the details of a FAT, I noticed that it didn't care how long a file was. Each file was associated with a start-location in the FAT, and the table itself simply recorded the next "block" where the file continued, with a flag-bit indicating when the last block was reached. |
|
|
So, that second special table I mentioned WOULD
BE primarily a table of file-lengths and the amount
of FAT associated with each file (including any
extra unused data-segments, --holes, that is). It
means that when a big hole opens up in the FAT,
like when a file is deleted, it can be pretty easy to
find a file that could fit into the hole, using the
second table. |
|
|
I specifically indicated that we want to seek a file-
that-fits that happens to already be associated
with a hole. That means when the file is moved,
the available hole becomes bigger than before,
when the original file was deleted. If you can do
this 2 or 3 times, then you will have consolidated
multiple holes, and consequently the total number
of holes in the FAT will have gone down, after the
file was deleted and those few files got moved
around. |
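The "second special table" as described here might be sketched as a size-keyed index, so that when a hole of a given size opens up, a file that fits can be found without scanning the disk; the sorted-list-plus-bisect representation is an assumption:

import bisect

class SizeIndex:
    def __init__(self):
        self._entries = []                      # sorted list of (alloc_len, name)

    def add(self, name, alloc_len):
        bisect.insort(self._entries, (alloc_len, name))

    def remove(self, name, alloc_len):
        self._entries.remove((alloc_len, name))

    def best_fit(self, hole_len):
        """Largest file whose allocated run still fits into the hole (or None)."""
        i = bisect.bisect_right(self._entries, (hole_len, chr(0x10FFFF)))
        return self._entries[i - 1][1] if i else None

idx = SizeIndex()
idx.add("a", 3); idx.add("b", 7); idx.add("c", 5)
print(idx.best_fit(6))   # 'c' -- the 5-segment file fits a 6-segment hole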
|
|
If this is done every time a file is deleted, or
moved because of being too big to fit after the
latest edit,
then certainly the FAT will acquire some holes.
But STATISTICALLY, in the long run, holes will be
consolidated about as often as they get created.
The net result is a fairly small and relatively
constant number of holes. Should you ever decide
to do a "full defrag", it shouldn't take a huge
amount of time, certainly not when compared to
what I wrote at the start of the main text here! |
|
|
Remember the Title of this Idea. It is about
keeping files non-fragmented. It is also about
preventing lots of holes, NOT so much about
ensuring their number is always Zero. I think that
what I've described would work to accomplish
what I set out to accomplish. |
|
|
You keep talking about things like a "need" for a
vast number of holes, or a "need" to waste vast
amounts of system performance to prevent all
holes, and I've been describing a compromise with
regard to holes, while staying focused on the main
goal of keeping each file contiguous (with respect
to itself) on the disk, to maximize the
performance associated with file-loads and saves. |
|
|
[Flying Toaster], you may be right about the temp
file, but it may depend on the browser and the
computer. It might be saved in RAM, for example,
until the download is finished. |
|
|
//[ytk], what I'm talking about DOES make sense.// |
|
|
No. It makes no sense to keep a redundant table of
where files *aren't*, when you have a perfectly good
table of where files *are*. |
|
|
You're clearly not a programmer, or you'd realize that
what you're describing is a classic anti-pattern. You
*never* want to have two different sources describing
the same set of data, because then you face the
problem of determining which one is authoritative.
What if the file allocation table gets updated, but the
free space allocation table doesn't for some reason?
Say the power gets cut after updating one table but
not the other. Well, then you're looking at potential
data loss when the empty space table says go ahead
and write there but the file allocation table says
there's a file there. So if the solution is to verify
each table against the other, then you're faced with
another problem: which table is correct? Okay, for
safety you could say the file allocation table must
always be correct, in which case why bother with the
free space table at all? If every lookup in the free
space table has to be verified against the allocation
table, then the free space table does you no good. |
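A small sketch of that argument: the free-space information is fully derivable from the allocation table, so a second table can only ever restate (or contradict) it. The extent model below is an assumption for illustration:

def free_extents(alloc, disk_size):
    """alloc: name -> (start, length).  Returns the holes implied by it."""
    used = sorted(alloc.values())
    holes, cursor = [], 0
    for start, length in used:
        if start > cursor:
            holes.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < disk_size:
        holes.append((cursor, disk_size - cursor))
    return holes

alloc = {"a": (0, 3), "b": (5, 2), "c": (9, 4)}
print(free_extents(alloc, disk_size=16))   # [(3, 2), (7, 2), (13, 3)]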
|
|
//When I last looked at the details of a FAT, I noticed
that it didn't care how long a file was. Each file was
associated with a start-location in the FAT, and the
table itself simply recorded the next "block" where the
file continued, with a flag-bit indicating when the last block was reached.//
|
|
FAT is an ancient technology. Even Microsoft doesn't
use FAT anymore, except to maintain backwards
compatibility. And it's for this reason that it's futile to
attempt to improve on the FAT filesystem; any improvement would be incompatible. If you want to
eliminate the problem of file fragmentation, you have
to switch to another filesystem that's not compatible
with FAT. |
|
|
You still haven't explained *why* you want to get rid of
holes. You can't just assert that holes lead to
fragmentation, because they simply don't. The proof
of this is
that NO MODERN FILESYSTEM HAS A PROBLEM WITH
FRAGMENTATION. |
|
|
//You're clearly not a programmer, or you'd realize that what you're describing is a classic anti-pattern. You *never* want to have two different sources describing the same set of data, because then you face the problem of determining which one is authoritative.// |
|
|
Um, never say never.
In fact, it's pretty common in various forms. How you cope with potential disagreements varies wildly depending on the case in hand.
"All programming is an exercise in caching."
-Terje Mathisen (apparently) |
|
|
As a related example, I remember reading a description of a memory allocation system which stored the allocated blocks (as it must), but also stored details about the empty space in the empty blocks themselves. |
|
|
For what it's worth, I think you're arguing over a difference in terminology. ytk is saying there should be gaps between files. Vernon is saying that each file should be written as a contiguous block.
I think you're both mostly right. |
|
|
[ytk], if what I've proposed counts as a fix for
something that is obsolete and has been replaced by
things that work differently from this proposal, then
obviously the thing I proposed counts, perfectly, as
Half-Baked! |
|
| |