Computer: Storage: Drive
layout optimisation for filesystems   (+2, -1)
lay out on-disk data structures for faster traversal

Despite the dropping price of flash, most computers still use a hard disk for storing files. Hard disks are kind of slow, though: reading a randomly chosen block can take 5-10 milliseconds.

The problem is that hard disks aren't random access. There's a spinning platter and a moving arm. To read or write a block of data at some position on the platter, the arm has to move to the right offset and the disk has to spin round until the sliver of data passes under the read head. When reading or writing blocks one after another, my current disk can process 50,000 per second. When reading blocks randomly, it can barely manage 100 per second.
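Those two numbers fall straight out of the mechanics. A back-of-the-envelope model in Python (the RPM, seek time and blocks-per-track figures are assumptions for illustration, not measurements of any particular drive):

```python
# Back-of-the-envelope model of random vs sequential hard-disk
# throughput. All numbers are assumed, not measured from a real drive.
RPM = 7200
avg_seek_ms = 5.0                        # typical average seek time
rotation_ms = 60_000 / RPM               # ~8.33 ms per revolution

# Random read: seek, then wait half a revolution on average.
random_io_ms = avg_seek_ms + rotation_ms / 2
random_iops = 1000 / random_io_ms        # roughly 100 per second

# Sequential read: blocks stream under the head, a full track per turn.
blocks_per_track = 500                   # assumed; varies by zone
sequential_blocks_per_s = blocks_per_track * 1000 / rotation_ms

print(round(random_iops), round(sequential_blocks_per_s))
```

With these assumed figures the model lands at about 109 random reads per second versus about 60,000 sequential ones: the same two-orders-of-magnitude gap as the real drive.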

Modern filesystems are built mostly with trees. Btrfs (the B-tree filesystem) is even named after the kind of tree it uses internally. A directory, for example, has a start block. For small directories, all the data about the files might fit into that single block. For a large directory, the data has to be split up and stored across other blocks.

When your computer looks for a file in that directory, it will read the start block, figure out which of the linked blocks has the file information and then fetch that block. It might have to go another step and get another block if the directory is really big.
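That lookup can be sketched as a walk over block-sized tree nodes, one (simulated) disk read per level. The node layout here is a hypothetical toy, not the real Btrfs on-disk format:

```python
# Toy directory lookup: each tree node fits in one disk block and is
# fetched by block number. Hypothetical structures, not the real
# Btrfs on-disk format.
blocks = {
    0: {"keys": ["m"], "children": [1, 2]},           # root block
    1: {"leaf": {"apple.txt": 101, "cat.jpg": 102}},  # names < "m"
    2: {"leaf": {"music.mp3": 103, "zebra.png": 104}},
}

def lookup(name, block_id=0):
    """Return (inode, blocks_read) for a file name, walking the tree."""
    reads = 0
    while True:
        node = blocks[block_id]      # one block read from the disk
        reads += 1
        if "leaf" in node:
            return node["leaf"].get(name), reads
        # choose the child block whose key range covers the name
        i = sum(1 for k in node["keys"] if name >= k)
        block_id = node["children"][i]

print(lookup("music.mp3"))           # found after 2 block reads
```

Each extra level costs one more block read, and on a disk each of those reads can be a full seek, which is exactly the waiting time the idea wants to shrink.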

Trees are used for lots of other things in file systems and navigating them quickly is important to cut down on waiting time.

Since trees are usually traversed top-down, following links in one block to find the next, we can lay the data out on the platter so that, after reading a block, the blocks we might want next are almost under the read head. The block gets read, the computer quickly sends a request for one of the next blocks in the tree, and the hard disk moves the head a little bit and fetches that block really fast. Repeat for however many levels there are in the tree.
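One simple placement that gets this effect is breadth-first order, so a block's children land close behind it along the spiral (a toy sketch, assuming traversal goes level by level from the root; the tree shape is made up for illustration):

```python
# Toy layout pass: assign platter slots in breadth-first order, so a
# block's children land close behind it in read order.
from collections import deque

def layout_bfs(children, root):
    """Map each node to a slot number along the platter spiral."""
    order, queue = {}, deque([root])
    while queue:
        node = queue.popleft()
        order[node] = len(order)     # next free slot on the spiral
        queue.extend(children.get(node, []))
    return order

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
print(layout_bfs(tree, "root"))
# root gets slot 0, its children slots 1-2, grandchildren slots 3-5
```

Whatever order is chosen, the point is the same: after reading a parent, every candidate child is only a short rotation away rather than a full seek.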

The only barrier to doing this today is figuring out how hard disks lay out their blocks. This can be done by requesting pairs of blocks and measuring how long passes between the drive returning one and returning the other. Repeat for lots of blocks until the way the drive has them laid out can be mapped. It's going to be some variation of a spiral.
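A sketch of that probing, with the drive replaced by a simulated single-track geometry. On real hardware, `sim_read_time` would be an actual timed read (raw device access and a monotonic clock), but the inference step is the same:

```python
# Simulated probe of a drive's block layout. sim_read_time stands in
# for a real timed read at a block offset; the single-track geometry
# below is assumed for the simulation.
ROTATION_MS = 60_000 / 7200    # one revolution at 7200 RPM
BLOCKS_PER_TRACK = 100         # assumed geometry

def sim_read_time(prev_block, block):
    """ms until `block` passes the head right after reading `prev_block`."""
    gap = (block - prev_block - 1) % BLOCKS_PER_TRACK
    return gap * ROTATION_MS / BLOCKS_PER_TRACK

# The block with the smallest follow-on latency is block 0's physical
# successor on the platter; repeating this maps out the whole layout.
latencies = {b: sim_read_time(0, b) for b in range(1, 50)}
successor = min(latencies, key=latencies.get)
print(successor)               # block 1 follows block 0 in this model
```

Real drives complicate this with zones, remapped sectors and on-board caches, so the probe would need many repeated timings per pair to average out the noise.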

The end result would be a faster filesystem. And more performance for the software engineers to waste.
-- RichardT, Jun 08 2017

Hard disk block layout http://hddscan.com/...acks_and_Zones.html
how data is physically stored on disk [RichardT, Jun 08 2017]

Other optimisations http://pages.cs.wis.../OSTEP/file-ffs.pdf
Locality and The Fast File System [RichardT, Jun 08 2017]

Hmm ... laborious optimisation of a dying technology; [+]
-- pertinax, Jun 08 2017


Knowing the file and directory data's physical measurements before the data is generated will help this optimization no end.
-- wjt, Jun 10 2017


Thanks for posting an actual idea!

Instead of a read-write head you could have a read-write asterisk * so the reading and writing would be multiparallel on the disk. I am sure they have thought about that, yet it seems a potentially cheap way to speed up hard disks 8x
-- beanangel, Jun 10 2017


//an actual use for it//

Wikipedia tells me it's all about scalability, so ...

Oh, wait. You were probably joking. Carry on.
-- pertinax, Jun 10 2017


// Modern filesystems are built mostly with trees. //

That's interesting. I thought they were built out of metal and plastic.
-- Cuit_au_Four, Jun 10 2017
