Disk-level filing system   (+11, -2)
Run the filing system from inside the hard disk.

If you don't have the stamina for technical spiel, go elsewhere now. This is gonna be a long one.

I propose that the software for managing the filing system should run directly on the processor inside a hard disk, rather than as part of the operating system.

Details: A hard disk presents its data as a single flat space of numbered blocks. The filing system is the operating system's way of organizing information within that space, and the software for managing the filing system is stored in main memory and run on the CPU.

Hard disks already have some internal processing power, used to cache the data being read and written and to provide a small measure of safety from power outages. I suggest that if this power were expanded, it could be used to run the filing system as well.

Why would you want to do this? Well, the CPU and hard disk sit at opposite ends of a chain of interfaces, including the North Bridge, South Bridge and the IDE or SCSI controller. When you're managing an advanced filing system, a lot of information needs to be passed back and forth. Not only is this pipeline relatively slow, it also gets in the way of any other information being sent through the system.

Right now, there are dozens (if not hundreds) of filing systems in existence. Many of them duplicate each other's work on different operating systems, and many of them are out of date. We are reaching the stage where, with a little hard work and intelligence, we could settle on the best filing system out there, probably a hybrid of HFS+, BeFS and NTFS (sorry Linux, ext2fs just doesn't cut it). This ideal fs will include features like journalling, metadata and unique file IDs. It will probably take quite complex software to run, though, meaning that small and innovative operating systems will be unable to use its wonderful features.

If the filing system were managed by the hard disk, then any OS would be able to use it. It would be invulnerable to any blue screens of death in the main OS, and its performance would be guaranteed independent of CPU load.

How would this work? Start with these ingredients:
- a hard disk internal controller with a few more MHz of processor power, and a few more k of memory
- either some flash memory in the hard disk casing, or simply a few k of space at the start of the disk
- an agreed, unified protocol for accessing files and folders on a 'remote' filing system.

The software for running the filing system is kept either on a flashable memory chip in the hard disk, or in a reserved space at the start of the disk. A flash chip is faster and more reliable, and doesn't need to be loaded into the disk's RAM store, but it is also more expensive.

This software can be updated by the CPU whenever the disk is reformatted, but will generally not change very often. Its job is to manage all the details of the filing system, including journalling, caching and defragmentation. It presents a protocol to the CPU for reading and writing files, and manipulating the directory structure and file metadata (including forks if necessary).
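
To make this concrete, here is a rough sketch (in C) of the kind of command set the drive might present. Every name and field layout below is invented purely for illustration; the idea only requires that some standard set of file-level commands replace raw block addresses.

```c
/* Hypothetical file-level protocol -- all names invented for illustration. */
#include <stdint.h>

/* Commands the host sends to the drive, in place of raw block addresses. */
enum fs_opcode {
    FS_OPEN,     /* resolve a path to a file handle       */
    FS_READ,     /* read bytes from an open handle        */
    FS_WRITE,    /* write bytes to an open handle         */
    FS_CLOSE,    /* release a handle                      */
    FS_LIST_DIR, /* enumerate a directory                 */
    FS_GET_META, /* fetch metadata (and forks if present) */
    FS_SET_META  /* update metadata                       */
};

/* One request packet. The drive's own firmware, not the OS, turns this
 * into journal entries, cache updates and physical sector operations. */
struct fs_request {
    uint8_t  opcode;    /* one of enum fs_opcode                  */
    uint32_t handle;    /* file handle; 0 for path-based commands */
    uint64_t offset;    /* byte offset within the file            */
    uint32_t length;    /* bytes to transfer, or path length      */
    uint8_t  payload[]; /* path, file data, or metadata record    */
};

/* The matching reply. */
struct fs_reply {
    uint8_t  status;    /* 0 = OK, otherwise an error code        */
    uint32_t handle;    /* returned by FS_OPEN                    */
    uint32_t length;    /* bytes actually transferred             */
    uint8_t  payload[]; /* file data or directory entries         */
};
```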

As a result of this move, far less information travels over the connection between the CPU and the hard disk, ensuring that it doesn't clog up the motherboard for all the other components that have to use the same channels, such as main memory, the network and other disks.

If you wished to bypass this, you could always revert to the existing system, where the filing system is managed by the OS. A similar approach could also be used for other devices like CD and DVD drives, since most of their formats are now well established.
-- sadie, Sep 08 2002

Ethernet as Bus Architecture http://www.halfbake...0Bus_20Architecture
Fits in well with this [sadie, Sep 08 2002, last modified Oct 05 2004]

NetApp http://www.netapp.com/
These guys, among plenty of others, built their business making roughly the product you describe (on a larger scale). [egnor, Sep 15 2002, last modified Oct 05 2004]

Would this mean that a layer of shared filespace would also go the way?
-- reensure, Sep 08 2002


Take it further . . . disaggregate the whole computer.
-- bristolz, Sep 08 2002


i was thinking of a network card that talked IP and zeroconf. But of course, TCP or UDP would still need to be managed (or at least configured) by the OS.
-- sadie, Sep 08 2002


(er, what do you mean, reensure?)
-- sadie, Sep 08 2002


(is it my imagination, or did an annotation by General something or other just vanish?)
-- sadie, Sep 08 2002


[phoenix]'s Law: Intelligence tends to move toward the periphery.

This idea will happen one day.
-- phoenix, Sep 09 2002


So, something like a file system API would be involved, where the OS would say 'Store this file in this folder' rather than 'put these bytes here'?

Sony inched a little in that direction with the MD Data drive* where the drive defined the file structure, not the OS.

*withdrawn because it didn't sell well -> didn't sell well because they crippled it to protect their music industry (and I'm not bitter about this, honest)
-- st3f, Sep 09 2002


[phoenix], I'm not so certain about that. The cost pressure of the computing business would tend to push for the elimination of dedicated hardware and the commonization of as many functions as possible onto one chip. Sure, you could build this thing, but unless it presented a significant system speed advantage to the user, it would never match the price competition.

Rayford's corollary to phoenix's law: business tends to move towards the center (and consequently away from intelligence).
-- RayfordSteele, Sep 09 2002


//hard disk internal controller with a few more MHz of processor power//

Finally, a use for all those leftover Pentium 166 chips!
-- Mr Burns, Sep 09 2002


I'm with [phoenix] (how heavenly) in that I think computing at the edge is where it's going to be.
-- bristolz, Sep 09 2002


Essentially what you're seeking is equivalent to storing your files on a file server separate from the machine on which you're running most of the software, except without the need for another computer. There are some definite advantages to such a system, though many of the most significant ones require the file server to have its own keyboard and screen. For example, the ability to have the file server keep access logs that cannot be hacked except from the file server's keyboard is great, but unless such logs can be viewed in a manner that cannot be hacked or corrupted externally, it's of little benefit.
-- supercat, Sep 09 2002


What I meant, [sadie], was very simple as I see it. File management is not usually farmed out to an auxiliary processor, as might be the case with some processing tasks (domain name management is one example I'm aware of). If it were, and if all users' files were indexed and logged on the drive, the operating system would have relatively more resources available to manage other things like active pages or CGI transactions.

Advantages:
1. Better use of bandwidth
1a. OS managed files become disk managed files.
1b. Background services move back further.
1c. Very heavy caching on the drive to mitigate an OS crash.

2. What [supercat] said

3. Threads would not be held waiting on a disk action

Disadvantages:
1. Compatibility, scalability, acceptability.
2. Means outrunning parallel processing on its home turf.
3. May be already baked in the form of hush-hush advanced embedded chipsets.

Don't expect any brilliant insight from me here, for my knowledge of a system “under the hood” is limited; nonetheless, I do see great potential in the configuration you're describing.
-- reensure, Sep 10 2002


// So, something like a file system API would be involved, where the OS would say 'Store this file in this folder' rather than 'put these bytes here'? //

Exactly. The problem, as pointed out by several people, is that the protocol would need to be agreed on, and it would also need to be 'flat' and unbiased, i.e. not corrupted with silly shit like DRM (which is really a higher OS function anyway, even if you are stupid enough to want it applied to your music).

This idea will also allow for a much greater variety of filing systems to be developed, in drive-specific ways (such as storing specific files on the first platter, maybe), since nothing outside the drive need have any knowledge of the system.
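
For illustration, the host's side of such a protocol might look like the sketch below. drive_open() and drive_read() are made-up stand-ins for the agreed commands, stubbed out here so the sketch compiles and runs; the thing to notice is that the OS never mentions a sector number or an on-disk structure.

```c
/* Host-side view of a hypothetical file-level protocol. The two
 * drive_* functions are invented; a real driver would marshal them
 * into request packets and send them down the IDE/SCSI bus. */
#include <stdio.h>
#include <string.h>

/* Stub transport: pretends to be the drive answering the commands. */
static int drive_open(const char *path) {
    (void)path;
    return 1; /* pretend file handle */
}

static int drive_read(int handle, char *buf, int len) {
    (void)handle;
    const char *fake = "hello from the drive";
    int n = (int)strlen(fake);
    if (n > len) n = len;
    memcpy(buf, fake, (size_t)n);
    return n;
}

int main(void) {
    char buf[64];
    int h = drive_open("/docs/report.txt");     /* one round trip */
    int n = drive_read(h, buf, (int)sizeof buf - 1); /* one more  */
    buf[n] = '\0';
    printf("%s\n", buf);
    return 0; /* no superblocks, FATs or inodes in sight */
}
```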

[supercat]: // equivalent to storing your files on a file server... most advantages require the file server to have its own keyboard and screen //

Why should the server have a keyboard and mouse? All the necessary information is available through a public API, and I suspect there will be a private API if it's needed for accessing things that users shouldn't be allowed to access.

The biggest source of errors nowadays is the user. A component that is isolated from all of that will become inherently more stable and safe.

// 2. Means outrunning parallel processing on its home turf. //

Parallel processing requires a lot of work to manage the interaction of threads between the two processors, make sure memory access is always done safely, etc. This has the effects of:
- software needs to be rewritten to take advantage of the processors, and is more complicated to write
- a performance hit, especially for software that hasn't been written all that well (for example, the standard JVM for Windows will only execute on a single processor, even though the Solaris one is properly multi-processed)
- software that is incompatible with the multi-processing version of the OS (such as all games from Westwood).

I do think that widespread multi-processing will eventually be the way to go, but in either case this idea doesn't have any of those problems. Software doesn't become more complicated, it becomes simpler, because the pipeline for communication is small and clearly defined. And there's no reason a parallel-processing machine couldn't happily co-exist with this.

// May be already baked in the form of hush-hush advanced embedded chipsets. //

If nobody is selling it or will admit to its existence then, as far as the consumer space is concerned, it's not baked.
-- sadie, Sep 10 2002


This seems like adding a middle manager to the computer. Instead of the OS telling the HD what to do, this OS tells that OS what to do. I don't really see where anything would be saved by this.

<I'm seeing this as the old Windows Accelerator video cards, the ones that were supposed to speed <certain versions of> Windows up by containing the code for video access in memory on the card and working on it there, rather than working through the CPU. They never worked really well...>
-- StarChaser, Sep 11 2002


In a way, one piece of software already tells another piece of software what to do. This simply distributes the work load more sensibly. What you save is the long round trips that need to be made.

Imagine your boss asked you to write a document, but that he really wanted to write it himself. So every time you write something, you have to go out of your office, down the hall, up the stairs and into his office to ask him what to say. Then it's back down again to type that bit of the document out, before returning to ask for more. Dumb, isn't it?

This way, the boss (CPU) gives more control to you (the hard disk), meaning less work for both of you.
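
To put rough numbers on it (every figure below is invented; only the ratio matters):

```c
/* Back-of-envelope round-trip comparison; all numbers are made up. */
#include <stdio.h>

int main(void) {
    double trip_us = 50.0; /* assumed cost of one host<->disk command */

    /* Block-level: the OS walks the on-disk structures itself, e.g.
     * directory blocks, an allocation/index block, then each data
     * extent, every one a separate request across the bus. */
    int block_trips = 2  /* directory lookup         */
                    + 1  /* allocation/index block   */
                    + 8; /* data extents, one by one */

    /* File-level: one "read this file" command; the drive's own
     * processor does the walking internally, off the bus. */
    int file_trips = 1;

    printf("block-level: %2d trips, ~%.0f us\n",
           block_trips, block_trips * trip_us);
    printf("file-level:  %2d trips, ~%.0f us\n",
           file_trips, file_trips * trip_us);
    return 0;
}
```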
-- sadie, Sep 12 2002


The advantage of having a separate keyboard/screen for a file server is that it's possible to have all accesses (or at least all writes) to certain key files be logged. Such logging can be very useful for detecting viruses, trojans, and other malware. While such logging could certainly be done even without the separate keyboard and screen, it would be possible for malware to intercept attempts to access the log and filter out traces of its existence from the log reports.
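
A minimal sketch of how that might look inside the drive firmware (names invented throughout): the log lives in a region the host-visible command set simply cannot erase or rewrite.

```c
/* Hypothetical tamper-resistant access log, drive-firmware side. */
#include <stdint.h>

struct access_log_entry {
    uint64_t timestamp; /* drive's own clock, not the host's */
    uint32_t file_id;   /* unique ID of the file touched     */
    uint8_t  operation; /* read / write / delete / metadata  */
};

#define LOG_CAPACITY 4096
static struct access_log_entry log_ring[LOG_CAPACITY];
static uint32_t log_next;

/* Firmware-internal: called on every file operation. */
void log_access(uint64_t now, uint32_t file_id, uint8_t op) {
    struct access_log_entry *e = &log_ring[log_next++ % LOG_CAPACITY];
    e->timestamp = now;
    e->file_id   = file_id;
    e->operation = op;
}

/* Host-visible and read-only. There is deliberately no log_erase()
 * or log_rewrite() command, so host malware can hide an access only
 * by corrupting the *report* of the log -- which is exactly why the
 * viewer would need to live on separate hardware. */
int log_read(uint32_t index, struct access_log_entry *out) {
    if (index >= log_next) return -1; /* out of range */
    *out = log_ring[index % LOG_CAPACITY];
    return 0;
}
```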
-- supercat, Sep 12 2002


[RayfordSteele] "The cost pressure of the computing business would tend to push for the elimination of dedicated hardware and the commonization of as many functions as possible onto one chip."
My point. 'Commonization' turns it into a commodity. Being a commodity makes it cheap. Being cheap allows it to be used for jobs it wasn't intended for. I can remember when a calculator cost >$500. Now, my microwave oven has more processing power. Tomorrow my stapler might be that intelligent.
-- phoenix, Sep 12 2002


Processing power swings back and forth between the center and the periphery. Anyway, if you want to buy a dedicated NFS server appliance, there are lots of companies happy to sell that to you. There are advantages and disadvantages, but they don't scale down very well; they're mostly useful on an organizational scale.
-- egnor, Sep 15 2002


USB and Firewire both offer block-level access to storage devices. The computer manages the filesystem; the device just reads and writes sector number whatever. Likewise for compact flash and pretty much everything else you'd ever plug into a PC or consumer device.
-- egnor, Sep 16 2002


Perhaps a "Snap Server (TM)"? Snap offers something like this in their product. Their servers already contain an O.S. and file system. Their only bottleneck is the Ethernet connection to the connecting computers.
-- namuh, Sep 16 2002


I'm not talking about people who can afford to run a separate file server, and i don't think an ethernet link in the way is going to deliver quite the performance improvements that i envisage.

All i want to do is more logically arrange the workload inside an ordinary consumer machine. I'm talking about the actual hard disk of a computer, the one that your operating system and applications sit on.
-- sadie, Sep 19 2002


I think you greatly exaggerate the impact of the round-trip time for block-level access compared to the round-trip time for filesystem access. These are fast interfaces.

Any real disk drive interface these days has a queueing system, anyway, so any number of outstanding requests can be queued up.
-- egnor, Sep 27 2002


Five years later... off-the-shelf NAS devices that look no larger than external disk drives and talk SMB/NFS/etc. are now commonplace.

retroactive bun.

G
-- gtoal, Aug 17 2007


I've never quite understood why file-level interfaces never caught on, especially with flash drives. Flash drives must be kept defragmented for optimal performance; if a flash drive kept its own file system, it could defragment itself continuously when idle, without imposing any significant overhead on the underlying machine. The only 'cost' would occur if the flash drive began an erase operation just before the PC wanted to read something; in that case, the PC would have to wait for the erasure to complete.

If a flash drive doesn't know what sector read/write requests 'mean' in terms of the overall disk structure, such lack of knowledge will greatly limit its ability to sequence operations effectively, especially if the file system must be kept coherent at all times.
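
A rough sketch of the idle-time housekeeping loop this suggests, as flash firmware might structure it. All names are invented and the stubs are trivial so the sketch compiles; a real flash translation layer would be far more involved.

```c
/* Hypothetical idle-time defragmentation loop in flash firmware. */
#include <stdbool.h>

/* Trivial stand-ins for firmware internals, so the sketch runs. */
static bool host_request_pending(void)  { return false; } /* host idle  */
static int  pick_fragmented_block(void) { return -1; }    /* none left  */
static void relocate_live_data(int block) { (void)block; }
static void erase_block(int block)        { (void)block; }

/* Runs whenever the drive is idle. Because the drive knows its own
 * file layout, it can tidy up continuously; a dumb block device
 * cannot, since it doesn't know which sectors belong to what. */
static void idle_housekeeping(void) {
    while (!host_request_pending()) {
        int block = pick_fragmented_block();
        if (block < 0)
            return;            /* nothing to do: stay quiet      */
        relocate_live_data(block);
        erase_block(block);    /* the only 'cost': a host read   */
                               /* arriving mid-erase must wait   */
    }
}

int main(void) {
    idle_housekeeping();
    return 0;
}
```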
-- supercat, Aug 18 2007


I think I own one of these (LaCie Etherdisk). On the enterprise side, you have EMC^2 arrays, which are standalone computers that serve as very fast redundant storage devices in your data center.

There are disadvantages such as not being able to have decent access control, encryption, etc.

Nonetheless, it's useful...
-- cowtamer, Aug 18 2007


