Halfbakery: Filesystemdb

Filesystemdb (+2)
One system, one credentialling scheme, fast searches and large storage

I don't know how to explain this. I am not an IT professional.

I will half explain it.

Filesystems are a great way to share data and can easily be mounted on all kinds of systems. But their queries are slow, oh so slow. That is probably due to their unstructured hierarchy, the sheer number of individual entries, and the cost of the storage access itself. But if you pair a filesystem with a database, you get the best of both worlds: with clever indexing, filesystems can be searched very fast.
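A minimal sketch of pairing a filesystem with an index database, assuming Python's built-in sqlite3 as the database and a hypothetical one-table schema: walk the tree once, record each file's path, name and size, then answer name lookups from the index instead of re-walking the disk.

```python
import os
import sqlite3


def build_index(root: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Walk `root` once and record every file in a SQLite table.

    The table layout is a hypothetical example, not an existing standard.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, name TEXT, size INTEGER)"
    )
    # The index on `name` is what lets lookups skip the directory walk.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_name ON files(name)")
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            try:
                rows.append((full, fname, os.path.getsize(full)))
            except OSError:
                continue  # vanished file or broken symlink: skip it
    conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def find_by_name(conn: sqlite3.Connection, name: str) -> list:
    """Answer a `find -name`-style lookup from the index instead of the disk."""
    cur = conn.execute("SELECT path FROM files WHERE name = ? ORDER BY path", (name,))
    return [row[0] for row in cur]
```

Usage would be along the lines of `conn = build_index("/usr/tmp")` followed by `find_by_name(conn, "report.txt")` (a hypothetical file name). Keeping the index in step with later writes is the hard part, and this sketch does not attempt it.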

However, the credentialling red tape to access a database is often separate and convoluted once you take into account access tokens, SSO and web APIs. But why, if at the end of the day we are just talking about data storage and retrieval?

So, the idea is to do everything through a filesystem interface that can easily be mounted on all kinds of systems. If you can mount the 'drive', you are guaranteed access to heavy filesystem data plus a fast database.

Normal ls, find, etc. access normal files. But to access the paired database, you ls a special subfolder that exists in every folder.

This would be roughly equivalent to a REST call made with curl or similar.

ls /usr/tmp gets the tmp file list, while

ls -ltr /usr/tmp/.db/searchkey/searchvalue/

would be similar to

select * from /usr/tmp where searchkey=searchvalue and path startswith '/usr/tmp'
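A minimal sketch of how a driver might translate the special `.db` path into a query. The `.db` marker and the key/value layout come from the example above; the `files` table, its columns and the allow-list of keys are hypothetical, with SQLite standing in for the paired database. Keys are checked against the allow-list because SQL column names cannot be bound as parameters.

```python
import sqlite3

# Hypothetical allow-list: the only columns a .db/<key>/<value>/ path may filter on.
# Column names cannot be bound as SQL parameters, so they are checked here instead.
ALLOWED_KEYS = {"owner", "ext", "tag"}


def path_to_query(path: str) -> tuple:
    """Turn '/usr/tmp/.db/<key>/<value>/' into parameterized SQL and its arguments."""
    base, marker, rest = path.partition("/.db/")
    if not marker:
        raise ValueError("not a .db path")
    parts = [p for p in rest.split("/") if p]
    if len(parts) != 2:
        raise ValueError("expected .db/<key>/<value>/")
    key, value = parts
    if key not in ALLOWED_KEYS:
        raise ValueError(f"unknown search key: {key}")
    prefix = base.rstrip("/") + "/"
    # "path startswith '/usr/tmp'" becomes an exact prefix comparison via substr().
    sql = f"SELECT path FROM files WHERE {key} = ? AND substr(path, 1, ?) = ? ORDER BY path"
    return sql, (value, len(prefix), prefix)


# Tiny in-memory demo table standing in for the filesystem's paired database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, owner TEXT, ext TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("/usr/tmp/a.log", "alice", "log", None),
        ("/usr/tmp/b.txt", "bob", "txt", None),
        ("/var/log/c.log", "alice", "log", None),
    ],
)
sql, args = path_to_query("/usr/tmp/.db/owner/alice/")
matches = [row[0] for row in conn.execute(sql, args)]
print(matches)  # ['/usr/tmp/a.log']
```

ls -ltr would sort by modification time; the sketch orders by path only to keep it short.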

Or maybe, instead of that, .sqlite-style files are used, but they are special and not actually files: they are managed by a database system. Opening a file handle to one actually opens a secure connection to a database. Lucene or SQL searches then give very fast results.
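A minimal sketch of the "special file" variant, assuming a hypothetical `.fsdb` suffix marks the database-backed entries. Here opening one simply returns a local SQLite connection; a real filesystem driver would instead hand back a secure connection to a database server, which this sketch does not attempt.

```python
import sqlite3
from typing import IO, Union

DB_SUFFIX = ".fsdb"  # hypothetical marker for database-backed entries


def open_entry(path: str, mode: str = "r") -> Union[IO, sqlite3.Connection]:
    """Open a normal file, or return a database connection for special entries."""
    if path.endswith(DB_SUFFIX):
        # The "file" is really a database: the caller gets a queryable connection.
        return sqlite3.connect(path)
    return open(path, mode)
```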

But the key thing is that access is granted directly. Access between filesystem and database is 1:1: tokens used for one give access to the other, and groups and ownership are shared between the two.
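A minimal sketch of the 1:1 access idea, assuming POSIX-style owner, group and mode bits are the single source of truth: the same read check that would gate `ls` on a folder also gates any database query on that folder. The function names and the `run_query` callback are hypothetical, and root's override and ACLs are ignored.

```python
import os
import stat


def may_read(path: str, uid: int, gids: set) -> bool:
    """Decide read access from the folder's owner, group and mode bits."""
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)


def query_folder(path: str, uid: int, gids: set, run_query):
    """Run a database query on `path` only if the filesystem would allow reading it.

    No separate database credentials exist: the filesystem permissions decide.
    """
    if not may_read(path, uid, gids):
        raise PermissionError(f"uid {uid} may not read {path}")
    return run_query(path)
```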

Surely this has been done already, but I haven't seen it. AWS roles may give access to both a database and a filesystem at the same time? But that is not an SSO scheme.

Why do filesystems and databases have to be at odds with each other? Why can't they just be one thing?

I did ask ChatGPT about this, and it mentioned Hadoop, but it also gave a command that had a traditional URL in it.

Halfbaking over, here are some nice raw chicken pies!
-- mylodon, Feb 23 2023

Filesystems ain't filesystems. For example, FAT32 vs. NTFS vs. the journaling filesystems favoured by Linux.

Now, I think the way to do this idea is to put the indexing database inside the filesystem itself, making a new type of filesystem. So, when you formatted a drive with MylodonFS, space would be reserved for that dedicated DB, and when the O/S queried the FS about available space, it would return a smaller number, to make allowance for the associated index entries.

That might be awkward for unfamiliar users, who would be thinking "I had 10 MB free, then I added a 1 MB file, and now I only have 8.8 MB" (supposing that you needed 0.2 MB to add indexing for a 1 MB file; the actual number might vary widely).
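The free-space example above, worked through as a minimal sketch under the simplifying assumption that the index costs a fixed fraction of each file's size. The 0.2 ratio is the comment's own illustrative figure, and the function name is hypothetical.

```python
def reported_free_mb(free_mb: float, file_mb: float, index_overhead: float) -> float:
    """Free space the OS would report after writing one file, index cost included.

    index_overhead is the index size as a fraction of the file size (an assumption;
    real overheads may vary widely).
    """
    return free_mb - file_mb * (1 + index_overhead)


print(round(reported_free_mb(10, 1, 0.2), 2))  # 8.8
```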

But, if you could get past that, the other main hurdle would be to secure O/S support for MylodonFS. You could start by adding it to Linux. If it caught on in that community then MS would probably add support for it twenty years later, while pretending it had been their idea all along, and Apple would be a bit quicker off the mark, but their version would only work on their devices.
-- pertinax, Feb 26 2023

Microsoft's Longhorn (released as Windows Vista) was meant to have a filesystem database.

The supposed problem was IO performance.

Ancient systems used record-based rather than byte-stream file APIs, and maybe mainframes still do!

I think this is a good idea; the hard part is getting buy-in from big tech and operating system vendors. You need people to target your alternative filesystem.
-- chronological, Feb 26 2023

The difference between this and Longhorn is that the structure, schemas and entries in the database would not necessarily be tied to what was on disk. Only the access is important. The database could be very sparse and lightweight, and its contents arbitrary.

So you could write 10 billion images to disk and then write a single entry to the database for your favorite picture. A search would then return that result super fast.
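A minimal sketch of the sparse-database point, using SQLite as a stand-in: the table holds only what the user chose to record, so the search touches one row no matter how many images sit on disk. The table name, the label and the image path are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sparse table: only what the user chooses to record, not every file on disk.
conn.execute("CREATE TABLE notes (path TEXT, label TEXT)")
conn.execute("CREATE INDEX idx_label ON notes(label)")

# However many images are on disk, only this one row is written.
conn.execute("INSERT INTO notes VALUES (?, ?)", ("/photos/2023/img_000417.jpg", "favorite"))
conn.commit()

favorite = conn.execute("SELECT path FROM notes WHERE label = ?", ("favorite",)).fetchall()
print(favorite)  # [('/photos/2023/img_000417.jpg',)]
```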

For image sequences, say 10 minutes of video stored as JPEGs, it is a waste to index anything other than the sequence pattern. For larger structures, you might only need to index a folder.
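A minimal sketch of indexing only the sequence pattern, assuming a hypothetical naming convention of `<prefix><frame digits>.<ext>` (e.g. `shot_0001.jpg`). At an assumed 24 frames per second, 10 minutes is 14,400 JPEGs, which this collapses into a single entry. Only the first and last frame numbers are kept, so gaps in a sequence are not represented.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <prefix><frame digits>.<ext>, e.g. shot_0001.jpg
FRAME_RE = re.compile(r"^(?P<prefix>.*?)(?P<digits>\d+)\.(?P<ext>\w+)$")


def collapse_sequences(filenames: list) -> list:
    """Reduce numbered frames to one pattern entry per sequence, e.g. 'shot_%04d.jpg [1-3]'.

    Names that do not look like frames are kept as-is.
    """
    groups = defaultdict(list)
    singles = []
    for name in filenames:
        m = FRAME_RE.match(name)
        if m:
            # Frames sharing prefix, digit width and extension belong to one sequence.
            key = (m["prefix"], len(m["digits"]), m["ext"])
            groups[key].append(int(m["digits"]))
        else:
            singles.append(name)
    entries = []
    for (prefix, width, ext), frames in sorted(groups.items()):
        entries.append(f"{prefix}%0{width}d.{ext} [{min(frames)}-{max(frames)}]")
    return entries + sorted(singles)
```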
-- mylodon, Feb 27 2023
