Halfbakery: internet attached storage

Computer: Web Technology
internet attached storage (+1, -2)
NAS with legs

The problem:

1) Directories and files don't represent the structure of modern data very well any more.

2) Databases aren't standard enough or user-friendly enough for general consumption.

3) Web applications that manage our data for us in modern ways take the storage, and the ownership, of the data out of our hands.

The solution:

Application-enabled internet-attached storage (IAS). IAS devices are basically hard drives, probably running Linux. Each drive is divided into zones, and each zone has an application that manages the data in that zone.
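
A minimal sketch of the zone idea, for illustration only. The names `Zone`, `IASDevice`, `add_zone` and the byte quotas are all assumptions and not part of the original proposal; a real device would sit on top of a filesystem or block layer rather than an in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Zone:
    """One slice of an IAS drive, managed by a single application (hypothetical model)."""
    app_name: str       # assumed label for the managing application, e.g. "docs"
    quota_bytes: int    # space reserved for this zone
    records: Dict[str, bytes] = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(len(value) for value in self.records.values())

    def put(self, key: str, value: bytes) -> None:
        # Overwriting a key releases the bytes of the old value first.
        old_size = len(self.records.get(key, b""))
        if self.used_bytes() - old_size + len(value) > self.quota_bytes:
            raise ValueError(f"zone {self.app_name!r} would exceed its quota")
        self.records[key] = value

    def get(self, key: str) -> bytes:
        return self.records[key]


class IASDevice:
    """A drive divided into zones, one per managing application."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.zones: Dict[str, Zone] = {}

    def add_zone(self, app_name: str, quota_bytes: int) -> Zone:
        if app_name in self.zones:
            raise ValueError(f"zone {app_name!r} already exists")
        allocated = sum(zone.quota_bytes for zone in self.zones.values())
        if allocated + quota_bytes > self.capacity_bytes:
            raise ValueError("not enough unallocated space on the drive")
        zone = Zone(app_name, quota_bytes)
        self.zones[app_name] = zone
        return zone
```

The point of the sketch is only that each application sees its own zone and cannot grow past the slice the owner granted it.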

Possible shapes of the solution:

1) Google storage appliance: Google publishes software, and doesn't store data. Anyone can obtain the software to load onto a 'google-partition' on their IAS, and become a google node. Google becomes privacy-friendly, decentralized, and scalable. (ISPs would probably take over the role of the google-datacentre, but anyone could.) The same goes for all the social networking sites.

2) Freenet and other filesharing networks would be ideal clients for IAS.

Benefits:

1) Reduction in the number of single-points of failure around the internet.

2) Ownership of data could be reflected in its physical location better.

3) A greater proliferation of mass-user-base applications: datacentre costs would evaporate as users could provide their own storage. Datacentres might become less popular, which would be good, as they consume a terrible amount of power.

4) Users could have better control over their data, and it could be more secure, against attack or accident.

5) The age of the web application can really begin. Web applications should be just that: applications. Up till now web applications have been bound up with remote storage. That's a mistake in so many ways.
-- conskeptical, Jun 24 2007

Wikipedia: Content management system http://en.wikipedia...t_management_system
[jutta, Jun 24 2007]

(Tangent.) Power use of computers http://www.climatesaverscomputing.org/
Industry group working on lower-power servers, PCs. Current technology is incredibly wasteful. [jutta, Jun 24 2007]

(?) (Tangent.) Google power provisioning. http://labs.google....er_provisioning.pdf
I just heard a talk about that. [jutta, Jun 24 2007]

Isn't this already available all over the place? Maybe I have misunderstood.
-- wagster, Jun 24 2007

Decentralized, web-accessible, application-layer storage is called a "content management system"; there are several to choose from, and lots of enterprise applications run on top of it without anything centralized other than the seller's tech support.

What's new here is combining a centralized application with decentralized storage. That makes sense if there's a benefit to having a centralized index; if you don't want that (and you seem to argue against it), why bother centralizing at all?
-- jutta, Jun 24 2007

<feels old and out of touch>
-- wagster, Jun 24 2007

[jutta]

1) We can still have indexes (or any other data) centrally accessible, even if they aren't centrally stored.

2) Central app with no central data storage and no remote storage can also make sense: it means the user doesn't have to keep copies of software, or know how to install them, the user is always up-to-date, the app-provider doesn't have to support old versions, better OS independence etc. etc. It might also make subscription models easier, possibly help sidestep piracy etc. etc.
-- conskeptical, Jun 24 2007

If there is no central index, how do you know where to look for it?

The further away indexes are from the software that looks things up in them, the slower the query gets.

Much as you probably don't like keeping your data on other people's servers, I really don't like it when vendors change or break applications without me pushing a button first. These are my tools, I've grown used to how they work. So, I wouldn't like a software distribution model that makes that easier.

I also don't think installing application software is beyond the reach of most computer users - certainly much, much less so than managing, provisioning, and backing up a data store.

The power used by data centers is minuscule compared to the power that would be wasted if each of the users of the data center's services ran their own service instead.

<hands wagster a glass of milk and a cookie.>
-- jutta, Jun 24 2007

[jutta] - yes OK, of course there will need to be some sort of central index in the scenario you talk about. But it really could be as simple as a few pointers to main external indexes.

For high performance we can use caching. For good security, we can have secure, sensibly expiring caches. But the points of truth shouldn't have to be tied down by performance concerns: they should be footloose.
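
A rough sketch of what a "sensibly expiring" cache in front of a remote point of truth might look like. `ExpiringCache`, the `fetch` callback and the `ttl_seconds` value are assumptions made for illustration; the security side (signing, encryption) is not modelled at all.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Serve reads locally and go back to the origin once an entry is too old."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],          # hypothetical call to the remote point of truth
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]                      # still fresh: no network round trip
        value = self._fetch(key)                 # missing or expired: ask the origin
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short lifetime keeps stale data from lingering at the cost of more trips to the origin; where to set it is a judgment call that depends on the data.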

As for changing or breaking applications: that's what specifications and APIs are for. If you want to nurture your own idiosyncratic set of tools, one that dies when your house burns down or you accidentally delete them, that's up to you (and virtually your only option at the moment). But if you want to take part in a persistent, well-managed ecosystem of tools and tool users, then we need to connect together. Take the Google Maps API for example: you can develop against any release of it, despite it being centrally served. It's a nice system.

Installing app software every now and again is fine for users. But auto-web-update is testament to how we don't really want to install software. The less messing I have to do as a user, the more work I can get done.

The same goes for the data store; that's the whole point of this IAS. Where before google took care of storing my google office documents, which gave me peace of mind for data integrity if not data security, now I can buy a google IAS box, or allow google a slice of my existing IAS setup, and let google manage my data integrity without actually physically housing my data. I own my data, even if someone else manages it. Much nicer. In an ideal world there could even be data integrity/security provided by a freenet style of encrypted partitions spread around the world storing people's files remotely, anonymously and securely, in a you-scratch-my-back, I'll-scratch-yours kind of way.

For remote data management systems to work like this and be trusted obviously they would have to be open-source and verifiable...

As for datacentre power: as users we do run our own services as well: NAS is finding its way everywhere, and local hard-discs are extremely common. In certain use-cases datacentres are as well as, not instead of. Datacentres shouldn't proliferate beyond their sensible use when there are other data storage pools that could be effectively used.
-- conskeptical, Jun 24 2007

(Glossary: NAS -- Network-Attached Storage, for example a NetApp file server.)

// As for datacentre power: as users we do run our own services as well
That's true. But we only run them when we are using the data. If we were our own data center, we'd have to run whenever anyone might want to use the data.
-- jutta, Jun 24 2007

// If we were our own data center, we'd have to run whenever anyone might want to use the data.

That's true. But I have a lot of personal data for my eyes only, that I would still be happy for someone else to manage for me.

And there's no reason why there couldn't be a P2P system whereby my public data was spread across multiple users' machines (and theirs on mine) to provide datacentre grade availability.
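
A back-of-envelope sketch of how many copies such a P2P scheme might need. It assumes, loudly, that each peer is online independently with the same probability; real home machines are correlated (time zones, power cuts), so the result is optimistic. The function names and the 0.999 target are illustrative, not from the thread.

```python
def availability(peer_uptime: float, replicas: int) -> float:
    """Chance that at least one of `replicas` independent peers is online."""
    if not 0.0 <= peer_uptime <= 1.0:
        raise ValueError("peer_uptime must be between 0 and 1")
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    # The data is unreachable only if every single replica holder is offline.
    return 1.0 - (1.0 - peer_uptime) ** replicas


def replicas_needed(peer_uptime: float, target: float) -> int:
    """Smallest number of replicas whose combined availability reaches `target`."""
    if not 0.0 < peer_uptime <= 1.0:
        raise ValueError("peer_uptime must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    count = 0
    while availability(peer_uptime, count) < target:
        count += 1
    return count
```

Under these assumptions, peers that are online 30% of the time would need 20 copies to reach 99.9% availability (`replicas_needed(0.3, 0.999)`), which gives a feel for the storage overhead involved.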
-- conskeptical, Jun 24 2007

I'd have thought that the ideal solution would be to allow a section of your available hard-disk to be added to a huge, global RAID system - all your data is stored both locally, and massively redundantly, in distributed locations - each non-local 'chunk' too small to individually pose any security issues, but distributed widely enough that even the thermo-nuclear annihilation of one or more minor population centres would still allow its retrieval.

Similarly, with widely distributed data, read-write becomes faster, assuming you can utilise a broad bandwidth, because instead of one machine trying to do a 1GB write, you've got 1000 machines each doing 1MB, simultaneously (assuming you want a widely redundant system - which you should since some people switch off their computers sometimes). Of course, you need to have a way to keep track of all this data - the index files are going to be huge - but then, these too, could be stored in pieces, a meta-index telling you where those bits are, and a meta-meta-index allowing you to locate those bits.
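
A toy sketch of the chunk-and-index idea described above, under several assumptions: chunks are identified by their SHA-256 digest, copies are placed round-robin over a fixed list of node names, and the "meta-index" is just the index sharded by digest prefix. None of these choices come from the thread; they are one simple way to make the idea concrete.

```python
import hashlib
from typing import Dict, List, Tuple

Index = Dict[str, List[str]]  # chunk digest -> names of nodes holding a copy


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks: List[bytes], nodes: List[str], copies: int) -> Tuple[List[str], Index]:
    """Return (manifest, index): ordered chunk digests, and who stores each chunk."""
    if not 0 < copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    manifest: List[str] = []
    index: Index = {}
    for n, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        # Round-robin placement: consecutive nodes, so the copies land on distinct machines.
        index[digest] = [nodes[(n + r) % len(nodes)] for r in range(copies)]
    return manifest, index


def shard_index(index: Index, prefix_len: int = 1) -> Dict[str, Index]:
    """Split the index into pieces keyed by digest prefix (the 'meta-index')."""
    shards: Dict[str, Index] = {}
    for digest, holders in index.items():
        shards.setdefault(digest[:prefix_len], {})[digest] = holders
    return shards


def locate(shards: Dict[str, Index], digest: str, prefix_len: int = 1) -> List[str]:
    """Two-step lookup: pick the shard by prefix, then find the chunk's holders."""
    return shards[digest[:prefix_len]][digest]
```

To read a file back, a client would walk the manifest in order and call `locate` for each digest; each index shard could itself be stored as a chunk, which is where the meta-meta-index would come in.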

I think this is a good idea - but I don't see how problems 1) and 2) have got anything to do with it. We're talking physical mechanics/architecture of storage/application here - individually, both of which have been thought of before - nothing to do with databases or the usage of a traditional os file/directory paradigm.

Finally, the question of ownership vs management - I think you have to be careful about where to draw the line. Say I 'own' some data (i.e. it's on my personal hard drive) but someone else has the index. Without rebuilding an index of my own (which I may not be able to do) how do I retrieve my data? Despite owning it, without access to the meta-data I need to retrieve it, it becomes really difficult to access.

I don't mind a distributed management model - something where I, as a node, take part in generating as well as retrieving the indexes/meta-data. I think this completely decentralised model is going to be the future – completely free from corporate control, and maintained completely by participants of the system. It requires a lot more intra-network traffic to operate, but it’s so much more robust and performant (if that’s a word) than what we have today.

Furthermore - it's a similar model to the way information systems appear to work in nature - massively distributed, self-assembling (based on performance) and relying upon layer-upon layer of meta data.
-- zen_tom, Jun 25 2007

//the way information systems appear to work in nature // - <David Attenborough>"...the sight of this herd of SQL databases taking part in their annual migration is truly majestic..."</DA>
-- hippo, Jun 25 2007

<munches contentedly>
-- wagster, Jun 25 2007

I studied the theme 'Complexity', in particular a book by Stefan or Stephen KAUFMANN. That might be what this is about?
-- sirau, Jun 01 2011
