Define GUBOID = MD5 hash of any binary object.
A GUBOID would be guaranteed, statistically speaking, to uniquely identify any digital resource across space and time.
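Here is a minimal sketch of computing a GUBOID as defined above, using Python's standard hashlib module. The function name `guboid` and the 1 MiB chunk size are illustrative choices, not part of the proposal. The file is read in chunks so large files don't have to fit in memory.

```python
import hashlib

def guboid(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file's bytes (a "GUBOID").

    Only the file's contents are hashed. The filename, location, and
    timestamps play no part, so identical bytes give the identical GUBOID.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Feed the file to MD5 one chunk at a time until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, a file containing exactly the three bytes `abc` yields `900150983cd24fb0d6963f7d28e17f72`, the standard MD5 test value, wherever the file lives and whatever it is called.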
Uses:
1) Ever notice that not every MP3 is created equal? Some rippers are crap, some people are idiots, etc. A GUBOID could be used to identify every MP3 file. People could then rate MP3 files on quality of reproduction without having to download and listen to them. Ratings would be associated with the GUBOID.
2) Software, particularly shareware and warez, is spread out all over the damn place. Every repository indexes its stuff differently, making it hard to find what you're looking for. So GUBOID everything. Then when a friend tells you about some great stuff, he can tell or email you the GUBOID, and you can go to your favorite or closest resource and find it directly.-- GeneticCrypto, Sep 16 2000

Not a new idea http://www.aspencry...om/task_fileid.html
A quick net search turned up this link. Not as half-baked as I thought. ;) [GeneticCrypto, Sep 16 2000, last modified Oct 04 2004]

The Bitzi Bitcollider Utility http://www.bitzi.com/bitcollider/
"It examines the file, calculating a distinctive digital fingerprint, or bitprint, and taking note of some other identifying information that can be extracted from the file, like file length and the local filename." The site includes reviews of specific mp3 files. [Jim, Sep 16 2000]

I think this would be unlikely for pirated stuff... would people -really- want to attach their ID to an illegal copy of something?-- StarChaser, Sep 16 2000

StarChaser, GUBOIDs (or MD5 checksums, as I'd call them) don't refer to people, just to the objects they move around. They don't give you any more information than seeing the object itself.-- jutta, Sep 16 2000

That's right. If two different people hashed identical files, the GUBOID would be the same. In the case of MP3s, where many different people do their own ripping, the hash values would all be unique.-- GeneticCrypto, Sep 16 2000

Ah, so desu. I thought it was like a machine/user ID, like the P3s have.-- StarChaser, Sep 16 2000

I imagine some of those Freenet / Gnutella / MojoNation / unscalable-but-incredibly-trendy P2P systems must work this way. (At least, it would be pretty stupid if at least one of them didn't...)
Note that MD5 in particular has been cast into some doubt as a cryptographically strong hash. (It hasn't been *broken* per se, but I think the cool kids are using SHA-1 or something these days.)-- egnor, Sep 16 2000

Yes, those pesky cool kids. The goal was not so much to have cryptographically irreproducible values as to have a succinct way to (almost) uniquely identify a bundle of binary data such as a file.
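To put a rough number on "(almost) uniquely": the standard birthday approximation says that if a b-bit hash behaves like a random function, the chance that any two of n distinct files share a hash is about 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes exactly that random-function model. It says nothing about collisions someone constructs on purpose, which is the cryptographic-strength question raised above. The function name is illustrative.

```python
import math

def collision_probability(n_files: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least two of n_files share a hash).

    Assumes the hash acts like a uniformly random function of the file
    contents; deliberately constructed collisions are not modeled.
    """
    space = 2 ** bits
    exponent = n_files * (n_files - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

for bits in (128, 160):  # MD5 and SHA-1 digest sizes
    print(bits, collision_probability(10**9, bits))
```

For a billion distinct files, the 128-bit estimate comes out around 1.5e-21, so accidental duplicates are not a practical concern for identification.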
I have come up with a specific application where this would be nice to have: when doing file searches on Napster, GUBOIDs would eliminate duplicate files and/or prioritize them by connection speed. I often dl a file, find that it's a poor reproduction, go back and find another version to dl, only to find that it was the same one, but at a different location.-- GeneticCrypto, Sep 17 2000

This could also help reduce dupes caused by some butthead naming the file wrong. If there are 20 files with the same GUBOID, and 18 have one name and 2 have another, it'd make sense for the downloading app to name your copy after the other 18 on your HD, even if you picked one of the 2 different ones.-- ZediWarrior, Sep 18 2000

GUBOID application: anti-virus software could scan a file, GUBOID it if it is virus-free, and list the GUBOID in a globally accessible GUBOID index as certified virus-free. People could then re-GUBOID a file at any time and check the global index for whether it is listed as virus-free.
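GeneticCrypto's Napster idea (collapse duplicates and prefer faster hosts) and ZediWarrior's naming suggestion (use the majority filename) can be sketched as grouping search hits by GUBOID. The `SearchHit` record and its fields are hypothetical. The thread doesn't specify what a search result contains.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class SearchHit:
    # Hypothetical shape of one search result from a file-sharing index.
    filename: str
    guboid: str       # MD5 hex digest of the file's contents
    host: str
    speed_kbps: int   # advertised connection speed of the host

def collapse_hits(hits):
    """Group hits sharing a GUBOID into one entry per distinct file.

    Returns {guboid: (canonical_name, hosts)}, where canonical_name is the
    most common filename among the copies and hosts are ordered fastest first.
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.guboid].append(hit)

    collapsed = {}
    for guboid, copies in groups.items():
        # Majority vote on the filename (ties go to the first name seen).
        canonical_name = Counter(c.filename for c in copies).most_common(1)[0][0]
        fastest_first = sorted(copies, key=lambda c: c.speed_kbps, reverse=True)
        collapsed[guboid] = (canonical_name, [c.host for c in fastest_first])
    return collapsed
```

Per-file data from the thread, such as quality ratings or a virus-free certification, could be kept as further fields keyed on the same GUBOID.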
The critical necessity, as pointed out by egnor, is that the hash algorithm needs to be as cryptographically secure as possible. SHA-1 seems fine; MD5 seems fine (still). Come to think of it, a double pass of MD5, using the first-pass hash value as the key in the second pass, should dispel any of its weaknesses, at the cost of extra processing time.-- GeneticCrypto, Sep 18 2000

Don't assume that extra passes improve the security of a hash algorithm. If anything, that tends to reduce the strength of the algorithm. (Unless you have a good reason to believe this works for MD5?)-- egnor, Sep 18 2000

Very true. It is possible to overshuffle a deck of cards, for example. MD5, like a good hash function, has a serial bit correlation very close to 0.5, which is all that would be required here. The difference in cryptographic strength between MD5 and SHA is most likely due to the length of the hash values (128 vs 160 bits).-- GeneticCrypto, Sep 18 2000

Signing documents is a very well-known field. The real magic occurs with classifiers. Using a basic standards-based lexicon and attaching type trees (see WordNet) to the object allows us to see its canonical form and some approximation of its variance.-- nanomid, Jan 29 2001

Could be a useful way to show how things are mirrored across the web. So, if a file is available for download in two places and both publish the GUBOID, you can use both servers to do a simultaneous download.
Also, I love the name.-- stevec2, Jan 26 2004
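stevec2's mirroring idea can be sketched as splitting the file into one byte range per mirror, joining the pieces, and checking the result against the published GUBOID. This assumes each mirror can serve arbitrary byte ranges of the same file. The `fetch_range` helper is a hypothetical stand-in for something like an HTTP range request, and the mirrors are simulated as in-memory bytes. The ranges are fetched one after another for simplicity; a real client would fetch them concurrently.

```python
import hashlib
from typing import Sequence

def fetch_range(mirror: bytes, start: int, end: int) -> bytes:
    # Hypothetical stand-in for a byte-range request to one mirror.
    return mirror[start:end]

def parallel_fetch(mirrors: Sequence[bytes], size: int, published_guboid: str) -> bytes:
    """Fetch one contiguous slice per mirror, reassemble, verify via MD5."""
    n = len(mirrors)
    # Boundaries split [0, size) into n nearly equal contiguous slices.
    bounds = [size * i // n for i in range(n + 1)]
    parts = [fetch_range(m, bounds[i], bounds[i + 1]) for i, m in enumerate(mirrors)]
    data = b"".join(parts)
    if hashlib.md5(data).hexdigest() != published_guboid:
        raise ValueError("reassembled file does not match the published GUBOID")
    return data
```

A single whole-file GUBOID can only be checked after reassembly. A mismatch shows that the result is bad, but not which mirror sent the bad bytes.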