halfbakery: Alas, poor spelling!
GUBOID
Globally Unique Binary Object ID
Define GUBOID = MD5 hash of any binary object.
A GUBOID would be guaranteed, statistically speaking, to uniquely identify any digital resource across space and time.
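For anyone who wants to try the definition, here is a minimal sketch in Python. The function names `guboid` and `guboid_bytes` are made up for illustration; only `hashlib.md5` comes from the standard library. The file is read in chunks so large files do not have to fit in memory.

```python
import hashlib

def guboid_bytes(data):
    """GUBOID of an in-memory binary object: its MD5 digest as hex."""
    return hashlib.md5(data).hexdigest()

def guboid(path, chunk_size=1 << 20):
    """GUBOID of a file on disk, hashed chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is a 32-character hex string, short enough to email or read out to a friend.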
Uses:
1) Ever notice that not every MP3 is created equal? Some rippers are crap, some people are idiots, etc. A GUBOID could be used to identify every MP3 file. People could then rate MP3 files on quality of reproduction, and others could check those ratings without having to download and listen first. Ratings would be associated with the GUBOID.
2) Software, particularly shareware and warez, is spread out all over the damn place. Every repository indexes its stuff differently, making it hard to find what you're looking for. So GUBOID everything. Then when a friend tells you about some great stuff, he can tell or email you the GUBOID and you can go to your favorite or closest resource and find it directly.
Not a new idea
http://www.aspencry...om/task_fileid.html A quick net search turned up this link. Not as half-baked as I thought. ;) [GeneticCrypto, Sep 16 2000, last modified Oct 04 2004]
The Bitzi Bitcollider Utility
http://www.bitzi.com/bitcollider/ "It examines the file, calculating a distinctive digital fingerprint, or bitprint, and taking note of some other identifying information that can be extracted from the file, like file length and the local filename." The site includes reviews of specific mp3 files. [Jim, Sep 16 2000]
I think this would be unlikely for pirated stuff... would people -really- want to attach their ID to an illegal copy of something?
StarChaser, GUBOIDs (or MD5 checksums, as I'd call them) don't refer to people, just to the objects they move around. They don't give you any more information than seeing the object itself.
That's right. If two different people hashed identical files, the GUBOID would be the same. In the case of MP3s, where many different people do their own ripping, each rip would come out as a slightly different file, so the hash values would all be different.
Ah, so desu. I thought it was like a machine/user ID, like the P3s have.
I imagine some of those Freenet / Gnutella / MojoNation / unscalable-but-incredibly-trendy P2P systems must work this way. (At least, it would be pretty stupid if at least one of them didn't...)
Note that MD5 in particular has been thrown into some doubt as a cryptographically strong hash. (It hasn't been *broken* per se, but I think the cool kids are using SHA-1 or something, these days.)
Yes, those pesky cool kids. The goal was not so much to have cryptographically irreproducible values as to have a succinct way to (almost) uniquely identify a bundle of binary data such as a file.
I have come up with a specific application where this would be nice to have: when doing file searches on Napster, GUBOIDs would eliminate duplicate files and/or prioritize them by connection speed. I often download a file, find that it's a poor reproduction, go back and find another version to download, only to find that it was the same one at a different location.
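A rough sketch of that search-result cleanup, in Python. The record fields (`guboid`, `host`, `kbps`) and the function name are invented for illustration; a real client would get them from whatever the network reports. Results are grouped by GUBOID, and within each group the fastest host comes first.

```python
from collections import defaultdict

def dedupe_by_guboid(results):
    """Collapse search hits into one entry per GUBOID, fastest host first."""
    groups = defaultdict(list)
    for hit in results:
        groups[hit["guboid"]].append(hit)
    for hits in groups.values():
        # Higher connection speed sorts to the front of each group.
        hits.sort(key=lambda hit: hit["kbps"], reverse=True)
    return dict(groups)

results = [
    {"guboid": "aaa", "host": "slowpoke", "kbps": 28},
    {"guboid": "aaa", "host": "speedy", "kbps": 512},
    {"guboid": "bbb", "host": "other", "kbps": 56},
]
unique = dedupe_by_guboid(results)
```

With this, the two "aaa" hits show up as a single file with two sources, so a bad rip only has to be downloaded once.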
This could also be helpful in reducing dupes that exist only because some butthead named the file wrong. If there are 20 files with the same GUBOID, and 18 have one name and 2 have another, it'd make sense if (even if you picked one of the 2 different ones) the downloading app named it after the other 18 on your HD.
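The naming rule could be as simple as a majority vote over the names seen for one GUBOID. A small Python sketch, with a hypothetical function name and made-up filenames:

```python
from collections import Counter

def canonical_name(filenames):
    """Pick the most common filename among copies sharing one GUBOID."""
    if not filenames:
        raise ValueError("no filenames to choose from")
    # most_common(1) returns a list holding one (name, count) pair.
    return Counter(filenames).most_common(1)[0][0]

names = ["song_title.mp3"] * 18 + ["sogn_titel.mp3"] * 2
chosen = canonical_name(names)
```

Here `chosen` is the name used by the 18, whichever copy was actually downloaded.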
GUBOID Application: Anti-virus software could scan a file, GUBOID it if it is virus-free, and list the GUBOID in a globally accessible GUBOID index as being certified virus-free. People could then re-GUBOID a file at any time and check the global index to see whether it is listed as virus-free.
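A toy version of that workflow in Python. The "global index" here is just an in-memory set, and `scanner` is a hypothetical callable standing in for real anti-virus software; both are assumptions made for the sake of the sketch.

```python
import hashlib

# Stand-in for the globally accessible index of certified GUBOIDs.
certified_clean = set()

def certify(data, scanner):
    """Scan `data`; if the scanner passes it, publish its GUBOID."""
    if not scanner(data):
        return None
    guboid = hashlib.md5(data).hexdigest()
    certified_clean.add(guboid)
    return guboid

def is_certified(data):
    """Re-GUBOID the data and look it up in the index."""
    return hashlib.md5(data).hexdigest() in certified_clean
```

Any change to the file, even a single byte, gives a different GUBOID, so a modified copy simply would not be found in the index. This only holds up if nobody can craft a different file with the same hash.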
The critical necessity, as pointed out by egnor, is that the hash algorithm needs to be as cryptographically secure as possible. SHA-1 seems fine; MD5 seems fine (still). Come to think of it, a double pass of MD5, using the first-pass hash value as the key in the second pass, should dispel any of its weaknesses, at the cost of extra processing time.
Don't assume that extra passes improve the security of a hash algorithm. If anything, that tends to reduce the strength of the algorithm. (Unless you have a good reason to believe this works for MD5?)
Very true. It is possible to overshuffle a deck of cards, for example. MD5, like any good hash function, has a serial bit correlation very close to 0.5, which is all that would be required here. The difference in cryptographic strength between MD5 and SHA is most likely due to the length of the hash values (128 vs. 160 bits).
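To put rough numbers on the 128 vs. 160 bit difference, here is a back-of-the-envelope Python sketch using the standard birthday approximation, p ≈ 1 - exp(-n² / 2^(b+1)). It assumes an ideal hash and files that are not deliberately crafted to collide, so it says nothing about attacks on a particular algorithm. The function names are invented for illustration.

```python
import math

def collision_probability(n_objects, bits):
    """Approximate chance that any two of n random objects share a hash."""
    exponent = (n_objects ** 2) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)

def objects_for_half_chance(bits):
    """Number of objects at which an accidental collision becomes a coin flip."""
    return math.sqrt(2 * math.log(2)) * 2 ** (bits / 2)

md5_limit = objects_for_half_chance(128)   # about 2.2e19 objects
sha1_limit = objects_for_half_chance(160)  # about 1.4e24 objects
```

Under these assumptions, the extra 32 bits raise the coin-flip point by a factor of 2^16, or 65536.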
Signing documents is a very well-known field. The real magic occurs with classifiers. Using a basic standards-based lexicon and attaching type trees (see WordNet) to the object allows us to see its canonical form and some approximation of its variance.
Could be a useful way to show how things are mirrored across the web. So, if a file is available for download in two places and both publish the GUBOID, you can use both servers to do a simultaneous download.