halfbakery
This ain't rocket surgery.
I want this to be more widely applied than just the HB but
I've just spent twenty minutes looking for another category
and couldn't find one. It does, however, apply here as well
as elsewhere. I do not like this being in this category.
Anyway:
On the Halfbakery and elsewhere, particularly on social
networking sites and fora, discussions frequently veer
off-topic at a rate of knots. Unfortunately there seems to be
no way of measuring this. Until now.
The Dewey Decimal System could be applied to smaller
texts than books, down perhaps to the level of individual
words. It should therefore be possible to assign Dewey
numbers to ideas and annotations here, and to posts and
replies, comments, retweets and so forth elsewhere.
Topicality is inversely proportional to the absolute value of
the difference between the Dewey numbers assigned to
consecutive comments. A line graph could be plotted of
these differences, measured either against the original
idea/post or between consecutive annotations. This graph can
then be smoothed into a curve, a tangent line drawn at any
point on that curve, and its slope measured.
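A rough sketch of that measurement, assuming each comment has already been given a single Dewey number by hand. The function names, the moving-average smoothing and the central-difference slope are illustrative choices of mine rather than anything fixed by the idea, and the thread values are made up.

```python
from statistics import mean

def drift_steps(dewey_numbers):
    """Absolute Dewey difference between each pair of consecutive comments."""
    return [abs(b - a) for a, b in zip(dewey_numbers, dewey_numbers[1:])]

def drift_from_origin(dewey_numbers):
    """Absolute Dewey difference between each comment and the original post."""
    origin = dewey_numbers[0]
    return [abs(n - origin) for n in dewey_numbers]

def moving_average(values, window=3):
    """Trailing moving average: one simple way to smooth the line graph."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed

def slope_at(values, i):
    """Central-difference estimate of the tangent slope at index i."""
    lo = max(0, i - 1)
    hi = min(len(values) - 1, i + 1)
    if hi == lo:
        return 0.0
    return (values[hi] - values[lo]) / (hi - lo)

# Hypothetical thread: physics (530), physics, modern physics (539),
# chemistry (540), food and drink (641), music (780).
thread = [530, 530, 539, 540, 641, 780]

curve = moving_average(drift_from_origin(thread))
steepest = max(range(len(curve)), key=lambda i: slope_at(curve, i))
print("drift per step:", drift_steps(thread))
print("steepest veer at comment", steepest)
```

A steep slope marks the point where the thread leaves its subject fastest; a flat curve, at whatever height, means the discussion has settled, on-topic or otherwise.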
Taking this slightly and perhaps implausibly further, an
algorithm could be devised to attribute certain items of
vocabulary to Dewey classifications. For instance, if words
such as "electron" and "momentum" are used in a particular
text, it's probably about physics, and if "neodymium" and
"enthalpy" occur, it's probably chemistry, and so forth.
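The vocabulary approach might look something like the sketch below. It assumes a hand-built keyword table: 530 and 540 are the Dewey divisions for physics and chemistry, but the word lists are only the examples given above, and the name guess_dewey is hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword table: Dewey number -> words that suggest it.
KEYWORDS = {
    530: {"electron", "momentum"},    # physics
    540: {"neodymium", "enthalpy"},   # chemistry
}

def guess_dewey(text, keywords=KEYWORDS):
    """Return the Dewey number whose keywords occur most often, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    votes = Counter()
    for number, vocab in keywords.items():
        votes[number] = sum(1 for word in words if word in vocab)
    number, hits = votes.most_common(1)[0]
    # No keyword seen at all: decline to classify rather than guess.
    return number if hits > 0 else None

print(guess_dewey("The electron keeps its momentum"))
print(guess_dewey("Croissants, mostly"))
```

Ties go to whichever class is listed first in the table, which is crude; a real attempt would need far larger word lists and some weighting.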
This can be done! There IS a way of crudely measuring the
tangent of off-topicality.
Annotation:
|
|
What is the DD number for excellent? |
|
|
If you took every HB topic, and paired it with its DD counterpart, multiplied by 3, and then divided by the corresponding LC (Library of Congress) number, you would have a big frigging mess. |
|
|
However, the various patent office classification numbers could be used to make matters worse. |
|
|
Seriously, dance with the gal you brought to the dance. The HB classification system can tell you how far the discussion has wandered, can't it? |
|
|
The first level of HB classification is nominal so it won't work for big
veers. However, I'm definitely open to other systems than Dewey. |
|
|
I don't think you'd get a mess. It depends on the range of data you
consider. |
|
|
I have some very early results on this - but have had to
trim the number of topics down to the low hundreds
to keep from running out of memory. |
|
|
Interestingly, when topics are further trimmed to consist
only of nouns, they seem to occupy more robust and
cohesive spaces, at least compared to topics consisting of
wistful, thoughtful or abstract notions. |
|
|
Grouping words together and collating the top-7
co-occurring words to form cluster names yielded some
interesting and revealing topic clusters - so I've got that
part (sort of) working. |
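For anyone wanting to try the same thing, here is a minimal sketch of one way to build such cluster names. It is a guess at the general shape, not the code behind the results above: it assumes each text has already been split into words, and the function names and toy documents are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence(docs):
    """Count, for every word, how often each other word shares a text with it."""
    counts = defaultdict(Counter)
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def cluster_name(seed, counts, size=7):
    """Name a cluster after a seed word plus its most frequent companions."""
    partners = [word for word, _ in counts[seed].most_common(size - 1)]
    return "-".join([seed] + partners)

# Toy stand-in for a pile of tokenised ideas and annotations.
docs = [
    ["coffee", "cup", "mug"],
    ["coffee", "cup"],
    ["coffee", "tea"],
]
counts = cooccurrence(docs)
print(cluster_name("coffee", counts, size=2))
```

The table holds a count for every pair of words that ever share a text, so it grows quickly with vocabulary size, which may be why trimming the topic list helps with memory.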
|
|
Applying these clusters to annotations and classifying
each of them is then doable - however, how do you
measure "distance" between clusters? OK, so you could
assign a Dewey number to each one, and hope for the
best, but for example, it just isn't the case that Religion
(200) is closer to Technology (600) than it is to Literature
(800), whatever the numbers suggest. I'm sure there are
other, better examples than that. |
|
|
It *could* be the case that there exist super-clusters
where existing clusters have a higher likelihood of
association, however, given the data available (i.e. the
halfbakery) I'm getting some nice clusters like
{coffee-cup-mug-tea-hot-heat-drink} and
{space-moon-earth-orbit-mars-gravity-parking} and
{ideas-halfbakery-idea-bakers-croissant-hb-baked} and other
"outlying" clusters such as the rather dubious
{sex-porn-condoms-bacon-ed-hot-sexual} |
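One possible answer to the distance question, offered only as a sketch: ignore the Dewey numbers and compare the clusters' word sets directly using Jaccard distance (1 minus shared words over total distinct words). That is a technique swapped in here, not something proposed above. The coffee and space clusters are the ones quoted; the kettle cluster is invented for illustration.

```python
def dewey_gap(a, b):
    """Numeric gap between two Dewey numbers."""
    return abs(a - b)

def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|: 0 for identical sets, 1 for disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)

coffee = {"coffee", "cup", "mug", "tea", "hot", "heat", "drink"}
space = {"space", "moon", "earth", "orbit", "mars", "gravity", "parking"}
# Hypothetical cluster, not from the halfbakery data.
kettle = {"tea", "kettle", "hot", "water", "boil", "steam", "drink"}

# By the numbers, Religion (200) sits nearer Technology (600) than Literature (800).
print(dewey_gap(200, 600), dewey_gap(200, 800))
# By shared vocabulary, coffee sits nearer kettle than space.
print(jaccard_distance(coffee, kettle), jaccard_distance(coffee, space))
```

Clusters whose pairwise distances come out low would be candidates for a super-cluster, though seven-word names give very little overlap to work with.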
|
|
So the other problem with tangent off-topicality is
identifying the difference between a series of riffs on a
theme, each of which might include a degree of
off-topicality, vs a single wildly tangential veer off into
realms new. It's a case of comparing apples and ¶. |
|
|
I find that most exquisite! |
|
|
Well, gods know we could use some sort of system for keeping tangentiality in check and preventing ideas from drifting off in wild and sundry directions. Speaking of wild and sundry directions, did I ever tell you about the Intercalary's work for (and I use the word "for" quite wrongly) the US Geological Survey in southern Kadugistan? It was just after the Kurdic Revolt, so obviously most lines of communication were down, and he had to fall back on the "yodelling herdsmen" to relay his information back to base camp. Of course, since he had only a rudimentary grasp of Kadugistani Yodellese, I suppose the misunderstandings that led to the inadvertent deployment of two cruisers and a gunboat to the Altai mountains were as inevitable as they were regrettable. |
|
|
Surely some meta-data concerning the number of
links between items should be available and should
dictate the cloudspace to reduce the topological
complexity of crossings in whatever diagram shape
emerges. |
|
| |