halfbakery
Fewer ducks than estimates indicate.
There are something like 1,500 idea categories on the Halfbakery, which makes it really hard to pick the right one from the menu on the "new idea" page.
I think that a little pattern-matching software could automatically pick categories for ideas, or at least make plausible suggestions.
The idea is to use what's called a "naive Bayesian classifier." This is a fairly simple bit of software that would extract features from ideas and, using probabilities gleaned from a training set, assign a probability that the idea belongs in each possible category. It could display the top ten (say) as hints on the "new idea" page.
For a training set, we can just use the current Halfbakery database.
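To make the mechanics concrete, here is a minimal sketch of such a classifier. It is not Halfbakery code: the class name, the add-one smoothing, the category names, and the four training ideas are all made up for illustration, and the features are just the words themselves.

```python
import math
import re
from collections import Counter, defaultdict


def words(text):
    """Lowercased word tokens; the feature set here is just the words themselves."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.docs_per_category = Counter()       # category -> number of training ideas
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, category):
        self.docs_per_category[category] += 1
        for w in words(text):
            self.word_counts[category][w] += 1
            self.vocabulary.add(w)

    def scores(self, text):
        """Map each category to log P(category) + sum of log P(word | category)."""
        total_docs = sum(self.docs_per_category.values())
        tokens = words(text)
        result = {}
        for category, n_docs in self.docs_per_category.items():
            counts = self.word_counts[category]
            # Add-one smoothing, so an unseen word does not zero out a category.
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)
            for w in tokens:
                score += math.log((counts[w] + 1) / denom)
            result[category] = score
        return result

    def suggest(self, text, n=10):
        """The n most probable categories, best first."""
        s = self.scores(text)
        return sorted(s, key=s.get, reverse=True)[:n]


# Made-up training data standing in for the Halfbakery database.
nb = NaiveBayes()
nb.train("a spray that delivers wasabi to the nose", "health: nose")
nb.train("nasal decongestant made from horseradish", "health: nose")
nb.train("a bicycle with square wheels for stairs", "vehicle: bicycle")
nb.train("folding bicycle that fits in a pocket", "vehicle: bicycle")

print(nb.suggest("wasabi nasal spray", n=2))
```

Working in log probabilities avoids underflow when an idea has hundreds of words. The scores are only meaningful relative to each other, which is all a ranked list of hints needs.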
The main open question is what the feature set ought to be. Obvious candidates are words in the text of the idea, or better, their thesaurus categories. (Word pairs or triples might work even better.)
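The candidate feature sets can be sketched as one small function. The function name, the `thesaurus` argument, and the toy thesaurus below are hypothetical; a real thesaurus lookup would need an actual word-to-category table.

```python
import re

# Hypothetical word -> thesaurus-category table, for illustration only.
TOY_THESAURUS = {"wasabi": "condiment", "nasal": "nose"}


def word_features(text, n=1, thesaurus=None):
    """Return word n-grams (n=1 words, n=2 pairs, n=3 triples) as strings.

    If a thesaurus dict is given, each word is first replaced by its
    category; words missing from the dict are kept unchanged.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if thesaurus is not None:
        tokens = [thesaurus.get(t, t) for t in tokens]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(word_features("Wasabi nasal spray"))                           # single words
print(word_features("Wasabi nasal spray", n=2))                      # word pairs
print(word_features("Wasabi nasal spray", thesaurus=TOY_THESAURUS))  # thesaurus categories
```

Whichever variant is chosen, the classifier itself does not change; it only sees a list of feature strings.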
Bayesian classifier in Python
http://www.divmod.org/Reverend/ Sample code. [td, Oct 04 2004]
Classifying spam
http://www.paulgraham.com/spam.html This article, which is about statistically recognizing spam, has a good description of the naive Bayesian classifier buried in it. [td, Oct 04 2004]
I find the best way to get a category is simply to search for closely-related ideas. This also has other benefits.
Searching for something closely-related works poorly for sufficiently weird ideas. This idea came to mind because searching wasn't helping me find an apposite category for Wasabi Nasal Spray.
You can think of this idea as a (fairly sophisticated) search feature that works on the text of the submitted idea.
Tom certainly does have a point though. Manually picking a category is nigh impossible these days.
Not impossible, but tedious certainly. Fortunately I've got around the problem by not having any new ideas.
I find picking the category to be half the fun, sometimes. Although often people disagree with my decision.
This smacks of Windows XP to me.
I really don't like it when computers try to be more intelligent than they are capable of being.
Regarding what feature set to use... to get the fastest and simplest, if not necessarily most accurate, classifier, just use the words themselves. Using their thesaurus categories *might* (or might not) increase accuracy (and would certainly reduce the database size), but it would decrease speed. Using word pairs or triples will certainly increase accuracy, but it will both decrease speed *and* increase the database size.
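The database-size side of that trade-off is easy to measure before committing to a feature set. A rough sketch, assuming the "database" is essentially one row per distinct feature; the function names and the three-line corpus are invented for illustration:

```python
import re


def ngrams(text, n):
    """Word n-grams of a text, as space-joined strings."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_table_size(corpus, n):
    """Number of distinct n-grams across the corpus (rows the classifier must store)."""
    return len({g for text in corpus for g in ngrams(text, n)})


# Toy corpus standing in for the real idea texts.
corpus = [
    "a spray for the nose of the dog",
    "a spray for the garden of the dog",
    "the dog of the garden",
]

for n, label in ((1, "words"), (2, "pairs"), (3, "triples")):
    print(label, feature_table_size(corpus, n))
```

Even on this tiny corpus the count grows with each step from words to pairs to triples. Running the same count over the real idea texts would show how steep the growth actually is.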
I'd also like to make a suggestion on the user interface for it: since the classifier would reside on the HB server, any new idea (that the author wants to pick a category for) needs to be submitted to it, in order to be analysed. I would suggest that this be done via a "Suggest a Category" button.
When pressed, the form data (including the summary and text of the idea) would go to the HB server and be analysed by the classifier, and then the new html page would have the *entire* category list sorted, with the most likely category at the top and the least likely at the bottom. Another button, "Sort Categories Alphabetically", would send it back to its normal order.
If you just display the top 10, the author would still have to scroll through the category list to find the item, and select it; if the sorted list is in the <select> box, then it's easy to just click one.
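The server side of the two buttons could look something like the sketch below. It assumes the classifier hands back a dict of category scores (higher means more likely); the function names, the `order` values, the `category` form field name, and the sample scores are all hypothetical.

```python
from html import escape


def ranked_categories(scores, order="likelihood"):
    """Order the full category list.

    order="likelihood": most likely first (the "Suggest a Category" button).
    order="alphabetical": the normal order ("Sort Categories Alphabetically").
    """
    if order == "alphabetical":
        return sorted(scores)
    # Highest score first; ties fall back to alphabetical order.
    return sorted(scores, key=lambda c: (-scores[c], c))


def render_select(categories, name="category"):
    """Render the categories as an HTML <select> box, in the order given."""
    options = "\n".join(
        f'  <option value="{escape(c)}">{escape(c)}</option>' for c in categories
    )
    return f'<select name="{escape(name)}">\n{options}\n</select>'


# Made-up scores for three made-up categories.
scores = {"vehicle: bicycle": -9.1, "health: nose": -7.2, "food: condiment": -8.0}

print(render_select(ranked_categories(scores)))
print(render_select(ranked_categories(scores, order="alphabetical")))
```

Because the whole list is kept and only reordered, nothing is hidden from an author who disagrees with the classifier.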