h a l f b a k e r y
Superficial Intelligence
Consider this typo:
"ince"
Spelling correction suggestions could be "Inch, Inca, Since, Nice, Once, etc..."
I propose a keyboard-proximity filter (based on the system's current designated keyboard layout) to order them in the most likely typo order. O is very close to I on the keyboard, so:
Since, Nice, Once are probably the most likely words.
Inca and Inch are much less likely to be typed by accident, and accidents form 90%+ of my spelling errors.
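A minimal sketch of the keyboard-proximity ordering described above, assuming a plain QWERTY layout (a real system would query the active layout). The names `KEY_POS`, `key_distance`, `substitution_score` and `rank_by_proximity` are made up for illustration, the row offsets are rough guesses, and only wrong-key substitutions are modelled:

```python
# Hypothetical QWERTY geometry; the row offsets approximate the physical stagger.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]

# Map each letter to an (x, y) position on the keyboard.
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(keys)
}


def key_distance(a, b):
    """Euclidean distance between two letter keys (letters only)."""
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5


def substitution_score(typed, candidate):
    """Sum of key distances over differing positions; None if lengths differ."""
    if len(typed) != len(candidate):
        return None
    return sum(key_distance(t, c) for t, c in zip(typed, candidate) if t != c)


def rank_by_proximity(typed, candidates):
    """Order same-length candidates so the nearest-key substitutions come first."""
    typed = typed.lower()
    scored = [(substitution_score(typed, c.lower()), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None]
    return [c for s, c in sorted(scored)]


print(rank_by_proximity("ince", ["Inch", "Inca", "Once"]))
```

With these assumed positions, "Once" sorts ahead of "Inca" and "Inch". A dropped letter ("Since") or swapped letters ("Nice") are not handled by this substitution-only score and would need their own costs.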
Add to that theme with some intra-sentence grammar checking and common word tagging, and spellchecking could be much more useful.
(How often does one really intend to use the words "tot he" in a sentence? It is more likely to be "to the", for example. Some systems already auto-correct that, though.)
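The "tot he" case can be sketched as a misplaced-space check: try moving the space one character either way and keep the split whose rarer word is most common. The frequency table `COMMON` and the functions `shift_space` and `better_split` are hypothetical; a real checker would draw its counts from a large corpus.

```python
# Hypothetical word counts, invented purely for illustration.
COMMON = {"to": 1000, "the": 2000, "he": 300, "tot": 1}


def shift_space(left, right):
    """Yield the word pairs obtained by moving the space one character."""
    if len(left) > 1:
        yield left[:-1], left[-1] + right
    if len(right) > 1:
        yield left + right[0], right[1:]


def better_split(left, right, freq=COMMON):
    """Return the most plausible split, scored by the rarer of the two words."""
    def score(a, b):
        return min(freq.get(a, 0), freq.get(b, 0))

    best = (left, right)
    for cand in shift_space(left, right):
        if score(*cand) > score(*best):
            best = cand
    return best


print(better_split("tot", "he"))
```

Scoring by the rarer word means a split is only preferred when both halves are common, so a correct "to the" is left alone.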
US Patent 6,801,190
http://www.google.c...AAAAEBAJ&dq=6801190 [jutta, Aug 17 2009]
Context-sensitive spell check in Microsoft Office 2007
http://blogs.msdn.c...6/06/05/617653.aspx [jutta, Aug 17 2009]
Context-sensitive spell check in Google Wave
http://googlesystem...-spell-checker.html [jutta, Aug 17 2009]
Wikipedia: Damerau-Levenshtein distance
http://en.wikipedia...evenshtein_distance Edit distance with bells on. [jutta, Aug 17 2009]
Wikipedia: Needleman-Wunsch algorithm
http://en.wikipedia...an-Wunsch_algorithm This very clearly needs to be worked into a popular "dance craze" song, "Do the Levenshtein-Damerau Needleman-Wunsch". [jutta, Aug 17 2009]
|
|
All good ideas, and patented and implemented in a few systems. A patent- or literature-search for "spell checking algorithms" might be in order. |
|
|
The keyboard proximity thing is implemented, if one bothers with it, as a "confusion matrix" that, given two keys, tells you how likely they are to be confused. When computing the edit distance between two words (-> Levenshtein distance), instead of assigning equal probability to each substitution error, the confusion matrix is used to look up the probability of this specific error. |
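A rough sketch of that idea, assuming a tiny hand-made table in place of a real confusion matrix. The costs in `CONFUSION_COST` are invented (lower means more easily confused), the table is treated as symmetric for simplicity, and `sub_cost` and `weighted_levenshtein` are illustrative names rather than any library's API:

```python
# Hypothetical substitution costs; a real matrix would be estimated from
# typing-error data or keyboard geometry, and need not be symmetric.
CONFUSION_COST = {
    frozenset("io"): 0.3,  # neighbouring keys, easily confused
    frozenset("er"): 0.3,
}


def sub_cost(a, b):
    """Cost of typing a when b was intended; unknown pairs cost a full 1.0."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def weighted_levenshtein(typed, intended, indel_cost=1.0):
    """Levenshtein distance with per-pair substitution costs."""
    prev = [j * indel_cost for j in range(len(intended) + 1)]
    for i, t in enumerate(typed, start=1):
        curr = [i * indel_cost]
        for j, c in enumerate(intended, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # drop a typed character
                curr[j - 1] + indel_cost,      # add a missing character
                prev[j - 1] + sub_cost(t, c),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


for word in ["once", "inch", "since"]:
    print(word, weighted_levenshtein("ince", word))
```

With these made-up costs "once" scores lower than "inch" or "since". Transpositions such as "nice" would need the Damerau extension, which this sketch leaves out.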
|
|
Well, duh... plainly obvious you mean Vince |
|
|
//Levenshtein// - so that's what it's called - I once wrote a program that was intended to act as an "engine" for ALL card games, from snap through Gin Rummy to any/all variants of Poker, with each ruleset defined as a (relatively easy to edit) xml file - the tricky part came during draw/replace scenarios, trying to get the machine to try to decide whether it had a good/bad enough hand to draw a card (and decide which one to burn in the process), and there are lots of routines that reference the Hamming distance between a given hand, and a target one (e.g. four of a kind, or a series of hearts, or a numeric sequence) that the program might have "wanted" - I'm now going to have to go back and rename some of my methods to use the word "Levenshtein". |
|
|
//confusion matrix// - I'm pretty sure I can implement that myself without any algorithms. |
|
|
//I think spellcheckers should be programmed to deliberately fail every so many words, or even insert barely-noticeable typos whilst typing that won't show up on the finished-product spellcheck.// |
|
|
I find that happens already, as some errors form another word. |
|
|
examples:
your/you're
lose/loose
discrete/discreet |
|
|
(The first to are quite common on the net, and widely reviled.) |
|
|
One typo I often come across is a 'dyslexic' (no offense to dyslexic people) error - hitting the (theoretically) correct key with the wrong hand (eg. putting 'k' when you needed 'd').
<Pet peeve> People getting 'than' and 'then' mixed up! Grrr!</pp> |
|
|
//I can't believe editors, who used to have to earn their pay by proofreading, can cheat...// |
|
|
Yeah!! And what about those lazy sailors who use GPS to navigate..? |
|
|
//The first to are quite common// |
|
|
Was that "to" intentional? Yeah. Must've been. |
|
|
I know someone who frequently misuses "of" - as in must of, could of, should of etc - I don't have the heart to tell them. |
|
|
You must learn to Give In To Your Hate, [zen]. |
|
|
//Was that "to" intentional? Yeah. Must've been.// |
|
| |