Back in the days of small disk drives and tiny RAM I had the idea of a spelling checker that worked statistically. Analysis of all English words would produce some sort of feasibility scale for any given group of letters. So when you type ptye it would be flagged as extremely unlikely. A bit of thought made it obvious that this idea was truly half-baked.
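One way the "feasibility scale" could look, as a minimal sketch: count letter trigrams over a word list and score a candidate by its average log-probability per letter. The trigram model, the add-one smoothing, the names `train_trigrams` and `likeness`, and the toy word list are all assumptions for illustration, not anything from the original post.

```python
import math
from collections import Counter

def train_trigrams(words):
    """Count letter trigrams and their two-letter contexts over a word list."""
    tri, ctx = Counter(), Counter()
    for w in words:
        padded = "^^" + w.lower() + "$"   # '^' marks the start, '$' the end
        for i in range(len(padded) - 2):
            tri[padded[i:i + 3]] += 1
            ctx[padded[i:i + 2]] += 1
    return tri, ctx

def likeness(word, tri, ctx, alphabet_size=27):
    """Average log-probability per letter, with add-one smoothing.

    alphabet_size=27 assumes 26 letters plus the end marker.
    Higher (closer to zero) means more like the training words.
    """
    padded = "^^" + word.lower() + "$"
    steps = len(padded) - 2
    total = 0.0
    for i in range(steps):
        seen = tri[padded[i:i + 3]]
        total += math.log((seen + 1) / (ctx[padded[i:i + 2]] + alphabet_size))
    return total / steps

# Toy training list (hypothetical); a real run would use a full dictionary.
words = ["trope", "trot", "tome", "home", "rome", "chrome", "from"]
tri, ctx = train_trigrams(words)
print(likeness("trome", tri, ctx) > likeness("ptye", tri, ctx))  # True on this toy list
```

With a real dictionary the scores could be thresholded to flag unlikely strings, or sorted to find the most plausible non-words.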
But now we have huge disks, masses of RAM and the whole web to analyse and I want to start and really do it.
It would provide marketing bods with a measure of how English a new word is. (Or for a new lager - how German, for perfume - how French, and so on.)
And I have a strange desire to set up a website full of non-real words, probably starting with four or five letter words that are the most likely to be real English words but somehow don't exist, and watch Google index them all. (It would be great for Scrabble if you could persuade your opponents to accept the web as the reference source. Of course that word exists, here it is on this site.)
PS So far as I could find, 'trome' did not exist as an English word until . . . . now. It means an arbitrary grouping of letters that doesn't form an existing English word but could reasonably be thought to be one.
(?) Shannon's n-th order English approximations
http://www.stanford...py_of_english_9.htm (at end). [rmutt, Sep 28 2001, last modified Oct 17 2004]
Probability for Linguists
http://humanities.u...ial/Probability.htm Congrats, snagger, you've hit on the basics behind practically all speech recognition, machine translation, and cryptographic systems. [rmutt, Sep 28 2001, last modified Oct 17 2004]
Random Word Generator
http://www.fourteenminutes.com/fun/words/ Expand your vocabularly beyond existing words. [calum, Oct 13 2002]
List of real exons and the corresponding predictions by trome database
http://bioinformati...hio-state.edu/FEGB/ Medical word. [gtoal, Feb 27 2006]
Bloom filter
http://www.gtoal.co...mes/hash/hashtest.c Found one! I knew I had this code somewhere [gtoal, Feb 27 2006]
|
|
Not so halfbaked after all, although the application isn't the one you're thinking of. |
|
|
Newer versions of Microsoft Word use three-letter group statistics to detect which language a text is in (to determine which language to spell-check against.) |
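A rough sketch of how three-letter group statistics can pick a language: build a trigram frequency profile per language and choose the profile closest to the text. This is a generic illustration, not a description of how Word does it; the cosine similarity measure, the function names, and the two tiny sample texts are assumptions.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Relative frequencies of character trigrams in a text."""
    cleaned = " ".join(text.lower().split())
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    total = sum(counts.values())
    if total == 0:            # text shorter than three characters
        return {}
    return {gram: n / total for gram, n in counts.items()}

def similarity(p, q):
    """Cosine similarity between two trigram profiles."""
    dot = sum(v * q.get(gram, 0.0) for gram, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def guess_language(text, profiles):
    """Return the name of the profile most similar to the text."""
    sample = trigram_profile(text)
    return max(profiles, key=lambda lang: similarity(sample, profiles[lang]))

# Hypothetical training samples; real profiles would need far more text.
profiles = {
    "english": trigram_profile(
        "the quick brown fox jumps over the lazy dog and then the dog "
        "sleeps in the sun while the children are playing in the garden"),
    "french": trigram_profile(
        "le renard brun saute par dessus le chien paresseux et puis le chien "
        "dort au soleil pendant que les enfants jouent dans le jardin"),
}
print(guess_language("the dog sleeps in the garden", profiles))
```

Short texts full of names and abbreviations give a sparse profile, which is one plausible reason a resume might be misjudged.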
|
|
jutta: Gads. The insidious bastards--now Word can tell me I'm misspelling in French when I don't even know the language and all I want to do is abbreviate Office of Uninformed Unpleasant Idiots. Bah! |
|
|
//spelling checker that worked statistically// |
|
|
A long time ago in a galaxy far, far away I used to own a number of CP/M-based computers, which were (by modern standards) quite resource constrained. I can't recall the name of the darn thing, but there WAS a spelling checker on that platform that worked that way, since it was impossible to store any sort of useful dictionary on a 180K floppy. |
|
|
<sigh> Showing my age, I guess. |
|
|
I think people are missing this one entirely. |
|
|
I reckon this is a fantastic idea. Not to mention fairly easy to do... I think the easiest way would be this: Set up a neural net. Now, train this neural net via whatever algorithm (genetic is probably easiest) with random words from a simple dictionary. Once you've trained it sufficiently that it can hit all real English words with a rating of over 90% Probably English, let it loose. You could feed it random strings of letters until it found one it liked. |
|
|
Actually, this is a brilliant thing. I'm off to make one! Now, anyone know anything about Neural Nets in PHP? Drop me an email imagin8or@despammed.com |
|
|
alcomme
polictor
coatilted
shiffic
longintend
immedies
freelying
twainter
inabinicato
savialsence
decisible
hereoplegit
explicity
revisix
substizon
veryptinue
severnegot |
|
|
This is some output from a program I wrote. It simply makes words out of other words, matching up groupings of letters. It'd probably work better if it used statistics, but it generates an interesting word about 1/3 of the time. |
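The program itself isn't shown, so the following is only a guess at the general approach: join the start of one word to the end of another wherever they share a two-letter group. The function name `splice`, the `overlap` parameter, and the input words are all hypothetical.

```python
import random

def splice(a, b, overlap=2):
    """Join the start of `a` to the end of `b` at a shared letter group.

    Returns None when the two words share no usable group.
    """
    for i in range(1, len(a) - overlap + 1):
        group = a[i:i + overlap]
        j = b.find(group, 1)                  # skip a match at the very start of b
        if j != -1 and j + overlap < len(b):  # keep some of b beyond the shared group
            return a[:i] + b[j:]
    return None

def invent(words, rng=random):
    """Splice two randomly chosen words; may return None if they don't match up."""
    a, b = rng.sample(words, 2)
    return splice(a, b)

print(splice("explicit", "velocity"))  # explicity
print(splice("decisive", "visible"))   # decisible
```

Those two inputs were picked because they happen to give words from the list above; whether the original program built them that way is unknown.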
|
|
I once wrote a statistical analyser for text in BBC Basic. You'd feed it a bunch of text files, then it would spit out randomly generated words based on the distributions in the text. I only had it working on letter pairs (i.e. given any single letter, spit out a probable next letter) so most of the words were gibberish. I never got around to building the third-order version. With a little culling, it worked OK for my purposes - generating place names for a D&D campaign. |
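A letter-pair generator of this kind is short in a modern language. The sketch below is in Python rather than BBC Basic, and the function names, the start and end markers, and the sample place names are assumptions for illustration.

```python
import random
from collections import defaultdict

def build_pairs(words):
    """Map each letter to every letter seen after it ('^' = start, '$' = end)."""
    followers = defaultdict(list)
    for w in words:
        chars = "^" + w.lower() + "$"
        for a, b in zip(chars, chars[1:]):
            followers[a].append(b)   # repeats preserve the observed frequencies
    return followers

def generate(followers, rng, max_len=12):
    """Walk the letter-pair table from the start marker until an end marker.

    Assumes the table was built from at least one word.
    """
    out, current = [], "^"
    while len(out) < max_len:
        current = rng.choice(followers[current])
        if current == "$":
            break
        out.append(current)
    return "".join(out)

# Hypothetical seed names; any list of words or text tokens would do.
table = build_pairs(["waterdeep", "neverwinter", "baldur", "silverymoon"])
rng = random.Random()
print([generate(table, rng) for _ in range(5)])
```

Keying the table on the previous two letters instead of one would give the third-order version mentioned above, which usually produces more word-like output.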
|
|
Yes, I was a nerd. So sue me. |
|
|
// Newer versions of Microsoft Word use three-letter group statistics to detect which language a text is in (to determine which language to spell-check against.)// |
|
|
And that's why my version of Word always assumes that my resume is in French, thus making it impossible to run a spell check. Word is ass. |
|
|
I'm sorry, did I say that out loud? |
|
|
Actually, I like this idea. It would have the problem of adapting to English's adaptability, in that we've got so many words from other languages that I doubt any word could be considered "English." Other than that, it's golden. |
|
|
Word is fantastic. Your resume is ass. |
|
|
Wrong and right, in that order. |
|
|
// A long time ago in a galaxy far, far away I used to own a number of CP/M-based computers, which were (by modern standards) quite resource constrained. I can't recall the name of the darn thing, but there WAS a spelling checker on that platform that worked that way, since it was impossible to store any sort of useful dictionary on a 180K floppy. // krelnik - I wrote one that worked like that for the BBC micro to do a *rough* check of user plays in a Scrabble game. They're called "Bloom filters". Terrible error rate, more of an intellectual curiosity than serious code. I think the first real spelling checker for the DEC10 used a Bloom filter, however. |
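For anyone who hasn't met one, a Bloom filter stores a word list as a bit array: each word sets a few hashed bit positions, and a lookup checks whether all of those bits are set. It never misses a stored word but can wrongly accept others. The sketch below is a generic illustration, not the linked hashtest.c; the class name, the sizes, and the use of SHA-256 for hashing are assumptions.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array; membership tests can give false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive several bit positions by hashing the word with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

# 8192 bits is 1 KB, small enough for a floppy-era dictionary check.
checker = BloomFilter()
for w in ["spelling", "checker", "dictionary"]:
    checker.add(w)
print("checker" in checker)  # True: stored words are always found
```

The error rate depends on how many words are packed into how many bits, so a full dictionary squeezed into a very small array would accept a lot of non-words.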
|
|
Googling 'trome' now sends you here. I can't wait to use it in a sentence. Frabjous! |
|
|
Big pharma will pay you a lot of honkel for this. |
|
|
//So when you type ptye it would be flagged as extremely unlikely.// |
|
|
I don't think that's any less likely than 'pterodactyl'. |
|
|
I mean, just last week, out for a nice dinner, I wore a suit and a ptye. |
|
|
It may not be commonly known, but all airway navaid intersections have 5 or 6 letter pronounceable names, and they're charted on the aeronautical charts. I'd bet that 'trome' is one of these somewhere. |
|
|
Also, found this on a site -- apparently about Star Wars...
"Trome, Notha:
this depot manager worked for Starway Services, in Taldaak Station. It was Trome, under orders from Akanah and the White Current, who ordered work on the Mud Sloth to take precendence over all other work. (TT)" |
|
|
There is now also a trome database, which contains some sort of information about genome mapping. I credit [snagger]. |
|
|
This idea is very cromulent. |
|
|
Not to mention spaddish. I'd warrant a spoadwrathe of geem, by and by. Anyone seen my Hemgh? It's proum, with a libing pair of hungjibs on one side, and a spirgsham on the otter. |
|
|
If I were trying to write some sort of algorithm like this, I would check for English last. I would think English to be a worst-case scenario, a tafferel of Zeitgeist. |
|
| |