Computer: Algorithm
Is 'trome' an English word?   (+17)
Measure how English arbitrary letter groupings are.

Back in the days of small disk drives and tiny RAM I had the idea of a spelling checker that worked statistically. Analysis of all English words would produce some sort of feasibility scale for any given group of letters. So when you type ‘ptye’ it would be flagged as extremely unlikely. A bit of thought made it obvious that this idea was truly half-baked.

But now we have huge disks, masses of RAM and the whole web to analyse and I want to start and really do it.

It would provide marketing bods with a measure of how ‘English’ a new ‘word’ is. (Or for a new lager - how ‘German’, for perfume - how ‘French’, and so on.)

And I have a strange desire to set up a website full of non-real words, probably starting with four or five letter words that are the most likely to be real English words but somehow don’t exist, and watch Google index them all. (It would be great for Scrabble – if you could persuade your opponents to accept the web as the reference source. “Of course that word exists, here it is on this site.”)

PS So far as I could find, 'trome' did not exist as an English word until . . . . now. It means an arbitrary grouping of letters that doesn't form an existing English word but could reasonably be thought to be one.
-- snagger, Sep 28 2001
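
[Editor's sketch] One rough way to get the "feasibility scale" described above is to count how often each letter follows each other letter in known words, then score a candidate by the average log-probability of its letter transitions. This is only a minimal sketch under stated assumptions: the word list, the function names and the add-one smoothing are all hypothetical choices, not anything from the original post.

```python
import math
from collections import Counter

# Hypothetical toy word list; a real run would use a full dictionary.
WORDS = ["trample", "home", "rome", "tremor", "trombone", "chrome", "type",
         "tome", "prone", "drone", "throne", "stone", "trade", "grime"]

def train(words):
    """Count letter pairs and their left-hand letters, using ^ and $ as word boundaries."""
    pair_counts, context_counts = Counter(), Counter()
    for w in words:
        padded = "^" + w.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[a + b] += 1
            context_counts[a] += 1
    return pair_counts, context_counts

def englishness(word, pair_counts, context_counts, alphabet_size=27):
    """Average log-probability per letter transition, with add-one smoothing."""
    padded = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (pair_counts[a + b] + 1) / (context_counts[a] + alphabet_size)
        total += math.log(p)
    return total / (len(padded) - 1)

pairs, contexts = train(WORDS)
for candidate in ("trome", "ptye"):
    # Scores are negative; closer to zero means "more like the training words".
    print(candidate, round(englishness(candidate, pairs, contexts), 3))
```

With this toy list 'trome' scores closer to zero than 'ptye'. Training on another language's word list would give the "how German" or "how French" measure in the same way.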

(?) Shannon's n-th order English approximations http://www.stanford...py_of_english_9.htm
(at end). [rmutt, Sep 28 2001, last modified Oct 17 2004]

Probability for Linguists http://humanities.u...ial/Probability.htm
Congrats, snagger, you've hit on the basics behind practically all speech recognition, machine translation, and cryptographic systems. [rmutt, Sep 28 2001, last modified Oct 17 2004]

Random Word Generator http://www.fourteenminutes.com/fun/words/
Expand your vocabulary beyond existing words. [calum, Oct 13 2002]

List of real exons and the corresponding predictions by trome database http://bioinformati...hio-state.edu/FEGB/
Medical word. [gtoal, Feb 27 2006]

Bloom filter http://www.gtoal.co...mes/hash/hashtest.c
Found one! I knew I had this code somewhere [gtoal, Feb 27 2006]

Not so halfbaked after all, although the application isn't the one you're thinking of.

Newer versions of Microsoft Word use three-letter group statistics to detect which language a text is in (to determine which language to spell-check against.)
-- jutta, Sep 29 2001
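
[Editor's sketch] Language guessing from three-letter groups can be illustrated by building a trigram frequency profile per language and picking the profile most similar to the text. This is not Microsoft's implementation, which is not described here; the sample sentences, the cosine similarity measure and all names below are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical, tiny training samples; a real system would train on far more text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "sleeps in the sun while the children watch",
    "french": "le renard brun saute par dessus le chien paresseux et puis le "
              "chien dort au soleil pendant que les enfants regardent",
}

def trigram_profile(text):
    """Count overlapping three-character groups, with spaces marking word edges."""
    text = " " + " ".join(text.lower().split()) + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}

def guess_language(text):
    """Return the language whose profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))

print(guess_language("the children and the dog"))
print(guess_language("les enfants et le chien"))
```

Short inputs and abbreviations carry few trigrams, so a wrong guess on something like a string of initials is unsurprising.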


jutta: Gads. The insidious bastards--now Word can tell me I'm misspelling in French when I don't even know the language and all I want to do is abbreviate Office of Uninformed Unpleasant Idiots. Bah!
-- Dog Ed, Sep 29 2001


//spelling checker that worked statistically//

A long time ago in a galaxy far, far away I used to own a number of CP/M-based computers, which were (by modern standards) quite resource constrained. I can't recall the name of the darn thing, but there WAS a spelling checker on that platform that worked that way, since it was impossible to store any sort of useful dictionary on a 180K floppy.

<sigh> Showing my age, I guess.
-- krelnik, Oct 12 2002


I think people are missing this one entirely.

I reckon this is a fantastic idea. Not to mention fairly easy to do... I think the easiest way would be this: Set up a neural net. Now, train this neural net via whatever algorithm (genetic is probably easiest) with random words from a simple dictionary. Once you've trained it sufficiently that it can hit all real English words with a rating of over 90% Probably English, let it loose. You could feed it random strings of letters until it found one it liked.

Actually, this is a brilliant thing. I'm off to make one! Now, anyone know anything about Neural Nets in PHP? Drop me an email imagin8or@despammed.com
-- imagin8or, Jul 02 2003


alcomme polictor coatilted shiffic longintend immedies freelying twainter inabinicato savialsence decisible hereoplegit explicity revisix substizon veryptinue severnegot

This is some output from a program I wrote. It simply makes words out of other words, matching up groupings of letters. It'd probably work better if it used statistics, but it generates an interesting word about 1/3 of the time.
-- nelso, Dec 31 2003
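
[Editor's sketch] nelso's program isn't shown, so the following is only a guess at what "matching up groupings of letters" might look like: take the head of one word and the tail of another wherever they share a two-letter group. The word list and the `splice` function are hypothetical.

```python
# Hypothetical input words; any dictionary could be substituted.
WORDS = ["decision", "visible", "immediately", "remedies", "explicit", "capacity"]

def splice(first, second, group_len=2):
    """Return blends that keep first's head and second's tail around a shared letter group."""
    blends = []
    for i in range(1, len(first) - group_len + 1):
        group = first[i:i + group_len]
        # Look for the same group inside the second word, not at its very start.
        j = second.find(group, 1)
        # Require at least one letter after the shared group in the second word.
        if j != -1 and j + group_len < len(second):
            blends.append(first[:i] + second[j:])
    return blends

for a, b in zip(WORDS[::2], WORDS[1::2]):
    # Use a set to drop duplicate blends reached through different shared groups.
    print(a, "+", b, "->", sorted(set(splice(a, b))))
```

Weighting the choice of splice point by letter-group statistics, as nelso suggests, would be the obvious next step.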


I once wrote a statistical analyser for text in BBC Basic. You'd feed it a bunch of text files, then it would spit out randomly generated words based on the distributions in the text. I only had it working on letter pairs (i.e. given any single letter, spit out a probable next letter) so most of the words were gibberish. I never got around to building the third-order version. With a little culling, it worked OK for my purposes - generating place names for a D&D campaign.

Yes, I was a nerd. So sue me.
-- BunsenHoneydew, Jan 25 2005
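
[Editor's sketch] The letter-pair generator described above can be sketched as a first-order Markov chain: record which letters follow each letter in the training text, then walk the table at random. This is written in Python rather than BBC Basic, and the training names, function names and length cap are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical training names; any text split into words would do.
TRAINING = ["ashford", "barton", "kendal", "milton", "norwich",
            "preston", "redditch", "stafford", "taunton", "wendover"]

def build_table(words):
    """Map each letter to the list of letters seen after it (^ = start, $ = end)."""
    table = defaultdict(list)
    for w in words:
        padded = "^" + w + "$"
        for a, b in zip(padded, padded[1:]):
            table[a].append(b)
    return table

def generate(table, rng, max_len=12):
    """Walk the table from the start marker until an end marker or the length cap."""
    out, current = [], "^"
    while len(out) < max_len:
        # Repeated entries in the list make common followers more likely.
        current = rng.choice(table[current])
        if current == "$":
            break
        out.append(current)
    return "".join(out)

table = build_table(TRAINING)
rng = random.Random(2005)  # fixed seed so runs are repeatable
print([generate(table, rng) for _ in range(5)])
```

A third-order version would key the table on the previous two letters instead of one, which usually makes the output more pronounceable.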


// Newer versions of Microsoft Word use three-letter group statistics to detect which language a text is in (to determine which language to spell-check against.)//

And that's why my version of Word always assumes that my resume is in French, thus making it impossible to run a spell check. Word is ass.

I'm sorry, did I say that out loud?

Actually, I like this idea. It would have the problem of adapting to English's adaptability, in that we've got so many words from other languages that I doubt any word could be considered "English." Other than that, it's golden.
-- shapu, Jan 25 2005


Word is fantastic. Your resume is ass.
-- bristolz, Jan 25 2005


Wrong and right, in that order.
-- shapu, Jan 25 2005


// A long time ago in a galaxy far, far away I used to own a number of CP/M-based computers, which were (by modern standards) quite resource constrained. I can't recall the name of the darn thing, but there WAS a spelling checker on that platform that worked that way, since it was impossible to store any sort of useful dictionary on a 180K floppy. //

krelnik - I wrote one that worked like that for the BBC micro to do a *rough* check of user plays in a Scrabble game. They're called "Bloom filters". Terrible error rate, more of an intellectual curiosity than serious code. I think the first real spelling checker for the DEC10 used a Bloom filter, however.
-- gtoal, Feb 27 2006
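
[Editor's sketch] For readers who haven't met a Bloom filter: it stores a word list as a bit array, setting a few hashed bit positions per word. A lookup can wrongly say "yes" for a word that was never added, but never says "no" for one that was. The sketch below is not gtoal's code; the class, the sizes and the use of SHA-256 to derive the positions are assumptions chosen for brevity.

```python
import hashlib

class BloomFilter:
    """Compact set membership with possible false positives, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

# Hypothetical mini dictionary stored in 128 bytes.
dictionary = BloomFilter()
for w in ["home", "rome", "tome", "chrome", "drone"]:
    dictionary.add(w)

print("rome" in dictionary)   # always True for a word that was added
print("trome" in dictionary)  # almost certainly False, but a false positive is possible
```

The error rate depends on how many bits are spent per word, which is why a filter squeezed onto a small floppy would let a fair number of misspellings through.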


Googling 'trome' now sends you here. I can't wait to use it in a sentence. Frabjous!
-- spidermother, Feb 28 2006


Zipf-tastic!
-- Jinbish, Feb 28 2006


Big pharma will pay you a lot of honkel for this.
-- wagster, Feb 28 2006


//So when you type ‘ptye’ it would be flagged as extremely unlikely.//

I don't think that's any less likely than 'pterodactyl'.

I mean, just last week, out for a nice dinner, I wore a suit and a ptye.
-- zigness, Mar 02 2006


It may not be commonly known, but all airway navaid intersections have 5 or 6 letter pronounceable names, and they're charted on the aeronautical charts. I'd bet that 'trome' is one of these somewhere.

Also, found this on a site -- apparently about Star Wars... "Trome, Notha: this depot manager worked for Starway Services, in Taldaak Station. It was Trome, under orders from Akanah and the White Current, who ordered work on the Mud Sloth to take precendence over all other work. (TT)"
-- zigness, Mar 02 2006


There is now also a trome database, which contains some sort of information about genome mapping. I credit [snagger].
-- bungston, Sep 06 2006


This idea is very cromulent.
-- sleeka, Sep 07 2006


Not to mention spaddish. I'd warrant a spoadwrathe of geem, by and by. Anyone seen my Hemgh? It's proum, with a libing pair of hungjibs on one side, and a spirgsham on the otter.
-- zen_tom, Sep 07 2006


Gewd ninkthing
-- Five_Swords, Sep 08 2006


If I were trying to write some sort of algorithm like this, I would check for English last. I would think English to be a worst-case scenario, a tafferel of Zeitgeist.
-- undata, Sep 11 2006


