halfbakery: Experiencing technical difficulties since 1999
We initially read this as "Predictive Text at end of world" which struck us as something of an oxymoron ... |
|
|
If I understand this right, having defined the start of the word, you want to work back from the end? |
|
|
Basically, although spelling in reverse is not something I expect people to do without stopping and muttering about it. (Let's see: a-u-t-o-(skip)-n-o-i-t, no, I don't like it.) |
|
|
So the "skip" key is switching you between two separate input strings, both proceeding as left-to-right input, and on each keypress, update the suggestions: |
|
|
SELECT words FROM wordlist WHERE words LIKE CONCAT(s1, '%', s2) |
|
|
you just pick up the words which match both the head and tail partial strings. |
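For anyone who would rather see that head-and-tail match outside of SQL, here is a minimal Python sketch. The function name `suggest` and the tiny word list are made up for illustration; a real phone would presumably use its own dictionary and ranking.

```python
def suggest(words, head, tail=""):
    """Return the words that start with `head` and end with `tail`.

    The length check stops the head and tail from overlapping, which
    matches what LIKE 'head%tail' would do.
    """
    min_len = len(head) + len(tail)
    return [w for w in words
            if len(w) >= min_len and w.startswith(head) and w.endswith(tail)]


# Hypothetical word list, just for the demonstration.
WORDS = ["excite", "excited", "exciting", "excitement",
         "exiting", "existing", "except"]

print(suggest(WORDS, "exci"))        # head only, as in ordinary completion
print(suggest(WORDS, "exci", "g"))   # head, then (skip), then tail
```

Each keypress would just call `suggest` again with whichever of the two strings grew.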
|
|
This idea seems to be predicated on the assumption that
the end of the word has more predictive power than
the early or middle bits. Is that true, though? |
|
|
It strikes me that word endings ("ed", "s", "ly", "er",
"ion") are less varied than the middle. |
|
|
Once you've gotten to a list of forms-of-the-same-word, then I think the predictive power of the end gets a lot better. |
|
|
Plus, I think "skip to end" would be a lot easier for the user to visualize and use than "ok, we're now going to jump to someplace in the middle". |
|
|
//easier for the user to visualize and use than "ok, we're
now going to jump to someplace in the middle"// |
|
|
No, I didn't mean "jump to the middle", I meant that it
*might* be more effective to simply type the next letter
than to skip to the end. |
|
|
For example, suppose I want "exciting", and I've typed
"ex". If I now add the last letter ("g"), it could predict
"exciting", "exiting", "extrapolating", "existing",
"extolling" and many more. However, if I add the "c"
instead ("exc"), the choice is more restrictive. |
|
|
I'm not saying that this is always the case, I'm just
questioning whether the last letter has more predictive
power than the next letter in the word. |
|
|
There's one other drawback to your system, though.
Most predictive software narrows the choice as you add
more letters. If you jump forward and add the last
letter, then you would have to work backwards (adding
the penultimate letter etc) if you wanted to restrict the
choices further. |
|
|
Well, yes, in your example, (checking on my phone here) I get "except", "example", "exactly", "extra", "express", "expect" - so there would be no reason for me to skip to the end. |
|
|
Likewise, when I hit the "c", now I've got "except", "exchange", "excuse", "excellent", "excited"... still lots of variety. |
|
|
However... when I put in the "i", the list looks like "excited", "exciting", "excitement", "excite", "excitation", "excise", "excitable", "excites", "excitedly" - the predictive value of the next letter just fell off a cliff. But: (skip)-g and you've nailed it. |
|
|
Fair enough. Research trumps theory. |
|
|
But after typing "exc" your choices are still pretty broad,
because you could be typing "exciting", "excited", "excitable",
"excitation", "excites", and so on. |
|
|
Luckily, this is easily testable. Just iterate over the alphabet,
building two regexps per letter: one of the form /...a..*/ and
another of the form /.....*a/. Essentially, you're looking for
words of at least five letters where either the fourth or the last
letter is specified. |
|
|
In fact, I've taken the liberty of running this test against the
Unix words file, and come up with 1,126,044 matches for the
former regexp, and 1,135,799 for the latter, indicating overall
only a marginally higher degree of specificity for typing the
fourth letter compared to skipping to the last letter. So while
it's a wash overall, there are likely some circumstances where it
could be a significant timesaver. |
|
|
Anyway, bun for essentially suggesting using a regexp for
predictive text. |
|
|
That's some fine geekery, [ytk]. Which reminds me, has anyone used the equivalent of UNIX tab completion in this context? |
|
|
Upon further reflection, my methodology is flawed.
Not only did I oversimplify
the problem, but the conclusion I reached should
have been impossible, and I
would have noticed that but for also doing a fairly
impressive job of cocking up
the regexps. (Apropos: Some people, when
confronted with a problem,
think, I know, I'll use regular expressions. Now
they have two problems.) |
|
|
After a bit of thought, I arrived at the following new
methodology: Iterate over the strings "aaaa" to
"zzzz", and for each one build two regexps of the
format /^1234.+$/ and /^123.+4$/ (substituting each
number with the character in that position). Run
each pair of
regexps against the dictionary, adding a count of the
matches to a pair of arrays. For each array, I then
removed all of the zeros, since they don't correspond to any
actual words. Finally,
I took the mean and median of each array and used those for
the comparison. |
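A rough Python sketch of that methodology, for anyone who wants to repeat it. This is not the script that produced the numbers in this thread. Instead of looping over "aaaa" to "zzzz" and running two regexps each time, it groups the words by the key each regexp would have matched, which should give the same non-zero counts in one pass. The helper names and the sample words are invented.

```python
from collections import Counter
from statistics import mean, median


def is_az(s):
    """True if every character is a lowercase letter a-z."""
    return all("a" <= c <= "z" for c in s)


def candidate_counts(words):
    """Count matches per key for words of at least five letters.

    first_four[k]      ~ matches for /^1234.+$/ with k = "1234"
    three_plus_last[k] ~ matches for /^123.+4$/ with k = "1234"
    Keys that match nothing never appear, so there are no zeros to remove.
    """
    first_four = Counter()
    three_plus_last = Counter()
    for w in set(words):
        if len(w) < 5:
            continue
        if is_az(w[:4]):
            first_four[w[:4]] += 1
        if is_az(w[:3] + w[-1]):
            three_plus_last[w[:3] + w[-1]] += 1
    return first_four, three_plus_last


def summarize(counter):
    """Mean and median number of matches over the non-empty keys."""
    sizes = list(counter.values())
    return mean(sizes), median(sizes)


# Invented sample; swap in the lines of a real word list to repeat the test.
sample = ["excite", "excited", "exciting", "except", "extra", "exams"]
first_four, three_plus_last = candidate_counts(sample)
print(summarize(first_four))
print(summarize(three_plus_last))
```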
|
|
Given a word of at least five letters, typing four
consecutive characters yields a mean of ~11.9
matches, and a median of 3 matches. Typing the
first three and then skipping to the last one yields a
mean of ~7.6 matches, and a median of 2 matches.
Assuming this methodology is sound (which is a pretty
big assumption), it would seem that typing three
characters and skipping to the last one is significantly
more efficient than typing four consecutive
characters at the start of a word. You may be on to
something here, [lurch]. |
|
|
[ytk] Kudos for the serious research. But the
user
has the option of either standard or skip-ahead
text completion, and can choose, for any given word,
whichever is more efficient. So the method's
efficiency exceeds standard-only text completion
by an even larger margin, no? |
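One way to put a number on that is to work per word rather than per key: for each word, count the candidates left by each method and take the smaller. The sketch below assumes the user always knows which method is better for the word in hand, so it is an optimistic bound rather than a measurement. The function names and sample words are made up.

```python
from collections import Counter


def per_word_candidates(words):
    """Map each word (5+ letters) to (standard, skip) candidate counts."""
    words = sorted({w for w in words if len(w) >= 5})
    by_four = Counter(w[:4] for w in words)           # four letters in a row
    by_skip = Counter(w[:3] + w[-1] for w in words)   # three letters + last
    return {w: (by_four[w[:4]], by_skip[w[:3] + w[-1]]) for w in words}


def average_best(words):
    """Average candidates for standard, skip, and best-of-both per word."""
    table = per_word_candidates(words)
    n = len(table)
    std = sum(a for a, _ in table.values()) / n
    skip = sum(b for _, b in table.values()) / n
    best = sum(min(a, b) for a, b in table.values()) / n
    return std, skip, best


# Invented sample words.
sample = ["excite", "excited", "exciting", "extras", "extract", "extort"]
std, skip, best = average_best(sample)
print(std, skip, best)
```

By construction the best-of-both figure can never be worse than either method alone; how much better it is on a real dictionary is an open question here.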
|
|
On the subject of methodology, the dictionary
words you tested this against ought to be
weighted by their frequency, i.e. if the method's
less efficient for rarely-used words, and more
efficient for common ones, then your method
underestimates its efficiency in practice. Or vice
versa. The
large difference between mean and median in
your results suggests very skewed distributions, so
this effect may be large. |
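A sketch of what the frequency weighting could look like: each word's candidate count is weighted by how often the word is actually used. The frequencies below are invented for the example; real ones would have to come from a corpus, which nobody in this thread has supplied.

```python
from collections import Counter


def weighted_mean_candidates(freqs):
    """Frequency-weighted mean candidate count for both input methods.

    `freqs` maps word -> usage count. Only words of five or more letters
    are considered, as in the unweighted test.
    """
    words = [w for w in freqs if len(w) >= 5]
    total = sum(freqs[w] for w in words)
    if total == 0:
        raise ValueError("no usable words with non-zero frequency")
    by_four = Counter(w[:4] for w in words)
    by_skip = Counter(w[:3] + w[-1] for w in words)
    std = sum(freqs[w] * by_four[w[:4]] for w in words) / total
    skip = sum(freqs[w] * by_skip[w[:3] + w[-1]] for w in words) / total
    return std, skip


# Made-up usage counts, purely illustrative.
freqs = {"excited": 50, "exciting": 40, "excitation": 1, "extract": 9}
std, skip = weighted_mean_candidates(freqs)
print(std, skip)
```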
|
|
Some shorthand systems use one-letter abbreviations for common suffixes, e.g. m for -ment, g for -ing, d for -ed, a for -ation. The phone should be programmed to test your typed string as a shorthand form as well as a real word, hence: |
|
|
cong would predict both "conglomerate" and "continuing" |
|
|
docm would predict "document" (using m as the suffix) |
|
|
rena would predict "rename" and "renovation" |
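The three examples above can be sketched in Python as follows. The suffix table uses only the four abbreviations mentioned in this annotation; the function name `predict` and the word list are hypothetical.

```python
# One-letter shorthand for common suffixes, as listed above.
SUFFIXES = {"m": "ment", "g": "ing", "d": "ed", "a": "ation"}


def predict(words, typed):
    """Match `typed` as a literal prefix and also as stem + shorthand suffix."""
    literal = [w for w in words if w.startswith(typed)]
    stem, last = typed[:-1], typed[-1:]
    suffix = SUFFIXES.get(last)
    shorthand = []
    if stem and suffix:
        shorthand = [w for w in words
                     if len(w) >= len(stem) + len(suffix)
                     and w.startswith(stem)
                     and w.endswith(suffix)
                     and w not in literal]
    return literal + shorthand


# Hypothetical word list.
WORDLIST = ["conglomerate", "continuing", "document", "rename", "renovation"]

for typed in ("cong", "docm", "rena"):
    print(typed, predict(WORDLIST, typed))
```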
|
|
//predictive text at end of world// |
|
|
"We apologize for the inconvenience..." |
|
|
(that would be the voice-over for a Fatal Asteroid Collision (or other catastrophe) Song) |
|
|
Predictable Textile Atherosclerosis Endometriosis
Offal Worsted |
|
| |