h a l f b a k e r y
Left for Bread
Web crawlers like Google create indices that contain all the words present on each crawled site. It would be interesting to correlate these word lists with the dictionary to measure the size of each site's vocabulary. (The dictionary is necessary to eliminate invented terms, non-words and proper names.) Then boil these numbers down into a ranking.
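A minimal sketch of how that might look, assuming the crawler hands over the plain text of each site and the dictionary is just a set of lowercase words. The site names, the toy dictionary and the helper names `vocabulary` and `rank_sites` are all made up for illustration; a real index would be far larger and the tokenising far fussier.

```python
import re


def vocabulary(text, dictionary):
    """Return the distinct dictionary words that appear in text."""
    # Lowercase, then pull out runs of letters and apostrophes as candidate words.
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Intersecting with the dictionary drops invented terms, non-words
    # and proper names (assuming the dictionary does not list them).
    return words & dictionary


def rank_sites(site_texts, dictionary):
    """Rank sites by vocabulary size, largest first."""
    sizes = {
        site: len(vocabulary(text, dictionary))
        for site, text in site_texts.items()
    }
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Toy data, purely hypothetical.
DICTIONARY = {
    "the", "cat", "sat", "on", "mat",
    "quantum", "entanglement", "is", "subtle",
}
SITES = {
    "site-a.example": "The cat sat on the mat. The cat sat.",
    "site-b.example": "Quantum entanglement is subtle; the Zorblax cat is on the mat.",
}

ranking = rank_sites(SITES, DICTIONARY)
for site, size in ranking:
    print(site, size)
```

Note that repeated words count once, and "Zorblax" is thrown out because the dictionary has never heard of it. Raw vocabulary size favours big sites, so a real ranking would probably want to normalise by the amount of text crawled.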
You could use these rankings for a number of things, some perhaps even useful.
Educators could judge the appropriateness of a reference web site for a given class level. (Don't send your fourth graders off to a site that uses many college-level words they haven't learned yet.)
Web searchers looking for highly in-depth web sites on a given subject might be able to eliminate those that only scratch the surface.
Elitist snobs could use it as yet another reason to look down their noses at others.
ISPs could brag that their users' home pages are more intelligent than their competitors'.
I have a feeling that this web site would have a quite high ranking.
crumbs, for starters.
http://www.halfbakery.com/idea/crumbs! [po, Oct 05 2004]
Macromedia Flash Search Engine SDK
http://www.macromed...load/search_engine/ Don't know if Google or any of the other majors are going to start using it, though. [krelnik, Oct 05 2004]
Google's file types
http://www.google.c...p/features.html#pdf "Google has expanded the number of non-HTML file types searched to 12 file formats. In addition to PDF documents, Google now searches Microsoft Office, PostScript, Corel WordPerfect, Lotus 1-2-3, and others." [krelnik, Oct 05 2004]
I like the idea and think it an interesting metric.
[MrKlaatu]: So, by the expression "today's world," are you indicating that people are less educated today than at some earlier time? If so, do you mean that there are fewer people with education or that the people with education are less educated?
They's getting dumberer all the time, ain't they?
We could also implement this for individual 'bakers... just a thought.
Google indexes PDFs and other non-HTML content now, Rods. Macromedia is offering a developer kit that lets you easily extract the text from Flash movies too, and they are trying to get the major search engines to use it. So I don't see that as a problem. (See links.)
Movies and songs are a tough problem, indeed.
This is a pretty good idea. I'd love to see this.
In counterpoint, I think that our imposed or implied requirement to lessen the reading level is partly to blame for the current situation. Just as people pick up accents amazingly quickly, struggling through a few publications well over one's head lifts one's comprehension level in a hurry. A modern shortage of patience is the only problem, or very soon we shall find ourselves abandoning the 12th grade altogether.
//...my spelling of recognise...//
I think that's just a matter of using an appropriately large dictionary, like the OED.
You could also incorporate details uploaded from people's custom.dic tionary files.
Google has apparently now added Flash support to the index, even though their own documentation does not yet admit this. (Thanks to [waugsqueke] for noticing this.)