halfbakery: If ever there was a time we needed a bowlologist, it's now.
<start here>
Eliminate navigation garbage for better searching. | |
Many pages on the web, this one included, carry a lot of navigation information alongside the actual content. Authors should have a tag (or other means) available to differentiate between the two.
The benefit is that search engines, data mining tools, etc. need only read the relevant part of a page.
CLARIFICATION: I didn't actually think <start here> would be the tag. As annotated, <content>...</content> would do the job admirably.
|
|
<start here> wouldn't work, as 'here' would be interpreted as an attribute. <croissant></croissant> |
|
|
Why not just put the <start here> information on the main page and skip the extra step? |
|
|
As I remember it, the interspersing of navigation garbage with actual content was the whole benefit of HTML, which otherwise does rather poorly at layout. As an example, take the links on contributors' IDs on this page, which link you to their account pages. |
|
|
If anything, people should be encouraged to more closely integrate navigation and content, not to strip it out. So, feeshbone. |
|
|
// search engines, data mining tools, etc, need only read the relevant part of a page.
It depends on the indexing software, but they already do (e.g. <head>, <meta>, <a>).
Fewer and fewer web pages are 'hand written' these days; many are published from content management software, which makes it impossible to say <start here> anywhere on a page fragment. You have no prior knowledge of where the fragment will appear in relation to anything else. |
|
|
I see site designers using the presence of such a browser enhancement to redirect any offsite traffic to a gateway page. |
|
|
That gets rid of the 'official' metadata, but not the garbage that appears on the page itself. |
|
|
I don't see how it would be hard, [namaste]. On most auto-generated pages, it's quite clear which parts are real content and which are meta-content. |
|
|
Applying this to HB, it's quite easy. The title, slugline, original idea and annotations would be marked as content, the logo, sidebar and related ideas list at the top wouldn't. Is that so hard? |
|
|
<content>So, something like an optional <keyword>content tag</keyword> pair that you put around that which is actual content rather than navigational stuff.</content>Return to top. <content>The browser wouldn't recognise the tag and so would ignore it. The <keyword alt="searchbot, find, webcrawler, crawler">search</keyword> 'bot, upon finding the tags, would strip out all other tags between them and be left with a list of the words, names and numbers that appear on the web page. |
|
|
If I understand you right, this looks to be a simple move: not much benefit, but not much cost either. |
|
|
You could even extend the concept by putting <keyword><keyword>keyword</keyword> <keyword>tags</keyword></keyword> around relevant words to increase the chance of them getting indexed.
</content> |
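For anyone who wants to see the 'bot side of this, here is a minimal sketch in Python, assuming the hypothetical <content> and <keyword> tags from the annotations above (neither is a standard the 'bots actually honour). It uses the standard library's html.parser; the names ContentExtractor and extract_content are made up for illustration. It keeps only the text found between <content> tags and separately notes anything wrapped in <keyword>.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect text inside <content>...</content> and note <keyword> terms.

    Both tag names are the hypothetical ones proposed in this thread.
    """

    def __init__(self):
        super().__init__()
        self.content_depth = 0  # how many <content> tags are currently open
        self.keyword_depth = 0  # how many <keyword> tags are currently open
        self.chunks = []        # text runs found inside <content>
        self.keywords = []      # text runs found inside <keyword> within <content>

    def handle_starttag(self, tag, attrs):
        if tag == "content":
            self.content_depth += 1
        elif tag == "keyword" and self.content_depth:
            self.keyword_depth += 1

    def handle_endtag(self, tag):
        if tag == "content" and self.content_depth:
            self.content_depth -= 1
        elif tag == "keyword" and self.keyword_depth:
            self.keyword_depth -= 1

    def handle_data(self, data):
        # Text outside any <content> pair (menus, logos, footers) is ignored.
        if not self.content_depth:
            return
        self.chunks.append(data)
        if self.keyword_depth and data.strip():
            self.keywords.append(data.strip())


def extract_content(html):
    """Return (plain_text, keywords) for the marked-up part of a page."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    # Join text runs with spaces, then collapse repeated whitespace.
    text = " ".join(" ".join(parser.chunks).split())
    return text, parser.keywords


page = (
    '<div id="nav"><a href="/">home</a> | <a href="/help">help</a></div>'
    "<content><h1>Start here</h1>"
    "<p>A <keyword>content tag</keyword> marks the real text.</p></content>"
    "<p>Return to top.</p>"
)
text, keywords = extract_content(page)
print(text)
print(keywords)
```

Joining the text runs with spaces is a shortcut: it can leave a stray space before punctuation that directly follows a closing tag. A real indexer would want something more careful.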
|
|
sadie, do you want a tag to indicate to your browser whether it is (or is not) showing the start page for the browsed site? If so, you could design a custom toolbar button <-Start Back> to browse the site back to the start of the author's content. I'm deliberating whether this type of searching is more powerful, more respectful, or less fun. |
|
|
I don't think so, reensure. What I think sadie is after is a pair of tags that you use to indicate content on a per page basis. |
|
|
Oh, okay. I suspected it was more for the benefit of search engines after reading a few annos. At first the idea seemed to be a call to revise the entry point of a browser to a point in a url where its author had tagged data as content, regardless of where a search bot may have cached some keyword from the body of the content. |
|
|
You're along the right lines, but not just search engines. There are an increasing number of data mining tools, translators, summary tools, things that read pages for blind people, etc. A content tag would help all of them. |
|
|
Search engines should just automatically mark down the relevance of content if that content is copied between the current page and the pages linked to or from it. This should focus results on specific pages, instead of listing every page on a site whose main menu matches a keyword you enter. |
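A rough sketch of that mark-down idea in Python, with plenty of assumptions: each page is already split into text blocks, a block counts as "copied" only if it matches exactly (ignoring case and surrounding whitespace) a block on a linked page, and copied blocks keep a quarter of their weight. The function name score_page and the DUPLICATE_WEIGHT value are invented for illustration and are not how any particular search engine works.

```python
# Assumed penalty: a block repeated on a linked page keeps 25% of its weight.
DUPLICATE_WEIGHT = 0.25


def score_page(page_blocks, linked_pages_blocks, query):
    """Score a page for a one-word query, marking down copied blocks.

    page_blocks: list of text blocks on the page being scored.
    linked_pages_blocks: list of block lists, one per linked page.
    """
    # Everything that also appears on a linked page is treated as navigation.
    seen_elsewhere = set()
    for blocks in linked_pages_blocks:
        seen_elsewhere.update(block.strip().lower() for block in blocks)

    query = query.lower()
    score = 0.0
    for block in page_blocks:
        normalized = block.strip().lower()
        hits = normalized.split().count(query)
        weight = DUPLICATE_WEIGHT if normalized in seen_elsewhere else 1.0
        score += weight * hits
    return score


menu = "add search annotate link"
idea_page = [menu, "a content tag would help search engines"]
other_page = [menu, "croissants are tasty"]

print(score_page(idea_page, [other_page], "search"))   # menu hit marked down
print(score_page(other_page, [idea_page], "search"))   # only the menu matches
```

With these made-up pages, the page that actually discusses search outranks the one that merely has "search" in its shared menu.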
|
| |