(Apologies if this already exists; I looked and looked and couldn't find one that did this.)
Anyone who has had to maintain a web site knows that broken links are a continual pain in the neck. There are a huge number of tools for dealing with this problem, but all they do is spider your web site, retrieve each link, and check whether it returns a 404 (Page Not Found) error. Then they notify you in some way so you can fix it.
I think this could be taken further. Instead of waiting for maintenance time, spider the links while the site is in good working condition, to grab the intended target page of each link. Scrape out all the redundant HTML, menus, scripts and such to get to the actual content of the page. Then store this content (or snips of it) locally on the webmaster's system for use later. Include a manual interface so the webmaster can tweak what pieces of text from the target page are considered significant.
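The "snapshot" step might look something like this: strip out scripts, styles and navigation from the fetched page to get at the visible text, which can then be stored as the link's intended content. This is only a sketch; the names (ContentExtractor, extract_content) and the list of skipped elements are illustrative assumptions, not from any existing tool.

```python
# Sketch: reduce a fetched page's HTML to its visible text so a
# fingerprint of the intended content can be stored locally.
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    # Elements whose contents are boilerplate, not page content
    # (an assumed, tweakable list -- the manual interface the idea
    # mentions would let the webmaster adjust this).
    SKIP = {"script", "style", "nav", "head"}

    def __init__(self):
        super().__init__()
        self.depth = 0        # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_content(html: str) -> str:
    """Return the visible text of a page, boilerplate stripped."""
    parser = ContentExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

The stored result (or a hand-picked snippet of it) is what the maintenance pass would later compare against.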
Then at maintenance time the software is armed with not only the URLs of the links, but the content which each is intended to display. Again spider the web site, and verify not only that each page leads somewhere, but that the intended content is present at the target page.
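The verification pass could be as simple as a word-overlap check between the stored snippet and the freshly fetched page, so a link that still resolves but no longer carries the intended content gets flagged. The 0.5 threshold below is an arbitrary illustration, not a recommendation.

```python
# Sketch: does the fetched page still contain the content the link
# was meant to point at? A crude bag-of-words overlap suffices to
# catch wholesale content replacement (e.g. a parked domain).
def content_present(snippet: str, page_text: str, threshold: float = 0.5) -> bool:
    wanted = set(snippet.lower().split())
    found = set(page_text.lower().split())
    if not wanted:
        return True
    overlap = len(wanted & found) / len(wanted)
    return overlap >= threshold
```

A real tool would want something less naive (stemming, shingling, fuzzy matching), but the principle is the same: verify content, not just HTTP status.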
But wait, there's more. If a link breaks, the software has what it needs to find a replacement. It can use a search engine to search the web for the target text from the original link. (Google has an API for this sort of thing.) This would allow it to find a moved page on the original site, as well as mirrors of the original material. It could also automatically search the Internet Archive's Wayback Machine for an archived copy of the page, as well as the Google cache.
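For the Wayback Machine part, the Internet Archive exposes a public availability endpoint that reports the closest archived snapshot of a URL. A minimal sketch, with the HTTP fetch made swappable so the parsing can be exercised without a network connection:

```python
# Sketch: ask the Wayback Machine whether it holds a snapshot of a
# dead URL, to suggest as a replacement link. Error handling omitted.
import json
import urllib.parse
import urllib.request
from typing import Optional

API = "https://archive.org/wayback/available?url="

def wayback_snapshot(dead_url: str, fetch=None) -> Optional[str]:
    """Return an archived copy's URL, or None if none is archived.

    `fetch` takes a URL and returns the response body as a string;
    it defaults to a real HTTP request but can be replaced in tests.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return resp.read().decode()
    raw = fetch(API + urllib.parse.quote(dead_url, safe=""))
    closest = json.loads(raw).get("archived_snapshots", {}).get("closest", {})
    return closest.get("url") if closest.get("available") else None
```

An archived copy found this way would go into the list of suggested replacements, alongside any hits from the search-engine query.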
When it notifies you of broken links, it can supply suggested replacements, which you can verify and easily select.
Aside from making link maintenance a breeze, this tool would also catch broken links that many current tools would never notice, such as:
· Content changed URL within site
· Domain has been sold or repurposed
· Domain has been parked to a search page
Area 404
http://www.plinko.net/404/area404.asp
But what if you had purposely linked to one of these pages? [hippo, Oct 04 2004, last modified Oct 05 2004]
BBC News: Web tool may banish broken links
http://news.bbc.co....hnology/3666660.stm
Oct 05 2004: They call it "Peridot" [krelnik, Oct 05 2004]
yep, proactive link fixing, like it (+)
Recalls [Rayford]'s WebReaper experience.
Yes, if you build this and run it on the bakery, make sure it is not logged in under a user account at the time.
I found one on Google, but the link was broken...
Apparently I wasn't so crazy after all; some student interns at IBM UK baked almost exactly what I described here. See link.