halfbakery: This ain't rocket surgery.
Googling web pages often leads to search results dominated by Wikipedia. Even if Wikipedia itself is excluded, the results frequently turn up copies of Wikipedia entries. Now, I don't consider Wikipedia a bad thing per se, and I even contribute to it, but as a regular contributor I'm aware of how stupid it can be, and how stupid I can be even when I try not to be while editing it. There are, for example, edit wars about whether the bra is a unisex garment or not.
Therefore, this is a very basic and simple idea: a search engine facility which ignores Wikipedia, compares its search results to large blocks of text from Wikipedia, and excludes those that match. To me, this seems either easy to implement or possibly already implemented, and I'm not sure this is even much of an idea, but it is one, just about.
Google search for "transistor"
http://www.google.c...search?q=transistor [Jinbish, Oct 20 2010]
Google search for "transistor -site:wikipedia.org"
http://www.google.c...fp=88a8e262cf585f70 [Jinbish, Oct 20 2010]
|
|
Not sure of the Google syntax, but you can exclude terms by using the '-' operator. |
|
|
Search for "search term" -site:wikipedia.org and you will get stuff from sites except Wikipedia. |
|
|
As for clone sites - like About.com etc. you could add them to your search exclusions or just put up with a manual sift in your quest for decent references. |
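The exclusion approach described above can be sketched as a small query builder. This is only an illustration: `build_query` is a hypothetical helper, and the list of clone domains is whatever the searcher chooses to supply.

```python
def build_query(term, excluded_sites=()):
    """Compose a search string that uses the '-site:' exclusion operator."""
    parts = [term]
    # Each -site:domain entry asks the engine to drop results from that domain.
    parts += [f"-site:{site}" for site in excluded_sites]
    return " ".join(parts)


# Exclude Wikipedia plus one clone site named in the thread.
print(build_query("transistor", ["wikipedia.org", "about.com"]))
# prints: transistor -site:wikipedia.org -site:about.com
```

The string grows by one entry per clone site, which is the practical limit of this approach.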
|
|
If only Wikipedia came up, that wouldn't be bad; you can ignore one site. It's when the top 10-20 results are all copy-paste of the Wikipedia text that it gets annoying. |
|
|
I can relate to this issue. If you want information
from Wikipedia, you can just go there. You don't
need Google to tell you that. |
|
|
I just ran several random searches on Google
(including "Half Bakery") with and without a -"wiki"
prefix. Worked perfectly every time. -"wiki" seems to
eliminate the Wikipedia entries. |
|
|
Good idea, [19th].[+] Good solution, [Jinbish] {+}. |
|
|
Thanks, but as [MechE] says, there are still the clone
sites to deal with. It's not practical simply to have a
long string of exclusions, but it seems entirely
practical for the text on Wikipedia to be used as a
filter. |
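A minimal sketch of using Wikipedia text as a filter, assuming the engine already has the text of each result and of the relevant Wikipedia article. The function names, the five-word shingle size, and the 0.5 threshold are all illustrative assumptions, not anything a real search engine is known to use.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences (shingles) found in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(page_text, reference_text, k=5):
    """Fraction of the page's shingles that also occur in the reference text."""
    page = shingles(page_text, k)
    if not page:
        return 0.0
    return len(page & shingles(reference_text, k)) / len(page)


def filter_clones(results, reference_text, threshold=0.5, k=5):
    """Keep only results whose text is mostly NOT copied from reference_text."""
    return [r for r in results if containment(r, reference_text, k) < threshold]
```

A page that pastes the article and adds a line of its own scores high on containment and is dropped, while a page that merely shares a topic shares few five-word runs and is kept.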
|
|
Never mind searching for "transistor", try searching for a specific transistor by part number. Were you hoping to find a datasheet, or a stockist? No, you will instead find an abundance of sites which have collated every electronic device number known to man, woman or other (possibly algorithmically generated), and which will happily lie to you that they have the device in stock, if you fill out their Request For Quote form. Or that they have the datasheet in question, behind their paywall, or will have as soon as someone submits it. |
|
|
Google's utility is rapidly being subsumed by searchword spammers. The copypaste Wikipedia clones are merely one form.</rant> |
|
|
Searchio-diversity: So what we need is actually a
"diversity" view, where the CONTENT of each
result must be very different from its predecessor.
Then you can choose "results like this" and get
closer and closer. |
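The "diversity" view could be sketched roughly as follows, taking the description literally: each result is compared only with the one shown just before it. Jaccard similarity over word sets and the 0.5 cutoff are assumptions chosen for brevity.

```python
import re


def word_set(text):
    """Lower-cased set of the words in text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diversity_view(results, max_similarity=0.5):
    """Walk results in rank order, skipping any that resemble the last one shown."""
    shown = []
    for text in results:
        if shown and jaccard(word_set(text), word_set(shown[-1])) > max_similarity:
            continue  # too close to its predecessor
        shown.append(text)
    return shown
```

Because only the predecessor is checked, a near-copy can reappear later in the list once something different has intervened; comparing against everything already shown would be stricter.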
|
|
Not that I don't find what I want. But usually it's a
keyword that I'm missing. |
|
|
Keyword fishing: So maybe the best search
algorithm should pick up on keywords IN the
wikipedia (which at least gets groomed and
advances with time) and then gives results for the
DIFFERENT keywords, hopefully those results will
be different from the original, so you get a larger
searcheodiversity. |
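One way to picture keyword fishing, as a sketch only: count the words in the Wikipedia article, drop the ones already in the query, and offer the most frequent remainder as new search terms. The stopword list and the `fish_keywords` name are made up for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on",
             "for", "it", "its", "that", "or", "as", "by", "with"}


def fish_keywords(article_text, query, top_n=3):
    """Return the most frequent article words that are not already in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    words = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(
        w for w in words
        if w not in STOPWORDS and w not in query_words
    )
    return [word for word, _ in counts.most_common(top_n)]
```

Each returned keyword could then be searched on its own, which is where the different results would come from.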
|
|
Another way to go could be to check in wikipedia
version history (that search is available) for the
dates of the clones, and exclude anything from
around that time - supposing that most cloning
was done around the same time. |
|
|
WP isn't the only oft-copied site. |
|
|
Maybe any site that a) acts as a source, and b) is an archive, could embed certain ASCII whitespace characters as a tell for search engines, most usefully embedding the original webpage addy, steganographically. |
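A rough sketch of such a whitespace tell, under the assumption that the URL is written as a final line of spaces (0 bits) and tabs (1 bits). The encoding and both function names are invented here; any copier that trims trailing whitespace would erase the mark.

```python
def embed_url(text, url):
    """Append url to text as an invisible final line of spaces and tabs."""
    bits = "".join(f"{byte:08b}" for byte in url.encode("ascii"))
    tell = bits.replace("0", " ").replace("1", "\t")
    return text.rstrip() + "\n" + tell


def extract_url(marked_text):
    """Recover the hidden URL, or return None if no valid tell is present."""
    tell = marked_text.rsplit("\n", 1)[-1]
    # A tell is a non-empty run of spaces and tabs, a whole number of bytes long.
    if not tell or set(tell) - {" ", "\t"} or len(tell) % 8:
        return None
    bits = tell.replace(" ", "0").replace("\t", "1")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None
```

A search engine that finds the tell on a page hosted somewhere other than the decoded address could treat that page as a copy.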
|
|
You can filter out a phrase from the search results. If a
bunch of search results are copies of Wikipedia, the
site previews would have the same phrase coming up over
and over again. Simply do a filter on that phrase. E.g.,
transistor -"A transistor is a semiconductor device used to
amplify and switch electronic signals and power." |
|
|
I suppose Google itself might simply offer a checkbox --check it to exclude Wikipedia content and copies thereof. Their indexing system must be able to identify all the sites out there that hold cloned Wikipedia data. |
|
|
Search spam is a great evil and those who perpetrate
it are vile and need to be put down like animals. |
|
|
I'll find us a good wall; you find the spammers to go up
against it. |
|
|
[Alter]'s Ready, Aim, Firewall stops spammers dead. |
|
|
There is something to be said for this. I took some criminology courses many years ago, and apparently a large factor in whether or not people commit an act is their assessment of (A) how likely they are to be caught, multiplied by (B) the severity of the penalty. Ergo, the harder it is to identify and catch spammers, the higher you need to make the penalty in order to dissuade the rest of them. Conversely, if you could catch every single one every time, a $5 fine should suffice. A wall is easier. |
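The arithmetic behind that argument can be written out as a toy model, assuming deterrence depends only on the product of the chance of being caught and the size of the penalty. The 1-in-1000 catch rate below is a made-up figure for illustration.

```python
def expected_penalty(p_caught, penalty):
    """A times B: chance of being caught multiplied by the penalty."""
    return p_caught * penalty


def penalty_needed(target_expected, p_caught):
    """Penalty required so that the expected penalty reaches the target."""
    if not 0 < p_caught <= 1:
        raise ValueError("p_caught must be in (0, 1]")
    return target_expected / p_caught


# Catching every spammer: a $5 fine has an expected cost of $5.
print(penalty_needed(5, 1.0))
# Catching 1 in 1000 (hypothetical): the fine must be about 1000 times larger.
print(penalty_needed(5, 0.001))
```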
|
|
Erm speaking of wikipedia, I was looking at some aerogel stuff on there this morning and came across this "SEAgel was invented by Robert Morrison at the Lawrence Livermore National Laboratory in 1992." |
|
|
Now, I genuinely don't remember doing that. Admittedly I did used to drink scrumpy a lot back in those days, but I'm pretty sure I would have remembered it, wouldn't I? |
|
|
<imagines barging into the lab waving a half-empty scrumpy bottle going " naaah, naarghh mate, you got it wrooong, itsch like thish ...."(one major scientific achievement later) staggers out of the lab, falls asleep on a park bench, still clutching the bottle..> |
|
|
Who can say how many of us have done the same..maybe better to scour wikipedia now, maybe some back-patent rights out there... |
|
|
Now I'm confused. Are you, or are you not, Morrison R.M.? |
|
|
It might be interesting to be able to select one search result and tick a box that says "Calculate Hamming Distance". A slider could then be used to exclude those results that were similar/copies of the selected result. |
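A sketch of how the slider might work. Hamming distance is only defined for strings of equal length, so this example first reduces each page to a 64-bit SimHash fingerprint and compares the fingerprints; that is a substitution made here, not something the annotation specifies. The MD5-based word hash and all function names are assumptions.

```python
import hashlib
import re

BITS = 64


def simhash(text):
    """64-bit fingerprint; texts sharing most words get nearby fingerprints."""
    weights = [0] * BITS
    for word in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def exclude_similar(selected, results, slider):
    """Drop results whose fingerprint is within `slider` bits of the selected one."""
    target = simhash(selected)
    return [r for r in results if hamming_distance(simhash(r), target) > slider]
```

With the slider at 0 only exact fingerprint matches are removed; pushing it toward 64 removes progressively less similar pages until nothing is left.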
|
|
Buy "Search spam is a great evil and those who perpetrate it are vile and need to be put down like animals" at GreatProducts4U.com today! |
|
|
'Scrumpy' and 'remember' are mutually exclusive terms,
without exception. You could have perfected cold fusion in
your basement and you wouldn't have remembered it five
minutes later, much less today. |
|
| |