Halfbakery: Everything except Wikipedia

Computer: Web: Searching
Everything except Wikipedia (+7, -1)
Or copies

Googling often leads to search results dominated by Wikipedia. Even if Wikipedia itself is excluded, the results frequently turn up copies of Wikipedia entries. Now, I don't consider Wikipedia to be a bad thing per se and I even contribute to it, but as a regular contributor I'm aware of how stupid it can be, and how stupid I can be when I'm editing it, even if I try not to be. There are, for example, edit wars about whether the bra is a unisex garment or not. Therefore, this is a very basic and simple idea: a search engine facility which ignores Wikipedia itself, compares the remaining search results to large blocks of text from Wikipedia, and excludes any that match. To me, this seems either easy to implement or possibly already implemented, and I'm not sure this is even much of an idea, but it is one, just about.
-- nineteenthly, Oct 20 2010

Google search for "transistor" http://www.google.c...search?q=transistor
[Jinbish, Oct 20 2010]

Google search for "transistor -site:wikipedia.org" http://www.google.c...fp=88a8e262cf585f70
[Jinbish, Oct 20 2010]

Not sure of the Google syntax, but you can exclude terms by using the '-' operator.

Search for "search term" -site:wikipedia.org and you will get stuff from sites except Wikipedia.

As for clone sites (like About.com etc.), you could add them to your search exclusions or just put up with a manual sift in your quest for decent references.
-- Jinbish, Oct 20 2010
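
A rough sketch of the exclusion approach [Jinbish] describes, assuming nothing beyond the '-site:' operator quoted above. The helper name build_query and the list of sites to exclude are made up for illustration.

```python
from typing import Iterable


def build_query(terms: str, excluded_sites: Iterable[str] = ()) -> str:
    """Append one -site: exclusion per domain to the search terms.

    build_query is a hypothetical helper, not part of any search API.
    """
    parts = [terms] + [f"-site:{site}" for site in excluded_sites]
    return " ".join(parts)


if __name__ == "__main__":
    # Exclude Wikipedia plus one clone site named in the thread.
    print(build_query("transistor", ["wikipedia.org", "about.com"]))
```

Every clone site needs its own entry in the list, which is exactly the manual upkeep the idea is trying to avoid.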

If only Wikipedia came up, that wouldn't be bad; you can ignore one site. It's when the top 10-20 sites are all copy-pastes of the Wikipedia text that it gets annoying.
-- MechE, Oct 20 2010

I can relate to this issue. If you want information from Wikipedia, you can just go there. You don't need Google to tell you that.

I just ran several random searches on Google (including "Half Bakery") with and without a -"wiki" prefix. Worked perfectly every time. -"wiki" seems to eliminate the Wikipedia entries.

Good idea, [19th]. [+] Good solution, [Jinbish]. [+]
-- Boomershine, Oct 20 2010

Thanks, but as [MechE] says, there are still the clone sites to deal with. It's not practical simply to have a long string of exclusions, but it seems entirely practical for the text on Wikipedia to be used as a filter.
-- nineteenthly, Oct 20 2010
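
One way the "Wikipedia text as a filter" idea might look, as a minimal sketch rather than anything a real search engine is known to do. It assumes the Wikipedia article text and the result pages are already available as plain strings, and it swaps in word shingles (overlapping runs of k words) as the notion of "large blocks of text". The names shingles, wikipedia_overlap and filter_clones, and the values k=5 and threshold=0.5, are all assumptions.

```python
import re
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 5) -> Set[Shingle]:
    """Return the set of overlapping k-word runs in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def wikipedia_overlap(page_text: str, wiki_shingles: Set[Shingle], k: int = 5) -> float:
    """Fraction of the page's shingles that also occur in the Wikipedia text."""
    page = shingles(page_text, k)
    if not page:
        return 0.0  # page too short to judge; treat as not a copy
    return len(page & wiki_shingles) / len(page)


def filter_clones(pages: List[str], wiki_text: str,
                  threshold: float = 0.5, k: int = 5) -> List[str]:
    """Keep only pages whose overlap with the Wikipedia text is below threshold."""
    wiki = shingles(wiki_text, k)
    return [p for p in pages if wikipedia_overlap(p, wiki, k) < threshold]
```

A page that merely quotes a sentence from Wikipedia scores low and survives; a page that is mostly pasted article text scores high and is dropped.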

Never mind searching for "transistor", try searching for a specific transistor by part number. Were you hoping to find a datasheet, or a stockist? No, you will instead find an abundance of sites which have collated every electronic device number known to man, woman or other (possibly algorithmically generated), and which will happily lie to you that they have the device in stock, if you fill out their Request For Quote form. Or that they have the datasheet in question, behind their paywall, or will have as soon as someone submits it.

Google's utility is rapidly being subsumed by searchword spammers. The copypaste Wikipedia clones are merely one form.</rant>
-- BunsenHoneydew, Oct 24 2010

Searchio-diversity: So what we need is actually a "diversity" view, where the CONTENT of each result must be very different from its predecessor. Then you can choose "results like this" and get closer and closer.

Not that I don't find what I want. But usually it's a keyword that I'm missing.

Keyword fishing: So maybe the best search algorithm should pick up on keywords IN the Wikipedia article (which at least gets groomed and advances with time) and then give results for the DIFFERENT keywords. Hopefully those results will be different from the original, so you get a larger searcheodiversity.

Another way to go could be to check Wikipedia's version history (that search is available) to date the clones, and exclude anything from around that time, supposing that most cloning was done around the same time.
-- pashute, May 15 2012
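
The "diversity" view could be sketched like this, assuming result texts are plain strings. It is slightly stricter than [pashute]'s wording: each result is compared with every result already kept, not just its immediate predecessor. Jaccard similarity over word sets is a stand-in chosen here, and the names and the 0.5 cut-off are assumptions.

```python
import re
from typing import List, Set


def word_set(text: str) -> Set[str]:
    """Lower-cased set of words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared words divided by all words; 1.0 means identical vocabularies."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diverse_results(results: List[str], max_similarity: float = 0.5) -> List[str]:
    """Greedily keep a result only if it differs enough from all kept ones."""
    kept: List[str] = []
    for text in results:
        words = word_set(text)
        if all(jaccard(words, word_set(k)) <= max_similarity for k in kept):
            kept.append(text)
    return kept
```

Lowering max_similarity gives a more varied list; raising it moves back towards ordinary results.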

WP isn't the only oft-copied site.

Maybe any site that a) acts as a source, and b) is an archive, could embed certain ASCII whitespace characters as a tell for search engines, most usefully embedding the original webpage addy, steganographically.
-- FlyingToaster, May 15 2012
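
A toy version of the whitespace "tell", assuming one particular encoding that is invented here: the source address is written out as bits, with a space for 0 and a tab for 1, on an extra trailing line. No site or search engine is known to use this scheme, and the function names are hypothetical.

```python
from typing import Optional


def embed_source(text: str, url: str) -> str:
    """Append the URL as a trailing line of spaces (0) and tabs (1)."""
    bits = "".join(f"{byte:08b}" for byte in url.encode("ascii"))
    mark = bits.replace("0", " ").replace("1", "\t")
    return text + "\n" + mark


def extract_source(text: str) -> Optional[str]:
    """Recover the URL from the trailing whitespace line, if there is one."""
    _, newline, tail = text.rpartition("\n")
    if not newline or not tail or set(tail) - {" ", "\t"} or len(tail) % 8:
        return None  # no well-formed mark on the last line
    bits = tail.replace(" ", "0").replace("\t", "1")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None  # whitespace happened to be there but is not a mark
```

The mark only survives if the copier pastes the text verbatim; anything that trims trailing whitespace removes it.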

You can filter out a phrase from the search results. If a bunch of search results are copies of Wikipedia, the site previews would have the same phrase coming up over and over again. Simply do a filter on that phrase. E.g., transistor -"A transistor is a semiconductor device used to amplify and switch electronic signals and power."
-- ytk, May 15 2012
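
Picking the phrase to exclude could be automated roughly as below, assuming the result previews are available as a list of strings. The sketch finds the sentence that appears in the most previews and appends it as a quoted exclusion, in the form [ytk] shows. The function names are made up, and a phrase that itself contains double quotes would need extra handling.

```python
import re
from collections import Counter
from typing import List, Optional


def most_repeated_sentence(snippets: List[str]) -> Optional[str]:
    """Return the sentence found in the most snippets, if in at least two."""
    counts: Counter = Counter()
    for snippet in snippets:
        # Split on whitespace that follows sentence-ending punctuation.
        parts = re.split(r"(?<=[.!?])\s+", snippet)
        counts.update({p.strip() for p in parts if p.strip()})
    if not counts:
        return None
    sentence, hits = counts.most_common(1)[0]
    return sentence if hits >= 2 else None


def exclude_phrase(query: str, phrase: str) -> str:
    """Append the phrase to the query as a quoted exclusion."""
    return f'{query} -"{phrase}"'
```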

I suppose Google itself might simply offer a checkbox --check it to exclude Wikipedia content and copies thereof. Their indexing system must be able to identify all the sites out there that hold cloned Wikipedia data.
-- Vernon, May 15 2012

Search spam is a great evil and those who perpetrate it are vile and need to be put down like animals.
-- Voice, May 15 2012

I'll find us a good wall; you find the spammers to go up against it.
-- Alterother, May 15 2012

[Alter]'s Ready, Aim, Firewall stops spammers dead.

There is something to be said for this. I took some criminology courses many years ago, and apparently a large factor in whether or not people commit an act is their assessment of "A", how likely they are to be caught, multiplied by "B", the severity of the penalty. Ergo, the harder it is to identify and catch spammers, the higher you need to make the penalty in order to dissuade the rest of them. Conversely, if you could catch every single one every time, a $5 fine should suffice. A wall is easier.
-- AusCan531, May 15 2012
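
[AusCan531]'s rule of thumb, written out as a small worked formula. The symbols p (chance of being caught, "A"), S (severity of the penalty, "B") and D (the level of deterrence needed) are labels assumed here, and the 1% catch rate is purely an illustrative figure.

```latex
\[
  \text{expected penalty} = p \cdot S,
  \qquad
  p \cdot S \ge D \;\Longrightarrow\; S \ge \frac{D}{p}
\]
\[
  p = 1:\quad S \ge \$5 \quad (\text{so } D = \$5),
  \qquad
  p = 0.01:\quad S \ge \frac{\$5}{0.01} = \$500
\]
```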

Erm speaking of wikipedia, I was looking at some aerogel stuff on there this morning and came across this "SEAgel was invented by Robert Morrison at the Lawrence Livermore National Laboratory in 1992."

Now, I genuinely don't remember doing that. Admittedly I did used to drink scrumpy a lot back in those days, but I'm pretty sure I would have remembered it, wouldn't I?

Who can say how many of us have done the same... maybe better to scour Wikipedia now, maybe some back-patent rights out there...
-- not_morrison_rm, May 15 2012

Now I'm confused. Are you, or are you not, Morrison R.M.?
-- Loris, May 15 2012

It might be interesting to be able to select one search result and tick a box that says "Calculate Hamming Distance". A slider could then be used to exclude those results that were similar/copies of the selected result.
-- zen_tom, May 15 2012
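
A sketch of the "Calculate Hamming Distance" slider. Hamming distance proper needs equal-length inputs, so this assumes each result is first reduced to a 64-bit SimHash-style fingerprint, a technique swapped in here and not something [zen_tom] specified. The names and the use of MD5 as the token hash are assumptions.

```python
import hashlib
import re
from typing import List

BITS = 64  # fingerprint width; taken from the first 8 bytes of each MD5 digest


def simhash(text: str) -> int:
    """SimHash-style fingerprint: each bit is a majority vote over token hashes."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two fingerprints differ."""
    return bin(a ^ b).count("1")


def drop_similar(selected: str, results: List[str], max_distance: int) -> List[str]:
    """Exclude results within max_distance bits of the selected one.

    max_distance plays the role of the slider position.
    """
    target = simhash(selected)
    return [r for r in results
            if hamming_distance(simhash(r), target) > max_distance]
```

With the slider at 0 only exact word-for-word copies are hidden; sliding it up hides looser and looser rewrites of the selected result.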

Buy "Search spam is a great evil and those who perpetrate it are vile and need to be put down like animals" at GreatProducts4U.com today!
-- phundug, May 15 2012

'Scrumpy' and 'remember' are mutually exclusive terms, without exception. You could have perfected cold fusion in your basement and you wouldn't have remembered it five minutes later, much less today.
-- Alterother, May 15 2012
