h a l f b a k e r y

One, But You're Not The Same

Search Results Should Present Different Information
  (+5, -2)

When searching for a term that appears in a hot press release, one finds that hundreds, or even thousands, of hits simply refer to the exact same text.

Since the search engine is indexing the text, it might actually notice, and prune these out, making the haystack somewhat smaller.

theircompetitor, Mar 07 2005
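
One way the pruning described above might work is to compare pages by their overlapping word sequences ("shingles") and drop any page that is nearly identical to one already kept. This is only a minimal sketch: the function names, the 5-word shingle size, and the 0.8 threshold are assumptions made for illustration, not anything Google is known to use.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prune_near_duplicates(pages, threshold=0.8, k=5):
    """Keep the first page of each group of near-identical pages.

    pages is a list of (url, text) pairs; threshold is a hypothetical cutoff.
    """
    kept = []  # (url, text, shingle set) for every page kept so far
    for url, text in pages:
        current = shingles(text, k)
        if all(jaccard(current, seen) < threshold for _, _, seen in kept):
            kept.append((url, text, current))
    return [(url, text) for url, text, _ in kept]
```

Comparing every page against every kept page is quadratic, so a real index would need something cheaper (fingerprints or hashing), but the effect on the haystack is the same.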

Google cheat sheet http://www.google.c...elp/cheatsheet.html
[waugsqueke, Mar 09 2005]

Google search: turvy -"topsy turvy" http://www.google.c...+-%22topsy+turvy%22
"turvy" other than "topsy turvy" [waugsqueke, Mar 10 2005]


       Is that a distinct problem from link farms, which I believe Google does its best to filter already?
DrCurry, Mar 07 2005
  

       In the case that triggered this idea, I was looking for more data on a specific company that was mentioned in a press release.   

       Instead, I'm getting thousands of hits all quoting the press release. Mind you, the summary paragraph shown for each hit is fairly obviously the same, so Google kind of knows it's showing the same data.
theircompetitor, Mar 07 2005
  

       Google sort of does this now...   

       "In order to show you the most relevant results, we have omitted some entries very similar to the ## already displayed. If you like, you can repeat the search with the _omitted results included_."
waugsqueke, Mar 07 2005
  

       This would be easily solved if Google allowed a NOT operand.
Worldgineer, Mar 07 2005
  

       [waugs]: I think that only excludes multiple hits from the same domain. (I could be wrong, per usual.)
angel, Mar 07 2005
  

       I thought this idea would require that we carry each other. Carry each other.
bungston, Mar 07 2005
  

       Damn you [bungston]! you beat me to it.
Freefall, Mar 07 2005
  

       guys, just clarifying -- these are hits from multiple different sites that all refer to the same exact text.   

       So the only clue that Google has to the "sameness" of the text is the abstract.   

       And it's absolutely generating thousands of them.   

       Now, you can "minus" certain terms and eliminate all of the hits -- which is not what you would want either.   

       Ideally, you'd want to see unique information referred to in a unique way, and no more than necessary.   

       [confidential to Freefall -- you should have said U2 bungston].
theircompetitor, Mar 08 2005
  

       Bono idee' = +   

       And I can't be holdin' on...
csea, Mar 08 2005
  

       //if Google allowed a NOT operand//
It does already. Just put "-" before an item in the query.
  

       This searches for pages with "foo" but not "bar":
foo -bar
  

       This searches for pages with "foo" but not on the halfbakery:
foo -site:halfbakery.com
krelnik, Mar 08 2005
  

       sure, though it's tricky to do that for a whole paragraph or article.   

       I think the criticism, though valid, misses the point.   

       Sure I can be smart enough to still find what I want.   

       But why would you show me 1000s of copies of the same entry? My assistant wouldn't, right?
theircompetitor, Mar 09 2005
  

       [UB], no sadly. I'm sure it's not my personality, though
theircompetitor, Mar 09 2005
  

       Not all of them, anyway.
Detly, Mar 09 2005
  

       // it's tricky to do that for a whole paragraph or article. //   

       Grab a fairly unique phrase from it, put it in quotes and then put a - before it. That tells Google to ignore anything that includes this passage of text.   

       Added link to Google's cheat sheet, showing all the operators. They can be combined in very useful ways.
waugsqueke, Mar 09 2005
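
The exclusion operators discussed above (a leading "-" on a term, a quoted phrase, or a site: restriction) can be assembled mechanically. Here is a small sketch that builds such a query string; `build_query` and its parameters are hypothetical names for illustration, and the only library call used is `quote_plus` from Python's standard `urllib.parse`.

```python
from urllib.parse import quote_plus


def build_query(terms, exclude=(), exclude_sites=()):
    """Build a search string with '-' exclusions, as described in the thread."""
    parts = list(terms)
    for item in exclude:
        # A multi-word phrase has to be quoted so the '-' applies to all of it.
        if " " in item:
            item = f'"{item}"'
        parts.append(f"-{item}")
    for site in exclude_sites:
        parts.append(f"-site:{site}")
    return " ".join(parts)


# The two examples given earlier in the thread, plus a phrase exclusion.
print(build_query(["foo"], exclude=["bar"]))
print(build_query(["foo"], exclude_sites=["halfbakery.com"]))
print(quote_plus(build_query(["turvy"], exclude=["topsy turvy"])))
```

The last line prints the URL-encoded form of the query, which is what ends up in the query part of a search link.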
  

       <aside>The Brits among us may care to check the first Google hit for "fuckwit".</aside>
angel, Mar 09 2005
  

       'Miserable failure' is interesting, too.
waugsqueke, Mar 09 2005
  

       I'd like an "other-than" boolean operator which would exclude text matches that satisfied a particular criterion, but not exclude entire pages on that basis.   

       For example,   

       "turvy" other than "topsy turvy"   

       would find places where the word "turvy" appeared not preceded by the word "topsy". Sites containing the phrase "topsy turvy" would not be completely excluded, but would only be included if the word "turvy" appeared without the word "topsy" in front of it.
supercat, Mar 09 2005
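
On text you already have in hand, the "other than" behaviour described above can be approximated with a regular expression that uses a negative lookbehind. This is a sketch of the matching rule only, not a search engine feature; `OTHER_THAN` and `page_matches` are made-up names, and treating a space or a hyphen as the separator is an assumption.

```python
import re

# Match the word "turvy" unless the whole word "topsy" (followed by a space
# or a hyphen) comes immediately before it.
OTHER_THAN = re.compile(r"(?<!\btopsy[ -])\bturvy\b", re.IGNORECASE)


def page_matches(text):
    """True if the page has at least one "turvy" not preceded by "topsy"."""
    return OTHER_THAN.search(text) is not None


print(page_matches("a topsy turvy world"))        # every "turvy" follows "topsy"
print(page_matches("flopsy-turvy topsy-turvy"))   # the first "turvy" qualifies
```

A page containing "topsy turvy" is not thrown out wholesale; it is kept whenever some other occurrence of "turvy" stands on its own.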
  

       [angel] I'm a Yank, myself, but I checked the google as you suggested. I wonder who set it up to go to [John Leslie Prescott]
normzone, Mar 09 2005
  

       //Grab a fairly unique phrase from it, put it in quotes and then put a - before it. //   

       waugs--it seems like that would get rid of every instance it occurs, whereas the intention of this idea (I think) is to show it once and only once.
yabba do yabba dabba, Mar 09 2005
  

       yabba dabba yes!
theircompetitor, Mar 09 2005
  

       // "turvy" other than "topsy turvy" //   

       supercat, Google will do that too (see link). Note it found a surprisingly large number of instances of 'autopsy-turvy'.   

       // the intention of this idea (I think) is to show it once and only once. //   

       Yes but on the initial search, where you've found thousands of the same thing (which I still think Google will reduce down, even over multiple domains), you know on subsequent searches what to exclude.
waugsqueke, Mar 10 2005
  

       //I wonder who set it up to go to [John Leslie Prescott]//
That was a joint effort by several bloggers, led by a guy called Tim Worstall. Google his name for details.
angel, Mar 10 2005
  

       I don't think Google's doing what I want, since it would not return a page containing the phrase "flopsy-turvy topsy-turvy"; the phrase "topsy-turvy" should not disqualify the page altogether, but when using the "minus" operator on Google it does.
supercat, Mar 10 2005
  

       Not a bad idea on the surface, but begins to look less attractive when you consider some of the questions that would have to be answered in implementation. In particular, how should Google (or any other search engine) select a "definitive source" for a given document?   

       Perhaps it would be better if we could attach our own intelligent agents to the search services, to sit between us and the raw flow of information and filter out the useful bits according to our own individual criteria. Otherwise we put search engines in the business of pre-filtering our information for us, and I don't know that we really want that.
uhlume, Mar 10 2005
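
A personal filtering agent of the kind suggested above could sit on the user's side, collapse identical results, and let the user decide which copy counts as the "definitive source". The sketch below is purely illustrative: the `Result` record, the `fingerprint` normalisation, and the `prefer` callback are all assumptions, and it only catches copies whose wording is identical after ignoring case and punctuation.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """A hypothetical search hit as the agent might see it."""
    url: str
    text: str
    published: date


def fingerprint(text):
    """Hash the text after lowercasing and dropping punctuation and spacing."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def collapse(results, prefer):
    """Keep one result per distinct text.

    prefer is the user's own ranking function; the copy with the lowest
    prefer(result) value wins within each group of identical texts.
    """
    best = {}
    for r in results:
        key = fingerprint(r.text)
        if key not in best or prefer(r) < prefer(best[key]):
            best[key] = r
    return list(best.values())
```

For example, `collapse(results, prefer=lambda r: r.published)` would treat the earliest-dated copy as the definitive one, while another user might rank by domain instead. The choice stays with the user rather than the search engine.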
  

       // it would not return a page containing the phrase "flopsy-turvy topsy-turvy"; the phrase "topsy-turvy" should not disqualify the page altogether, //   

       Hm. I'm confused reading that, so I'm sure Google would be too. You're saying you don't want pages that have the phrase "topsy turvy" on them to appear in the search results, then say that the phrase "topsy turvy" shouldn't prevent a page from appearing when it's filtered out. Umm.
waugsqueke, Mar 11 2005
  


 
