Patents The Internet

Google Patents Search Algorithm

Posted by michael
from the google-this dept.
blastedtokyo writes "Google gets the first web search patent. According to this News.com.com article, Google was able to patent how they crawl and rank web pages. They claim "an improved search engine that refines a document's relevance score based on interconnectivity of the document within a set of relevant documents.""

  • Mis-title (Score:4, Informative)

    by Amsterdam Vallon (639622) <amsterdamvallon2003@yahoo.com> on Thursday February 27, 2003 @10:49AM (#5395592) Homepage
    It's not really their Search algorithm, it's their method of comprehensive PageRanking.

    They basically measure Web pages as either 1) portals, or 2) authorities.

    Sites like Kuro5hin [kuro5hin.org] and *nix [starnix.org] have a lot of "Google juice" (i.e. weight in their ranking system) because they have so many links to other sites, while also garnering a slew of links to their main page.
  • by tiltowait (306189) on Thursday February 27, 2003 @10:52AM (#5395626) Homepage Journal
    Google didn't invent the concept behind PageRank, just its name. See my E2 writeup on citation analysis [everything2.com] for more.
  • by Frank of Earth (126705) <{moc.snikrepf} {ta} {knarf}> on Thursday February 27, 2003 @10:55AM (#5395679) Homepage Journal
    ..google, you will feel their wrath [wsj.com]
  • Software patents (Score:4, Informative)

    by killmenow (184444) on Thursday February 27, 2003 @10:59AM (#5395713)
    I find it interesting that because it's google, some /.-ers are saying essentially "good for them!" But at the heart of it, it makes no difference who it is or what their intention is.

    Kids, software patents are bad, mm-kay... [mit.edu]
  • Re:Mis-title (Score:5, Informative)

    by MilTan (171504) on Thursday February 27, 2003 @11:00AM (#5395723)
    PageRank doesn't actually distinguish between "portals" and "authorities." It "only" does a link-analysis of the web by essentially multiplying some ranking vector by a matrix representing the links in the web, with a random jump to another location taking place with a certain probability to create a new ranking vector. Once this converges, you have the new "PageRank."
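
    To make the iteration described above concrete, here is a minimal sketch in Python. It is not Google's implementation: the damping value of 0.85, the even spreading of rank from pages without outlinks, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration over a link graph.

    links maps each page to the list of pages it links to.
    damping is the probability of following a link; with probability
    (1 - damping) the surfer jumps to a random page (assumed value).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly (an assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page passes an equal share of its rank along every outlink.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # the ranking vector has converged
            break
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

    Note that nothing in this loop looks at a search query; the scores depend only on the link structure.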

    PageRank scores are calculated completely independently of the search query. You are probably thinking of Kleinberg's HITS (or Hubs and Authorities) algorithm which uses an initial search query to prune the search space, and then identifies hubs and authorities in the web. In contrast to PageRank, which only uses forward links to calculate its rankings, HITS uses both forward and "backward" links to figure out its ratings. Furthermore, unlike PageRank, HITS produces different scores for different queries.
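
    For comparison, a rough Python sketch of the hub/authority update follows. It assumes the query-specific pruning has already happened, so the input is just the small subgraph of pages relevant to one query; the page names and the fixed iteration count are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Hub and authority scores for a (query-specific) link subgraph."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in ("backward" links).
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub: sum of the authority scores of the pages linked to (forward links).
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        # Normalise both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical subgraph for one query: two link lists pointing at three pages.
subgraph = {"hub1": ["x", "y"], "hub2": ["x", "y", "z"], "x": [], "y": [], "z": []}
hub_scores, auth_scores = hits(subgraph)
print(max(hub_scores, key=hub_scores.get))
print(sorted(auth_scores, key=auth_scores.get, reverse=True)[:2])
```

    A different query would yield a different subgraph and therefore different scores, which is the contrast with PageRank drawn above.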

    The above tells us the following: That Kuro5hin and Slashdot have high pageranks not because of their excessive numbers of outlinks, but because many people point to their frontpages. Similarly, these high PageRanks mean that people that Slashdot or Kuro5hin point to get higher scores as well.
  • Not necessarily... (Score:5, Informative)

    by TopShelf (92521) on Thursday February 27, 2003 @11:12AM (#5395825) Homepage Journal
    Patents are also widely used as a means of rewarding an inventor by giving them an avenue to license their technology to one or many users who can then implement it into commercial products. In that way you don't get a monopoly, nor does the inventor have to provide the capital required to bring something to market. You only get a monopoly if the patent holder refuses to sell licenses, or sells it to a single user.

    Think fuel injectors [autofieldguide.com], for example, which are made by several suppliers, but have a patent holder who gets license revenue.

  • by Anonymous Coward on Thursday February 27, 2003 @11:14AM (#5395845)
    Let me give you an example from the non-IT world - perfume. You cannot patent a smell/fragrance, but you can patent the formula you use to achieve that fragrance. Which is why there are knock-offs.

    The patents we all scream about are those that are comparable to the "fragrance" - patenting the concept of the shopping cart or the concept of transferring multimedia streams over the Internet. The Patent Office violated their own rules when awarding patents like that.

    Google didn't patent the concept of a ranking system, they patented the way they do it. And that is a good patent. It patents the formula and not the fragrance.

    If someone can figure out how to achieve the same result with a different formula, more power to 'em!

  • by Anonymous Coward on Thursday February 27, 2003 @11:15AM (#5395846)
    What a coincidence. Today's UF topic covers patent obsession. Check it out. [userfriendly.org] Although amazon.com is the target of the joke, it shows how patent-obsessed software companies can be. I'd say it sure does a good job satirizing it. Who knows? Maybe Google will be targeted in tomorrow's strip.
  • by Sir Runcible Spoon (143210) on Thursday February 27, 2003 @11:19AM (#5395875)
    Maybe, maybe not.

    Some have been talking about similar techniques since before this patent was filed:

    http://www.carnet.hr/cuc/cuc2000/radovi/prezentacije/F/F3/F3_f.pdf

    http://citeseer.nj.nec.com/context/856618/0
  • by zmahk31 (33160) on Thursday February 27, 2003 @11:20AM (#5395885)
    In fact, the algorithm as a computational method goes back to Jacobi (1804-1851), and is essentially an iterative solver for large systems of linear equations.

    Of course, it's still a significant contribution to see the application of the Jacobi method to ranking web pages, and I assume that they have done some clever and many more dirty tricks to get more realistic results, weed out duplicate pages, etc., which may or may not be part of the patent.

    In any case, the basic page rank algorithm is quite intuitive to anyone who has worked with iterative numerical methods, and in fact a very nice illustration of the power of such methods.
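
    For readers who have not met it, this is roughly what a Jacobi iteration looks like in Python. The 2x2 system is a made-up example, and the tolerance and iteration cap are arbitrary choices; the sketch only shows the general method, not anything taken from the patent.

```python
def jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b iteratively; converges when A is diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Every component is updated from the previous iterate only.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical system: 4x + y = 6 and 2x + 5y = 12, whose solution is x = 1, y = 2.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]
solution = jacobi(A, b)
print([round(v, 6) for v in solution])
```

    Each sweep needs only the matrix and the previous vector, which is why this style of solver suits very large, sparse systems such as a link matrix.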
  • by MichaelCrawford (610140) on Thursday February 27, 2003 @11:33AM (#5395992) Homepage Journal
    Many of my pages show up in the first page of Google's results for relevant search terms, sometimes even being the number one result. For example, lately a google search for software consultant resume [google.com] lists my resume [goingware.com] as the #1 search result. (Your search results may vary.)

    I didn't pay a search engine optimization service to make this happen. I didn't use any tricks like "doors" either. It cost me no money, but it did take time and hard work to achieve it.

    I explain everything I did in How To Promote Your Business On the Internet [goingware.com].

    What's my secret? No secret at all:

    That's it. But read my article for the full discussion, as well as an explanation of why I'm telling everyone my secret.

    Other pages I have that you may find helpful are:

    My most popular page is a C++ style guide called Pointers, References and Values [goingware.com].

    and finally, from my K5 diary, A Webmaster's Strange But True Tale [kuro5hin.org].

    Thank you for your attention.

  • Patent # 6,526,440 (Score:5, Informative)

    by esme (17526) on Thursday February 27, 2003 @11:39AM (#5396053) Homepage
    read the patent [uspto.gov]
  • by nagora (177841) on Thursday February 27, 2003 @12:45PM (#5396800)
    Create a robots.txt file in your site's root directory with:
    User-agent: googlebot
    Disallow: /
    and then go to the "Urgent URL removal" page on google (click here [google.com]) and follow the instructions. Your site should be removed within 24hrs. I've taken mine off and I suggest that any programmer that is concerned about being able to work freely should too.
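
    If you want to check what those two robots.txt lines do before uploading them, Python's standard urllib.robotparser can evaluate them locally. This is only a sketch: example.com and the second crawler name are placeholders, and a real crawler's matching rules may differ in detail from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the comment above.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any path on the site is covered by "Disallow: /".
url = "http://example.com/index.html"
googlebot_allowed = parser.can_fetch("googlebot", url)
other_allowed = parser.can_fetch("someotherbot", url)  # hypothetical crawler name
print(googlebot_allowed, other_allowed)  # False True
```

    The rule is scoped to one user agent, so crawlers that identify themselves differently are unaffected.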

    TWW

  • by iocat (572367) on Thursday February 27, 2003 @12:55PM (#5396928) Homepage Journal
    Now that they've been awarded a patent for page-rank, it's required for them to make it public so that people can license it. You can't patent a trade secret and still have it be secret. People now have the opportunity to build new methods and innovate with Pagerank as a basis for that innovation. (Real innovation, not MS innovation.)
    Actually, they are required to disclose it, but not to license it. The patent gives them a 17 year legal monopoly to do whatever they want with it (use it, license it, bury it, etc.). As an example, Capri Sun never licensed their patented "juice bag" technology, forcing others to use inferior "drink boxes" to deliver product. Now that the patent is expired, other "drink bags" are on the market.

    More worrying is that software patents are sometimes granted using such general language that the entity getting the patent *doesn't* really have to disclose anything, enabling them to get protection while keeping their invention secret, which is exactly the opposite effect of what patents were intended for -- to get duplicable knowledge into the public domain after a period of protection for the original inventor.

  • Why not? (Score:4, Informative)

    by FallLine (12211) <fallline@oper a m a i l .com> on Thursday February 27, 2003 @02:01PM (#5397784)
    In fact, I bet a few hours of research into Sociology, Psychology, and Linguistics papers will turn up generic proofs and observations of the very same things that page rank takes care of in a different context. A context shift shouldn't be patentable. Much software (but not all) involves making these logical leaps. Many times they are leaps from pure science that is copyrighted (on the one hand) but (increasingly less so) open on the other. This is human knowledge we are dealing with. The Scientific Method... all that crap. It doesn't work unless everyone shares their toys. Start locking them up and you stifle innovation (at the least) or become dictatorial master of (increasingly more of) everyone's lives.
    What a bunch of psychobabble. Google should be able to patent what they have done because:

    A) The algorithm is highly useful.

    B) It required a significant amount of risk and technical effort to make it worthwhile.

    C) The scope of the patent really just covers what it is that they've added, i.e., the ideas that they are supposedly deriving from are not being locked up.

    What more do you really need to know? Regardless of what language you wish to put your claims in, that they've just made a "context shift" or what have you, it is a worthwhile effort and it is the kind of effort that requires the potential for substantial profits to secure continued efforts. People don't take risk without at least the potential to profit and the greater the potential reward the greater risks people are willing to take. Are you really going to argue that the idea was obvious or easy? If so, then explain why no one did it before, when billions of dollars and many years were (and are) being spent on such internet technology. There was a considerable lag time between the appreciation of the need for a good search engine (and the resources to develop them) and google's appearance. What's more, keep in mind that:

    a) Google's core methodology is no secret now

    b) The patent's life is limited.

    c) The ideas that they presumably derived from are still as open as they were prior to this patent

    d) This country produces far more than any country despite the fact that we arguably "share our toys" less than most countries, even more than countries with much larger populations (even technically educated ones)....

    Now I agree that there are dangers in allowing people to patent any and everything, e.g., well known sorting algorithms and other fundamental building blocks, but this clearly is not happening here.
  • They already do (Score:2, Informative)

    by yerricde (125198) on Thursday February 27, 2003 @02:49PM (#5398372) Homepage Journal

    The patent notice contains a U.S. patent number. When entered into the USPTO search engine [uspto.gov], a patent number calls forth a complete description of how to implement an invention.

  • by mentin (202456) on Thursday February 27, 2003 @02:50PM (#5398387)
    One piece of prior art is known to every science student who needed to find the most important articles on a subject. Without any computers, we looked at the references (now hyperlinks) and searched for the articles that are referenced most, thus finding the articles with the highest "rank".

    But since the USPTO considers "find a common knowledge algorithm and patent a way to do it with computers" a valid patenting method, they probably would not consider it prior art.
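
    The manual procedure described above amounts to counting incoming references. A small Python sketch, with invented paper names, might look like this; it is plain citation counting, not the weighted scheme PageRank uses.

```python
from collections import Counter


def most_cited(references, top=3):
    """Rank works by how many distinct papers cite them."""
    counts = Counter(
        cited
        for cited_list in references.values()
        for cited in set(cited_list)  # count each citing paper once per target
    )
    return counts.most_common(top)


# Hypothetical reference lists: paper -> works it cites.
papers = {
    "paper_a": ["classic_1", "classic_2"],
    "paper_b": ["classic_1"],
    "paper_c": ["classic_1", "classic_2", "paper_a"],
}
top_cited = most_cited(papers)
print(top_cited)
```

    Here every citation counts equally, whereas a link-analysis ranking also weighs who is doing the citing.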

  • Re:Why not? (Score:3, Informative)

    by Fulcrum of Evil (560260) on Thursday February 27, 2003 @02:52PM (#5398412)

    d) This country produces far more than any country despite the fact that we arguably "share our toys" less than most countries, even more than countries with much larger populations (even technically educated ones)....

    How is this even relevant? Anyway, which countries did you have in mind?

  • Re:Good for them... (Score:4, Informative)

    by Xofer D (29055) on Thursday February 27, 2003 @03:11PM (#5398682) Homepage Journal
    Well, actually...

    See J.M. Kleinberg, "Authoritative Sources in a Hyperlinked Environment", Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, ACM Press, New York and SIAM Press, Philadelphia, 1998, pp.668-677

    That discusses the HITS algorithm, which is the core of PageRank (which is a simplified version of HITS). Sergey Brin and Lawrence Page in fact developed [1] Google from HITS [2].

    References:
    [1] S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Proceedings of the 7th World Wide Web Conference, Elsevier Science, Amsterdam, 1998. pp. 107-117

    [2] Chakrabarti, Dom, et al., "Mining the Web's Link Structure", Computer, August 1999. pp. 60-66
  • Re:Mis-title (Score:2, Informative)

    by naoiseo (313146) on Thursday February 27, 2003 @03:43PM (#5399071)
    This post doesn't make sense...

    Google, when it's 'reading' a page, is having a bot spider it. If google is spidering a page and comes across a link to a page it has not 'read', then it follows the link, spiders the page, and includes it in the index.

    As for returning results to the 'unread page' based on the context of the link, what do you mean by 'returned results to the page'? Do you mean, is now capable of displaying that page in its results set for a specific query?

    You *might* mean that a 'freshbot', which is googlebot's little bro, can go and pick up a new page and temporarily add it to the google index for the month, without calculating its true PageRank (it waits until after the next update, so it can compare the new page to everything else in context).

    in this sense an 'unread' page could mean a page that is not properly indexed yet, but is a new addition.

    It's true that google can return a page in its results that doesn't have the search phrase on it, if that search phrase has been used to point to the page in question, but it doesn't mean google hasn't 'read' the page in question, it has.

    but google does not return pages in its results set that it hasn't spidered, and has only seen the links to. If google sees a link, it goes and indexes the page.
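
    A toy Python sketch of the behaviour described in this comment: a crawler follows links, indexes each page it fetches, and also credits the words of a link's anchor text to the target page. Everything here is hypothetical (the in-memory "web", the URLs, the function names); it illustrates the idea and says nothing about how googlebot is actually built.

```python
from collections import defaultdict, deque

# Hypothetical web: url -> (page text, [(anchor text, target url), ...])
TOY_WEB = {
    "http://a.example/": (
        "welcome to site a",
        [("great pasta recipes", "http://b.example/")],
    ),
    "http://b.example/": (
        "spaghetti carbonara instructions",
        [("dead link", "http://missing.example/")],
    ),
}


def crawl_and_index(seed, web):
    """Breadth-first crawl that indexes page text and incoming anchor text."""
    index = defaultdict(set)  # word -> set of urls
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, out_links = web[url]
        for word in text.split():
            index[word].add(url)
        for anchor, target in out_links:
            if target not in web:
                continue  # cannot be fetched, so it never enters the index
            for word in anchor.split():
                index[word].add(target)  # anchor words describe the target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dict(index), seen


def search(index, word):
    return sorted(index.get(word, ()))


index, crawled = crawl_and_index("http://a.example/", TOY_WEB)
print(search(index, "pasta"))
```

    The page at b.example is returned for "pasta" even though the word never appears on it, yet it was still fetched and indexed; the unreachable target is never returned at all.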
