GOOGLE :: MANUAL

:: Top-level ::

At the top level there is really only one object of interest: the Google class (Googler is a synonym for it).

Using it is really easy, as this example demonstrates:

from google import Google

g = Google(verbose=1)
results = g.search("python")

for hit in results.results:
    print(hit.url)

What's going on here? A Google instance has one interesting method, search. Call it with a string, just as you would type it into the Google search box. It returns an instance of GoogleSearchResults, which is really just a wrapper class with three attributes:
  • search -- the query we entered
  • numhits -- the number of hits reported by Google
  • results -- a list of GoogleHit instances

GoogleHit is a simple object too: it encapsulates the data of one "search hit", as returned by Google. It is also a wrapper class, with these attributes:
  • url -- the URL of the site that showed up in the list of search results
  • title -- the "title" of the URL, in other words the text that accompanies it
  • text -- the snippet of text that Google quotes from the site
  • description -- The Google description of the search hit. (optional)
  • category -- The Google category of the search hit, translated to a tuple. (optional)
/* I guess I could have used dicts rather than wrapper classes, but these seemed a little bit easier to use. */
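
To make the shape of these two wrapper classes concrete, here is a minimal sketch. It is not the library's actual source: the constructor signatures and the sample values are assumptions made for illustration, and only the attribute names come from the lists above.

```python
# Hypothetical stand-ins for the wrapper classes described above.
# The real class definitions in the library may differ.

class GoogleHit:
    """Data for one search hit."""
    def __init__(self, url, title, text, description=None, category=None):
        self.url = url
        self.title = title
        self.text = text
        self.description = description  # optional
        self.category = category        # optional; a tuple when present


class GoogleSearchResults:
    """Outcome of one query."""
    def __init__(self, search, numhits, results):
        self.search = search    # the query string
        self.numhits = numhits  # number of hits reported by Google
        self.results = results  # list of GoogleHit instances


# Invented sample data, only to show how the attributes are read.
hit = GoogleHit("http://www.python.org/", "Python", "Home page for Python",
                category=("Computers", "Programming"))
res = GoogleSearchResults("python", 1, [hit])

for h in res.results:
    print(h.url, h.category)
```

Optional attributes default to None in this sketch, so a caller can test for them with a plain "is None" check.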

By default one page is loaded, which means you get at most 10 search results back. To change this, pass the maxpages keyword argument to the Google.search method. For example:
g.search("python", maxpages=3)
loads a maximum of 3 pages of search results.
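
The effect of such a page cap can be sketched as follows. This is only an illustration of the idea, not the library's code: collect_hits, fetch_page and fake_fetch are hypothetical names, and the fake data source stands in for real Google result pages.

```python
def collect_hits(fetch_page, maxpages=1):
    """Gather hits from at most `maxpages` result pages.

    `fetch_page` is a hypothetical callable that takes a zero-based page
    index and returns a list of hits (empty when no results are left).
    """
    hits = []
    for page in range(maxpages):
        page_hits = fetch_page(page)
        if not page_hits:
            break  # ran out of results before reaching the cap
        hits.extend(page_hits)
    return hits


# Fake data source: 25 invented results, served 10 per page.
ALL_HITS = ["hit%d" % i for i in range(25)]

def fake_fetch(page, per_page=10):
    return ALL_HITS[page * per_page:(page + 1) * per_page]


print(len(collect_hits(fake_fetch)))              # 10
print(len(collect_hits(fake_fetch, maxpages=3)))  # 25
print(len(collect_hits(fake_fetch, maxpages=5)))  # 25
```

Note that maxpages is an upper bound: with only 25 results available, asking for 5 pages still yields 25 hits.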

:: Lower-level ::

The Google class is merely an easy-to-use interface that hides more complex machinery. Most of the work is done by the GoogleHelper class and some auxiliary classes. Here's what happens:
  1. Google.search does a Google query with the search string. It gets back an HTML page.
  2. The HTML page is split into "atoms" by the HTMLSplitter class. (An "atom" is either a tuple with a tag and its attributes, or a string containing data. This allows for easier scanning of the page, finding tags, etc.)
  3. A list of these atoms is taken by GoogleHelper, which analyzes the document, finds the search results, number of hits, etc.
  4. The results of GoogleHelper's inspection are stored in a GoogleSearchResults instance (which in turn contains GoogleHit instances), and returned to Google.search.
  5. If maxpages > 1, steps 1 to 4 are repeated for each further page of results, up to maxpages pages.
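
The "atom" idea from step 2 can be illustrated with a short sketch. It does not reproduce HTMLSplitter or GoogleHelper: AtomSplitter, split_atoms and find_links are hypothetical names, the splitter is built on the standard library's html.parser (Python 3), and representing closing tags as ("/tag", {}) is an assumption of this sketch.

```python
from html.parser import HTMLParser


class AtomSplitter(HTMLParser):
    """Hypothetical stand-in for HTMLSplitter, built on the standard library."""

    def __init__(self):
        super().__init__()
        self.atoms = []

    def handle_starttag(self, tag, attrs):
        # Tag atom: (tag name, dict of attributes)
        self.atoms.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Closing tags are kept as atoms too (an assumption of this sketch)
        self.atoms.append(("/" + tag, {}))

    def handle_data(self, data):
        # Data atom: a plain string; whitespace-only runs are dropped
        if data.strip():
            self.atoms.append(data.strip())


def split_atoms(html):
    splitter = AtomSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.atoms


def find_links(atoms):
    """Pair each <a href=...> atom with the data string that follows it."""
    links = []
    for i, atom in enumerate(atoms[:-1]):
        if isinstance(atom, tuple) and atom[0] == "a" and "href" in atom[1]:
            following = atoms[i + 1]
            if isinstance(following, str):
                links.append((atom[1]["href"], following))
    return links


page = '<p><a href="http://www.python.org/">Python</a> home page</p>'
atoms = split_atoms(page)
print(atoms)
print(find_links(atoms))
```

Once the page is a flat list of atoms, scanning for a result becomes a simple walk over the list, which is the kind of analysis step 3 describes.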


