GOOGLE :: MANUAL
:: Top-level ::
From the very top level, there's really only one object that is interesting: the Google (or Googler, they are synonyms) class.
Using it is really easy, as this example demonstrates:
from google import Google
g = Google(verbose=1)
results = g.search("python")
for hit in results.results:
    print(hit.url)
What's going on here? A Google instance has one interesting method, search. Call it with a query string, just as you would type it into the Google search box. It returns an instance of GoogleSearchResults, which is really just a wrapper class with three attributes:
- search -- the query we entered
- numhits -- the number of hits reported by Google
- results -- a list of GoogleHit instances
GoogleHit is not a difficult object either; it encapsulates the data of one "search hit", as returned by Google. It is also a wrapper class, containing these attributes:
- url -- the URL of the site that showed up in the list of search results
- title -- the "title" of the URL, in other words, the link text that accompanies it
- text -- the snippet of text that Google quotes from the site
- description -- The Google description of the search hit. (optional)
- category -- The Google category of the search hit, translated to a tuple. (optional)
/* I guess I could have used dicts rather than wrapper classes, but these seemed a little bit easier to use. */
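To make the two wrapper classes concrete, here is a minimal sketch of what they might look like. Only the attribute names come from this manual; the constructor signatures and the sample values are assumptions made up for illustration, not the library's actual code.

```python
class GoogleHit(object):
    """Hypothetical stand-in for the library's GoogleHit wrapper."""

    def __init__(self, url, title, text, description=None, category=None):
        self.url = url                  # URL of the site in the result list
        self.title = title              # link text that accompanies the URL
        self.text = text                # snippet quoted by Google
        self.description = description  # optional
        self.category = category        # optional, a tuple when present


class GoogleSearchResults(object):
    """Hypothetical stand-in for the library's GoogleSearchResults wrapper."""

    def __init__(self, search, numhits, results):
        self.search = search    # the query we entered
        self.numhits = numhits  # number of hits reported by Google
        self.results = results  # list of GoogleHit instances


# Made-up sample data, just to show how the attributes are read.
hit = GoogleHit("http://www.python.org/", "Python", "Sample snippet text.",
                category=("Computers", "Programming"))
res = GoogleSearchResults("python", 1, [hit])

for h in res.results:
    print(h.url)
```

Plain attribute access like this is what makes the wrappers a little easier to use than dicts: hit.url rather than hit["url"].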
One page is loaded by default, which means that you get a maximum of 10 search results back. However, this can be changed by specifying the maxpages keyword argument of the Google.search method. For example:
g.search("python", maxpages=3)
loads a maximum of 3 pages of search results.
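The effect of maxpages can be sketched with a small paging loop. This is not the library's code: collect_hits and fake_fetch are hypothetical names, and fake_fetch pretends that exactly 25 results exist so the cap is visible without any network access.

```python
RESULTS_PER_PAGE = 10  # from the manual: one page yields at most 10 hits


def collect_hits(fetch_page, maxpages=1):
    """Gather hits page by page, stopping early when a page comes back empty."""
    hits = []
    for page in range(maxpages):
        page_hits = fetch_page(page)
        if not page_hits:
            break
        hits.extend(page_hits)
    return hits


def fake_fetch(page):
    """Hypothetical stand-in for one page load; pretends 25 results exist."""
    total = 25
    start = page * RESULTS_PER_PAGE
    stop = min(start + RESULTS_PER_PAGE, total)
    return ["hit-%d" % i for i in range(start, stop)]


print(len(collect_hits(fake_fetch)))              # 10
print(len(collect_hits(fake_fetch, maxpages=3)))  # 25
print(len(collect_hits(fake_fetch, maxpages=5)))  # 25
```

So maxpages is an upper bound: you get at most 10 hits per page loaded, and fewer pages are used if the results run out first.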
:: Lower-level ::
The Google class is merely an easy-to-use interface that hides more complex machinery. Most of the work is done by the GoogleHelper class and some auxiliary classes. Here's what happens:
- Google.search does a Google query with the search string. It gets back an HTML page.
- The HTML page is split into "atoms" by the HTMLSplitter class. (An "atom" is either a tuple with a tag and its attributes, or a string containing data. This allows for easier scanning of the page, finding tags, etc.)
- A list of these atoms is taken by GoogleHelper, which analyzes the document, finds the search results, the number of hits, etc.
- The results of GoogleHelper's inspection are stored in a GoogleSearchResults instance (which in turn contains GoogleHit instances), and returned to Google.search.
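- If maxpages > 1, repeat these steps for the next page of results, until maxpages pages have been loaded or no more results are available.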
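The "atom" idea from the steps above can be sketched with the standard library's HTML parser. This is only an illustration of the technique, not the real HTMLSplitter or GoogleHelper: AtomSplitter, split_atoms and find_links are hypothetical names, the sample page is made up, and the sketch assumes Python 3's html.parser module.

```python
from html.parser import HTMLParser


class AtomSplitter(HTMLParser):
    """Hypothetical stand-in for HTMLSplitter: turns a page into a flat list of atoms."""

    def __init__(self):
        super().__init__()
        self.atoms = []

    def handle_starttag(self, tag, attrs):
        # A tag atom is a tuple of the tag name and its attributes.
        self.atoms.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.atoms.append(("/" + tag, {}))

    def handle_data(self, data):
        # A data atom is a plain string; whitespace-only runs are skipped.
        if data.strip():
            self.atoms.append(data.strip())


def split_atoms(html):
    splitter = AtomSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.atoms


def find_links(atoms):
    """Pair each <a href=...> atom with the string atom that directly follows it."""
    links = []
    for i, atom in enumerate(atoms[:-1]):
        if isinstance(atom, tuple) and atom[0] == "a" and "href" in atom[1]:
            following = atoms[i + 1]
            if isinstance(following, str):
                links.append((atom[1]["href"], following))
    return links


page = '<p><a href="http://www.python.org/">Python</a> is a language.</p>'
atoms = split_atoms(page)
print(atoms)
print(find_links(atoms))
```

Once the page is a flat list like this, scanning for search hits becomes a matter of walking the list and matching tag atoms, which is much simpler than working on the raw HTML string.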