Text Processing in Python, by David Mertz, 2003, Addison-Wesley, 520 pages.
If you have read an introductory book or two about Python
programming, but you are far from being an expert, then you will benefit
a lot from reading this book. If you are a competent programmer in any other
language, you will benefit from this book. If you are an expert Python programmer,
you will also benefit from this book.
For, as you know, there are many good introductory texts about Python. This
is not one of them, for this is an advanced book, but not an inaccessible
one. David Mertz has a unique style and focus that we have become familiar
with from his "Charming Python" series of articles on IBM
developerWorks. Dr. Mertz is more interested in facilitating our learning process than
in lecturing us, and rather than fill his pages with impressive examples
designed to illustrate his expertise, he gently guides us by offering subtle
yet important examples of code and analysis that make us think for ourselves.
He has a special talent for programming in the functional style, and this
is a great introduction to that style of Python programming. Thus, this is
also a good guide to using the newer features introduced into Python in the
last few revisions, which often facilitate the functional style of programming.
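For readers who have not met the functional style, here is a minimal sketch of what it can look like for a small text task. It is my own illustration, not an example taken from the book, and the sample data and variable names are made up. It is written for a current Python 3, whereas the book targets the Python of 2003.

```python
from functools import reduce

# Hypothetical sample input: raw lines with stray whitespace and blanks.
lines = ["  alpha beta ", "", "gamma  ", "   "]

# Strip each line and drop the empty ones, without an explicit loop.
cleaned = list(filter(None, map(str.strip, lines)))

# Count words across all lines by folding over the cleaned lines.
word_count = reduce(lambda total, line: total + len(line.split()), cleaned, 0)

# The same cleanup written as a list comprehension, a newer idiom
# that many Python programmers find easier to read.
cleaned_lc = [s for s in (line.strip() for line in lines) if s]
```

The map/filter/reduce version and the comprehension produce the same list; which one reads better is largely a matter of taste.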
The text includes, in an appendix, a 40-page tutorial covering the basic
Python language. This tutorial is, like the book, unique in its approach
and is worthwhile even for experienced Pythonistas, as it sheds light on
some of the underlying ideas behind the syntax and semantics, and it also
illustrates the functional style of programming, which is sometimes quite
useful when doing text processing. And, despite its many other virtues, this
is a book about text processing.
Chapter 1 covers the Python basics, but with a particular eye towards those
features most critical and useful for text processing. Chapter 2 covers the
basic string operations as found in the string module and the newer built-in
string methods. Chapter 3 is about regular expressions, and, although
I am shy about regexes because of their relative complexity, I am very glad
to have read this chapter and will no longer be intimidated when regexes
are the correct approach to take! Chapter 4 is on Parsers and State machines,
which are important for processing nested text, as in everyday HTML, XML
and the like. This chapter is not as esoteric as its title may sound to relative
newbies (like myself), as it does offer useful ideas and principles for dealing
with HTML. How much more useful can a topic be than that? It is true that
a deep understanding of this subject may be beyond me and other relative
duffers, but this chapter has much to offer those like me and I am sure much
more to offer professionals.
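To give a flavor of the three approaches covered in Chapters 2 through 4, here is a small sketch of my own, not taken from the book. The sample text and the strip_tags helper are hypothetical, and the tiny state machine is only a toy: it ignores comments, quoted attributes and entities, so it should not be mistaken for a real HTML parser.

```python
import re

# Hypothetical sample text.
text = "Order 66 shipped on 2003-06-12; order 7 is pending."

# Plain string operations are enough for fixed substrings.
has_pending = "pending" in text
first_word = text.split()[0]

# A regular expression earns its keep when the pattern varies.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)


def strip_tags(markup):
    """Drop everything between '<' and '>' using a two-state machine."""
    state = "TEXT"
    out = []
    for ch in markup:
        if state == "TEXT":
            if ch == "<":
                state = "TAG"   # entering a tag: stop copying characters
            else:
                out.append(ch)
        elif ch == ">":         # state is "TAG": leave it at the closing '>'
            state = "TEXT"
    return "".join(out)


plain = strip_tags("<p>Hello <b>world</b></p>")
```

The point of the last function is only that the current state, not a pattern, decides what to do with each character, which is the idea that makes state machines a natural fit for nested markup.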
Chapter 5 is on Internet tools and techniques, and this is a good example of
how text processing touches every important area of computer programming.
We manipulate text for email, newsgroups, CGI programs, HTML and many other
aspects of net programming. A good summary of XML programming is included,
as well as useful synopses of other Python internet modules, from a text
processing point of view.
Appendix A is the aforementioned selective and short review of Python basics.
Appendix B is a ten-page data compression primer that is quite educational.
Appendix C offers the same good service for Unicode, and Appendix D covers
the author's own software, a state machine for adding markup to text, which
is backed up by his extensive web site that has a lot of free software to
support those doing extensive text processing. Lastly, Appendix E is a Glossary
for technical terms from the book. This is very much an educational book,
and would be suitable for classroom work at the university level, beyond
the introductory programming level; in fact, as part of a curriculum to teach
programming using Python at the university level, this would be an excellent
text for the second course.
One of the highlights of the book is that each chapter is concluded with
a problem and discussion section. These are of the highest quality I have
encountered in computer texts. Rather than overwhelming the reader with a
large number of problems, the author has obviously put a great deal of thought
into devising a few key problems that are meant to stimulate thought,
creativity, and ultimately understanding and growth in the reader. I will
be coming back to the problems often, as they cannot be absorbed quickly
anyway; they require thought. These would be most useful in a classroom environment;
but as they are accompanied by excellent discussion material, and backed
up by the author's web site, the individual reader will be well served also.
The book is more than the sum of its parts. It will be a most useful reference
for my various text-related tasks for some time to come,
and it was also a delightful and educational quick read in the here and now.
It also amply illustrates the centrality of text processing in all areas
of computer science, and I am confident that the book will be useful and
educational for all programmers, whatever their area of expertise.
To sum it all up, this book is educational. It is also beautifully bound
and printed, and excellently written. I rate it five stars, my highest rating,
and heartily recommend its purchase.