
Tom White

Tom White is a committer on the Apache Hadoop project, and a member of the Lucene Project Management Committee. He works as an independent consultant specializing in Hadoop and distributed computing. He has been writing Java full time since 1996, and writing about Java since 2003 for O'Reilly, java.net and IBM's developerWorks. Outside programming, Tom enjoys making his daughters laugh and watching 1930s Hollywood films.

 

Articles

In the second part of this look at the Nutch web indexing and search engine, Tom White looks at how to perform searches on the index generated in part one's crawl, and shows how to integrate Nutch's search capabilities with your applications through direct Java calls to its API or via the...
Do you need your own search engine, when the world already has Google? Quite possibly so: you may belong to an organization with enough of its own content that you want to manage and run your own search engine--and know how it works. Nutch is an open source search engine written in Java. In this...
All modern search engines attempt to detect and correct spelling errors in users' search queries. This article shows you one way of adding a "did you mean" suggestion facility to your own search applications using the Lucene Spell Checker.
Parallel computing allows some programs to run faster by dividing them up into smaller pieces and running these pieces on multiple processors. ComputeFarm is an open source Java framework for developing and running parallel programs.

Weblogs

MapReduce is a programming model for processing vast amounts of data. One of the reasons it works so well is that it exploits a sweet spot of modern disk drive technology trends. In...

I've bumped into consistent hashing a couple of times lately.

I've raved about the MapReduce parallel programming model in the past, and Apache Hadoop (the framework for...

I noticed that Paul Dowman has created a Ruby on Rails AMI for use on Amazon EC2 (...

The long-awaited final version of jMock 2 was released today. There are some big changes since version one. For example, you can now write

...

We kept breaking our XML catalog resolution in the course of developing an application. We would refactor the parser code, or we...

In Literate Programming with jMock I enthused about jMock's idea of constraints and...

In a previous blog entry I mentioned a literate functional testing framework that we had developed at...

[Update: changed wording per comments to fix error.]

In case you haven't heard of it, Amazon S3 is a web service for storing data.
The two great things about it are that it's simple (look at its nice...

Singulars and plurals are so different, bless my soul.
Has it ever occurred to you that the plural of "half" is "whole"?

Allan Sherman...

According to the dictionary, an anaphor is a word used to avoid repetition. It refers back to something in the conversation. The...

We've been using jMock at our company for some time now. We've found it great for test-driven development and...

Anders Møller's dk.brics.automaton is a Java regex package whose main claim to fame is that it is significantly...

With the launch of Amazon S3 (Simple Storage Service) we are seeing a continuation of the trend for the big web companies to monetize their computing...

In a previous blog I wrote about Nutch's MapReduce implementation, for distributed processing of massive...

There's been a bit of a backlash against XML config files lately. The Ruby on Rails community has a crisp putdown: avoid "doing XML sit-ups". And the...

Doug Cutting has done it again. The creator of Lucene and Nutch has implemented (with...

One of the things that writing object-oriented systems encourages you to consider is: what is the responsibility of each class in the system? For example, the Model-View-...

There is an old saying that mathematicians only know three numbers: 0, 1 and ∞ (infinity).
There is some truth in this in computing too, as dealing with a single entity can be very...

Perl is famous for its one-liners. By using the -e command line switch you can execute...

During a panel discussion at the 1999 JavaOne conference, Bill Joy, talking about the things he didn't like about Java, stated that...

There was no fanfare; in fact, it's not even linked to from the JDK 1.5 documentation, but the third edition of the Java...