Name::Find

What Is It?

Name::Find is a Perl module for finding names in a text string. It doesn't look for one particular name; it finds variations of names in the form: Honorific GivenName1 GivenName2 Surname Suffix. It can also find names in the form Surname Suffix, GivenName1 GivenName2. Some parts may be absent, and GivenName1 and GivenName2 may be initials. Further, names can be hyphenated or repeated (more than two given names, for instance). It uses a dictionary-based approach, so names not in the dictionary will not be found. There is a separate dictionary for each of the word positions in the name, so you don't need a list of every possible name combination.
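To make the per-position dictionary idea concrete, here is a minimal sketch. It is not the module's actual algorithm: the word lists, the toy_find_names function, and the restriction to plain "GivenName Surname" pairs are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy per-position dictionaries (hypothetical data, not the shipped lists).
my %given   = map { $_ => 1 } qw(thurston mary john);
my %surname = map { $_ => 1 } qw(howell smith);

# Return every pair of adjacent words where the first word is in the
# given-name dictionary and the second is in the surname dictionary.
sub toy_find_names {
    my ($text) = @_;
    my @words = $text =~ /([A-Za-z][A-Za-z'-]*)/g;
    my @found;
    for my $i (0 .. $#words - 1) {
        my ($first, $last) = @words[$i, $i + 1];
        push @found, "$first $last"
            if $given{lc $first} && $surname{lc $last};
    }
    return @found;
}

print join(', ', toy_find_names(
    "My favorite character on Gilligan's Island was Thurston Howell.")), "\n";
```

With three given names and two surnames, the five dictionary entries cover six possible combinations, which is why one list per word position is enough.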

What Do I Need?

You need two things: the Name::Find Perl module and some name dictionaries. For your convenience, you can download both from SourceForge at http://sourceforge.net/projects/namefind. The Name-Find package has the Perl modules and the namelists package has the dictionaries. The other requirement is the DB_File module (the Perl interface to Berkeley DB), which you can get from CPAN (http://www.cpan.org).

How Do I Install the Name Lists?

The namelists package has several files: male-names.txt, female-names.txt, middle-names.txt, family-names.txt, non-names.txt and buildnamedbs. The first two files are combined into the given name database, the first three into the middle name database, the fourth makes the family name (last name) database, and the fifth contains phrases that get recognized as names but really aren't.

The 'buildnamedbs' file is a Perl script that does what you think it does. Just run it with no arguments and it will create three name databases: one for family names (family-names.db), one for given names (given-names.db), and one for given2 (middle) names (given2-names.db). It will also create a database of non-names (non-names.db).

The format of the text files is one name per line. Feel free to add any names you find that I have missed. You can also post missing names to the 'Feature Requests' message board on the project info page on SourceForge (http://sourceforge.net/projects/namefind) if you so desire.
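The text-to-database step described above can be sketched roughly as follows. This is not the shipped buildnamedbs script: the build_name_db function, the lowercased keys, the value 1, and the output file name are assumptions, and the key layout Name::Find actually expects is not documented here. Use the real script to build the databases the module reads.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # Berkeley DB bindings; provides $DB_HASH

# Load one-name-per-line text files into a DB_File hash database.
# Assumption: keys are lowercased names and every value is 1.
sub build_name_db {
    my ($dbfile, @textfiles) = @_;
    tie my %db, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot open $dbfile: $!";
    for my $file (@textfiles) {
        open my $fh, '<', $file or die "Cannot read $file: $!";
        while (my $name = <$fh>) {
            $name =~ s/^\s+|\s+$//g;    # trim whitespace and the newline
            next if $name eq '';        # skip blank lines
            $db{lc $name} = 1;
        }
        close $fh;
    }
    my $count = keys %db;               # number of distinct names stored
    untie %db;
    return $count;
}

# Combine the male and female lists, as the given name database does.
# Written to a separate file so the real given-names.db is left alone.
if (-e 'male-names.txt' && -e 'female-names.txt') {
    my $n = build_name_db('given-names-sketch.db',
                          'male-names.txt', 'female-names.txt');
    print "given-names-sketch.db: $n names\n";
}
```

Because several text files can be passed to one call, the same function covers the combined given and middle name lists as well as the single-file family and non-name lists.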

After you have the databases, put them where you can find them. My personal preference is /usr/local/words/, but you can put them anywhere. Keep in mind that if you put them somewhere other than /usr/local/words/, you will have to change the test.pl and findnames scripts to point to the correct location.
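Before running the tests, it can help to confirm that the four database files are where the scripts will look for them. The following is a small sketch; the missing_name_dbs function is a hypothetical helper, not part of the package.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the expected database files that are not
# readable in the directory $dbbase.
sub missing_name_dbs {
    my ($dbbase) = @_;
    $dbbase .= '/' unless $dbbase =~ m{/$};
    my @expected = qw(given-names.db given2-names.db
                      family-names.db non-names.db);
    return grep { !-r "$dbbase$_" } @expected;
}

# Check the directory given on the command line, or the default location.
my $dbbase  = @ARGV ? $ARGV[0] : '/usr/local/words/';
my @missing = missing_name_dbs($dbbase);
if (@missing) {
    print "Missing from $dbbase: @missing\n";
} else {
    print "All name databases found in $dbbase\n";
}
```

Run it with the directory as its only argument if you installed the databases somewhere other than /usr/local/words/.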

How Do I Install the Perl Module?

After downloading it, uncompress and untar it with 'tar -xvzf Name-Find-number.tar.gz' or, if that doesn't work, 'gunzip -c Name-Find-number.tar.gz | tar -xvf -'. Then cd into the directory that gets created and run 'perl Makefile.PL'. (If you put the name databases in a location other than /usr/local/words/, you must edit test.pl and findnames to change the variable $dbbase to the correct location.) Now try 'make test'. You should see a bunch of lines of output, and at the end it will run a series of tests to make sure the module is working correctly. They should all be 'ok ...'. If they aren't, chances are that either you haven't installed the name databases correctly or you don't have DB_File installed.

Once it passes all the tests, become root and run 'make install'. Then stop being root.

That's all there is to it; it is installed. If you have your MANPATH set up correctly, you can even get the man page with 'man Name::Find'.

How Do I Use the findnames Program?

findnames is an example script. It is primarily meant as a practical example of how you can use the Name::Find module in your own code, but it may be useful on its own. It takes one or more files as arguments and prints the results to standard output. Try it out with 'findnames Find.pm'.
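A script of this kind could look roughly like the sketch below. It is not the shipped findnames source: the names_in_files function and the "file: name" output format are assumptions. The only Name::Find calls used are the constructor options and the findNames method shown in the usage example in this document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read each file and return one "file: name" line per name reported.
# $finder is any object with a findNames($text) method that returns a
# list, such as a Name::Find instance.
sub names_in_files {
    my ($finder, @files) = @_;
    my @lines;
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "Cannot read $file: $!\n";
            next;
        }
        my $text = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        push @lines, map { "$file: $_" } $finder->findNames($text);
    }
    return @lines;
}

if (@ARGV) {
    require Name::Find;    # loaded only when files are given
    my $dbbase = '/usr/local/words/';
    my $finder = Name::Find->new(
        Given1db   => "${dbbase}given-names.db",
        Given2db   => "${dbbase}given-names.db",
        Surnamedb  => "${dbbase}family-names.db",
        NonNamesdb => "${dbbase}non-names.db",
    );
    print "$_\n" for names_in_files($finder, @ARGV);
}
```

Keeping the file handling in a function that takes the finder as an argument means the same loop can be reused with a differently configured Name::Find object.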

How Do I Use Name::Find?

I would strongly suggest looking at the 'findnames' program to get a good idea of how to use the module, but here is a quick example (that hopefully works):

#!/usr/bin/perl

use strict;
use warnings;
use Name::Find;

# the string we are going to look for names in
my $textstr = "My favorite character on Gilligan's Island was Thurston Howell.";

# the location of your name databases (note the trailing slash)
my $dbbase = '/usr/local/words/';

# set up the name finder
# (buildnamedbs also creates given2-names.db for middle names)
my $namefinder = Name::Find->new(
        Given1db        => "${dbbase}given-names.db",
        Given2db        => "${dbbase}given-names.db",
        Surnamedb       => "${dbbase}family-names.db",
        NonNamesdb      => "${dbbase}non-names.db",
        );

# get the list of names
my @namelist = $namefinder->findNames($textstr);

# print them out, separated by commas
print join(', ', @namelist), "\n";

The above example will print out:
Thurston Howell

Bugs / Errata / Future work

There is of course a bit of room for improvement.
