Wednesday, August 5, 2009

Loading up the FCC License Database

I am putting up a site that requires access to a database containing the latest and greatest list of amateur radio licensees. This data is freely available directly from the FCC's website. They provide the data in a very clean format, consisting of a schema (a .txt file) and several data files (.dat files). They also distribute PDFs that describe the data format in great detail. I hear people on TV and radio constantly taking stabs at the FCC for this or that. Well, I'd like to take a quick second and extend my thanks to the FCC for providing this data in a simple format that is easy to obtain and very well documented. Hats off to you, FCC!

Now, the question is: What is the fastest way to get this data into a relational format ready to be used by a website? The data is only 355 MB raw, not much to deal with. There are definitely several solutions to this problem in whatever language you choose. This blog posting will describe my solution, using SQLite and Python.

Importing the Schema

The schema txt file comes in a format that cannot be used directly. Depending on your DBMS, you'll need to hack at the formatting a bit. There are 'go' delimiters between the create table statements and none of the creates are semicolon-delimited. I wanted to perform the database creation in a single executescript call using sqlite3, so I had to fix up the text. I wrote a few lines of Python to fix up the data and import it into a new SQLite database:
import sqlite3

# read in the raw FCC schema file
data = open(sqlFilename, 'r').read()
# hacks: remove the 'go' tokens and terminate each create
# statement with a semicolon so this can be executed as a script
data = data.replace('go\n', '')
data = data.replace(')\n', ');\n')

con = sqlite3.connect(dbFilename)
# the schema qualifies its table names with 'dbo.', so attach
# the new database under that alias as well
con.execute("attach '%s' AS dbo;" % (dbFilename,))
con.executescript(data)
con.commit()
con.close()
dbFilename can be something like "test.db" and sqlFilename should be something like "pa_ddef44.txt" (the FCC schema file). After executing the snippet, you'll end up with a complete sqlite database that's as empty as can be. Now we need to get all our data imported. This is where I hit a stumbling block.
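
If you want a quick sanity check at this point, listing the tables out of sqlite_master will confirm the schema took. A minimal sketch, assuming the same dbFilename as above:

import sqlite3

con = sqlite3.connect(dbFilename)
# sqlite_master holds one row per object; the FCC tables all start with PUBACC_
for (name,) in con.execute("select name from sqlite_master where type='table';"):
    print name
con.close()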

Importing the Data Files

I wanted to perform a pure-Python implementation of the import, directly from the data files (without resorting to the sqlite3 executable). All the data files come in a "csv"-like format, where | is the delimiter. The problem with this is that DB-API 2.0 does not provide an "import from CSV" method (and that's a good thing). Building the insert statements by hand would have meant deducing which fields in each table are varchar and quoting those fields, unless you lean on parameter binding, as in the sketch below.
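
For the record, here is roughly what that route could look like. Treat it as a sketch under one big assumption: every row in a .dat file has exactly as many fields as its target table has columns, which the raw data does not always guarantee.

import csv, glob, os, sqlite3

con = sqlite3.connect(dbFilename)
for datFile in glob.glob('*.dat'):
    tableName = 'PUBACC_' + os.path.splitext(os.path.basename(datFile))[0].upper()
    # ask sqlite for the column count, so we can build a matching
    # parameterized insert statement
    cur = con.execute('select * from %s limit 1;' % tableName)
    placeholders = ','.join('?' * len(cur.description))
    sql = 'insert into %s values (%s);' % (tableName, placeholders)
    # the .dat files are pipe-delimited, which csv.reader handles fine
    reader = csv.reader(open(datFile, 'rb'), delimiter='|')
    con.executemany(sql, reader)
con.commit()
con.close()

I definitely did not want to deal with all that jazz, so I went ahead and just used the command-line utility. Thankfully, sqlite3 has the .import command. I ended up using Python anyway to script it, since there was some necessary mangling of the table names: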
import glob, os

files = glob.glob('*.dat')
for datFile in files:
    # the table name suffix comes from the file name, e.g. AM.dat -> PUBACC_AM
    tableNameSuffix = os.path.splitext(os.path.basename(datFile))[0].upper()
    tableName = 'PUBACC_' + tableNameSuffix

    cmd = 'sqlite3 -separator "|" %s ".import %s %s"' % (dbFilename, datFile, tableName)
    print 'ex: ', cmd
    print 'size: ', os.path.getsize(dbFilename)
    os.system(cmd)
The tableNameSuffix statement probably looks like a huge hack, but the last portion of each table name comes from the file names. Rather nifty! So, what does this all look like at the end of the day? I went ahead and ran it; here are the results:
created database...
size: 193536
ex: sqlite3 -separator "|" test.db ".import AM.dat PUBACC_AM"
size: 193536
ex: sqlite3 -separator "|" test.db ".import CO.dat PUBACC_CO"
size: 43587584
ex: sqlite3 -separator "|" test.db ".import EN.dat PUBACC_EN"
size: 48450560
ex: sqlite3 -separator "|" test.db ".import HD.dat PUBACC_HD"
size: 190622720
ex: sqlite3 -separator "|" test.db ".import HS.dat PUBACC_HS"
size: 339019776
ex: sqlite3 -separator "|" test.db ".import LA.dat PUBACC_LA"
size: 414668800
ex: sqlite3 -separator "|" test.db ".import SC.dat PUBACC_SC"
size: 414686208
ex: sqlite3 -separator "|" test.db ".import SF.dat PUBACC_SF"
size: 416502784
Voila! Done. So, now what can we do with all this? First off, where am I in the data?
sqlite> select * from PUBACC_AM where callsign = 'KJ4JIO';
AM|3077751|||KJ4JIO|T|D|4||||||||||

That's me!! And the T stands for Technician. If we dig into another table, we can find out some more:

sqlite> select first_name, last_name, city from PUBACC_EN where call_sign='KJ4JIO';
Joseph|Armbruster|Orlando


Which makes me wonder: how many other Armbrusters out there have their amateur radio licenses? Let's find out!

sqlite> select first_name, last_name, city from PUBACC_EN where last_name = 'Armbruster';

John|Armbruster|Erma
Catherine|Armbruster|Palm City
Michael|Armbruster|Palm City
John|Armbruster|Denver
Joseph|Armbruster|Orlando
Kirsten|Armbruster|Denver


Quite a list indeed. Let's get some more useful information: how about when each of these individuals was granted their license?

sqlite> select first_name, last_name, city, grant_date from PUBACC_EN inner join PUBACC_HD on PUBACC_EN.unique_system_identifier = PUBACC_HD.unique_system_identifier where last_name = 'Armbruster';

John|Armbruster|Erma|08/21/2003
Catherine|Armbruster|Palm City|03/27/2001
Michael|Armbruster|Palm City|03/27/2001
John|Armbruster|Denver|02/19/2003
Joseph|Armbruster|Orlando|02/11/2009
Kirsten|Armbruster|Denver|03/12/2009
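
Since the whole point is to feed a website, here's that same join run from Python with the sqlite3 module. A minimal sketch, assuming the test.db produced above; parameter binding takes care of quoting the last name for us:

import sqlite3

con = sqlite3.connect('test.db')
sql = """select en.first_name, en.last_name, en.city, hd.grant_date
         from PUBACC_EN en
         inner join PUBACC_HD hd
             on en.unique_system_identifier = hd.unique_system_identifier
         where en.last_name = ?;"""
for first, last, city, granted in con.execute(sql, ('Armbruster',)):
    print first, last, city, granted
con.close()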


Problem solved. I can now dig through all the amateur radio licensees of the US and generate some potentially useful results. It would be a fun task to integrate this with Google Earth or Maps to get a view of all amateur radio operators in the US. In fact, that would make an excellent topic for a future posting.
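
One last practical note before bolting this onto the site: .import does not create any indexes, so every query above scans its whole table. Adding indexes on the columns we search by speeds things up considerably. A quick sketch, assuming the column names from the queries above:

import sqlite3

con = sqlite3.connect('test.db')
# index the columns the site will search on
con.execute('create index if not exists idx_am_callsign on PUBACC_AM (callsign);')
con.execute('create index if not exists idx_en_call_sign on PUBACC_EN (call_sign);')
con.execute('create index if not exists idx_en_last_name on PUBACC_EN (last_name);')
con.commit()
con.close()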

Cheers!

2 comments:

Open Source Redneck said...

Just wanted to let you know I've used parts of your import code in my automation process to allow a public free xml api for programmers. Great Job!

Joseph Armbruster said...

Good to hear! I'm glad someone made some constructive use of it. Shoot me an email sometime with the details.

Joe