Now, the question is: what is the fastest way to get this data into a relational format ready to be used by a website? The data is only 355MB raw, so there's not much to deal with. There are certainly several solutions to this problem in whatever language you choose. This blog posting describes my solution, using SQLite and Python.
Importing the Schema
The schema txt file comes in a format that cannot be used directly. Depending on your DBMS, you'll need to hack at the formatting a bit: there are 'go' delimiters between the create table statements, and none of the creates are semicolon-delimited. I wanted to perform the database creation in a single 'executescript' call using sqlite, so I had to fix up the text. I wrote a few lines of Python that fix up the data and import it into a new SQLite database:
import sqlite3

data = open(sqlFilename, 'r').read()
# hacks: remove go tokens and close off create statements
# so this can be executed as a script
data = data.replace('go\n', '')
data = data.replace(')\n', ');\n')
con = sqlite3.connect(dbFilename)
con.execute("attach '%s' AS dbo;" % (dbFilename))
con.executescript(data)

Here, sqlFilename should be the schema file, something like "pa_ddef44.txt", and dbFilename can be something like "test.db". After executing the snippet, you'll end up with a complete sqlite database that's empty as can be. Now we need to get all our data imported. This is where I hit a stumbling block.
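Before moving on, it's worth sanity-checking that the schema actually landed. Here's a quick sketch (the helper name is mine, and the filename argument stands in for whatever dbFilename you used):

```python
import sqlite3

def list_tables(dbFilename):
    # sqlite_master has one row per schema object; filtering on
    # type='table' returns every table the executescript call created
    con = sqlite3.connect(dbFilename)
    names = [row[0] for row in con.execute(
        "select name from sqlite_master where type='table'")]
    con.close()
    return sorted(names)
```

If the schema import worked, this should list all the PUBACC_* tables.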
Importing the Data Files
I wanted a pure-Python implementation of the import, directly from the data files (without resorting to the sqlite3 executable). All the data files come in a "csv"-like format, where | is the delimiter. The problem is that DB-API 2.0 does not define an "import from CSV" method (and that's a good thing). If I had done this manually by building insert statements, I would have had to deduce which fields in each table are varchar, then quote those fields. I definitely did not want to deal with all that jazz, so I went ahead and used the command-line utility instead; thankfully, sqlite has the .import command. I ended up using Python to script it anyway, since there was some necessary mangling of the table names:
import glob
import os

files = glob.glob('*.dat')
for file in files:
    tableNameSuffix = os.path.splitext(os.path.basename(file))[0].upper()
    tableName = 'PUBACC_' + tableNameSuffix
    cmd = 'sqlite3 -separator "|" %s ".import %s %s"' % (dbFilename, file, tableName)
    print 'ex: ', cmd
    os.system(cmd)
print 'size: ', os.path.getsize(dbFilename)

The tableNameSuffix line probably looks like a huge hack, but the last portion of each table name comes from the file names. Rather nifty! So, what does this all look like at the end of the day? I went ahead and ran it; here are the results:
created database...
ex: sqlite3 -separator "|" test.db ".import AM.dat PUBACC_AM"
ex: sqlite3 -separator "|" test.db ".import CO.dat PUBACC_CO"
ex: sqlite3 -separator "|" test.db ".import EN.dat PUBACC_EN"
ex: sqlite3 -separator "|" test.db ".import HD.dat PUBACC_HD"
ex: sqlite3 -separator "|" test.db ".import HS.dat PUBACC_HS"
ex: sqlite3 -separator "|" test.db ".import LA.dat PUBACC_LA"
ex: sqlite3 -separator "|" test.db ".import SC.dat PUBACC_SC"
ex: sqlite3 -separator "|" test.db ".import SF.dat PUBACC_SF"

Voila! Done. So, now what can we do with all this? First off, where am I in all this?
sqlite> select * from PUBACC_AM where callsign = 'KJ4JIO';

That's me!! And the T stands for Technician. If we dig into another table, we can find out some more:
sqlite> select first_name, last_name, city from PUBACC_EN where call_sign='KJ4JIO';
Which makes me wonder, how many other Armbrusters out there have their amateur radio licenses? Let's find out!
sqlite> select first_name, last_name, city from PUBACC_EN where last_name = 'Armbruster';
Quite a list indeed. Let's get some more useful information: how about when each of these individuals was granted a license?
sqlite> select first_name, last_name, city, grant_date from PUBACC_EN inner join PUBACC_HD on PUBACC_EN.unique_system_identifier = PUBACC_HD.unique_system_identifier where last_name = 'Armbruster';
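If you'd rather script this than live at the sqlite> prompt, the same join runs from Python. Note the ? placeholder: sqlite3 does the quoting for you, which also sidesteps the varchar-quoting problem mentioned earlier. A sketch, assuming the "test.db" filename from above:

```python
import sqlite3

def licensees_by_last_name(dbFilename, lastName):
    # The ? placeholder lets sqlite3 quote the value itself, so varchar
    # vs numeric columns never need to be special-cased by hand
    con = sqlite3.connect(dbFilename)
    rows = con.execute(
        "select first_name, last_name, city, grant_date "
        "from PUBACC_EN inner join PUBACC_HD "
        "on PUBACC_EN.unique_system_identifier = "
        "PUBACC_HD.unique_system_identifier "
        "where last_name = ?", (lastName,)).fetchall()
    con.close()
    return rows
```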
Problem solved. I can now dig through all the amateur radio licensees of the US and generate some potentially useful results. It would be a fun task to integrate this with Google Earth or Maps to get a view of all amateur radio operators in the US. In fact, that would make an excellent topic for a future posting.
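To give a taste of that idea: the PUBACC_EN table stores street addresses, not coordinates, so a geocoding step is assumed here, but once you have latitude/longitude pairs, emitting a Google Earth overlay is just string-building. A minimal sketch:

```python
def to_kml(operators):
    # operators: iterable of (callsign, latitude, longitude) tuples,
    # presumably produced by geocoding the PUBACC_EN address columns
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>']
    for callsign, lat, lon in operators:
        # KML coordinates put longitude first
        lines.append('<Placemark><name>%s</name><Point>'
                     '<coordinates>%f,%f</coordinates>'
                     '</Point></Placemark>' % (callsign, lon, lat))
    lines.append('</Document></kml>')
    return '\n'.join(lines)
```

Feed the resulting file to Google Earth and each licensee shows up as a placemark.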