MaxMind GeoIP DB and sqlite
I love using sqlite when I need a simple database. It find it especially handy for analyzing log files and similar data. Just import your logs into a simple DB and then run queries using SQL on them.
Sometimes you want to see where certain actions in your log file originated from, but all you have is IP addresses. Luckily they can roughly be mapped to countries – all you need is a geoip database.
The most popular is the one published by Maxmind. Their “lite” version can be downloaded for free as CSV files. Unfortunately there is no explanation how to import it into a sqlite database…
Well, this is about to change
Preparing the input
After downloading and unpacking the ZIP file you'll have two files: GeoLiteCity-Blocks.csv
and GeoLiteCity-Location.csv
. Let's have a look at their format:
$> head -5 GeoLiteCity-Location.csv Copyright (c) 2007 MaxMind LLC. All Rights Reserved. locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode 1,"O1","","","",0.0000,0.0000,, 2,"AP","","","",35.0000,105.0000,, 3,"EU","","","",47.0000,8.0000,,
As you can see the first two lines are not real data – they have to go. Since working with a 120MB file in an editor can be a pain1), we do this with sed:
sed -i -n '3,$p' GeoLiteCity-Location.csv sed -i -n '3,$p' GeoLiteCity-Blocks.csv
However this is not enough. The Maxmind CSV uses quotes to encapsulate string contents but sqlite doesn't understand this and will add all strings with quotes. Luckily the strings are simple so the quotes can just be stripped:
sed -i -e 's/"//g' GeoLiteCity-Location.csv sed -i -e 's/"//g' GeoLiteCity-Blocks.csv
Finally, you probably want to use UTF-8 encoded data, so the Location CSV needs some recoding:
recode latin1..utf-8 GeoLiteCity-Location.csv
Importing the CSV
It's time to create the tables and insert the data. Below is a script that will do that:
- geoip.sql
-- Create Tables CREATE TABLE IF NOT EXISTS geolocation ( locId INTEGER PRIMARY KEY, country TEXT, region TEXT, city TEXT, postalCode TEXT, latitude NUMERIC, longitude NUMERIC, metroCode TEXT, areaCode TEXT ); CREATE TABLE IF NOT EXISTS geoblocks ( startIpNum NUMERIC UNIQUE, endIpNum NUMERIC UNIQUE, locId INTEGER REFERENCES geolocation(locID) ); -- Import CSV files .mode csv .import GeoLiteCity-Location.csv geolocation .import GeoLiteCity-Blocks.csv geoblocks -- add additional index -- (Tip by Kamil Kubica. https://www.maxmind.com/app/csv) ALTER TABLE geoblocks ADD COLUMN idx INTEGER; UPDATE geoblocks SET idx = (endIpNum - (endIpNum % 65536)); CREATE INDEX geoidx ON geoblocks(idx);
Just save it where you put your Maxmind CSVs and feed it to sqlite3. This will take a while:
cat geoip.sql |sqlite3 geoip.db
Querying the Database
Querying is pretty straight forward, but remember that IP Addresses are used in their integer representation2).
Eg. my current IP 92.195.213.74
translates to 1556337994
.
A simple lookup then looks like this:
SELECT loc.* FROM geolocation loc, geoblocks blk WHERE blk.idx = (1556337994-(1556337994 % 65536)) AND blk.startIpNum < 1556337994 AND blk.endIpNum > 1556337994 AND loc.locId = blk.locId;
In case you wonder: the modulo operation there makes the lookup *much* faster. It's an optimization I found at the Maxmind website and adapted from MySQL to sqlite.