splitbrain.org

electronic brain surgery since 2001

Writing /var/www/html/data/log/deprecated/2024-11-07.log failed

MaxMind GeoIP DB and sqlite

I love using sqlite when I need a simple database. It find it especially handy for analyzing log files and similar data. Just import your logs into a simple DB and then run queries using SQL on them.

Sometimes you want to see where certain actions in your log file originated from, but all you have is IP addresses. Luckily they can roughly be mapped to countries – all you need is a geoip database.

The most popular is the one published by Maxmind. Their “lite” version can be downloaded for free as CSV files. Unfortunately there is no explanation how to import it into a sqlite database…

Well, this is about to change :-)


Preparing the input

After downloading and unpacking the ZIP file you'll have two files: GeoLiteCity-Blocks.csv and GeoLiteCity-Location.csv. Let's have a look at their format:

$> head -5 GeoLiteCity-Location.csv 
Copyright (c) 2007 MaxMind LLC.  All Rights Reserved.
locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
1,"O1","","","",0.0000,0.0000,,
2,"AP","","","",35.0000,105.0000,,
3,"EU","","","",47.0000,8.0000,,

As you can see the first two lines are not real data – they have to go. Since working with a 120MB file in an editor can be a pain1), we do this with sed:

sed -i -n '3,$p' GeoLiteCity-Location.csv
sed -i -n '3,$p' GeoLiteCity-Blocks.csv

However this is not enough. The Maxmind CSV uses quotes to encapsulate string contents but sqlite doesn't understand this and will add all strings with quotes. Luckily the strings are simple so the quotes can just be stripped:

sed -i -e 's/"//g' GeoLiteCity-Location.csv
sed -i -e 's/"//g' GeoLiteCity-Blocks.csv

Finally, you probably want to use UTF-8 encoded data, so the Location CSV needs some recoding:

recode latin1..utf-8 GeoLiteCity-Location.csv

Importing the CSV

It's time to create the tables and insert the data. Below is a script that will do that:

geoip.sql
-- Create Tables
CREATE TABLE IF NOT EXISTS geolocation (
    locId INTEGER PRIMARY KEY,
    country TEXT,
    region TEXT,
    city TEXT,
    postalCode TEXT,
    latitude NUMERIC,
    longitude NUMERIC,
    metroCode TEXT,
    areaCode TEXT
);
 
CREATE TABLE IF NOT EXISTS geoblocks (
    startIpNum NUMERIC UNIQUE,
    endIpNum NUMERIC UNIQUE,
    locId INTEGER REFERENCES geolocation(locID)
);
 
-- Import CSV files
.mode csv
.import GeoLiteCity-Location.csv geolocation
.import GeoLiteCity-Blocks.csv geoblocks
 
-- add additional index
-- (Tip by Kamil Kubica. https://www.maxmind.com/app/csv)
ALTER TABLE geoblocks ADD COLUMN idx INTEGER;
UPDATE geoblocks SET idx = (endIpNum - (endIpNum % 65536));
CREATE INDEX geoidx ON geoblocks(idx);

Just save it where you put your Maxmind CSVs and feed it to sqlite3. This will take a while:

cat geoip.sql |sqlite3 geoip.db

Querying the Database

Querying is pretty straight forward, but remember that IP Addresses are used in their integer representation2).

Eg. my current IP 92.195.213.74 translates to 1556337994.

A simple lookup then looks like this:

 SELECT loc.*
   FROM geolocation loc,
        geoblocks blk
  WHERE blk.idx = (1556337994-(1556337994 % 65536))
    AND blk.startIpNum < 1556337994
    AND blk.endIpNum > 1556337994
    AND loc.locId = blk.locId;

In case you wonder: the modulo operation there makes the lookup *much* faster. It's an optimization I found at the Maxmind website and adapted from MySQL to sqlite.

Tags:
sqlite, database, geoip, maxmind
Similar posts:
1)
of course VIM can do that easily
2)
INET_ATON or php's ip2long can help you convert