1. Overview

Many web pages and RESTful services provide data about an IP address, including geolocation. However, we may prefer a locally hosted and manually inspectable database to avoid relying on external services.

In this tutorial, we’ll see how to look up the geolocation of an IP address in the Linux terminal using a local copy of MaxMind’s free GeoLite2 databases and a Bash script. We’ll focus on IPv4 only to avoid cumbersome scripting. However, our Bash code can be easily adapted to IPv6 addresses.

As of January 2023, GeoLite2 contains 3,343,938 IPv4 CIDRs, covering 3,689,264,895 IPv4 addresses worldwide (a figure that isn’t officially documented). This is consistent with the size of the address space: since an IPv4 address is a 32-bit number, there are 2^32, or about 4.3 billion, possible addresses. From that theoretical number, we must subtract roughly 600 million reserved IPs that aren’t available for public routing. This leaves approximately 4.3-0.6=3.7 billion IPv4 addresses, which matches the GeoLite2 coverage.
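As a quick sanity check of this arithmetic, we can let Bash do the subtraction. This is only a sketch: the variable names are our own, and the coverage figure is the one quoted above:

```shell
#!/bin/bash
total=$(( 2 ** 32 ))   # size of the 32-bit IPv4 address space
covered=3689264895     # addresses covered by GeoLite2 (figure quoted above)
uncovered=$(( total - covered ))

echo "Total IPv4 addresses: $total"
echo "Covered by GeoLite2:  $covered"
echo "Not covered:          $uncovered"
```

The difference, 605,702,401, is in line with the roughly 600 million reserved addresses.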

2. Which Databases Do We Need?

After registering on the MaxMind site, we can log in and download six GeoLite2 databases. The one we are interested in is “GeoLite2 City” in the CSV format:

MaxMind GeoLite2 download

We chose the CSV format because it’s compatible with any scripting or programming language capable of parsing a text file, such as Bash. In addition, we can inspect MaxMind’s CSV databases with any plain-text editor that can handle huge files, such as nano.

The other available format, mmdb, is MaxMind’s own binary format and requires dedicated tools like mmdblookup. This makes us dependent on the availability and updating of such software for the various Linux distributions. For this reason, we prefer not to rely on the binary format.

From the zip file, we need to extract the two files highlighted in the screenshot below and place them in the same folder where we’ll put our Bash script:

GeoLite2 DBs - Files to be extracted

MaxMind updates its GeoLite2 databases once a week. According to section 6(c) of the End User License Agreement, we must keep a current copy of the databases and destroy old ones. This legal obligation also makes practical sense because IP address assignments change over time.

2.1. GeoLite2-City-Blocks-IPv4.csv

The official documentation’s Blocks Files section details the information in GeoLite2-City-Blocks-IPv4.csv. Let’s take a quick look at its first few lines with nano GeoLite2-City-Blocks-IPv4.csv:

GeoLite2-City-Blocks-IPv4.csv

The network column, which is the primary key of this database, contains unique CIDR values in ascending order. The fact that the column is already sorted is essential for our search algorithm, so let’s keep that in mind.

The CIDR notation is a compact way to denote a set of IP addresses. For example, let’s ask prips which IPs the first line refers to:

$ prips
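As an illustration of what a CIDR block expands to, the following sketch computes the first address, the last address, and the size of a block in pure Bash. The block 192.0.2.0/30 comes from the range reserved for documentation and is only a hypothetical example, not a row of the database. The function names are likewise our own:

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad notation
int_to_ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print the first address, the last address, and the size of a CIDR block
cidr_range() {
    local base=${1%/*} prefix=${1#*/}
    local size=$(( 1 << (32 - prefix) ))
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local first=$(( $(ip_to_int "$base") & mask ))
    local last=$(( first + size - 1 ))
    echo "$(int_to_ip "$first") $(int_to_ip "$last") $size"
}

cidr_range 192.0.2.0/30 # prints: 192.0.2.0 192.0.2.3 4
```

So, a /30 block denotes four consecutive addresses, and each shorter prefix doubles the size of the set.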

Next to the network column, we have the geoname_id column, which is also the primary key of the GeoLite2-City-Locations-en.csv database, as we’ll see shortly. Thus, the two databases have an N-to-1 relationship.
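To visualize the N-to-1 relationship, here is a minimal sketch with two tiny, simplified tables. All the rows, IDs, and city names are hypothetical, and the real files have many more columns:

```shell
#!/bin/bash
# Hypothetical "locations" table: geoname_id,city_name
declare -A city_by_id
while IFS="," read -r geoname_id city_name; do
    city_by_id[$geoname_id]=$city_name
done <<'EOF'
1001,Exampleville
1002,Sampleton
EOF

# Hypothetical "blocks" table: network,geoname_id
blocks='192.0.2.0/25,1001
192.0.2.128/25,1001
198.51.100.0/24,1002'

# Several networks can point to the same location record
while IFS="," read -r network geoname_id; do
    echo "$network -> ${city_by_id[$geoname_id]}"
done <<< "$blocks"
```

Here, the first two networks share the same geoname_id, so they resolve to the same location.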

2.2. GeoLite2-City-Locations-en.csv

The official documentation’s Locations Files section details the information in GeoLite2-City-Locations-en.csv. Let’s take a quick look at its first few lines with nano GeoLite2-City-Locations-en.csv:

GeoLite2-City-Locations-en.csv

Overall, both databases contain more information than we need for geolocation, so we’ll use a subset of it.

3. The Algorithm

We’ll use Bash as a query tool for a relational database consisting of two tables corresponding to the two CSV files. This may sound difficult, but it’s straightforward once the steps are clear.

The most challenging part of our algorithm is the binary search, which gives a dramatic performance boost and allows Bash to find the IP in one or two seconds. It’s possible only because the primary key of GeoLite2-City-Blocks-IPv4.csv, i.e., the network column, is already sorted in ascending order by CIDR. By comparison, a linear search over such a vast database would take many hours.
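To get a feel for the gain, we can estimate how many halvings are needed to narrow the number of records quoted earlier down to a single candidate. This is a rough sketch, and max_steps is a name we introduce only for illustration:

```shell
#!/bin/bash
# Number of halvings needed to narrow n candidates down to one
max_steps() {
    local n=$1 steps=0
    while (( n > 1 )); do
        n=$(( (n + 1) / 2 )) # each test discards about half of the candidates
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

max_steps 3343938 # prints: 22
```

In other words, about 22 tests are enough, compared with up to 3,343,938 for a linear scan.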

Let’s carefully observe the following flowchart, which is a reasonably faithful representation of what our Bash script will have to do:

Geolocate IP algorithm for GeoLite2

To check whether a given IP belongs to a certain CIDR, we’ll use grepcidr. Although this is an implementation detail, we’ve made it explicit in the flowchart because its installation is a prerequisite.
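To make the membership test concrete, here is a minimal pure-Bash sketch of the kind of check that we delegate to grepcidr. It isn’t how grepcidr is implemented. The function names and the sample addresses are assumptions introduced for illustration:

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if the IP ($1) falls inside the CIDR block ($2)
ip_in_cidr() {
    local ip=$1 base=${2%/*} prefix=${2#*/}
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # Two addresses are in the same block if their network bits are identical
    (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$base") & mask ) ))
}

for ip in 192.0.2.77 198.51.100.1; do
    if ip_in_cidr "$ip" 192.0.2.0/24; then
        echo "$ip is in 192.0.2.0/24"
    else
        echo "$ip is not in 192.0.2.0/24"
    fi
done
```

The first address passes the test, while the second one doesn’t.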

4. The Bash Code

Before proceeding further, let’s understand how we’ll code some nodes of the previous flowchart:

  • To insert the contents of GeoLite2-City-Blocks-IPv4.csv into an array, we’ll use the built-in readarray command.
  • The variable $record is not an array but a string whose content is a row of GeoLite2-City-Blocks-IPv4.csv.
  • We’ll use the built-in read command to extract the comma-separated values of $record, such as $network, $geoname_id, $latitude, $longitude, and others.
  • The $IP<$startIP comparison is performed by a for loop that compares the four dot-separated octets of the two IP addresses one by one, from left to right, exiting the loop at the first difference found.
  • Finally, to extract the GeoLite2-City-Locations-en.csv record containing a specific geoname_id, the grep command is sufficient without the need for manual coding of the search algorithm.
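Before looking at the full script, the following sketch exercises the first building blocks listed above on two hypothetical rows that mimic the layout of the Blocks file. The sample values and the ip_less_than helper are our own assumptions, and the helper uses arrays rather than the cut-based loop of the script:

```shell
#!/bin/bash
# Hypothetical sample rows in the Blocks file layout (header plus two records)
sample_csv='network,geoname_id,registered_country_geoname_id,represented_country_geoname_id,is_anonymous_proxy,is_satellite_provider,postal_code,latitude,longitude,accuracy_radius
192.0.2.0/24,1001,2001,,0,0,00000,10.0000,20.0000,100
198.51.100.0/24,1002,2002,,0,0,11111,30.0000,40.0000,50'

# readarray stores one line per element; index 0 holds the CSV header
readarray -t rows <<< "$sample_csv"
echo "Lines loaded: ${#rows[@]}"

# A record is a plain string; read splits it on commas into named variables
record=${rows[1]}
IFS="," read -r network geoname_id _ _ _ _ postal_code latitude longitude accuracy_radius <<< "$record"
echo "network=$network geoname_id=$geoname_id coordinates=$latitude,$longitude radius=${accuracy_radius}km"

# Succeed if the first IPv4 address sorts before the second, comparing octet by octet
ip_less_than() {
    local -a a b
    local i
    IFS=. read -r -a a <<< "$1"
    IFS=. read -r -a b <<< "$2"
    for i in 0 1 2 3; do
        (( a[i] < b[i] )) && return 0
        (( a[i] > b[i] )) && return 1
    done
    return 1 # the two addresses are equal
}

if ip_less_than 192.0.2.200 198.51.100.0; then
    echo "192.0.2.200 comes before 198.51.100.0"
fi
```

Comparing the octets numerically matters: a plain string comparison would wrongly place 10.0.0.0 before 9.0.0.0.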

Now we have all the preliminary information to move on to the code:

#!/bin/bash

###    IPv4 information based on MaxMind's GeoLite2 database     ###
### https://dev.maxmind.com/geoip/geolite2-free-geolocation-data ###

### Required files (expected in the folder the script is run from)

DB="GeoLite2-City-Blocks-IPv4.csv"
LOCATIONS="GeoLite2-City-Locations-en.csv"

### Debug mode (enable it only to investigate the execution flow)

DEBUG=false # it can be true or false

### Initial checks

if (( $# == 1 )); then
    IP=$1
    rx='([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])' # regex matching one octet (0-255) of an IPv4 address
    if [[ ! $IP =~ ^$rx\.$rx\.$rx\.$rx$ ]]; then
        echo "Not valid IP: $IP" >&2
        exit 1
    fi
else
    echo 'Usage: ./geolocateIP.sh <IPv4 ADDRESS>' >&2
    exit 1
fi

if ! test -f "$DB"; then
    echo "$DB is missing" >&2
    exit 1
fi

if ! test -f "$LOCATIONS"; then
    echo "$LOCATIONS is missing" >&2
    exit 1
fi

if ! command -v grepcidr >/dev/null; then
    echo "Please install grepcidr (https://manpages.org/grepcidr)" >&2
    exit 1
fi

### Loading the entire CSV into an array
### The number of elements is limited only by the available system memory
readarray -t array_csv < "$DB"
if $DEBUG; then echo "CSV loaded in memory..."; fi

### Looking for the IP in the database
### Luckily, the CSV records, and thus the array, are already sorted by "network" in CIDR notation, which is unique for each record
### We use Binary Search (https://en.wikipedia.org/wiki/Binary_search_algorithm) to reduce the complexity from O(n) to O(log_2 n)

min=0                           # index of the first element of the array (the CSV header, which is never tested)
max=$(( ${#array_csv[@]} - 1 )) # index of the last element of the array
counter=0                       # number of tests performed so far
if $DEBUG; then echo "The DB contains ${#array_csv[@]} records"; fi
attempts=$(echo "l(${#array_csv[@]})/l(2)" | bc -l | awk  '{printf "%.0f\n", $1}')
if $DEBUG; then echo "We have to make at most $attempts attempts to find information about $IP"; fi

while [ "$min" -lt "$max" ]; do
    # Compute the mean between min and max, rounded up to the superior unit
    current=$(( (min + max + 1) / 2 )) # current array index to be checked
    counter=$(( counter + 1 ))
    record=${array_csv[$current]}      # a string containing one row of the CSV
    if $DEBUG; then echo ""; fi
    if $DEBUG; then echo "Test $counter -> Current index of the DB: $current"; fi
    # Fields are split on every comma, so quoted values keep their quotes
    IFS="," read -r network geoname_id registered_country_geoname_id represented_country_geoname_id \
                    is_anonymous_proxy is_satellite_provider postal_code latitude longitude accuracy_radius <<< "$record"
    if $DEBUG; then echo "Checking if $IP belongs to the network: $network..."; fi
    if echo "$IP" | grepcidr "$network" >/dev/null; then
        echo "$IP is in the network $network"
        if $DEBUG; then echo "Geoname ID: $geoname_id"; fi
        # Anchor the pattern so that the ID matches only the first column
        georecord=$(grep -m 1 "^$geoname_id," "$LOCATIONS")
        IFS="," read -r geoname_id locale_code continent_code continent_name country_iso_code country_name  \
                        subdivision_1_iso_code subdivision_1_name subdivision_2_iso_code subdivision_2_name \
                        city_name metro_code time_zone is_in_european_union <<< "$georecord"
        echo "Location: $city_name (Postal Code $postal_code), $subdivision_2_name, $subdivision_1_name, $country_name, $continent_name"
        echo "Approximate Coordinates (accuracy radius ${accuracy_radius}km): http://maps.google.com/maps?q=$latitude,$longitude"
        if $DEBUG; then echo "Debug: we can compare the results with https://www.maxmind.com/en/geoip2-precision-demo"; fi
        break # exit the "while" loop
    else
        if $DEBUG; then echo "No, $IP is not in the network: $network"; fi
        startIP=${network%/*} # in this DB, removing the network mask from the CIDR is enough to get the start IP of the IP range
        for v in 1 2 3 4; do
            A=$(echo "$IP" | cut -d '.' -f"$v")
            B=$(echo "$startIP" | cut -d '.' -f"$v")
            if [ "$A" -lt "$B" ]; then
                if $DEBUG; then echo "$IP is less than $startIP"; fi
                max=$(( current - 1 ))
                break # exit only the current "for" loop, continuing the "while" loop
            fi
            if [ "$A" -gt "$B" ]; then
                if $DEBUG; then echo "$IP is greater than $startIP"; fi
                min=$current # safe because "current" is rounded up, so the range always shrinks
                break # exit only the current "for" loop, continuing the "while" loop
            fi
            if [ "$v" -eq 4 ] && [ "$A" -eq "$B" ]; then
                if $DEBUG; then echo "Debug: $IP and $startIP must be different, so the execution should never come here" >&2; fi
                if $DEBUG; then echo "Debug: \$A is $A and \$B is $B" >&2; fi
                exit 1
            fi
        done
    fi
    if ! [ "$min" -lt "$max" ]; then
        echo "$IP is not in the database. No result. Probably it is a reserved IP address (private, multicast, etc.)"
    fi
done

The comments in the code and the various echo statements help us follow this implementation of the flowchart. In particular, if we set the $DEBUG variable to true, we get detailed logging of the execution flow.

5. Usage Examples

The output of our script will likely change as new GeoLite2 updates come out, since IP address assignments change over time. That said, let’s do three tests:

$ ./geolocateIP.sh is in the network
Location:  (Postal Code ), , , Italy, Europe
Approximate Coordinates (accuracy radius 200km): http://maps.google.com/maps?q=43.1479,12.1097

$ ./geolocateIP.sh is in the network
Location: Ashburn (Postal Code 20149), , Virginia, "United States", "North America"
Approximate Coordinates (accuracy radius 1000km): http://maps.google.com/maps?q=39.0469,-77.4903

$ ./geolocateIP.sh is in the network
Location: "Castelfranco Emilia" (Postal Code 41013), "Province of Modena", Emilia-Romagna, Italy, Europe
Approximate Coordinates (accuracy radius 100km): http://maps.google.com/maps?q=44.5919,11.0487

As we can see, the level of accuracy changes from case to case. We can have greater accuracy and more details with the paid databases. However, we can never use geolocation to identify a particular address or household. This isn’t technically possible, and section 5 of the EULA prohibits this usage.

6. Conclusion

In this article, we’ve seen how to look up the geolocation of an IP address in the Linux terminal using a Bash script and a local copy of the GeoLite2 databases. This information is an excellent supplement to the whois data available for that IP.

An interesting aspect is that obtaining an IP’s approximate coordinates and location as plain text makes it easy to integrate the lookup into other scripts. For example, we could create a monitoring system on our Linux server to report suspicious IPs and their geolocation.
