1. Introduction

Software packages are mostly available only online, with just small subsets kept in bundles on media such as a CD, DVD, or USB drive. In fact, most packaging systems only maintain an index database with the current URLs of different packages and versions. So, even if we download a single package, the lack of an Internet connection may hinder installation if the package has dependencies, since each dependency is a separate package that we also need to fetch.

In this tutorial, we explore ways to prepare a full Python package bundle for offline installation on a machine without an Internet connection. First, we briefly refresh our knowledge about packages and dependencies. After that, we specifically talk about Python packages. Next, we choose a specific Python package for our examples and motivate our decision. Then, we go over both manual and automatic ways to resolve dependencies. Finally, we install the package along with its already downloaded dependencies.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. It should work in most POSIX-compliant environments unless otherwise specified.

2. Packages and Dependencies

Packages are bundles of data with a similar purpose. Where we draw the line of similarity depends on our needs, resources, and how strong the relation is between the different parts.

For example, although fundamental, only select tools go into the GNU coreutils package. Again, what constitutes a core utility depends on the creators, developers, user community, and other factors.

Further, each tool within that toolset may have its own dependencies. Dependencies are additional packages on which a given package relies.

For instance, as a whole, the current latest version of the coreutils package has five dependencies:

Installing packages has become the norm for handling software regardless of its purpose. Especially on Linux, we can install everything from interface customization like theme packages, through servers such as database and Web servers, all the way to language libraries and modules.

However, systems keep just a small amount of index data locally, so they can locate each package. Anything we require while installing has to be downloaded when it’s needed, and these requirements aren’t always clear from the outset.

So, knowing ways to resolve package dependencies while online and bundling those with the package that needs them can be a valuable skill. How that happens is subject to the mechanics of the particular package system and package manager.

3. Python Packages

At this point, let’s look at the Python language along with its packages and dependencies. Since dependencies are themselves packages, we should first understand the package format.

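Python packages usually come in two main forms: source distributions, which are archives of the source code, and built distributions such as wheel (.whl) files.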

Although we can install Python packages from the sources and use tools like Cython, it’s usually easier to leverage the binary installer.

4. Example Python Package (Pandas)

For demonstration purposes, we use the Pandas package as our main example.

There are several reasons behind this:

  • one of the top data analysis and manipulation packages for Python
  • data science packages don’t usually require Internet connectivity
  • Pandas is usually applied over huge local datasets

Thus, we might end up needing the library within an air-gapped environment, especially when it comes to more sensitive data.

5. Manual Python Offline Dependency Resolution

In case we don’t want to rely on tools to get the necessary dependencies, we can do so manually.

5.1. Download Package

Although most Python packages are made available through the Python Package Index (PyPI), some are distributed as binary installers on other platforms as well:

  • pip: package installer that looks up packages on PyPI
  • anaconda: data science and machine learning Python distribution with its own conda package, dependency, and environment manager
  • custom repository: depending on the package, we might have other sources for its files

However, these platforms also provide direct links to the different package versions.

Let’s use wget to download the current version from PyPI:

$ wget https://files.pythonhosted.org/packages/5b/5f/076b1ce74f80df0a9db244d30e30c4d4dee45342cbfa5f3e01f64cadf663/pandas-2.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Now, we have the WHL package file locally. Yet, before installing, we should take care of the dependencies.
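The name of a wheel file already tells us which interpreter and platform it targets, which matters because the offline machine needs a matching Python. As a minimal sketch, assuming the standard wheel naming scheme of name, version, optional build tag, Python tag, ABI tag, and platform tag separated by dashes, we can split the file name from above. The parse_wheel_filename() helper and the WheelTags type are hypothetical names used only for illustration:

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts without a build tag, six parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(name, version, python_tag, abi_tag, platform_tag)


tags = parse_wheel_filename(
    "pandas-2.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
# The Python tag cp312 stands for CPython 3.12.
print(tags.python_tag, tags.platform_tag)
```

If the Python tag doesn't match the interpreter on the target machine, pip is likely to reject the file, so it's worth checking before we go offline.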

5.2. Check Dependencies

There are several ways to find out the dependencies of a given package.

Naturally, we can check the dependency list from the official package website, if any, but we can also do it on PyPI:

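Another way to get the list automatically is conda. Since recent conda releases no longer accept a package name as an argument to conda info, we use the search subcommand with the --info flag:

$ conda search pandas --info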

Further, we can use Python to query PyPI directly:

$ python3 -c '
import json
import urllib.request
url = "https://pypi.org/pypi/{}/json"
jr = json.loads(urllib.request.urlopen(url.format("pandas")).read().decode("utf-8"))
print(jr["info"]["requires_dist"])
'

Here, we use the standard urllib module to request the https://pypi.org/pypi/pandas/json endpoint of PyPI. After that, we extract the info->requires_dist path, which contains data about all package dependencies in an array format.
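The array usually mixes mandatory requirements with ones that only apply to optional extras. As a rough sketch, we can filter it down to the mandatory project names. The sample list below is hypothetical data shaped like the requires_dist array, not actual PyPI output, and mandatory_dependencies() is an illustrative helper that relies on a simple text check for the extra marker:

```python
import re

# Hypothetical sample shaped like the "requires_dist" array (not real PyPI output).
SAMPLE_REQUIRES_DIST = [
    'numpy<2,>=1.23.2; python_version == "3.11"',
    "python-dateutil>=2.8.2",
    "pytz>=2020.1",
    'pytest>=7.3.2; extra == "test"',
]

# A project name starts with a letter or digit and ends before any specifier.
NAME_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def mandatory_dependencies(requires_dist):
    """Return project names of requirements that aren't tied to an optional extra."""
    names = []
    for requirement in requires_dist or []:  # the field can be null
        _, _, marker = requirement.partition(";")
        if "extra ==" in marker:
            continue  # only needed when an optional extra is requested
        match = NAME_PATTERN.match(requirement)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


print(mandatory_dependencies(SAMPLE_REQUIRES_DIST))
```

This sketch ignores version specifiers and other environment markers such as python_version, so it only answers which projects are involved, not which versions fit.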

Either way, we have the list:

For each of these packages, we should again download the package file itself along with its dependencies. However, since this process can become very tedious, we can take advantage of the pip package manager to automate it.
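To get a feeling for how much work the manual approach involves, here is a simplified sketch of the repeated lookup. It assumes the same PyPI JSON endpoint as before, skips requirements bound to optional extras, and ignores version constraints and environment markers, so it's far less thorough than a real resolver. The walk_dependencies() and fetch_requires_dist() names are hypothetical:

```python
import json
import re
import urllib.request

NAME_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def fetch_requires_dist(name):
    """Ask the PyPI JSON endpoint for the declared requirements of a project."""
    url = "https://pypi.org/pypi/{}/json".format(name)
    with urllib.request.urlopen(url) as response:
        return json.load(response)["info"]["requires_dist"] or []


def walk_dependencies(root, fetch=fetch_requires_dist):
    """Collect root plus every non-extra dependency reachable from it."""
    seen = []
    queue = [root]
    while queue:
        name = queue.pop(0).lower()
        if name in seen:
            continue  # already visited, which also guards against cycles
        seen.append(name)
        for requirement in fetch(name):
            _, _, marker = requirement.partition(";")
            if "extra ==" in marker:
                continue
            match = NAME_PATTERN.match(requirement)
            if match:
                queue.append(match.group(1))
    return seen
```

While online, calling walk_dependencies("pandas") performs one request per project. The fetch parameter lets us swap in a local lookup for experiments without network access.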

6. Automatic Python Offline Dependency Resolution

Using the example of pandas, let’s see how we can get the package and all necessary dependencies via pip:

$ pip download pandas
Collecting pandas
  Using cached pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
Collecting numpy<2,>=1.23.2
  Using cached numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Collecting python-dateutil>=2.8.2
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1
  Using cached pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1
  Using cached tzdata-2023.4-py2.py3-none-any.whl (346 kB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Saved ./pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Saved ./numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Saved ./python_dateutil-2.8.2-py2.py3-none-any.whl
Saved ./pytz-2023.3.post1-py2.py3-none-any.whl
Saved ./tzdata-2023.4-py2.py3-none-any.whl
Saved ./six-1.16.0-py2.py3-none-any.whl
Successfully downloaded pandas numpy python-dateutil pytz tzdata six

Thus, with a single pip download command, we recursively get the pandas package along with all of its dependencies:

$ ls -1
numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
python_dateutil-2.8.2-py2.py3-none-any.whl
pytz-2023.3.post1-py2.py3-none-any.whl
six-1.16.0-py2.py3-none-any.whl
tzdata-2023.4-py2.py3-none-any.whl

After verifying the data with ls, let’s continue to installation.
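When the offline machine runs a different Python version or platform than the one we download on, pip download can target it explicitly. The sketch below only builds the argument list, using the --dest, --only-binary, --python-version, and --platform options of pip download. The pip_download_command() helper, the ./bundle directory, and the example values are assumptions for illustration:

```python
import sys


def pip_download_command(package, dest, python_version=None, platform=None):
    """Build an argument list for pip download without running it."""
    command = [sys.executable, "-m", "pip", "download", package, "--dest", dest]
    if python_version or platform:
        # pip insists on binary-only downloads (or --no-deps) when we
        # target an interpreter or platform other than the current one.
        command.append("--only-binary=:all:")
    if python_version:
        command += ["--python-version", python_version]
    if platform:
        command += ["--platform", platform]
    return command


command = pip_download_command("pandas", "./bundle", "3.11", "manylinux2014_x86_64")
print(" ".join(command[1:]))
```

To actually perform the download, we can pass the resulting list to subprocess.run(command, check=True) while still online.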

7. Install Python Package and Dependencies Offline

At this point, we can disable our Internet connection or move all downloaded package files to another air-gapped environment.

The basic syntax of WHL package file installation with pip boils down to a single command:

$ pip install <PACKAGE>.whl

Notably, we provide a file with the .whl extension instead of just the PACKAGE name.

However, in our case, we also add several switches to ensure pip doesn’t attempt any online operations:

$ pip install <PACKAGE>.whl --find-links <DEPENDENCY_PATH> --no-index

Let’s go over each option:

  • install is the installation subcommand
  • <PACKAGE>.whl is the relevant PACKAGE file path and name
  • --find-links <DEPENDENCY_PATH> enables searching for dependency packages within the provided DEPENDENCY_PATH
  • --no-index disables searching the online PyPI index
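If we prefer to script the installation, the same options can be assembled from Python. This is a minimal sketch that assumes pip is available for the current interpreter. The offline_install_command() helper and the example paths are hypothetical:

```python
import sys


def offline_install_command(requirement, links_dir):
    """Build a pip install invocation that never contacts an online index."""
    return [
        sys.executable, "-m", "pip", "install", requirement,
        "--no-index",               # don't query PyPI
        "--find-links", links_dir,  # look for packages in this directory instead
    ]


command = offline_install_command("pandas", "./bundle")
print(" ".join(command[1:]))
```

Here, the requirement can be either a wheel file path or a plain project name. In the latter case, pip picks a matching file from the --find-links directory. Running the list through subprocess.run(command, check=True) performs the installation.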

So, we can apply this to our scenario within the directory that contains all downloaded dependencies:

$ pip install pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl --find-links ./ --no-index
Looking in links: ./
Processing /root/pandas/pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing /root/pandas/numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing /root/pandas/python_dateutil-2.8.2-py2.py3-none-any.whl
Processing /root/pandas/pytz-2023.3.post1-py2.py3-none-any.whl
Processing /root/pandas/tzdata-2023.4-py2.py3-none-any.whl
Processing /root/pandas/six-1.16.0-py2.py3-none-any.whl
pandas is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Installing collected packages: pytz, tzdata, six, numpy, python-dateutil
Successfully installed numpy-1.26.3 python-dateutil-2.8.2 pytz-2023.3.post1 six-1.16.0 tzdata-2023.4

Let’s confirm by using pandas to create a basic date range:

$ python -c 'import pandas as pd; dates = pd.date_range("20101010", periods=666); print(dates);'
DatetimeIndex(['2010-10-10', '2010-10-11', '2010-10-12', '2010-10-13',
               '2010-10-14', '2010-10-15', '2010-10-16', '2010-10-17',
               '2010-10-18', '2010-10-19',
               ...
               '2012-07-27', '2012-07-28', '2012-07-29', '2012-07-30',
               '2012-07-31', '2012-08-01', '2012-08-02', '2012-08-03',
               '2012-08-04', '2012-08-05'],
              dtype='datetime64[ns]', length=666, freq='D')

As expected, the package is installed correctly and works.
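Beyond a single import, we might also want to confirm that every file from the bundle ended up installed. As a small sketch, the standard importlib.metadata module can report the installed version of each project. The installed_versions() helper is a hypothetical name, and the project list simply mirrors the files we downloaded earlier:

```python
from importlib import metadata


def installed_versions(projects):
    """Map each project name to its installed version, or None when missing."""
    versions = {}
    for project in projects:
        try:
            versions[project] = metadata.version(project)
        except metadata.PackageNotFoundError:
            versions[project] = None
    return versions


bundle = ["pandas", "numpy", "python-dateutil", "pytz", "tzdata", "six"]
for project, version in installed_versions(bundle).items():
    print(project, version or "NOT INSTALLED")
```

Any project reported as missing points to a file that we forgot to copy to the offline machine or that pip skipped.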

8. Summary

In this article, we talked about offline Python package installation.

In conclusion, we can install Python packages without an Internet connection as long as we prepare them along with their dependencies beforehand.
