PyPI mirror

Your Python applications or services will have dependencies, most likely installed through pip (pulling from PyPI, the Python package registry). For example, in a Dockerfile:

RUN pip3 install onnx==1.14.0

When you aim for reliable and stable builds, this is problematic:

  1. Pinning a top-level version doesn't pin your full dependency tree. Even though you pin to the exact onnx 1.14.0 version above, onnx's own dependencies only specify:

    numpy
    protobuf>=3.20.2
    typing-extensions>=3.6.2.1

    So when you install onnx you'll get any numpy version, and any protobuf version as long as it's >=3.20.2. If a breaking change is introduced in any of these packages, your application breaks. And the exact versions you'll get depend on when you build the container (see the sketch after this list). Building the same container on your developer machine and in production, at the same time, can yield wildly different package lists - depending on your build cache.

  2. There are package managers (like Poetry) that create lock files alongside your dependency list (and you should use them!), but they can still cause issues:

    1. If you add a new dependency later on. Your lock file was accurate when you created it, but when adding a new dependency your package manager needs to recalculate the dependency graph - which can lead to unwanted upgrades of sub-dependencies (this even happens with poetry lock --no-update).

    2. When you add an older version of a dependency. Your package manager will resolve the dependency tree using the Python package registry as it is today (not as it was when that version was released), unaware that newer versions of a sub-dependency (e.g. protobuf above) might break it.

    3. If you're not starting a new project, for example because you found an interesting tutorial or example on the internet (which most likely doesn't have a lock file). Your package manager again uses the Python package registry of today, and happily installs the latest versions of sub-dependencies - which most likely don't work with the old application.

  3. Package versions can and will be deleted from the Python package registry. Even if you freeze your complete list of dependencies and sub-dependencies, a deleted version will break your builds. Even large companies do this, e.g. Google has deleted versions of jaxlib in the past.
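
To make problem 1 concrete, here's a minimal sketch (the frozen versions shown are just one possible outcome; what you actually get depends on the day you run it):

# the same commands, run on two different days
pip3 install onnx==1.14.0
pip3 freeze
# one day this might print numpy==1.26.2, protobuf==4.25.1, typing_extensions==4.8.0;
# a few weeks later it can pull newer numpy/protobuf releases,
# even though onnx itself is still pinned to 1.14.0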

StableBuild solves all of these problems by creating a daily copy of the complete PyPI registry (the most popular Python package registry). You can thus pin your package list to a specific date, and this will return the exact same packages, regardless of whether packages were updated or removed upstream. Example:

pip3 install \
    -i https://your-domain.pypimirror.stablebuild.com/2023-11-18/ \
    onnx==1.14.0

This will always install the exact same package list:

numpy==1.26.2
onnx==1.14.0
protobuf==4.25.1
typing_extensions==4.8.0

When you want to add a new dependency later on, you can use the same pin date - and the dependency tree will be resolved using the registry of that date, keeping the exact same version of your sub-dependencies.
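
For example, adding a new dependency later (pillow here is just an illustrative extra package) while keeping the original pin date might look like this - assuming the new package doesn't conflict, the existing sub-dependencies stay at the versions listed above:

pip3 install \
    -i https://your-domain.pypimirror.stablebuild.com/2023-11-18/ \
    onnx==1.14.0 pillow

The same -i flag works inside a Dockerfile RUN instruction, so the container from the introduction builds identically on your developer machine and in production.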

We have full daily copies of the PyPI registry indexes from Sept. 21, 2023 onwards. If you pin to a date before this, we'll dynamically calculate what the registry looked like on that date. This mostly works, but you might see small differences (e.g. we can't recover deleted packages).

That's it. Your Python package list is now stable and reliable. 🎉

Getting older Python examples working

One huge benefit of StableBuild's PyPI mirror is that it can help you resurrect old Python examples easily, without having to manually piece back together a correct dependency list. For example, say you want to get OlafenwaMoses/ImageAI revision e76f87212a (published Oct. 15, 2020) working to classify an image. The install instructions no longer work:

# clone the repository
git clone https://github.com/OlafenwaMoses/ImageAI
cd ImageAI
git checkout e76f87212a6e53b6271f7a225e20ca3df9b1d18e

# download weights and example image
cp data-images/holo2.jpg ./examples/
wget -O examples/hololens-ex-60--loss-2.76.h5 https://github.com/OlafenwaMoses/ImageAI/releases/download/essential-v4/hololens-ex-60--loss-2.76.h5

# create a new virtual environment and install dependencies
python3.8 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
python3 setup.py install

# try and classify an image (fails!)
cd examples/
python3 custom_detection.py
# Traceback (most recent call last):
#  ...
# tensorflow.python.util.tf_export.SymbolAlreadyExposedError: Symbol Zeros is already exposed as ().

Most likely this example worked when it was published on Oct. 15, 2020 - so use that date as the pin date, and install the dependencies using StableBuild instead.

cd ImageAI

# create a new virtual environment
python3.8 -m venv .venv-sb
source .venv-sb/bin/activate

# install dependencies, but pin on 2020-10-15
pip3 install \
    -i https://your-domain.pypimirror.stablebuild.com/2020-10-15 \
    -r requirements.txt
python3 setup.py install
    
# try and classify an image (succeeds!)
cd examples/
python3 custom_detection.py
# hololens  :  87.66432404518127  :  [23, 45, 90, 79]
# hololens  :  89.25175070762634  :  [191, 64, 243, 93]
# hololens  :  64.49641585350037  :  [437, 76, 514, 127]
# hololens  :  91.78624749183655  :  [380, 109, 423, 134]

(Tested on Ubuntu 20.04, using Python 3.8, on x86 - there were no arm64 wheels for TensorFlow on PyPI yet in October 2020)

Tips & tricks

Using StableBuild with Poetry

You can use StableBuild with Poetry. Add to your pyproject.toml:

[[tool.poetry.source]]
name = "stablebuild"
url = "https://your-domain.pypimirror.stablebuild.com/2023-11-18/"

And all packages will be fetched through StableBuild, rather than PyPI.
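
Depending on your Poetry version you may also want to give the source an explicit priority - a sketch, assuming Poetry 1.5 or later (pick the priority that fits your setup):

[[tool.poetry.source]]
name = "stablebuild"
url = "https://your-domain.pypimirror.stablebuild.com/2023-11-18/"
priority = "primary"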

Do you cache all packages?

No. We cache packages:

  1. When a file is requested for the first time. Subsequent requests will be served from the StableBuild cache.

  2. When we detect that a file, version or package is deleted from PyPI. Deleted files are not immediately removed from PyPI's CDN, so we can still cache them at that point.

We do have full copies of the index for every day (recording which packages and versions were in the registry). Together with caching deleted files, this gives 100% coverage of all packages across all cached dates.

Don't want to pin on a date, but still want to cache packages?

If you don't want to pin on a specific date - e.g. because you use Poetry - but still want to automatically cache packages (in case packages are deleted), you can use live as a date. This will query the current PyPI index for every request, but will serve any files from cache (or add them to the cache if not present yet). For example, in your pyproject.toml file:

[[tool.poetry.source]]
name = "stablebuild"
url = "https://your-domain.pypimirror.stablebuild.com/live/"
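
The live date works with plain pip too; a minimal sketch (onnx is just an example package):

pip3 install \
    -i https://your-domain.pypimirror.stablebuild.com/live/ \
    onnx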

Other package registries

We currently only mirror PyPI. You can most likely use the File Mirror to pin alternative package registries, as long as the package registry uses relative URLs for downloading its packages.

For example, say you want to pin the Nvidia package registry (https://pypi.ngc.nvidia.com). Looking at the source of https://pypi.ngc.nvidia.com/onnx-graphsurgeon, that index indeed uses relative URLs for its packages.
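
If you want to check this yourself for a registry, a quick sketch (it just greps the simple-index HTML for package links; relative links contain no scheme or hostname):

curl -s https://pypi.ngc.nvidia.com/onnx-graphsurgeon | grep href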

Now you just prefix any calls to the Nvidia registry with your file mirror prefix. For example:

pip3 install \
    -i https://your-prefix.pypimirror.stablebuild.com/2024-01-02/ \
    --extra-index-url=https://your-prefix.httpcache.stablebuild.com/nvidia-python-2024-01-02/https://pypi.ngc.nvidia.com \
    onnx-graphsurgeon

After running this, both the package index for onnx-graphsurgeon and the installed version are cached - so here you'll always get onnx-graphsurgeon 0.3.27 back.
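
To double-check what got installed (given the pin above, you should see the 0.3.27 version mentioned here):

pip3 freeze | grep onnx-graphsurgeon
# onnx-graphsurgeon==0.3.27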

This also works for private registries. Just authenticate like you always do, and we'll forward the credentials on the first request. See File Mirror > Authentication / HTTP Headers.

If you have alternative (public) registries that you'd like to have mirrored - and the File Mirror does not work for you - then shoot us an email at support@stablebuild.com and we'll consider adding support.
