PyPI mirror
Your Python applications or services will have dependencies, most likely installed through pip (pulling from PyPI, the Python package registry). For example, in a Dockerfile:
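A minimal sketch of such a Dockerfile (the base image is an assumption; the onnx pin matches the example discussed below):

```dockerfile
# Base image chosen for illustration
FROM python:3.11

# Pin the top-level dependency to an exact version
RUN pip3 install onnx==1.14.0
```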
When you aim for reliable and stable builds this is problematic.
It's impossible to pin your dependency list. Even though you pin to the exact onnx 1.14.0 version above, onnx's dependencies specify:
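Per the onnx 1.14.0 package metadata, the relevant requirements look like this (an unbounded numpy range, and a protobuf range that is only bounded from below):

```
numpy
protobuf>=3.20.2
```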
So when you install onnx you'll get any numpy version, and any protobuf version as long as it's >=3.20.2. If a breaking change is introduced in any of these packages, your application breaks. And the exact versions you'll get depend on when you build the container. Building the same container on your developer machine and in production, at the same time, can yield wildly different package lists - depending on your build cache.
There are package managers (like Poetry) that create lock files alongside your requirements.txt (and you should use them!). They can still cause issues:

- When you add a new dependency later on. Your lock file was accurate when you started the project, but adding a new dependency makes your package manager recalculate the dependency graph - which can lead to unwanted upgrades of sub-dependencies (this happens even with poetry lock --no-update).
- When you add an older version of a dependency. Your package manager resolves the dependency tree using the Python package registry of today (not of when the dependency was created), unaware that newer versions of a sub-dependency (e.g. protobuf above) might break the dependency.
- When you're not starting a new project, for example because you found an interesting tutorial or example on the internet (which most likely doesn't have a lock file). Your package manager again uses the Python package registry of today, and happily installs the latest versions of sub-dependencies - which most likely don't work with the old application.
Package versions can and will be deleted from the Python package registry. Even if you freeze your complete list of dependencies and sub-dependencies, this will break your builds. Even large companies do this; e.g. Google has deleted versions of jaxlib in the past.
StableBuild solves all of these problems by creating a daily copy of the complete PyPI registry (the most popular Python package registry). You can thus pin your package list to a specific date, and it will return the exact same packages, regardless of whether packages were updated or removed upstream. Example:
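A sketch of a date-pinned requirements.txt; the index URL shape and the date are illustrative assumptions - use the pin URL from your StableBuild account:

```
# Resolve everything against the PyPI registry as it looked on this date
--index-url https://pypi.stablebuild.com/2023-10-11/simple
onnx==1.14.0
```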
Built today or a year from now, this will install the exact same package list.
When you want to add a new dependency later on, you can use the same pin date - and the dependency tree will be resolved using the registry of that date, keeping the exact same version of your sub-dependencies.
We have full daily copies of the PyPI registry indexes from Sept. 21, 2023 onwards. If you pin to a date before this, we'll dynamically calculate what the registry looked like on that date. This mostly works, but you might see small differences (e.g. we can't recover deleted packages).
That's it. Your Python package list is now stable and reliable. 🎉
Getting older Python examples working
Most likely this example worked when it was published on Oct. 15, 2020 - so use that date as the pin date, and install the dependencies using StableBuild instead.
(Tested on Ubuntu 20.04, using Python 3.8, on x86 - there were no arm64 wheels for TensorFlow on PyPI in October 2020 yet)
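As a sketch, the install could look like the following requirements.txt - the index URL shape is an assumption, and tensorflow stands in for whatever the tutorial's dependency list actually contains:

```
# Resolve against the registry as of the tutorial's publish date
--index-url https://pypi.stablebuild.com/2020-10-15/simple
tensorflow
```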
Tips & tricks
Using StableBuild with Poetry
You can use StableBuild with Poetry. Add a package source to your pyproject.toml:
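A minimal sketch, assuming a date-pinned StableBuild index URL (the source name and URL shape are illustrative, and Poetry's source priority options vary by Poetry version):

```toml
[[tool.poetry.source]]
name = "stablebuild"
# Hypothetical pin-date URL - use the one from your StableBuild account
url = "https://pypi.stablebuild.com/2023-10-11/simple"
priority = "primary"
```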
And all packages will be fetched through StableBuild rather than PyPI.
Do you cache all packages?
No. We cache packages:
When a file is requested for the first time. Subsequent requests will be served from the StableBuild cache.
When we detect that a file, version or package is deleted from PyPI. These are not immediately deleted from PyPI's CDN, so we can cache them when we detect the deletion.
We do have full copies of the index for every day (which hold which packages and versions were in the registry). Together with caching deleted files, this gives 100% coverage of all packages across all cached dates.
Don't want to pin on a date, but still want to cache packages?
If you don't want to pin on a specific date - e.g. because you use Poetry - but still want to automatically cache packages (in case packages are deleted), you can use live as the date. This will query the current PyPI index for every request, but will serve any files from the cache (or add them to the cache if they're not present yet). For example, in your pyproject.toml file:
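A sketch along the same lines as the date-pinned example above (the source name and URL shape are assumptions):

```toml
[[tool.poetry.source]]
name = "stablebuild"
# 'live' queries the current PyPI index, but serves and caches files through StableBuild
url = "https://pypi.stablebuild.com/live/simple"
priority = "primary"
```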
Other package registries
Now, you just prefix any calls to the Nvidia registry with your file mirror prefix. For example:
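A sketch of such a prefixed install - the file mirror prefix is a placeholder for the one from your StableBuild account, pypi.ngc.nvidia.com is Nvidia's package index, and the onnx-graphsurgeon pin matches the version discussed in this section:

```
pip3 install onnx-graphsurgeon==0.3.27 \
    --extra-index-url <your-file-mirror-prefix>/https://pypi.ngc.nvidia.com
```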
After running this, both the package index for onnx-graphsurgeon and the installed version are cached - so here you'll always get onnx-graphsurgeon 0.3.27 back.