Channel: Softology's Blog

A plea to all Python developers


EVERY PYTHON DEVELOPER OUT THERE! ALL OF YOU! PLEASE ADD EXACT VERSION NUMBERS TO YOUR REQUIREMENTS.TXT FILES!

Did I convince you? Are you adding in version numbers right now? If not, read on.

It has been a while since I was annoyed enough to add a new blog post in the annoyances category, but this is it.

Python. A constant love/hate relationship with me. Love for the most part as it allows me to add many awesome new machine learning systems to Visions of Chaos. The hate (for now) comes from version hell with packages. This is more a developer issue (all of us) and not a problem with Python itself.

When most developers create a new Python program/script/system they usually provide a requirements.txt file listing the Python packages their code needs to run. Packages are like extra libraries of code that give the script more functionality. They allow the Python code to have more commands and they make it easier for devs to code. These packages are installed with the Python pip command.

Here are a few lines from a typical requirements.txt file:

gradio==3.33.1
markdown
numpy
pandas
Pillow>=9.5.0

The first line specifies an exact version number. That means that gradio 3.33.1 will be installed. This is good.

The next 3 lines do not specify any version numbers. This is bad. By default, when a version number is not specified, the latest available version is used. If the same script and requirements are being used soon after the script is released then this is probably not an issue, as the developer most likely used the same current versions. The problem arises as more time elapses between the release date and the user install date. numpy here is a good example. numpy has deprecated (made obsolete and unsupported) many commands and syntax over the versions. If pip installs the latest version every time, the chances are that a new version is going to break existing code. When this happens the poor end user (or me) has to go and try and work out which library broke and how to fix it (if possible).

The last line specifies a >= version number. This is just as bad as no version. This also shows an issue I had only recently (one of the reasons I wrote this blog post). Pillow now has a v10 release that breaks some of the v9.5 code. If the author had specified 9.5.0 as an exact version then there would be no problems. Pillow could advance to v136.56.3 and it would not matter as the script in question would still know to install v9.5.0.
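As a rough sketch of the distinction above, a few lines of Python can flag every requirement that is not pinned with `==`. The function name `find_unpinned` and the regular expression are illustrative assumptions, not part of pip, and the pattern only covers simple `name==version` lines (no URLs or editable installs).

```python
import re

# Matches simple "name==1.2.3" lines, optionally with extras or an
# environment marker. Wildcards such as "numpy==1.*" are NOT treated as pinned.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s*;,]+\s*(;.*)?$"
)

def find_unpinned(lines):
    """Return the requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned

# The example lines from the requirements.txt shown earlier.
requirements = ["gradio==3.33.1", "markdown", "numpy", "pandas", "Pillow>=9.5.0"]
print(find_unpinned(requirements))
# -> ['markdown', 'numpy', 'pandas', 'Pillow>=9.5.0']
```

Only the gradio line passes, which matches the good/bad split described above.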

When I first started adding machine learning systems to Visions of Chaos I quickly encountered version hell. Firstly if you are going to add a lot of different Python scripts you are going to run into version conflicts. Some scripts need v1 of a certain package, some need v2. To get around this you can use Python environments. Environments keep a certain set of packages and versions isolated from others. When you want to run a certain script, you activate its environment first so you know you have the right packages. Within the environments in Visions of Chaos I always specify exact version numbers for the packages. Life was good, back to work on more interesting things.
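For readers who have not used environments before, here is a minimal sketch that creates one from Python. It assumes the standard library `venv` module (the post does not say which environment tool Visions of Chaos uses), and the helper name `create_environment` is made up for illustration.

```python
import sys
import venv
from pathlib import Path

def create_environment(env_dir, with_pip=True):
    """Create an isolated environment and return the path to its Python."""
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in a different folder on Windows.
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"

# Hypothetical usage: one environment per script, each with its own
# exactly versioned requirements file.
#   python_exe = create_environment("envs/my_script")
#   subprocess.run([str(python_exe), "-m", "pip", "install",
#                   "-r", "requirements.txt"], check=True)
```

Packages installed through the returned interpreter stay inside that environment, so two scripts that need different versions of the same package no longer clash.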

No such luck. Now we really get to the annoyance. Unless EVERY developer out there specifies exact version numbers in their required packages lists any updates could cause version hell.

Visions of Chaos supports using the GPU for calculations. Without the GPU support these machine learning systems run orders of magnitude slower on the CPU. Because of different versions of pytorch with GPU support, most devs do not include GPU supported pytorch with their requirements. Makes sense to avoid lots of “it doesn’t install for me” complaints. I know what specific version of pytorch Visions of Chaos uses so what I do is, at the end of any environment setup, I will uninstall any existing CPU pytorch versions and install the GPU version I know works.

This all worked smoothly until a week ago. I had reports that a lot of the modes in Visions of Chaos were not working. Scripts that ran happily for months would fail when new users installed them (great first impression for a new user to find a lot of the features do not work). Time to test. I reset the environments in question and ran the scripts. Sure enough, the same errors.

What happened this time was Python package requirements without version numbers being updated outside my control.

Firstly it was pytorch. My environment setups usually end with this

pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --index-url https://download.pytorch.org/whl/cu118

That gets rid of any auto-installed CPU pytorch and installs a versioned GPU version.

BUT, when pytorch installs it also installs a bunch of its own prerequisite packages. And when it does this it does NOT specify version numbers. So even though every line and package I install has exact versions, a dependency from pytorch does not, and that causes my scripts to fail. The pytorch install pulled in the latest typing_extensions package, which caused script errors.

The same thing happened with the latest Pillow v10 release around the same time. Changes in v10 broke code that was written for v9.5.0.

Both of those issues could be fixed by adding these next lines to the end of the environment setups.

pip uninstall -y typing_extensions
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts typing_extensions==4.7.1
pip uninstall -y Pillow
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts Pillow==9.5.0

But that is only a temp fix. Any day now an unversioned requirement of a package I install could cause this same madness all over again. I have spent 5 days now tediously reinstalling environments and debugging and fixing code, all because someone somewhere did not spend 2 minutes to put version numbers into their requirements.txt.
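One possible guard, which is not what the setup above does, is pip's constraints file (the `-c` option). It limits the version of any package that gets installed, including dependencies of dependencies, without forcing the package to be installed. The sketch below writes such a file from a dictionary of pins. The helper names and the `constraints.txt` file name are hypothetical, and it has not been checked against the custom `--index-url` install shown earlier.

```python
from pathlib import Path

def write_constraints(pins, path):
    """Write one 'name==version' line per package and return the text."""
    text = "".join(
        f"{name}=={version}\n" for name, version in sorted(pins.items())
    )
    Path(path).write_text(text, encoding="utf-8")
    return text

def pip_install_command(requirements_file, constraints_file):
    """Build (but do not run) a pip command that applies the constraints."""
    return ["python", "-m", "pip", "install",
            "-r", str(requirements_file), "-c", str(constraints_file)]

# Hypothetical usage with the two versions pinned in the fix above:
#   write_constraints({"typing_extensions": "4.7.1", "Pillow": "9.5.0"},
#                     "constraints.txt")
#   subprocess.run(pip_install_command("requirements.txt", "constraints.txt"),
#                  check=True)
```

This still only covers packages someone remembered to list, so it reduces the problem rather than removing it.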

If it was up to me I would change pip to enforce that a version must be specified. No version, pip errors out with “You didn’t specify a version you bozo! Don’t you know how much of a hassle this can cause!” If each package specifies a version it installs fine.

All a dev has to do before uploading their working new script is run a quick pip list command to show the packages and versions. Then they just copy those versions into their requirements.txt file. If every dev did this (and Python forced them to) this version hell would be fixed (maybe not a 100% fix, but much better than what we have now). Maybe an enforced law of version numbers is needed?
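The copying step can also be automated. The `pip freeze` command prints every installed package in `name==version` form, and the sketch below does something similar for a chosen list of packages using the standard library `importlib.metadata`. The function name `pinned_lines` and its `lookup` parameter are illustrative assumptions.

```python
from importlib import metadata

def pinned_lines(package_names, lookup=metadata.version):
    """Return 'name==version' lines for the given installed packages.

    By default the installed version is looked up with
    importlib.metadata.version, which raises PackageNotFoundError
    if a package is not installed in the current environment.
    """
    return [f"{name}=={lookup(name)}" for name in package_names]

# Hypothetical usage, run inside the environment where the script works:
#   with open("requirements.txt", "w", encoding="utf-8") as f:
#       f.write("\n".join(pinned_lines(["gradio", "numpy", "Pillow"])) + "\n")
```

Running this in the environment where the script is known to work captures the versions that actually worked, not whatever is newest on install day.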

Maybe this post can also help explain to users who just see Visions of Chaos “not work” why it happens and why it is outside my control.

Jason.

