My own private PyPI

PyPI, formerly the CheeseShop, is awesome. It’s a central repository of Python packages. Knowing you can just do a pip install foo, and it will look on PyPI for a package named foo, is superb. Using pip requirements files, or setuptools install_requires, means you can install all the packages you need really simply.

And the nice thing about pip is that it won’t bother downloading a package you already have installed (subject to version requirements) unless you specifically force it to. This is better than installing with pip install -e <scm>+https://... from a Mercurial or Git repository, where pip has no released version to compare against and fetches the source every time. This is a good reason to have published version numbers.

However, when installing into a new virtualenv, it may still take some time to download all of the packages, and not everything I do can be put onto PyPI: quite a lot of my work is confidential and copyrighted by my employer. So there is quite a lot of value, for me, in having a local cache of packages.

You could use a --build directory shared between all virtualenvs, but the point of virtualenv is that every environment is isolated. So a better option is a local cache server. Publishing private packages requires a server anyway, and being able to use the same workflow for publishing a private package as for an open source one is essential.

Because we deploy using packages, our private package server is located outside of our office network: we need to be able to install packages from it on our production servers. However, this negates the other advantage of a PyPI cache, which is having the packages close by on the local network. It does mean we control all of the infrastructure required to install: no more “We can’t deploy because GitHub is down.”

So, the ideal situation is actually to have two levels of server: our private package server, and then a local cache server on each developer’s machine. You could also have a single cache server on the local network, or perhaps three levels. I’m not sure how much of a performance hit it is not to have the cache on the local machine.

To do this, you need two things: your local cache needs to be able to use an upstream cache (no dicking around with /etc/hosts please), and your private server needs to be able to act as that upstream.

The two tools I have been using handle neither of these. pypicache does not handle upstream caching, but this was easy to patch: my fork handles upstream caching, and uses setuptools, enabling it to install its own dependencies.

localshop, however, will not work as an upstream cache, at least not for pypicache, which uses some APIs other than those used by pip. localshop does have nice security features, though, and moving away from it would require me to extract the package data out of it. pypicache works to a certain extent with itself as an upstream cache, until you try to use its ‘requirements.txt caching’ feature. Which is what I tried to do tonight.

Oh well.

Python deployment using fabric and pip

I’ve spent a not insignificant amount of time working on a deployment function within my fabfile.py (the configuration file used by Fabric). It’s well worth the investment, as being able to deploy with a single command (potentially to many servers) is not only faster, but much less prone to human error.

Currently, I’m using Mercurial as my source control. I’m also using it for deployment, but I’d like to get away from that.

My deployment process looks something like this (a sketch of the corresponding Fabric task follows the list):

  1. Ensure the local repository has no uncommitted changes.
  2. Ensure the requirements.txt file is exactly the same as the output from pip freeze.
  3. Copy our public key to the remote server, for the user www-data, if it is not already installed there.
  4. Create a virtualenv in the desired location on the server, if there is not one already there.
  5. Ensure mercurial is installed on the server.
  6. Push the local repository to the remote server. This will include any subrepositories. I do a little bit of fancy magic to ensure the remote subrepositories exist.
  7. Update the remote server’s repository to the same revision as we are at locally. This means we don’t necessarily need to always deploy to tip.
  8. Install the dependencies on the remote server.
  9. Run some django management commands to ensure everything is setup correctly.
    • collect static files
    • sync the database
    • run migrations
    • ensure permissions are correct
    • compress static files
  10. Restart the various services that need to be restarted.
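
For the curious, below is a bare-bones sketch of the kind of task this fabfile.py contains. The host, paths, and service name are placeholders, and the public key copying, the mercurial install check, the subrepository magic, and most error handling are left out.

    from fabric.api import abort, cd, env, local, run, settings, sudo, task

    env.hosts = ["www-data@web.example.com"]      # placeholder host
    PROJECT_DIR = "/srv/myproject"                # placeholder remote checkout
    VIRTUALENV = "/srv/virtualenvs/myproject"     # placeholder virtualenv

    @task
    def deploy():
        # 1. Refuse to deploy if the local repository has uncommitted changes.
        if local("hg status", capture=True):
            abort("Uncommitted changes: commit before deploying.")

        # 2. Refuse to deploy if requirements.txt does not match pip freeze.
        frozen = local("pip freeze", capture=True).strip()
        if frozen != open("requirements.txt").read().strip():
            abort("requirements.txt is stale: run pip freeze > requirements.txt.")

        # 4. Create the virtualenv if it does not already exist.
        run("test -d {0} || virtualenv {0}".format(VIRTUALENV))

        # 6 & 7. Push the local repository and update it to the current revision
        # (the remote repository is assumed to exist already).
        revision = local("hg id -i", capture=True).strip().rstrip("+")
        with settings(warn_only=True):  # hg push exits non-zero if there is nothing to push
            local("hg push ssh://{0}/{1}".format(env.host_string, PROJECT_DIR))
        with cd(PROJECT_DIR):
            run("hg update -r {0}".format(revision))

            # 8. Install the pinned dependencies.
            run("{0}/bin/pip install -r requirements.txt".format(VIRTUALENV))

            # 9. Post-installation django management commands.
            for command in ("collectstatic --noinput", "syncdb --noinput", "migrate"):
                run("{0}/bin/python manage.py {1}".format(VIRTUALENV, command))

        # 10. Restart whatever needs restarting.
        sudo("service myproject restart")         # placeholder service name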

This process is based around requirements files for a very good reason: pip is very good at recognising which packages are already installed, and not reinstalling them if the version requirements are met. I use pip freeze > requirements.txt to ensure that what will be deployed matches exactly what I have been developing (and testing) against.

However, this process has some issues.

  • Files must be committed to SCM before they can be deployed. This is fine for deployment to production, but is annoying for deploying to test servers. I have plenty of commits that turn on some extra debugging, and then a commit or two later, I turn it off.
  • I have some packages that I have installed locally using pip install -e /path/to/package. To deploy these (a sketch of the dance follows this list), I need to:
    1. Uninstall the editable installation.
    2. Package up a new version of the app.
    3. Push the package to my package repository (I use localshop).
    4. Install the package from the package repository.
    5. Run pip freeze > requirements.txt.
    6. Commit the changes.
    7. Deploy to the test server.
  • Then, I usually need to develop further, so I reinstall using pip install -e ....
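
Automating that dance for a single editable package looks something like the sketch below. The package name, path, and the localshop section assumed to exist in ~/.pypirc are all made up; step 7 is the separate deploy task.

    from fabric.api import lcd, local, task

    @task
    def release_editable(path="/path/to/package", name="mypackage"):  # placeholder defaults
        # 1. Uninstall the editable installation.
        local("pip uninstall --yes {0}".format(name))

        # 2 & 3. Package up a new version and push it to the private index
        # (assumes a 'localshop' section in ~/.pypirc).
        with lcd(path):
            local("python setup.py sdist upload -r localshop")

        # 4. Install the released version from the package repository.
        local("pip install --upgrade {0}".format(name))

        # 5. Re-pin the requirements.
        local("pip freeze > requirements.txt")

        # 6. Commit the change.
        local("hg commit -m 'Pin {0}' requirements.txt".format(name))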

Today, I finally got around to spending some time looking at how pip can help improve this workflow.

With pip==1.3.1, we have a command that was not in pip==1.1 (which is what I had been using until now): pip bundle.

My ‘deploy-to-development/test’ process now looks something like this (again, a sketch follows the list):

  1. Get a list of packages installed as editable: pip list -e
  2. Create a bundle, without dependencies, of these packages.
  3. Get a list of all packages, other than those installed as editable: pip freeze | grep -v "^-e".
  4. Ensure the server is set up (virtualenv, etc)
  5. Push the local repository to the remote server.
  6. Upload the bundle and requirements files.
  7. Install from the requirements file on the server.
  8. Force install from the bundle file on the server, without dependencies.
  9. Repeat the post-installation stuff from above.
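
As a rough sketch, that looks something like the task below. File names and paths are again placeholders, and the exact pip bundle invocation is illustrative; I split the output of pip freeze rather than calling pip list -e, simply because the -e lines are already in requirements-file format.

    from fabric.api import cd, local, put, run, task

    VIRTUALENV = "/srv/virtualenvs/myproject"     # placeholder, as before
    REMOTE_DIR = "/srv/myproject"                 # placeholder, as before

    @task
    def deploy_test():
        # 1 & 3. Split the current environment into editable and pinned packages.
        frozen = local("pip freeze", capture=True).splitlines()
        with open("editable.txt", "w") as f:
            f.write("\n".join(l for l in frozen if l.startswith("-e")) + "\n")
        with open("requirements-frozen.txt", "w") as f:
            f.write("\n".join(l for l in frozen if not l.startswith("-e")) + "\n")

        # 2. Bundle the editable packages, without their dependencies.
        local("pip bundle --no-deps editables.pybundle -r editable.txt")

        # 4 & 5. Ensure the virtualenv exists and push the repository, as in deploy() above.

        # 6. Upload the bundle and the requirements file.
        put("editables.pybundle", REMOTE_DIR)
        put("requirements-frozen.txt", REMOTE_DIR)

        with cd(REMOTE_DIR):
            # 7. Install the pinned requirements; already-satisfied packages are skipped.
            run("{0}/bin/pip install -r requirements-frozen.txt".format(VIRTUALENV))

            # 8. Force install the bundle, without dependencies.
            run("{0}/bin/pip install --force-reinstall --no-deps editables.pybundle".format(VIRTUALENV))

        # 9. Post-installation management commands and service restarts, as before.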

Some of this I’m never going to be able to avoid: ensuring we have the virtualenv, and the post-installation stuff. Migrations gotta migrate. However, I would like to move away from pushing the local repository.

My plan: turn my project into a package (complete with setup.py), so that it becomes just another entry in the requirements file. It will be editable, which means it will be bundled up for deployment.
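
Something like this minimal setup.py is what I have in mind (the project name and dependency list are placeholders):

    from setuptools import find_packages, setup

    setup(
        name="myproject",                  # placeholder project name
        version="0.1.0",
        packages=find_packages(),
        include_package_data=True,         # pick up templates, static files, etc.
        install_requires=[
            "Django",                      # plus whatever else the project actually needs
        ],
    )

Installed with pip install -e ., the project then shows up in pip freeze as just another editable entry, and gets bundled along with the rest.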

It will also mean I can get away from the nested repositories that I currently have. Ultimately, I plan to be able to:

  1. Build a bundle of editable packages.
  2. Create a requirements file of non-editable packages.
  3. Upload both of these files to the server.
  4. Install the requirements.
  5. Install the bundle.
  6. Run the post installation tasks.

That would be bliss.