Packages and Packaging
Packages, Modules, Imports, Oh My!
Before we get started making your own package, let’s remind ourselves about packages, modules, and importing.
Modules
A python “module” is a single namespace, with a collection of values:
functions
constants
class definitions
really any old value
A module usually corresponds to a single file: something.py
Packages
A “package” is essentially a module, except it can have other modules – and indeed other packages – inside it.
A package usually corresponds to a directory with a file in it called __init__.py
and any number of python files or other package directories:
a_package
__init__.py
module_a.py
a_sub_package
__init__.py
module_b.py
The __init__.py file can be totally empty – or it can have arbitrary Python code in it. The code will be run when the package is imported – just like a module.
Modules inside packages are not automatically imported. So, with the above structure:
import a_package
will run the code in a_package/__init__.py. Any names defined in the __init__.py will be available as:
a_package.a_name
But:
a_package.module_a
will not exist. To get submodules, you need to explicitly import them, like so:
import a_package.module_a
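For example, here is a minimal sketch of an __init__.py that both defines a name and makes a submodule available on import (the value of a_name is made up for illustration):

# a_package/__init__.py
a_name = "some value"   # available as a_package.a_name after "import a_package"

# explicitly import the submodule, so that "import a_package"
# also makes a_package.module_a available
from . import module_a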
More on Importing
You usually import a module like this:
import something
Or:
from something import something_else
Or a few names from a package:
from something import (name_1, name_2, name_3, x, y)
You can also optionally rename things as you import them:
import numpy as np
This is a common pattern for using large packages, maybe with long names, without having to type a lot.
import *
from something import *
Means: “import all the names in the module something”.
You really don’t want to do that! It is an old pattern that is now an anti-pattern.
But if you do encounter it, note that it doesn’t necessarily import all the names: if the module defines an __all__ variable, import * imports only the names listed there. __all__ is a list of the names that the module author wants import * to import. So the author can control what gets exported, and not accidentally override built-ins or bring a lot of extraneous names into your namespace. (If there is no __all__, import * brings in every name that doesn’t start with an underscore.)
But really:
Do NOT use import *
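For reference, here is a minimal sketch of a module using __all__ (the module and its names are made up for illustration):

# something.py -- a hypothetical module
__all__ = ["name_1", "name_2"]   # the only names "from something import *" brings in

name_1 = 1
name_2 = 2
_private = 3     # a leading underscore means "never exported by *" anyway
extra = 4        # defined, but skipped by * because it's not in __all__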
Relative Imports
Relative imports were added with PEP 328: https://www.python.org/dev/peps/pep-0328/
The final version is described here: https://www.python.org/dev/peps/pep-0328/#guido-s-decision
This gets confusing! There is a good discussion on Stack Overflow here: Relative Imports for the Billionth Time
Relative imports allow you to refer to other modules relative to where the existing module is in the package hierarchy, rather than in the entire python module namespace. For instance, with the following package structure:
package/
__init__.py
subpackage1/
__init__.py
moduleX.py
moduleY.py
subpackage2/
__init__.py
moduleZ.py
moduleA.py
You can write in moduleX.py:
from .moduleY import spam
from . import moduleY
from ..subpackage1 import moduleY
from ..subpackage2.moduleZ import eggs
from ..moduleA import foo
from ...package import bar
from ...sys import path
(The last two are straight from the PEP 328 examples; they only work if package itself lives inside yet another package, since you cannot relatively import above the top-level package.)
This is similar to command line shells where:
“.” means “the current package”
“..” means “the package above this one”
Note that you have to use the from form of import when using relative imports: import .moduleY is a syntax error.
Caveats
You can only use relative imports from within a package.
You cannot use relative imports from the interpreter.
You cannot use relative imports from a top-level script, i.e. when __name__ is set to “__main__”. So the same Python file with relative imports can work when it’s imported, but not when it’s run as a script.
The alternative is to always use absolute imports:
from package.subpackage import moduleX
from package.moduleA import foo
Advantages of Relative Imports
The package does not have to be installed.
You can move things around and not much has to change.
Advantages of Absolute Imports
Explicit is better than implicit.
Imports are the same regardless of where you put the package.
Imports are the same in package code, command line, tests, scripts, etc.
There is debate about which is the “one way to do it” – a bit unpythonic, but you’ll need to make your own decision.
sys.modules
sys.modules is simply a dictionary that stores all the already-imported modules. The keys are the module names, and the values are the module objects themselves.
Note
Remember that everything in Python is an object, including modules. So they can be stored in lists and dicts, assigned names, even passed to functions, just like any other object. They are not often used that way, but they can be.
In [3]: import sys
In [4]: type(sys.modules)
Out[4]: dict
In [6]: sys.modules['textwrap']
Out[6]: <module 'textwrap' from '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/textwrap.py'>
In [10]: [var for var in vars(sys.modules['textwrap']) if var.startswith("__")]
Out[10]:
['__spec__',
'__package__',
'__loader__',
'__doc__',
'__cached__',
'__name__',
'__all__',
'__file__',
'__builtins__']
You can access the module through the sys.modules dict:
In [12]: sys.modules['textwrap'].__file__
Out[12]: '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/textwrap.py'
Which is the same as:
In [13]: import textwrap
In [14]: textwrap.__file__
Out[14]: '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/textwrap.py'
In [15]: type(textwrap)
Out[15]: module
In [16]: textwrap is sys.modules['textwrap']
Out[16]: True
So, more or less, when you import a module, the interpreter:
Looks to see if the module is already in sys.modules. If it is, it binds a name to the existing module in the current module’s namespace.
If it isn’t:
A module object is created
The code in the file is run
The module is added to sys.modules
The module is added to the current namespace
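In fact, this machinery is exposed in the standard library’s importlib. Here is a rough sketch of those steps, using the documented importlib.util helpers (simplified, with no error handling):

import importlib.util
import sys

def my_import(name):
    """A simplified sketch of what "import name" does."""
    # Step 1: if the module is already imported, reuse the existing object.
    if name in sys.modules:
        return sys.modules[name]
    # Step 2: find the module on the search path,
    spec = importlib.util.find_spec(name)
    # create a fresh module object from the spec,
    module = importlib.util.module_from_spec(spec)
    # register it in sys.modules, then run the code in the file.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module

math = my_import("math")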
Implications of Module Import Process
The code in a module only runs once per program run.
Importing a module again is cheap and fast.
Every place your code imports a module, it gets the same object.
You can use this to share “global” state where you want to (a sketch follows below).
If you change the code in a module while the program is running, the change will not show up, even if it is re-imported. That’s what importlib.reload() is for.
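Since every importer gets the same module object, a plain module works as shared state. A minimal sketch, with a made-up module name:

# config.py -- a hypothetical module used as shared "global" state
settings = {"verbose": False}

# main.py -- or anywhere else in the program
import config
config.settings["verbose"] = True   # every other importer of config sees this change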
The module search path
The interpreter keeps a list in sys.path of all the places that it looks for modules or packages when you do an import:
import sys
for p in sys.path:
print(p)
You can manipulate that list to add or remove paths to let python find modules in a new place.
Every module has a __file__ attribute that holds the path of the file it was loaded from. This lets you add paths relative to where your code lives, etc.
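For example, here is a quick way to let Python find modules in a directory next to the current file (the directory name is made up; as the note below says, installing properly is usually better):

import sys
from pathlib import Path

# add a directory relative to this file to the module search path
here = Path(__file__).parent
sys.path.append(str(here / "my_modules"))   # "my_modules" is a made-up name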
Note
It’s usually better to use your package manager’s “develop” mode (e.g. pip install -e) instead of messing with sys.path. See below for examples.
Note
One “gotcha” in Python is “name shadowing”. The interpreter automatically adds the current working directory to sys.path. This means you can start the interpreter and just import something to work with your code. But if you happen to have a Python file or package in your current working directory with the same name as an installed package, it will get imported instead, which can lead to some odd errors. If you are getting confusing errors on import, check for Python modules in your current working directory that may match an installed package.
Reloading
Once loaded, a module stays loaded.
If you import it again – usually in another module – it will simply use the version already there rather than re-running the code.
And you can access all the already-loaded modules from sys.modules.
In [4]: import sys
In [5]: sys.modules.keys()
Out[5]: dict_keys(['builtins', 'sys', '_frozen_importlib', '_imp', '_warnings', '_thread', '_weakref', '_frozen_importlib_external', '_io', 'marshal', 'posix', 'zipimport', 'encodings', 'codecs', '_codecs'
There is a lot there already!
There’s no reason to, but you could import an already imported module like so:
In [10]: math = sys.modules['math']
In [11]: math.sin(math.pi)
Out[11]: 1.2246467991473532e-16
In [12]: math.sin(math.pi / 2)
Out[12]: 1.0
Python Distributions
So far, we’ve used the Python from python.org. It works great, and supports lots of packages via pip.
But there are also a few “curated” distributions. These provide python and a package management system for hard-to-build packages.
These are widely used by the scientific Python (SciPy) community.
Anaconda has seen a LOT of growth in the last few years. It’s based on the open-source conda packaging system, and provides both a commercially curated set of packages and a community-developed collection of packages known as conda-forge.
If you are doing data science or scientific development then I recommend you take a look at Anaconda, conda and conda-forge.
Installing Packages
Every Python installation has its own stdlib and site-packages folder. site-packages is the default place for third-party packages.
From Source
python setup.py install – though this is heading towards deprecation
python -m build . – the newer way, for use with the newer pyproject.toml files
With the system installer (apt-get, yum, dnf, etc.)
From Binaries
Binary wheels – pip should find appropriate binary wheels if they are there.
A Bit of History
In the beginning, there was distutils. But distutils was missing some key features:
package versioning
package discovery
auto-install
And then came PyPI – the Python Package Index.
And then came setuptools, with easy_install. But easy_install wasn’t well maintained, and it eventually disappeared.
Then there was pip, which replaced easy_install and direct invocation of setup.py.
pip is still here, but now there are also poetry and hatch and uv and a few others.
You can’t really go wrong with pip + setuptools, but you should explore other tools like poetry and uv if you want to take advantage of their additional features or workflows.
Installing Packages
Actually, it’s still a bit of a mess. It’s getting better, and the mess is almost cleaned up.
To build packages: setuptools
Most folks use setuptools for everything, though poetry and uv are making headway.
To install packages: pip
pip is basically the only package installer you need. It usually comes with Python, and almost always does the right thing.
For binary packages: wheels
You don’t really need to know much about wheels, except for the following. Many Python packages incorporate code written in C or C++ or Rust. Historically, when you ran “pip install” for one of these packages, pip would build the package from source, which meant you needed build tools on your host that you might not have. Python “wheels” are pre-compiled binaries created by the package maintainer for your specific operating system. With wheels, you do not need build tools installed on your system to use these packages, and they install much, much more quickly.
Final Recommendations
First try: pip install
If that doesn’t work, then read the docs of the package you want to install and do what they say.
virtualenv
virtualenv is a tool to create isolated Python environments.
It is very useful for developing multiple applications and for keeping your system Python from being polluted with lots of packages.
See: http://www.virtualenv.org/en/latest/index.html
You can find some additional notes here: Working with Virtualenv
NOTE: Conda also provides a similar isolated environment system.
Building Your Own Package
The term “package” is overloaded in Python. As defined above, it means a collection of Python modules. But it often is used to refer to not just the modules themselves, but the whole collection, with documentation and tests, bundled up and installable on other systems.
Here are the very basics of what you need to know to make your own package.
Why Build a Package?
There are a bunch of nifty tools that help you build, install and distribute packages.
Using a well structured, standard layout for your package makes it easy to use those tools.
Even if you never want to give anyone else your code, a well structured package eases development.
What is a Package?
A collection of modules
… and the documentation
… and the tests
… and any top-level scripts
… and any data files required
… and a way to build and install it…
Python Packaging Tools
setuptools – for building and distributing packages
pip – for installing packages
wheel – for binary distributions
These are pretty much the standard now, and very well maintained by the Python Packaging Authority: PyPA
This all continues to change quickly so see that site for up to date information.
Where do I go to figure this out?
The Python Packaging Authority maintains a really good guide, which covers the standard packaging tools built into Python.
Python Packaging User Guide: https://packaging.python.org/
There is a sample project here: https://github.com/pypa/sampleproject
You can use this as a template for your own packages. It covers the latest and greatest in Python packaging as supported by a standard Python installation, including the latest pyproject.toml configuration.
Note
One confusion for folks new to this is that a LOT of the documentation (and tools) around packaging for Python assumes that you are writing a package that is generally useful, and you want to share it with others on PyPI. That is partly because all the people developing the tools and writing about them are doing just that. It’s also harder to distribute a package properly than to simply make one for internal use, so more tools and docs are needed. But it is still useful to make a package of your code if you aren’t going to distribute it, but you don’t need to do everything that is recommended.
Where do I put my custom code?
If you have a collection of your own code that you want to access for various projects, make a “package” out of it so you can manage it in one place and use it in other places. You do NOT need to put your code on PyPI.
Most people who write code to solve problems find that they have a collection of little scripts and utilities that they want to be able to use and reuse for various projects.
You have a few options for handling your code collection:
Keep your code in one place and copy and paste the functions you need into each new project. Do not do this! It is really not a good idea to simply copy and paste code around. You will end up with multiple versions scattered all over the place and you will regret it.
Put your code into a single directory and add it to the PYTHONPATH environment variable. Do not do this! The PYTHONPATH environment variable is shared by all installs of Python on your system, so it will be used by your scripts and also by other scripts that are not yours.
Make a package.
A Python “package” is a collection of modules and scripts that you can install with pip. People usually think of packages as something carefully developed for a particular purpose and distributed to a wide audience, but you can also use this strategy to manage and distribute code just for yourself.
But it’s a much easier process than it sounds! Let’s get started on making our first package.
Basic Package Structure
package_name/
    bin/
    docs/
    LICENSE.txt
    CHANGELOG.md
    README.md
    pyproject.toml
    package_name/
        __init__.py
        module1.py
        module2.py
    tests/
        __init__.py
        test_module1.py
        test_module2.py
CHANGELOG.md – A log of the changes with each release, written in Markdown format. There are tools that will generate this automatically, but they are beyond the scope of this guide.
LICENSE.txt – The text of the license you choose. Do choose a license! It really does matter!
README.md – A description of the package, in Markdown format. You could also write it in ReST format, but tools like GitHub primarily support Markdown.
Those are all the “metadata” files, critical if you are distributing to the world. They’re not so much for your own use.
pyproject.toml – The configuration file for how to build and install your package.
bin/ – This is where you put top-level scripts. Some folks prefer scripts/. It doesn’t matter.
docs/ – Your documentation.
package_name/ – This is the main package; this is where the code goes. Replace “package_name” with the name of your package, and be sure to make it unique so it doesn’t reuse the name of a built-in or installed Python package.
tests/ – This is where your unit tests go. There are several options here that we’ll go over in a moment.
Where should I put my tests?
You have a few options for where to put your test code.
If your package and the test code are small and self-contained, then put the tests inside the package. This results in the tests being installed with the package, so that they can be run after installation, like this:
$ pip install package_name
>>> import package_name.tests
>>> package_name.tests.runall()
Or:
$ pytest --pyargs package_name
On the other hand, if you have a lot of tests, and do not want the entire set of tests installed with the package, then you can keep it at the top level, as shown above.
The pytest project has a discussion of this here: https://docs.pytest.org/en/stable/explanation/goodpractices.html
The advantage of keeping test code self-contained, and outside of your package, is that you can have a large test suite with sample data and whatever else, and it won’t bloat and complicate the installed package.
The advantage to keeping test code within the package is that your test code gets installed with the package, so users of the package can install the package and then run the tests to make sure the package is working.
Most people choose to have the tests separate from their code.
The pyproject.toml File
Your pyproject.toml file describes your package and tells setuptools how to build and install it. It’s written in TOML syntax, which may be new to you, and you may not see every project using it yet. However, it is a Python standard defined by PEP 621, so it is the future.
An example:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "mypackage"
version = "3.0.0"
description = "My fancy project."
readme = "README.md"
requires-python = ">=3.8"
license = {file = "LICENSE.txt"}
authors = [{name = "A. Random Developer", email = "author@example.com" }]
maintainers = [{name = "A. Great Maintainer", email = "maintainer@example.com" }]
dependencies = [
"Django >= 5.0"
]
[project.optional-dependencies]
test = ["pytest"]
Building Your Package
With a pyproject.toml file defined, pip can do a lot:
Build wheels for your project:
$ python3 -m pip wheel .
Install your package:
$ python3 -m pip install .
The dot at the end of the command means “this directory”: pip will look in the current dir for the pyproject.toml file.
Note that pip install . makes a copy of your code and puts it into site-packages. If you would rather make a link to your code and tell your project to use it in place, see the editable installs described in “Under Development” below.
Note
setuptools can be used by itself to build and install packages. But over the years, pip has evolved a more “modern” way of doing things. When you install from source with pip, it uses setuptools to do the work, but it installs things in a more modern, up-to-date, and compatible way. For most uses you won’t notice the difference, but setuptools still has some old, crufty ways of doing things, so it’s better to use pip as a front end as much as possible.
wheels
Wheels are a binary format for packages.
See: http://wheel.readthedocs.org/en/latest/
It’s pretty simple: a wheel is essentially a zip archive of all the stuff that gets installed (i.e. put in site-packages) when your package is installed.
A wheel file can be pure Python, or it can contain binary code for compiled extensions.
Wheels are compatible with virtualenv.
As shown earlier, you can build a wheel using pip, like this:
$ python3 -m pip wheel .
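Since a wheel is just a zip archive, you can peek inside one with nothing but the standard library (the wheel filename here is made up):

import zipfile

# list everything that would be installed from a built wheel
with zipfile.ZipFile("mypackage-3.0.0-py3-none-any.whl") as wheel:
    for name in wheel.namelist():
        print(name)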
When you are installing a package from PyPI, you can use pip install packagename, and pip will find wheels for Windows, macOS, “manylinux”, or whatever operating system you’re running, assuming wheels are available for it. Or you can run pip install --no-binary :all: packagename to skip wheels and build the package from source.
manylinux
There are a lot of Linux distributions out there. So, for a long time, there were not easily available binary wheels for Linux. How could you define a standard with all the Linux distros out there?
Enter “manylinux”. No one thinks you can support all Linux distros, but it turns out you can support many of the common ones by building on an older baseline system and restricting the system libraries you rely on. This approach worked well for Canopy and conda, so PyPI adopted a similar strategy with manylinux.
See: https://github.com/pypa/manylinux
So now there are binary wheels for Linux on PyPI. The core SciPy stack is a great example: you can now pip install numpy easily on all three major systems (Windows, macOS, and Linux).
PyPI
The Python Package Index: https://pypi.org/
Sometimes called “Pie Pie”, sometimes called “Pie Pee Eye”.
You’ve all used this. Running pip install searches it. Uploading your package to PyPI is beyond the scope of this document; for a tutorial on how to do it, see: https://realpython.com/pypi-publish-python-package/
Under Development
Working with your code in develop mode – also known as an “editable install” – is really really nice:
$ python -m pip install -e .
The e stands for “editable”. The “dot” is still required.
Installing in this way puts links into the Python installation to your code, so that your package is installed, but any changes to your source code will immediately take effect in the installation.
This way all your test code, client code, etc., can import your package the usual way, with no sys.path hacking or modifications to PYTHONPATH.
It’s great for anything more than a single-file project.
Running Tests
It can be a good idea to declare your test dependencies in your pyproject.toml (see the optional-dependencies example above), so that you (or your users) can:
$ pip install ".[test]"
$ pytest
If you want to add default options to pytest, you can add them to your pyproject.toml file like this (the coverage options shown require the pytest-cov plugin):
[tool.pytest.ini_options]
addopts = "--cov --cov-report html --cov-fail-under 95"
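For reference, a minimal test to put in the tests directory might look like this (the module and function names are made up):

# tests/test_module1.py
from package_name import module1   # assumes module1 defines an add() function

def test_add():
    assert module1.add(1, 2) == 3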
Handling Version Numbers
There is one key rule in software: never put the same information in more than one place!
With a Python package, you want this to return a version string:
import the_package
the_package.__version__
You might expect a string like this: 1.2.3
Using __version__ is not a requirement, but it is a very commonly used convention – use it!
But you also need to specify the version in the pyproject.toml:
[project]
name = "mypackage"
version = "3.0.0"
Not good: now the same version number lives in two places.
My Solution
Put the version in the pyproject.toml, as shown above.
Then write a function that gets the version from the installed package’s metadata and puts it into your program, like this:
# in package_name/__init__.py
import importlib.metadata

def version(package: str) -> str:
    """Return the version of the given installed package, from its metadata."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # the package is not installed, e.g. running from a raw source checkout
        return "0.0.0"

__version__ = version(__name__)
You can also use tools that automatically update the version number everywhere it needs to change. For example, you can hook commitizen into your project with git hooks, so that it enforces commit conventions and bumps your version number in every location when you make a release.
Semantic Versioning
Another note on version numbers.
The software development world, for the most part, has established a standard for what version numbers mean, known as semantic versioning. This is helpful to users: they know what to expect when they upgrade.
In short, with an x.y.z version number:
x is the Major Version. It can mean changes in the API, major new features, etc. Changes in the major version are likely to be incompatible with previous versions.
y is the Minor Version. Features were added, but they should be backwards compatible.
z is the “Patch” Version. This is for bug fixes, etc., that should be fully compatible.
Read all about it: http://semver.org/
There is a related versioning scheme that is appearing more often, called Calendar Versioning, where you set the version based on when the software was released.
You can read about calendar versioning, too: https://calver.org/
Tools to Help
Tox is great for automating testing in Python. We won’t go into it here but it’s pretty popular.
Dealing with Data Files
Oftentimes a package will require some files that are not Python code. In that case, you need to make sure those files are included with the package somehow.
With the pyproject.toml file, the easiest way to do that is with this syntax:
[tool.setuptools]
# If there are data files included in your packages that need to be installed, specify them here.
package-data = {"sample" = ["*.dat"]}
This is a dict whose keys are the package(s) you want to add data files to. Specifying the package is required, as a single pyproject.toml file can install more than one package. The value for each key is a list of filename patterns, relative to the package.
This is described more here: https://setuptools.pypa.io/en/stable/userguide/datafiles.html
Note
Debugging package building can be kind of tricky. If you install the package and it doesn’t work, what went wrong?!? One approach that can help is to “build” the package separately from installing it. pip provides a wheel command (pip wheel .) that builds your package in place. It will create a build directory, and in there you can see your package as it will be deployed. So you can look there and see whether your data files are getting included, and check everything else about the package.
Now you’ll need to write your code to find that data file. You can do that by using the importlib.resources package built into Python. See: https://docs.python.org/3/library/importlib.resources.html
import importlib.resources
file_path = importlib.resources.files(__name__) / "mydata.csv"
You can use the __name__ magic value to get the name of the current module, or you can construct the package name by hand.
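Reading the data is then straightforward, since the object returned by files() supports read_text() and read_bytes(). Continuing from the snippet above:

text = file_path.read_text()     # or file_path.read_bytes() for binary data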
Command Line Scripts
If your scripts are Python files, then the best way to make them accessible is to use “entry points”. Entry points can provide a number of functions, but one of them is to make console scripts. Here is how you would add one to your pyproject.toml file:
[project.scripts]
example = "example:main"
What this does is tell setuptools to make a little wrapper program called “example” that will start up Python and run the function called main in the example module.
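For that entry point to work, the example module needs a main function. Here is a minimal sketch (what it does is made up, of course):

# example.py -- the module named in the entry point above
import sys

def main():
    # whatever your command-line tool should do
    print("Hello from example!", sys.argv[1:])

if __name__ == "__main__":
    main()   # also lets you run the file directly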
Getting Started With a New Package
For anything but a single-file script (and maybe even then):
Create the basic package structure
Write a pyproject.toml file
pip install -e .
Put some tests in the tests directory
Run pytest from the project directory
LAB: A Small Example Package
Create a small package
package structure
pyproject.toml
pip install -e .
at least one working test
Here is a ridiculously simple and useless package to use as an example:
Or go straight to making a package of your mailroom project.