Dynamic content using jupyter notebooks#
Jupyter notebooks are a literate-programming format that allows text and runnable code
to be combined in a single document. They provide the ability to write documentation
pages that show the actual use of the virtual_ecosystem project along with outputs
and figures. They are also an invaluable tool for sharing design and troubleshooting
investigations. The Jupyter project provides many different
tools for working with notebooks, including the main jupyter program and a
browser-based notebook editor called jupyter-lab.
Running jupyter-lab#
The poetry virtual environment for virtual_ecosystem is already setup to
include jupyter and jupyter-lab, which is a browser-based application for editing
and running notebooks. As that virtual environment also has the virtual_ecosystem
package installed in development mode, a jupyter notebook running using this
environment will be able to import and use virtual_ecosystem code from the active
branch.
You can open jupyter-lab in a couple of ways. The simplest way is to use poetry run jupyter-lab from the terminal, but you can also open the notebook within VS Code use
the Jupyter extension within VS Code. For this option, you will need to make sure that
VS Code is using the right python environment. The information you will need is
produced from poetry:
% poetry env list --full-path
/Users/dorme/Library/Caches/pypoetry/virtualenvs/virtual-ecosystem-Laomc1u4-py3.10
/Users/dorme/Library/Caches/pypoetry/virtualenvs/virtual-ecosystem-Laomc1u4-py3.9 (Activated)
In VS Code, you then have to set the Python interpreter to the full path to the
currently active poetry virtual environment:
View > Command Palette
Type
interpreterand find ‘Python: Select Interpreter’Enter the full path from the
poetry env listoutput.
Jupyter kernel setup#
The jupyter system can be setup to run notebooks in a number of different languages
and even different environments of the same language. Each option is setup as a
kernel, which is basically a pointer to a particular programming environment or
virtual environment. Each notebook should specify which kernel is to be used when
executing any code, and we need to ensure two things.
The selected kernel needs to point to a virtual environment including the
virtual_ecosystempackage and dependencies, andthe kernel should be available consistently across supported Python versions, developer machines, GitHub runners used for testing and also within the ReadTheDocs build environment.
Fortunately, when poetry run or poetry shell are used, the jupyter kernels are
updated to set the python3 kernel to point to the active poetry virtual environment.
This ensures that Jupyter is invoked in the correct environment on all platforms. We can
check this by running the following, which shows the python3 kernel pointing to the
python3 kernel Virtual Ecosystem virtual environment: that path will vary between
machines but poetry will ensure that the link is set correctly.
% poetry run jupyter kernelspec list
Available kernels:
ir ../Jupyter/kernels/ir
python3 ../pypoetry/virtualenvs/virtual-ecosystem-In6MogPy-py3.11/share/jupyter/kernels/python3
Notebook formats#
The default jupyter notebook format is the IPython Notebook (.ipynb suffix). This
file uses the JSON format to store the text and code and a whole bunch of other
metadata. However, the .ipynb format is not great for use in version control. The
basic problem is that - although JSON files are text-based and are technically
human-readable:
they contain irrelevant metadata - such as the number of times the notebook has been run - that will generate unnecessary commits.
they can contain output binary data - such as images - that may also have arbitrary changes.
This really neat
summary presents the
problem, along with a discussion of tools (e.g. nbdime and nbmerge) that help manage
those changes in a more coherent way.
However, a simpler solution is to use plain text instead of JSON: we use notebooks
written in the plain text MyST Markdown format. The jupytext extension then allows
jupyter to load and run those files as notebooks. More broadly, jupytext is a really
powerful tool for managing the content of Jupyter notebooks, including using markdown
formats for notebooks.
Using jupytext#
The jupytext package works as an extension running within Jupyter Lab, adding some
commands to the jupyter-lab command palette, but also provides a command line tool
with some really useful features.
To be used with jupytext, MyST Markdown files need to include a YAML preamble at the
very top of the file. This is used to set document metadata about the Markdown variety
and also code execution data like the jupyter kernel. This is where the python3
kernel name is set.
---
jupytext:
cell_metadata_filter: -all
formats: md:myst
main_language: python
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.13.8
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
language_info:
codemirror_mode:
name: ipython
version: 3
file_extension: .py
mimetype: text/x-python
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.12
---
If you already have a simple Markdown file then the commands below will insert this YAML
header, but at present the language_info section either needs to be added manually or
by opening and saving the file in jupyter.
% jupytext --set-format md:myst simple.md
% jupytext --set-kernel python3 simple.md
There is a downside to using Markdown notebooks. The .ipynb format includes the
results of executing the notebook code, including Python code outputs and any graphics
created in the code. GitHub knows how to render those outputs, so the page you see on
GitHub includes the most recently committed code and graphics outputs. These outputs are
not stored in Myst Markdown notebooks, so you only see the text and input code on
GitHub.
In summary:
We only commit notebooks in MyST Markdown format
Notebooks should use the
python3kernel.GitHub will render the markdown and code cells correctly but none of the executed outputs will be shown.
However, the notebooks will be executed by the
sphinxdocumentation system, so fully rendered versions will be in the documentation website.You can develop notebook content locally using
jupyter-laband run it to get outputs. You can also runsphinxto see how a notebook is rendered in the documentation.The code in notebooks should not take a long time to run - these pages have to be built every time the documentation is built.
Notebook quality checking#
All Myst Markdown content in a notebook will be checked using markdownlint when the
file is committed to GitHub. In
addition, the following tools may be useful:
Using black with jupytext#
Although jupytext does not do Markdown validation, it does allow black to be run on
the code cells, so that the format of code in notebooks can be automatically formatted.
jupytext --pipe black my_markdown.md
Note that this does not format Python code that is simply included in a Markdown
cell - essentially text that is formatted as if it were Python code. It only formats
code within a Jupyter notebook {code-cell} or {code-block} section.