Creating new Virtual Ecosystem models#
The Virtual Ecosystem initially contains a set of models defining core components of
an ecosystem: examples include the abiotic, animal, plants and soil models.
However, the simulation is designed to be modular:
Different combinations of models can be configured for a particular simulation.
New models can be defined in order to extend the simulation or alter the implementation: examples of new functionality might be freshwater or disturbance models.
This page sets out the steps needed to add a new model to the Virtual Ecosystem and
ensure that it can be accessed by the core processes in the simulation.
Important
When a model is used in the Virtual Ecosystem, the code relies on naming conventions to
access the different model components used in the model and register these components so
that they can be easily found from within the code - see the
registry submodule for details.
You need to choose a unique model name that will be used to name the root model directory, submodules within the model and then two critical model components. The name will be used following two standard Python naming conventions:
Model directory and file names use snake case (lower case with underscores): e.g. abiotic or abiotic_simple.
Class names use camel case (capitalised words with no spaces): e.g. Abiotic and AbioticSimple.
The two critical names are those of the model class and the root configuration class. The example below shows the required pattern:
abiotic_simple.abiotic_simple_model.AbioticSimpleModel
abiotic_simple.model_config.AbioticSimpleConfiguration
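The naming pattern can be written down as a small helper. This is only an illustrative sketch: `expected_component_names` is a hypothetical function, not part of the Virtual Ecosystem, and the registry may derive the names differently.

```python
def expected_component_names(model_name: str) -> dict[str, str]:
    """Derive the conventional component names from a snake case model name."""
    # Drop underscores and capitalise each word: abiotic_simple -> AbioticSimple
    class_stem = "".join(word.capitalize() for word in model_name.split("_"))
    return {
        "model_class": f"{model_name}.{model_name}_model.{class_stem}Model",
        "config_class": f"{model_name}.model_config.{class_stem}Configuration",
    }


names = expected_component_names("abiotic_simple")
print(names["model_class"])
print(names["config_class"])
```

Running the helper on a candidate name is a quick way to check which class names the convention implies before creating any files.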
The rest of this page assumes that you are creating a new freshwater model.
Create a new submodule folder#
Start by creating a new directory for your model within the models directory:
virtual_ecosystem/models/freshwater
You will then need to create the three files shown below within this folder:
The init file virtual_ecosystem/models/freshwater/__init__.py. This is required to indicate to Python that the folder is a submodule within the virtual_ecosystem package, but we also use it to provide overview documentation of the model structure.
The virtual_ecosystem/models/freshwater/model_config.py submodule, providing the FreshwaterConfiguration class that defines the settings needed to configure how the model runs.
The virtual_ecosystem/models/freshwater/freshwater_model.py submodule, providing the main FreshwaterModel class that implements the model itself.
It is very likely that you will also want to create additional code submodules within this directory to split out different parts of the module functionality and to keep code files organised and a manageable size.
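As a rough sketch, the three required files could be created with a few lines of Python. The `create_model_skeleton` helper is hypothetical and is run here inside a temporary directory so that it does not touch a real checkout of the package.

```python
import tempfile
from pathlib import Path


def create_model_skeleton(package_root: Path, model_name: str) -> list[Path]:
    """Create the model directory and the three empty files it must contain."""
    model_dir = package_root / "models" / model_name
    model_dir.mkdir(parents=True, exist_ok=True)
    files = [
        model_dir / "__init__.py",
        model_dir / "model_config.py",
        model_dir / f"{model_name}_model.py",
    ]
    for path in files:
        path.touch(exist_ok=True)
    return files


# Use a temporary directory as a stand-in for the repository root
with tempfile.TemporaryDirectory() as tmp:
    created = create_model_skeleton(Path(tmp) / "virtual_ecosystem", "freshwater")
    created_names = [path.name for path in created]
    print(created_names)
```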
The model __init__.py file#
This file is used to tell Python that the directory contains a package submodule. It
can be used to run code automatically when any component of the submodule is imported,
but in the Virtual Ecosystem, we only use the __init__.py to provide a brief overview
of the module as a docstring. It can be used to provide a short description of any
submodules and how they are used within the model. The submodule files should then have
their own docstring providing more detail. These docstrings are automatically included
in the HTML documentation of the package.
A docstring should be formatted as a triple-quoted string, as below:
"""This is the freshwater model module. The module level docstring should contain a
short description of the overall model design and purpose, and link to key components
and how they interact.
""" # noqa: D204, D415
Model configuration#
The model configuration needs to define both model settings - such as paths to model
specific input files, method choices and the like - and model constants. These are
defined in the model_config.py file as Pydantic
models, which are very close to
standard Python dataclasses but have built-in support for validation and serialisation.
When the Virtual Ecosystem model runs using ve_run, the first thing that happens is
that the specified configuration files are loaded and then validated using these
configuration models. This allows the model to detect bad configuration and provide
detailed error reports before any further processing.
Each Virtual Ecosystem model needs to provide a single root configuration model. This root class must have a couple of specific features to allow it to be identified when the simulation starts.
The root configuration class name must derive from the model name using the following pattern: the abiotic_simple model would have the AbioticSimpleConfiguration root configuration class. Basically, underscores are dropped and words are capitalised.
The class must inherit from a shared root model class: ModelConfigurationRoot. This is used to enforce some model settings:
Instances of model configuration are frozen so they cannot be changed during a run.
Configuration models are strict about extra data: if unknown settings are provided when a configuration model instance is created, it fails.
The model_config.py file can then also contain additional configuration classes that
can be nested within the root configuration to define a tree of configuration settings.
For example, all existing models define a separate class to hold constants. Any
additional class must inherit from the
Configuration class, which again freezes
configuration model instances and makes them intolerant of extra data.
All of your configuration models and fields must have clear docstrings that describe
what the model and fields are. As an example, the new freshwater.model_config module
might look like this:
class FreshwaterConstants(Configuration):
    """Constants settings for the freshwater model."""

    number_of_pools: int = 5
    """Number of pools to simulate."""
    ashrae_model_a: float = 95
    """The A constant of the ASHRAE evaporation model."""
    ashrae_model_b: float = Field(gt=0, default=37.4)
    """The B constant of the ASHRAE evaporation model."""
    molar_mass_water: ClassVar[float] = 18.01528
    """The molar mass of water."""


class FreshwaterConfiguration(ModelConfigurationRoot):
    """Root configuration class for the freshwater model."""

    pond_data_path: FILEPATH_PLACEHOLDER
    """Path to a CSV file containing pond data for simulation cells."""
    constants: FreshwaterConstants = FreshwaterConstants()
    """The constants settings for the freshwater model."""
With these validation classes, an instance of the root model above can be easily created by reading data from an appropriate file format (‘de-serialised’). We use TOML for configuration files and so an instance of the model above could be created from TOML like this:
[freshwater]
pond_data_path = '/path/to/freshwater_pond_data.csv'
[freshwater.constants]
ashrae_model_a = 96
ashrae_model_b = 38
Similarly, a model instance can be exported to a file format (‘serialised’) to provide a record of the settings used in a particular model.
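The round trip between file data and configuration instances can be sketched using only the standard library. The example below uses frozen dataclasses as a simplified stand-in for the pydantic models: `ConstantsSketch`, `ConfigSketch` and `build` are hypothetical names, and the nested dictionary mimics what a TOML parser would produce for a file like the one above.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, field, fields


@dataclass(frozen=True)
class ConstantsSketch:
    number_of_pools: int = 5
    ashrae_model_a: float = 95.0
    ashrae_model_b: float = 37.4


@dataclass(frozen=True)
class ConfigSketch:
    pond_data_path: str = "<PLACEHOLDER>"
    constants: ConstantsSketch = field(default_factory=ConstantsSketch)


def build(cls, settings: dict):
    """Create a settings instance, rejecting any unknown keys."""
    unknown = set(settings) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"Unknown settings for {cls.__name__}: {sorted(unknown)}")
    return cls(**settings)


# De-serialise: nested dictionary as produced by a TOML parser
parsed = {
    "freshwater": {
        "pond_data_path": "/path/to/freshwater_pond_data.csv",
        "constants": {"ashrae_model_a": 96, "ashrae_model_b": 38},
    }
}
section = dict(parsed["freshwater"])
constants = build(ConstantsSketch, section.pop("constants", {}))
config = build(ConfigSketch, {**section, "constants": constants})

# Serialise: export the complete settings, including defaults that were filled in
exported = asdict(config)

# Unknown settings are rejected and instances cannot be modified
try:
    build(ConstantsSketch, {"ashrae_model_c": 1.0})
    unknown_rejected = False
except ValueError:
    unknown_rejected = True

try:
    config.pond_data_path = "elsewhere.csv"
    is_frozen = False
except FrozenInstanceError:
    is_frozen = True
```

The exported dictionary contains `number_of_pools` even though the input did not set it, which is what makes an exported configuration a complete record of a run.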
Defining constants#
The definition of ‘constant’ in the Virtual Ecosystem is basically a parameter of any kind that should be held constant throughout a simulation. Many of the parameters required in a Virtual Ecosystem simulation have been estimated from field data. The values may have uncertainty or may vary significantly between sites. For this reason, all parameters for your model should be included in your model configuration, to allow other users to experiment with the results of changing parameter values and to explore the sensitivity of model predictions to the configuration settings.
However, some variables are genuine constants, such as the molar mass of water in the
example above. The pydantic package has a few ways of fixing constants:
For integer values and strings, the Literal type can be used to specify the exact value to be used and then no other value will be accepted. For example, number_of_pools: Literal[5] = 5 would enforce a fixed number of pools.
The Literal type cannot be used with floating point numbers, which is unfortunate since most parameters will be floats! You can write a custom field validator that will enforce the specified default value.
Alternatively, you can make the constant field a class attribute using ClassVar, as in the example above. Whenever the configuration model is used, it will always have this fixed value. Additionally, class attributes are not included when configuration models are dumped to file, so the constant field will not appear in the TOML version of the configuration. If users try to add it, it will be rejected. The class attributes do occur in the configuration documentation though! This is probably the cleanest way to set fixed constants, but you should clearly document which parameters in your configuration cannot be changed.
The example model below shows the various options in practice:
from pydantic import field_validator
from typing import ClassVar, Literal
from scipy import constants
from virtual_ecosystem.core.configuration import ModelConfigurationRoot, Configuration


class Example(Configuration):
    """An example configuration model."""

    f1: ClassVar[float] = 12.3
    """A constant float set as a class attribute. This field does not appear in the TOML
    representation of the model and cannot be changed."""
    f2: Literal[3] = 3
    """A constant integer set using Literal. This field _does_ appear in the TOML
    representation of the model but users cannot change the value."""
    f3: float = constants.Boltzmann
    """The Boltzmann constant."""
    f4: float = constants.angstrom
    """One angstrom in metres."""

    @field_validator("f3", "f4", mode="after")
    @classmethod
    def enforce_constants(cls, value, context):
        """Custom validation to enforce constants in field f3 and f4."""
        fname = context.field_name
        constant_default = cls.model_fields[fname].default
        if value != constant_default:
            raise ValueError(
                f"The {fname} field can only take the constant value {constant_default}"
            )
        # A field validator must return the value, otherwise the field is set to None
        return value
Validation#
The pydantic package provides a wide range of validation tools to enforce conditions on
the fields within the configuration models.
All pydantic fields must have a declared type - validation will fail if the input data does not match that type. So any attempt to set ashrae_model_a must provide a float.
The Field class provides additional built-in constraints on provided values. Each type supports different constraints, but in the example above Field(gt=0, default=37.4) checks that the input value is greater than zero.
In addition, you can add custom validators for fields or validators for the whole class.
You should be as precise as you can about the validation of your model settings: they provide very strong guidance to users about how to configure a simulation. When values fail validation, we are able to use the great error reporting built into pydantic to provide detailed information about configuration failures.
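The idea of collecting every problem and reporting them together can be illustrated with a standard library stand-in. This sketch uses a frozen dataclass with hand-written checks in `__post_init__`; it is not how pydantic is implemented, and `ConstantsSketch` and `try_build` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstantsSketch:
    number_of_pools: int = 5
    ashrae_model_b: float = 37.4

    def __post_init__(self) -> None:
        errors = []
        # Type check: bool is a subclass of int, so exclude it explicitly
        pools = self.number_of_pools
        if not isinstance(pools, int) or isinstance(pools, bool):
            errors.append("number_of_pools: must be an integer")
        # Type check followed by a constraint equivalent to Field(gt=0)
        model_b = self.ashrae_model_b
        if not isinstance(model_b, (int, float)) or isinstance(model_b, bool):
            errors.append("ashrae_model_b: must be a number")
        elif not model_b > 0:
            errors.append("ashrae_model_b: must be greater than 0")
        # Report all failures at once rather than stopping at the first
        if errors:
            raise ValueError("; ".join(errors))


def try_build(**settings) -> str:
    """Return 'ok' or the combined validation message for the given settings."""
    try:
        ConstantsSketch(**settings)
    except ValueError as err:
        return str(err)
    return "ok"


valid_report = try_build(ashrae_model_b=38)
invalid_report = try_build(number_of_pools="five", ashrae_model_b=-1)
print(valid_report)
print(invalid_report)
```

Reporting both failures in one message saves the user from fixing one setting only to hit the next error on the following run.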
Defaults#
The example above provides defaults for all values and you should do the same. This is
partly to give users some kind of a sense check of what expected values look like, but
also because it is easy to export example configurations as templates when all fields
have defaults. Defaults can either be provided by assignment - as with
ashrae_model_a: float = 95 - or be provided using Field(default=...).
When a model instance is created from configuration files (de-serialised), the defaults will be used to fill in any missing settings. This is extremely useful if a user wants to change a single setting without having to provide a complete configuration file.
Paths in configuration classes#
You may want your configuration file to point to resources stored in an external file, as in the example above. This should not be used to load array data that uses the core data axes, but can be used to load model specific initialisation data.
As an example, the plants model uses definitions of different plant functional types and
the initial plant cohort distributions. The most convenient way to provide these for the
model initialisation is in CSV files containing a data frame. Since this data is not
needed by the other models, it is passed to the model using the
pft_definitions_path and cohort_data_path configuration options.
There are some specific requirements for including paths in configuration models:
The Virtual Ecosystem allows users to provide multiple configuration files: this lets users build up a library of settings for different models and then specify a combination of different configurations.
These files are compiled into a single set of configuration data before validation. However, if those configuration files provide relative paths to data files, then the relative paths may well break when the data is compiled. For this reason, the compilation process resolves all paths in a given configuration file to absolute paths before compiling the data. Although settings may be typed as paths in a configuration class, the compilation step comes before validation and there is no type information available. For this reason, you must use the _path suffix on configuration options that provide file paths. This naming convention allows the Virtual Ecosystem configuration to manage file paths to ensure that file paths are preserved when configuration files are compiled.
File paths should obviously point to existing files, but that makes it hard to set meaningful default values for use in generating example or template configurations. The custom FILEPATH_PLACEHOLDER type used in the example above helps solve this issue. Under the hood, this type uses the pydantic FilePath, which will fail validation if the input path does not exist. It also sets the default value <PLACEHOLDER>, but has extended validation to specifically check that this placeholder default has not been left in a configuration file that is in use.
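The reason for the `_path` suffix can be seen in a short sketch of suffix-based path handling. This is an assumption about the general approach, not the actual Virtual Ecosystem implementation: `resolve_paths` is a hypothetical helper, and `PurePosixPath` is used only to keep the example deterministic across platforms.

```python
from pathlib import PurePosixPath
from typing import Any

PLACEHOLDER = "<PLACEHOLDER>"


def resolve_paths(
    settings: dict[str, Any], config_dir: PurePosixPath
) -> dict[str, Any]:
    """Make values of keys ending in _path absolute, relative to config_dir."""
    resolved: dict[str, Any] = {}
    for key, value in settings.items():
        if isinstance(value, dict):
            # Recurse into nested configuration sections
            resolved[key] = resolve_paths(value, config_dir)
        elif key.endswith("_path") and isinstance(value, str) and value != PLACEHOLDER:
            # Joining with an absolute path returns that path unchanged
            resolved[key] = str(config_dir / value)
        else:
            resolved[key] = value
    return resolved


# Settings as they might appear in a file stored in /projects/site_a
settings = {
    "freshwater": {
        "pond_data_path": "data/ponds.csv",
        "constants": {"ashrae_model_a": 96},
    },
    "plants": {"pft_definitions_path": "/shared/pfts.csv"},
}
resolved = resolve_paths(settings, PurePosixPath("/projects/site_a"))
print(resolved["freshwater"]["pond_data_path"])
```

Because the helper only sees plain dictionaries, the key name is the only signal it has that a value is a file path, which is why the suffix is mandatory.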
Defining the new model class#
The model file will define a new subclass of the
BaseModel class.
Required package imports#
You may of course need to import other packages or package members to support your model
code, but the following imports are typically needed to create a new BaseModel
subclass.
# The BaseModel.from_config factory method returns an instance of the class, and
# annotations is required to allow typing to understand this return value.
from __future__ import annotations

# To support the kwargs argument to BaseModel.__init__
from typing import Any

# Data in the Virtual Ecosystem is stored as xarray.DataArrays and array calculations
# typically use numpy.
import numpy as np
import xarray
from pint import Quantity

# These are the main imports required to set up a BaseModel instance:
# - the BaseModel itself
# - the Data class, used as a central data store within the simulation
# - a custom exception to cover model initialisation failure
# - the global LOGGER, used to report information to users.
from virtual_ecosystem.core.base_model import BaseModel
from virtual_ecosystem.core.data import Data
from virtual_ecosystem.core.exceptions import InitialisationError
from virtual_ecosystem.core.logger import LOGGER

# You will likely also have a set of imports of model specific code such as constants
# classes and other classes and functions. For example:
from virtual_ecosystem.models.freshwater.model_config import FreshwaterConstants
from virtual_ecosystem.models.freshwater.streamflow import calculate_streamflow
Defining the new class and class attributes#
Now create a new class that derives from the
BaseModel. This base class requires that you
also set a number of class attributes: these are bits of information about the model
that will be the same every time the model is used. These values are set as class
attributes by providing them as arguments to the class signature. You will end up with
something like the following:
class FreshwaterModel(
    BaseModel,
    model_name="freshwater",
    model_update_bounds=("1 day", "1 month"),
    vars_required_for_init=("temperature",),
    vars_populated_by_init=("pond_temperature",),
    vars_required_for_update=(
        "air_temperature",
        "relative_humidity",
        "atmospheric_pressure",
        "vapour_pressure_deficit",
        "precipitation",
    ),
    vars_populated_by_first_update=("average_P_concentration",),
    vars_updated=("average_P_concentration",),
):
    """Docstring describing model.

    Args:
        Describe the __init__ arguments here (see below)
    """
The model_name attribute provides a
short lower case name that is used throughout the simulation: for example, it is used to
identify the parts of the configuration data that apply to the model. The name must
match the chosen submodule name for the model, so the module
virtual_ecosystem.models.freshwater must use freshwater as the model name.
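Python supports keyword arguments in a class signature through the `__init_subclass__` hook. The sketch below shows that general mechanism with a hypothetical `BaseSketch` class; it is not the actual `BaseModel` implementation, and the check that rejects a bare string is an assumption added to highlight a common slip, where a one-element tuple is written without its trailing comma.

```python
class BaseSketch:
    """Stand-in base class that stores class signature arguments as attributes."""

    model_name: str
    vars_populated_by_init: tuple[str, ...]

    def __init_subclass__(
        cls, *, model_name: str, vars_populated_by_init: tuple[str, ...], **kwargs
    ) -> None:
        super().__init_subclass__(**kwargs)
        # A bare string usually means a one-element tuple is missing its comma
        if isinstance(vars_populated_by_init, str):
            raise TypeError("vars_populated_by_init must be a tuple of variable names")
        cls.model_name = model_name
        cls.vars_populated_by_init = tuple(vars_populated_by_init)


class FreshwaterSketch(
    BaseSketch,
    model_name="freshwater",
    vars_populated_by_init=("pond_temperature",),
):
    """A model sketch whose attributes are fixed when the class is defined."""


try:

    class BrokenSketch(
        BaseSketch,
        model_name="broken",
        vars_populated_by_init=("pond_temperature"),  # no trailing comma: a str
    ):
        """This class definition fails."""

except TypeError as err:
    definition_error = str(err)

print(FreshwaterSketch.model_name, FreshwaterSketch.vars_populated_by_init)
print(definition_error)
```

Because the values are attached to the class rather than to instances, they are available before any model instance is created, for example when working out which models depend on each other.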
The model_update_bounds
attribute sets two time intervals that define a lower and upper bound
on the update frequency that can reasonably be used with a model. Models updated
more often than the lower bound may fail to capture transient dynamics and models
updated more slowly than the upper bound may fail to capture important temporal
patterns. Each bound is a string that can be parsed by pint.Quantity
into a time period.
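A bounds check of this kind can be sketched without pint by converting each interval to seconds. The unit table and the `to_seconds` and `interval_within_bounds` helpers are hypothetical simplifications; the real code parses the strings with `pint.Quantity`, and how `BaseModel` reacts to an interval outside the bounds is not shown here.

```python
# Seconds per unit, using a mean month of 1/12 of a 365.25 day year
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "week": 604_800,
    "month": 2_629_800,
    "year": 31_557_600,
}


def to_seconds(interval: str) -> float:
    """Convert a string such as '1 day' or '2 weeks' into seconds."""
    value, unit = interval.split()
    # Accept simple plurals by stripping a trailing 's'
    return float(value) * SECONDS_PER_UNIT[unit.rstrip("s")]


def interval_within_bounds(update_interval: str, bounds: tuple[str, str]) -> bool:
    """Check that an update interval lies between the lower and upper bounds."""
    lower, upper = (to_seconds(bound) for bound in bounds)
    return lower <= to_seconds(update_interval) <= upper


bounds = ("1 day", "1 month")
print(interval_within_bounds("1 week", bounds))
print(interval_within_bounds("6 hours", bounds))
print(interval_within_bounds("2 months", bounds))
```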
Data requirements#
The remaining class attributes all start with vars_ and are used to define sets of
variables that will be shared across models in a central data store (a Data object)
for the simulation. The variables in this central data store are all arrays of data and
are structured across the core data axes in
the simulation.
New variables
If your model requires new variables - either to be loaded from initial data or that
your model writes to the Data object - you must add the variable details to the
data_variables.toml file.
These attributes define which variables the model reads from and writes to the central data store and when that happens during the model run. There are two main phases to running models within the simulation:
model initialisation, which sets up any core model structures and data once at the start of the simulation.
model updates, which run at every time step and modify the model structure and data throughout the simulation.
The vars_ attributes define which variables are needed at both of these stages, and are
critical to defining the model data dependencies and the sequence in which models can
run.
The first two attributes set data requirements during model initialisation:
The vars_required_for_init attribute sets which variables must be loaded into the Data object before your model can be initialised. These must either be:
included in the configured initial data that is loaded when the simulation starts, or
populated by a model that initialises earlier in the model sequence.
The vars_populated_by_init attribute sets which variables are written to the Data object when your model is initialised. These variables are then available for models later in the sequence.
The remaining three attributes set data requirements during each update:
The vars_required_for_update attribute defines the data that must be in the Data object for the model to be able to update. These variables can be:
provided in the initial data, often as time series of data that provides different values for each time step,
populated during the initialisation of any of the models, or
populated during the first update of another model that updates before your model.
The vars_populated_by_first_update attribute defines the variables that your model writes to the Data object when the model updates for the first time.
The vars_updated attribute records which variables in the Data object are altered when your model updates. This will typically include all variables in vars_populated_by_first_update but your model may also alter the state of other variables in the simulation.
Model dependencies
The vars_ attributes defined for your model are used to automatically detect model
dependencies and resolve the sequence in which the set of models included in a
simulation can run. For example, if your model requires variable A to be initialised
and that variable is provided during the initialisation of another model, this second
model must run first.
If a suitable order cannot be found, the simulation will stop with an error message describing the specific issue.
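Ordering models by their data dependencies is a topological sort. The sketch below uses the standard library `graphlib` module to illustrate the idea; it is an assumption about the general technique rather than the Virtual Ecosystem's own resolver, and the model specifications and variable names are made up for the example.

```python
from graphlib import CycleError, TopologicalSorter


def initialisation_order(models: dict[str, dict[str, set[str]]]) -> list[str]:
    """Order models so each variable is populated before a model requires it."""
    provider = {
        var: name
        for name, spec in models.items()
        for var in spec["populated_by_init"]
    }
    # Map each model to the models that must initialise before it. Variables
    # with no provider are assumed to come from the initial data.
    graph = {
        name: {
            provider[var]
            for var in spec["required_for_init"]
            if var in provider and provider[var] != name
        }
        for name, spec in models.items()
    }
    return list(TopologicalSorter(graph).static_order())


models = {
    "animal": {"required_for_init": {"pond_temperature"}, "populated_by_init": set()},
    "freshwater": {
        "required_for_init": {"soil_moisture"},
        "populated_by_init": {"pond_temperature"},
    },
    "hydrology": {
        "required_for_init": {"precipitation"},
        "populated_by_init": {"soil_moisture"},
    },
}
order = initialisation_order(models)
print(order)

# Two models that each need a variable from the other cannot be ordered
circular = {
    "a": {"required_for_init": {"y"}, "populated_by_init": {"x"}},
    "b": {"required_for_init": {"x"}, "populated_by_init": {"y"}},
}
try:
    initialisation_order(circular)
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

The circular case corresponds to the situation where no suitable order exists and the simulation has to stop.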
Defining the model __init__ method#
The next step is to define the __init__ method for the class. This needs to do a few
things, in this order:
It must call the __init__() method of the BaseModel() parent class, also known as the superclass: super().__init__(data, core_components, static)
Calling this method runs all of the shared core functionality across models, such as setting the update intervals and validating that the input data provides the required variables to run the model.
It should define any specific attributes of the new model class. For example, the configuration above defines a path to a CSV file of pond data: both the data loaded from that file and the set of model constants need to be provided to the model. These should be added to the signature of the __init__ method, alongside the required parameters of the base class, and then stored as attributes of the instance.
The method should then conditionally call the model _setup method. This method is used to run any code that is used to populate the initial state of the model.
The call must be conditional because it is possible to configure a model so that all of the model state, including the data generated by the _setup method, is fixed by the initial inputs. In this case, the model should not run the setup step: this is indicated if the model _run_setup attribute is False.
The __init__ method can also contain code that should be executed regardless of the static configuration. For example, some models can configure additional data export and so __init__ would then need to set up the exporter process even when the model is running in static mode.
The BaseModel() provides a basic __repr__ to provide a simple text representation of a class object. This just prints the class name and a set of properties. You can add some or all of your custom model properties to the _repr attribute to include them in the representation.
You should end up with something like this:
def __init__(
    self,
    data: Data,
    core_components: CoreComponents,
    update_interval: Quantity,
    community_data: pandas.DataFrame,
    constants: FreshwaterConstants,
    static: bool = False,
):
    # Call the __init__() method of the base class
    super().__init__(data, core_components, static)

    # Type and document attributes
    self.community_data: pandas.DataFrame
    """A data frame containing pond community cohort data for each cell."""
    self.constants: FreshwaterConstants
    """Constants for the model."""

    # Conditionally run setup steps.
    if self._run_setup:
        self._setup(community_data=community_data, constants=constants)

    # Save attribute names to be used by the __repr__
    self._repr.append("pond_data_path")
The _setup method#
The _setup method typically contains the bulk of the code that needs to run to set up
the initial state of the model and populate the data variables listed in the
vars_populated_by_init attribute. The method signature typically takes the
model specific arguments defined on the __init__ method and uses those values to
populate model attributes and calculate data values. It is typical for _setup to call
additional methods that you define on the class or functions from additional submodules.
Following the example above:
def _setup(
    self, community_data: pandas.DataFrame, constants: FreshwaterConstants
) -> None:
    """Set up the freshwater model."""

    self.community_data = community_data
    self.constants = constants

    # Populate a variable in the Data object using a user defined method
    self.data["pond_temperature"] = calculate_pond_temperature(
        data=self.data, constants=self.constants, time_index=0
    )
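The interaction between __init__, the static flag and _setup can be shown with plain Python classes. This is a deliberately simplified sketch: `BaseSketch` and `FreshwaterSketch` are hypothetical, the data store is an ordinary dictionary, and deriving `_run_setup` directly from `static` is an assumption, since the real BaseModel may apply further checks.

```python
class BaseSketch:
    """Stand-in for the shared base class behaviour."""

    def __init__(self, data: dict, static: bool = False) -> None:
        self.data = data
        # Assumption: setup runs unless the model is configured as static
        self._run_setup = not static


class FreshwaterSketch(BaseSketch):
    def __init__(self, data: dict, constants: dict, static: bool = False) -> None:
        super().__init__(data, static)
        self.constants = constants
        # Only populate the initial state when setup is required
        if self._run_setup:
            self._setup()

    def _setup(self) -> None:
        """Populate the variables that the model provides at initialisation."""
        n_pools = self.constants["number_of_pools"]
        initial = self.constants["initial_temperature"]
        self.data["pond_temperature"] = [initial] * n_pools


constants = {"number_of_pools": 3, "initial_temperature": 15.0}

# A normal run: setup writes the initial pond temperatures
dynamic = FreshwaterSketch(data={}, constants=constants)

# A static run: the values supplied in the initial data are left untouched
supplied = {"pond_temperature": [12.0, 13.0, 14.0]}
static_model = FreshwaterSketch(data=supplied, constants=constants, static=True)

print(dynamic.data["pond_temperature"])
print(static_model.data["pond_temperature"])
```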
The _update method#
The _update method must then be defined to calculate the changes in the model state
that occur at each time step. The function must have a time_index argument, which is
used by some models to iterate over data that follows a time series through a
simulation, such as climatic variables.
def _update(self, time_index: int) -> None:
    """Function to update the freshwater model.

    Args:
        time_index: The index representing the current time step in the data object.
    """

    # Recalculate the pond temperature based on the current conditions
    self.data["pond_temperature"] = calculate_pond_temperature(
        data=self.data, constants=self.constants, time_index=time_index
    )
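The role of time_index is easiest to see with a small time series. In this sketch the data store is a plain dictionary, and `update_pond_temperature` applies an invented relaxation towards air temperature: the formula and the `response` parameter are purely illustrative and are not the `calculate_pond_temperature` function used above.

```python
# Air temperature time series: one row per time step, one value per cell
data = {
    "air_temperature": [[18.0, 19.5], [20.0, 21.0], [17.5, 18.0]],
    "pond_temperature": [15.0, 15.0],
}


def update_pond_temperature(data: dict, time_index: int, response: float = 0.5) -> None:
    """Move each pond temperature part way towards the current air temperature."""
    # Select the slice of the time series for the current time step
    current_air = data["air_temperature"][time_index]
    data["pond_temperature"] = [
        pond + response * (air - pond)
        for pond, air in zip(data["pond_temperature"], current_air)
    ]


# The simulation loop passes an increasing time index to each update
for time_index in range(len(data["air_temperature"])):
    update_pond_temperature(data, time_index)

final_temperatures = data["pond_temperature"]
print(final_temperatures)
```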
The from_config factory method#
The job of the from_config method for a model is to take a validated configuration and
then do any processing and validating to convert the configuration into the arguments
required by the __init__ method. The configuration object will contain sections for
all of the models being used in a simulation, so you should extract the configuration
for your model and then do any processing - this might simply be passing sections of the
configuration to the __init__ method or might need to do some pre-processing, such as
loading additional model specific data.
The method then uses those parsed arguments to actually call the __init__ method and
return an initialised instance of the model using the settings. The from_config
method should raise an InitialisationError if the configuration fails.
As an example:
@classmethod
def from_config(
    cls, data: Data, configuration: Configuration, update_interval: Quantity
) -> FreshwaterModel:
    """Factory function to initialise the freshwater model from configuration.

    This function unpacks the relevant information from the configuration file, and
    then uses it to initialise the model. If any information from the config is
    invalid rather than returning an initialised model instance an error is raised.

    Args:
        data: A :class:`~virtual_ecosystem.core.data.Data` instance.
        configuration: A validated Virtual Ecosystem model configuration object.
        update_interval: Frequency with which all models are updated
    """

    # Extract the model configuration from the complete configuration.
    model_config: FreshwaterConfiguration = configuration.get_subconfiguration(
        "freshwater", FreshwaterConfiguration
    )

    # Load the community data into a data frame
    community_data = pandas.read_csv(model_config.pond_data_path)
    constants = model_config.constants

    # Run a model specific function to validate the community data
    if not check_community_data(community_data):
        raise InitialisationError("Pond community data is not valid")

    LOGGER.info(
        "Information required to initialise the freshwater model successfully extracted."
    )
    return cls(
        data=data,
        update_interval=update_interval,
        community_data=community_data,
        constants=constants,
    )
Additional data inputs to a model
Most of the data in a Virtual Ecosystem simulation is loaded into the central Data
object and shared between the models. However, you may need to load additional data to
initialise your model that is only used within the model and not shared through the
Data object. You might share summary data with other models through the Data
object - these are variables that will be included in vars_populated_by_init or
vars_populated_by_first_update.
The preferred way to do this is to add a configuration option that points to a file
containing data to load - such as the pond_data_path in the example above. The
from_config method should handle loading the data and converting it into a Python
object that is one of the arguments to the model __init__ method. This approach
separates data loading from the model processing and makes it easier to test and run the
model class.
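The benefit of this separation can be sketched with the standard library alone. In the example below, `FreshwaterSketch`, `InitialisationErrorSketch` and the CSV column names are hypothetical, the configuration is a plain dictionary, and the `csv` module stands in for pandas. The factory method does the file handling, so the class itself can be constructed directly from in-memory data in tests.

```python
import csv
import tempfile
from pathlib import Path


class InitialisationErrorSketch(Exception):
    """Stand-in for the InitialisationError exception."""


class FreshwaterSketch:
    def __init__(self, community_data: list[dict[str, str]], constants: dict) -> None:
        # __init__ only receives ready-made Python objects
        self.community_data = community_data
        self.constants = constants

    @classmethod
    def from_config(cls, configuration: dict) -> "FreshwaterSketch":
        """Load and check model specific data, then build the model."""
        model_config = configuration["freshwater"]
        with open(model_config["pond_data_path"], newline="") as handle:
            community_data = list(csv.DictReader(handle))
        # A minimal, made up validity check on the loaded data
        if not community_data or "cell_id" not in community_data[0]:
            raise InitialisationErrorSketch("Pond community data is not valid")
        return cls(community_data=community_data, constants=model_config["constants"])


# Build from configuration: the factory method reads the file
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "pond_data.csv"
    csv_path.write_text("cell_id,n_fish\n0,12\n1,7\n")
    configuration = {
        "freshwater": {
            "pond_data_path": str(csv_path),
            "constants": {"number_of_pools": 5},
        }
    }
    from_file = FreshwaterSketch.from_config(configuration)

# Build directly: no files are needed, which keeps unit tests simple
in_memory = FreshwaterSketch(
    community_data=[{"cell_id": "0", "n_fish": "3"}], constants={}
)
print(from_file.community_data)
```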
Other model steps#
There are currently two other methods that must be included as part of the model class. Neither of these is currently used, so they can simply be included as function stubs with docstrings as shown below:
def spinup(self) -> None:
    """Placeholder function to spin up the freshwater model."""

def cleanup(self) -> None:
    """Placeholder function for freshwater model cleanup."""