API documentation for the `data` module#

The data module handles the population and storage of data sources used to run Virtual Ecosystem simulations.

The Data class#

The core Data class is used to store data for the variables used in a simulation. It can be used both for data from external sources - for example, data used to set the initial environment or time series of inputs - and for internal variables used in the simulation. The class behaves like a dictionary - so data can be retrieved and set using data_object['varname'] - but also provides validation for data being added to the object.

All data added to the class is stored in a Dataset object, and data extracted from the object will be a DataArray. The Dataset can also be accessed directly using the data attribute of the class instance to use any of the Dataset class methods.

When data is added to a Data instance, it is automatically validated against the configuration of a simulation before being added to the data attribute. The validation process also stores information that allows models to confirm that a given variable has been successfully validated.

The core of the Data class is the __setitem__() method. This method provides the following functionality:

It allows a DataArray to be added to a Data instance using the data['varname'] = data_array syntax.
It applies the validation step using the validate_dataarray() function. See the axes module for the details of the validation process, including the AxisValidator class and the concept of core axes.
It inserts the data into the Dataset instance stored in the data attribute.
Lastly, it records the data validation details in the variable_validation attribute.
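The four steps above can be illustrated with a small, self-contained sketch. It is not the Virtual Ecosystem implementation: `MiniData`, `validate_values`, the check name and the `"time"` axis name are hypothetical stand-ins, and plain lists replace DataArray objects so that the example runs with the standard library alone.

```python
from typing import Optional


def validate_values(values: list[float], n_cells: int) -> dict[str, Optional[str]]:
    """Hypothetical stand-in for validate_dataarray().

    Returns the name of the check applied to each axis, or None where no
    check was applied. The axis and check names are illustrative only.
    """
    if len(values) != n_cells:
        raise ValueError(f"Expected {n_cells} values, got {len(values)}")
    return {"spatial": "cell_count_check", "time": None}


class MiniData:
    """Toy dictionary-like store that validates values as they are set."""

    def __init__(self, n_cells: int) -> None:
        self.n_cells = n_cells
        self.data: dict[str, list[float]] = {}
        self.variable_validation: dict[str, dict[str, Optional[str]]] = {}

    def __setitem__(self, key: str, value: list[float]) -> None:
        # Step 2: validate before anything is stored.
        validation = validate_values(value, self.n_cells)
        # Step 3: insert the validated values into the internal store.
        self.data[key] = value
        # Step 4: record which checks were applied to which axes.
        self.variable_validation[key] = validation


mini = MiniData(n_cells=3)
mini["temperature"] = [20.1, 21.4, 19.8]  # Step 1: dictionary-style assignment
```

Because validation runs first, a value that fails the check raises an error and never reaches the internal store.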

The Data class also provides three shorthand methods to get information and data from an instance.

The __contains__() method tests if a named variable is included in the internal Dataset instance.
# Equivalent code
'varname' in data
'varname' in data.data
The __getitem__() method is used to retrieve a named variable from the internal Dataset instance.
# Equivalent code
data['varname']
data.data['varname']
The on_core_axis() method queries the variable_validation attribute to confirm that a named variable has been validated on a named axis.
# Test that the temperature variable has been validated on the spatial axis
data.on_core_axis('temperature', 'spatial')

Adding data from a file#

The general solution for programmatically adding data from a file is to:

manually open a data file using an appropriate reader package for the format,
coerce data from named variables into properly structured DataArray objects, and then
use the __setitem__() method to validate and add it to a Data instance.

The load_to_dataarray() function implements data loading to a DataArray for some known file formats, using file reader functions described in the readers module. See the details of that module for supported formats and for extending the system to additional file formats.

# Load temperature data from a supported file
from virtual_ecosystem.core.readers import load_to_dataarray
results = load_to_dataarray(
    '/path/to/supported/format.nc', var_names=['temperature']
)
data['temperature'] = results['temperature']

Using a data configuration#

A Data instance can also be populated using the load_data_config() method. This method expects a properly validated configuration object, typically created from TOML files (see ConfigurationLoader). The expected structure of the data configuration section within those TOML files is as follows:

[[core.data.variable]]
file_path="/path/to/file.nc"
var_name="precip"
[[core.data.variable]]
file_path="/path/to/file.nc"
var_name="temperature"
[[core.data.variable]]
file_path="/path/to/a/different/file.nc"
var_name="elev"

You can include `core.data.variable` tags in different files. This can be useful to group model-specific data with other model configuration options, and allow configuration files to be swapped in a more modular fashion. However, the data configurations across all files must not contain repeated data variable names.

# Load configured datasets
data.load_data_config(config)

Classes:

`Data`(grid)	The Virtual Ecosystem data object.
`DataGenerator`(spatial_axis, temporal_axis, ...)	Generate artificial data.

class virtual_ecosystem.core.data.Data(grid: Grid)[source]#

The Virtual Ecosystem data object.

This class holds data for a Virtual Ecosystem simulation. It functions like a dictionary, but extends the dictionary methods to provide common data validation methods and to hold key attributes, such as the underlying spatial grid.

Parameters: grid – The Grid instance that will be used for simulation.
Raises: TypeError – when grid is not a Grid object

Methods:

`__contains__`(key)	Check if a given data variable is present in a Data instance.
`__getitem__`(key)	Get a given data variable from a Data instance.
`__repr__`()	Returns a representation of a Data instance.
`__setitem__`(key, value)	Load a data array into a Data instance.
`add_from_dict`(output_dict)	Update data object from dictionary of variables.
`load_data_config`(config)	Setup the simulation data from a user configuration.
`on_core_axis`(var_name, axis_name)	Check core axis validation.
`save_current_state_to_zarr`(output_file_path, ...)	Export requested variables in current data state to `zarr` format.
`save_to_zarr`(output_file_path[, group, ...])	Save variables from the data object to a Zarr store.

Attributes:

`data`	The `Dataset` used to store data.
`grid`	The configured Grid to be used in a simulation.
`known_variables`	A dictionary of known variables.
`variable_validation`	Records validation details for loaded variables.

__contains__(key: str) → bool[source]#

Check if a given data variable is present in a Data instance.

This method provides the var_name in data_instance functionality for a Data instance. This is just a shortcut: var in data_instance is the same as var in data_instance.data.

Parameters: key – A data variable name

__getitem__(key: str) → DataArray[source]#

Get a given data variable from a Data instance.

This method looks for the provided key in the data variables saved in the data attribute and returns the DataArray for that variable. Note that this is just a shortcut: data_instance['var'] is the same as data_instance.data['var'].

Parameters: key – The name of the data variable to get
Raises: KeyError – if the data variable is not present

__repr__() → str[source]#: Returns a representation of a Data instance.

__setitem__(key: str, value: DataArray) → None[source]#

Load a data array into a Data instance.

This method takes an input DataArray object and then matches the dimension and coordinates signature of the array to find a loading routine given the grid used in the Data instance. That routine is used to validate the DataArray and then add the DataArray to the Dataset object or replace the existing DataArray under that key.

Note that the DataArray name is expected to match the standard internal variable names used in Virtual Ecosystem and this is enforced against the dictionary of known variables.

The method also adds unit and description metadata from the known variables database to the attributes of each variable as it is written to the data object.

Parameters:

key – The name to store the data under
value – The DataArray to be stored

Raises:

TypeError – when the value is not a DataArray.
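The name check and metadata step described above can be sketched with standard library types only. Everything here is a hypothetical stand-in: `ToyArray` replaces DataArray, `KNOWN_VARIABLES` and its unit and description values are invented for illustration, and the exception raised for an unknown name is an assumption, since only the `TypeError` case is documented.

```python
from dataclasses import dataclass


@dataclass
class ToyArray:
    """Hypothetical stand-in for a DataArray: values plus an attrs mapping."""

    values: list[float]
    attrs: dict[str, str]


# Hypothetical known-variables table holding unit and description metadata.
KNOWN_VARIABLES = {
    "temperature": {"unit": "C", "description": "Air temperature"},
}


def store(dataset: dict[str, ToyArray], key: str, value: object) -> None:
    """Type-check, name-check and annotate a value before storing it."""
    if not isinstance(value, ToyArray):
        raise TypeError("Only ToyArray objects can be stored")
    if key not in KNOWN_VARIABLES:
        # The exception type used for unknown names is an assumption.
        raise KeyError(f"Unknown variable name: {key}")
    # Copy the unit and description metadata onto the stored object.
    value.attrs.update(KNOWN_VARIABLES[key])
    dataset[key] = value  # adds a new entry or replaces an existing one


dataset: dict[str, ToyArray] = {}
store(dataset, "temperature", ToyArray([20.1, 21.4], attrs={}))
```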

add_from_dict(output_dict: dict[str, DataArray]) → None[source]#

Update data object from dictionary of variables.

This function takes a dictionary of updated variables to replace the corresponding variables in the data object. If a variable is not in data, it is added. This will need to be reassessed as the model evolves. TODO: we might want to split the function into strict ‘replace’ and ‘add’ functionalities.

Parameters: output_dict – dictionary of variables from submodule
Returns: an updated data object for the current time step

data#: The Dataset used to store data.

grid: Grid#: The configured Grid to be used in a simulation.

known_variables: dict[str, VariableMetadata]#: A dictionary of known variables.

load_data_config(config: CoreConfiguration) → None[source]#

Setup the simulation data from a user configuration.

This method is used to validate a provided user data configuration and populate the Data instance object from the provided data sources. The data_config dictionary can contain a ‘variable’ key containing an array of dictionaries providing the path to the file (file_path) and the name of the variable within the file (var_name). The function groups variables by their source file path, so that each file is only opened once to load the requested variables.

Parameters: config – A validated Virtual Ecosystem model configuration object.
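The grouping step can be sketched as follows. This is a minimal illustration using plain dictionaries shaped like the `variable` array, not the method's actual code; the helper name `group_by_file` is hypothetical.

```python
from collections import defaultdict

# Entries shaped like the 'variable' array in the configuration (paths are placeholders).
variable_entries = [
    {"file_path": "/path/to/file.nc", "var_name": "precip"},
    {"file_path": "/path/to/file.nc", "var_name": "temperature"},
    {"file_path": "/path/to/a/different/file.nc", "var_name": "elev"},
]


def group_by_file(entries: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each source file path to the variable names requested from it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry["file_path"]].append(entry["var_name"])
    return dict(grouped)


files_to_open = group_by_file(variable_entries)
# Each file would then be opened once to load all of its listed variables.
```

With these three entries, two files are opened rather than three, since `precip` and `temperature` share a source file.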

on_core_axis(var_name: str, axis_name: str) → bool[source]#

Check core axis validation.

This function checks if a given variable loaded into a Data instance has been validated on one of the core axes.

Parameters:

var_name – The name of a variable
axis_name – The core axis name

Returns:

A boolean indicating if the variable was validated on the named axis.

Raises:

ValueError – if the variable name or core axis name is unknown, or if the variable validation data in the Data instance does not include the variable, which would be an internal programming error.

save_current_state_to_zarr(output_file_path: Path, time_index: int, timestamp: datetime64, variables_to_save: list[str] = [], group: str | None = None) → None[source]#

Export requested variables in current data state to zarr format.

Parameters:

output_file_path – Path to the zarr data store.
time_index – The time index of the slice being saved
timestamp – The timestamp of the start of the timeslice
variables_to_save – An optional list of variables to be exported.
group – An optional zarr group to export the data to.

save_to_zarr(output_file_path: Path, group: str | None = None, variables_to_save: list[str] | None = None) → None[source]#

Save variables from the data object to a Zarr store.

Either the whole contents of the data object or specific variables of interest can be saved using this function.

Parameters:

output_file_path – Path location to save the Virtual Ecosystem model state.
group – A zarr group to export the data to.
variables_to_save – List of variables to be saved, defaulting to all variables.

variable_validation: dict[str, dict[str, str | None]]#

Records validation details for loaded variables.

The validation details for each variable are stored in this dictionary using the variable name as a key. The validation details are a dictionary, keyed using core axis names, of the AxisValidator subclass applied to that axis. If no validator was applied, the entry for that core axis will be None.
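To make the structure concrete, the sketch below builds a dictionary of this shape and queries it in the way on_core_axis() is described as doing. The axis name `"time"`, the validator name `ExampleSpatialValidator` and the standalone function are hypothetical; the real method belongs to the Data class and its checks may differ.

```python
from typing import Optional

# Hypothetical core axis names and validator names, for illustration only.
CORE_AXES = ("spatial", "time")

variable_validation: dict[str, dict[str, Optional[str]]] = {
    "temperature": {"spatial": "ExampleSpatialValidator", "time": None},
}


def on_core_axis(var_name: str, axis_name: str) -> bool:
    """Report whether a validator was applied to the variable on the axis."""
    if var_name not in variable_validation:
        raise ValueError(f"Unknown variable name: {var_name}")
    if axis_name not in CORE_AXES:
        raise ValueError(f"Unknown core axis name: {axis_name}")
    # A None entry means no validator was applied on that axis.
    return variable_validation[var_name][axis_name] is not None


spatially_validated = on_core_axis("temperature", "spatial")
time_validated = on_core_axis("temperature", "time")
```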

class virtual_ecosystem.core.data.DataGenerator(spatial_axis: str, temporal_axis: str, temporal_interpolation: timedelta64, seed: int | None, method: str, **kwargs: Any)[source]#

Generate artificial data.

Currently just a signature sketch.
