Background

A detailed explanation of the model can be found in the method paper:

Hagen O, Flück B, Fopp F, Cabral JS, Hartig F, et al. (2021) gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth’s biodiversity. PLOS Biology 19(7): e3001340.

Gen3sis is a process-based model designed to simulate the evolution and ecology of diversifying lineages over a paleo-environmental landscape. Within the Landscape Ecology Group, gen3sis is used by several projects to explore evolutionary processes and test hypotheses about the mechanisms shaping biodiversity in plants, tetrapods, lizards and fish, among others.

Gen3sis relies on two inputs: (i) raster-based, time-varying landscape layers and (ii) a configuration file with four core model functions. The landscape defines the arena where the simulation occurs, while the config defines the parameters of the simulation and the eco-evolutionary rules. Thus, gen3sis is highly modular and can adapt to almost every study system.

We highly recommend base R's browser() function for troubleshooting configuration files. Placed on a line of the config, it pauses the simulation when that line is reached and lets users explore the environment at that point.

Core model objects

Species

Information on all extinct and extant species within a simulation is stored within a species object. For each species, this includes: abundance per occupied cell, trait values per occupied cell, and a compressed matrix of pairwise divergence values between occupied cells.
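To make the three components concrete, here is a minimal sketch of such a record in Python. It is only an illustration: the class, field and method names are hypothetical, gen3sis itself stores species as R objects, and the sparse pair dictionary is merely one assumed way of "compressing" the divergence matrix.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    """Illustrative species record; not the R structure used by gen3sis."""
    species_id: int
    # cell id -> abundance in that cell
    abundance: dict = field(default_factory=dict)
    # cell id -> {trait name: value}
    traits: dict = field(default_factory=dict)
    # frozenset({cell_a, cell_b}) -> divergence; pairs not stored count as 0
    divergence: dict = field(default_factory=dict)

    def occupied_cells(self):
        """Cells where the species has a positive abundance."""
        return sorted(cell for cell, n in self.abundance.items() if n > 0)

    def is_extinct(self):
        """Extinct species stay in the record but occupy no cells."""
        return not self.occupied_cells()

    def divergence_between(self, cell_a, cell_b):
        """Symmetric lookup of the pairwise divergence value."""
        return self.divergence.get(frozenset((cell_a, cell_b)), 0.0)


# Example values are invented for illustration.
lizard = Species(
    species_id=1,
    abundance={"c1": 12.0, "c2": 3.0},
    traits={"c1": {"temp": 21.0}, "c2": {"temp": 19.5}},
    divergence={frozenset(("c1", "c2")): 2.0},
)
```

Storing only the non-zero pairs keeps the record small when most populations of a species have not diverged from each other.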

Landscape objects

The user-provided landscape is the framework over which the model runs and comprises two objects: the landscape and the distance matrices. The landscape is a list of data frames. Each data frame contains the environmental variable values for each cell in the landscape at each time step. Additionally, there is one distance matrix for each time step. These matrices store the pairwise dispersal cost values between habitable cells across the landscape.
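The relationship between the two objects can be sketched as follows. This is an assumed layout for illustration only: the names are hypothetical, the sketch takes one table per environmental variable with one value per time step, and it treats a missing value (None) as marking an uninhabitable cell.

```python
# One table per environmental variable: cell id -> value at each time step.
# None marks a cell that is not habitable at that step (an assumption here).
landscape = {
    "temp": {"c1": [20.0, 19.0], "c2": [18.0, None], "c3": [15.0, 14.5]},
    "prec": {"c1": [800.0, 750.0], "c2": [600.0, None], "c3": [900.0, 950.0]},
}

# One distance "matrix" per time step, holding dispersal costs between
# habitable cells only. Keys are alphabetically sorted cell pairs.
distances = [
    {("c1", "c2"): 1.0, ("c1", "c3"): 2.5, ("c2", "c3"): 1.5},
    {("c1", "c3"): 2.5},
]


def habitable_cells(landscape, step):
    """Cells that have a value for every variable at this time step."""
    cells = None
    for table in landscape.values():
        present = {c for c, series in table.items() if series[step] is not None}
        cells = present if cells is None else cells & present
    return sorted(cells or [])


def dispersal_cost(distances, step, cell_a, cell_b):
    """Symmetric cost lookup between two habitable cells."""
    if cell_a == cell_b:
        return 0.0
    return distances[step][tuple(sorted((cell_a, cell_b)))]
```

In this toy example cell c2 drops out at the second time step, so the second distance matrix only covers the remaining pair.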

Configuration file

The configuration file contains all the simulation functions and variable definitions to be used in your simulation run. This includes how you implement the core model functions described below as well as the parameters you would like to use.

Core model functions

Initialisation

The first step of any simulation is to populate your landscape with one or more species and assign them traits. This includes geographic distribution, types of traits to be implemented, and values for those traits. Every species in the simulation is stored in a list and consists of an ID, its abundance, trait values, and intraspecies divergence values.
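As a rough illustration of this step, the sketch below seeds a single ancestor species. The function name, the dictionary layout and the starting abundance of 1 are assumptions made for the example and do not reproduce the gen3sis API.

```python
def create_ancestor(start_cells, habitable, trait_values, species_id=1):
    """Seed one species in the requested cells that are habitable.

    Every starting population gets abundance 1 and its own copy of the
    initial trait values; no divergence has accumulated yet.
    """
    cells = [cell for cell in start_cells if cell in habitable]
    return {
        "id": species_id,
        "abundance": {cell: 1.0 for cell in cells},
        "traits": {cell: dict(trait_values) for cell in cells},
        "divergence": {},
    }


# Cell c9 is requested but not habitable, so it is skipped.
ancestor = create_ancestor(
    start_cells=["c1", "c2", "c9"],
    habitable={"c1", "c2", "c3"},
    trait_values={"temp": 20.0, "dispersal": 1.0},
)
all_species = [ancestor]  # the simulation keeps its species in a list
```

Copying the trait values per cell matters because traits are tracked per occupied cell and can later change independently in each population.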

Speciation

The speciation function determines how you want new species to form. This is based on a divergence counter and speciation threshold. Once two clusters within a species diverge enough to reach the speciation threshold, a new species forms. How the counter behaves is entirely up to users!
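One possible counter behaviour is sketched below: divergence grows by a fixed amount per time step while two clusters are isolated and shrinks again once they reconnect. The function names, the increase and decrease rates and the pair-wise bookkeeping are assumptions for illustration, not the rule built into gen3sis.

```python
def step_divergence(divergence, isolated_pairs, increase=1.0, decrease=1.0):
    """Advance the divergence counters by one time step.

    divergence: (cluster_a, cluster_b) -> current counter value
    isolated_pairs: pairs with no gene flow during this step
    """
    updated = {}
    for pair, value in divergence.items():
        if pair in isolated_pairs:
            updated[pair] = value + increase
        else:
            updated[pair] = max(0.0, value - decrease)
    return updated


def pairs_to_split(divergence, threshold):
    """Cluster pairs whose counter has reached the speciation threshold."""
    return sorted(pair for pair, value in divergence.items() if value >= threshold)


# Clusters A and B stay isolated; A and C are back in contact.
counters = {("A", "B"): 0.0, ("A", "C"): 2.0}
for _ in range(3):
    counters = step_divergence(counters, isolated_pairs={("A", "B")})
new_species = pairs_to_split(counters, threshold=3.0)
```

With a threshold of 3, the A and B clusters separate after three isolated time steps, while the A and C counter decays back to zero.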

Dispersal

How species disperse across the landscape is determined by simulated migration events between habitable cells. The range and behaviour of these events are set by users and can be based on fixed values, draws from probability distributions, species trait values, or anything else thought up by users.
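The sketch below illustrates two of these options, a fixed dispersal value and a draw from a probability distribution. It assumes that a migration event succeeds when the drawn dispersal value is at least the dispersal cost between source and target cell; the function name, the cost values and the exponential distribution are hypothetical choices for the example.

```python
import random


def colonised_cells(occupied, costs, draw):
    """Return the free cells reached by at least one dispersal event.

    costs: (source, target) -> dispersal cost between habitable cells
    draw:  function returning one dispersal value per attempted event
    """
    reached = set()
    for (source, target), cost in costs.items():
        if source in occupied and target not in occupied and draw() >= cost:
            reached.add(target)
    return sorted(reached)


costs = {("c1", "c2"): 1.0, ("c1", "c3"): 50.0, ("c2", "c3"): 2.0}

# Option 1: every event has the same fixed range.
fixed = colonised_cells({"c1"}, costs, draw=lambda: 1.5)

# Option 2: ranges drawn from an exponential distribution with mean 1.5.
rng = random.Random(42)
sampled = colonised_cells({"c1"}, costs, draw=lambda: rng.expovariate(1 / 1.5))
```

Passing the draw as a function keeps the colonisation logic unchanged when the dispersal rule is swapped, for example for one that reads a dispersal trait of the species.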

Mutation

Each species can have traits set by users. This function provides the opportunity to modify or "mutate" these traits with each time step in the simulation. How these traits change (or not) is entirely customisable.
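A simple way to picture this step is a small random perturbation of every trait value in every occupied cell. The sketch below assumes Gaussian noise and traits bounded between 0 and 1; both choices, as well as the function name, are illustrative and not prescribed by gen3sis.

```python
import random


def mutate_traits(traits, sd, rng, lower=0.0, upper=1.0):
    """Return a copy of the traits with Gaussian noise added per cell.

    traits: cell id -> {trait name: value}
    sd:     standard deviation of the change per time step (0 = no change)
    """
    return {
        cell: {
            name: min(upper, max(lower, value + rng.gauss(0.0, sd)))
            for name, value in values.items()
        }
        for cell, values in traits.items()
    }


rng = random.Random(1)
traits = {"c1": {"temp": 0.50, "dispersal": 0.20}, "c2": {"temp": 0.55, "dispersal": 0.20}}
unchanged = mutate_traits(traits, sd=0.0, rng=rng)
mutated = mutate_traits(traits, sd=0.05, rng=rng)
```

Because the noise is drawn separately for each cell, populations of the same species can drift apart in trait space over time.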

Ecology

Every species present in a cell across the landscape has an abundance value representing the population size of the species in that location. The ecology function allows users to modify this abundance value based on environmental values, species interactions, or anything else relevant to the study. An abundance of 0 leads to the extinction of the species in that cell.
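As one hedged example of such a rule, the sketch below scales abundance by a Gaussian match between the local environment and the population's trait optimum, and sets very small populations to 0 so that they go locally extinct. The function names, the niche shape and the minimum abundance are assumptions for illustration only.

```python
import math


def apply_niche(abundance, optimum, environment, width, minimum=0.05):
    """Return the new abundance per cell after environmental filtering.

    abundance:   cell id -> current abundance
    optimum:     cell id -> trait optimum of the local population
    environment: cell id -> environmental value in that cell
    width:       niche width (standard deviation of the Gaussian)
    """
    updated = {}
    for cell, n in abundance.items():
        mismatch = environment[cell] - optimum[cell]
        suitability = math.exp(-(mismatch ** 2) / (2 * width ** 2))
        value = n * suitability
        # An abundance of 0 means the species is extinct in this cell.
        updated[cell] = value if value >= minimum else 0.0
    return updated


def surviving_cells(abundance):
    """Cells in which the species still has a positive abundance."""
    return sorted(cell for cell, n in abundance.items() if n > 0)


after = apply_niche(
    abundance={"c1": 1.0, "c2": 1.0},
    optimum={"c1": 20.0, "c2": 20.0},
    environment={"c1": 20.0, "c2": 30.0},
    width=2.0,
)
```

Here the population in c1 matches its environment and persists, whereas the population in c2 is ten units away from its optimum and is removed.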

Limitations

Paleo-environmental data

Investigations of deep-time paleoclimatic influences on biodiversity are still limited by our current mechanistic knowledge of eco-evolutionary processes and by computational power, as well as by the availability of paleo-environmental reconstructions (Svenning et al., 2015, Franklin et al., 2017, Pontarp et al., 2019). Biodiversity dynamics and climatic variations happening on smaller spatio-temporal scales have to be ignored due to the uncertainty in paleo-landscape reconstructions.

Distribution, fossil and phylogenetic data

Biodiversity data are essential when evaluating the processes implemented in eco-evolutionary models against empirical biodiversity patterns. In order to perform the evaluation, multiple past and present biological empirical datasets can be used, such as: (i) fossil records, (ii) calibrated molecular phylogenies, (iii) population genetic data, (iv) trait measurements, and (v) species distribution maps. The combination of multiple datasets, such as phylogenies and fossils, provides a better picture of past dynamic processes (Huang et al., 2015, Hagen et al., 2018, Coiro et al., 2019). The main gaps remaining in biodiversity data, as pointed out in multiple studies (Franklin, 2010, Hampton et al., 2015, Meyer et al., 2016), are: (i) sparse data with regional biases, (ii) a lack of non-occurrence data reporting, (iii) poor availability of public data, and (iv) high heterogeneity in data quality and methodologies.

Model complexity

The modelling engine gen3sis is predominantly a theoretical model rather than exclusively a calculating tool, since it predicts the responses of possible natural processes (Guisan and Zimmermann, 2000). By prioritizing the theoretical correctness of the predicted response over its precision, the spatio-temporal mechanistic model was made as flexible as possible in order to explore multiple hypotheses and processes.

Computational time

Runtimes are heavily dependent on the number of species emerging during a simulation and their geographical extent, and thus are highly dependent on the assumed model parameters and input landscape. The current state of optimizations is limited because no parallelization is implemented, as ease of maintenance and development are prioritized for this initial release.

Core functions

Given the inherent computational limitations, gen3sis tries to incorporate all the processes at the level of geographical ranges of populations, as realistically as possible. However, the modelled objects are limited to geographic populations and species. Also, it is not possible to track cluster phylogenies. Moreover, there is no within-cell variation within a species.

Caveats

temporal behaviour: No variables in gen3sis have an explicit temporal component. It is always x/timestep. That means that most processes will either speed up or slow down if one changes the temporal resolution of the input. For example, a configured rate of dispersal of 1/timestep can become 10/million years or 2/million years for simulations with 10 and 2 timesteps per million years respectively.
raster inputs: The behaviour of the species dispersal and geographic clustering depends on the spatial input resolution. Changing input resolution can have secondary effects. For example, a higher-resolution landscape input will have more cells. Not only will it be easier to reach neighbouring cells, but there will also be more chances for dispersal events to happen. As such there might be a non-linear dependency between dispersal values and input resolution.
polar distortions: Our distance calculations correct for the distance distortions that raster-based data experience towards the polar regions. This leads to many small cells lying close together in the polar regions, which makes dispersal a lot easier there than between equatorial cells in two ways. First, the dispersal distances are often enough to reach multiple neighbours. Second, since there are many more cells close by, more dispersal events can happen.
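The arithmetic behind the temporal and polar caveats can be sketched in a few lines. The function names are hypothetical, and the cell-width formula assumes a regular latitude-longitude grid on a spherical Earth, which may differ from the raster and distance calculation used in a given study.

```python
import math


def rate_per_million_years(rate_per_step, steps_per_million_years):
    """Convert a per-timestep rate into a rate per million years."""
    return rate_per_step * steps_per_million_years


def rate_per_step(target_rate_per_million_years, steps_per_million_years):
    """Per-timestep rate needed to keep a fixed rate per million years."""
    return target_rate_per_million_years / steps_per_million_years


def cell_width_km(latitude_deg, resolution_deg, earth_radius_km=6371.0):
    """East-west width of a grid cell on a regular latitude-longitude raster."""
    arc = math.radians(resolution_deg) * earth_radius_km
    return arc * math.cos(math.radians(latitude_deg))


# The same configured rate of 1/timestep at two temporal resolutions.
fine = rate_per_million_years(1, 10)
coarse = rate_per_million_years(1, 2)

# Cells of a 1-degree raster shrink towards the poles.
equator = cell_width_km(0, 1.0)
high_latitude = cell_width_km(60, 1.0)
```

Rescaling configured rates with something like rate_per_step is one way to keep simulations comparable when the temporal resolution of the input changes. The cell widths show why a fixed dispersal distance spans more cells at high latitudes: on such a grid a cell at 60 degrees is only half as wide as one at the equator.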
