This is a legacy version of Mapinguari; for the current version, go to gabrielhoc.github.io/Mapinguari
It is well documented that climate change has severe effects on biodiversity. Species distribution models (SDMs) are popular tools for predicting individual species distributions and projecting the effects of climate change on those distributions under different scenarios. Most SDMs are correlative and do not take into account the biological processes underlying species responses to environmental variables. We aim to provide a modeling tool that fills this gap by estimating geographical layers with biologically relevant information that can be used in SDMs or other biogeographical analyses, as well as to stimulate good practices in dealing with spatial autocorrelation, predictor selection, model evaluation, consensus and comparison.
Mapinguari is an R package aimed at providing tools to facilitate the incorporation of biological processes in biogeographical analyses. It offers conveniences for fitting, comparing and extrapolating models of biological processes such as physiology, phenology, adaptation, interspecific interactions, demography and dispersal. These spatial extrapolations can be informative by themselves, but they also complement traditional correlative SDM methods by mixing environmental and process-based predictors.
In this manual I will provide some examples of the kind of information that can be generated with Mapinguari, using a terrestrial ectotherm and processes relevant to such animals as examples, namely temperature-limited activity times and locomotor performance. The spatial information that can be generated with the package is not limited to what is presented here, and the same principles apply to generating surfaces relevant to other taxa, such as metabolic rates, time inside the thermal neutral zone, photosynthesis, seed and egg development rates, etc.
The package is currently hosted on GitHub; this is how to install and load it:
# You need package devtools to install packages from GitHub
library(devtools)
install_github("gabrielhoc/MapinguariLegacy")
library(MapinguariLegacy)
Next, we are going to give an example of how to use the package, using some simulated data:
Fulanus beltranus is a fictional species of lizard from the Democratic Republic of the Congo. We would like to find out how this species will fare under different climate change scenarios. First, let's look at the points where it is currently known to occur. To access the distribution table FulanusDistribution, package Mapinguari has to be loaded. Here we use ggmap to plot the points over an image from Google Maps (copyright Google).
# First, let's take a look at FulanusDistribution
head(FulanusDistribution)
## species Lon Lat
## 69 Fulanus_beltranus 23.24550 1.25600
## 70 Fulanus_beltranus 23.26570 1.26980
## 71 Fulanus_beltranus 23.26570 1.26980
## 72 Fulanus_beltranus 23.24550 1.25600
## 73 Fulanus_beltranus 23.18456 1.11298
## 74 Fulanus_beltranus 23.18456 1.11298
# Load ggmap and ggplot2 to make the map
library(ggmap)
library(ggplot2)
Fulanus_bbox <- make_bbox(lat = Lat,
                          lon = Lon,
                          data = FulanusDistribution,
                          f = 0.2)

Fulanus_big <- get_map(location = Fulanus_bbox,
                       source = "google",
                       maptype = "terrain")

ggmap(Fulanus_big) +
  geom_point(data = FulanusDistribution,
             mapping = aes(x = Lon, y = Lat),
             size = 1,
             colour = 'red')
Some points might be too close to each other and therefore redundant, while others might be mistakenly placed in regions where the species is known not to occur. Fulanus occurs only on land and at altitudes below 1000 meters. Let's filter out points that are within 2 kilometers of each other, outside this altitude range, or in the ocean. To get an altitude layer, we can use function get_rasters, which is explained in further detail in the next section.
# First, we need an altitude raster.
alt_list <- get_rasters(
  raster_source = "/Volumes/Podocnemis/Gabriel/Dropbox/data/rasters/worldclim/global_rasters_10min",
  ext = FulanusDistribution,
  fixed_var = 'alt')
# Get the layer from the list
alt <- alt_list[[1]]
# Then, we clean the points
FulanusDistribution_clean <-
  clean_points(coord = FulanusDistribution,
               merge_dist = 2,
               reference_layer = alt < 1000 & !is.na(alt), # Oceans are represented by NA on this altitude layer
               layer_filter = 0)
## [,1]
## n_entries_raw 615
## n_entries_clean 128
## km_resolution_merging 2
head(FulanusDistribution_clean)
## Lon Lat layer
## 1 23.24550 1.25600 1
## 2 23.26570 1.26980 1
## 3 23.18456 1.11298 1
## 4 22.55580 1.17460 1
## 5 23.30160 1.27210 1
## 6 22.59310 1.23220 1
Let’s plot those points on the map.
library(ggmap)

ggmap(Fulanus_big) +
  geom_point(data = FulanusDistribution_clean,
             mapping = aes(x = Lon, y = Lat),
             size = 1,
             colour = 'red')
Now, let’s get some information on the climate of this region.
We can use function get_rasters to extract WorldClim surfaces for that area. The argument ext specifies the area around which to crop your rasters. It accepts either a numeric vector with the coordinates of the cropping limits (in order: westernmost longitude, easternmost longitude, southernmost latitude, northernmost latitude), or a table of longitudes (column named Lon) and latitudes (column named Lat), such as the FulanusDistribution_clean table we created previously. In this case, the function will take the coordinates of the most extreme points and use those as the cropping limits. The argument margin adds to these limits, and it is in the same units as the coordinates you supply.
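If you prefer to set the cropping limits directly, ext takes a numeric vector in the order just described. A hypothetical example (the values below are chosen only for illustration):

# Cropping limits: westernmost lon, easternmost lon, southernmost lat, northernmost lat
Fulanus_ext <- c(17, 37, -4, 8.5)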
library(raster)

FulanusEcoRasters_present <-
  get_rasters(
    raster_source = "/Volumes/Podocnemis/Gabriel/Dropbox/data/rasters/worldclim/global_rasters_10min",
    ext = FulanusDistribution_clean,
    margin = 5,
    non_fixed_var = c('prec', 'tmin', 'tmax'),
    fixed_var = 'alt',
    years = c("present"),
    reorder = TRUE)
plot(FulanusEcoRasters_present[[1]])
get_rasters is able to download rasters from WorldClim if you leave raster_source blank, but you probably don't want to download them every time you execute the function. That's why the argument raster_source also takes a list of rasters from your workspace or a path to a folder containing those rasters. These options have the additional advantage of letting you input any raster you want, instead of being limited to the ones you can download from WorldClim. However, in order for Mapinguari to recognize which variables correspond to each year or scenario, you have to name your folders and list elements following the convention variable_year_scenario. The default character for separating those terms is "_", but you can change it, as long as you make the corresponding change to the argument separator in the function. Some of your rasters will not be subject to scenarios, since they are measures of current climate rather than projections. In that case, you only have to name their folders variable_year, omitting the scenario term. The argument baseline identifies which years are not subject to scenarios. The default for this argument is present, but it can be changed at your convenience. Some other variables are constant across the time considered, such as altitude. In that case, you only have to write the name of the variable. Here is an example of how my folder is structured:
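A layout consistent with this convention looks something like the sketch below (the folder names are illustrative of the convention, not a full listing of the directory):

global_rasters_10min/
  alt/              # constant variable: variable name only
  prec_present/     # baseline year: variable_year, no scenario term
  tmin_present/
  tmax_present/
  prec_2050_rcp45/  # projection: variable_year_scenario
  tmax_2050_rcp45/
  tmax_2070_rcp85/
  ...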
I’m going to assign the path to that directory to an object:
my_directory <- "/Volumes/Podocnemis/Gabriel/Dropbox/data/rasters/worldclim/global_rasters_10min"
Why did I place some variables in the non_fixed_var argument and others in the fixed_var argument? Variables in non_fixed_var are things such as climate, which you expect to change across different times and future scenarios, while variables in fixed_var are things such as geological features, which you expect to remain constant across the time projected.
Another important argument for non-fixed variables is reorder, which will take the last two characters of your RasterLayer names, replace letters with zeros and order the layers in ascending order of those numbers. This is useful because of the way some of the WorldClim layers are named when you download them: when stacked, they are placed out of chronological order, and this argument fixes that. But be careful if you are using layers with names different from the WorldClim ones. If they are in the correct order when you stack them, reorder could scramble them, so set it to FALSE.
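The naming problem reorder fixes is ordinary lexicographic sorting of month numbers, which you can reproduce in base R:

# Month layers sort lexicographically rather than chronologically:
sort(paste0("tmax", 1:12))
# "tmax1" "tmax10" "tmax11" "tmax12" "tmax2" ... "tmax9"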
Let’s try getting some projections for the future years of 2050 and 2070:
FulanusEcoRasters_future <-
  get_rasters(
    raster_source = my_directory,
    ext = FulanusDistribution_clean,
    margin = 5,
    non_fixed_var = c('prec', 'tmin', 'tmax'),
    years = c('2050', '2070'),
    scenarios = c('rcp45', 'rcp85'),
    reorder = TRUE)
plot(FulanusEcoRasters_future$`2050_rcp45`)
As you can see, we have multiple rasters for each variable, one for each month. When doing biogeographical analyses, such as species distribution models, it is common to need a summary raster for each variable, such as the annual average or total. Function transform_rasters can help us get summaries for the whole year or for parts of the year.
Function transform_rasters allows us to apply any vectorized function to any subset of rasters for each variable, so we can get averages, standard deviations, totals, variances or any other summary for the month layers. Summaries have the advantage of reducing computation time, as they allow us to work with fewer layers. The first argument is raster_stack, which is where we supply the RasterStack with the layers for the calculations. The second argument, FUN_qlist, is where we supply the functions to be applied, the variables to apply the functions to, and the subset of the layers for each variable to include in the calculation. These expressions have to be inside a qlist, which is a list that returns its arguments unevaluated. In order for the function to find the layers, the layer names must begin with the same variable name used in the FUN_qlist expressions, before the separator character.
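qlist works much like base R's quoting constructs (alist, quote): it stores the expressions themselves rather than their values, so transform_rasters can later match names like tmax against the layer names. A minimal illustration, assuming only that qlist captures its arguments unevaluated:

# The call mean(tmax) is stored unevaluated; no object `tmax` needs to exist yet
qlist(tmax_average_year = mean(tmax))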
Fulanus_present_year_summaries <-
  transform_rasters(FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(tmax_average_year = mean(tmax),
                                      tmin_average_year = mean(tmin),
                                      prec_total_year = sum(prec)))
# The function operates on RasterStacks; to apply it to a list, use `lapply`:
Fulanus_future_year_summaries <-
  lapply(FulanusEcoRasters_future, transform_rasters,
         FUN_qlist = qlist(tmax_average_year = mean(tmax),
                           tmin_average_year = mean(tmin)))
# You can also specify different subsets of the year, to get, for example,
# summaries for seasons, by selecting layers with double square brackets.
Fulanus_present_two_seasons_summaries <-
  transform_rasters(FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(tmax_average_breeding = mean(tmax[[3:8]]),
                                      tmax_average_growing = mean(tmax[[c(7:12, 1:3)]])))
plot(Fulanus_present_year_summaries$tmax_average_year)
plot(Fulanus_present_two_seasons_summaries$tmax_average_breeding)
plot(Fulanus_present_two_seasons_summaries$tmax_average_growing)
Summary functions are a useful example, but we can apply more complex functions, such as models that use the spatial variables we have in order to create new variables. Before we create those layers, however, we must fit the models in question. Function fit_curves allows us to fit models, compare them, and get a vectorized function that returns the prediction of each model given the necessary variables.
Function fit_curves can be used to compare models with different parameterizations or specifications for your biological processes. The user supplies a list of models in the argument models, and the function outputs a table with their AIC, BIC, log likelihood, delta AIC, delta BIC and ranks for the AIC and BIC values. The main goal of the function, however, is to create vectorized predict functions for each model, which can then be used in transform_rasters to spatially extrapolate the model. In order to make the spatial extrapolation possible, you must have spatial information for at least one of the variables on the right-hand side of your model formula. The vectorized function generated can also be used to get predictions of your model in contexts other than spatial, such as for time series or specific values.
Back to Fulanus: we measured the maximum running speed of several individuals at different temperatures. We also recorded their body sizes, since that variable is likely to influence their performance. Here is what the data look like:
head(FulanusPhysiology)
## species id temp performance size site acctemp hydration
## 1 Fulanus_beltranus 2412 3 22.26 1.994 Cafundo 10 0.825
## 2 Fulanus_beltranus 2015 3 24.30 3.327 Cafundo 10 0.825
## 3 Fulanus_beltranus 2122 3 25.02 2.658 Cafundo 10 0.825
## 4 Fulanus_beltranus 2220 3 25.44 2.346 Cafundo 10 0.825
## 5 Fulanus_beltranus 2022 3 26.04 3.006 Cafundo 10 0.775
## 6 Fulanus_beltranus 2212 3 26.82 2.504 Cafundo 10 0.775
Note that we also attributed a number to each lizard to keep track of which lizard ran which trial. That is important because, since the same lizard ran multiple trials, the data points are not independent, and we have to account for that when building our model.
Argument models can be a single model or a list of models. In this case we are fitting GAMMs (generalized additive mixed models), which can account for the autocorrelation arising from running several trials with each individual. You can name the models by putting their names on the list. Most modeling algorithms that have a method for function predict should work.
library(mgcv)
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:raster':
##
## getData
## This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
# Here is an example without naming the models. They will be assigned a generic name.
perf_functions_no_name <-
  fit_curves(gamm(performance ~ s(temp, bs = 'cs') + size,
                  random = list(id = ~ 1),
                  data = FulanusPhysiology))
## logLik AIC BIC dAIC dBIC rankAIC rankBIC
## model_1 -2934.343 5878.686 5901.304 0 0 1 1
# It is easier to keep track if you give the models meaningful names (TPC stands
# for thermal performance curve). You can also fit multiple models at the same time.
perf_functions <-
  fit_curves(list(tpc_gamm_size = gamm(performance ~ s(temp, bs = 'cs') + size,
                                       random = list(id = ~ 1),
                                       data = FulanusPhysiology),
                  tpc_gamm_no_size = gamm(performance ~ s(temp, bs = 'cs'),
                                          random = list(id = ~ 1),
                                          data = FulanusPhysiology)))
## logLik AIC BIC dAIC dBIC rankAIC
## tpc_gamm_size -2934.343 5878.686 5901.304 0.00000 0.00000 1
## tpc_gamm_no_size -2951.518 5911.035 5929.129 32.34875 27.82518 2
## rankBIC
## tpc_gamm_size 1
## tpc_gamm_no_size 2
The function prints a table containing statistics on each model, for comparison, and outputs a list containing that table, as well as a sublist for each model containing the predictor function, the raw model output and the input arguments.
perf_functions$stats # model comparison table
perf_functions$tpc_gamm_size$predict # vectorized predictor function
perf_functions$tpc_gamm_size$model # raw model output
perf_functions$tpc_gamm_size$output # input arguments
You can assign the predictor function to a name and then apply it to any values.
my_tpc <- perf_functions$tpc_gamm_size$predict
my_tpc(temp = 20, size = 3)
## 1
## 76.27046
# You can apply it to a vector
my_tpc(temp = 20:40, size = 3)
## 1 2 3 4 5 6 7
## 76.27046 79.04413 81.34007 82.99379 84.03314 84.57768 84.74755
## 8 9 10 11 12 13 14
## 84.66288 84.44381 84.21046 84.08298 84.18262 84.69811 85.92279
## 15 16 17 18 19 20 21
## 88.15899 91.70904 96.87525 103.95998 112.26919 109.52650 79.17918
The predictor function can be applied to rasters of the variables on the right-hand side of the model formula to spatialize the variable on the left-hand side. In the next section we give an example.
transform_rasters provides the link between your physiological model and your spatial environmental data. The predictor functions obtained from fit_curves can be used in the same way we used the summary functions in the previous examples. This interface also allows us to set values for terms in the model, such as estimating performance for animals of specific sizes in the thermal performance model we fitted before.
Note, however, that the variables in the rasters need to be on the same scale as the data you used to fit the models. In this case, the raster temperatures are multiplied by 10, so we have to correct for that first.
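You can confirm the mismatch by comparing the raster values against the trial data (a quick sketch using objects defined earlier; maxValue is a standard accessor from the raster package):

maxValue(Fulanus_present_year_summaries$tmax_average_year) # WorldClim values, in degrees Celsius * 10
range(FulanusPhysiology$temp) # trial temperatures, in plain degrees Celsius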
# You can calculate performance at the minimum or maximum temperatures by changing
# the term in the expression inside FUN_qlist. We divide the temperatures by 10
# so they are on the same scale as the data used to fit the models.
Perf_rasters_tmax <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_tmax = my_tpc(tmax/10, size = mean(FulanusPhysiology$size))))

Perf_rasters_tmin <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_tmin = my_tpc(tmin/10, size = mean(FulanusPhysiology$size))))
# We can get the performance of individuals of different ages by changing the `size`
# argument of the my_tpc function (lizards' ages are highly correlated with size)
Perf_rasters_young <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_young = my_tpc(tmax/10, size = min(FulanusPhysiology$size))))

Perf_rasters_old <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_old = my_tpc(tmax/10, size = max(FulanusPhysiology$size))))
plot(Perf_rasters_tmax$perf_tmax)
plot(Perf_rasters_tmin$perf_tmin)
plot(Perf_rasters_young$perf_young)
plot(Perf_rasters_old$perf_old)
The function operates on RasterStacks; if you want to apply it to a list, you can use lapply:
Perf_rasters_list <-
  lapply(Fulanus_future_year_summaries, transform_rasters,
         FUN_qlist = qlist(perf_tmin = my_tpc(tmin/10, size = mean(FulanusPhysiology$size))))
To make the performance rasters we extrapolated a generalized additive mixed model (GAMM). We can use the same functions fit_curves and transform_rasters to create rasters from other kinds of modeling algorithms, such as nonlinear least squares (NLS). The amount of time lizards are able to stay active seems to be a good predictor of their ability to persist in a locality (Sinervo et al., 2010). Here we are going to demonstrate how to estimate average daily hours of activity limited by temperature. The same framework could be applied to estimate the amount of time inside temperature ranges important for other organisms, such as the thermal neutral zone of mammals or temperatures relevant to seed and egg development.
In order to know how much time Fulanus lizards are able to perform their activities, first we need to know the temperature range in which they are active. Measuring the temperature of active lizards in the field, we found that they were active at body temperatures between 16 and 28 degrees Celsius. Next, we have to gather data on how temperature varies in the microhabitats those lizards use. We deployed physical models with thermal properties similar to the lizards' in the microhabitats where lizards were observed performing their activities. Those models were connected to data loggers that registered their temperature every four minutes. We also recorded the air temperature at a nearby weather station for every day the models were out. The table FulanusMicroclimate contains these (simulated) data.
head(FulanusMicroclimate)
## species day month year hour minute second t_air Lon
## 1 Fulanus_beltranus 7 4 2020 0 0 0 8.7428 24.2724
## 2 Fulanus_beltranus 7 4 2020 0 4 0 8.7428 24.2724
## 3 Fulanus_beltranus 7 4 2020 0 8 0 8.7428 24.2724
## 4 Fulanus_beltranus 7 4 2020 0 12 0 8.7428 24.2724
## 5 Fulanus_beltranus 7 4 2020 0 16 0 8.3892 24.2724
## 6 Fulanus_beltranus 7 4 2020 0 20 0 8.3892 24.2724
## Lat microhabitat temp
## 1 -2.12569 shade_tree 10.0740
## 2 -2.12569 shade_tree 10.1908
## 3 -2.12569 shade_tree 10.2292
## 4 -2.12569 shade_tree 10.3076
## 5 -2.12569 shade_tree 10.3460
## 6 -2.12569 shade_tree 10.2692
Next, we are going to create a new column in the table with a binary variable indicating whether, at each moment, the temperature registered by the models was inside the activity range of Fulanus.
library(dplyr)

# Fulanus is only active in microhabitats between 16 and 28 degrees Celsius
FulanusMicroclimate$HA <- ifelse(FulanusMicroclimate$temp > 16 & FulanusMicroclimate$temp < 28, 1, 0)
If we sum HA for each day, we get the amount of time the models were inside that temperature range on each day.
# First, let's summarise the data by day
resolution_minutes <- 4 # data is logged every 4 minutes

microclimate_by_day <-
  group_by(FulanusMicroclimate, day, month, year) %>%
  summarise(HA = sum(HA) * resolution_minutes / 60, # each record represents 4 minutes; convert to hours
            temp_micro_avg = mean(temp),
            temp_air_avg = mean(t_air),
            Lon = unique(Lon),
            Lat = unique(Lat))
head(microclimate_by_day)
## # A tibble: 6 x 8
## # Groups: day, month [6]
## day month year HA temp_micro_avg temp_air_avg Lon Lat
## <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 7 4 2020 2.96 24.6 14.4 24.3 -2.13
## 2 7 8 2020 3.75 22.7 11.6 33.5 2.59
## 3 8 4 2020 6.33 23.3 13.2 27.6 -2.05
## 4 8 6 2020 2.19 16.0 7.05 33.5 5.26
## 5 8 8 2020 3.32 19.3 9.33 21.7 4.19
## 6 9 4 2020 3.40 26.9 15.0 31.5 1.74
Then we can relate HA to the temperature measured at the weather station, which is equivalent to the temperature in the WorldClim rasters we already have. This assumes microhabitats are similar across the species range. Hours of activity should be bounded between two values, zero and the length of the day, so logistic models are appropriate for fitting this. We are going to use package FlexParamCurve, which allows us to fit many variations of logistic models, such as Gompertz, von Bertalanffy and Richards curves.
library(FlexParamCurve)

# Set options for the Gompertz curve
modpar(x = microclimate_by_day$temp_air_avg,
       y = microclimate_by_day$HA,
       pn.options = "G_options")

change.pnparameters(M = 0.1, pn.options = "G_options")

# Set options for the von Bertalanffy curve
modpar(x = microclimate_by_day$temp_air_avg,
       y = microclimate_by_day$HA,
       pn.options = "VB_options")

change.pnparameters(M = -0.3, pn.options = "VB_options")

# Set options for the Richards curve
modpar(x = microclimate_by_day$temp_air_avg,
       y = microclimate_by_day$HA,
       pn.options = "R_options")
HAlogistic <- nls(HA ~ SSlogis(temp_air_avg, Asym, xmid, scal), data = microclimate_by_day) # A simple logistic
HAGompertz <- nls(HA ~ SSposnegRichards(temp_air_avg, Asym, K, Infl, modno = 32, pn.options = "G_options"), data = microclimate_by_day)
HAVonBertalannfy <- nls(HA ~ SSposnegRichards(temp_air_avg, Asym, K, Infl, modno = 32, pn.options = "VB_options"), data = microclimate_by_day)
HARichards <- nls(HA ~ SSposnegRichards(temp_air_avg, Asym, K, Infl, M, modno = 12, pn.options = "R_options"), data = microclimate_by_day)
HA_curve <-
  fit_curves(list(logistic = HAlogistic,
                  Gompertz = HAGompertz,
                  Von_Bertalannfy = HAVonBertalannfy,
                  Richards = HARichards),
             predict_formals = "temp_air_avg",
             separator = "_")
It seems the von Bertalanffy curve was the best one.
HA_VB <- HA_curve$Von_Bertalannfy$predict
HA_raster_present <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(HA = HA_VB(tmax/10)))
## [1] "cannot use this formula, probably because it is not vectorized"
## attr(,"class")
## [1] "snow-try-error" "try-error"
## [1] "trying not parallelized version"
plot(HA_raster_present)
Mapinguari can also generate potentially useful rasters that are not species specific, such as hydrologic variables like potential evapotranspiration (PET), actual evapotranspiration (AET) and climatic water deficit (CWD).
In order to calculate AET, we need the PET layers at monthly resolution, so we use the monthly layers instead of the yearly averages.
# Split the present stack into per-variable monthly stacks
prec_stack <- FulanusEcoRasters_present$present[[1:12]]
tmin_stack <- FulanusEcoRasters_present$present[[13:24]]
tmax_stack <- FulanusEcoRasters_present$present[[25:36]]
alt_stack <- FulanusEcoRasters_present$present$alt
# Potential evapotranspiration
PET_stack <-
  PETFUN(tmax = tmax_stack/10,
         tmin = tmin_stack/10,
         alt = alt_stack)
plot(mean(PET_stack))
# Actual EvapoTranspiration
AET_stack <-
AETFUN(PET = PET_stack, prec = prec_stack)
plot(mean(AET_stack))
# Climatic Water Deficit
CWD_stack <- PET_stack - AET_stack
names(CWD_stack) <- paste("CWD", 1:12, sep = "_")
plot(mean(CWD_stack))
You can also get day length values for the area by inputting any raster of the desired area.
daylength_stack <- daylengthFUN(CWD_stack)
plot(mean(daylength_stack))
Another useful variable is the amount of solar radiation, in kilojoules per square meter per day.
srad_stack <- sradFUN(alt_stack, tmax_stack/10, tmin_stack/10)
plot(mean(srad_stack))
Previously, we estimated hours of activity assuming homogeneous microhabitats. However, microhabitats, and thus microclimates, are likely to change according to the structure of the vegetation. Many vegetation indices, such as EVI and NDVI, are based on satellite images and are therefore hard to extrapolate into the future. Luckily for us, AET is a good approximation of vegetation distribution. We can incorporate this information into our hours of activity models.
# Let's include a new column in the table with AET
microclimate_by_day$AET <-
  extract(mean(AET_stack),
          data.frame(Lon = microclimate_by_day$Lon, Lat = microclimate_by_day$Lat))
# Let's try fitting a curve with AET
HAlogisticAET <- try(nls(HA ~ SSlogis(temp_air_avg + AET, Asym, xmid, scal), data = microclimate_by_day))
HAGompertzAET <- try(nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, modno = 32, pn.options = "G_options"), data = microclimate_by_day))
## WARNING: Upper bounds for Infl are too low: optimized Infl > Imax. These will be adjusted. Alternatively
## try runing modpar with option width.bounds set to more than 1
HAVonBertalannfyAET <- try(nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, modno = 32, pn.options = "VB_options"), data = microclimate_by_day))
## WARNING: Upper bounds for Infl are too low: optimized Infl > Imax. These will be adjusted. Alternatively
## try runing modpar with option width.bounds set to more than 1
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
HARichardsAET <- try(nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, M, modno = 12, pn.options = "R_options"), data = microclimate_by_day))
## WARNING: Upper bounds for Infl are too low: optimized Infl > Imax. These will be adjusted. Alternatively
## try runing modpar with option width.bounds set to more than 1
## Error in nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, :
## step factor 0.000488281 reduced below 'minFactor' of 0.000976562
It seems that only the simple logistic and Gompertz models were able to converge, so let's use those.
HA_AET_curve <-
  fit_curves(list(HAlogisticAET,
                  HAGompertzAET),
             predict_formals = c("temp_air_avg", "AET"))
## logLik AIC BIC dAIC dBIC rankAIC rankBIC
## model_1 -39.79095 87.58190 92.45740 0.00000000 0.00000000 1 1
## model_2 -39.79858 87.59716 92.47266 0.01525387 0.01525387 2 2
HAlogis <- HA_AET_curve$model_1$predict
# Let's join AET to our climatic stack
FulanusEcoRasters_present_AET <- FulanusEcoRasters_present
FulanusEcoRasters_present_AET$present <- stack(FulanusEcoRasters_present$present, AET_stack)

HA_raster_present <-
  transform_rasters(raster_stack = FulanusEcoRasters_present_AET$present,
                    FUN_qlist = qlist(HA = HAlogis(tmax/10, AET))) # divide by 10 to match the scale used to fit the model
plot(mean(HA_raster_present))
Sometimes you might want to use a function in transform_rasters that is not a prediction from a model fitted with fit_curves. transform_rasters will take custom functions you write, as long as they are vectorized. There are different ways to vectorize a function, such as the function Vectorize. We are going to write an example function implementing the Sinervo method for calculating hours of activity (Sinervo et al., 2010), and use it to create a raster of hours of activity. This method creates a sine wave between the minimum and maximum temperatures of a location, simulating daily temperature variation, then counts how much time the temperature stays between the activity thresholds of the species. The method was developed initially for hours of activity in lizards, but it can be applied to any relevant temperature thresholds, like the limits of the thermal neutral zone in mammals or temperature ranges important for seed or egg development.
# Let's create a function for the Sinervo method
# tmax is the maximum temperature at a location
# tmin is the minimum temperature at a location
# tlwr is the lower temperature threshold for activity
# tupr is the upper temperature threshold for activity
# res is the time resolution, i.e. into how many parts each hour is divided.
# If res = 3, hours of activity are counted in 20-minute steps
# (this greatly affects the speed of the calculation).
SinervoHA <- function(tmax, tmin, tlwr, tupr, res) {
  s0 <- 1:res
  h0 <- 1:24

  s <- expand.grid(s0, h0)[[1]]
  h <- expand.grid(s0, h0)[[2]]

  # Sine wave approximating the daily temperature variation
  day_temps <-
    ((tmax - tmin)/2 * sin((pi/12) * (h + (s/res)) - 3 * (pi/4))) + (tmax + tmin)/2

  # Add up the fraction of each hour spent between the activity thresholds
  sum(ifelse(day_temps > tlwr & day_temps < tupr, 1/res, 0))
}
# We can vectorize the function using function Vectorize
SinervoHA_vectorized <- Vectorize(SinervoHA)

# The unvectorized version returns a single value even when given vectors
SinervoHA(tmax = c(30, 40, 35), tmin = c(10, 20, 15), tlwr = 20, tupr = 30, res = 3)
## [1] 10.33333
SinervoHA_vectorized(tmax = c(30, 40, 35), tmin = c(20, 30, 25), tlwr = 20, tupr = 30, res = 3)
## [1] 23.33333 0.00000 11.66667
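Vectorize is essentially a wrapper around mapply, so an equivalent manual vectorization is a one-liner (SinervoHA_vec2 is just an illustrative name):

# mapply recycles the scalar thresholds across the tmax/tmin vectors
SinervoHA_vec2 <- function(tmax, tmin, tlwr, tupr, res) {
  mapply(SinervoHA, tmax, tmin, tlwr, tupr, res)
}
SinervoHA_vec2(tmax = c(30, 40, 35), tmin = c(20, 30, 25), tlwr = 20, tupr = 30, res = 3)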
ha_sinervo_raster <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(SinervoHA_vectorized(tmax = tmax[[1]]/10, tmin = tmin[[1]]/10, tlwr = 15, tupr = 25, res = 3)))
# Let's cap the hours of activity by daylength
average_daylength <- mean(daylength_stack)
ha_capped <-
overlay(ha_sinervo_raster, average_daylength, fun = function(x, y) ifelse(x < y, x, y))
# hours of restriction to activity
h_restriction <- average_daylength - ha_capped
plot(ha_sinervo_raster)
plot(ha_capped)
plot(h_restriction)
We used WorldClim temperatures to estimate performance across the distribution of our species. Those temperatures are more akin to those measured at a weather station. A more precise approach might be to use estimates of microhabitat temperature. We have microhabitat temperatures in our FulanusMicroclimate dataset, which we can use to create a model relating microhabitat temperature to air temperature and AET, and then extrapolate it.
microtemp_model <-
  fit_curves(glm(temp_micro_avg ~ temp_air_avg + AET, data = microclimate_by_day))
## logLik AIC BIC dAIC dBIC rankAIC rankBIC
## model_1 -63.91627 135.8325 140.7081 0 0 1 1
predict_microtemp <- microtemp_model$model_1$predict
microtemp_raster_present <-
  transform_rasters(raster_stack = FulanusEcoRasters_present_AET$present,
                    FUN_qlist = qlist(microtemp = predict_microtemp(temp_air_avg = tmax/10, AET = AET)))

Perf_rasters_microtemp <-
  transform_rasters(raster_stack = microtemp_raster_present,
                    FUN_qlist = qlist(perf_microtemp = my_tpc(microtemp,
                                                              size = mean(FulanusPhysiology$size))))
plot(mean(Perf_rasters_microtemp))
Now, let's say the most crucial period for Fulanus population dynamics is their breeding period, and that they breed only under certain climatic conditions. This might cause their breeding period to vary spatially and temporally. In order to account for that, we can fit a logistic model of breeding status versus climate, using data from the table FulanusBreeding, which contains breeding status and climate at different localities. Then we can use this model to create binary rasters of the locations where Fulanus is breeding in each month, using function transform_rasters. After that, we can summarize the climatic rasters over the breeding season only, at different places, by using the binary rasters as weights to average other variables. You can use this method to merge any rasters according to a spatially varying criterion. This can be used, for example, to merge rasters of locally adapted physiology according to the distribution of populations.
Let’s take a look at the table.
head(FulanusBreeding)
## Lon Lat month breeding prec tmin tmax
## 1 23.24550 1.25600 jan 0 36 214 325
## 2 23.26570 1.26980 jan 0 36 214 325
## 3 23.26570 1.26980 jan 0 36 214 325
## 4 23.24550 1.25600 jan 0 36 214 325
## 5 23.18456 1.11298 jan 0 38 212 322
## 6 23.18456 1.11298 jan 0 38 212 322
Now we fit the model of breeding season.
Phen_model <- fit_curves(models = glm(breeding ~ prec, family = binomial(link = 'logit'), data = FulanusBreeding))
## logLik AIC BIC dAIC dBIC rankAIC rankBIC
## model_1 -2425.187 4854.374 4868.187 0 0 1 1
PhenFUN1 <- Phen_model$model_1$predict
And then we extrapolate it.
season_rain_rasters <-
  transform_rasters(raster_stack = FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(season_rain = PhenFUN1(prec)))
It makes sense for us to convert this to binary now, but leaving it as probabilities would also work.
# Let's convert the raster to binary
season_rain_rasters_binary <-
  calc(season_rain_rasters, function(x) ifelse(x < 0.5, 0, 1))

FulanusEcoRasters_season_rain <-
  transform_rasters(FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(weighted.mean(tmax, w = season_rain_rasters_binary)))

# Length of the breeding season
plot(sum(season_rain_rasters))

# Average tmax during the breeding season
plot(FulanusEcoRasters_season_rain)
The function can also be a condition. Let's say we are only interested in the periods when precipitation is greater than 150 mm.
PhenFUN2 <- function(x) ifelse(x > 150, 1, 0)

season_prec_rasters <-
  transform_rasters(raster_stack = FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(season_rain = PhenFUN2(prec)))

FulanusEcoRasters_season_prec <-
  transform_rasters(FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(weighted.mean(tmax, w = season_prec_rasters)))

# Length of the breeding season
plot(sum(season_prec_rasters))

# Average tmax during the breeding season
plot(FulanusEcoRasters_season_prec)
# Group the desired variables in a stack
total_stack <- stack(Fulanus_present_year_summaries,
                     alt_stack,
                     sum(PET_stack),
                     sum(AET_stack),
                     sum(CWD_stack),
                     ha_capped,
                     h_restriction,
                     mean(Perf_rasters_microtemp),
                     sum(season_rain_rasters))

names(total_stack) <- c("prec", "tmin", "tmax", "alt", "PET", "AET", "CWD", "HA", "HR", "perf", "breed")
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
library(dismo)
##
## Attaching package: 'dismo'
## The following object is masked from 'package:ggmap':
##
## geocode
# Select 1000 random points from mask
set.seed(22111962)
mask <- total_stack[[1]]
rnd.points <- randomPoints(mask, 1000)
# Principal Components Analysis (PCA) of environmental variables
env.data <- extract(total_stack, rnd.points)
pca.env.data <- princomp(env.data, cor = T)
plot(pca.env.data)
biplot(pca.env.data, pc.biplot = T)
summary(pca.env.data)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 2.2620171 1.9970656 1.00876367 0.60650924
## Proportion of Variance 0.4651565 0.3625701 0.09250947 0.03344122
## Cumulative Proportion 0.4651565 0.8277266 0.92023602 0.95367725
## Comp.5 Comp.6 Comp.7 Comp.8
## Standard deviation 0.52326558 0.35152815 0.244785180 0.211571257
## Proportion of Variance 0.02489153 0.01123382 0.005447253 0.004069309
## Cumulative Proportion 0.97856878 0.98980260 0.995249854 0.999319163
## Comp.9 Comp.10 Comp.11
## Standard deviation 0.0865333720 1.088564e-03 1.137301e-08
## Proportion of Variance 0.0006807295 1.077247e-07 1.175867e-17
## Cumulative Proportion 0.9999998923 1.000000e+00 1.000000e+00
pca.env.data$loadings[, 1:3]
## Comp.1 Comp.2 Comp.3
## prec 0.3294299 0.306185551 0.13604800
## tmin 0.1863011 0.406408592 -0.04812344
## tmax -0.3340659 0.304267809 -0.18739889
## alt -0.1468538 -0.449161168 -0.05652756
## PET 0.3279248 0.213602626 0.30132870
## AET -0.3318814 0.297074121 -0.14676007
## CWD 0.3956749 -0.175602369 0.23001339
## HA -0.3508002 0.009358823 0.59826615
## HR 0.3510937 -0.009044878 -0.59731783
## perf 0.1556342 0.398622013 0.18711068
## breed -0.2807376 0.350675915 -0.16092335
# Variance Inflation Factors (VIF) of environmental variables
library(usdm)
##
## Attaching package: 'usdm'
## The following object is masked from 'package:car':
##
## vif
## The following object is masked from 'package:nlme':
##
## Variogram
v1 <- vifcor(total_stack, th = 0.9) #correlation
v1
## 4 variables from the 11 input variables have collinearity problem:
##
## HR tmax CWD alt
##
## After excluding the collinear variables, the linear correlation coefficients ranges between:
## min correlation ( breed ~ prec ): -0.04535664
## max correlation ( breed ~ AET ): 0.8826264
##
## ---------- VIFs of the remained variables --------
## Variables VIF
## 1 prec 11.452159
## 2 tmin 4.296651
## 3 PET 5.035600
## 4 AET 6.160886
## 5 HA 1.797801
## 6 perf 3.310480
## 7 breed 5.951872
v2 <- vifstep(total_stack, th = 10) #VIF
v2
## 5 variables from the 11 input variables have collinearity problem:
##
## PET HR tmax CWD alt
##
## After excluding the collinear variables, the linear correlation coefficients ranges between:
## min correlation ( breed ~ prec ): -0.03545731
## max correlation ( breed ~ AET ): 0.8841397
##
## ---------- VIFs of the remained variables --------
## Variables VIF
## 1 prec 5.996003
## 2 tmin 3.836927
## 3 AET 5.808124
## 4 HA 2.011953
## 5 perf 3.458412
## 6 breed 5.566383
# Subset environmental stack
env.selected <- exclude(total_stack, v2) #exclude collinear variables identified with vifstep
env.selected
## class : RasterStack
## dimensions : 75, 119, 8925, 6 (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667 (x, y)
## extent : 17.16667, 37, -4, 8.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## names : prec, tmin, AET, HA, perf, breed
## min values : 131.83333333, 33.83333333, 175.41902023, 0.00000000, 68.44142381, 0.02328887
## max values : 355.416667, 236.250000, 1637.326681, 12.124163, 115.178630, 8.612525
# Pairs plot of selected environmental predictors
env.data.std <- data.frame(scale(env.data)) # Scale variables
library(corrplot)
## corrplot 0.84 loaded
M <- cor(env.data.std[, names(env.data.std)])
corrplot.mixed(M, upper = "ellipse", lower = "number")
pres_coords <- FulanusDistribution_clean[1:2]

# Generate pseudo-absence points around the presences
# (see ?pseudoabsences for the meaning of the numeric arguments)
pseudo_abs <- pseudoabsences(pres_coords, 150, 2000)
## Loading required namespace: rgeos
# Extract values of the selected predictors at presence and pseudo-absence points
values_pres <- data.frame(unlist(extract(env.selected, pres_coords)))
values_abs <- data.frame(unlist(extract(env.selected, pseudo_abs)))

# Label presences 1 and pseudo-absences 0, then combine
values_pres$presence <- 1
values_abs$presence <- 0
pres_abs_table <- rbind(values_pres, values_abs)
library(biomod2)
# Prepare data
FulanusBiomodData <- BIOMOD_FormatingData(
  resp.var = pres_abs_table$presence,
  expl.var = pres_abs_table[1:6],
  resp.xy = rbind(pres_coords, pseudo_abs),
  resp.name = "Fulanus")
FulanusBiomodData
# Path to MAXENT
myBiomodOption <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/dismo/java", memory_allocated = 1024))
# Parallel processing
library(doParallel)
cl <- makeCluster(2) # set to the number of cores available on your machine
registerDoParallel(cl)
FulanusModelOut <- BIOMOD_Modeling(FulanusBiomodData,
                                   models = c("GBM","CTA","RF","GLM","GAM","ANN","SRE","FDA","MARS","MAXENT.Phillips","MAXENT.Tsuruoka"),
                                   models.options = myBiomodOption,
                                   NbRunEval = 10,
                                   DataSplit = 75,
                                   Prevalence = NULL,
                                   VarImport = 0,
                                   models.eval.meth = c("KAPPA","TSS","ROC","ACCURACY","BIAS"),
                                   SaveObj = TRUE,
                                   rescal.all.models = FALSE,
                                   do.full.models = FALSE,
                                   modeling.id = "Fulanus")
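BIOMOD_Modeling picks up the workers registered above; once it returns, and assuming no later step reuses them, it is good practice to release the cluster:

# Shut down the parallel workers
stopCluster(cl)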
# Get evaluations
FulanusModelEval <- get_evaluations(FulanusModelOut)
# Print ROC scores of all selected models
FulanusModelEval["ROC","Testing.data",,,]
## RUN1 RUN2 RUN3 RUN4 RUN5 RUN6 RUN7 RUN8 RUN9
## GBM 0.853 0.730 0.866 0.929 0.610 0.809 0.843 0.804 0.925
## CTA 0.843 0.664 0.769 0.627 0.522 0.585 0.677 0.732 0.811
## RF 0.845 0.759 0.828 0.914 0.672 0.748 0.825 0.824 0.913
## GLM NA 0.506 0.666 0.691 0.670 0.517 0.631 0.556 0.632
## GAM 0.527 0.662 0.647 0.701 0.270 0.490 0.574 0.520 0.642
## ANN 0.589 0.647 0.806 0.724 0.557 0.785 0.785 0.587 0.596
## SRE 0.618 0.600 0.592 0.629 0.634 0.614 0.712 0.598 0.788
## FDA 0.778 0.571 0.843 0.771 0.425 0.585 0.742 0.724 0.821
## MARS 0.914 0.762 0.753 0.820 0.383 0.551 0.677 0.787 0.949
## MAXENT.Tsuruoka NA NA NA NA NA NA NA NA NA
## RUN10
## GBM 0.800
## CTA 0.592
## RF 0.766
## GLM 0.647
## GAM 0.448
## ANN 0.605
## SRE 0.656
## FDA 0.619
## MARS 0.702
## MAXENT.Tsuruoka NA
# Get summaries (mean) of model evaluation: one model (MAXENT.Tsuruoka), one method (TSS)
dimnames(FulanusModelEval)
## [[1]]
## [1] "KAPPA" "TSS" "ROC" "ACCURACY" "BIAS"
##
## [[2]]
## [1] "Testing.data" "Cutoff" "Sensitivity" "Specificity"
##
## [[3]]
## [1] "GBM" "CTA" "RF"
## [4] "GLM" "GAM" "ANN"
## [7] "SRE" "FDA" "MARS"
## [10] "MAXENT.Tsuruoka"
##
## [[4]]
## [1] "RUN1" "RUN2" "RUN3" "RUN4" "RUN5" "RUN6" "RUN7" "RUN8"
## [9] "RUN9" "RUN10"
##
## [[5]]
## Fulanus_AllData
## "AllData"
MAXENT.Tsuruoka_Eval <- FulanusModelEval["TSS","Testing.data","MAXENT.Tsuruoka",,]
mean(MAXENT.Tsuruoka_Eval)
## [1] NA
# Get summaries (mean) of model evaluation: one model (GLM), all methods
mean(FulanusModelEval["KAPPA","Testing.data","GLM",,])
## [1] NA
mean(FulanusModelEval["TSS","Testing.data","GLM",,])
## [1] NA
mean(FulanusModelEval["ROC","Testing.data","GLM",,])
## [1] NA
mean(FulanusModelEval["ACCURACY","Testing.data","GLM",,])
## [1] NA
mean(FulanusModelEval["BIAS","Testing.data","GLM",,])
## [1] NA
# Get summaries (mean) of model evaluation: machine-learning models, all methods
sdm.models <- c("GBM","CTA","RF") #3 models
eval.methods <- c("KAPPA","TSS","ROC","ACCURACY","BIAS") #5 evaluation methods
means.i <- numeric(0)
means.j <- numeric(5)
for (i in 1:3) {
  for (j in 1:5) {
    means.j[j] <- mean(FulanusModelEval[paste(eval.methods[j]), "Testing.data", paste(sdm.models[i]),,])
  }
  means.i <- c(means.i, means.j)
}
summary.eval.equal <- data.frame(rep(sdm.models,each = 5), rep(eval.methods,3), means.i)
names(summary.eval.equal) <- c("Model", "Method", "Mean")
summary.eval.equal
## Model Method Mean
## 1 GBM KAPPA 0.5724
## 2 GBM TSS 0.5808
## 3 GBM ROC 0.8169
## 4 GBM ACCURACY 0.8240
## 5 GBM BIAS 0.9876
## 6 CTA KAPPA 0.3679
## 7 CTA TSS 0.3553
## 8 CTA ROC 0.6822
## 9 CTA ACCURACY 0.7587
## 10 CTA BIAS 0.8563
## 11 RF KAPPA 0.5273
## 12 RF TSS 0.5402
## 13 RF ROC 0.8094
## 14 RF ACCURACY 0.8065
## 15 RF BIAS 0.9907
xtabs(summary.eval.equal$Mean ~ summary.eval.equal$Model + summary.eval.equal$Method) #GBM with best performance
## summary.eval.equal$Method
## summary.eval.equal$Model ACCURACY BIAS KAPPA ROC TSS
## CTA 0.7587 0.8563 0.3679 0.6822 0.3553
## GBM 0.8240 0.9876 0.5724 0.8169 0.5808
## RF 0.8065 0.9907 0.5273 0.8094 0.5402
# Get summaries (mean) of model evaluation: regression models, all methods
sdm.models <- c("GLM","GAM","ANN","SRE","FDA","MARS","MARS","MAXENT.Tsuruoka") #8 models
eval.methods <- c("KAPPA","TSS","ROC","ACCURACY","BIAS") #5 evaluation methods
means.i <- numeric(0)
means.j <- numeric(5)
for (i in 1:8) {
for (j in 1:5) {
means.j[j] <- mean(FulanusModelEval[paste(eval.methods[j]),"Testing.data",paste(sdm.models[i]),,], na.rm = T)
}
means.i <- c(means.i, means.j)
}
summary.eval.10000 <- data.frame(rep(sdm.models,each = 5), rep(eval.methods,8), means.i)
names(summary.eval.10000) <- c("Model", "Method", "Mean")
summary.eval.10000
## Model Method Mean
## 1 GLM KAPPA 0.2846667
## 2 GLM TSS 0.3133333
## 3 GLM ROC 0.6128889
## 4 GLM ACCURACY 0.7268889
## 5 GLM BIAS 0.9931111
## 6 GAM KAPPA 0.3288000
## 7 GAM TSS 0.3113000
## 8 GAM ROC 0.5481000
## 9 GAM ACCURACY 0.7454000
## 10 GAM BIAS 0.9907000
## 11 ANN KAPPA 0.3763000
## 12 ANN TSS 0.3795000
## 13 ANN ROC 0.6681000
## 14 ANN ACCURACY 0.7696000
## 15 ANN BIAS 0.9907000
## 16 SRE KAPPA 0.2656000
## 17 SRE TSS 0.2884000
## 18 SRE ROC 0.6441000
## 19 SRE ACCURACY 0.6586000
## 20 SRE BIAS 0.8219000
## 21 FDA KAPPA 0.4473000
## 22 FDA TSS 0.4366000
## 23 FDA ROC 0.6879000
## 24 FDA ACCURACY 0.7935000
## 25 FDA BIAS 0.9938000
## 26 MARS KAPPA 0.5225000
## 27 MARS TSS 0.4974000
## 28 MARS ROC 0.7298000
## 29 MARS ACCURACY 0.8194000
## 30 MARS BIAS 0.9938000
## 31 MAXENT.Tsuruoka KAPPA NaN
## 32 MAXENT.Tsuruoka TSS NaN
## 33 MAXENT.Tsuruoka ROC NaN
## 34 MAXENT.Tsuruoka ACCURACY NaN
## 35 MAXENT.Tsuruoka BIAS NaN
xtabs(summary.eval.10000$Mean ~ summary.eval.10000$Model + summary.eval.10000$Method) #MARS with best performance
## summary.eval.10000$Method
## summary.eval.10000$Model ACCURACY BIAS KAPPA ROC TSS
## ANN 0.7696000 0.9907000 0.3763000 0.6681000 0.3795000
## FDA 0.7935000 0.9938000 0.4473000 0.6879000 0.4366000
## GAM 0.7454000 0.9907000 0.3288000 0.5481000 0.3113000
## GLM 0.7268889 0.9931111 0.2846667 0.6128889 0.3133333
## MARS 0.8194000 0.9938000 0.5225000 0.7298000 0.4974000
## MAXENT.Tsuruoka 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## SRE 0.6586000 0.8219000 0.2656000 0.6441000 0.2884000
Function reference: clean_points, fit_curves, get_rasters, pseudoabsences, qlist, transform_rasters, daylengthFUN, PETFUN, AETFUN, sradFUN, rhFUN