About AquaMaps: Creating standardized range maps of marine species

By K. Kaschner, J. Ready, E. Agbayani, P. Eastwood, T. Rees, K. Reyes, J. Rius & R. Froese [Version of May 2007]

See Home page for latest presentations and publications.

What are AquaMaps?

AquaMaps is an approach to generating model-based, large-scale predictions of currently known natural occurrence of marine species. Models are constructed from estimates of the environmental tolerance of a given species with respect to depth, salinity, temperature, primary productivity, and its association with sea ice or coastal areas. Maps show the color-coded relative likelihood of a species to occur in a global grid of half-degree latitude / longitude cell dimensions, which corresponds to a side length of about 50 km near the equator. Predictions are generated by matching habitat usage of species, termed environmental envelopes, against local environmental conditions to determine the relative suitability of specific geographic areas for a given species. Knowledge of species' distributions within FAO areas or bounding boxes is also used to exclude potentially suitable habitat in which the species is not known to occur.

The modeling approach used by AquaMaps was originally developed by Kristin Kaschner and colleagues to predict global distributions of marine mammals (Kaschner et al. 2006). The approach was based on incorporating expert knowledge into an environmental envelope or ecological niche model. The use of expert knowledge compensated for the effects of species misidentifications, effort biases, and the non-representative coverage of large-scale species' distributions. Such data gaps and problems are widespread in publicly available occurrence data sets that are compiled from different sources.

The approach developed for marine mammals was subsequently modified in collaboration with FishBase staff to make it more suitable for a greater range of marine organisms and to make use of data and information available in FishBase and OBIS/GBIF online databases. Display of the maps on the web has been facilitated by the use of C-squares Mapper developed by Tony Rees of CSIRO, Australia.

Why use AquaMaps to predict large-scale marine species occurrence?

In recent years, there has been a lot of effort through initiatives such as OBIS or GBIF to compile existing species occurrence records and make them available in a standardized format online. The data is generally displayed in the form of point locations plotted on maps to visualize the geographic extent of species occurrences. The ultimate aim, however, is to infer species distributions from this data to replace the rough, hand-drawn maps that are currently most commonly used to depict known areas of species' presence.

There is a wide range of tools that can be used to predict species distributions based on occurrence records, ranging from simple environmental envelope models that only require information about where a species has been reported (presences) to more complex models that need information both about a species' presence and about where it has been reported to be absent (Guisan and Zimmermann 2000). In general, and not surprisingly, more sophisticated models have been shown to outperform simpler models when applied and tested using data from dedicated sampling schemes on regional scales (Elith, Graham et al. 2006). However, the quality of predictions generated by any model is largely dependent on the quality of available input data, and simpler models do not necessarily perform worse than more complex models if the quality of input data is poor (Moisen and Frescino 2002).

Biases of available large-scale marine data sets

Few, if any, of the available online occurrence data sets are likely to meet some of the basic assumptions of most habitat prediction models, such as a representative sampling coverage of all potentially available habitat. While this does not necessarily mean that the total geographic range of a species needs to be sampled, sampling needs to cover the total range in habitat usage of a species comprehensively. Unfortunately, in the marine environment sampling effort is often heavily concentrated in the continental shelf and slope waters of the temperate northern hemisphere, a feature which is strongly reflected in currently available GBIF/OBIS data sets. In addition to this sampling bias, there is a data provision bias, since different national or academic research institutions vary in their ability and efforts to make occurrence data accessible online. The compiled data sets themselves stem from a variety of different sources, including dedicated surveys as well as museum records or opportunistic sampling efforts, all associated with their own sets of biases. Any attempt to investigate species response curves to environmental gradients or to predict occurrence of marine species based exclusively on the limited, fragmented and heterogeneous current sampling coverage is thus likely to produce skewed results.

In addition to biases related to sampling effort, the actual reliability of the species identifications represents another problem. Data often have been collected over the course of more than a century, during which scientific names of species may have changed once or even repeatedly based on more recent taxonomic information and classifications. In addition, occurrence records of specific species are often collected opportunistically during dedicated surveys which may focus on entirely different taxa. The lack of time or specific available expertise for non-target species encountered during such surveys greatly increases the risk of species misclassification. While simple quality control tools have been implemented in most online data repositories to filter out misidentified species records, the number of falsely allocated records remains quite high and such records are difficult to correct for.

The AquaMaps solution: incorporation of non-point data about habitat usage

The AquaMaps approach was developed specifically to deal with the problems encountered when attempting to map large-scale species distributions based on existing but fragmented and potentially non-representative occurrence data. The basic and novel idea behind the AquaMaps concept is to supplement occurrence records with independent knowledge about species distributions and habitat usage to correct for existing biases. For instance, knowledge of the geographic extents of commercial species available from FAO can be used to define latitudinal and longitudinal bounding boxes to delimit predictions to areas known to be utilized. Area restrictions can also serve as a quality control mechanism, as they filter out outliers in occurrence records that may represent misidentifications. Published depth ranges can also be used to better define associations between species distributions and depth, as these are generally based on more rigorous analyses and representative data sets, thus reducing the impacts of sampling biases. Nowadays, information about habitat usage is often stored in online species databases such as FishBase, making it easy to access and use as supplementary data for model building. Additional habitat usage information can be obtained through the contribution of experts participating in a detailed review of input parameter settings. To our knowledge, AquaMaps is the only species distribution modelling approach that combines numerical algorithms with expert knowledge. In doing so, the AquaMaps modelling approach relies less on complexity and more on transparency to improve ease of interpretation by ecologists and experts in particular species and taxa. The strength of AquaMaps compared to other species distribution modelling algorithms thus lies in its transparency and also its ability to incorporate expert knowledge and general information on species habitat usage and occurrence. This type of information represents a currently underutilized resource that can help to compensate for some of the known problems associated with the available patchy and sub-optimal occurrence data sets.

How does AquaMaps work?

Like other ecological niche models, AquaMaps predicts the relative occurrence of a species in geographic space by investigating the relationship between known species' presence and selected environmental parameters in ecological space. As previously mentioned, the main difference from other presence-only modeling approaches is that AquaMaps supplements the existing occurrence records with additional information to compensate for biases or gaps in the available point locality data. In addition, the expert review process, which forms an integral part of the AquaMaps approach, has been implemented to further reduce the effects of such biases by allowing experts to modify envelope settings and maximum range extents.

Final AquaMaps predicted species' distributions are thus the result of a two-stage process. During the first stage, referred to as the non-expert mode, predictions are automatically generated using available species occurrence records supplemented by additional geographic information to allow a better distinction between suitable habitat and realized distributions. The second stage involves a manual review of all input parameter settings by an expert to identify and correct for further biases caused, for example, by non-representative survey coverage and species misidentifications.

AquaMaps non-expert mode:

To generate the initial, non-expert reviewed map, AquaMaps requires information about the general range of the species, some point locality records and basic information about the species' depth habitat usage.

Input parameters

Species habitat usage and maximum range extents

Information about species occurrence and habitat usage is obtained from two different and largely independent sources:

1. published information about species' maximum range extents and depth usage stored in online species databases such as FishBase

2. georeferenced species' occurrence records available from OBIS or GBIF

Published information about known maximum range extents of species described by FAO areas and bounding boxes serves as an independent verification of the validity of occurrence records and can be used to select only "good" presence records for subsequent model building; a minimum of ten good presence cells is a pre-requisite for the AquaMaps algorithm. Distributions of species with very few records therefore cannot be predicted. Habitat usage of species with respect to individual environmental parameters (except depth) is then directly derived from good occurrence records.
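The selection of "good" presence cells described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the (south, north, west, east) bounding-box layout and the row/column cell index are hypothetical, FAO-area checks are omitted, and the bounding box is assumed not to cross the antimeridian. Only the half-degree cell size and the minimum of ten good cells come from the text.

```python
import math

MIN_GOOD_CELLS = 10   # from the text: at least ten good presence cells
CELL_SIZE = 0.5       # half-degree grid


def cell_id(lat, lon, size=CELL_SIZE):
    """Hypothetical (row, col) index of the grid cell containing a point."""
    n_rows = int(180 / size)
    n_cols = int(360 / size)
    row = min(int(math.floor((lat + 90.0) / size)), n_rows - 1)
    col = int(math.floor((lon + 180.0) / size)) % n_cols
    return row, col


def good_cells(records, bbox):
    """Cells holding at least one record that falls inside the bounding box.

    records: iterable of (lat, lon) pairs in decimal degrees
    bbox: (south, north, west, east), assumed not to cross the antimeridian
    """
    south, north, west, east = bbox
    return {
        cell_id(lat, lon)
        for lat, lon in records
        if south <= lat <= north and west <= lon <= east
    }


def can_model(records, bbox):
    """True if enough good presence cells exist to build envelopes."""
    return len(good_cells(records, bbox)) >= MIN_GOOD_CELLS
```

Because cells rather than records are counted, several records falling into the same half-degree cell contribute only one good cell, and records outside the known range are dropped before any envelope is computed.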

Environmental predictors

As a default, AquaMaps uses six basic environmental parameters as predictors, two of which change little over time (bathymetry and distance to land), while the remainder vary seasonally and interannually. Long-term averages were used for all temporally varying parameters. All six environmental parameters are represented at half-degree cell resolution. Predictors include:

1. Bottom depth

2. Temperature

a. Sea surface temperature (SST) for all pelagic species

b. Bottom temperature for all non-pelagic species

3. Salinity

a. Sea surface salinity for all pelagic species

b. Bottom salinity for all non-pelagic species

4. Primary production

5. Sea ice concentration

6. Distance to land

By default, AquaMaps generates predictions using the first five parameters listed above, with the exception of marine mammals, for which only bottom depth, sea surface temperature and sea ice concentration are used, following the initial approach developed by Kaschner et al. (2006). Distance to land may be included as a restrictive buffer feature for a few species with specific life history traits that limit their occurrence in the open ocean, for instance central place foragers such as some pinniped species that need to return regularly to haul-out sites between foraging trips (see also Kaschner et al. 2006).

In general, users and experts can also create maps based on any subset of the environmental parameters listed above.

The environmental variables represent most of the key physical factors structuring the habitat of many demersal and pelagic species at larger scales. Others, such as seabed sediments data, might also be essential habitat parameters that would be ideally included for specific taxa, but at present global coverage data for these more specific parameters is rarely available. It should be stressed that most of the environmental parameters may only determine a species' occurrence indirectly by acting as proxies for other factors, such as food availability, predation risks and competitive interactions. However, such biological interactions are difficult to measure directly and data sets are currently unavailable at global scales.

Envelopes

Habitat usage of species with respect to each predictor is described by an environmental envelope. The environmental envelopes effectively represent species response curves in relation to available habitat and correspond to the single-parameter climatic envelopes that form the basis of other presence-only models such as BIOCLIM (Guisan and Zimmermann 2000). The AquaMaps approach assumes a trapezoidal shape for each response curve. This shape means that the relative likelihood of a species' presence is assumed to be uniformly highest in environmental conditions that fall within the species' preferred parameter range (Min_P to Max_P in Fig. 1). If the mean environmental conditions lie beyond this range, the likelihood decreases linearly towards the absolute minimum or maximum thresholds for the species (Min_A or Max_A) and is set to zero outside these absolute values. Most environmental parameters (except depth) change only very gradually in space. Therefore, the average parameter value measured in each cell is considered an adequate representation of environmental conditions encountered by a species in this cell. In the context of AquaMaps, absolute and preferred minima and maxima for environmental parameters are therefore computed from the mean environmental attributes of "good" occurrence cells using the following rules:

1. Min_P: 10th percentile of the observed variation in an environmental attribute

2. Max_P: 90th percentile of the observed variation in an environmental attribute

3. Min_A: 25th percentile - 1.5 * interquartile range, or absolute minimum observed in data (whichever is greater)

4. Max_A: 75th percentile + 1.5 * interquartile range, or absolute maximum observed in data (whichever is greater)
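As a rough illustration, the four rules above could be computed as in the sketch below. The function names and the dictionary layout are hypothetical, and the linear-interpolation percentile is an assumption, since the text does not say which percentile definition is used. The "whichever is greater" choice is applied to both Min_A and Max_A exactly as the rules are worded.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list.

    Assumption: the text does not specify the percentile definition.
    """
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def envelope(values):
    """Envelope thresholds from the mean attribute values of good cells."""
    q25 = percentile(values, 25)
    q75 = percentile(values, 75)
    iqr = q75 - q25
    return {
        # rule 3, as worded: the greater of the two candidates
        "min_a": max(q25 - 1.5 * iqr, min(values)),
        "min_p": percentile(values, 10),   # rule 1
        "max_p": percentile(values, 90),   # rule 2
        # rule 4, as worded: the greater of the two candidates
        "max_a": max(q75 + 1.5 * iqr, max(values)),
    }
```

For example, if the good cells of a species had mean temperatures of 0, 1, 2, ..., 100 (an artificial data set), the sketch would give a preferred range of 10 to 90 and an absolute range of 0 to 150.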

The derivation of depth envelopes differs slightly from the approach described above, since minimum and maximum ranges are obtained from independent published information about overall depth usage, while the preferred range is assumed to correspond to the values stored as "common depth" in species databases such as FishBase. Since bathymetry can vary quite radically even within the relatively small area covered by a single grid cell, the maximum and minimum depths reported for each cell, rather than the mean depth, are more likely to determine the occurrence of a species in a given cell. Consequently, absolute and preferred minima and maxima of depth envelopes (i.e. Min_P, Max_P, Min_A and Max_A) are considered in relation to minimum and maximum bathymetry cell values, rather than average depths.

There are some additional rules that ensure a minimum width of the preferred and absolute ranges (i.e. the distance between Min_P and Max_P, and between Min_A and Max_A, respectively), which prevent the use of nonsensical values or implement some further basic biological concepts. Based on such concepts, the right triangular side of the depth envelope, for instance, is raised up (resulting in a uniformly high likelihood in all depths beyond the preferred minimum) if a species is known to be pelagic and is therefore unlikely to be affected by bottom depths in the open ocean. Along the same lines, bottom temperature and salinity values, rather than surface measurements, form the basis for the corresponding environmental envelopes for all species flagged as non-pelagic.

Algorithm

The AquaMaps model generates an index of species-specific relative likelihood of presence for each individual grid cell by scoring how well its environmental attributes match what is known about a species' habitat usage.

Relative likelihood values range from 0.00 to 1.00 and represent the product of the relative likelihood scores assigned for the individual environmental attributes, which are in turn calculated based on the environmental envelopes described above. A multiplicative approach is used to allow each predictor to serve as an effective "knock-out" criterion (i.e., if a cell's temperature exceeds the maximum of a species' temperature tolerance, the overall likelihood is zero, even if bathymetry and other cell attributes are within the species' preferred or overall habitat range).
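The scoring can be illustrated with a minimal sketch that combines the trapezoidal response curve with the multiplicative rule. The function names, the tuple order (Min_A, Min_P, Max_P, Max_A) and the example envelope values are hypothetical, and the special handling of depth (minimum and maximum cell bathymetry, raised right side for pelagic species) is left out.

```python
def trapezoid(value, min_a, min_p, max_p, max_a):
    """Relative likelihood (0..1) for one environmental attribute."""
    if value < min_a or value > max_a:
        return 0.0                      # outside the absolute range
    if value < min_p:
        return (value - min_a) / (min_p - min_a)   # rising side
    if value > max_p:
        return (max_a - value) / (max_a - max_p)   # falling side
    return 1.0                          # within the preferred range


def cell_likelihood(cell, envelopes):
    """Product of the per-attribute scores for one grid cell.

    cell: dict attribute name -> mean value in the cell
    envelopes: dict attribute name -> (min_a, min_p, max_p, max_a)
    """
    score = 1.0
    for name, env in envelopes.items():
        score *= trapezoid(cell[name], *env)
    return score


# Illustrative envelopes only, not taken from any real species.
example_envelopes = {
    "sst": (5.0, 10.0, 20.0, 25.0),
    "salinity": (30.0, 33.0, 36.0, 38.0),
}
```

With these illustrative envelopes, a cell with a temperature of 22.5 and a salinity of 37 scores 0.5 on each attribute and 0.25 overall, while a cell with a temperature of 26 scores zero regardless of its salinity, which is the knock-out behaviour described above.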

The resulting predictions about the relative likelihood of species' occurrence are then mapped using the C-squares Mapper. These range maps are available for viewing through the AquaMaps pages, but are also included as the default in the species summary pages of contributing online species databases such as FishBase. Non-expert mode AquaMaps predictions are marked as "(un-reviewed)" in the title of any displayed map.

AquaMaps expert mode:

The standard computer-generated maps can, and should, be reviewed by experts who are familiar with the known distribution of a given species, in order to correct errors and improve upon the default predictions. The incorporation of other sources of information about habitat usage, such as expert knowledge, is an integral part of the AquaMaps modeling framework. AquaMaps uses these types of data to compensate for the sub-optimal quality and often non-representative coverage of large-scale occurrence data sets that can create prediction biases when modeling entire species' ranges. Using the "expert-review" link, experts can improve the quality of maps by changing input parameter settings in a number of different ways, but are asked to provide the source of information on which the decision to change settings was based.

Area modifications & exclusion zones

Experts can make straightforward changes to the default known areas of occurrence (i.e. FAO areas and bounding boxes). In addition, in future versions of AquaMaps, experts will be able to select, from a pull-down menu, pre-defined areas from which a species should be excluded. Such exclusion areas include ocean basins such as the "northwestern Pacific" or specific LMEs (Large Marine Ecosystems). Please note that all area changes affect predictions of species occurrence on two separate levels. Firstly, the changes will affect the selection of "good cells" and/or "good occurrence records" that form the basis for the computation of envelope minima and maxima. Secondly, these changes will modify the extent of the geographic map of the species distribution.

Envelope modifications

Envelopes describing a species' tolerated and preferred range of occurrence with respect to a specific environmental parameter can also be adjusted during the expert review process. Experts can modify envelope settings in three different ways, described in more detail below.

Addition of point records in areas of known species presence

The occurrence records which are compiled and made available through GBIF/OBIS obviously represent only a subset of all locations at which a species may have been reported to occur or where an expert has seen it or knows it to be present. AquaMaps therefore allows experts/users to manually add such records not currently included in the GBIF/OBIS data sets. Added records will then be included in the envelope calculation procedure where they result in a modification or adjustment of one, several or all environmental envelopes.

Modification of minima and maxima of individual environmental parameters

In addition to, or instead of, adding occurrence records, experts may adjust individual envelopes directly by changing the values for the absolute and preferred minima and maxima.

Effort-correction routine

As a third option, experts can adjust envelopes using a novel algorithm to correct for effort biases in the occurrence data sets using a community model approach. The approach was developed by Kaschner et al. (2006) and has subsequently been modified for application within the AquaMaps framework. The underlying concept is the use of proportional encounter rates rather than presence cells as the basis for envelope calculations. Normally, as outlined above, high frequencies of reported occurrences from a specific locality are difficult to interpret in the context of presence-only models. Without associated effort information it is impossible to determine whether high occurrence rates represent higher species densities due to particularly favorable local environmental conditions or a simple concentration of sampling effort. Effort information, however, is rarely available or standardized enough to be provided along with GBIF/OBIS data or opportunistic occurrence data sets. Presence-only models attempt to correct for heterogeneous sampling effort by relying on presence cells as the sampling unit, i.e. a species is considered to be present in each cell it was reported in, independent of the number of occurrence records falling into that cell. While the use of presence cells does reduce impacts of effort-related biases, important information about the relative usage or preference of a given area by a given species is lost in the process.

The AquaMaps routine that experts can use to adjust envelope settings maintains information on relative usage while correcting for sampling effort. This is achieved by using the total number of occurrence records of all closely related species as an index of sampling effort. The incorporation of such ecological community data has been shown to improve predictions for individual species compared to models relying on single-species data sets (Elith, Graham et al. 2006). Here, the underlying assumption is that taxonomically related and co-occurring species will have had similar probabilities of being sampled during a specific expedition or survey. Therefore, the relative occurrence of the species of interest in comparison to other, similar species in a given cell can be expressed as a proportional encounter rate (i.e. the proportion of records of the given species out of the total reported occurrences of all closely related species in a cell). This proportional encounter rate ranges from 0 to 100% and thus allows a distinction between high and low habitat usage, yet it is independent of the absolute sampling effort. During the effort-correcting routine, the computed proportional encounter rates serve as weighting factors in the calculation of environmental envelopes, increasing the relative importance of cells associated with high encounter rates by using a corresponding number of multiple copies of that cell (compare Table 1 and Table 2 for illustration).
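The weighting idea can be sketched as follows. This is not the published routine: the function names are hypothetical, and the rule for turning an encounter rate into a number of copies (here, the rate scaled to at most ten copies, with at least one copy per presence cell) is an assumption made purely for illustration, since the text does not state how many copies are used.

```python
def encounter_rates(species_counts, group_counts):
    """Proportional encounter rate (0..1) per cell.

    species_counts: dict cell -> records of the species of interest
    group_counts: dict cell -> records of all closely related species,
                  including the species of interest itself
    """
    return {
        cell: species_counts[cell] / group_counts[cell]
        for cell in species_counts
        if group_counts.get(cell, 0) > 0
    }


def weighted_cell_values(rates, cell_values, max_copies=10):
    """Repeat each cell's environmental value according to its rate.

    max_copies is a hypothetical scaling choice, not from the text.
    """
    values = []
    for cell, rate in rates.items():
        copies = max(1, round(rate * max_copies))
        values.extend([cell_values[cell]] * copies)
    return values
```

The resulting list of repeated values would then be passed to the percentile-based envelope calculation in place of one value per presence cell, so that cells with high encounter rates pull the preferred range towards their conditions.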

For the more common species, the resulting effort-corrected environmental envelopes should correspond more closely to actual habitat usage of a species and therefore represent a good starting point for further expert review.

How "good" are AquaMaps?

The degree to which a simple, generic model such as AquaMaps can adequately capture important aspects of a given species' distribution naturally varies between different species. For some species the fit will be quite good, if available occurrence records approximate a representative coverage of the species' used habitat and the environmental predictors indeed represent the key factors determining the presence or absence of a species. For others, however, AquaMaps predictions will be a sub-optimal representation of the species' occurrence, even after incorporation of improvements suggested by experts. Low correspondence between modeled distributions and known species presence may be caused by small sample sizes or highly biased occurrence record sets. Similarly, large discrepancies between predicted and observed occurrence may be due to the lack of consideration of more complex relationships between species' presence and environmental or biological factors. In addition to these species-specific and data-related aspects, there are also some limitations that are inherent in the modeling approach itself.

Model-inherent limitations

When viewing AquaMaps, the most important point to keep in mind is that these maps represent species' range maps. Predictions thus document the large-scale and long-term presence of a species and cannot be assumed to correctly represent the local occurrence of a species on a specific day of a specific year. Spatial resolution of predictions is limited by the cell dimensions of the global grid of half-degree cells. Similarly, AquaMaps represent mean annual distributions of species and do not account for changes in species occurrence due to migration or unusual environmental events such as El Niño events.

Even though AquaMaps strives to capture the actual range of occurrence and relative likelihood of presence for each species as closely as possible, predictions in many cases may still more closely approximate a species' fundamental niche than its realized niche. This can be caused by the lack of incorporation of environmental parameters that determine species' occurrence on more local scales. In addition, AquaMaps and most other habitat prediction models cannot capture the effects of biological interactions such as inter- or intraspecific competition, predation or symbioses that very much affect the actual presence or absence of species on smaller scales.

Model evaluation

Unfortunately, none of the biases and problems outlined above will likely be detected by means of standard cross-validation techniques that are often used to evaluate the fit of habitat predictions. Cross-validation techniques simply split data into training and test sets or use re-sampling techniques - however, both training and test data will be affected by the same biases. The need to test performance of habitat suitability models using independent data sets is therefore frequently stressed (e.g. Elith, Graham et al. 2006). For models predicting entire distributions of species at very large or even global scales, such independent and reliable test data sets are difficult to find.

Qualitative comparison with published range maps and information about species distributions

Although quantitative evaluation of AquaMaps predictions is still pending and will likely remain limited to a small number of species, we have implemented a ranking system which indicates the degree to which modeled predictions match what is known about a given species distribution. A single star is assigned to the default computer-generated map. Experts reviewing predictions for a species are asked to rank the final reviewed map based on their knowledge of relative occurrence and maximum range extents by assigning additional stars that will provide an index of the quality of specific AquaMaps predictions.

Quantitative assessment of model performance

Kaschner et al. (2006) showed that - on large scales - predictions generated by an environmental envelope model based entirely on expert knowledge successfully captured a large proportion of the observed variation in effort-corrected and independent marine mammal occurrence data collected during dedicated surveys in different parts of the world. We are planning to implement a similar validation approach to test AquaMaps predictions using independent fisheries survey data for a representative subset of marine species in early 2007. In addition, we are planning to quantitatively compare AquaMaps with predictions generated by other presence-only models.

References

Elith, J., C. H. Graham, et al. (2006). "Novel methods improve prediction of species' distributions from occurrence data." Ecography 29: 129-151.

Guisan, A. and N. Zimmermann (2000). "Predictive habitat distribution models in ecology." Ecological Modelling 135: 147-186.

Kaschner, K., L. B. Christensen, et al. (2006). Mapping top consumers in marine ecosystems past and present: comparative consumption rates of great whales and fisheries (SC/58/E3). International Whaling Commission - Scientific Committee Meeting, (unpublished).

Kaschner, K., R. Watson, et al. (2006). "Mapping worldwide distributions of marine mammals using a Relative Environmental Suitability (RES) model." Marine Ecology Progress Series 316: 285-310.

Moisen, G. G. and T. S. Frescino (2002). "Comparing five modelling techniques for predicting forest characteristics." Ecological Modelling 157(2-3): 209-225.