Georeferencing

When coordinates are absent or incorrect for a record, we georeference the location based on the most detailed information available. That is, we use the “site” or “location” variables if these are available. Otherwise, we use the centroid of the lowest-level administrative subdivision that is reported (e.g. adm3). Location names are often not unique within a country. For example, there are more than 100 places called “San Juan” in Mexico, and more than 100 called “Rampur” in India. When georeferencing locations, it is therefore very important to consider the administrative subdivision that they are reported to be in. It is also important to consider the other locations in the data set. Locations are typically clustered in a single region: although it is possible, it is unlikely that a data set has a cluster of sites in one part of the country and a single site at the other end of the country.
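As a minimal sketch of why the administrative subdivision matters, the base R example below matches records to a small gazetteer. The gazetteer, its column names, and its coordinates are made up for illustration; they are not taken from any Carob data set or function.

```r
## hypothetical gazetteer: the same place name occurs in several states
## (coordinates are invented for this example)
gaz <- data.frame(
    adm1 = c("Jalisco", "Oaxaca", "Puebla"),
    location = c("San Juan", "San Juan", "San Juan"),
    longitude = c(-103.5, -96.5, -98.0),
    latitude = c(20.5, 17.0, 19.0)
)

## records to georeference, with the subdivision reported in the data set
d <- data.frame(
    record = 1:2,
    adm1 = c("Oaxaca", "Puebla"),
    location = c("San Juan", "San Juan")
)

## matching on the name alone is ambiguous: several candidates per record
n_candidates <- sum(gaz$location == "San Juan")

## matching on subdivision and name leaves one candidate per record
m <- merge(d, gaz, by = c("adm1", "location"), all.x = TRUE)
m
```

Using `all.x = TRUE` keeps records that have no match, so that they show up with missing coordinates instead of silently disappearing.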

For background on georeferencing approaches in the biological sciences, see this best practices document.

In Carob, we use the “point-radius” georeferencing method. We determine the geographic center of the location, and express our uncertainty as a radius. The radius must be large enough that a circle centered on the geographic center encompasses all places where the observation may have been made. In most cases, this circle will also include areas that do not match the locality description.
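The idea can be sketched in base R as follows. This is an illustration only, not the Carob implementation: the `haversine` helper and the points are hypothetical, the center is simply the mean of the coordinates (reasonable only for a small area), and the Earth is treated as a sphere.

```r
## great-circle distance in meters (haversine formula, spherical Earth)
haversine <- function(lon1, lat1, lon2, lat2, r = 6371000) {
    rad <- pi / 180
    dlat <- (lat2 - lat1) * rad
    dlon <- (lon2 - lon1) * rad
    a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
    2 * r * asin(pmin(1, sqrt(a)))
}

## hypothetical points outlining the area that a locality description could refer to
pts <- data.frame(
    longitude = c(36.5, 36.9, 36.9, 36.5),
    latitude = c(-5.4, -5.4, -5.0, -5.0)
)

## center: the mean of the coordinates
center <- c(longitude = mean(pts$longitude), latitude = mean(pts$latitude))

## radius: the distance to the farthest point, so that the circle covers all of them
d <- haversine(center[["longitude"]], center[["latitude"]], pts$longitude, pts$latitude)
radius <- ceiling(max(d))
radius
```

Because the radius is the largest distance from the center, the circle necessarily covers some ground outside the outlined area, which is the trade-off described above.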

The detailed recommended protocols for georeferencing using the point-radius method are given in the Georeferencing Quick Reference Guide.

If georeferencing is done based on administrative areas, use the method below. In this example we are georeferencing “Kiteto” and “Kongwa”, two districts (adm2) in Tanzania.

## get the coordinates and uncertainty for adm2 boundaries for Tanzania
xy <- carobiner::geo_adm("Tanzania", 2)
head(xy)
##    country   adm1         adm2 longitude latitude geo_undertainty
## 1 Tanzania Arusha       Arusha   36.8004  -3.5239           34057
## 2 Tanzania Arusha Arusha Urban   36.6761  -3.4502           13074
## 3 Tanzania Arusha       Karatu   35.5683  -3.4902           82726
## 4 Tanzania Arusha   Lake Eyasi   35.1251  -3.5890           37545
## 5 Tanzania Arusha Lake Manyara   35.8294  -3.5240           11415
## 6 Tanzania Arusha      Longido   36.4341  -2.6543           86711

## subset to the names of interest. In practice, the names may not match perfectly, so this may take some more effort
s <- xy[xy$adm2 %in% c("Kiteto", "Kongwa"), ]
s
##     country    adm1   adm2 longitude latitude geo_undertainty
## 18 Tanzania  Dodoma Kongwa   36.6104  -5.9767           56168
## 78 Tanzania Manyara Kiteto   36.8266  -5.1963           84918

## use dput (here leaving out the country name)
s$country <- NULL
dput(s)
## structure(list(adm1 = c("Dodoma", "Manyara"), adm2 = c("Kongwa", 
## "Kiteto"), longitude = c(36.6104, 36.8266), latitude = c(-5.9767, 
## -5.1963), geo_undertainty = c(56168, 84918)), row.names = c(18L, 
## 78L), class = "data.frame")


## and use that output to create code for the script, like this
geo <- data.frame(
    adm1 = c("Dodoma", "Manyara"), 
    adm2 = c("Kongwa", "Kiteto"), 
    longitude = c(36.6104, 36.8266), 
    latitude = c(-5.9767,  -5.1963), 
    geo_undertainty = c(56168, 84918)
)
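When the names in a data set do not match the reference names exactly, approximate matching can help with the subsetting step. The sketch below uses `utils::adist` from base R. The reference names stand in for the adm2 column shown earlier, while the reported spellings and the `clean` helper are hypothetical.

```r
## stand-in for the adm2 names returned by the lookup
adm2 <- c("Arusha", "Karatu", "Kiteto", "Kongwa", "Longido")
## hypothetical spellings as they might appear in a data set
reported <- c("kiteto", "Kongwa District", "Kongua")

## normalize: lower case, drop a trailing "district", trim white space
clean <- function(x) trimws(sub("district$", "", tolower(x)))

## edit distance between each reported name and each reference name
ed <- utils::adist(clean(reported), clean(adm2))
best <- apply(ed, 1, which.min)

matches <- data.frame(
    reported = reported,
    adm2 = adm2[best],
    distance = ed[cbind(seq_along(best), best)]
)
matches
```

A small edit distance is only a suggestion. Each proposed match should still be checked by hand, ideally against the reported adm1 as well.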

Check the georeferences with terra::plet (especially important when combining existing and new georeferences).

## name the coordinate columns explicitly, so that vect() does not have to guess them
v <- terra::vect(geo, geom=c("longitude", "latitude"), crs="lonlat")
terra::plet(v, cex=4, col="red")
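A map is the most reliable check, but a quick non-interactive test can catch gross errors such as swapped longitude and latitude. The sketch below is one possible approach: the bounding box for Tanzania is a rough approximation chosen for this example, and the third row is a deliberately wrong, hypothetical record.

```r
geo <- data.frame(
    adm2 = c("Kongwa", "Kiteto", "hypothetical error"),
    longitude = c(36.6104, 36.8266, -5.1963),
    latitude = c(-5.9767, -5.1963, 36.8266)
)

## rough bounding box for Tanzania (approximate values, an assumption of this sketch)
bb <- c(xmin = 29, xmax = 41, ymin = -12, ymax = 0)

## TRUE for records whose coordinates fall inside the box
inside <- geo$longitude >= bb[["xmin"]] & geo$longitude <= bb[["xmax"]] &
          geo$latitude >= bb[["ymin"]] & geo$latitude <= bb[["ymax"]]

## records that need a closer look
geo[!inside, ]
```

A bounding box cannot detect a point that is in the right country but in the wrong region, so it complements the map and does not replace it.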