Georeferencing
When coordinates are absent or incorrect for a record, we georeference the location based on the most detailed information available. That is, we use the “site” or “location” variables if these are available; otherwise we use the centroid of the lowest-level administrative subdivision (e.g. adm3). Location names are often not unique within a country. For example, there are more than 100 places called “San Juan” in Mexico or “Rampur” in India. When georeferencing locations, it is therefore very important to consider the administrative subdivision that they are reported to be in. It is also important to consider the other locations in a data set. Locations are typically clustered in a single region: although it is possible, it is unlikely that a data set has a cluster of sites in one part of the country and a single site at the other end of the country.
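The clustering heuristic can be turned into a quick screening step. The sketch below is illustrative only and is not part of carobiner: the function name `flag_outliers` and the 2-degree threshold are assumptions. The input reuses the six Arusha centroids from the Tanzania example further down, with the sign of one latitude flipped on purpose to mimic a common data-entry error.

```r
# Hypothetical helper (not a carobiner function): flag records that lie far
# from the bulk of the sites, measured in degrees from the median coordinate.
flag_outliers <- function(lon, lat, max_deg = 2) {
  dlon <- lon - median(lon, na.rm = TRUE)
  dlat <- lat - median(lat, na.rm = TRUE)
  sqrt(dlon^2 + dlat^2) > max_deg
}

sites <- data.frame(
  adm2 = c("Arusha", "Arusha Urban", "Karatu", "Lake Eyasi", "Lake Manyara", "Longido"),
  longitude = c(36.8004, 36.6761, 35.5683, 35.1251, 35.8294, 36.4341),
  # the Karatu latitude should be -3.4902; the sign is flipped deliberately
  latitude = c(-3.5239, -3.4502, 3.4902, -3.5890, -3.5240, -2.6543)
)

# TRUE marks a record that deserves a second look
sites$suspect <- flag_outliers(sites$longitude, sites$latitude)
sites[sites$suspect, ]
```

A flagged record is not necessarily wrong; it is a prompt to recheck the source. A sensible threshold depends on the size of the country and on how the sites were selected.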
For background on georeferencing approaches in the biological sciences, see this best practices document.
In Carob, we use the “point-radius” georeferencing method. We determine the geographic center of the location, and express our uncertainty as a radius. The radius must be large enough that a circle centered on the geographic center encompasses all places where the observation may have been made. In most cases, this circle will also include areas that do not match the locality description.
The detailed recommended protocols for georeferencing using the point-radius method are given in the Georeferencing Quick Reference Guide.
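As a rough illustration of the point-radius idea (not the protocol from the guide), the sketch below takes a set of candidate coordinates, uses their mean as the center, and sets the radius to the great-circle distance to the farthest candidate. The helper names `haversine` and `point_radius`, the output column `radius_m`, and the spherical Earth radius are all assumptions of this sketch. Taking the mean of the coordinates is a simplification that breaks down near the antimeridian and the poles. The candidate coordinates are the two district centroids from the Tanzania example further down.

```r
# Great-circle (haversine) distance in metres on a sphere.
# The mean Earth radius of 6371008.8 m is an assumption of this sketch.
haversine <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: center = mean of the candidate coordinates,
# radius = distance from the center to the farthest candidate, rounded up.
point_radius <- function(lon, lat) {
  clon <- mean(lon)
  clat <- mean(lat)
  radius <- max(haversine(clon, clat, lon, lat))
  data.frame(longitude = clon, latitude = clat, radius_m = ceiling(radius))
}

# candidate places where the observation may have been made
lon <- c(36.6104, 36.8266)
lat <- c(-5.9767, -5.1963)
pr <- point_radius(lon, lat)
pr
```

By construction, every candidate lies within `radius_m` of the returned center, which is the defining property of the method. When the candidates are polygons rather than points, the radius would also need to cover their extent.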
If georeferencing is done based on administrative areas, use the method below. In this example we are georeferencing “Kiteto” and “Kongwa”, two districts (adm2) in Tanzania.
## get the coordinates and uncertainty for adm2 boundaries for Tanzania
xy <- carobiner::geo_adm("Tanzania", 2)
head(xy)
##    country   adm1         adm2 longitude latitude geo_undertainty
## 1 Tanzania Arusha       Arusha   36.8004  -3.5239           34057
## 2 Tanzania Arusha Arusha Urban   36.6761  -3.4502           13074
## 3 Tanzania Arusha       Karatu   35.5683  -3.4902           82726
## 4 Tanzania Arusha   Lake Eyasi   35.1251  -3.5890           37545
## 5 Tanzania Arusha Lake Manyara   35.8294  -3.5240           11415
## 6 Tanzania Arusha      Longido   36.4341  -2.6543           86711
## subset to the names of interest. In practice, the names may not match perfectly, so this may take some more effort
s <- xy[xy$adm2 %in% c("Kiteto", "Kongwa"), ]
s
##     country    adm1   adm2 longitude latitude geo_undertainty
## 18 Tanzania  Dodoma Kongwa   36.6104  -5.9767           56168
## 78 Tanzania Manyara Kiteto   36.8266  -5.1963           84918
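When the names in a data set do not match the reference names exactly, approximate string matching can suggest candidates. The following is a minimal sketch using base R's `utils::adist`. The helper `closest_name`, the misspelled inputs, and the cutoff of two edits are assumptions for illustration. The reference names are copied from the output above.

```r
# adm2 names as they appear in the reference table (taken from the output above)
ref <- c("Arusha", "Arusha Urban", "Karatu", "Lake Eyasi", "Lake Manyara",
         "Longido", "Kongwa", "Kiteto")
# hypothetical spellings as they might appear in a data set
raw <- c("Kitetto", "kongwa ")

# Hypothetical helper: return the closest reference name by edit distance,
# or NA when even the best candidate differs by more than max_dist edits.
closest_name <- function(x, ref, max_dist = 2) {
  d <- utils::adist(trimws(x), ref, ignore.case = TRUE)
  best <- apply(d, 1, which.min)
  ifelse(d[cbind(seq_along(x), best)] <= max_dist, ref[best], NA_character_)
}

closest_name(raw, ref)
```

Suggestions from a matcher like this should be reviewed by hand, since a short edit distance does not guarantee that two names refer to the same place.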
## use dput (here leaving out the country name)
s$country <- NULL
dput(s)
## structure(list(adm1 = c("Dodoma", "Manyara"), adm2 = c("Kongwa",
## "Kiteto"), longitude = c(36.6104, 36.8266), latitude = c(-5.9767,
## -5.1963), geo_undertainty = c(56168, 84918)), row.names = c(18L,
## 78L), class = "data.frame")
## to create code for the script like this
geo <- data.frame(
  adm1 = c("Dodoma", "Manyara"),
  adm2 = c("Kongwa", "Kiteto"),
  longitude = c(36.6104, 36.8266),
  latitude = c(-5.9767, -5.1963),
  geo_undertainty = c(56168, 84918)
)
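One plausible next step is to attach these georeferences to the records that lack coordinates. The sketch below does that with base R's `merge`, joining on both `adm1` and `adm2` so that an adm2 name that occurs in more than one adm1 unit cannot be mismatched. The data frame `d` and its `trial_id` values are hypothetical. The `geo` values and column names are those shown above.

```r
# georeferences as created above
geo <- data.frame(
  adm1 = c("Dodoma", "Manyara"),
  adm2 = c("Kongwa", "Kiteto"),
  longitude = c(36.6104, 36.8266),
  latitude = c(-5.9767, -5.1963),
  geo_undertainty = c(56168, 84918)
)

# hypothetical records without coordinates
d <- data.frame(
  trial_id = c("1", "2", "3"),
  adm1 = c("Manyara", "Dodoma", "Manyara"),
  adm2 = c("Kiteto", "Kongwa", "Kiteto")
)

# all.x = TRUE keeps records that have no match, with NA coordinates
d <- merge(d, geo, by = c("adm1", "adm2"), all.x = TRUE)

# records that still lack coordinates need further attention
d[is.na(d$longitude) | is.na(d$latitude), ]
```

Note that `merge` may reorder the rows, so any later step that relies on the original order should sort on an identifier such as `trial_id`.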
Check the georeferences with terra::plet (especially important when combining existing and new georeferences).
## name the coordinate columns explicitly so that vect() does not have to guess them
v <- terra::vect(geo, geom=c("longitude", "latitude"), crs="lonlat")
terra::plet(v, cex=4, col="red")