Data
We compile agricultural research data into groups with similar variables. The table below shows the current groups and the number of original datasets and records in each group. It also shows the same counts for the subset of datasets that have a Creative Commons (CC) license. As of 21 November 2024, we have processed 2,172 original datasets containing a total of 1,568,759 records.
Group | Datasets | Records | CC-Datasets | CC-Records |
---|---|---|---|---|
agronomy | 198 | 186892 | 156 | 124466 |
pest_disease | 9 | 3303 | 7 | 2671 |
soil_samples | 13 | 17986 | 10 | 13009 |
survey | 30 | 67259 | 13 | 35987 |
varieties | 31 | 20057 | 29 | 19005 |
varieties_cassava | 1472 | 138686 | 1472 | 138680 |
varieties_cowpea | 76 | 23193 | 76 | 23193 |
varieties_maize | 76 | 81811 | 62 | 73052 |
varieties_potato | 55 | 30714 | 53 | 30381 |
varieties_wheat | 212 | 998858 | 4 | 19234 |
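
As a quick sanity check on the table, the sketch below sums the columns and computes, for each group, the share of records that carry a CC license. The numbers are copied from the table; the variable names are only illustrative.

```python
# Counts copied from the table: (datasets, records, CC datasets, CC records)
groups = {
    "agronomy": (198, 186892, 156, 124466),
    "pest_disease": (9, 3303, 7, 2671),
    "soil_samples": (13, 17986, 10, 13009),
    "survey": (30, 67259, 13, 35987),
    "varieties": (31, 20057, 29, 19005),
    "varieties_cassava": (1472, 138686, 1472, 138680),
    "varieties_cowpea": (76, 23193, 76, 23193),
    "varieties_maize": (76, 81811, 62, 73052),
    "varieties_potato": (55, 30714, 53, 30381),
    "varieties_wheat": (212, 998858, 4, 19234),
}

# Column totals across all groups
total_datasets = sum(g[0] for g in groups.values())
total_records = sum(g[1] for g in groups.values())
cc_datasets = sum(g[2] for g in groups.values())
cc_records = sum(g[3] for g in groups.values())

print(f"all: {total_datasets} datasets, {total_records:,} records")
print(f"CC:  {cc_datasets} datasets, {cc_records:,} records")

# Share of each group's records that come from CC-licensed datasets,
# printed from the lowest share to the highest
cc_share = {name: g[3] / g[1] for name, g in groups.items()}
for name, share in sorted(cc_share.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {share:6.1%}")
```

The column totals match the overall figures of 2172 datasets and 1,568,759 records. The CC subset comes to 1882 datasets and 479,678 records. Most of the difference is in the wheat group, where only 4 of 212 datasets carry a CC license.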
Here is a map with all locations for which we have at least one observation.
From this page you can download the compiled data with a Creative Commons license. You can create the full datasets yourself by following these instructions.
Please note that the data available here are new. They represent our first attempt to standardize widely variable data with many data quality issues. The data still contain errors carried over from the original sources, and likely also errors that we have introduced.
Our aim is to provide cleaner and better documented datasets sometime in 2025.
The groups make it easier for us to organize our work, but it is important to note that they are not mutually exclusive. For example, the first place to look for data on crop response to fertilizer would be the “agronomy” group. However, the “survey” and “varieties” groups may also contain fertilizer application data. Likewise, the “varieties” data are about comparing crop varieties, but variety names are also reported in the “agronomy” group. This means that you may want to consider using data from multiple groups. The maize and wheat varieties have their own groups because of the large amount of data for these crops, and because they have some unique terms.
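
Because the groups overlap, a search for a given variable may need to span several of them. The sketch below shows one way to do that with the Python standard library. The two CSV snippets are made-up stand-ins for group files, and the column names (`dataset_id`, `crop`, `variety`, `N_fertilizer`, `yield`) are assumptions for illustration; check the actual files for the real names.

```python
import csv
import io

# Toy stand-ins for two group files. Column names and values are made up
# for illustration and are not taken from the real data.
agronomy_csv = """dataset_id,crop,variety,N_fertilizer,yield
A1,maize,var_a,60,3200
A2,maize,,0,1800
"""
varieties_csv = """dataset_id,crop,variety,N_fertilizer,yield
V1,maize,var_a,,2900
V2,maize,var_b,90,4100
"""

def read_group(text, group):
    """Parse CSV text into a list of dicts and tag each row with its group."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["group"] = group
    return rows

# Stack the records from both groups into one list
records = (
    read_group(agronomy_csv, "agronomy")
    + read_group(varieties_csv, "varieties_maize")
)

# Keep every record that reports a fertilizer rate, whatever its group
with_fertilizer = [r for r in records if r["N_fertilizer"] != ""]
for r in with_fertilizer:
    print(r["group"], r["dataset_id"], r["N_fertilizer"], r["yield"])
```

Tagging each row with its group before stacking keeps the origin of every record visible, which helps when the same variable is reported differently across groups.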