Data
From this page you can download the compiled data that are available under a Creative Commons license. You can also create the full datasets yourself by following these instructions.
Please note that the data available here are new. They represent our first attempt to standardize highly variable source data with many data quality issues. The data still contain errors carried over from the original sources, and likely also errors that we have introduced.
As of 6 June 2025, we have processed 2,181 original datasets containing a total of 1,625,034 records. The map below shows all locations for which we have at least one observation.
For ease of organization, we divide the data into thematic groups. These groups are not mutually exclusive. For example, the first place to look for data on crop response to fertilizer would be the “agronomy” group. However, the “survey” and “varieties_*” groups may also contain fertilizer application data. Likewise, the “varieties_*” groups are about comparing crop varieties, but variety names are also reported in the “agronomy” group. You may therefore want to consider using data from multiple groups.
The table below shows the current groups and the number of original datasets and records in each group. We also show these numbers for the datasets that have a Creative Commons (CC) license.
Group | Datasets | Records | CC-Datasets | CC-Records |
---|---|---|---|---|
agronomy | 200 | 197921 | 152 | 119374 |
pest_disease | 8 | 3225 | 7 | 2785 |
soil_samples | 13 | 17984 | 10 | 13007 |
survey | 28 | 108742 | 12 | 77469 |
varieties | 38 | 15462 | 36 | 15394 |
varieties_cassava | 1466 | 138686 | 1466 | 138680 |
varieties_cowpea | 76 | 23193 | 76 | 23193 |
varieties_maize | 76 | 81811 | 62 | 73052 |
varieties_potato | 62 | 36602 | 62 | 36602 |
varieties_wheat | 214 | 1001408 | 4 | 19234 |
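As a quick consistency check, the table can be summed and the share of CC-licensed records computed per group. The following is a minimal Python sketch: the numbers are copied from the table above, while the names `GROUPS`, `totals`, and `cc_record_share` are our own illustrative choices and not part of any published tool.

```python
# Rows copied from the table above:
# group -> (datasets, records, cc_datasets, cc_records)
GROUPS = {
    "agronomy": (200, 197921, 152, 119374),
    "pest_disease": (8, 3225, 7, 2785),
    "soil_samples": (13, 17984, 10, 13007),
    "survey": (28, 108742, 12, 77469),
    "varieties": (38, 15462, 36, 15394),
    "varieties_cassava": (1466, 138686, 1466, 138680),
    "varieties_cowpea": (76, 23193, 76, 23193),
    "varieties_maize": (76, 81811, 62, 73052),
    "varieties_potato": (62, 36602, 62, 36602),
    "varieties_wheat": (214, 1001408, 4, 19234),
}


def totals(groups):
    """Sum each of the four table columns over all groups."""
    return tuple(sum(column) for column in zip(*groups.values()))


def cc_record_share(groups):
    """Fraction of each group's records that have a CC license."""
    return {
        name: cc_records / records
        for name, (_, records, _, cc_records) in groups.items()
    }


if __name__ == "__main__":
    n_datasets, n_records, n_cc_datasets, n_cc_records = totals(GROUPS)
    print(f"datasets: {n_datasets}, records: {n_records}")
    print(f"CC datasets: {n_cc_datasets}, CC records: {n_cc_records}")
    for name, share in cc_record_share(GROUPS).items():
        print(f"{name}: {share:.1%} of records are CC-licensed")
```

The column sums agree with the totals stated above (2,181 datasets and 1,625,034 records). The per-group shares also show how much coverage varies: all cowpea and potato records are CC-licensed, whereas only about 2% of the wheat records are.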
Below, you can download the compiled, standardized data by group. If you want all data, select “everything”. If you want the data for a single dataset, you can find it here. Please note that we have so far only partially processed most survey data, and the original data sources may contain many more variables.