Data

We compile agricultural research data into groups of datasets that share similar variables. The table below lists the current groups with the number of original datasets and records in each, along with the same counts restricted to datasets that have a Creative Commons (CC) license. As of 13 October 2024, we have processed 2,122 original datasets containing a total of 1,541,506 records.


Group              Datasets  Records  CC-Datasets  CC-Records
agronomy                195   187893          154      127639
pest_disease              9     3303            7        2671
soil_samples             12    16551           10       13009
survey                   29    66840           13       35987
varieties                57    37198           53       36797
varieties_cassava      1471   138605         1471      138599
varieties_cowpea         76    23193           76       23193
varieties_maize          70    76242           61       67932
varieties_wheat         203   991681            4       19234
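
The totals quoted above can be checked against the table, and the same arithmetic shows how much of the collection carries a CC license. The following is a minimal sketch in Python: the numbers are copied from the table, while the names `GROUPS` and `column_total` are only illustrative and are not part of any published tool.

```python
# Each entry: (group, datasets, records, cc_datasets, cc_records),
# copied from the table as of 13 October 2024.
GROUPS = [
    ("agronomy", 195, 187893, 154, 127639),
    ("pest_disease", 9, 3303, 7, 2671),
    ("soil_samples", 12, 16551, 10, 13009),
    ("survey", 29, 66840, 13, 35987),
    ("varieties", 57, 37198, 53, 36797),
    ("varieties_cassava", 1471, 138605, 1471, 138599),
    ("varieties_cowpea", 76, 23193, 76, 23193),
    ("varieties_maize", 70, 76242, 61, 67932),
    ("varieties_wheat", 203, 991681, 4, 19234),
]


def column_total(rows, index):
    """Sum one numeric column (by position) over all groups."""
    return sum(row[index] for row in rows)


total_datasets = column_total(GROUPS, 1)
total_records = column_total(GROUPS, 2)
cc_datasets = column_total(GROUPS, 3)
cc_records = column_total(GROUPS, 4)

# Share of all records that come from CC-licensed datasets.
cc_record_share = cc_records / total_records

print(f"datasets: {total_datasets}, records: {total_records}")
print(f"CC datasets: {cc_datasets}, CC records: {cc_records}")
print(f"CC share of records: {cc_record_share:.1%}")
```

Most of the gap between all records and CC records comes from the varieties_wheat group, where only 4 of 203 datasets have a CC license.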


Here is a map with all locations for which we have at least one observation.



From this page you can download the compiled data with a Creative Commons license. You can create the full datasets yourself by following these instructions.

Please note that the data available here are new. They represent our first attempt to standardize highly variable data with many quality issues. The data still contain errors carried over from the original sources, and likely also errors that we have introduced.

Our aim is to provide cleaner and better documented datasets sometime in 2025.




.csv (Comma Separated Values)
.xlsx (Excel)

 

The groups make it easier for us to organize our work, but it is important to note that they are not mutually exclusive. For example, the first place to look for data on crop response to fertilizer would be the “agronomy” group. However, the “survey” and “varieties” groups may also contain fertilizer application data. Likewise, the “varieties” data are about comparing crop varieties, but variety names are also reported in the “agronomy” group. This means that you may want to consider using data from multiple groups. The maize and wheat varieties have their own groups because they contain a large amount of data and because they use some unique terms.
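
Because the groups overlap, a search for a particular variable often needs to span several files. The sketch below shows one way to do that with Python's standard `csv` module. It makes several assumptions: the small in-memory CSV strings stand in for the downloaded group files, and the column names (`dataset_id`, `crop`, `variety`, `N_fertilizer`, `yield`) are hypothetical placeholders, so check the headers of the actual files before adapting it.

```python
import csv
import io

# Hypothetical stand-ins for downloaded group files; the real files
# have many more columns and rows, and column names may differ.
SAMPLE_FILES = {
    "agronomy": "dataset_id,crop,variety,N_fertilizer,yield\n"
                "d1,maize,,60,3200\n"
                "d1,maize,,0,1800\n",
    "survey": "dataset_id,crop,N_fertilizer,yield\n"
              "d2,maize,25,2100\n"
              "d2,maize,,1500\n",
    "varieties": "dataset_id,crop,variety,yield\n"
                 "d3,maize,V1,4000\n",
}


def read_group(name, text):
    """Parse one group's CSV text and tag every row with its group name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["group"] = name
        rows.append(row)
    return rows


def with_value(rows, column):
    """Keep rows in which `column` is present and not empty."""
    return [r for r in rows if (r.get(column) or "").strip() != ""]


# Stack the rows of all groups, then filter on the variable of interest.
all_rows = []
for name, text in SAMPLE_FILES.items():
    all_rows.extend(read_group(name, text))

fertilizer_rows = with_value(all_rows, "N_fertilizer")
groups_found = sorted({r["group"] for r in fertilizer_rows})

print(len(fertilizer_rows), groups_found)
```

To use real files, the dictionary of strings could be replaced by opening each downloaded .csv file and passing the file object to `csv.DictReader`. Note that a fertilizer rate of 0 counts as a reported value here; only empty or missing fields are dropped.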