Some researchers, analysts and writers prefer to collect their own data by designing surveys and creating brilliant experiments. There is much to be learned in taking that road, to be sure. Sometimes, however, you just need some datasets to get the ball rolling and try out some research ideas. After discussing the problem of finding datasets with a friend who does economic research, I thought it might be useful to share some of the resources I looked up.

Before I get into the list, you may be wondering why datasets matter, or why open data, the broader movement that has made this possible, matters. Partly, I view it as an extension of that famous open source programming adage attributed to Eric Raymond, “Given enough eyeballs, all bugs are shallow.” Different perspectives on a dataset can yield different kinds of value, which is much better than the alternative: a researcher creates a dataset for one purpose and then puts it away where nobody else can ever make use of it. Further, in the case of public authorities, the quality and openness of data can indicate the organization’s attitude toward using data.

Without further ado, here is the list of ten places to get datasets:
- Google’s H1N1 Flu Trends: This tool tracks how people searched for the illness and breaks the results down by geography and other factors. It should be of interest to those in the public health field, among others.
- City of San Francisco Data: This is a treasure trove of data for advocates, researchers, bureaucrats and others. There is data on public transit, housing, crime and more.
- City of Toronto Data: I was excited to blog about the city’s efforts to build this last year and now it is finally up. The coverage looks broader than San Francisco’s; you can get data on licensed child care centres, parks and other areas. While transit data is here, there does not appear to be any crime or policing data, alas.
- Google Transit Feed: Google has created a standard for the world’s public transit authorities to share their data and the results can be found here. Data is available from Washington DC, Vancouver, Cleveland, Perth (Australia) and other places.
- Comprehensive Knowledge Archive Network (CKAN): This is a general-purpose archive with data on many different topics, including Afghanistan election data and data from the British Antarctic Survey.
- Infochimps: Find Any Dataset in the World (note: not all data here is free). There are all kinds of data here, including data on Twitter, population statistics for US states, and 65 datasets about income.
- DBpedia: This project seeks to extract the content of Wikipedia into a structured database format that people can run queries against.
- Freebase: Offers datasets on a variety of topics, including recreational and pop culture subjects.
- National Longitudinal Surveys (NLS): Created by the US Bureau of Labor Statistics, this is the place to go to learn about the US workforce.
- European Social Survey: Created by several universities across Europe, the ESS has data on income, population and other topics that researchers typically care about.
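For readers wondering what "running queries" against DBpedia might look like in practice, here is a minimal sketch in Python. It only builds the request URL for a SPARQL query and does not send it. The endpoint address, the `format` parameter and the `dbo:City` class are my assumptions about how the public DBpedia service is set up, so check the project's own documentation before relying on them. The function name `build_query_url` is purely illustrative.

```python
from urllib.parse import urlencode

# Assumption: address of DBpedia's public SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(sparql, endpoint=DBPEDIA_ENDPOINT):
    """Return a GET URL asking the endpoint to run `sparql` and reply in JSON."""
    params = {
        "query": sparql,
        # Assumption: the endpoint accepts a `format` parameter for the result type.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


# Illustrative query: ten resources typed as cities, with their English labels.
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?label WHERE {
  ?city a dbo:City ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""

url = build_query_url(EXAMPLE_QUERY)
print(url)
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, should return the matching rows if the assumptions above hold.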
Did I miss some important datasets? If so, please feel free to share your ideas in a comment. I am particularly curious to know what experience people have had with these tools and how easy or difficult they are to work with.
