We all have those default vendor datasets.
But from time to time, everyone wants new datasets, small and large, to play with.
Here you can find a list of useful and real data to play with.
I’ll keep this list updated as I come across new, incredible or just fun datasets.
Got a dataset that you think needs to be listed as well?
Post it in the comments!
New York City OpenData
1300+ recent datasets, formatted for ease of use
New York City OpenData makes the wealth of public data generated by various New York City agencies and other City organizations available for public use.
Reddit 1.7 Billion Public Comments
Original reddit post
Torrent of complete archive | Torrent of 54M comment subset
This dataset in Google BigQuery (direct link)
The dataset is 250 GB compressed and over a terabyte uncompressed. Talk about a lot of data for all your text analysis needs…
Airline on-time performance data
zip files per year
The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, taking up 1.6 gigabytes of space compressed and 12 gigabytes uncompressed.
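At 12 gigabytes uncompressed, this dataset is more comfortable to process row by row than to load into memory at once. Below is a minimal sketch of that idea in Python, using only the standard library. The column names `UniqueCarrier` and `ArrDelay` and the `NA` marker for missing values are assumptions about the CSV layout, not something stated above, so check the header of the files you download. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_arrival_delay(rows, carrier_col="UniqueCarrier", delay_col="ArrDelay"):
    """Stream rows (dicts) and return {carrier: mean arrival delay}.

    Column names are assumptions; adjust them to the real file header.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        try:
            delay = float(row.get(delay_col, ""))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values such as "NA"
        totals[row[carrier_col]] += delay
        counts[row[carrier_col]] += 1
    return {carrier: totals[carrier] / counts[carrier] for carrier in counts}


# Tiny made-up sample standing in for one of the yearly files.
SAMPLE = """UniqueCarrier,ArrDelay
AA,10
AA,20
UA,NA
UA,-5
"""

sample_means = mean_arrival_delay(csv.DictReader(io.StringIO(SAMPLE)))
print(sample_means)
```

For a real file, the same function can be fed `csv.DictReader(open(path, newline=""))`, where `path` points at one unzipped yearly file. Only one row is held in memory at a time, so the size of the file does not matter much.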
4000+ FOIA datasets in all forms (email, Excel, …)
The source is a collaborative news site that brings together journalists, researchers, activists, and regular citizens to request, analyze & share government documents.
Data is requested through Freedom of Information Act requests.
New York State
2500+ charts, maps, calendars and of course regular datasets
Not only New York City, but also the state has opened up its data.
It covers anything from Citi Bike System Data to Subway Entrances and even Bicycle Routes.
World Country information
Site with API links
Get information about countries via a RESTful API.