Cool Sites to Find Datasets for Programmatic SEO

Written by Ian Nuttall

Datasets are the backbone of any programmatic SEO project, but finding high-quality datasets can be a challenge in itself. However efficient the popular search engines are, they can't always surface what you're looking for.

A while ago, I was searching for a dataset for a project but couldn't find it anywhere. So I hired a freelancer to scrape the data for me, and a week later I found the exact dataset on a website.

What a waste of money!

That gave me the idea to collect all the well-known and hidden sources of datasets that can be useful for programmatic SEO projects. The more effort you put into finding the right dataset, the better the quality of the pages you generate.

Let’s take a look…

Cool places to find datasets for pSEO

Below are the well-known and lesser-known sites that are extremely helpful for finding datasets. I have personally used most of them for my projects.

  1. Kaggle: Kaggle has hundreds of thousands of datasets on almost every topic, and you can filter them by file type (CSV, JSON, SQLite, and BigQuery), file size, and license (Creative Commons, GPL, Open Database, etc.).

  2. data.world: data.world has more than 100k open datasets that anyone can download, organized into 20 categories.

  3. Google Dataset Search: Dataset Search indexes millions of datasets, which you can search by keyword and filter by file format, usage rights, last-updated date, and price.

  4. Open Data AWS: AWS Open Data doesn't have a huge library of datasets, but the ones it has are of extremely high quality, and they are spread across several categories.

  5. Harvard Dataverse: Harvard Dataverse has more than 150k datasets in categories like agriculture, computers, arts, business, chemistry, engineering, math, law, health, environment, etc.

  6. Datahub: Datahub has thousands of datasets covering finance, crypto, population, and several other categories, and they can be downloaded in multiple formats.

  7. The World Bank Open Data: The World Bank Open Data portal contains global development data that you can browse by country or by indicator.

  8. OpenDataSoft: The OpenDataSoft portal has tens of thousands of datasets that you can either download in multiple formats or access via API calls.

  9. Hugging Face: Hugging Face has thousands of datasets, with powerful search and filters for language, file size, and license.

  10. GitHub: GitHub is not a dedicated dataset repository, but tons of high-quality datasets are uploaded there in CSV and other formats. This tweet will help you extract datasets from GitHub like a pro.

  11. Awesome Public Datasets: Awesome Public Datasets is a GitHub repository that lists hundreds of useful datasets in multiple categories. The repo is maintained by the community and is regularly updated.

  12. r/datasets: r/datasets is a subreddit that describes itself as a place to share, find, and discuss datasets. The community is great, and members share quality datasets every day.

  13. DataSN: DataSN has hundreds of paid datasets in different categories. Because they aren't free, chances are that only a few people have accessed them.

  14. Earth Data: Earth Data is managed and maintained by NASA. It has really high-quality data about the Earth collected by NASA's satellites.

  15. Academic Torrents: Academic Torrents has become very popular as an easy way to download and access huge datasets. The platform hosts more than 120 TB of data.

  16. Free GIS Data: Free GIS Data has an extensive index of geographic datasets, with everything categorized by area of geography.

  17. Papers With Code: If you’re searching for machine learning datasets, Papers With Code is where you should go. It has several thousand datasets with powerful search and filters.

That’s it.

If you are looking for useful datasets in specific niches, I have also created a section where I manually curate datasets by niche: visit the datasets for programmatic SEO section.

I will keep adding sites to the list as I find them. In the meantime, if you know a site with good datasets, let me know in the comments and I will add it to the list.
