Datasets

Having good access to data is important. At various times I have collected, cleaned, or acquired data that I found helpful, useful, or rare. I share these here.

Eurovision Song Contest Lyrics

Although this certainly exists elsewhere, I have a plaintext repository of the lyrics from the Eurovision Song Contest at https://github.com/davidlowryduda/EurovisionLyrics. In fact, this is step one of a project with my wife, and I left a bit of HTML in the files because I was feeling lazy (it’s easy to fix, and the hard regex was done already). If at any point this presents a problem to you, let me know and I can clean it up.
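
If the leftover HTML gets in your way before I clean it up, something like the following should remove it. This is only a minimal sketch using Python's standard library. It assumes the remaining markup is simple tags (line breaks, italics, and so on) and character entities, which may not match every file in the repository. The names TagStripper and strip_html are made up for this illustration and are not part of the repository.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keep only text content and drop the tags (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: <br> marks a line break between lyric lines.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return text with tags removed and entities decoded."""
    parser = TagStripper()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    # An invented line, not taken from the dataset.
    print(strip_html("First line<br>second line &amp; <i>more</i>"))
```

Using a real parser rather than one more regex avoids most surprises with attributes and entities, although for markup this light either approach would probably do.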

Political Terror Scale

From the Political Terror Scale, there are the PTS trends (PTS 2008 trends 10-09b, as of 2008). This is primarily concerned with human rights, and uses data from Amnesty International and the US State Department. The great bit about PTS is that it transforms much of the raw data into a comparable format. Unfortunately, there is a bit of subjectivity in the system and almost all the data is simply ordinal.

However, I did once co-write a paper (Why Democracies Repress) using this data, for an empirical methods project back in the day. This is another thing that I’ll follow up on sometime, by writing a paper with a more complete and better method (don’t read the original too closely; it was a bit exploratory).

International Crisis Behavior

In a similar vein, there is much information on the International Crisis Behavior site, and in particular there is a large set of data available. The idea is that many interactions between different countries have been broken up and categorized. I used this data as well to write the paper linked above.

Mungoagoa Water Analysis

I came across this when a friend from the Georgia Tech chapter of Engineers Without Borders asked for a little statistical help. As far as I know, this is the only public copy of this data.

A group of people went door to door and analyzed some of the hygienic practices of local people in the village of Mungoagoa in 2009. They used a questionnaire (Hygiene Questionnaire) to collate their data. The full record is available here (MUNGOAGOA SURVEY). If you are interested, I also have a copy of this data in SPSS format, organized in more convenient ways.

Please let me know if there is anything you think should be added to this list.
