Legislative Voting Data


In order to make sense of the datasets, you will need the codebook, which tells you which parties correspond to which party code numbers in which countries, what the code numbers associated with votes and thresholds mean, etc.


Data on recorded votes are available in two formats. The All Variables format includes whatever information I have on each legislator (e.g. name, party, coalition, sex, region, and sometimes more) as well as all information I have on each vote (e.g. threshold for approval, date, vote number, issue area, and sometimes more). The Master-file format includes only legislator ID# and party, plus the vote number and threshold. Master-files are configured to work with the Stata do-files (read below) to produce various indices of voting unity and other statistics of interest. Click here to see a table describing the vote data files, then click on the date of each assembly under either format to download data files in that format.


Read on, and click below to download various files with computer code, written in the Stata statistical software package, that allow users to generate from the raw datasets various indices of voting unity – as well as other statistics – that I use in my research. You will find descriptions and examples of these indices and statistics, as well as discussions of their properties, in several of my papers and articles, and especially in the book, Legislative Voting and Accountability. Below are brief descriptions of what each do-file does. Each do-file also contains embedded text that describes, at each step, what the computer code is doing.

Whichever do-files you use, you will start with a file called “XXmasterYY.dta” where “XX” identifies the country (and, where relevant, the chamber), and “YY” identifies the year(s) from which the data are drawn (e.g. mexicomaster19982000.dta). (For the US datasets, YY identifies the number of the congress, rather than the year.)

Master files are set up so that the legislators, identified by ID numbers, are observations (i.e. one deputy per row of the matrix), and the votes are variables (i.e. one vote per column of the matrix). Thus, the first two variables in each Master file are the Legislator ID number and the Legislator’s party code number, and the rest of the variables are votes. The first row of the Master file is a code number identifying the threshold necessary to approve the measure at stake in each vote (e.g. simple relative majority, simple absolute majority, 2/3 relative majority, etc.). The interior cells of the matrix identify what each legislator did on each vote (e.g. ‘aye’ ‘nay’ ‘abstain’ ‘not vote’ etc.).
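To make the layout concrete, here is a minimal Python sketch of a Master-file-style matrix. It is only an illustration, not code from the do-files: the variable names (vote1, vote2, ...) and the numeric codes (1 = aye, 2 = nay, 0 = not vote, and the threshold codes 1 and 2) are hypothetical placeholders. The real codes are in the codebook.

```python
# Illustrative layout only. The numeric codes below are hypothetical
# placeholders; the real party, vote, and threshold codes are in the codebook.
AYE, NAY, NOT_VOTE = 1, 2, 0

# First row: the threshold code for each vote (no legislator attached).
threshold_row = {"id": None, "party": None, "vote1": 1, "vote2": 1, "vote3": 2}

# Every other row: one legislator, with ID, party code, then one cell per vote.
legislator_rows = [
    {"id": 101, "party": 1, "vote1": AYE, "vote2": NAY, "vote3": AYE},
    {"id": 102, "party": 1, "vote1": AYE, "vote2": AYE, "vote3": NOT_VOTE},
    {"id": 201, "party": 2, "vote1": NAY, "vote2": NAY, "vote3": AYE},
]

# Vote columns are everything after the first two variables.
vote_columns = [name for name in threshold_row if name not in ("id", "party")]


def column(vote):
    """Return what every legislator did on one vote, in row order."""
    return [row[vote] for row in legislator_rows]


print(vote_columns)
print(column("vote3"))
```

Reading down a column gives everything that happened on one vote; reading across a row gives one legislator's record.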

You can use the Master files to run a couple of the main do-files,

  1. UnityRice.do

  2. UnityRice-novotes=nays 1.do

which take a given Master file as an input and calculate a number of indices (e.g. unweighted and weighted Unity and Rice indices, Winning percentages, etc.) that describe the voting behavior of parties. If you have a dataset that identifies legislators according to some characteristic other than party (e.g. coalition, region, gender, race, etc.) you could easily adapt these programs to generate indices by that group characteristic, rather than party.
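For readers who want to see the arithmetic behind one of these indices, the following Python sketch computes an unweighted mean of the standard Rice score, |ayes - nays| / (ayes + nays), for each group. It is a simplified illustration under stated assumptions, not a port of UnityRice.do: the vote codes, the field names, and the group_key parameter are hypothetical, and the weighted indices and the Unity index are not reproduced here. Changing group_key from "party" to another field mirrors the adaptation to other group characteristics mentioned above.

```python
from collections import defaultdict

AYE, NAY = 1, 2  # hypothetical codes; see the codebook for the real ones


def rice(ayes, nays):
    """Standard Rice score for one group on one vote, or None if nobody voted."""
    total = ayes + nays
    return abs(ayes - nays) / total if total else None


def mean_rice_by_group(rows, vote_columns, group_key="party"):
    """Average the per-vote Rice scores of each group (unweighted)."""
    scores = defaultdict(list)
    for vote in vote_columns:
        counts = defaultdict(lambda: [0, 0])  # group -> [ayes, nays]
        for row in rows:
            if row[vote] == AYE:
                counts[row[group_key]][0] += 1
            elif row[vote] == NAY:
                counts[row[group_key]][1] += 1
        for group, (ayes, nays) in counts.items():
            score = rice(ayes, nays)
            if score is not None:
                scores[group].append(score)
    return {group: sum(s) / len(s) for group, s in scores.items()}


rows = [
    {"id": 1, "party": "A", "v1": AYE, "v2": AYE},
    {"id": 2, "party": "A", "v1": AYE, "v2": NAY},
    {"id": 3, "party": "A", "v1": AYE, "v2": AYE},
    {"id": 4, "party": "B", "v1": NAY, "v2": NAY},
    {"id": 5, "party": "B", "v1": NAY, "v2": NAY},
]
result = mean_rice_by_group(rows, ["v1", "v2"])
print(result)
```

In this toy example party A is unanimous on the first vote and splits 2 to 1 on the second, while party B is unanimous on both.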

The difference between UnityRice.do and UnityRice-novotes=nays.do has to do with how the Stata code interprets non-votes by legislators in the subset of cases where the threshold to approve a measure is set as an absolute share of the membership of the legislature, rather than as a share of those legislators casting actual votes. Among the 21 chambers for which I provide data on this site, the only cases that use this kind of ‘absolute majority’ rule for all legislative business are the Russian Duma and the Guatemalan and Nicaraguan Assemblies. In a few other chambers, some (rare) votes, such as those on constitutional amendments, may be taken under such rules, but most are not.

When absolute thresholds are used, one might reasonably wonder how a non-vote should be ‘interpreted’ for the purposes of calculating indicators of voting unity for each group. If the rest of my group votes ‘aye’ but I do not vote, the effect of my non-vote on the outcome is equivalent to my having voted ‘nay,’ so it is reasonable for the code to interpret this action as a ‘nay’ vote and calculate my group’s voting unity accordingly. On the other hand, there is an expressive, symbolic element to legislative voting beyond the effect of any legislator’s vote on a given outcome. For me to break ranks and vote against my group on the floor of the legislature may well have a different meaning from my not showing up and not voting, even if both actions are regarded as equivalent by whoever is counting the votes. If this is the case, then non-votes should not be counted as ‘nays.’

When relative thresholds are used, as in most legislatures most of the time, how to interpret non-votes is not a big problem. Non-votes have a different effect on outcomes from ‘nay’ votes (and from ‘aye’ votes, of course), and the indices of voting behavior produced by my computer code treat them differently. (More on this in the papers.) But when non-votes are effectively equivalent to nay votes – that is, when absolute thresholds are used – the analyst calculating the indices has to decide which interpretation of non-voting is appropriate. Thus, I provide two versions of the UnityRice.do file, one that treats non-votes taken under absolute threshold rules as ‘nays’ and one that does not. If you use the program, you decide!
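The choice between the two versions can change the numbers noticeably. The Python sketch below shows the effect on a standard Rice score for one group on one vote. It is an illustration only: the vote codes and the function and parameter names are hypothetical, and when non-votes are not recoded the sketch simply leaves them out of the score, which may not match how the do-files handle them.

```python
AYE, NAY, NOT_VOTE = 1, 2, 0  # hypothetical codes; see the codebook


def rice_with_nonvotes(choices, absolute_threshold, nonvotes_as_nays):
    """Rice score for one group on one vote.

    Non-votes are recoded as nays only when the vote was taken under an
    absolute threshold AND the analyst has chosen that interpretation.
    """
    ayes = choices.count(AYE)
    nays = choices.count(NAY)
    if absolute_threshold and nonvotes_as_nays:
        nays += choices.count(NOT_VOTE)
    total = ayes + nays
    return abs(ayes - nays) / total if total else None


# Three members vote aye and one does not vote.
group_choices = [AYE, AYE, AYE, NOT_VOTE]

as_nays = rice_with_nonvotes(group_choices, absolute_threshold=True, nonvotes_as_nays=True)
ignored = rice_with_nonvotes(group_choices, absolute_threshold=True, nonvotes_as_nays=False)
relative = rice_with_nonvotes(group_choices, absolute_threshold=False, nonvotes_as_nays=True)
print(as_nays, ignored, relative)
```

With the non-vote counted as a nay, the group looks split 3 to 1; with it left out, the group looks unanimous. Under a relative threshold the recoding never applies, whichever version is chosen.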

In addition to the do-files that produce Unity and Rice indices, you might want to produce what I call Loser indices, as well as various types of what I call Roll and Stuff indices. These are statistics that describe how frequently each group under analysis loses on votes (i.e. suffers outcomes contrary to what the group appeared to prefer, based on how the bulk of its members voted), and the strategic conditions under which it loses. Roll and Stuff distinguish losses on which a group unsuccessfully tried to defend the status quo by voting against a proposal (i.e. got rolled) from losses on which a group unsuccessfully tried to advocate a proposal that failed (i.e. got stuffed). The Loser indices indicate how frequently each group suffered these sorts of setbacks even though, given how all non-group members voted, the group in question could have won had it voted in a unified manner. Finally, the loser do-files also calculate the average margins by which parties win the votes they win and lose the votes they lose.
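The roll/stuff distinction can be written down in a few lines. The Python sketch below classifies one vote for one group from its aye and nay counts and the outcome. It is an illustration under stated assumptions rather than the logic of the loser do-files: the function name and labels are hypothetical, the group's position is taken to be whichever side a majority of its voting members chose, and a tied group is treated as having no position.

```python
def classify_outcome(group_ayes, group_nays, passed):
    """Label one vote for one group as a win, a roll, a stuff, or no position.

    Assumes the group's position is the side chosen by most of its
    voting members; a tie within the group counts as no position.
    """
    if group_ayes == group_nays:
        return "no position"
    favored = group_ayes > group_nays
    if favored and not passed:
        return "stuff"  # the group backed a proposal that failed
    if not favored and passed:
        return "roll"   # the group opposed a proposal that passed
    return "win"


print(classify_outcome(group_ayes=2, group_nays=10, passed=True))
print(classify_outcome(group_ayes=9, group_nays=1, passed=False))
print(classify_outcome(group_ayes=9, group_nays=1, passed=True))
```

Counting how often each label occurs across all votes gives simple roll and stuff rates per group. The Loser indices described above add a further condition (whether the group could have won by voting in a unified manner) that this sketch does not attempt.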

But there’s a catch. The loser do-files are not set up to take files in Master-file format as input. Instead, they draw on what I call bydep.dta and byvote.dta files. A bydep.dta/byvote.dta pair contains all the same information as a Master-file, but the information is split between the two files, and the byvote.dta version is transposed so that the votes are observations, rather than variables. You don’t have to do this yourself, though. I am providing a little program here that takes a Master-file as input and spits out a pair of bydep.dta/byvote.dta files as output. Once you have these in hand, you are set to run the loser do-files.

  1. bydep and byvote.do

  2. newloser-margins.do
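For intuition about what the transposition involves, the Python sketch below turns legislator-per-row records into vote-per-row records. It is only a conceptual illustration: the field names (dep101, vote1, and so on) and the vote codes are hypothetical, and the actual contents of the bydep.dta and byvote.dta files are determined by the do-file, not by this sketch.

```python
def transpose_votes(legislator_rows, vote_columns):
    """Turn legislator-per-row records into vote-per-row records."""
    by_vote = []
    for vote in vote_columns:
        record = {"vote": vote}
        for row in legislator_rows:
            # One column per legislator, named after the legislator ID.
            record["dep" + str(row["id"])] = row[vote]
        by_vote.append(record)
    return by_vote


# Hypothetical codes: 1 = aye, 2 = nay, 0 = not vote.
legislator_rows = [
    {"id": 101, "party": 1, "vote1": 1, "vote2": 2},
    {"id": 102, "party": 1, "vote1": 1, "vote2": 0},
    {"id": 201, "party": 2, "vote1": 2, "vote2": 2},
]
by_vote = transpose_votes(legislator_rows, ["vote1", "vote2"])
for record in by_vote:
    print(record)
```

After the transposition each record describes one vote, which is the natural shape for asking whether a measure passed and which groups were on the losing side.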

Once you have produced XXbydep.dta and XXbyvote.dta files, using bydepandbyvote.do, you can use newloser-margins.do to generate statistics that describe the conditions and the manner in which various groups lose votes. This do-file generates two results files. The first has, for each group, the Loser indices (including rates of rolls and stuffs), the average vote margin for wins and losses, and the ratio of average win margin to average loss margin. The second file contains a list of votes on which each party was rolled or stuffed. This allows you to go back through the primary source documentation and examine the substance of specific votes where the group in question suffered a defeat, and the political context in which the event took place.

Next, if you look at the Master-files, you will find that my datasets from the United States and from Uruguay are not disaggregated to the level of individual legislator votes, like the rest. Instead, they are aggregated at the level of party – that is, instead of telling you what each legislator did on each vote, they tell you how many from Party A voted aye, nay, etc. This is just the form in which I found the data. The next do-file, USandUruguayloser.do, is set up to take as input a dataset that is aggregated in this way and produce from it the same indices as newloser-margins.do does for disaggregated datasets.

  1. usloser.do

  2. uruguayloser.do

Finally, rather than producing output that describes the characteristics of specific parties (e.g. how unified Argentina’s Peronists are, and how unified their Radicals are), you might want to generate statistics that describe how unified Argentine parties in general are. This can be useful if the country (or rather, the legislature) is the unit of cross-national analysis, rather than any sub-group within the legislature. In this case, you could use one of my little merge do-files to create aggregated results from party specific results files:

  1. countrymerge 3.do

  2. countrymergeloser.do

The former aggregates output files produced by UnityRice.do, whereas the latter aggregates output files produced by newloser-margins.do.

That’s about it. The do-files themselves have lots of internal text embedded in them describing what they do, so if it isn’t entirely clear yet, open a few do-files in the Stata do-file editor and read along. The main thing to pay attention to there is to get the directory structure in the do-file correct for your computer, so the do-file can find the Master.dta files and work with them. There are instructions embedded, but how to do this should be more obvious once you’ve looked at the do-files themselves. You’ll need Stata, of course. If you’re at a university (as opposed to just doing this for fun), your university may have a site license. Ask a computer support staff person or a reference librarian. Have fun! And let me know if you run into problems.