Project Link has been updated; the Excel file with the data updated through to June 2021 is available here.
Stephen Gordon considers the following as important: Canadian economy, Inequality, Stephen Gordon
I skipped the 2020 update for Project Link for a couple of reasons. There was obviously the distraction of the pandemic, but mainly because I hadn't yet finished the next extension. Every year, I try to extend the database, and the latest extension took more time than I had originally expected.
This year's extension is the data set behind this animation I posted on Twitter a while ago; the details are below the fold.
The Statistics Canada data tables for the Canadian income distribution start in 1976 (see, for example, here and here). That starting date is problematic. The more I look at postwar data, the more prominently the 1970s stands out as a pivotal decade. How could I push those data back?
The basic methodology behind Project Link fails me here. What I've been doing is searching out old Statistics Canada data and stitching them onto the data that are maintained in the current data tables. But the income distribution statistics in the tables are based on survey data, and the current source is the Canadian Income Survey. Statistics Canada has been publishing survey data for the income distribution since at least 1951, but I don't see how those data could be spliced onto the current series:
- The tables only provide the distribution of incomes for a small number of income groups, typically fewer than twenty. That's too coarse for what I would want to do.
- There are no data for income shares.
- I have no idea how the numbers for family income could be stitched together, and then there's the question about how to track the changes in family composition over time.
Tax data are a more promising source for my purposes. The Canada Revenue Agency and its predecessors have been publishing data from personal income tax files since 1920, and Statistics Canada maintains tables with statistics derived from the Longitudinal Administrative Databank (LAD) from 1982.
The tables in Taxation Statistics and its successor publications let me deal with the problems listed above.
- The tables report more than 50 income groups for most of the sample, although the number of groups drops dramatically in 1992 (this is why the distributions plotted in that animated GIF look so different over 1992-2010).
- The tax data do let me calculate income shares.
- As noted by Adolf Buse, tax files are for individuals, so I can side-step the issue of evolving family composition. On the other hand, this also means that the data can't be used to say much about the evolution of economic welfare.
The next issue was whether I should use all tax files, or just the taxable ones. Adolf Buse used all tax files, arguing that since the same level of income could be made taxable or non-taxable over time as the tax code evolved, this made for a more stable population base. But his study was written before governments got into the habit of using personal income tax files as a way of distributing transfers, especially with the arrival of the GST and the GST rebate. Many Canadians who have no or negligible income now file tax returns in order to obtain refundable tax credits. Indeed, Buse had to dummy out one of his observations because the introduction of a new tax credit led to a surge of new files with low reported incomes.
The data set starts in 1944, mainly because that was when the Department of National Revenue started publishing data based on the year in which income taxes were assessed, instead of when income taxes were paid. The data end in 2010, because that's when the CRA stopped breaking out the data for taxable files. I've asked them if they had the numbers already prepared somewhere, and was told that they didn't. They also said that any attempt to retrieve them would be prohibitively expensive. I think it's possible to get numbers for just the taxable files from the LAD and to link them to these estimates, but that is going to have to be a job for someone else. (If you're used to working with the LAD and have some spare time, please let me know!)
For each year and for each income group, this Excel file provides:
- The lower bound of the income group
- The upper bound of the income group
- The number of tax files
- The total income declared
Total income taxes paid are also given for the tax rental era 1944-1953. All data are in current dollars. These numbers were all entered by me, so any errors are obviously mine. And while I can't promise that there are no errors, I was careful to cross-check the various totals and subtotals against those reported in the data tables.
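For anyone loading the file into code, the per-group records and the cumulative shares built from them could be represented along these lines. This is only a sketch: the `IncomeGroup` class, its field names, the open-ended top group, and all of the numbers are invented for illustration and do not reflect the actual column layout of the spreadsheet.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Optional, Tuple


@dataclass
class IncomeGroup:
    """One row for one year (hypothetical field names, current dollars)."""
    lower: float            # lower bound of the income group
    upper: Optional[float]  # upper bound; None if the top group is open-ended
    files: float            # number of tax files
    income: float           # total income declared


def cumulative_shares(groups: List[IncomeGroup]) -> List[Tuple[float, float]]:
    """Cumulative (file share, income share) at the top of each group.

    Groups are assumed to be sorted from lowest to highest income.
    """
    total_files = sum(g.files for g in groups)
    total_income = sum(g.income for g in groups)
    cum_files = accumulate(g.files for g in groups)
    cum_income = accumulate(g.income for g in groups)
    return [(f / total_files, y / total_income)
            for f, y in zip(cum_files, cum_income)]


# Made-up numbers, purely for illustration.
groups = [
    IncomeGroup(0, 1000, 400, 200_000),
    IncomeGroup(1000, 2000, 300, 450_000),
    IncomeGroup(2000, 5000, 200, 600_000),
    IncomeGroup(5000, None, 100, 750_000),
]
points = cumulative_shares(groups)
```

In this toy example the bottom group holds 40% of the files but only 10% of declared income; the same running totals are also a convenient way to cross-check a year's entries against published totals.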
I've also prepared some statistics based on these data. The cumulative shares of the number of files and of total income for the income groups give points on the respective cumulative distribution functions; these points were linked by cubic splines to construct estimates for the continuous CDFs. (Occasionally some income groups were merged in order to ensure that the estimated CDFs are always monotonically increasing.) Given these estimates for the distribution function, you can obtain
- estimates for percentiles
- estimates for the density of the income distribution (both the levels and the derivatives of cubic splines are continuous at the thresholds)
These estimates are converted to constant 2015 dollars using the CPI.
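The steps just described (spline through the cumulative points, percentiles by inverting the CDF, density from the derivative, then deflating by the CPI) can be sketched as follows. Several things here are assumptions rather than a description of how the published estimates were computed: the post does not say which end conditions the splines use, so this sketch picks a natural cubic spline; the thresholds, shares, and CPI values are invented; and `natural_cubic_spline`, `percentile`, and `to_constant_dollars` are hypothetical helper names.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return f(x, deriv=False) for the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                  # second derivatives; both ends stay 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):        # forward elimination
            f = h[k] / diag[k - 1]
            diag[k] -= f * h[k]
            rhs[k] -= f * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):   # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x, deriv=False):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        a, b = xs[i + 1] - x, x - xs[i]
        c0 = ys[i] / h[i] - m[i] * h[i] / 6.0
        c1 = ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0
        if deriv:                        # first derivative: the density
            return (-m[i] * a * a + m[i + 1] * b * b) / (2.0 * h[i]) - c0 + c1
        return (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i]) + c0 * a + c1 * b

    return evaluate


def percentile(cdf, p, lo, hi, iterations=100):
    """Invert a CDF by bisection; assumes cdf(lo) < p <= cdf(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def to_constant_dollars(value, cpi_year, cpi_base):
    """Convert a current-dollar amount to base-year dollars."""
    return value * cpi_base / cpi_year


# Invented thresholds (current dollars) and cumulative shares of tax files.
thresholds = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
cum_share = [0.0, 0.30, 0.60, 0.85, 1.00]

cdf = natural_cubic_spline(thresholds, cum_share)
median = percentile(cdf, 0.5, thresholds[0], thresholds[-1])
density_at_median = cdf(median, deriv=True)

# Hypothetical CPI values: 80.0 in the data year, 100.0 in 2015.
median_2015 = to_constant_dollars(median, cpi_year=80.0, cpi_base=100.0)
```

A plain cubic spline is not guaranteed to be monotone, so with unevenly spaced groups the fitted CDF can dip and imply a negative density; that is the problem that merging groups addresses. If SciPy is available, `scipy.interpolate.CubicSpline(xs, ys, bc_type="natural")` could replace the hand-rolled solver.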
There are also estimates for Gini coefficients, obtained from the piecewise-linear Lorenz curves.
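As a minimal sketch of that calculation: with a piecewise-linear Lorenz curve, the area under the curve is a sum of trapezoids, and the Gini coefficient is one minus twice that area. The function name `gini_from_lorenz` and the points below are made up for illustration; they are not taken from the data set.

```python
def gini_from_lorenz(points):
    """Gini coefficient from a piecewise-linear Lorenz curve.

    `points` are (cumulative file share, cumulative income share) pairs,
    sorted in increasing order and running from (0, 0) to (1, 1).
    """
    twice_area = 0.0
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        twice_area += (f1 - f0) * (l0 + l1)   # twice the area of one trapezoid
    return 1.0 - twice_area


# Invented Lorenz points for four income groups.
lorenz = [(0.0, 0.0), (0.4, 0.1), (0.7, 0.325), (0.9, 0.625), (1.0, 1.0)]
gini = gini_from_lorenz(lorenz)   # approximately 0.48
```

Joining the points with straight lines treats everyone within an income group as having the same income, so a Gini computed this way is a lower bound for the one implied by the underlying individual data.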
All of these estimates are available in this Excel file.
I've written two companion posts that take a first pass at these data: