Data FAQ

Have a question about our data? We have an answer.

Data can be messy, so our team makes conscious decisions to ensure our data is as transparent as possible. If you can’t find an answer to your question on this page or in our Technical Report, feel free to reach out. Our small team of volunteers will do our best to get back to you and/or post answers to your frequently asked questions on this page.

Where do you get your data?

All our data are exclusively collected from publicly available sources, including government reports/releases and accredited news media. Sources are included as a reference for each entry in our dataset.

A summary of our go-to sources is available here. We make all efforts to keep this table up-to-date, but these links may change as regions change their websites, create dashboards, etc. We supplement this information with additional case information from news media and other releases, where available.

Why are your daily numbers different than my local health region/provincial/territorial page?

Many provinces/territories/health regions provide an internal report date (i.e., the date a case is recorded within the health region) that may not always align with the date that the case is publicly announced. For example, many regions do not hold press conferences and/or do not report updates on weekends; instead, this information is provided on the following Monday. In these instances, we report the cases on the following Monday to preserve the aggregate-level health region information provided. Similarly, many regions remove cases due to data errors and/or add old cases that are back-dated (i.e., retrospectively added to the dataset). To ensure that our dates are consistent across time and between regions, we always aim to use the date of public report.

You may notice these daily differences more on weekends/Mondays, but rest assured our cumulative totals for most provinces (see below for information on Ontario) will be aligned.
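
As a rough illustration of why daily counts can differ while cumulative totals still agree, the sketch below uses made-up numbers (not taken from the dataset) for a hypothetical region that records cases internally every day but announces its weekend cases on Monday. All variable names are assumptions chosen for this example.

```python
from itertools import accumulate

# Made-up daily counts for one hypothetical region (Fri, Sat, Sun, Mon).
by_internal_date = [5, 3, 4, 2]  # dated when the region recorded each case
by_public_date = [5, 0, 0, 9]    # weekend cases announced on Monday (3 + 4 + 2)

# Running totals under each dating convention.
cumulative_internal = list(accumulate(by_internal_date))
cumulative_public = list(accumulate(by_public_date))

print(cumulative_internal)  # [5, 8, 12, 14]
print(cumulative_public)    # [5, 5, 5, 14]
```

The daily values disagree on Saturday, Sunday, and Monday, but both series reach the same cumulative total once the Monday announcement is counted.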

Why are your cumulative numbers different than the Ontario provincial page (i.e., Ontario Ministry of Health)?

There are internal lags in cases/mortality as reported by the Ontario Ministry of Health dataset (which pulls from the iPHIS dataset). These lags are due to delays between when cases can be reported by a public health unit (PHU) and when the case is captured within iPHIS, as well as cut-off times in daily reports—i.e., the iPHIS dataset reports cases from up to 4PM of the day before. To better represent the current and evolving outbreak in Ontario, on April 1, 2020 we began collecting case and mortality data directly from Public Health Units (PHUs). As a result, the numbers we report will usually be higher than those from other sources, including the Ontario Ministry of Health. Please see ‘Why is there a large spike in cases on April 1, 2020 in Ontario?’ for further information on how we updated this.

Given our reporting method for Ontario cases, our cumulative total of Canadian cases is also higher than that reported by the Public Health Agency of Canada for the province of Ontario.

Why is there a large spike in cases on April 1, 2020 in Ontario?

On April 1, 2020, we moved from using Ministry of Health data to individual PHU data. In order to align our case numbers with each PHU, we back-filled cases for which information on health region was previously not reported. We then assigned all extra cases not yet reported by the Ministry, but that had been reported by PHUs, to April 1. This results in a large, artificial spike of cases reported on April 1, 2020; however, our cumulative numbers remain in concordance with PHUs. This is noted in the additional notes and methods note variable for cases where we have made this change.

Why are there large spikes in cases in March and May in Quebec?

On March 23, 2020 Quebec changed their reporting of cases such that confirmation only required a hospital-based positive test without the need for a secondary confirmatory test from provincial labs. As a result, there is a large spike in case numbers on this day. On May 3, 2020 Quebec added 1,317 missing positive COVID-19 cases from April 2–30, 2020 that had not been reported previously due to a computer error. As the report date for these missing cases was not provided, we added them all on the day they were announced. On May 31, 2020 Quebec added 165 missing deaths from Montreal that had not been reported previously due to a data transmission issue.

Why is there a large spike in recoveries (and subsequently a large drop in active cases) in Quebec in July?

On July 15, 2020 Quebec changed their algorithm for defining a case as recovered, which we have previously detailed. The updated algorithm uses a broader definition, so there is a large spike in recoveries (and a subsequent large drop in active cases) on this day. Quebec has continued to revise the definition of a recovery since this date; for the latest information, please check their definitions and methodology page.

Why do your health region boundaries not match Statistics Canada?

We report our data at the health region level in accordance with how each province/territory releases sub-provincial cases/mortality data. The shapefiles used are the Regional Health Boundaries as prepared and maintained by the Esri Canada Community Maps Team. We do not use the shapefiles from Statistics Canada due to changes to the provincial health region boundaries over time. Further information on the correspondence between health region names used in our dataset and the HRUID values given in Esri Canada’s health region map is available on our GitHub page.

How are your active cases calculated?

Active cases are calculated by subtracting total recovered and total deaths from total reported cases. Repatriated cases are listed as recovered 14 days after their case report date, which follows definitions used by the province of Ontario. The definition of recovered cases varies by province.
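
The calculation described above can be sketched as a one-line function. The function and parameter names are hypothetical, and the numbers are illustrative only, not taken from the dataset.

```python
def active_cases(total_cases: int, total_recovered: int, total_deaths: int) -> int:
    """Active cases = total reported cases - total recovered - total deaths."""
    return total_cases - total_recovered - total_deaths


# Illustrative cumulative totals for a single hypothetical region and day.
print(active_cases(total_cases=1000, total_recovered=700, total_deaths=50))  # 250
```

Because the definition of a recovered case varies by province, the same formula can yield active counts that are not directly comparable across provinces.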

Why could there be negative recoveries and/or tests?

Throughout the course of the pandemic, there have been changes in the definitions used to capture testing, recoveries, and case/mortality reporting. As such, there is variation not only between provinces but also within a province over time. Notable changes include shifts from reporting ‘people tested’ to ‘tests completed’ or vice versa, as well as changes in the formula for recoveries. When such a change lowers a reported cumulative total, the corresponding daily value can be negative. These changes are noted in the testing and recovered datasets corresponding to the day the change was announced. Additional information on testing definitions is included on the testing tab of our dashboard.
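
One way a negative daily value can appear is sketched below, assuming daily figures are computed as the difference between consecutive cumulative totals (an assumption made for illustration, not a description of the dataset's actual pipeline). The function name and numbers are hypothetical.

```python
def daily_changes(cumulative):
    """Return day-over-day differences of a cumulative series."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


# Made-up cumulative testing totals; on the third day the (hypothetical)
# province switches definitions and its reported total drops.
cumulative_tests = [1000, 1200, 1100, 1300]

print(daily_changes(cumulative_tests))  # [200, -100, 200]
```

The -100 is not a real decrease in testing; it only reflects that the totals before and after the definition change are counted differently.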

Why don't all cases have demographic data available?

When the pandemic first emerged and case counts were low, many provinces and health regions provided very detailed information, including demographics, for each case. However, this information has become less available as case counts have increased. This is why you may notice that our earlier cases have more detailed information than our more current cases.

When demographic information is readily available and reported, we do our best to ensure that this information is captured in our individual-level dataset. Unfortunately this is not always possible, and there are additional complications such as unstandardized age categories across provinces/territories/health regions.

We hope to collect demographic data where available, and we are currently working on developing a probabilistic linkage algorithm to match our dataset with provincial datasets that release this information.

What are some notable data changes that have occurred?

We make too many changes and updates to list them all comprehensively, but these will be recorded as a method note in the dataset and/or detailed in our upcoming technical report.

Some important instances are below:

  • On March 21, 2020, Manitoba announced that a previously presumptive positive case was determined to be negative. However, the case number for this individual was not given. We compared our data with that of the Public Health Agency of Canada and made an assumption as to which individual this was and have removed them from the dataset.

  • On March 25, 2020 Saskatchewan changed the boundaries of their reported health regions. Given the reported case numbers in each newly defined health region, we assumed a redistribution of cases and matched the new boundaries moving forward. We included a note in the additional notes variable for cases where we have made any change in health region information. The province changed their boundaries again on August 4, and we are currently working to align our datasets with the available information.