De-identify datasets

Minimum you need to do

De-identify datasets where possible and take steps to avoid re-identification

De-identifying data

De-identifying data means taking steps to ensure that an individual cannot be identified from the personal or health information you have collected.

When you de-identify data you reduce the risk of a privacy breach. Legal requirements under privacy legislation do not apply to de-identified data. 

De-identification may not be suitable for every dataset. The risk of re-identification can be higher when a dataset is combined with other datasets. Consider this when handling unit record level data.

If you’re not sure that data can be, or has been, properly de-identified, you should treat it as you would personal information. Such data may also be unsuitable for open release.

Avoid re-identifying data

You can avoid data being re-identified by:  

  • presenting and sharing aggregated rather than specific results or raw data 
  • checking if elements of what you’ve recorded would potentially allow someone’s identity to be inferred or derived. 
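
To illustrate the first point, the sketch below aggregates unit records into group counts and hides small groups before the results are shared. It is a minimal sketch rather than a prescribed method: the field names, the sample records and the suppression threshold of 5 are all assumptions made for this example, and the right threshold depends on your context.

```python
from collections import Counter

# Hypothetical unit records: field names and values are invented for illustration.
records = (
    [{"age_band": "30-39", "postcode": "3000"}] * 6
    + [{"age_band": "80-89", "postcode": "3000"}]
)


def aggregate_counts(rows, fields, min_count=5):
    """Count rows per combination of `fields`, hiding small groups.

    Groups with fewer than `min_count` rows are reported as None, so a rare
    combination of attributes is not exposed in the shared output.
    The default threshold of 5 is an assumption, not a mandated value.
    """
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {key: (n if n >= min_count else None) for key, n in counts.items()}


summary = aggregate_counts(records, ["age_band", "postcode"])
for group, count in summary.items():
    print(group, "suppressed" if count is None else count)
```

In this example the single record in the 80-89 age band is reported as suppressed rather than as a count of 1, because a group of one could point to a specific person.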

Remember that when you reuse or recycle datasets over multiple projects, there’s a risk that someone could be identified by linking the datasets together.
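
The linkage risk can be shown with a small sketch. It assumes two hypothetical datasets from different projects: one de-identified and one that still holds names. Both share the attributes `age_band` and `postcode`. All field names, records and the helper `unique_links` are invented for illustration. The function reports records that match one-to-one on the shared attributes, which is where linking could reveal an identity.

```python
from collections import defaultdict


def unique_links(deidentified, other, keys):
    """Return (deidentified_row, other_row) pairs that match one-to-one on `keys`."""

    def index(rows):
        # Group rows by their combination of values for the shared attributes.
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[k] for k in keys)].append(row)
        return groups

    left, right = index(deidentified), index(other)
    return [
        (left[key][0], right[key][0])
        for key in left
        if len(left[key]) == 1 and len(right.get(key, [])) == 1
    ]


# Hypothetical de-identified survey data from one project.
survey = [
    {"age_band": "30-39", "postcode": "3000", "condition": "asthma"},
    {"age_band": "30-39", "postcode": "3000", "condition": "diabetes"},
    {"age_band": "80-89", "postcode": "3121", "condition": "arthritis"},
]

# Hypothetical membership list from another project that still holds names.
members = [
    {"name": "Person A", "age_band": "80-89", "postcode": "3121"},
    {"name": "Person B", "age_band": "30-39", "postcode": "3000"},
]

links = unique_links(survey, members, ["age_band", "postcode"])
for survey_row, member_row in links:
    print(member_row["name"], "could be linked to", survey_row["condition"])
```

Here only the 80-89 record links unambiguously, because it is the only record with that combination in both datasets. The two 30-39 survey records cannot be told apart by these attributes alone, although adding more shared attributes could change that.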


How to show you’ve met the de-identification requirement

You will have:

  • de-identified data as appropriate in your context. For example, removed names of individuals or used pixelation in video.

  • aggregated data before sharing it (rather than sharing specific results or raw data)

  • stored de-identified data separate from other personal information or health information and applied appropriate levels of restriction to that storage.
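
As one possible way to approach the first and last points, the sketch below removes direct identifiers from each record and places them in a separate lookup, connected only by a random reference. It is a sketch under stated assumptions: the field names, the list of direct identifiers and the helper `split_identifiers` are hypothetical. Removing direct identifiers may not be enough on its own, because the remaining attributes may still allow someone to be identified.

```python
import secrets

# Assumed set of direct identifiers; adjust to the fields in your own data.
DIRECT_IDENTIFIERS = {"name", "email"}


def split_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Split rows into de-identified records and a separate identifier lookup.

    Each row gets a random reference that is not derived from the person's
    details, so the reference alone reveals nothing about them.
    """
    deidentified, lookup = [], {}
    for row in rows:
        ref = secrets.token_hex(8)
        lookup[ref] = {k: v for k, v in row.items() if k in identifiers}
        kept = {k: v for k, v in row.items() if k not in identifiers}
        deidentified.append({"ref": ref, **kept})
    return deidentified, lookup


# Hypothetical input record, invented for illustration.
rows = [{"name": "Person A", "email": "a@example.org", "age_band": "30-39"}]
deidentified, lookup = split_identifiers(rows)
print(deidentified)
```

In this sketch the `lookup` would be stored apart from the de-identified records under tighter access restrictions, or not kept at all if there is never a need to re-identify anyone.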
