De-identify datasets

De-identifying data

De-identifying data means taking steps to ensure that an individual cannot be identified from the personal or health information you have collected.

When you de-identify data, you reduce the risk of a privacy breach. Legal requirements under privacy legislation do not apply to de-identified data.

Not all datasets are suitable for de-identification. The risk of re-identification can be higher when a dataset is combined with other datasets. Consider this when handling unit record level data, where each record relates to a single individual.

If you’re not sure that data can be, or has been, properly de-identified, you should treat it as you would personal information. Data like this may also not be suitable for open release.

Avoid re-identifying data

You can avoid data being re-identified by:  

  • presenting and sharing aggregated rather than specific results or raw data 
  • checking if elements of what you’ve recorded would potentially allow someone’s identity to be inferred or derived. 
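
As a rough illustration of the two checks above, the following Python sketch aggregates unit records into counts, hides small counts, and then lists combinations of fields that only a few people share. The records, the field names (postcode, age_band, condition), the function names and the threshold of 2 are all assumptions made for this example. They are not prescribed values, and a real release would need a threshold agreed for your own context.

```python
from collections import Counter

# Hypothetical unit records; field names and values are invented for illustration.
records = [
    {"postcode": "3000", "age_band": "30-39", "condition": "asthma"},
    {"postcode": "3000", "age_band": "30-39", "condition": "diabetes"},
    {"postcode": "3000", "age_band": "30-39", "condition": "asthma"},
    {"postcode": "3000", "age_band": "40-49", "condition": "asthma"},
    {"postcode": "3121", "age_band": "80-89", "condition": "asthma"},
]

def aggregate_counts(rows, field, min_cell_size):
    """Count rows per value of `field`; counts below the threshold become None."""
    counts = Counter(row[field] for row in rows)
    return {value: (n if n >= min_cell_size else None) for value, n in counts.items()}

def small_groups(rows, quasi_identifiers, k):
    """Return combinations of field values that fewer than k rows share."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in groups.items() if n < k}

# Share the aggregated summary rather than the raw rows.
summary = aggregate_counts(records, "postcode", min_cell_size=2)

# Combinations listed here could allow someone's identity to be inferred.
risky = small_groups(records, ["postcode", "age_band"], k=2)

print(summary)
print(risky)
```

Any combination that appears in the second result describes very few people, so consider broadening the categories or leaving those fields out before you share the data.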

Remember that when you reuse or recycle datasets across multiple projects, there’s a risk that someone could be identified by linking the datasets together.
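
To show how linking can undo de-identification, the sketch below joins two invented datasets on the fields they share. One dataset has names, the other has had names removed. The datasets, the field names (postcode, birth_year) and the helper unique_links are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical datasets from two projects; all values are invented.
project_a = [  # contact list kept from an earlier project
    {"name": "Person A", "postcode": "3000", "birth_year": 1985},
    {"name": "Person B", "postcode": "3000", "birth_year": 1990},
    {"name": "Person C", "postcode": "3121", "birth_year": 1985},
    {"name": "Person D", "postcode": "3000", "birth_year": 1990},
]
project_b = [  # survey responses with names removed
    {"postcode": "3000", "birth_year": 1985, "response": "yes"},
    {"postcode": "3000", "birth_year": 1990, "response": "no"},
]

def unique_links(named_rows, unnamed_rows, keys):
    """Pair an unnamed row with a named row when exactly one named row shares its key values."""
    index = defaultdict(list)
    for row in named_rows:
        index[tuple(row[k] for k in keys)].append(row)
    links = []
    for row in unnamed_rows:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # a single candidate means the row is effectively re-identified
            links.append((matches[0]["name"], row))
    return links

linked = unique_links(project_a, project_b, ["postcode", "birth_year"])
print(linked)
```

In this example the first survey response matches only one person in the contact list, so it is no longer de-identified once the two datasets are held together. The second response matches two people, which lowers the risk but does not remove it.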

