De-identify datasets

Minimum you need to do
De-identify datasets where possible and take steps to avoid re-identification
De-identifying data
De-identifying data means taking steps to ensure that an individual cannot be identified from the personal or health information you have collected.
When you de-identify data you reduce the risk of a privacy breach. Legal requirements under privacy legislation do not apply to de-identified data.
It may not be suitable to de-identify all datasets. There may be a higher risk of re-identification when you combine them with other datasets. Consider this when handling unit record level data.
If you’re not sure that data can be, or has been, properly de-identified, you should treat it as you would personal information. Also, the data may not be suitable for open release.
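As an illustration only, the sketch below shows one way a unit record might be de-identified: removing direct identifiers and coarsening fields that could identify someone indirectly. The field names (`name`, `email`, `phone`, `age`, `postcode`), the 10-year age bands and the 2-digit postcode prefix are all assumptions made for this example, not prescribed values. Whether this is enough depends on your context and on what other data it could be combined with.

```python
# Hypothetical field names: adjust to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age, width=10):
    """Replace an exact age with a coarser band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Return a copy of the record with direct identifiers removed
    and indirect identifiers generalised."""
    # Drop fields that identify a person on their own.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalise exact age into a band (assumed width: 10 years).
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    # Keep only the first two characters of the postcode (assumed rule).
    if "postcode" in cleaned:
        cleaned["postcode"] = str(cleaned["postcode"])[:2] + "**"
    return cleaned


record = {"name": "A. Citizen", "email": "a@example.org",
          "age": 34, "postcode": "2000", "visits": 3}
print(deidentify(record))
```

Removing or coarsening fields like this lowers the risk but does not guarantee that a record cannot be re-identified, so the caution above about treating uncertain data as personal information still applies.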
Avoid re-identifying data
You can avoid data being re-identified by:
- presenting and sharing aggregated rather than specific results or raw data
- checking if elements of what you’ve recorded would potentially allow someone’s identity to be inferred or derived.
Remember that when you reuse or recycle datasets over multiple projects, there’s a risk that someone could be identified by linking the datasets together.
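The two checks above can be sketched roughly as follows. The first function shares group counts rather than raw records and suppresses small groups; the second flags combinations of fields that are shared by only a few records, since those are the ones most likely to let someone's identity be inferred or matched against another dataset. The function names, the field names and the threshold of 5 are assumptions for illustration, not mandated values.

```python
from collections import Counter


def aggregate_counts(records, group_keys, min_cell_size=5):
    """Count records per group; suppress (None) any group smaller
    than min_cell_size. The threshold of 5 is an assumed example."""
    counts = Counter(tuple(r[k] for k in group_keys) for r in records)
    return {group: (n if n >= min_cell_size else None)
            for group, n in counts.items()}


def risky_groups(records, quasi_identifiers, k=5):
    """Return combinations of quasi-identifier values shared by
    fewer than k records (k is an assumed example threshold)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {group: n for group, n in counts.items() if n < k}


# Hypothetical records that have already had direct identifiers removed.
records = (
    [{"age_band": "30-39", "postcode": "20**"}] * 6
    + [{"age_band": "40-49", "postcode": "21**"}]
)

print(aggregate_counts(records, ["age_band"]))
print(risky_groups(records, ["age_band", "postcode"]))
```

A group flagged by `risky_groups` is a prompt to generalise further, suppress the records, or withhold the data from release. Running the same check on the combined fields of datasets you plan to link gives an early signal of the linkage risk described above.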
Resources
- De-identification Decision-Making Framework - Office of the Australian Information Commissioner and CSIRO's Data61
- Reasonably ascertainable identity - Information and Privacy Commission NSW.
How to show you’ve met the de-identification requirement
You will have:
- de-identified data as appropriate in your context, for example by removing the names of individuals or using pixelation in video
- aggregated data before sharing it (rather than sharing specific results or raw data)
- stored de-identified data separately from other personal or health information, and applied appropriate levels of restriction to that storage.