Building data capability

Data is the foundational element that makes AI so powerful. The combination of data and AI has the potential to unlock significant value in informing the design, development and delivery of government policy and services for better customer outcomes.

Recognising that Government is the custodian of highly valuable data with the potential to drive innovation in service design and delivery, stakeholders expressed strong interest in making more Government data available to solve complex problems. However, this depends on the quality of the data, the appropriateness of the data model and the right data safeguards being in place.

As with any use of data, robust governance must be in place to manage and store it, comply with policies and regulatory obligations, be clear on ownership and use and, particularly for AI, ensure transparency regarding data use and decision-making. As a core principle, personal or sensitive information should be removed from datasets before they are made available for analysis. Where it is not possible to completely de-identify or de-sensitise a dataset, appropriate protection controls should be put in place.

Data cleansing and integration are very important, but care needs to be taken to ensure that features that might be useful in AI models are not removed. Decisions about which data can be shared, and for what purpose, are generally made on a case-by-case basis. Cleansed datasets may be made available more widely and for a broader range of purposes, but may have less utility than the original.

NSW citizens must have confidence that data used for AI projects is used safely and securely, and in a way that is consistent with privacy and data sharing requirements. Government use of emerging technologies based on access to data will be undermined by lack of public trust if the risk of a data breach is not managed effectively.

Lack of public trust in emerging technologies is not limited to data breaches. It also concerns the decisions that are made using the outputs from these technologies, and the appropriateness of the underlying data and its use in making these decisions. This link between data and decision-making is what differentiates AI from other forms of analytics. Therefore, we need to ensure that the data model we use is as free of harmful bias as possible, and we need to check the outputs for results that may lead to unintended consequences.

What we heard

Feedback from government, industry and universities, from both expert and new users of AI, was broadly consistent in relation to developing data capability. The key messages were:

Government-generated data is a public asset and should, where appropriate, be made available as widely as possible

While the NSW Government has put in place mechanisms that encourage open data, data sharing and data curation, government and industry stakeholders wanted more clarity on what data can be shared and for what purpose.

Data must be used safely, securely and in a way that is consistent with privacy considerations

Stakeholders sought guidance on privacy risks around the use of government data. Robust data governance and sound data management practices are required to ensure data availability, accessibility, quality, privacy and security. The NSW Government AI Assurance Framework provides guidance to agencies on how to approach AI projects and data governance and reflects the information available through the Information and Privacy Commission.

Data quality for AI projects needs to be understood and be fit for purpose

It is important to maintain the richness of a dataset. Datasets should be accurate, appropriate, complete, timely, representative and consistent. However, there is a trade-off between making a dataset “safe” and preserving its utility. Sometimes, poor quality (incomplete or inconsistent) data can be predictive of an outcome. Projects should first determine the intended outcome and only then gather and analyse various datasets to understand which are appropriate.
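
As an illustration only, some of the quality dimensions above (completeness, consistency and timeliness) can be measured before a dataset is used. The following is a minimal sketch, not a prescribed method: the record layout, the four-digit postcode rule and the 365-day freshness window are all assumptions made for the example.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "postcode": "2000", "updated": date(2024, 5, 1)},
    {"id": 2, "postcode": None, "updated": date(2024, 4, 20)},
    {"id": 3, "postcode": "20A0", "updated": date(2021, 1, 15)},
    {"id": 4, "postcode": "2150", "updated": date(2024, 3, 2)},
]


def quality_report(rows, as_of, max_age_days=365):
    """Return the share of rows meeting each (assumed) quality rule."""
    total = len(rows)
    # Completeness: the postcode field is populated.
    complete = sum(1 for r in rows if r["postcode"] is not None)
    # Consistency: a populated postcode is exactly four digits (assumed rule).
    consistent = sum(
        1
        for r in rows
        if r["postcode"] is not None
        and len(r["postcode"]) == 4
        and r["postcode"].isdigit()
    )
    # Timeliness: the row was updated within the assumed freshness window.
    timely = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return {
        "completeness": complete / total,
        "consistency": consistent / total,
        "timeliness": timely / total,
    }


report = quality_report(records, as_of=date(2024, 6, 1))
print(report)
```

The sketch reports on quality rather than deleting rows that fail a rule, because incomplete or inconsistent records can themselves be predictive and may need to be retained.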

There needs to be AI-specific guidance on risks in the context of using data, and safeguards must be put in place to manage data bias risks

It is important to understand the appropriateness of the underlying data and its use in making decisions. Elements of bias can emerge during the collection, collation, analysis and eventual use of data. Bias cannot be completely removed (and can sometimes be necessary), but government needs to ensure that bias that can harm individuals is limited as much as possible. Outputs should then be carefully monitored and tested to ensure there are no unintended consequences.
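
One simple way to monitor outputs, shown here as a sketch only, is to compare how often each group receives a favourable result and to flag large gaps for human review. The group labels, sample outputs and review threshold below are hypothetical. A gap does not by itself show harmful bias; it only indicates where closer examination may be warranted.

```python
from collections import defaultdict

# Hypothetical model outputs: (group label, favourable decision?).
outputs = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

# Assumed threshold; an appropriate value depends on the project context.
REVIEW_THRESHOLD = 0.2


def favourable_rates(rows):
    """Return the share of favourable outputs for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [favourable, total]
    for group, favourable in rows:
        counts[group][0] += int(favourable)
        counts[group][1] += 1
    return {group: fav / total for group, (fav, total) in counts.items()}


def rate_gap(rates):
    """Return the difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


rates = favourable_rates(outputs)
gap = rate_gap(rates)
needs_review = gap > REVIEW_THRESHOLD
print(rates, gap, needs_review)
```

Running a check of this kind on a regular schedule, rather than once at release, is consistent with the call for ongoing monitoring and testing.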

Assurance mechanisms must be implemented to ensure regular testing of outcomes and recommendations

Government must verify data sources, and constantly test and retest rather than 'set and forget'. AI solutions should be designed with, and monitored against, explicit standards for performance, reliability, robustness and auditability, as well as the NSW Government Ethical AI Principles. Systems should be tested rigorously prior to release.

NSW needs data standards, preferably consistent with international standards, to ensure consistent approaches across Australia and internationally

There is a great deal of work being undertaken by Standards Australia, other jurisdictions and internationally on the development of data standards. These standards would assist with AI implementation and sourcing. NSW should align its work with that being undertaken more broadly, rather than develop its own standards.

As International Standards are developed through ISO and IEC and adopted through Standards Australia over the coming years, agencies might wish to adopt them or use them as guidance to prevent a 'set and forget' mentality. These Standards are likely to range from governance, through a management system approach, to more granular guidance on bias within machine learning environments.

Actions

Commitment	Responsibility	Timeframe	Status
Participate in shaping International Standards for AI being developed by ISO/IEC/JTC 1, through Standards Australia's AI Committee	DCS – Data Analytics Centre	Immediate and ongoing	Complete
Develop a Data Governance Toolkit to support good data governance and best management practices	DCS – Data Analytics Centre	Q4 2020	Complete
Undertake stocktake of key data assets across government that may inform development of algorithms	DCS – Data Analytics Centre	Q4 2021	Complete
Develop practical resources to increase awareness of data policy issues specific to AI, highlighting likely risks and mitigation strategies in AI projects	DCS – Data Analytics Centre	Q4 2021	Complete
Develop an assurance mechanism for AI projects that are under $5m to ensure projects are consistent with AI principles	DCS	Q4 2021	Complete
Review the ICT Assurance Framework and add requirements specific to assurance of AI projects	DCS	Q4 2021	Complete