Information Governance in AI Development

Business interest in artificial intelligence (AI) and machine learning (ML) has soared over the past few years. Surveys show corporate investment in AI increased by 40 per cent between 2019 and 2020. In the financial sector, AI is used to refine the guidance provided by chatbots and robo-advisors, to detect fraud patterns, and to make decisions on customer creditworthiness. In retail, AI is used to provide customized consumer recommendations, manage supply chain logistics, and streamline store operations.

The potential of AI as a key driver of digital transformation has also attracted government attention. In May, the Biden administration launched AI.gov as the central information source for the National Artificial Intelligence Initiative established in 2020.

AI Regulations Are Rapidly Emerging

However, the use of AI comes with significant concerns, particularly in areas of ML that rely on enormous stores of data. Current events have highlighted ethical concerns associated with AI, spurring governmental regulatory and enforcement action. Capping a wave of new AI guidelines and standards, the EU recently published a proposed Regulation on Artificial Intelligence, intended to address AI risks to fundamental rights and safety. In the absence of specific federal legislation for AI and data protection, the US Federal Trade Commission recently finalized a settlement ordering a company to destroy algorithms and AI/ML models derived from consumer biometric data that had been misused in violation of consumers' privacy. As consumers begin to question the level of surveillance in their home devices, the monitoring of their usage patterns, and the logic applied to automated decisions affecting their lives, operating on a “build first, question later” basis comes with increasing risk.

Data Risks and Vulnerabilities in AI Development

In ML, the availability of vast quantities of data has been the basis for innovation. Data is used to train algorithms so that they can infer the desired labels for new input data. This dependence on data creates certain risks in AI/ML, including the following:

Poor data quality. Given the large quantity of data used to enable AI, maintaining sufficient data quality can be difficult. The risks associated with poor data quality include problems in already-labelled data used as training data. For example, it may contain unconscious biases or fail to be a truly representative sample. In addition, data may be inaccessible, inconsistent, or disorganized, and may lack proper controls for the creation and addition of metadata.

Privacy and other compliance concerns in the use of personal data. Training and input data used in AI technology often includes personal data, whether in the consumer or employment context. Processing this data requires close attention to privacy requirements, particularly where personal data is used in algorithms to make certain assumptions about individuals.
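
To make the data-quality risk above more concrete, the following is a minimal Python sketch of the kind of automated check a team might run over labelled training data before use. It is illustrative only: the records, the field names (label, income, collected), the thresholds, and the quality_report function are all assumptions made for this example, not part of any standard or regulation.

```python
from collections import Counter
from datetime import date

# Hypothetical labelled training records; field names are illustrative only.
records = [
    {"label": "approve", "income": 52000, "collected": date(2021, 3, 1)},
    {"label": "approve", "income": None, "collected": date(2021, 2, 15)},
    {"label": "approve", "income": 61000, "collected": date(2016, 7, 9)},
    {"label": "deny", "income": 38000, "collected": date(2021, 1, 20)},
]


def quality_report(rows, required_fields, as_of, max_age_days, min_label_share):
    """Summarize three simple quality signals for a labelled dataset."""
    total = len(rows)
    # Completeness: count rows missing a value for any required field.
    incomplete = sum(
        1 for r in rows if any(r.get(f) is None for f in required_fields)
    )
    # Freshness: count rows collected longer ago than the allowed age.
    stale = sum(1 for r in rows if (as_of - r["collected"]).days > max_age_days)
    # Representation: list labels whose share of rows is below the threshold.
    counts = Counter(r["label"] for r in rows)
    underrepresented = sorted(
        label for label, n in counts.items() if n / total < min_label_share
    )
    return {
        "total": total,
        "incomplete": incomplete,
        "stale": stale,
        "underrepresented": underrepresented,
    }


# Thresholds here are assumptions chosen for the example, not recommendations.
report = quality_report(
    records,
    required_fields=["label", "income"],
    as_of=date(2021, 6, 1),
    max_age_days=3 * 365,
    min_label_share=0.3,
)
print(report)
```

A check like this only flags symptoms. Deciding what counts as a representative sample, or how old is too old, remains a policy question for the organization.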

Where Information Governance Can Intervene

Fundamentally, ML requires the availability of accessible, trusted, secure and reliable data, which is already a key goal of information governance. Implementing an IG program can help an organization to:

Improve data quality. In its report on AI data quality, the EU Agency for Fundamental Rights expounded on the meaning of “quality data” in the context of creating quality algorithms, which includes ensuring that data is accurate, representative, complete, and not outdated. By strategically setting policies and protocols on data collection, storage, taxonomy, and the capture of sufficient metadata, businesses can save considerable time and expense in the laborious data cleaning and preparation process.

Facilitate data accessibility. An IG program should aim to break down the information silos between business functions. Creating consistency in structures, terminology and protocols, as well as mapping organizational data flows, should encourage greater collaboration and information sharing between functions. Better decisions can then be made on additional data types that an organization can harvest.

Achieve AI transparency and establish audit trails. As more governments look to implement audit requirements for AI algorithms, IG programs can respond by adjusting existing IG policy considerations to reflect AI system concerns. IG programs that inventory and map information creation and usage can establish and ensure data lineage. In addition, keeping technical documentation and system logs provides the ability to trace the AI system’s function for both regulatory compliance and performance-tracking purposes.

Comply with privacy regulations. AI systems using personal data must comply with applicable privacy requirements. An IG program may set controls for processing sensitive personal data, such as tagging the data to ensure appropriate access, use and retention.
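
As a rough illustration of the tagging control just described, the sketch below attaches a governance tag to a category of data and consults it before the data is used or retained. Everything here is hypothetical: the DataTag class, its fields, the role and purpose names, and the 365-day retention period are assumptions for the example, not requirements drawn from any particular regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataTag:
    """Hypothetical governance tag attached to a dataset or field."""
    sensitivity: str             # e.g. "public", "personal", "sensitive"
    allowed_purposes: frozenset  # uses the data may be put to
    allowed_roles: frozenset     # roles permitted to access the data
    retention_days: int          # how long the data may be kept


def may_use(tag, role, purpose):
    """Allow use only when both the role and the purpose are permitted."""
    return role in tag.allowed_roles and purpose in tag.allowed_purposes


def past_retention(tag, collected_on, today):
    """Report whether the data has outlived its retention period."""
    return today > collected_on + timedelta(days=tag.retention_days)


# Illustrative tag for biometric data; all values are assumptions.
biometric_tag = DataTag(
    sensitivity="sensitive",
    allowed_purposes=frozenset({"authentication"}),
    allowed_roles=frozenset({"security_engineer"}),
    retention_days=365,
)

ok = may_use(biometric_tag, "security_engineer", "authentication")
blocked = may_use(biometric_tag, "data_scientist", "model_training")
expired = past_retention(biometric_tag, date(2020, 1, 1), date(2021, 6, 1))
print(ok, blocked, expired)
```

Keeping the purpose in the tag, and not only the role, means a request to reuse sensitive data for model training is refused unless that purpose has been explicitly permitted.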