The ICO has recently launched a public consultation on the first chapter of its draft guidance on generative AI and data protection.

This consultation has a particular focus: it is a call to explore the lawful basis for extracting data from the web to train generative AI models (a process which is becoming more common across numerous markets). The ICO is requesting input from developers, users and wider interested parties.

What is generative AI?

Generative AI is artificial intelligence capable of generating text, images and other media using generative models. Such AI produces outcomes similar to the characteristics of the data on which it is trained. The development and deployment of generative AI models raises questions regarding how existing data protection law can and should apply to the development and use of such models.

This consultation is the first in a series of consultations that will be launched by the ICO in an effort to summarise the ICO’s assessment as to how data protection law should be applied to generative AI.

Five Takeaways

We have summarised below five key takeaways from this first consultation chapter, which readers should review and bear in mind before submitting their views to the ICO:

  1. Lawful basis: Training generative AI models in this manner (i.e., on web-scraped data) may be viable; however, developers need to carefully consider the legality and compliance requirements prior to setting up such processes. What constitutes the most reliable lawful basis here? It is the ICO’s assessment that, other than legitimate interests, the lawful bases are unlikely to be available for training generative AI on web-scraped data.
  2. Purpose test: Controllers (i.e., developers) need to consider whether there is a valid use case for training the generative AI model using the data available online (i.e., the purpose test). To be compliant with legal obligations, the purpose must be specific and based on the available data.
  3. Data minimisation: The training of generative AI models currently involves the use of large volumes of data, which at face value conflicts with privacy requirements to minimise data processing to what is necessary to achieve the specified purposes. The current training of generative AI models clearly creates a tension here, and there appears to be minimal evidence that smaller volumes of data could be used.
  4. Balancing test: Even if controllers conclude that they have a sufficient lawful basis and processing is necessary to achieve the use case, do the individuals’ rights and freedoms override the controllers’ interests? The ICO raises a key concern here in that such processing (i.e., extracting data from the web to train models) is ‘invisible’, making it more difficult for individuals to maintain control and understand what organisations are doing with their data.
  5. Risk mitigation: Potential risks include individuals losing control of their data, or potentially unlimited downstream application. In order to mitigate this, the controller should ensure it has adequate organisational and technical security measures in place, has assessed potential risk to individuals and can control and evidence whether the model has a wider societal benefit. The method of generative AI model deployment will also be key to the assessment as to how potential harm can be mitigated. For example, where a generative AI model is deployed by a third party (not the developer) via an API, there is potential for the model to be used in a number of ways. Developers should seek to limit such use contractually. The ICO is particularly interested in stakeholder views regarding potential mitigation measures, their efficacy and how such efficacy should be evaluated and documented.

Next steps

Our view is that this consultation presents a great opportunity to work with the ICO to help surface more and varied use cases for consideration. The consultation on the first chapter is open until Thursday 1 March 2024. You can respond via the survey, or by emailing the ICO – please see here. We believe that how businesses are working out what ‘good’ looks like when balancing data protection requirements and mitigating data protection risk will be key to helping the ICO understand how data protection by design techniques can serve as a route to keeping risk to individuals lower (or certainly not high!).


Marilyn is an associate in the Intellectual Property, Data and Technology team based in London. She joined Baker McKenzie as a Trainee Solicitor in September 2020 and was admitted as a solicitor in England and Wales in September 2022. During her training, Marilyn was seconded to Baker McKenzie's Dubai office for six months and later to Google's commercial legal team for six months.



Vin leads our London Data Privacy practice and is also a member of our Global Privacy & Security Leadership team. He brings over 22 years of experience in this specialist area, advising clients from various data-rich sectors including retail, financial services/fin-tech, life sciences, healthcare, proptech and technology platforms.