The ICO has recently published a joint statement on data scraping in conjunction with 11 other members of the Global Privacy Assembly’s International Enforcement Cooperation Working Group, including authorities from Argentina, Australia, Canada, Colombia, Hong Kong, Jersey, Mexico, Morocco, New Zealand, Norway, and Switzerland. The statement was issued to highlight the privacy risks associated with data scraping, focusing on how social media companies (“SMCs”), and other websites with publicly accessible personal data, should protect the data of their users and enable users to protect themselves.
In brief, data scraping is a technique whereby a computer program extracts data from websites and inputs it into vast spreadsheets. This automated process makes substantial amounts of data quickly and easily accessible. Scraping social media sites creates significant privacy risks because of both the high volumes of personal data collected by the sites on registration and the content of the posts shared by users on the sites.
The statement provided two key areas of focus for SMCs in relation to protection against and prevention of data scraping:
- the steps that SMCs can take themselves in relation to protections on their own sites (primary steps); and
- the information that SMCs can provide to their users to help them self-protect against scraping (secondary steps).
Primary steps: Looking first at the steps that SMCs can take themselves, the overarching message of the statement was that SMCs should take a multi-layered approach to protecting their users from the privacy harms related to data scraping. They should implement a range of technical and procedural controls. Examples of such measures were provided, including:
- creating a specific team focusing on the identification and implementation of mechanisms to protect against data scraping. This should include the ability both to monitor and to react to data scraping incidents;
- monitoring how frequently users access the website. If a user is visiting unusually frequently, over a certain threshold per hour or per day, organisations should limit their access and perform data scraping checks;
- assessing the volume of new user access requests. If a new user is seeking out high volumes of other users, this can be a sign that they are intending to scrape data. SMCs should monitor this and limit the access of users with unusual traffic;
- implementing measures to help with the identification of ‘bots’ both accessing the site and conducting data scraping. The statement provided the following example to assist SMCs in identifying bot activity – “a group of suspicious IP addresses can be detected by monitoring from where a platform is being accessed by using the same credentials from multiple locations. This would be suspicious where these accesses are occurring within a short period of time”. Steps that SMCs can take to prevent bot access include the use of CAPTCHAs and blocking any IP addresses with suspicious activity;
- taking appropriate action as soon as data scraping has been identified. This may include measures such as sending cease and desist letters and requiring the scraper to delete any data obtained and to confirm that deletion; and
- notifying any affected individuals and regulators in jurisdictions where data scraping incidents are considered to be a data breach.
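By way of illustration only, two of the technical controls described above (limiting unusually frequent access, and spotting the same credentials being used from several places within a short period) could be sketched as follows. This is a minimal sketch and not something the statement prescribes: the names `AccessMonitor` and `suspicious_ips` are hypothetical, the thresholds are arbitrary, and distinct IP addresses are assumed to stand in for "multiple locations".

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Hypothetical sliding-window counter of requests per account."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests      # assumed threshold, set by the platform
        self.window_seconds = window_seconds  # e.g. 3600 for "per hour"
        self._hits = defaultdict(deque)       # account -> timestamps of allowed requests

    def allow(self, account: str, now: float) -> bool:
        """Record a request at time `now`; return False once the threshold is reached."""
        hits = self._hits[account]
        # Discard timestamps that have fallen outside the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller could limit access or run further scraping checks
        hits.append(now)
        return True


def suspicious_ips(logins, window_seconds, min_distinct_ips):
    """Flag IPs where one account is used from several addresses in a short period.

    `logins` is an iterable of (timestamp, account, ip) tuples.
    Assumption: a distinct IP address is treated as a distinct location.
    """
    by_account = defaultdict(list)
    for ts, account, ip in logins:
        by_account[account].append((ts, ip))

    flagged = set()
    for events in by_account.values():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Advance the window start until the span fits within window_seconds.
            while events[end][0] - events[start][0] > window_seconds:
                start += 1
            ips = {ip for _, ip in events[start:end + 1]}
            if len(ips) >= min_distinct_ips:
                flagged |= ips
    return flagged
```

In practice, a refused request from `AccessMonitor.allow` might trigger a CAPTCHA rather than an outright block, and the addresses returned by `suspicious_ips` would be candidates for review before any blocking decision is made.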
Secondary steps: The statement also outlined the steps that SMCs can take to help inform their users on how to self-protect against data scraping. The advice centred on information, suggesting that SMCs should take the necessary steps to ensure that users are well informed both of the steps that the SMC has taken to protect them and of the steps that users can take to protect their own data. Recommendations included:
- engaging with users to ensure that informed decisions can be made on the use of SMCs, and what personal information and content is shared on the site; and
- working to improve awareness of privacy settings which users can utilise to protect themselves and their data.
The statement also urged users of SMCs to take steps to protect their own data by considering the type of information that they share online, particularly special category data. Users should also think about the content of their posts, and whether they would be happy seeing that same post years down the line.
Protection for special category personal data: The measures to be adopted by the SMCs should also be proportionate to the sensitivity of data. If any data falls within the scope of special category data, organisations are expected to take further steps to ensure that this is protected from data scraping techniques.
The statement was distributed directly to significant SMCs. These SMCs (among others) are required to give input to the ICO by 24 September (i.e., within one month of the statement’s issue), demonstrating how they are compliant. For the time being, we await the ICO’s reaction to the SMCs’ comments; it is expected that their replies will be published to inform further guidance for other platforms on how to comply with data protection standards in this area.