Artificial Intelligence and Data Protection: The Beginning of a Unique Marriage (Part One)

With the emergence of ChatGPT, the peculiar legal relationship between artificial intelligence (AI) and data protection has become a truly “hot topic.” This comes as no surprise, since AI is everywhere, collecting personal data at an unprecedented scale and speed. But is this data collection lawful? Dr Endre Várady explores this question through the principles of the GDPR (General Data Protection Regulation).

According to some studies, 90% of the data available today was created in the last few years, with AI playing an increasingly dominant role in its collection. Data protection authorities, including Hungary’s NAIH, are intensifying their scrutiny of such data practices. For example, the NAIH imposed a record fine of HUF 250 million for the unlawful use of AI, while the Italian Data Protection Authority temporarily banned the use of ChatGPT for Italian users.

At first glance, however, AI does not appear to raise unique GDPR issues. In line with the principle of technological neutrality, AI must comply with the same GDPR provisions as any other data processing operation. In principle, this is true, but the devil is in the details: in practice, contradictions constantly arise, often forcing data protection authorities to deviate from traditional GDPR approaches.


The GDPR Principles: A Closer Look at AI’s Compliance

First GDPR Principle: The Principle of Transparency

AI systems typically operate as a “black box”: while the input data and the system’s final output are known, the intermediate processes and the reasoning behind the conclusions remain opaque. This raises the question of whether AI systems can be transparent to data subjects who cannot access or understand what happens inside the black box. At the same time, opening the black box may infringe trade secrets and intellectual property rights.

Resolving this contradiction requires a pragmatic approach. What is the primary interest of the data subject? That the algorithm’s decisions can be reviewed and, if necessary, corrected through a simple, transparent procedure. This can be achieved, for instance, by allowing for human intervention or by enabling data subjects to use a test system to see how specific input data leads to particular conclusions.
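
Purely as an illustration of what such a test system might look like, the Python sketch below lets a data subject vary one input and watch the outcome change. The scoring rule, thresholds, and field names are all invented for the example and stand in for a real, opaque model:

```python
# Hypothetical "what-if" test system for data subjects.
# The scoring rule and field names are invented stand-ins for a real model.

def score_application(income: float, open_debts: int) -> str:
    """Toy stand-in for an opaque model: returns a decision label."""
    return "approved" if income - 5_000 * open_debts > 20_000 else "rejected"

def what_if(base: dict, feature: str, values: list) -> None:
    """Let the data subject vary one input and observe the outcome."""
    for value in values:
        probe = {**base, feature: value}
        print(f"{feature} = {value} -> {score_application(**probe)}")

applicant = {"income": 40_000, "open_debts": 3}
what_if(applicant, "open_debts", [0, 2, 4, 6])
# open_debts = 0 -> approved ... open_debts = 4 -> rejected
```

Even this trivial probe gives the data subject something the black box alone cannot: a concrete sense of which inputs tip the decision.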

Moreover, the logic behind AI and its anticipated consequences can be communicated in a concise and understandable way without opening the black box. For instance, if an insurance company uses algorithms to monitor customers’ driving habits for premium calculations, it should inform them of this before processing begins. The company could explain that it uses AI for the calculations, describe the type of model (e.g., a neural network or decision tree), and inform them that reckless driving may result in higher premiums.
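
To make this tangible, here is a hedged sketch that trains a shallow decision tree on invented telematics data and prints its learned rules in a human-readable form, the kind of concise summary a controller could draw on for such a notice. The feature names, synthetic data, and premium formula are all assumptions of the example:

```python
# Hedged sketch: a decision-tree premium model whose logic can be
# summarised for data subjects. Data and feature names are invented.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 500
# Hypothetical telematics features: harsh-braking events and average speed.
X = np.column_stack([rng.poisson(3, n), rng.normal(90, 15, n)])
# Toy ground truth: riskier driving -> higher annual premium (EUR).
y = 400 + 25 * X[:, 0] + 2 * np.clip(X[:, 1] - 100, 0, None) + rng.normal(0, 20, n)

model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# A concise, human-readable description of the learned logic,
# usable as raw material for the pre-processing notice.
print(export_text(model, feature_names=["harsh_brakes_per_100km", "avg_speed_kmh"]))
```

A depth-limited tree is chosen deliberately here: its handful of if-then rules can be disclosed almost verbatim, whereas a neural network would need a separate explanation layer to achieve the same level of transparency.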

Second GDPR Principle: The Principle of Data Minimization

Another recurring question in the relationship between AI and data protection is how to reconcile AI’s massive appetite for data with the principle of data minimization, which requires that processing be limited to what is necessary for the stated purpose.

AI systems indeed require vast datasets. For example, training the software of a self-driving car may require analysing hundreds of thousands of photos and videos to teach the system to recognize pedestrians. However, this does not mean AI systems cannot be designed to continuously evaluate the nature and amount of data used, incorporating mechanisms to reduce unnecessary or insignificant data. Furthermore, the size of the training dataset can gradually be increased as needed.
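
One hedged sketch of such a mechanism is to enlarge the training set step by step and stop as soon as validation accuracy stops improving meaningfully. The synthetic dataset, the model, and the 0.5-percentage-point stopping threshold below are illustrative assumptions, not a prescribed method:

```python
# Illustrative data-minimization loop: enlarge the training set only
# while it still yields a meaningful validation gain. Dataset, model,
# and the 0.005 stopping threshold are assumptions of the sketch.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best_score, sufficient = 0.0, None
for size in (250, 500, 1_000, 2_000, 4_000, 8_000, 15_000):
    model = LogisticRegression(max_iter=1_000).fit(X_tr[:size], y_tr[:size])
    score = model.score(X_val, y_val)
    print(f"{size:>6} samples -> validation accuracy {score:.3f}")
    if score - best_score < 0.005:   # no meaningful gain: stop collecting
        break
    best_score, sufficient = score, size

print(f"Smallest size that still improved the model: {sufficient}")
```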

It is also worth emphasizing that the quality of a dataset is often more critical than its size. Without proper selection, labelling, and validation, a dataset is like an unorganized library: even a vast collection of pedestrian images would fail to help a self-driving car if the photos and videos were not chosen according to appropriate parameters. A careful selection, labelling, and validation process, on the other hand, brings order and meaning to the raw mass of data.
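
As a minimal sketch of such a selection and validation pass (the record schema, labels, and validity rules are all hypothetical), a pipeline might reject unlabelled, out-of-scope, and duplicate records before any training takes place:

```python
# Sketch of a selection/validation pass over raw training records.
# The record schema, labels, and validity rules are hypothetical.
ALLOWED_LABELS = {"pedestrian", "no_pedestrian"}

raw = [
    {"id": 1, "path": "img_001.jpg", "label": "pedestrian"},
    {"id": 2, "path": "img_002.jpg", "label": None},          # unlabelled
    {"id": 3, "path": "img_001.jpg", "label": "pedestrian"},  # duplicate image
    {"id": 4, "path": "img_004.jpg", "label": "bicycle"},     # out-of-scope label
]

def validate(records):
    seen = set()
    for r in records:
        if r["label"] not in ALLOWED_LABELS:
            continue                 # reject unlabelled / out-of-scope records
        if r["path"] in seen:
            continue                 # reject duplicates
        seen.add(r["path"])
        yield r

clean = list(validate(raw))
print(f"kept {len(clean)} of {len(raw)} records")  # kept 1 of 4
```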

Another technique for achieving data minimization is to limit intrusions into individuals’ privacy by making individuals harder to identify in the data, for example through pseudonymization, encryption, or other privacy-enhancing methods.
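
For instance, a minimal pseudonymization sketch using only Python’s standard library could replace direct identifiers with stable keyed pseudonyms before the data ever reach the training pipeline. The record schema is hypothetical, and in practice the secret key would be managed and stored separately from the data:

```python
# Sketch of keyed pseudonymization applied before training data are used.
# The record schema is hypothetical; in production the key would live in
# a secrets manager, separate from the pseudonymized dataset.
import hashlib
import hmac

SECRET_KEY = b"rotate-me-and-store-me-separately"  # hypothetical key

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable keyed pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"driver_id": "HU-1234567", "harsh_brakes": 4}
record["driver_id"] = pseudonymize(record["driver_id"])
print(record)  # the same driver always maps to the same pseudonym
```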
