Explaining the proposed AI regulations in the EU
The proposal for the EU’s new AI regulation, the “AI act” for short, was published in April 2021, and it has been a widely discussed topic in data circles. No wonder, as it will have a large impact on some fields of AI and there is a lot of uncertainty about how this Regulation will be applied in practice. As a senior data scientist I have been involved in numerous data science projects, and I have special expertise in explainable AI methods and AI ethics, so I wanted to read this proposal to evaluate how feasible the suggested Regulation is. After reading the proposed version of the EU’s AI act multiple times, I wrote this post to share my interpretation of and thoughts on this Regulation. Note that the proposal contains so many small details that one post cannot cover them all, and there are many misconceptions around the topic that I cannot straighten out in a short post. The proposal is available in all EU languages and you can find all the versions here.
The outline of this article is the following: first I go through the reasons why the Regulation is needed, then I explain the main parts of the Regulation, how to fulfill its requirements, what the penalties for violating the act are, and the expected schedule for when it will take effect. Finally, I list some of my own thoughts and worries about this act. At the end of this post you can also find a summary of the Regulation.
In order to understand the urgency of this Regulation, it is essential to understand the motivation behind the AI act. The two main reasons for the AI act are:
- Ensure AI systems are safe and respect fundamental rights and Union values: EU law is based on the fundamental values of the Union, which include respect for human dignity, freedom and democracy. On top of that, the Union is bound by fundamental rights, including the right to non-discrimination, equality between women and men, the right to data protection and privacy, and the rights of the child. Despite the multiple benefits that AI technology brings, AI systems can strongly contradict Union values and rights, and provide powerful tools for harmful practices. Ensuring that AI is developed in ways that respect people’s rights and earn their trust makes Europe fit for the digital age.
- Prevent market fragmentation inside the EU: Some Member States are already considering national rules for AI products and services, as the risks of AI systems have become evident. Since products and services are likely to circulate across borders, divergent national rules would fragment the market inside the EU and endanger the protection of fundamental rights and Union values across the different Member States.
The amount of regulation in this proposal depends on the risk level the AI system generates. To reduce unnecessary costs and to avoid slowing the uptake of new AI applications, the AI act aims to be gentle towards AI systems that pose low or minimal risk to people and only targets AI systems that pose the highest risks. The Regulation will prohibit certain AI practices, lay down obligations for high-risk AI systems and demand transparency from some AI applications. The Regulation also suggests establishing a European AI Board that advises and assists on specific questions to ensure a smooth and effective implementation of the Regulation.
How is “AI” defined in this act?
The term “AI” is defined very broadly, and surprisingly many applications will fall under it. In this Regulation, “AI system” means software that is developed with one or more of these techniques (listed in Annex I):
- Machine learning approaches, including supervised, unsupervised and reinforcement learning, using a wide variety of methods including deep learning;
- Logic- and knowledge-based approaches, including knowledge representation, inductive (logic) programming, knowledge bases, inference and deductive engines, (symbolic) reasoning and expert systems;
- Statistical approaches, Bayesian estimation, search and optimization methods.
How are the risk-levels defined?
The risk levels in this proposal are divided into three groups: unacceptable risk, high risk and low or minimal risk.
A) An unacceptable risk
In this category the risks of the AI system are so high that such systems are forbidden (with three exceptions). The forbidden practices are:
- manipulative systems: techniques that operate beyond a person’s consciousness or that exploit the vulnerabilities of a specific group (age, physical or mental disability) in order to distort a person’s behavior in a manner that causes harm to that person or another person,
- social scoring algorithms: an AI system used by public authorities that evaluates the trustworthiness of natural persons, leading to “social scoring” of citizens,
- real-time biometric systems: the use of a real-time system that identifies people from a distance in publicly accessible spaces for the purpose of law enforcement, unless it is strictly necessary for one of the three following exceptions:
i) targeted search for potential victims of crime or missing children;
ii) prevention of imminent safety threats or terrorist attacks;
iii) detection of a perpetrator or suspect of a serious criminal offense.
The three exceptions require weighing the seriousness and scale of the harm if the system is used versus if it is not, and their use must always be approved by the appropriate authorities.
B) A high risk
AI systems identified as high-risk might have a significant impact on a person’s life and ability to secure their livelihood, or they can complicate a person’s participation in society. Improperly designed systems might act in a biased way and show patterns of historical discrimination, so in order to mitigate these risks, such systems can be put into service only if they comply with certain mandatory requirements discussed in the next section. An AI system is considered high-risk if:
- It is covered by the Union harmonization legislation listed in Annex II AND it must undergo a third-party conformity assessment before it can be placed on the market. The products falling under this category are e.g. machinery, medical devices and toys.
- It is listed in Annex III. Systems in this category are divided into eight main groups of systems that:
i) use biometric identification;
ii) operate in critical infrastructure (road traffic, water, heat, gas, electricity);
iii) determine access to education or evaluate students;
iv) are used in recruitment, or make decisions on promotions, termination of contracts or task allocation;
v) determine access to services and benefits (eg. social assistance, grants, credit scores);
vi) are used in law enforcement;
vii) are used in migration, asylum or border control management;
viii) assist in judicial systems (eg. assist in researching facts and the law).
C) A low or minimal risk
Only certain transparency rules are required for AI systems that are not considered high-risk. These obligations are:
- If an AI system interacts with people, it must notify the user that the user is interacting with an AI system, unless this is obvious from the context of use.
- People must be informed if they are exposed to emotion recognition systems or to systems that assign people to specific categories based on sex, age, hair color, tattoos, etc.
- Manipulated image, audio or video content that resembles existing persons, places or events and could falsely appear to be authentic or truthful (eg. “deep fakes”) must clearly state that the content has been artificially generated.
However, providers of low- and minimal-risk AI systems are encouraged to voluntarily create and implement codes of conduct themselves. These codes of conduct may follow the requirements set for high-risk systems, or they can include commitments to environmental sustainability, accessibility for persons with disabilities, diversity of development teams, and stakeholders’ participation in the design and development process of the AI system.
AI systems that are developed or used for military purposes are excluded from the scope of this Regulation. The Regulation applies to all systems that are a) placed on the market, b) put into service, or c) used in the Union, or that d) impact people located in the Union (for example, if an AI system operates outside of the Union but its results are used in the Union). It also makes no difference whether the AI system works in return for payment or free of charge.
Given the early phase of the regulatory intervention, the fact that the AI sector is rapidly developing, and that the expertise for auditing is only now being accumulated, the Regulation relies heavily on internal assessment and thorough reporting. The list of requirements for high-risk AI applications is long and often confusing, and it is hard to comprehend the level of precision to which these requirements must be fulfilled. The providers of high-risk systems must at least:
- Establish a risk management system: it needs to be regularly updated throughout the entire lifecycle of a high-risk AI system. It must identify and analyze all the known and foreseeable risks that might emerge when the high-risk AI system is used for its intended purpose or under reasonably foreseeable misuse, especially if it has an impact on children.
- Write technical documentation: it must be kept up-to-date at all times and must follow the elements set out in Annex IV, so it must contain at least:
a) a general description of the AI system, e.g. its intended purpose, the version of the system and a description of the hardware,
b) a detailed description of the AI system including its general logic, the key design choices, the main characteristics of the training data, the intended user group of the system, and what the system is designed to optimize,
c) detailed information on the AI system’s capabilities and limitations in performance, including the overall expected level of accuracy, the accuracy levels for certain groups of persons, and an evaluation of risks to health, safety, fundamental rights and discrimination.
If a high-risk AI system is part of a product that is regulated with legal acts listed in Annex II (such as machinery, medical devices and toys), the technical document must contain the information required under those legal acts as well.
- Fulfill requirements on the training, testing and validation data sets (if the system is trained with data): the data sets need to be relevant, representative, free of errors and complete. They must have the appropriate statistical properties, especially for the groups of persons on which the high-risk AI system is intended to be used. Consideration must be given eg. to the relevant design choices, collected data, data preparation processes (such as annotation, labeling, cleaning, aggregation), assumptions about the data (what the data is supposed to measure and represent), examination of possible bias and identification of any possible data gaps and shortcomings.
- Achieve an appropriate level of accuracy, robustness and cybersecurity: a high-risk AI system must achieve an appropriate level of accuracy and must perform consistently throughout its lifecycle. It needs to be resilient to errors, faults and inconsistencies that might occur during the use of the AI system. Users must be able to interrupt the system or decide not to use the system’s output. The AI system must also be resilient to attempts to alter its use or performance by exploiting its vulnerabilities.
- Perform a conformity assessment of the system: in some cases a comprehensive internal assessment (following the steps in Annex VI) is enough, but in other cases a third-party assessment (referred to in Annex VII) is required. Note that for high-risk systems that fall under the legal acts listed in Annex II (such as machinery, medical devices and toys), the conformity assessment must be done by the authorities assigned in those legal acts.
- Hand over detailed instructions to the user: users must be able to interpret the system’s output, monitor its performance (for example to identify signs of anomalies, dysfunctions and unexpected performance) and they must understand how to use the system appropriately. Instructions should contain contact information of the provider and its authorized representative, specify the characteristics, capabilities and limitations of performance (including circumstances that might affect the expected level of performance), and the instructions must clearly state the specifications for the input data.
- Register the system in the EU’s database that is accessible to the public: all high-risk systems and their summary sheets must be registered in the EU’s database and this information must be kept up-to-date at all times. The summary sheet must contain all the information listed in Annex VIII, including the contact details of the provider, the trade name of the AI system (plus any other relevant identification information), a description of the intended use of the system, the status of the AI system (on the market / no longer on the market / recalled), copies of certain certificates, the list of Member States where the system is available and the electronic instructions for use.
- Keep records when the system is in use: the AI system must automatically record events (‘logs’) while it is operating, to the extent possible under contractual arrangements or by law. These logs can be used to monitor the operation in practice and help evaluate whether the AI system is functioning appropriately, paying particular attention to the occurrence of risky situations.
- Maintain post-market monitoring and report serious incidents and malfunctioning: the provider is obligated to document and analyze the data collected from users (or through other sources) on the performance of high-risk AI systems throughout their lifetime. Providers must also immediately report any serious incident or malfunctioning that occurs.
The authorities must be granted full access to the training, validation and testing datasets, and if necessary access to the source code as well. Note that the authorities must respect the confidentiality of information and data obtained in this process (including business information and trade secrets). As the authorities in question have great power to determine which AI systems are allowed to be placed on the EU’s market, the Regulation sets strict rules for the parties that can carry out conformity assessments. For example, no conflict of interest is allowed to arise, and the authorities cannot perform any activities or offer any consultancy services that would compete with the AI systems they assess.
The Regulation also lists obligations for importers and distributors of AI systems: they need to make sure the AI system fulfills the requirements listed in this act. Users of high-risk AI systems have obligations as well: for example, they need to ensure that input data is relevant for the intended use of the high-risk AI system, and if the user encounters any serious incident or malfunctioning of the system, the user must interrupt the use of the AI system and inform the provider or distributor of the event.
Fines are divided into three categories:
- Using forbidden AI practices or violating the requirements for the data: 30 million euros or 6 % of total worldwide annual turnover for the preceding financial year, whichever is higher.
- Non-compliance of any other requirement under this Regulation: 20 million euros or 4 % of total worldwide annual turnover for the preceding financial year, whichever is higher.
- Supplying incorrect, incomplete or misleading information on the requirements set in this Regulation: 10 million euros or 2 % of total worldwide annual turnover for the preceding financial year, whichever is higher.
When deciding on the amount of the fine, the circumstances of the specific situation should be taken into account, for example the nature and duration of the non-compliance and its consequences, and the size and market of the operator committing the infringement.
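To make the “whichever is higher” rule concrete, here is a minimal illustrative sketch. The turnover figures are invented for the example; only the 30 million euro cap and the 6 % rate come from the Regulation’s data-requirement category above:

```python
def max_fine(turnover_eur: float, fixed_cap_eur: float, pct: float) -> float:
    """Fine under the proposal's rule: a fixed amount or a percentage of
    total worldwide annual turnover, whichever is higher."""
    return max(fixed_cap_eur, pct * turnover_eur)

# Violating the data requirements (30 million EUR or 6 % of turnover):
small_firm = max_fine(100_000_000, 30_000_000, 0.06)    # 6 % = 6 M, cap wins
large_firm = max_fine(2_000_000_000, 30_000_000, 0.06)  # 6 % = 120 M wins
print(small_firm, large_firm)
```

Note that for large companies the percentage dominates, so the fine grows with turnover rather than staying at the headline 30 million euro figure.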
In March 2018 the European Commission set up an AI expert group to draw up a proposal for AI ethics guidelines. In April 2019 “Ethics guidelines for trustworthy AI” was published. The first draft of the EU’s AI regulation, called the “White Paper on AI”, was published in February 2020. The paper invited all interested parties to express their feedback on the suggested regulation, and it received in total over a thousand contributions from companies, business organizations, individuals, academic institutes and public authorities. Based on that feedback, the current version of the AI act was published in April 2021. It is not known when the Regulation will enter into force, but it is estimated that it will be accepted in 2023 at the earliest, and assuming a two-year transition period, the Regulation would become applicable in 2025.
I have been following the AI ethics and transparency discussion for several years, and it seems that more people have started to understand the risks of blindly trusting AI systems. We already know of cases where an AI system discriminated against women by giving them lower scores in a recruitment tool (link) and lower credit limits (link). Also, an AI system designed to work in US hospitals decided that sick white people required help more urgently than equally sick black people, even when the black person was substantially sicker than the average white person (the training data was biased because white people had better health insurance, so their diseases were better diagnosed, link). So the risks of AI are not imaginary.
This is why I applaud the attempts to regulate AI systems to ensure that people will be treated equally. Too many providers and developers don’t know or don’t care to check whether their system will discriminate against certain people and what that might mean for those people. But I’m not convinced that the excessive amount of reporting suggested in this Regulation is the correct way to fulfill the intended objectives. I think some of the requirements sound nice but are hard or impossible to fulfill.
After reading this proposal multiple times, a few sentences especially stand out. For example, the highest level of penalty is given if the data used to train the AI system does not fulfill the requirements set in this Regulation, such as (direct citation from the proposal): “Training, validation and testing data sets shall be relevant, representative, free of errors and complete.” The data must also have appropriate statistical properties for the groups of people that are the intended user group of the system. These requirements are very heavy for real data science cases, and it is disappointing that the Regulation does not state more clearly how to ensure a data set complies with these restrictions. For example: a data set could have a good mix of different age groups and races and an equal number of men and women, but in a detailed analysis we notice that we have very few young black women. So if we look at the statistics of one characteristic, the data set looks good. But if we look at the combination of three characteristics, for example age, race and gender, we start to find issues. If we focus on multiple features at the same time, it is inevitable that the group sizes will become so small that they are not statistically significant anymore. So how can we ensure that no biases exist in our data set and we don’t have to pay a 30 million euro fine?
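The shrinking-subgroup problem can be sketched with a back-of-the-envelope calculation. The data set size and attribute counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical balanced data set: 10,000 records, each labeled with a gender,
# an age group and an ethnic group.
n_records = 10_000
attributes = {"gender": 2, "age_group": 5, "ethnic_group": 5}

# One characteristic at a time: every group still looks well represented.
for name, n_values in attributes.items():
    print(f"{name}: ~{n_records // n_values} records per value")

# All three characteristics combined: the data splits into 2 * 5 * 5 = 50
# cells, so even under perfect balance each cell holds only a couple of
# hundred records -- and real data is rarely balanced.
n_cells = 1
for n_values in attributes.values():
    n_cells *= n_values
print(f"~{n_records // n_cells} records per (gender, age, ethnicity) cell")
```

With more attributes or more values per attribute, the cell counts shrink multiplicatively, which is exactly why "appropriate statistical properties" for every intersection of groups is so hard to guarantee.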
The Regulation states that it relies heavily on internal reporting because the expertise for auditing is only now being accumulated. However, the fines seem very high given that we are at the early stage of this kind of regulation. To me this looks like gambling: do you take the risk that you accidentally forgot to mention something and that, if you are caught, your company will be doomed, or do you drop all development of AI systems to be on the safe side? I think the consequences seem unnecessarily harsh, as the lawmakers don’t seem to fully grasp how difficult it is to fulfill some of the requirements. For this reason I believe there is a very high risk that investments in essential AI systems will drop as investors fear high and unpredictable fines. I would hope that lawmakers revise this Regulation and only demand steps that real-world AI systems can fulfill. Let’s not ruin good intentions with unfair legislation!
The latest proposal for regulating AI systems in the EU contains the following points:
- AI systems are divided into three categories based on the risk they generate.
- Unacceptable-risk systems are generally forbidden. Systems in this category consist of manipulative systems, social scoring algorithms and real-time biometric identification systems.
- High-risk systems have a significant impact on a person’s life and ability to secure their livelihood, or they can complicate a person’s participation in society. These systems can e.g. determine access to services and benefits, be used in recruitment, or evaluate students. They can be put into service only if they comply with certain mandatory requirements that include internal assessments and thorough reporting.
- Low- and minimal-risk systems have to comply with certain transparency rules. For example, if an AI system interacts with people, it must notify the user that they are interacting with an AI system. Also, “deep fake” videos etc. must clearly state that the content has been artificially generated.
- The Regulation is expected to become applicable in 2025 at the earliest.
The author is a senior data scientist and an expert in explainable AI methods, and she has spent years following the discussion on AI ethics. She has an academic background with years of experience in computational physics simulations and a PhD in theoretical particle physics. Attempts to regulate AI interest her because AI models do not work in a straightforward manner, so it is not easy to write such regulation, but she fully supports regulating AI, as AI developers focus too much on building models quickly without taking AI ethics into consideration.