"Making AI safe is a more complex problem."

As head of the "Artificial Intelligence" department at the Fraunhofer Heinrich Hertz Institute (HHI) in Berlin, Dr. Wojciech Samek is working to make AI explainable and secure. His current white paper on the testing and certification of AI applications, which he published together with the TÜV Association and the German Federal Office for Information Security (BSI), is intended as an impetus for the directive currently under discussion in Europe to regulate AI.

The Berlin Fraunhofer Heinrich Hertz Institute (HHI) is one of the world's leading institutes in research into explainable AI (XAI). How did you get involved and what exactly is meant by this?

For a decade now, the trend has been to use deeper neural networks and more complex AI models with ever more parameters and layers. Due to their complexity, these AI models have long been considered "black boxes", i.e. models whose mode of operation cannot be completely understood and whose results cannot be comprehensibly explained. This changed in 2015 when, together with Prof. Klaus-Robert Müller from TU Berlin, we developed a general technique that can be used to make deep neural networks explainable. Since then, we have written around 30 papers in which we have extended these techniques theoretically and applied them in different domains.

Let's go back to the beginning. What did you find out back then?

We studied some AI models developed and published by leading research groups to find out how they make decisions. We wanted to gain deeper insight into these black boxes and also compare the problem-solving strategies the models had learned. One thing surprised us a lot: many models decide quite differently than expected.

Can you give an example?

There was an international challenge in which the best research groups submitted their AI models for image classification. There were 20 categories, one of which was horses. The challenge ran for eight years, and every year the best AI won an award. But only the models' performance values were compared, measuring how accurately they classified the horse images. It was not clear what the models were looking at. With our technique, we later examined the models that had won the challenge. We were astonished to find that many of them cheat. They don't do what was expected. For example, they recognized boats by the presence of water. They did not look at the boat, but at the water. In the case of horse images, they also did not look at the horse itself, but at a copyright notice in the image. When the data sets were collected, the images were taken from the Internet, and many of the websites carried horse pictures with a copyright notice. Nobody noticed that. We have seen such blatant cases repeatedly, and others have also reported AI models cheating. We rely too often on performance values without knowing how the models arrive at their decisions. This was our entry point into the topic of reliability and testing AI reliability. After all, especially in medical and other critical applications, you want to make sure that the AI works the way you want it to. But to do that, you need some explainability component. There is a lot of interest in this from researchers and all the players using AI. Chemists, physicists, medical doctors... they all want to apply AI and at the same time understand what AI does. A black box is of no use to them.

One of your developments to make AI explainable is the analysis method Layer-wise Relevance Propagation (LRP). What exactly do you mean by this?

It is important to fundamentally understand what the neural network does. For example, when it has to recognize image elements, it receives pixel values as input and processes them layer by layer. Through non-linear operations, the information is passed on until there is a decision at the end. Then a neuron fires, which - in our example - is responsible for the horse. With LRP, we go backwards and redistribute the result in a mathematically meaningful and theoretically sound way. We have thought very carefully about how to distribute the result back layer by layer, so that each element of the neural network gets its share of the decision. From this, you can see which pixels were particularly important, and what contribution each neuron made to the result. We often compare this process with water flow or an electric circuit, except that in our case it is not water or electricity that flows back, but the so-called "relevance". More relevance flows through certain neurons because they are particularly important for the result. We have found a technique to calculate this and a very efficient algorithm for it. Today, we can distribute the result back in milliseconds.
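To make the backward redistribution concrete, here is a minimal sketch of one commonly described LRP variant, the epsilon rule, on a tiny hand-written network. It is an illustration only, not the institute's implementation: the network, its weights, the input values, and the function names `forward` and `lrp_epsilon` are all made up for this example, and biases are left out so that the total relevance is conserved from layer to layer.

```python
def forward(x, weights):
    """Run a bias-free ReLU network and return the activations of every layer."""
    activations = [x]
    for i, w in enumerate(weights):
        a = activations[-1]
        z = [sum(a[j] * w[j][k] for j in range(len(a))) for k in range(len(w[0]))]
        if i < len(weights) - 1:  # ReLU on hidden layers, linear output layer
            z = [max(0.0, v) for v in z]
        activations.append(z)
    return activations


def lrp_epsilon(activations, weights, target, eps=1e-9):
    """Redistribute the score of output neuron `target` back to the inputs."""
    out = activations[-1]
    # Start with all relevance on the output neuron we want to explain.
    relevance = [out[k] if k == target else 0.0 for k in range(len(out))]
    # Walk the layers from the output back to the input.
    for a, w in zip(reversed(activations[:-1]), reversed(weights)):
        n_out = len(w[0])
        z = [sum(a[j] * w[j][k] for j in range(len(a))) for k in range(n_out)]
        # A small stabiliser avoids division by zero.
        z = [v + eps * (1.0 if v >= 0 else -1.0) for v in z]
        # Each lower neuron j receives relevance in proportion to its
        # contribution a[j] * w[j][k] to each upper neuron k.
        relevance = [
            sum(a[j] * w[j][k] / z[k] * relevance[k] for k in range(n_out))
            for j in range(len(a))
        ]
    return relevance


# Hypothetical toy network: 3 inputs ("pixels"), 2 hidden neurons, 2 outputs.
W1 = [[0.5, -1.0], [0.25, 0.5], [-1.0, 2.0]]
W2 = [[1.0, -0.5], [2.0, 1.0]]
x = [1.0, 2.0, 0.5]

activations = forward(x, [W1, W2])
relevance = lrp_epsilon(activations, [W1, W2], target=0)
print("output scores:", activations[-1])
print("input relevance:", relevance)
```

In this toy case the relevance values of the three inputs add up to the score of the explained output neuron, which is the "nothing is lost on the way back" property that the circuit analogy describes. Negative values mark inputs that speak against the decision.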

What can you use these explanations for?

We are very interested in this question. On the one hand, we can learn from the explanations, and on the other hand, we can verify whether a result makes sense or not. It is also important to use the explanation to make the model better. Here, we are currently working on how to incorporate these techniques into the training of the AI, so that the models not only provide the desired result, but also arrive at it in the correct way. The explanations are also useful for finding bias in data. For example, in one paper, an AI was trained on a large dataset of facial images to estimate a person's age. However, we saw that whether the person was laughing or not played a role in the age estimation. This is because younger people laughed more than older people in the datasets. Whether someone was wearing a suit or a shirt also played a strong role. The model had used features that are either irrelevant or biased, and that you therefore don't want it to use. We're looking at how to work that out automatically to make models more reliable, fair, and ethical. This is where explainability plays an important role, because it allows us to determine that quickly. All it took was one picture to see that the model was paying attention to the collar of the shirt. That's when we knew: here is a problem. There are also people who say that there is no need for explainability because AI is measured by the result. But in practice, that's not true. When the data is biased, that limits the model. Explainability helps to overcome this limitation.

Explainability is also needed for AI reliability certification, which you addressed in your recent white paper "Towards Auditable AI Systems".

Yes, what you gain through explainability helps to make an AI reliable. But other aspects also play a role.

What would they be?

The robustness of the model. That it is resilient to disruptive factors. When an autonomous car is driving, it should not be disturbed by external influences such as weather conditions. This is also required of all other technical systems. An important area is the bias mentioned, that the model generalizes as well as possible and does not choose features that are relevant in the training data but have nothing to do with reality. Another area is the issue of security. That the AI is protected from attack or manipulation. I'm sure you've heard the example where a stop sign with a sticker is not recognized as a stop sign. Or that attackers insert backdoors into the training data to make the model vulnerable to attack. This security component can be tested to some degree. Also, there is a lot of research in this area. However, I personally think that we are not there yet with these techniques. Today's neural networks only work "bottom up": pixels are processed, then you get a result. If we compare it with us humans, we do get pixel values in the form of light on our retina, which then stimulate certain cells. That is also a bottom-up process. But there is still the top-down process that drives perception. We use a lot of experiential knowledge that compensates for errors in perception. This also tells us that it is a stop sign - even if there is a sticker on it. Today's models don't have this process.

If you can't guarantee these aspects in the models, how can you nevertheless test and certify an AI? What does it depend on?

In our whitepaper, we write that it is enormously important to look not only at the model itself, but at the entire lifecycle of the AI. All steps in the lifecycle - starting with the data and ending with the training process - must go through quality control and meet certain criteria. Of course, the model must be robust and do what it is supposed to do. But you can also take steps during operation to respond to errors. For example, the model can output an uncertainty value in addition to the result to indicate to the user whether the result is certain or not. It is possible for a model to be aware of its own uncertainty. There are also requirements for the environment in which the model operates. It does not operate in a vacuum, but is integrated into other systems. Redundant checking mechanisms could help to ensure that the systems mutually safeguard each other. Making AI secure is a more complex problem because all these aspects have to be taken into account.
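As a rough illustration of returning an uncertainty value alongside the result, the sketch below turns a classifier's raw output scores into a predicted label plus a normalized entropy between 0 (certain) and 1 (maximally uncertain). This is one simple approach chosen for the example, not a method taken from the whitepaper; the function names and the input scores are hypothetical, and softmax-based scores are known to be poorly calibrated in many models, so a real system would likely need something more careful.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_uncertainty(logits):
    """Return (label, uncertainty); assumes at least two classes."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    # Entropy of the predicted distribution, scaled to the range [0, 1].
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    uncertainty = entropy / math.log(len(probs))
    return label, uncertainty


# Hypothetical scores for three classes.
confident_label, confident_u = predict_with_uncertainty([5.0, 0.0, 0.0])
unsure_label, unsure_u = predict_with_uncertainty([1.0, 1.0, 1.0])
print(confident_label, round(confident_u, 3))
print(unsure_label, round(unsure_u, 3))
```

A surrounding system could, for instance, pass results above some uncertainty threshold to a human or to a redundant check instead of acting on them directly.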

What other insights were you able to gain from the whitepaper on AI application testing and certification?

Processes can be certified, but different processes are needed for different applications. There are many differences in detail. For each area, one has to look closely at whether and how to certify and approve such a system. This is already happening to some extent. If you think about medicine, some systems have been approved by the FDA. We are also working with representatives of the FDA in the "AI for Health" focus group, which was founded by the ITU and WHO, and are looking together at what such a process should look like in medicine. We are also learning from the experts how to benchmark systems. The international consortium includes regulators from other countries, universities, and also companies that are committed to investigating and advancing these questions: How can AI be tested in medicine? What can the process look like? For this, we have co-developed a platform that can make something like this available to the public.

Speaking of platforms, this one launches on May 19-21. Can you tell us more?

The focus group, which was established by the ITU and WHO and is led by HHI, consists of subgroups that focus on different medical problems such as dermatology. The goal is to develop test methods and tests for these different areas. Developers should be able to upload their models. These are then encrypted so that the developers can be sure that their knowledge is safe from unauthorized access. The models will then go through a series of defined tests that we have developed. The first version will contain generic tests. These cover robustness or explainability, areas that need to be tested independently of the application. The platform will be continuously developed. For example, if someone develops an AI app that detects skin cancer from photographs, this should also be testable in the future.

In the focus group, you work with different countries. How do your colleagues from the USA, for example, approach the topic of testing and certification?

We are very happy that we can learn from other countries. The FDA has come a long way and thought about these issues early on. It has also looked at the issue of updates. You have a certified system, but what happens when you update? Does everything have to be recertified? Or how can you cover any updates, and therefore the entire lifecycle, with just one certification? That goes one step further; after all, you don't want to re-certify. Updates and further learning are important for AI models. We are very pleased that our colleagues are part of this and have clearly recognized the relevance of the topic.

A directive to regulate AI is also currently being discussed within the EU. What are the essential questions?

It is about finding rules that govern the use of AI and that restrict it in some areas or make it transparent, for example when a user talks to a chatbot. There are also attempts to regulate certain manipulations. Biometric facial recognition is being discussed in particular, and privacy aspects are also important. Who owns the data? It's about the potential for manipulation with AI on the one hand, but also about discrimination and explainability of AI. These questions have been under discussion for a while. The data protection regulation talks about a "right to explicability." I supported that from the start; our methods fit well there. But what does that mean legally for this or that application? That has to be worked out specifically for individual applications. The first steps have been taken at a high level, but the industries and legislators still have to find out what this means in practice.

You also provide an impetus for this with a white paper on the testing and certification of AI applications called "Towards Auditable AI Systems", which you recently published with TÜV and the German Federal Office for Information Security (BSI). What exactly is that all about?

I am very happy about the cooperation with VdTÜV and BSI. Both are experts in setting standards and know how to do certification and testing in traditional industry. We have joined forces to cover the state of the art: In layman's terms, how can you imagine the lifecycle of an AI? What is the state of the art? The whitepaper grew out of a workshop to which we invited leading international experts on these topics, for example from MIT. It was important for us to identify open questions and try to give recommendations and directions for further work. This October, a follow-up event is planned that will focus even more on applications in practice. Industry representatives should report concretely how to enforce quality standards in certain areas and what the best practices are. That's the next step: from the state of the art down to the application level, but also to the problems.

Speaking of the future: What else do you have planned?

I've already touched on one thing briefly. A big issue is how to use explanation to make AI models better. Not just explainability that explains, but explainability that improves the model. Explainability 2.0, in principle.

Other than that, I'm happy to be in Berlin. The environment here is extremely good. The critical mass of excellent researchers and industry is here in Berlin. At BIFOLD (note: The Berlin Institute for the Foundations of Learning and Data), I work with colleagues on how to measure reliability. Also within our institute and with other actors, I would like to further develop research on explainability and reliability.