"Recent advances in Artificial Intelligence based on systems that require vast amounts of data and computation, such as GPT-4, have highlighted how difficult it is to understand the capabilities and weaknesses of these AI systems. We need to find out where these systems are safe to use and how they could be improved. And this is due to the way AI is assessed today, which needs to change urgently."
Behind these words are 16 of the world's leading experts in Artificial Intelligence, including researchers from the VRAIN Institute of the Universitat Politècnica de València (UPV), José Hernández-Orallo, Fernando Martínez Plumed and Wout Schellaert.
Coordinated by Professor Hernández-Orallo, the 16 researchers have published a letter today in the journal Science calling for a "rethink" of how AI tools are evaluated: a move towards more transparent models whose effectiveness and actual capabilities, what they can and cannot do, are made known.
In their paper, the authors propose a roadmap under which the results of AI models are reported in a more nuanced way and case-by-case evaluation results are made publicly available.
As Hernández-Orallo explains, the performance of an AI model is usually measured with aggregated statistics. This poses a risk: while such statistics can give a picture of good overall performance, they can also hide low reliability or usefulness in specific, less-represented cases, "and yet it is implied that the model is equally valid in all cases when in fact it is not".
In the paper, the authors illustrate this with AI models that assist clinical diagnosis, pointing out that these systems may perform poorly when analysing people of a particular ethnicity or demographic group, because such cases make up only a small proportion of their training data.
"We are asking that whenever an AI result is published, it should be broken down as much as possible, so that its real usefulness is known and the analysis can be reproduced. In the article published in Science, we also discuss an AI facial recognition system that reported a 90% accuracy rate; it was later found that for white men the accuracy was 99.2%, but for black women it was only 65.5%. This is why the results claimed for an AI tool's usefulness are sometimes not entirely transparent and reliable. If they don't give you the detail, you think the models work very well, and that's not the reality. Not having that breakdown, with all the available information about the AI model, means that applying it could entail risks," says José Hernández-Orallo.
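The masking effect described above can be made concrete with a short sketch. The code below uses hypothetical counts chosen only to roughly mirror the breakdown cited in the article (99.2% vs. 65.5%); the `accuracy_by_group` helper is illustrative and is not taken from the paper itself.

```python
# Sketch (hypothetical data): how an aggregate accuracy figure can mask
# large per-group differences in an evaluation set.
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, correct) pairs.
    Returns (overall accuracy, dict of per-group accuracy)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [correct count, total count]
    for group, correct in records:
        totals[group][0] += int(correct)
        totals[group][1] += 1
    overall = sum(c for c, _ in totals.values()) / sum(t for _, t in totals.values())
    per_group = {g: c / t for g, (c, t) in totals.items()}
    return overall, per_group

# Hypothetical test set: 800 samples from a majority group (794 correct),
# 200 from a minority group (131 correct).
records = [("majority", i < 794) for i in range(800)] + \
          [("minority", i < 131) for i in range(200)]

overall, per_group = accuracy_by_group(records)
print(f"overall:  {overall:.1%}")                 # 92.5%
print(f"majority: {per_group['majority']:.1%}")   # 99.2%
print(f"minority: {per_group['minority']:.1%}")   # 65.5%
```

The aggregate 92.5% looks reassuring on its own; only the disaggregated report exposes the 33-point gap between groups, which is precisely the kind of breakdown the authors ask publishers of AI results to provide.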
The VRAIN UPV researcher stresses that the proposed changes can improve the understanding of AI and also reduce the "voracious" competition among AI labs to announce that their model improves on previous systems by a certain percentage.
"Some labs want to go from 93% to 95% no matter what, which works against AI's ultimate applicability and reliability. What we want, in short, is to contribute to a better shared understanding of how AI works and what the limitations of each model are, in order to guarantee the correct use of this technology," concludes Hernández-Orallo.
Along with researchers from the VRAIN Institute of the Universitat Politècnica de València, the article's authors include research staff from the University of Cambridge, Harvard University, the Massachusetts Institute of Technology (MIT), Stanford University, Google, Imperial College London, the University of Leeds, the Alan Turing Institute in London, DeepMind, the US National Institute of Standards and Technology (NIST), the Santa Fe Institute, Tongji University in Shanghai and Shandong University in Jinan.
Reference
Ryan Burnell et al., "Rethink reporting of evaluation results in AI". Science 380, 136-138 (2023). DOI: 10.1126/science.adf6369