Introducing a context-based framework to comprehensively assess the social and ethical risks of AI systems
Generative AI systems are already being used to write books, create graphic designs, and assist healthcare workers, and they are becoming increasingly capable. Developing and deploying these systems responsibly requires careful assessment of the potential ethical and social risks they may pose.
In a new paper, we propose a three-tiered framework for assessing the social and ethical risks of AI systems. This framework includes an assessment of AI system capabilities, human interactions, and systemic impacts.
It also maps the current state of safety assessments and finds three key gaps: context, risk-specific coverage, and multimodality. To address these gaps, we call for repurposing existing evaluation methods for generative AI and for a comprehensive approach to evaluation, as in our case study on misinformation. This approach combines findings, such as how likely an AI system is to provide factually incorrect information, with insights into how and in what context people use that system. Multi-layered assessments allow us to draw conclusions beyond model capability and to indicate whether harm, in this case misinformation, actually occurs and spreads.
For any technology to work as intended, both social and technical challenges must be addressed. These different layers of context must therefore be taken into account to better assess AI system safety. Here, we build on previous research identifying potential risks of large language models, such as privacy leaks, job automation, and misinformation, and introduce a way to comprehensively assess these risks going forward.
Context is critical for assessing AI risks
An AI system's capabilities are an important indicator of the broad types of risk it may pose. For example, a system that is more likely to produce factually inaccurate or misleading outputs carries a higher risk of misinformation, which can lead to problems such as an erosion of public trust.
Measuring these capabilities is central to AI safety assessments, but these assessments alone cannot ensure that AI systems are safe. Whether downstream harm occurs, for example whether people come to hold false beliefs based on inaccurate model output, depends on context. More specifically: who uses the AI system, and for what purpose? Does the system function as intended? Does it create unexpected externalities? All of these questions inform the overall assessment of an AI system's safety.
Extending beyond capability, we propose assessment at two additional points where downstream risks emerge: human interaction at the point of use, and systemic impact as an AI system is embedded in broader systems and widely deployed. Integrating assessments of a given risk of harm across these layers provides a comprehensive evaluation of an AI system's safety.
- Human interaction evaluation centers on the experience of people using an AI system. How do people use the system? Does it perform as intended at the point of use? How do experiences differ between demographics and user groups? Can we observe unexpected side effects from using the technology or from being exposed to its outputs?
- Systemic impact evaluation focuses on the broader structures into which an AI system is embedded, such as social institutions, labor markets, and the natural environment. Evaluation at this layer can reveal risks of harm that only become visible once an AI system is adopted at scale.
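To make the three layers concrete, here is a minimal sketch, in Python, of how findings for a single risk area such as misinformation might be recorded per layer and combined into one report. The class names, fields, and numbers are illustrative assumptions for this post, not part of the framework itself.

```python
from dataclasses import dataclass, field
from enum import Enum

class Layer(Enum):
    CAPABILITY = "capability"                # what the model can produce in isolation
    HUMAN_INTERACTION = "human_interaction"  # what happens at the point of use
    SYSTEMIC_IMPACT = "systemic_impact"      # effects once deployed at scale

@dataclass
class Finding:
    layer: Layer
    metric: str        # e.g. "factual error rate" or "share rate of false claims"
    value: float
    notes: str = ""

@dataclass
class RiskReport:
    risk_area: str                           # e.g. "misinformation"
    findings: list[Finding] = field(default_factory=list)

    def by_layer(self, layer: Layer) -> list[Finding]:
        """Return the findings recorded at one evaluation layer."""
        return [f for f in self.findings if f.layer == layer]

    def covered_layers(self) -> set[Layer]:
        """Layers with any evidence at all; missing layers are evaluation gaps."""
        return {f.layer for f in self.findings}

# Illustrative use; the numbers are made-up placeholders, not measurements.
report = RiskReport("misinformation")
report.findings.append(Finding(Layer.CAPABILITY, "factual error rate", 0.12))
report.findings.append(Finding(Layer.HUMAN_INTERACTION, "users forming false beliefs", 0.04))
print(report.covered_layers())  # systemic impact is missing, so that layer remains a gap
```

A report structured this way makes explicit which layers have evidence for a given harm and which still need evaluation.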
Safety assessment is a shared responsibility
AI developers must ensure that their technology is developed and released responsibly. Public actors, such as governments, are tasked with maintaining public safety. As generative AI systems become increasingly widely used and deployed, ensuring their safety is a shared responsibility between multiple actors.
- AI developers are well placed to investigate the capabilities of the systems they produce.
- Application developers and designated public authorities are positioned to evaluate the functionality of different features and applications, and their possible externalities for different user groups.
- Broader public stakeholders are uniquely positioned to forecast and assess the societal, economic, and environmental implications of new technologies such as generative AI.
The three evaluation layers in our proposed framework are a matter of degree rather than a neat division. None of them is entirely the responsibility of a single actor, and the primary responsibility at each layer depends on who is best placed to carry out the assessment.
Gaps in current safety assessments for generative multimodal AI
Given the importance of this additional context for assessing the safety of AI systems, it is important to understand the availability of such assessments. To better understand the broader landscape, we made a wide-ranging effort to collate assessments that have been applied to generative AI systems, as comprehensively as possible.
By mapping the current state of safety assessments for generative AI, we identified three key gaps:
- Context: most safety assessments consider generative AI system capabilities in isolation. Comparatively little work has been done to assess potential risk at the point of human interaction or at the level of systemic impact.
- Risk-specific assessments: capability assessments of generative AI systems are limited in the risk areas they cover. For many risk areas, few assessments exist, and where they do exist, they often operationalize harm in narrow ways. For example, representational harm is typically defined as the stereotypical association of occupations with different genders, leaving other instances of harm and other risk areas undetected.
- Multimodality: most existing safety assessments of generative AI systems focus on text output alone. A major gap remains in assessing risks of harm in image, audio, or video modalities. This gap widens as single models become multimodal, for example AI systems that can take images as input or produce output that interweaves audio, text, and video. Some text-based assessments may carry over to other modalities, but new modalities also introduce new ways in which risks can arise. For example, a description of an animal is not harmful in itself, but it becomes harmful if applied to an image of a person.
We are making a list of links to publications that detail safety assessments of generative AI systems openly accessible through this repository. If you would like to contribute, please add your evaluation by filling out this form.
Putting more comprehensive evaluations into practice
Generative AI systems are driving a wave of new applications and innovations. A rigorous and comprehensive assessment of AI system safety that considers how these systems might be used and embedded in society is urgently needed to understand and mitigate the potential risks of these systems.
A practical first step is to repurpose existing assessments and to use large models themselves for evaluation, although both have important limitations. More comprehensive assessment also requires developing approaches that evaluate AI systems at the point of human interaction and in terms of their systemic impacts. For example, while spreading misinformation through generative AI is a recent problem, we show that many existing methods for assessing public trust and credibility can be repurposed.
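As a rough sketch of what repurposing existing assessments at the capability layer could look like, the function below scores a system's answers against an existing fact-checked question set, with an interchangeable judge that could be a simple string match or a second model acting as grader. The function name, the dataset format, and the type aliases are hypothetical illustrations under these assumptions, not an interface from the paper.

```python
from typing import Callable

# Hypothetical interfaces: any text-in/text-out model client could play these roles.
GenerateFn = Callable[[str], str]          # question -> model answer
JudgeFn = Callable[[str, str, str], bool]  # (question, reference, answer) -> answer is correct?

def factual_error_rate(dataset: list[dict], generate: GenerateFn, judge: JudgeFn) -> float:
    """Capability-layer metric: fraction of answers the judge marks as incorrect.

    `dataset` items are assumed to look like {"question": ..., "reference": ...},
    e.g. repurposed fact-checking or QA benchmarks. The result estimates how likely
    the system is to output factually incorrect information; on its own it says
    nothing about whether false beliefs form or spread, which the human-interaction
    and systemic-impact layers are meant to capture.
    """
    if not dataset:
        return 0.0
    errors = sum(
        0 if judge(item["question"], item["reference"], generate(item["question"])) else 1
        for item in dataset
    )
    return errors / len(dataset)
```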
Ensuring the safety of widely used generative AI systems is a shared responsibility and a shared priority. AI developers, public actors, and other parties must work together to build a thriving, robust evaluation ecosystem for safe AI systems.