
The US AI Safety Institute Signs Research Agreements with Anthropic and OpenAI

Agreements have the potential to influence safety improvements in AI systems

By Dr. Christina Catenacci

Sept 13, 2024

Key Points:


  • The Safety Institute has signed research agreements with Anthropic and OpenAI

  • The Safety Institute will receive access to major new models from each company prior to and following their public release

  • The Safety Institute will provide feedback to and collaborate with the companies


The US AI Safety Institute (Safety Institute) has recently signed research agreements with Anthropic and OpenAI. This article describes the details as set out in the Safety Institute’s recent press release. 


What is the Safety Institute? 


The Safety Institute, located within the Department of Commerce at the National Institute of Standards and Technology (NIST), was established following the Biden-Harris administration’s 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence to advance the science of AI safety and address the risks posed by advanced AI systems. It is focused on developing the testing, evaluations, and guidelines that will help accelerate safe AI innovation in the United States and around the world.  


The Safety Institute recognizes the potential of artificial intelligence but also acknowledges that the technology poses significant present and future harms.  


Additionally, the Safety Institute is dedicated to advancing research and measurement science for AI safety, conducting safety evaluations of models and systems, and developing guidelines for evaluations and risk mitigations, including content authentication and the detection of synthetic content. 


Two hundred and seventy days following President Biden’s Executive Order on AI, the Safety Institute issued draft guidance to help AI developers evaluate and mitigate risks stemming from generative AI and dual-use foundation models. NIST released three final guidance documents, first issued in April for public comment, as well as a draft guidance document from the Safety Institute intended to help mitigate those risks. NIST is also releasing a software package designed to measure how adversarial attacks can degrade the performance of an AI system.  


The goal is for the following guidance documents and testing platform to inform software creators about these risks and to help them develop ways to mitigate them while supporting innovation: 


  • Preventing Misuse of Dual-Use Foundation Models 

  • Testing How AI System Models Respond to Attacks 

  • Mitigating the Risks of Generative AI 

  • Reducing Threats to the Data Used to Train AI Systems 

  • Global Engagement on AI Standards  


One guidance document, Managing Misuse Risk for Dual-Use Foundation Models, addresses the key challenges in mapping and measuring misuse risks. It then discusses several objectives: anticipate potential misuse risk; establish plans for managing misuse risk; manage the risk of model theft; measure the risk of misuse; ensure that misuse risk is managed before deploying foundation models; collect and respond to information about misuse after deployment; and provide appropriate transparency about misuse risk.  


What do the Agreements Require? 


In its press release, the Safety Institute announced collaboration efforts on AI safety research, testing, and evaluation with Anthropic and OpenAI. 


Each company’s Memorandum of Understanding establishes the framework for the Safety Institute to receive access to major new models from that company prior to and following their public release. The agreements will enable collaborative research on how to evaluate capabilities and safety risks, as well as methods to mitigate those risks.  

Elizabeth Kelly, director of the U.S. AI Safety Institute, stated: 


“Safety is essential to fueling breakthrough technological innovation. With these agreements in place, we look forward to beginning our technical collaborations with Anthropic and OpenAI to advance the science of AI safety…These agreements are just the start, but they are an important milestone as we work to help responsibly steward the future of AI.” 


It will be interesting to see what comes of these collaborations. More specifically, time will tell whether the Safety Institute actually provides meaningful feedback to Anthropic and OpenAI on potential safety improvements to their models, and whether the companies attempt to incorporate the Safety Institute’s feedback to improve safety and better protect consumers. 
