OpenAI Latest To Add 'Made By AI' Metadata To Model Work

Thursday February 08, 2024.

Images emitted by OpenAI's generative models will include metadata disclosing their origin, which in turn can be used by applications to alert people to the machine-made nature of that content.

Specifically, the Microsoft-championed super lab is, as expected, adopting the Content Credentials specification, which was devised by the Coalition for Content Provenance and Authenticity (C2PA), an industry body backed by Adobe, Arm, Microsoft, Intel, and more.

Content Credentials is pretty simple and specified in full here: it uses standard data formats to store within media files details about who made the material and how. This metadata isn't directly visible to the user and is cryptographically protected so that any unauthorized changes are obvious.

Applications that support this metadata, when they detect it in a file's contents, are expected to display a little "cr" logo over the content to indicate there is Content Credentials information present in that file. Clicking on that logo should open up a pop-up containing that information, including any disclosures that the stuff was made by AI.

How the C2PA 'cr' logo might appear on an OpenAI-generated image in a supporting app. Source: OpenAI

The idea being here that it should be immediately obvious to people viewing or editing stuff in supporting applications – from image editors to web browsers, ideally – whether or not the content on screen is AI made.

Helping people distinguish between real and faked images will no doubt become increasingly important as AI-generated content improves to the point at which it is not obviously the output of a machine. Experts fear that such synthetic media will mislead netizens and be used to sow misinformation or support disinformation campaigns. Tech companies are stepping up efforts to minimize risks as, for one thing, the 2024 US presidential election nears.

Yet the Content Credentials strategy isn't foolproof as we've previously reported. The metadata can be easily stripped out or exported without it, or the "cr" cropped out of screenshots, so no "cr" logo will appear on the material in future in any applications. It also relies on apps and services to support the specification, whether they are creating or displaying media.

To work at scale and gain adoption, it also needs some kind of cloud system that can be used to restore removed metadata, which Adobe happens to be pushing, as well as a marketing campaign to spread brand awareness. Increase its brandwidth, if you will.

OpenAI acknowledged as much this week when announcing this week ChatGPT and DALL-E 3 API will include C2PA's Content Credentials in their output.

"Metadata like C2PA is not a silver bullet to address issues of provenance," OpenAI admitted. "It can easily be removed either accidentally or intentionally. For example, most social media platforms today remove metadata from uploaded images, and actions like taking a screenshot can also remove it. Therefore, an image lacking this metadata may or may not have been generated with ChatGPT or our API."

In terms of file-size impact, OpenAI insisted that a 3.1MB PNG file generated by its DALL-E API grows by about three percent (or about 90KB) when including the metadata. A 287K WebP file, meanwhile, will jump by about 32 percent with that 90KB payload.

C2PA is gaining traction: as more apps and services pledge to support it, more software can use it, more people should see it, and so it grows. Meta announced it was building tools to support the industry group's specification this week.

"We believe that adopting these methods for establishing provenance and encouraging users to recognize these signals are key to increasing the trustworthiness of digital information," OpenAI declared.

We're told ChatGPT via the web and DALL-E 3 model via its API will now include the C2PA metadata, and mobile users will get it by February 12. ®

OpenAI Latest To Add 'Made By AI' Metadata To Model Work

From Chip War To Cloud War: The Next Frontier In Global Tech Competition

The High Stakes Of Tech Regulation: Security Risks And Market Dynamics

The Tyranny Of Instagram Interiors: Why It's Time To Break Free From Algorithm-Driven Aesthetics

The Data Crunch In AI: Strategies For Sustainability

Google Abandons Four-Year Effort To Remove Cookies From Chrome Browser

LinkedIn Embraces AI And Gamification To Drive User Engagement And Revenue

navigation

media coverage

follow us