As artificial intelligence (AI) continues to advance, the clash between copyright law and AI has led many jurisdictions to reconsider how to enforce existing copyright laws, both when AI models infringe on protected works and when an AI model’s use of copyrighted content is exempt from liability. Balancing robust copyright protections with reliable guidance for AI innovation is crucial. In this second installment of our “Copyright and AI Basics” blog series, we explore the national initiatives in the United States and European Union aimed at addressing the complex issues at the intersection of copyright and AI. Check out our first blog in the series, “Copyright and AI Basics: A Software Perspective.”

Background

For years, software developers have been using AI to streamline development processes and enhance their technologies. However, the rise of new generative AI (GAI) platforms—particularly those with public-facing applications—has introduced new copyright challenges. These challenges primarily stem from the way GAI models interact with copyrighted content during both the training and output stages.

As courts around the world continue to examine the extent of copyright liability for GAI providers and users, key points of potential copyright infringement have emerged. Some countries offer text and data mining (TDM) exceptions in their copyright laws, while others rely on legal tests to determine when the copying of a work constitutes infringement. In the United States, this test is called “fair use” and is applied to safeguard the public’s constitutional right to freedom of expression. Many other countries have “fair use” or “fair dealing” tests that continue to adapt to the unique challenges of today’s AI landscape.

The Push for Global Transparency

For ACT | The App Association members, the evolution of AI and the protections afforded by copyright laws are both important pieces of the innovation lifecycle. However, when these two critical components of software development come into conflict, our members turn to policymakers for clear guidance. A significant issue hindering courts and innovators from accurately assessing an AI model’s potential for infringement is the lack of transparency around AI training datasets, coupled with limited public awareness of potentially infringing or AI-generated outputs.

As the legal and policy landscape develops, some jurisdictions are proposing transparency requirements for AI providers to give their users and copyright holders the visibility needed to continue developing incredible technologies. Most prominently, the US and EU have initiated national efforts to address transparency concerns at the training stage. While these efforts do not (yet) directly address how an AI model will provide the public with more transparency on the origins of its outputs, improving transparency at the training stage is a step toward greater clarity across the AI value chain.

To prevent AI models from infringing on copyrighted works, some proposals in the US would require AI model providers to disclose the data on which their models are trained. The United States Patent and Trademark Office (USPTO) and the United States Copyright Office (USCO) have also conducted consultations (find our comments here, here, and here, respectively), ex parte meetings, and stakeholder listening sessions to gather feedback to inform the recommendations called for by the Biden Administration’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Those recommendations are expected to include ways to promote transparency in AI model training processes.
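To make training-data disclosure more concrete, below is a minimal sketch of how a provider might publish a machine-readable manifest of its training sources. The `DatasetDisclosure` fields are illustrative assumptions on our part; no US proposal currently prescribes a specific schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record describing one dataset used in model training.
# Field names are illustrative; no US proposal prescribes this schema.
@dataclass
class DatasetDisclosure:
    name: str          # human-readable dataset name
    source_url: str    # where the data was obtained
    license: str       # license or other legal basis for use
    record_count: int  # approximate number of items used

def build_disclosure_manifest(datasets: list[DatasetDisclosure]) -> str:
    """Serialize training-data disclosures to JSON for publication."""
    return json.dumps([asdict(d) for d in datasets], indent=2)

print(build_disclosure_manifest([
    DatasetDisclosure(
        name="example-public-domain-books",
        source_url="https://example.com/corpus",
        license="public-domain",
        record_count=12_000,
    ),
]))
```

A published manifest along these lines would let copyright holders check whether their works were swept into a training set without requiring providers to release the underlying data itself.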

The EU’s recently enacted AI Act imposes minimum transparency requirements for all High-Risk AI Systems (HRAI), general-purpose AI (GPAI) models, and AI systems generally, with obligations that vary depending on the technology’s classification and the level of risk it poses. While the App Association supports transparency obligations that foster reliable innovation, requiring an AI model provider to disclose all of the data it trained on, without considering potential harm, could place an undue burden on independent and small business developers. The EU’s tiered approach to evaluating risk offers a better (though still imperfect) solution, as it aligns compliance obligations with the level of risk and targets preventable harms. However, we are concerned that its broad definitions and the cost of compliance could present significant challenges for small and medium-sized enterprises (SMEs) and startups.

Current and Proposed Copyright Carve-Outs

  • European Union: The EU AI Act references existing EU copyright law to ensure compliance. In relation to the use and deployment of AI models, the law allows the use of lawfully accessed copyrighted works in text and data mining (TDM) practices under two exceptions: 1) the mining is done by research organizations or cultural heritage institutions for scientific research; or 2) the rights holder has not expressly prohibited the use of their copyright-protected work in TDM practices (known as an “opt-out”; see the opt-out sketch after this list). What the EU AI Act does not cover, and what EU copyright law leaves to the Member States, are the parameters around copyrightability, authorship, and ownership of an AI-assisted or AI-generated work.
  • United States: The United States is also contemplating carve-outs to copyright law through the recently introduced Content Origin Protection and Integrity from Edited and Deepfaked Media (COPIED) Act. This proposed legislation aims to enhance transparency around AI-generated and AI-assisted content by developing guidelines and standards for content provenance information, watermarking, and synthetic content detection, housed at the National Institute of Standards and Technology (NIST). The bill also mirrors the EU’s “opt-out” regime by allowing content owners to use “content provenance information” to protect their works from being used to train AI models (see the provenance sketch after this list). Importantly, the COPIED Act prohibits tampering with content provenance information, with enforcement handled by the Federal Trade Commission (FTC) or the content owners themselves. While the bill is a compelling step toward transparency, without carve-outs for the fair use doctrine it raises significant concerns about potentially undermining US copyright law and the First Amendment.
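The EU opt-out is the piece developers can act on today, since it turns a legal reservation into a technical check. Below is a minimal sketch, in Python, of a TDM crawler honoring a robots.txt-style reservation before collecting a page. One hedge up front: EU law does not mandate robots.txt or any specific machine-readable format, and the bot name and URLs here are illustrative assumptions.

```python
from urllib import robotparser

def may_mine(page_url: str, robots_url: str,
             agent: str = "ExampleTDMBot") -> bool:
    """Return True if no robots.txt-style reservation blocks TDM access."""
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # fetch and parse the site's robots.txt
    return rp.can_fetch(agent, page_url)

url = "https://example.com/articles/1"
if may_mine(url, "https://example.com/robots.txt"):
    print("No reservation found; page may be collected for TDM.")
else:
    print("Rights holder opted out; skip this page.")
```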
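On the US side, the COPIED Act leaves the actual provenance and watermarking standards to NIST, so no code can yet claim to implement them. As a rough illustration of the underlying idea of tamper-evident content provenance information, the sketch below binds a content hash and an AI-generated flag into a manifest signed with an HMAC; the field names and signing scheme are assumptions for demonstration only.

```python
import hashlib
import hmac
import json

# Demo key only; a real provenance scheme would use managed credentials.
SIGNING_KEY = b"demo-key-not-for-production"

def attach_provenance(content: bytes, creator: str, ai_generated: bool) -> dict:
    """Build a signed manifest binding provenance claims to the content."""
    manifest = {
        "creator": creator,
        "ai_generated": ai_generated,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return manifest

def verify_provenance(content: bytes, manifest: dict) -> bool:
    """Detect tampering with either the content or its provenance claims."""
    claims = dict(manifest)
    signature = claims.pop("signature")
    payload = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return (hmac.compare_digest(signature, expected)
            and claims["content_sha256"] == hashlib.sha256(content).hexdigest())
```

Because any edit to the content or the manifest changes the hash or invalidates the signature, tampering of the kind the COPIED Act prohibits becomes detectable.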

Addressing Liability for Infringing AI Outputs

Platform users are increasingly concerned about their liability when using AI to support creative and inventive processes. If a user unknowingly incorporates AI-generated content that infringes on a copyrighted work, they may still be liable for copyright infringement. To address this issue, many jurisdictions are considering transparency measures that would require AI model providers to clearly label, notify, or disclaim when outputs are AI-generated or AI-assisted.
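What such labeling might look like in practice is still an open question. As one hedged sketch, a provider could attach an explicit disclosure field to every generated response so downstream applications can surface it to users; the `LabeledOutput` shape and disclosure text below are illustrative assumptions, not a mandated format.

```python
from dataclasses import dataclass

@dataclass
class LabeledOutput:
    text: str           # the generated content itself
    ai_generated: bool  # machine-readable disclosure flag
    disclosure: str     # human-readable notice for end users

def label_output(text: str) -> LabeledOutput:
    """Wrap a model response with an explicit AI-generation disclosure."""
    return LabeledOutput(
        text=text,
        ai_generated=True,
        disclosure="This content was generated with the assistance of AI.",
    )

result = label_output("A draft paragraph produced by a model.")
print(result.disclosure)
```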

As these transparency solutions are being discussed, it’s crucial to consider the impact on users and deployers of open-source software. While copyright law generally provides an author the exclusive rights to reproduce, distribute, or otherwise exploit their work, open-source software is shared with the public under specific licensing terms, and violating those terms constitutes copyright infringement. While AI models certainly have the capability of infringing on open-source licenses, how to prove such infringement and what liability will be assigned to the AI model provider remain to be seen.
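As a rough illustration of how a model provider might respect open-source license terms at the training stage, the sketch below screens source files for SPDX license identifiers and admits only an allow-listed set into a training corpus. The allow-list and the assumption that an identifier appears in a file’s first 20 lines are simplifications for demonstration, not legal advice.

```python
import re
from pathlib import Path

SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:\s*([\w.\-+]+)")
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}  # illustrative policy choice

def license_of(path: Path) -> str | None:
    """Return the file's SPDX identifier, if one appears near the top."""
    for line in path.read_text(errors="ignore").splitlines()[:20]:
        match = SPDX_PATTERN.search(line)
        if match:
            return match.group(1)
    return None  # many files carry no identifier at all

def admissible_for_training(path: Path) -> bool:
    """Admit a file only when its declared license is on the allow-list."""
    lic = license_of(path)
    return lic is not None and lic in ALLOWED
```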

Conclusion

The world is continuing to learn more about the intersection and interdependencies of AI, IP, and an increasingly advanced society. Calls for transparency are clear, and while no perfect solution exists, thoughtful, flexible, and intentional interventions are possible. The App Association urges jurisdictions to focus on actions that respond to actual, well-understood harms rather than hypothetical use cases. Our members need well-defined guidance to develop best practices that support their development efforts and protect existing rights.