
It might be time for IT to consider AI models that don’t steal

by Samantha Rowland
3 minute read

Integrating Artificial Intelligence (AI) models into IT systems has become common practice for many enterprises. But as organizations push deeper into generative AI initiatives, concerns about the legal exposure these models create are surfacing. Large language models (LLMs) undergo extensive data refinement before deployment, yet the major model makers' opacity about their training data poses significant risks.

Leading vendors such as OpenAI, Google, AWS, Anthropic, Meta, and Microsoft do not disclose crucial details about their training data: its age, its reliability, or whether it complies with privacy and copyright law. That lack of transparency raises questions about intellectual property infringement and the likelihood of unknowingly relying on stolen data. If a model was trained on unauthorized material, enterprises using it could face legal repercussions even if they had no knowledge of the data's illicit origins.

Derivative risks compound the uncertainty. Suppose valuable intellectual property, such as a novel energy-extraction technique, is unlawfully incorporated into an AI model and a corporation later commercializes what the model produces. The original inventor might have grounds to claim a share of the profits, leading to complex legal disputes and financial liability for everyone involved.

Amid these legal ambiguities, AI models that prioritize ethical data sourcing are gaining traction. Initiatives such as Common Pile, Pleias, and Fairly Trained offer alternatives that limit model training to legally permissible data, such as openly licensed or public-domain content. These models may lag commercial offerings on performance, but they give enterprises a safer option for mitigating the legal risks of AI deployments.

Major AI model vendors, for their part, have begun offering indemnification to address copyright-infringement concerns. By committing to cover legal costs if model-generated content draws a lawsuit, vendors such as IBM and Anthropic aim to lift some of the liability burden from their customers. Indemnification offers a degree of protection, but legal experts still debate how broad the coverage really is and how enforceable such policies will prove.

Against this shifting legal landscape, IT professionals and decision-makers face the difficult task of balancing innovation with compliance. Retail giants such as Macy's, which build on top-tier commercial models, acknowledge the legal complexity of AI deployments but judge that the benefits outweigh the risks. Shifting to models with sanitized training data amounts to a strategic risk transfer from the enterprise to the model vendor, which assumes responsibility for the legality and integrity of the data behind its models.

As the debate between ethically sourced models and indemnification-backed commercial models continues, enterprises must weigh legal uncertainty against the performance each option delivers. The allure of cutting-edge AI capabilities is undeniable, but so is the imperative to navigate the legal implications of how training data was obtained. Where compliance and innovation intersect, the choice between ethically sourced models and indemnified commercial ones is becoming a pivotal decision for IT stakeholders.
