His death comes amidst growing controversy, as his final blog post alleges that OpenAI violated US copyright laws during the development of ChatGPT — a claim that could have global repercussions for the company.
Balaji’s detailed critique of OpenAI’s practices directly challenges the company’s reliance on “fair use” as a defence for training its generative AI models.
The blog, now under intense scrutiny, lays out how OpenAI’s approach fails all four legal criteria for fair use, potentially undermining its defence in the 13 copyright lawsuits it currently faces worldwide.
1. Commercial Purpose: Balaji argued that ChatGPT, as a profit-driven product, cannot qualify for fair use, as its development relies heavily on copyrighted material.
2. Nature of Work: Although this factor carries less legal weight, he pointed out that generative AI often disregards the protections attached to copyrighted works, much as the internet does.
3. Substantiality of Data Used: While AI models avoid replicating data outright, Balaji critiqued their selective borrowing, which skirts legal definitions of substantiality.
4. Market Harm: He emphasised that OpenAI and other companies have signed licensing agreements with platforms like Reddit, Stack Overflow, and The Associated Press, proving that a data licensing market exists. Using copyrighted data without such agreements not only bypasses compensation for rights holders but also creates market harm—a central argument in copyright law.
These claims could further complicate OpenAI’s global legal battles, which include lawsuits in the US, Europe, and India.
ANI vs OpenAI
In India, news agency ANI has accused OpenAI of copyright infringement.
On November 19, the Delhi HC heard the case. ANI argued that OpenAI is using publicly available data, including ANI’s news stories, to train ChatGPT. The news agency argued that the stories it publishes are protected under copyright law and are being used illegally by OpenAI.
The case is now likely to be bolstered by Balaji’s argument that bypassing licensing agreements undermines creators’ revenue streams.
“Suchir Balaji’s views on the copyright implications of generative AI are based on the four-factor ‘fair use’ test under the US Copyright Act of 1976. Indian courts have, in the past, relied on the US four-factor test but have ultimately adjudicated on the basis of the concept of ‘fair dealing’ or other specific exceptions to copyright infringement under Section 52,” said Ranjana Adhikari, Partner-TMT, IndusLaw.
She added that although one of the issues framed by the Delhi HC in its order dated 19 November 2024 mentions the term ‘fair use’, it specifically restricts the scope of the issue to Section 52. Balaji’s analysis may have vital persuasive value in the case (like any other foreign jurisprudence or expert view), but it may not be a determining factor in the adjudication.
“As rightly pointed out by Balaji, the applicability of exceptions to copyright infringement, regardless of jurisdiction, have to be considered on a case-to-case basis,” Adhikari further explained.
Balaji’s blog is already being referenced and could serve as key evidence in ongoing litigation across the globe. If courts find OpenAI’s training practices unlawful, the tech giant faces potential damages, stricter regulations, and disruptions to its generative AI operations.
In its arguments before the Delhi HC, OpenAI had pointed out that 13 similar cases are pending against the company globally, including in the US, Canada and Germany, and that none has so far resulted in a ruling against it.
The circumstances surrounding Balaji’s death remain unclear, but his critique has intensified scrutiny of OpenAI, placing it at the centre of a global debate over AI ethics, copyright, and the future of generative technology.