Codacy Just Teased Its New GPL License Scanner for AI Code


Key Outcomes:

  • AI coding assistants don't check licenses before generating code suggestions. That creates hidden GPL compliance risks that could force you to open-source your entire proprietary application.

  • Traditional Software Composition Analysis (SCA) tools only scan declared dependencies, so they miss AI-generated code that's structurally similar to GPL-licensed projects without directly importing them.

  • Codacy Guardrails addresses this by scanning AI-suggested code in real-time within your IDE. It flags GPL similarities before you commit and deploy potentially problematic code.

Imagine this: your AI assistant suggests a code snippet. It looks clean, runs great, and solves the problem. So you commit it and move on.

But a few months later, your legal team gets a letter. That snippet, it turns out, was GPL-licensed. And because you used it in proprietary software, you've triggered a compliance violation. Your options are now to either open-source your entire application or prepare for litigation.

Scenarios like this aren’t edge cases anymore. AI coding assistants, trained on code from sources like Stack Overflow, have become standard tools across engineering teams, and the risk of licensing violations, especially under restrictive copyleft licenses like GPL, has become very real.
At Codacy’s July showcase, a panel of legal, security, and developer experts came together to unpack what that risk looks like today, and what companies can actually do about it:

  • Kendrick Curtis, VP of Technology at Codacy, opened with a blunt assessment: most organizations have no idea what their AI agents are pulling into their repos.
  • Nuno Lima da Luz, Senior Associate at Cuatrecasas, explained how IP law and the concept of “moral rights” create even more exposure for companies operating in the EU.
  • James Berthoty, founder of Latio.tech, described the operational chaos that comes with trying to audit thousands of dependencies across hundreds of services.
  • Luís Ventura, Codacy’s Product Engineer, walked through the company’s newest feature: real-time GPL similarity detection inside the IDE, built to surface risks at the moment of code generation, not after the fact.

The takeaway was clear: AI might be changing how we write code, but it’s also changing what we’re legally responsible for. And without the right visibility, teams are walking straight into liability.

AI Doesn’t Understand Licensing, But the Law Still Applies

Developers often assume open source means free to use. That's not always the case.

Licenses like MIT and Apache are relatively permissive. You can use and modify the code with few restrictions. GPL, LGPL, and AGPL work differently. If you use GPL-licensed code in your application, the "copyleft" requirement may force you to release your entire codebase under the same license.

“With the copyleft licenses like GPL, if you use it, it can be understood as using a derivative work... it makes the entire work licensed under that GPL model.”

– Nuno Lima da Luz (Senior Associate at Cuatrecasas)

The difference matters because GPL creates what's called a "viral" effect. Use even a small portion of GPL code, and your entire application may need to be open-sourced.

This risk isn’t new, but the way code enters projects has changed. AI assistants pull from massive public datasets, many of which include GPL code. The models don’t check license terms before generating output. They just complete patterns. If the result overlaps with GPL-covered logic, you could be on the hook whether you realize it or not.

In the EU, the legal exposure is even broader. Copyright law there includes “moral rights,” like the right to attribution and the right to prevent distortion of a work. These rights stay with the original creator and can’t be waived, even if the code is reused unintentionally.

So while your team might not copy code directly, AI might generate something close enough. And in the eyes of the law, close enough can be enough.


What One Line of AI-Generated Code Can Cost You

GPL license violations have already landed in court with serious consequences for major companies.

One case involved BusyBox, a common set of Unix utilities used in embedded devices. Vizio included it in their smart TVs but didn't release the source code as required under GPLv2. The Software Freedom Conservancy, a nonprofit watchdog group that enforces open source licenses, filed suit against Vizio. The case is still ongoing and is currently scheduled for trial.

This didn't happen because a developer did something reckless. It occurred because no one saw the license risk until the product was already on the market.

James Berthoty described how difficult it is to catch this kind of risk once the code is live. Teams might be using thousands of packages across hundreds of services. By the time legal or compliance raises a concern, security is chasing down dependencies, engineers are mapping usage, and no one can say with certainty where the original logic came from or whether it should have triggered a license review.

At that point, the question isn't whether the code is allowed. It's whether anyone can prove where it came from.

Why Traditional License Checks Don’t Catch This

Most companies, including Codacy, rely on Software Composition Analysis (SCA) tools to track license risk. These tools work by identifying third-party packages and checking their declared licenses. But they don’t look at individual code snippets, compare structure, or scan what the AI just wrote in your editor.

That gap is why license violations often go unnoticed until a legal review or a lawsuit surfaces them. If the snippet came from an AI model, and the license isn't declared, the SCA tool won't catch it.

The timing problem compounds the issue. SCA tools typically run during CI/CD or scheduled scans, well after the developer has written and committed the code. By then, the AI-generated snippet is already integrated into the codebase, potentially spreading across multiple files and functions.

Even when SCA tools do run, they're looking for explicit package imports and dependencies. They can't detect when your AI assistant generates code that's structurally similar to GPL-licensed functions without directly importing them. The similarity-based license violations that AI creates fall completely outside traditional SCA detection methods.
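To make that gap concrete, here is a minimal Python sketch of the manifest-based model most SCA tools follow: resolve each declared dependency to its published license and flag the copyleft ones. The registry mapping and package names below are hypothetical sample data, not a real license database or any vendor's implementation.

```python
# Minimal sketch of a manifest-based license check (the model most SCA tools follow).
# The registry below is hypothetical sample data, not a real license database.

COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}

# Toy "registry": the declared license of each published package.
REGISTRY = {
    "left-pad": "MIT",
    "requests": "Apache-2.0",
    "readline": "GPL-3.0",
}

def audit_manifest(dependencies):
    """Flag declared dependencies whose registry license is copyleft."""
    findings = []
    for name in dependencies:
        license_id = REGISTRY.get(name, "UNKNOWN")
        if license_id in COPYLEFT:
            findings.append((name, license_id))
    return findings

# A declared GPL dependency is caught by the manifest check...
print(audit_manifest(["left-pad", "readline"]))  # [('readline', 'GPL-3.0')]

# ...but an AI-pasted snippet never appears in the manifest at all,
# so a check of this shape has nothing to inspect.
print(audit_manifest(["left-pad", "requests"]))  # []
```

The point of the sketch is the blind spot: a GPL-derived snippet pasted by an AI assistant creates no manifest entry, so it is invisible to any check that only walks declared dependencies.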

License scanning needs to happen at the point of code generation, not during post-hoc analysis. This requires tools that understand both the development workflow and the legal implications of code similarity, not just package dependencies.

Codacy’s Solution: Bringing License Awareness Into the IDE

Codacy's approach addresses this gap by scanning code as it's written, not after it's committed.

The license provenance scanning feature teased during the Showcase, and soon to be available with Codacy Guardrails, compares AI-suggested code against known GPL-licensed projects in real time.

The scanning happens at the source level, not through dependency analysis. There's no package file to examine because the code isn't imported as a dependency. Codacy analyzes the structure and logic patterns, detecting when AI-generated code resembles existing GPL implementations.
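Codacy hasn't published the details of its detection method, but the general idea of structure-level matching can be illustrated with a toy Python sketch: normalize two snippets into token streams with identifier names erased, then score how much the streams overlap. Everything here (the normalizer, the scoring function, the sample snippets) is an illustration of the concept, not Codacy's implementation.

```python
# Illustrative sketch of structure-level similarity scoring. This is a toy
# example of the general technique, not Codacy's actual detection method.
import difflib
import io
import tokenize

def normalize(source):
    """Reduce Python source to a token stream with identifier names erased,
    so that renamed variables still match structurally."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            tokens.append("ID")  # abstract away names and keywords alike
        elif tok.type in (tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
                          tokenize.DEDENT, tokenize.COMMENT):
            continue  # ignore layout and comments
        else:
            tokens.append(tok.string)
    return tokens

def similarity(a, b):
    """Fraction of matching tokens between two snippets (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Hypothetical "reference" snippet and an AI suggestion with renamed variables.
gpl_reference = "def clamp(value, low, high):\n    return max(low, min(value, high))\n"
ai_suggestion = "def bound(x, lo, hi):\n    return max(lo, min(x, hi))\n"

print(similarity(gpl_reference, ai_suggestion))  # 1.0: structurally identical
```

Even though every identifier was renamed, the two snippets normalize to the same token stream and score as identical, which is exactly the kind of overlap a dependency-based scan can never see.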

During our showcase, we demonstrated this with WordPress integration code. The AI assistant generated functional PHP code, but Codacy immediately flagged five warnings for similarity to GPL-licensed WordPress core functions.

This puts license checking alongside security scanning and code quality checks in the developer workflow. When the AI suggests code that matches GPL patterns, developers see the warning in their IDE before committing. They can review the suggestion, find alternative implementations, or manually flag it for legal review while the context is fresh.

The integration works with popular AI coding assistants like Cursor and Windsurf through Codacy's MCP server, as well as directly in VS Code and IntelliJ through IDE extensions. 

License awareness becomes part of the development process rather than a separate compliance step. If you’re writing code with help from AI, license scanning belongs in your editor. Codacy Guardrails makes that possible. Start for free.
