How To Keep Your AI-Generated Code Secure


Many software developers use AI coding assistants like ChatGPT and GitHub Copilot to significantly speed up their development process, relying on them for everything from writing code and getting coding suggestions to detecting bugs. 

This trend presents an intriguing challenge: while AI-powered coding assistants can save time and allow developers to focus on creative, high-level tasks, an overreliance on AI will inevitably result in code quality and security issues. 

Notwithstanding the risks, the reality is that AI-assisted coding is here to stay. Besides, AI-generated code is probably no more buggy or vulnerable than the code many developers produce (particularly less-experienced ones). So, regardless of who (or what) writes your code, checks should be implemented to ensure it meets quality and security standards.

So, what should your organization do to benefit from AI-generated code while avoiding the security and quality risks? Let’s explore how to keep AI-generated code secure.

Common Ways AI Can Introduce Insecure Code 

Generative AI tools often learn to generate new code by analyzing vast amounts of existing code, including open-source repositories and legacy systems. This reliance on pre-existing code means that any vulnerabilities or security flaws present in the original can easily be replicated and amplified in the new code generated by AI.

Here are four errors to keep an eye out for in AI-generated code.

Inadequate Security Practices

When AI tools suggest or generate code, they may not adhere to best practices for security. For instance, they might recommend hardcoding sensitive information directly into the code, such as API keys or database credentials. 

As an example, we asked ChatGPT to generate the code for a Node.js server app with a MongoDB connection and a few routes, but it hardcoded the secret key instead of storing it as an environment variable (it also didn’t require “dotenv” in the imports).


// app.js

const express = require('express');
const bcrypt = require('bcrypt');
const session = require('express-session');
const MongoStore = require('connect-mongo');
const mongoose = require('mongoose');
const { body, validationResult } = require('express-validator');
const app = express();

// Connect to MongoDB
mongoose.connect('mongodb://localhost:27017/simpleapp', {
    useNewUrlParser: true,
    useUnifiedTopology: true
});

// User model
const UserSchema = new mongoose.Schema({
    username: { type: String, unique: true },
    password: String
});
const User = mongoose.model('User', UserSchema);

// Middleware
app.use(express.urlencoded({ extended: false }));
app.use(session({
    secret: 'supersecretkey', // hardcoded secret key
    resave: false,
    saveUninitialized: false,
    store: MongoStore.create({ mongoUrl: 'mongodb://localhost:27017/simpleapp' })
}));

// Routes go here

This practice increases the risk of exposure, especially in collaborative environments where code is shared or stored in version control systems.

Perhaps we could’ve prevented the omission by being more specific with our prompt, but the point is that AI is not infallible, so you must take the right steps to secure your code.
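A safer pattern is to read the secret from the environment and fail fast when it is missing. Here is a minimal sketch in plain Node.js; the SESSION_SECRET variable name and the helper function are illustrative, not part of the generated app:

```javascript
// A minimal sketch: read the session secret from the environment instead of
// hardcoding it. The SESSION_SECRET name is illustrative.
function getSessionSecret(env = process.env) {
    const secret = env.SESSION_SECRET;
    if (!secret) {
        // Fail fast at startup rather than running with a hardcoded fallback
        throw new Error('SESSION_SECRET environment variable is not set');
    }
    return secret;
}

// The express-session middleware from the example above would then use:
// app.use(session({
//     secret: getSessionSecret(),
//     resave: false,
//     saveUninitialized: false
// }));
```

With the “dotenv” package, a local .env file can populate process.env during development, while production values come from the deployment environment rather than version control.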

No Input Validation

A common source of vulnerabilities is the failure to validate user input properly. AI might generate code that accepts input without implementing adequate checks, leading to issues like SQL injection, cross-site scripting (XSS), or buffer overflows. 


const express = require('express');
const app = express();
const PORT = 3000;

// Middleware to parse JSON bodies
app.use(express.json());

// Endpoint that accepts user input without validation
app.post('/submit', (req, res) => {
    const { username, password } = req.body;

    // No input validation here
    // Potentially insecure behavior
    console.log(`Received username: ${username} and password: ${password}`);

    // Simulating a database query
    if (username === 'admin' && password === 'admin123') {
        res.status(200).send('Welcome Admin!');
    } else {
        res.status(401).send('Unauthorized');
    }
});

If an application accepts user data without sanitization or validation, an attacker could input harmful SQL commands instead of expected data, potentially gaining unauthorized access to sensitive information or taking control of the database. Similarly, without proper validation, user-supplied scripts could be executed in a web context, compromising user sessions or stealing data. 
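A simple guard is to validate the shape of every field before using it. The sketch below uses plain JavaScript; the field names match the route above, but the specific rules (an allowlist pattern for usernames, a minimum password length) are illustrative assumptions:

```javascript
// A minimal sketch of input validation. The allowlist pattern and length
// limits are illustrative; real rules depend on your application.
function validateCredentials(body) {
    const errors = [];
    const { username, password } = body || {};

    // Reject anything outside a strict allowlist instead of trying to
    // blocklist dangerous characters
    if (typeof username !== 'string' || !/^[a-zA-Z0-9_]{3,30}$/.test(username)) {
        errors.push('username must be 3-30 alphanumeric characters or underscores');
    }
    if (typeof password !== 'string' || password.length < 8) {
        errors.push('password must be at least 8 characters');
    }
    return errors;
}

// In the Express route, reject invalid input before it reaches any query:
// app.post('/submit', (req, res) => {
//     const errors = validateCredentials(req.body);
//     if (errors.length > 0) return res.status(400).json({ errors });
//     // ... safe to proceed
// });
```

Note that the first AI-generated example above even imported express-validator without ever using it; a library like that can enforce the same kind of rules declaratively.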

Quality Issues

AI-generated code can sometimes deliver functional results but may not adhere to best practices for performance and efficiency. This can lead to issues such as unnecessary complexity, poor resource management, and suboptimal execution speed. 

For instance, consider a Node.js function that handles multiple requests to read files:


const fs = require('fs');

const readFiles = (fileNames) => {
    const results = [];
    fileNames.forEach(fileName => {
        const data = fs.readFileSync(fileName); // Synchronously reading each file
        results.push(data);
    });
    return results;
};

While this code works as intended, it reads files synchronously, blocking the event loop and potentially causing performance bottlenecks. A more efficient approach would be to read the files asynchronously, allowing for better resource utilization and improved responsiveness:


const fs = require('fs');

const readFiles = async (fileNames) => {
    const promises = fileNames.map(fileName => fs.promises.readFile(fileName)); // Asynchronously reading files
    return await Promise.all(promises);
};

In this example, the AI-generated code fulfills its purpose but lacks the performance optimization that a more experienced developer would likely implement. Recognizing and addressing such quality issues is crucial for maintaining efficient and scalable applications.

Using Compromised Dependencies

AI-generated code can sometimes rely on older versions of software dependencies, which can create security vulnerabilities. This happens because the AI learns from existing codebases that may use outdated libraries or frameworks, leading to the adoption of known vulnerabilities in those versions. 

For example, if an AI tool generates code that depends on an outdated version of a library with known security flaws, the resulting application may be at risk even if the code itself appears functional. Additionally, older dependencies often lack the latest security patches, exposing the application to threats already mitigated in more recent releases.

3 Steps For Securing Your AI-Generated Code

Vulnerabilities can sneak into your code in many subtle ways. For this reason, you must take a proactive approach to code security. The following practices are essential for identifying and addressing potential vulnerabilities early. 

1. Implement Rigorous Code Review Processes

The most effective and non-obtrusive way to secure all human-written or AI-generated code is to incorporate automated security checks into the development pipeline through stringent code reviewing. Doing so lets you catch vulnerabilities in the early stages of development and fix them before pushing to production.

Codacy offers static application security testing (SAST), dynamic application security testing (DAST), software composition analysis (SCA), penetration testing, and other tools to help you fix code-related issues and secure your code throughout the SDLC. Let’s explore how Codacy can help you secure AI-generated code.

Perform Real-time Analysis Within Your IDE 

Most of the software creation process happens in the IDE. It’s also the place where most vulnerabilities get introduced into the code. This makes it the perfect place to fix code-related issues, and a good security tool needs to allow developers to do just that without interrupting the development flow. 

Codacy’s IDE extensions enable real-time human and AI-generated code analysis within the Visual Studio Code editor and IntelliJ IDEA. Once installed, the plugin scans your code and suggests corrections in seconds, blending reliable code analysis functionality into your standard development workflow. 

Rather than disrupting developers during their work and forcing them to wait for external code testing, Codacy operates alongside them in real-time. It catches mistakes as they occur, offering immediate feedback and actionable insights. 

To scan a repository, add it to Codacy via your user dashboard. The plugin scans all open pull requests and categorizes issues by author, date, category, and severity. It supports over 40 programming languages, open-source tools, and frameworks. 

Turn On Status Checks

Status checks help you catch vulnerabilities in pull requests. They automatically identify issues in code style, security, error proneness, performance, unused code, and other categories, allowing developers to fix them before merging the new code to the main branch. 

Codacy checks your pull requests using your quality settings and sends a report to your Git provider showing whether the code changes are up to standards. If you enforced branch protection on your main branch (within your Git Provider), pull requests with issues are blocked from merging until they are resolved.

This step lets you catch vulnerabilities that may have slipped your IDE before pushing the code to production.

Deploy Other Security Tools

Beyond SAST and DAST, it’s essential to implement various other security practices, including SCA, pen testing, Infrastructure as Code (IaC) security, secrets detection, and Cloud Security Posture Management (CSPM, coming soon), all of which Codacy can help you with.

SCA ensures third-party components are secure. Penetration testing simulates attacks to reveal weaknesses, and IaC security automates checks for infrastructure configurations. 

CSPM monitors cloud environments for misconfigurations, and secrets detection protects sensitive information embedded in code. Together, these security tools create a robust defense against potential threats, ensuring the integrity and safety of AI-driven applications.

2. Continuous Monitoring And Auditing

As continuous deployment becomes ubiquitous, more and more organizations are ramping up their deployment frequency to delight their customers with new products and features. This rapid approach to deployment must go hand in hand with continuous auditing, which helps maintain code quality and security over time.

Auditing becomes even more important when changes don’t go through the usual ‘blocking’ PR review process. Rather than asking developers to track such changes manually, the more effective approach is to use automated tooling to conduct your audits.

Codacy’s Security and Risk Management Overview page offers a clear view of your organization’s security health. The centralized dashboard showcases security risks and compliance challenges, helping teams pinpoint areas needing improvement and take action to enhance their security posture.

The dashboard presents key metrics like the number of open findings categorized by severity, resolution history, an overview of the highest-risk repositories, and the most frequently detected security issues. Product leaders can leverage this data to swiftly assess their organization’s security status and track progress over time.

3. Training And Thorough Examination

When developers understand how AI coding tools function and the common pitfalls associated with their outputs, they can better evaluate and refine the code produced. Training can cover topics such as recognizing vulnerabilities that AI might inadvertently introduce, writing better prompts, research techniques, and intellectual property protection.

Beyond that, it’s essential to thoroughly evaluate AI coding assistant tools to ensure they align with your organization’s policies and standards. Consider how the vendor safeguards your intellectual property and how transparent they are regarding the data used to train their language model.

The benefits of AI coding assistants are enormous. So yes, you should embrace AI. But remember that these tools are imperfect, so always review AI-generated code. Codacy offers robust solutions that help identify vulnerabilities, ensure compliance, and streamline code reviews, making it easier to handle quality and security issues proactively.

Don’t leave your code’s safety to chance—sign up for Codacy today and take the first step towards a more secure coding environment.
