Using AI to Normalize and Improve Scan Rule Documentation at Codacy

At Codacy, we integrate 34 open-source tools to provide insights into code quality and security across 43 different programming languages. Many of these tools come with plugins and add-ons, giving our users access to over 22,000 unique rules. These rules are frequently updated: new ones are added, and existing ones are improved or deprecated.

To help users navigate this ever-evolving landscape, Codacy imports and displays the official documentation for each tool’s rules. But this raises a critical question:

How can we consistently provide clear, up-to-date, and standardized documentation for thousands of rules coming from dozens of different sources?


The Challenge

There are several challenges:

  • Tools don't follow a common documentation format; some provide code examples, others don't.
  • The level of detail and tone can vary wildly between tools.
  • Plugins and add-ons often lack the same documentation standards as their parent linters.

And beyond just displaying documentation, Codacy also normalizes rules by assigning categories, severities, and tagging them as "recommended" or not. However, many tools don’t define these attributes themselves, or if they do, they might use different criteria than Codacy. This means our team had to invest significant manual effort to:

  • Curate and enrich documentation for thousands of rules,
  • Implement complex logic to import and standardize content,
  • Continuously adapt to tool updates and changes.

 

Using AI to Normalize Documentation

To reduce this manual workload and improve consistency, we built and trained an AI assistant to help us automatically normalize and enhance rule documentation.

Here are a few examples of the improvements we’ve achieved—before and after applying the AI assistant:

Analyzer Rule before improvements Rule after improvements

GoSec (Golang)

G101

Rule before improvements:

Look for hardcoded credentials

(No explanation)

Rule after improvements:

Avoid Hardcoded Credentials

Detects hardcoded credentials in the code.

Why is this a problem?

Hardcoded credentials can lead to security vulnerabilities as they may be exposed in source code repositories or logs, allowing unauthorized access to systems.

How do you fix it?

Remove hardcoded credentials from the code and use secure methods such as environment variables, configuration files with proper access controls, or secret management tools to handle sensitive information.

Pylint (Python)

not-in-loop (E0103)

Rule before improvements:

%r not properly in loop Used when break or continue keywords are used outside a loop.

Explanation

%r not properly in loop Used when break or continue keywords are used outside a loop.

Rule after improvements:

Avoid using break or continue outside loops

Detects usage of break or continue statements outside of loops.

Why is this a problem?

Using break or continue outside of a loop causes a syntax error and indicates a logical mistake in the code structure.

How do you fix it?

Ensure that break or continue statements are only used inside loops such as for or while loops.

Revive (Golang)

time-equal

Rule before improvements:

This rule warns when using == and != for equality check time.Time and suggest to time.time.Equal method, for about information follow this link

Explanation

Description: This rule warns when using == and != for equality check time.Time and suggest to time.time.Equal method, for about information follow this link

Configuration: N/A

Rule after improvements:

Enforce time.Time Equality Using Equal Method

Warns when using == or != operators to compare time.Time values and suggests using the time.Time.Equal method instead.

Why is this a problem?

Using == or != operators to compare time.Time values can lead to incorrect results because these operators compare the entire struct including unexported fields, which may not reflect actual time equality. The time.Time.Equal method correctly compares the time instant represented by the values.

How do you fix it?

Replace any usage of == or != operators for comparing time.Time values with the time.Time.Equal method to ensure correct equality checks.

Good Examples

t1.Equal(t2)

!t1.Equal(t2)

Bad Examples

t1 == t2

t1 != t2

Cppcheck (C/C++)

Array index 'i' is used before limits check.

Rule before improvements:

Defensive programming: The variable 'i' is used as an array index before it is checked that is within limits. This can mean that the array might be accessed out of bounds. Reorder conditions such as '(a[i] && i < 10)' to '(i < 10 && a[i])'. That way the array will not be accessed if the index is out of limits.

(No explanation)

Rule after improvements:

Avoid Using Array Index Before Limits Check

Detects when an array index variable is used before confirming it is within valid bounds, which can lead to out-of-bounds array access.

Why is this a problem?
Using an array index before verifying it is within the array limits can cause undefined behavior, including crashes or data corruption, due to out-of-bounds access.

How do you fix it?
Ensure that any array index is checked against the array limits before using it to access the array. For example, reorder conditions to check the index first before accessing the array element.

Good Examples
if (i < 10 && a[i]) { /* safe access */ }

Bad Examples
if (a[i] && i < 10) { /* unsafe access */ }

Semgrep (22 languages)

unsafe-path-combine

Rule before improvements:

String argument $A is used to read or write data from a file via Path.Combine without direct sanitization via Path.GetFileName.

Explanation

String argument $A is used to read or write data from a file via Path.Combine without direct sanitization via Path.GetFileName. If the path is user-supplied data this can lead to path traversal.

Rule after improvements:

Avoid Unsafe Path.Combine Usage Without Sanitization

Detects usage of Path.Combine with string arguments that are not sanitized via Path.GetFileName, which can lead to path traversal vulnerabilities.

Why is this a problem?

Using Path.Combine with unsanitized user-supplied input can allow attackers to perform path traversal attacks, potentially accessing or modifying unauthorized files on the filesystem.

How do you fix it?

Ensure that any string arguments used in Path.Combine are properly sanitized using Path.GetFileName or equivalent methods before combining paths to prevent path traversal.

Good Examples

var safeFileName = Path.GetFileName(userInput);
var fullPath = Path.Combine(basePath, safeFileName);

Bad Examples

var fullPath = Path.Combine(basePath, userInput);

Checkov (infrastructure-as-code)

Rule before improvements:

Ensure that SSL validation isn't disabled with dnf

Explanation

More information [here].

Rule after improvements:

Ensure SSL Validation Is Not Disabled with dnf

Checks that SSL validation is enabled when using dnf to avoid insecure package downloads.

Why is this a problem?
Disabling SSL validation can expose the system to man-in-the-middle attacks, allowing attackers to intercept or tamper with package downloads, leading to potential compromise or installation of malicious software.

How do you fix it?
Configure dnf to keep SSL validation enabled by not setting sslverify to false or any equivalent disabling option in the configuration or command usage.

Good Examples

- name: Install packages securely
  ansible.builtin.dnf:
    name: httpd
    state: present
    sslverify: yes

Bad Examples

- name: Install packages insecurely
  ansible.builtin.dnf:
    name: httpd
    state: present
    sslverify: no
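
Every improved description above follows the same outline: a title, a one-line summary, "Why is this a problem?", "How do you fix it?", and optional good and bad examples. As a minimal sketch of that outline, and not Codacy's actual implementation, the structure could be modeled like this in Python. The RuleDoc class, its field names, and the render method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RuleDoc:
    """Hypothetical container for one normalized rule description."""

    title: str
    summary: str
    why: str
    fix: str
    good_examples: list[str] = field(default_factory=list)
    bad_examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the sections in the same order as the examples above."""
        parts = [
            self.title,
            self.summary,
            "Why is this a problem?",
            self.why,
            "How do you fix it?",
            self.fix,
        ]
        # Example sections are optional: some rules above have none.
        if self.good_examples:
            parts += ["Good Examples", *self.good_examples]
        if self.bad_examples:
            parts += ["Bad Examples", *self.bad_examples]
        return "\n\n".join(parts)


doc = RuleDoc(
    title="Enforce time.Time Equality Using Equal Method",
    summary="Warns when == or != is used to compare time.Time values.",
    why="These operators compare the whole struct, not the time instant.",
    fix="Use the time.Time.Equal method instead.",
    good_examples=["t1.Equal(t2)"],
    bad_examples=["t1 == t2"],
)
print(doc.render())
```

Keeping the sections as separate fields, rather than one free-text blob, is what makes it possible to check that every rule has the same parts regardless of which tool it came from.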

Smarter Categorization and Tagging with AI

Beyond improving the documentation itself, our AI assistant also helped tackle one of the biggest pains in managing thousands of rules: consistent categorization and tagging.

In the past, assigning a category (e.g., "Code Style", "Security", "Performance"), a severity level (e.g., "Minor", "Medium", "Critical"), or identifying relevant topics (like "ReactJS" or "Accessibility") relied on a mix of hardcoded heuristics, keyword matching, and manual overrides. This approach was fragile, hard to scale, and often missed the nuance of what each rule was truly about.
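
To make the fragility concrete, here is a small Python sketch of what a keyword-matching heuristic can look like. It is an illustration only: the keyword table, the default category, and the guess_category function are assumptions made for this example, not the heuristics Codacy actually used.

```python
# Hypothetical keyword table; the real heuristics are not shown in this article.
KEYWORD_CATEGORIES = [
    ("security", "Security"),
    ("credential", "Security"),
    ("performance", "Performance"),
    ("style", "Code Style"),
]

# Assumed fallback when no keyword matches.
DEFAULT_CATEGORY = "Error Prone"


def guess_category(description: str) -> str:
    """Return the category of the first keyword found in the description."""
    text = description.lower()
    for keyword, category in KEYWORD_CATEGORIES:
        if keyword in text:
            return category
    return DEFAULT_CATEGORY


# A description that happens to contain a listed keyword is classified well.
print(guess_category("Look for hardcoded credentials"))  # Security

# A security rule phrased without any listed keyword falls through to the default.
print(guess_category("Ensure that SSL validation isn't disabled with dnf"))  # Error Prone
```

The second call shows the weakness: the rule is clearly about security, but no keyword in the table appears in its text, so it lands in the fallback category. Adding more keywords fixes one case at a time and never captures intent.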

Now, instead of relying on scripted rules to infer this metadata, we have an AI assistant that reads and understands each rule’s documentation, just like a human reviewer would. This means:

  • Categories are assigned based on actual rule intent and context, not just keywords.
  • Severities reflect the potential impact of the rule, taking into account language norms and best practices.
  • Tags are more precise, helping users filter and discover rules relevant to their goals or concerns.

We no longer have to "guess" a rule's purpose based on pattern matching. Instead, we get a much richer and more coherent catalogue, with metadata that actually reflects the diversity and complexity of the tools we integrate.

 

Analyzer Rule before improvements Rule after improvements

ESLint 9 (JavaScript / TypeScript)

Lodash: Consistent compose

Rule before improvements:

Lodash has two ways to compose functions: left to right (_.flow) or right to left (_.flowRight). Composing functions right to left has syntax that is in the same order as chaining, while composing function left to right is similar to actual function application. This rule enforces a consistent style.

Severity: Critical 🛑
Category: Error Prone 🐞

Rule after improvements:

Enforce Consistent Lodash Compose Method

Using inconsistent function composition methods can lead to confusion and reduce code readability, as the order of function application differs between left-to-right and right-to-left composition styles.

Choose one composition method (either left-to-right using flow/pipe or right-to-left using flowRight/compose) and consistently use it throughout the codebase.

Severity: Minor ℹ️
Category: Code Style 🎨
Tags: lodash

ESLint 9 (JavaScript / TypeScript)

Jsx a11y: Aria role

Rule before improvements:

Enforce that elements with ARIA roles must use a valid, non-abstract ARIA role.

Elements with ARIA roles must use a valid, non-abstract ARIA role. A reference to role definitions can be found at WAI-ARIA site.

Severity: Minor ℹ️
Category: Code Style 🎨

Rule after improvements:

Enforce Valid ARIA Roles in JSX

Using invalid or abstract ARIA roles can lead to accessibility issues because assistive technologies rely on correct ARIA roles to interpret and interact with UI elements properly. Invalid roles may confuse users relying on such technologies and reduce the accessibility of the application.

Use only valid, non-abstract ARIA roles as defined by the WAI-ARIA specification. Avoid empty or invalid role attributes. For custom roles, configure the rule to allow specific invalid roles if necessary, and optionally ignore non-DOM components if applicable.

Severity: High ☣️
Category: Best Practice 📘
Tags: accessibility

ESLint 9 (JavaScript / TypeScript)

Vuejs accessibility: Anchor has content

Rule before improvements:

Enforce that anchors have content and that the content is accessible to screen readers. Accessible means that it is not hidden using the aria-hidden prop. Refer to the references to learn about why this is important.

Severity: Critical 🛑
Category: Error Prone 🐞

Rule after improvements:

Enforce Anchor Has Accessible Content

Anchors without accessible content are problematic for users relying on screen readers, as they cannot understand the purpose or destination of the link, leading to poor accessibility and user experience.

Make sure every anchor element contains readable content or accessible child elements that are not hidden with aria-hidden. Use text, components recognized as accessible children, or directives that provide accessible content.

Severity: Medium ⚠️
Category: Best Practice 📘
Tags: accessibility

PHP CodeSniffer (PHP)

DB: Restricted Functions

Rule before improvements:

(No description or explanation)

Severity: Minor ℹ️
Category: Code Style 🎨

Rule after improvements:

Avoid Restricted Database Functions

Using restricted database functions can lead to security vulnerabilities, compatibility issues, or maintenance problems within WordPress projects.

Replace restricted database functions with recommended alternatives that adhere to WordPress coding standards and best practices.

Severity: High ☣️
Category: Security 🔒
Tags: wordpress

To visualize the impact, we also analyzed the overall distribution of categories and severities before and after using the AI assistant. The charts below show how much better balanced and representative our metadata has become:

[Figure: distribution of rule categories and severities before and after using the AI assistant]

These improvements make it significantly easier for users to browse, filter, and focus on what matters most for their codebases—whether that’s fixing critical security issues, improving maintainability, or enforcing style consistency.

Take a single tool, Checkov, and the impact this change had on its patterns:

[Figure: distribution of Checkov pattern categories and severities before and after using the AI assistant]

Previously, all Checkov patterns were labeled Medium severity; now most of them will be High, with smaller shares of Medium and Critical. They were also mostly categorized as Error Prone, with only some categorized as Security, when in reality most of the patterns relate to security problems. For example:

 

Analyzer Rule before improvements Rule after improvements

Checkov (Infra-as-code)

Rule before improvements:

Ensure terraform is not sending SSM secrets to untrusted domains over HTTP

Severity: Medium ⚠️
Category: Error Prone 🐞

Rule after improvements:

Ensure Terraform Does Not Send SSM Secrets to Untrusted Domains Over HTTP

Severity: Critical 🛑
Category: Security 🔒
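
The before-and-after distribution comparison described in this section comes down to counting labels and turning the counts into shares. Here is a minimal Python sketch of that calculation. The share function is a hypothetical helper, and the two label lists are made-up toy data that only mimic the shape of the Checkov shift described above; they are not Codacy's real figures.

```python
from collections import Counter


def share(labels):
    """Return the percentage of each label, rounded to whole numbers."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total) for label, n in counts.items()}


# Toy data (invented for illustration): ten patterns, all Medium before,
# then mostly High with smaller shares of Medium and Critical after.
before = ["Medium"] * 10
after = ["High"] * 6 + ["Medium"] * 2 + ["Critical"] * 2

print(share(before))  # {'Medium': 100}
print(share(after))   # {'High': 60, 'Medium': 20, 'Critical': 20}
```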

Where We Are and What’s Next

The improvements powered by our AI assistant are already making a difference. The updated rule documentation and enriched tagging are now live and available in the Codacy platform. Users can see these enhancements when browsing and selecting patterns, and also when viewing details for issues detected in their code.

While categories and severities haven’t yet been updated in the product, we’re currently in the process of validating these changes to ensure they can be rolled out safely and without disrupting workflows. This step is critical, as these attributes are tied to many parts of the Codacy experience.

Once the new severities and categories are deployed, users will begin to see gradual changes in their dashboards and historical metrics as their repositories are reanalyzed. These shifts reflect a more accurate and consistent classification of rules, which will help teams focus on what truly matters—whether that’s fixing critical issues or improving long-term maintainability.

This update will also directly impact our Coding Standard assistant, which uses categories and severities to recommend a tailored set of rules for each team. With better metadata in place, the assistant will be able to generate more relevant and effective default configurations.
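
To illustrate why metadata quality matters for this kind of recommendation, here is a small Python sketch that selects rules by category and minimum severity. It does not describe how the Coding Standard assistant works internally: the Rule class, the recommend function, and the ordering of severity levels are assumptions made for this example. The rule names and their metadata are taken from the "after" examples earlier in this article.

```python
from dataclasses import dataclass

# Assumed ordering of the severity levels, lowest to highest.
SEVERITY_ORDER = ["Minor", "Medium", "High", "Critical"]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record with the metadata discussed above."""

    name: str
    category: str
    severity: str


def recommend(rules, categories, min_severity):
    """Keep rules in the wanted categories at or above a severity level."""
    threshold = SEVERITY_ORDER.index(min_severity)
    return [
        rule.name
        for rule in rules
        if rule.category in categories
        and SEVERITY_ORDER.index(rule.severity) >= threshold
    ]


rules = [
    Rule("Enforce Consistent Lodash Compose Method", "Code Style", "Minor"),
    Rule("Enforce Valid ARIA Roles in JSX", "Best Practice", "High"),
    Rule("Avoid Restricted Database Functions", "Security", "High"),
]

print(recommend(rules, {"Security", "Best Practice"}, "Medium"))
# ['Enforce Valid ARIA Roles in JSX', 'Avoid Restricted Database Functions']
```

With the old metadata, the ARIA and database rules were both Minor and Code Style, so a filter like this one would have dropped them. Any selection logic built on categories and severities is only as good as those labels.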

We’re excited about what this unlocks: a smarter, more intuitive Codacy, built to help developers make better decisions, faster. And this is just the beginning—by combining automation with AI understanding, we’re laying the foundation for the next generation of intelligent developer tooling.
