How Semgrep calculates contributor count

This page explains how Semgrep calculates contributor count beyond the basic billing definition. It is intended to help explain why a contributor count may differ from your organization's internal estimate and how Semgrep reduces double-counting in repository history.

Why contributor counts can be hard to calculate

Raw commit history does not always cleanly map to unique people. The same contributor can appear under multiple identities over time, including:

Multiple company email addresses
Email aliases or formatting variations
GitHub-generated noreply addresses used in merge commits

Repository history can also include bots and automation accounts that should not count as human contributors. To make contributor counts more accurate, Semgrep applies normalization, deduplication, and filtering steps to the underlying commit data.

How Semgrep reduces double-counting

Semgrep uses commit metadata from scanned repositories to identify likely duplicate identities and count them once.

This process can include:

normalizing common email variations
matching contributors who appear under multiple company domains
resolving GitHub noreply addresses back to known contributor identities when possible

The goal is to better reflect distinct human contributors rather than counting every raw identity in commit history as a separate person.

How Semgrep handles personal email addresses

Personal email addresses sometimes appear in repository history alongside company-managed identities. Personal emails are weak identifiers and are harder to match reliably across environments. Semgrep applies some filtering rules to reduce overcounting and also keeps a pre-filtered version of the data for auditing and comparison.

If the primary domain for the deployment is a company domain, Semgrep does not count contributors who appear only with personal email addresses. It still counts contributors who have at least one company email address.
If the primary domain for the deployment is a personal email domain, such as gmail.com, Semgrep counts only contributors whose email matches that domain. It does not count contributors who appear only with other personal email domains.
If Semgrep cannot identify a primary domain, it does not apply personal email filtering.

How Semgrep handles bots and automation accounts

Contributor count is intended to measure human contributors, not automated systems. Semgrep excludes known bot and automation accounts from the calculation using maintained exclusion lists informed by bot-related patterns in commit metadata.

Public and private repositories

Public GitHub repositories that are explicitly set to be visible to everyone are excluded from contributor count calculations.

All GitHub Enterprise Server repositories are treated as private for this purpose, regardless of visibility.

Why your internal estimate might differ

A customer’s internal estimate may differ from Semgrep’s contributor count for a few common reasons:

One person appears under multiple identities in commit history
Bots or service accounts are present in raw repository data
Public repositories are excluded
Personal email addresses cannot always be matched reliably
Limited git history reduces the set of visible contributors

Because of this, contributor count should be understood as a usage metric based on observed repository activity over a defined period, not as an employee roster or HR headcount.

Why git history matters

Contributor count depends on the commit history available at scan time. If a checkout includes limited history, Semgrep might not see every contributor active during the full 90-day lookback window.

Questions about your contributor count

If you have questions about your contributor count, contact Semgrep support or your account manager for help reviewing the result.

Not finding what you need in this doc? Ask questions in our Community Slack group, or see Support for other ways to get help.

Why contributor counts can be hard to calculate​

How Semgrep reduces double-counting​

How Semgrep handles personal email addresses​

How Semgrep handles bots and automation accounts​

Public and private repositories​

Why your internal estimate might differ​

Why git history matters​

Questions about your contributor count​