North Korean hackers recently used a technique called “typosquatting” to trick developers who misspelled a popular package name into downloading a Trojan program onto their machines. Incidents like this call for a dedicated audit step in the software supply chain that blocks malicious typosquatting packages, alongside static and dynamic security testing.

This post was originally published on Stacklok.

Use Case: Blocking Malicious Typosquatting Attacks

Problem Statement

A typosquatting attack is a prevalent form of URL hijacking. It targets the URLs of software package repositories, with the intention of tricking developers into downloading and using a Trojan program as a dependency in place of the official package.

Realization Approach

By integrating a dependency audit process enriched with data insights that identify official software packages, developers can run periodic checks to ensure that their source code repository is free of malicious package dependencies.

Solution Space

The solution lets developers automatically enforce security policies across their source code repositories and flag pull requests before they are merged. This shift-left approach detects security issues in the software supply chain early.

Earlier this year, Japanese cybersecurity officials determined that a North Korean hacking team (Lazarus Group) had uploaded tainted packages to the Python PyPI registry. The hackers used a strategy called “typosquatting,” giving their packages a similar name to the reputable package “pycrypto,” an encryption toolkit for Python—for example, one package was named “pycryptoenv.” 

This strategy tricked developers who misspelled “pycrypto” into downloading a malicious package that infected their machines with a Trojan program, “Comebacker,” which could be used to inject malware and ransomware and to steal credentials.

This isn’t the first time malicious attackers have used this strategy with PyPI packages. Back in 2019, two libraries from the same developer were uploaded to PyPI with similar names to popular libraries; when installed, they would steal SSH and GPG keys from the projects of infected developers. One of those typosquatting packages had been available for nearly a year before it was detected. And in 2023, an attacker uploaded thousands of malicious packages to the PyPI registry with randomly generated names that were similar to reputable packages. 

Because typosquatting attacks like these are becoming more common, we’ve taken action in Trusty to proactively analyze packages that are uploaded to public registries (including PyPI, npm, Maven, Go, and crates) for the likelihood of typosquatting. Here’s how we do it.

Detecting Typosquatting in Trusty  

Trusty is a free-to-use web app from Stacklok that analyzes data about thousands of open source packages and ranks them based on their supply chain risk. Trusty looks at factors like repo and author activity; the presence of security best practices, like artifact signing; and the presence of malicious activity, like typosquatting and starjacking.

To identify likely typosquatting attacks, we rely on popularity data and a method of data analysis called the “Levenshtein distance.” Here’s how it works. 

Table 1 below shows the names of malicious typosquatting packages that have been discovered in the past, and how those names compare to the popular package the developer intended to install.  

You can see that the differences between the two package names are very slight (one or two changes), making it easy for a developer to mistype a name and accidentally install malicious code.

| Malicious Package | Popular Package | Difference |
|---|---|---|
| mumpy | numpy | Replace “m” by “n” |
| virtualnv | virtualenv | Delete “e” |
| crypt | crypto | Delete “o” |
| pysprak | pyspark | Swap “a” and “r” |
| setup-tools | setuptools | Insert “-” |
| urlib3 | urllib3 | Delete “l” |
| openvc | opencv | Swap “c” and “v” |

Table 1: Past examples of typosquatting packages

Step 1: Identify packages that have similar names to a given package

To find similarly named packages, Trusty needs to calculate the distance between the name of a given package and the names of all known packages. 

In information theory, the Levenshtein distance is a widely used measure of distance between two strings. For a pair of strings (x, y), the Levenshtein distance is defined as the minimum number of single-character deletions, insertions, or substitutions required to transform x into y.

For example, the Levenshtein distance between “test” and “best” is 1, as “test” can be transformed into “best” with one substitution: replacing “t” with “b”. Table 2 below shows the Levenshtein distance for the packages from Table 1.

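| Malicious Package | Popular Package | Levenshtein Distance |
|---|---|---|
| mumpy | numpy | 1 |
| virtualnv | virtualenv | 1 |
| crypt | crypto | 1 |
| pysprak | pyspark | 2 |
| setup-tools | setuptools | 1 |
| urlib3 | urllib3 | 1 |
| openvc | opencv | 2 |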

Table 2: The Levenshtein distance for past examples of typosquatting packages

For a given package, Trusty uses Levenshtein distance to identify the packages with similar names.
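To make the distance calculation concrete, here is a minimal Python sketch of the standard dynamic-programming Levenshtein algorithm, applied to the package pairs from Table 1. It is an illustration, not Trusty's actual code: the helper `similar_names`, its `max_distance=2` cutoff, and the sample list of known names are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def similar_names(name, known_names, max_distance=2):
    """Hypothetical helper: return (candidate, distance) pairs for every
    known name within max_distance edits of name, excluding name itself."""
    matches = []
    for candidate in known_names:
        distance = levenshtein(name, candidate)
        if 0 < distance <= max_distance:
            matches.append((candidate, distance))
    # Closest names first; ties are broken alphabetically.
    return sorted(matches, key=lambda pair: (pair[1], pair[0]))


# Package pairs taken from Table 1.
TABLE_1_PAIRS = [
    ("mumpy", "numpy"),
    ("virtualnv", "virtualenv"),
    ("crypt", "crypto"),
    ("pysprak", "pyspark"),
    ("setup-tools", "setuptools"),
    ("urlib3", "urllib3"),
    ("openvc", "opencv"),
]

for malicious, popular in TABLE_1_PAIRS:
    print(malicious, popular, levenshtein(malicious, popular))

# Illustrative registry contents (made up for this example).
print(similar_names("requests", ["requests", "requests5", "request", "numpy"]))
```

Note that swapping two adjacent characters counts as two substitutions under this definition, which is why “pysprak” and “openvc” are at distance 2 from their targets. Comparing one name against every name in a large registry this way is quadratic per pair, so a production system would likely need indexing or pre-filtering; that is outside the scope of this sketch.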

Step 2: Assign a typosquatting score for the given package

As mentioned earlier, attackers typically name typosquatting packages similarly to existing popular packages. In Trusty, we use repo and author activity as a proxy for package popularity, and assign scores for both (read more about our scoring here). Repo activity scores are based on factors including the number of stars, forks, open issues, and watchers, while author activity scores are based on the number of public repos that author has, as well as number of followers. The repo and author activity scores are combined into a single activity score. Malicious packages tend to have a particularly low activity score. 

Along with Levenshtein distance, Trusty uses the activity scores to assign a typosquatting score to a given package, say X. If the activity score of package X is lower than the minimum activity score among similarly named packages, X is highly likely to be a typosquatting package, so Trusty assigns it a low typosquatting score, close to 0.

Conversely, if the activity score of package X exceeds the maximum activity score among similarly named packages, it is less likely to be a typosquatting package. In this case, Trusty assigns it a high typosquatting score closer to 10. 

If the activity score of package X falls between the minimum and maximum activity scores among similarly named packages, Trusty assigns it a score between 5 and 8, based on where package X’s score lies within that range.
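The three cases above can be sketched as a small Python function. This is a hedged illustration rather than Trusty's implementation: the function name `typosquatting_score`, the exact endpoint values 0 and 10, the linear interpolation inside the 5 to 8 band, and the handling of the edge cases (no similarly named packages, or all neighbors sharing one activity score) are assumptions made for this example.

```python
from typing import Sequence


def typosquatting_score(activity: float,
                        neighbor_activities: Sequence[float]) -> float:
    """Map a package's activity score to a 0-10 typosquatting score,
    relative to the activity scores of similarly named packages.
    A low result means the package is more likely to be typosquatting."""
    if not neighbor_activities:
        # Assumption: with no similarly named packages there is nothing
        # to impersonate, so treat the package as unlikely typosquatting.
        return 10.0

    low = min(neighbor_activities)
    high = max(neighbor_activities)

    if activity < low:
        return 0.0   # less active than every look-alike: highly suspicious
    if activity > high:
        return 10.0  # more active than every look-alike: likely the original
    if high == low:
        return 6.5   # assumption: midpoint of the 5-8 band when the range is empty

    # Assumption: linear interpolation within the 5-8 band.
    fraction = (activity - low) / (high - low)
    return 5.0 + 3.0 * fraction


# Example with made-up activity scores for three similarly named packages.
neighbors = [2.0, 4.0, 8.0]
for activity in (0.5, 2.0, 5.0, 8.0, 9.5):
    print(activity, typosquatting_score(activity, neighbors))
```

The design point worth noting is that the score is relative: the same activity score can yield a high or low typosquatting score depending on how active the similarly named packages are.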

Step 3: Factor the typosquatting score into the overall Trusty Score for the package

Trusty aggregates the typosquatting score with the other scores it computes to assign an overall score to a package. The aggregate score of a typosquatting package tends to be very low.

Examples of Typosquatting Scores in Trusty

Let’s take a look at some actual examples of package typosquatting scores in Trusty. 

For the reputable Python package requests, Figure 1 below shows the typosquatting score calculated by Trusty. Since requests is a popular package with high repo and author activity, it is unlikely to be a typosquatting package. As expected, the typosquatting score of requests is high.

Figure 1: The typosquatting score for the reputable Python package “requests”

Figure 2 below shows the typosquatting score of a Python package named “requests5.” This package has a name that is very similar to the reputable requests package, but it has no repo or author activity scores because its repo and author information is not available. So it is very likely a typosquatting package, and its typosquatting score is low as a result.

Figure 2: The typosquatting score for the likely malicious package “requests5”

Trusty also displays all of the potential typosquatting packages for a given package, as shown below. This can help our security researchers (and external researchers) identify potential real-world instances of typosquatting, and make developers aware when they’re installing the popular package.

Figure 3: Warnings of possible typosquatting attacks on the reputable “requests” package

How Stacklok Helps In Blocking Malicious Typosquatting Attacks

To avoid falling prey to typosquatting attacks, developers need to exercise caution when installing open source packages, and make sure they’re installing the right one. 

In Trusty, as shown above, typosquatting packages have low overall Trusty Scores, so you can use Trusty to evaluate open source packages before you install them. To catch typosquatting packages before they’re integrated into your source tree, you can use Stacklok’s open source platform, Minder.

Minder helps you apply and automatically enforce security policies across your repos, including a policy to flag pull requests that contain dependencies with low Trusty Scores. With this policy in place, Minder can alert you about, or even block, a pull request that introduces low-scoring dependencies (you can configure your scoring threshold). This is a low-friction way to catch risky dependencies, such as typosquatting packages, before a PR is merged.
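The logic of such a threshold policy can be illustrated with a short Python sketch. This is not Minder's code or its configuration format: the function names, the `threshold=5.0` default, and the dependency scores below are all hypothetical, chosen only to show how a configurable cutoff separates acceptable dependencies from ones that should be flagged.

```python
def low_scoring_dependencies(scores, threshold=5.0):
    """Return, in alphabetical order, the names of dependencies whose
    score falls below the threshold.

    scores: dict mapping dependency name -> overall score (0-10).
    threshold: hypothetical cutoff; anything below it is flagged.
    """
    return sorted(name for name, score in scores.items() if score < threshold)


def should_block(scores, threshold=5.0):
    """A pull request is blocked if it adds any low-scoring dependency."""
    return bool(low_scoring_dependencies(scores, threshold))


# Made-up scores for dependencies introduced by a pull request.
new_dependencies = {"requests": 8.7, "requests5": 1.2, "numpy": 9.1}

print(low_scoring_dependencies(new_dependencies))  # names to report in the PR
print(should_block(new_dependencies))              # whether to block the merge
```

Raising the threshold makes the check stricter at the cost of more false alarms on small but legitimate packages, which is why a configurable cutoff is useful.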

About the author Staff

Showcasing and curating a knowledge base of tech use cases from across the web.
