Hyas Blog | Easy as PyPI Pie

Published by Kell van Daal on Jun 07, 2023

Recently PyPI suspended new user sign-ups as well as new project registration, saying that malicious users and malicious projects were being created faster than its administrators could respond to them. Registrations were closed for a weekend.

Python Package Index

PyPI is the Python Package Index, a repository of software for the Python programming language. And it's an important destination for developers. It contains many, many packages that developers use so they don't have to "reinvent the wheel" every time they create a new piece of software. Most software uses a lot of functionality that someone else already created and submitted to PyPI. It ranges from basic to very advanced functionality.

If you want to create a function that does an API call, do you really also want to program how to set up an HTTPS connection, including certificate checking and so on? Of course not. You'll use a commonly used package that already handles all of that. This way developers can focus on the "new things" they want to develop and don't have to spend more time coding basic building blocks than on actual innovation.

PyPI Popularity Contest

Depending on who does the measuring and how it is measured, Python is anywhere from the most popular programming language to a bit further down the list, but it is usually in the top three. This makes Python very widely used, which makes PyPI very popular with developers, which in turn makes targeting PyPI very popular with threat actors.

Being able to put malicious code in a popular package can mean literally millions of downloads per day. The top 20 most downloaded PyPI packages are all downloaded more than 3 million times a day at the time of this writing. So it's not a surprise that PyPI started enforcing two-factor authentication (2FA) for upload.pypi.org, hoping to protect legitimate projects and packages.

Too Legit to Quit?

However, the sign-up suspension was not meant to protect current, legit projects. It was meant to stem the flood of new, malicious packages being uploaded. Whereas breaching a legitimate, popular package would be the jackpot for a threat actor, being able to create a new package that is easily mistaken for a legit, popular package would be a nice runner-up prize.

Take the package "urllib3" (an HTTP library) as an example. It has over 10 million downloads most days. A threat actor's malicious code in there would be the jackpot. However, it's also a name that's easily typo-squatted, for example as "urlib3". It's also easy to make a package appear to be the successor, for example "urllib4". Those packages wouldn't get 10+ million downloads a day, but even if one got only 0.01% of that, it would still be 1,000 downloads each day.

Not a bad runner-up prize.
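
To make the typo-squatting idea concrete, here is a minimal sketch of how a look-alike name could be flagged by its edit distance to a well-known package. The list of popular names, the function names and the distance threshold are all assumptions made for illustration. This is not how PyPI itself screens new projects.

```python
# Hypothetical short list of popular package names. A real check would use a
# maintained list of the most downloaded PyPI projects.
POPULAR_PACKAGES = ["urllib3", "requests", "numpy", "boto3"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def lookalikes(name: str, popular=POPULAR_PACKAGES, max_distance: int = 1):
    """Return popular names that `name` could be mistaken for."""
    name = name.lower()
    return [p for p in popular
            if p != name and edit_distance(name, p) <= max_distance]


def runner_up_downloads(daily_downloads: int, one_in: int) -> int:
    """Downloads per day if one in `one_in` installs picks the wrong name."""
    return daily_downloads // one_in


print(lookalikes("urlib3"))    # missing letter
print(lookalikes("urllib4"))   # fake "successor"
print(runner_up_downloads(10_000_000, 10_000))  # 0.01% of 10 million
```

Both "urlib3" and "urllib4" are a single edit away from "urllib3", which is why a threshold of one is enough to catch the two examples above. A threshold that low will also miss more creative look-alikes, so treat it as a starting point only.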

Lastly, it's also possible to submit new packages that don't resemble existing ones, but promise functionality a developer might want. Often developers will get the promised functionality, but it comes with unwanted extras like credential-stealing code.

The Goal: A Successful Supply Chain Attack

So a threat actor has multiple options for getting malicious code into PyPI, but the goal is usually the same: a supply chain attack. A developer's computer is a juicy target. It's often less restricted than a regular user's machine, has access to important company resources, and already runs a lot of different code, which often gets less scrutiny from tools like EDRs. And if the threat actor is even luckier, the software being written is shipped to customers as well, making it possible to infect even more companies.

Repositories at Risk

The PyPI repository is outside the control of regular companies. Companies often use private repositories, but new packages regularly have to be added to them, and those packages can already be compromised.

Developers' computers are also more difficult to secure in many companies. Developers need special access for what they are doing. EDRs that analyze code would produce too many false positives to be useful. And developers are often admins on their local machines.

Layered Defense

This is where a layered defense comes into play. Malicious code in a PyPI package usually sails right through your firewall, proxy, EDR and so on. But the one thing malicious code in a package generally does that can be reliably detected is communicating with threat actor infrastructure. Threat actors don't go to all the trouble of getting their code on PyPI without a way to monetize it, like ransomware or selling access. And that always requires the malicious code to phone home.
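
As a rough illustration of catching the "phone home" step at the network layer, the sketch below checks a queried hostname, and each of its parent domains, against a set of known-bad domains. The blocklist contents and function names are hypothetical, and the example says nothing about how any particular product decides what counts as threat actor infrastructure. It only shows the blocking decision once such a list exists.

```python
# Hypothetical set of domains believed to be threat actor infrastructure.
# The ".example" names are placeholders, not real indicators.
BLOCKED_DOMAINS = {"bad-infra.example", "exfil.example"}


def parent_domains(hostname: str) -> list:
    """Return the hostname plus every parent domain, most specific first."""
    labels = hostname.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def is_blocked(hostname: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the hostname or any of its parent domains is on the blocklist."""
    return any(domain in blocked for domain in parent_domains(hostname))


# A lookup made by a malicious package would be refused, while ordinary
# developer traffic would pass through untouched.
for host in ("c2.bad-infra.example", "pypi.org", "notbad-infra.example"):
    print(host, "->", "block" if is_blocked(host) else "allow")
```

Matching on whole labels rather than substrings means that "notbad-infra.example" is not caught just because it ends with the same characters as a blocked domain, which keeps legit traffic from being hindered by accident.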

HYAS Protect is a solution that knows threat actor infrastructure. It maps threat actor infrastructure even before it's used, so it can detect traffic to that infrastructure before it shows up on any "list" or "threat feed". It will be able to detect and block communication from malicious Python packages without hindering legit traffic.
