Fake PyPI packages that execute malicious code

Last week, a Slovak security team (SK-CSIRT) published a security advisory about several PyPI packages.

In summary:

“SK-CSIRT identified malicious software libraries in the official Python package repository, PyPI, posing as well known libraries. A prominent example is a fake package urllib-1.21.1.tar.gz, based upon a well known package urllib3-1.21.1.tar.gz.

Such packages may have been downloaded by unwitting developer or administrator by various means, including the popular “pip” utility (pip install urllib). There is evidence that the fake packages have indeed been downloaded and incorporated into software multiple times between June 2017 and September 2017.”

SK-CSIRT eventually concluded that this malicious software wasn’t a major threat, but it could easily have been otherwise. So how did PyPI get into such trouble, and what lessons can we learn from it?

Background

Python packages are centralized in a single main repository: PyPI.

PyPI is managed by the Python Software Foundation, i.e. a group of volunteers giving their time and skills to maintain the large package index. It’s not uncommon to see fake packages introduced in the index: PyPI has to deal with them on a regular basis, and the Python community reacts as quickly as possible to get rid of them. That’s probably one of the many tasks that keep the PyPI maintainers busy. In fact, we had already noticed such packages on Gemnasium, because we synchronize our database with the PyPI registry. Those packages are easily recognizable: their names follow various patterns, all pointing to some website or phone number (e.g. “AOL_Support_I8OO-xxx-xxxxAOL_Phone_Number”). This is not a security issue per se, since these packages are never meant to be installed. Most of the time they don’t even contain code; they’re just there for SEO purposes.

The security issue revealed by SK-CSIRT is a bit different. Some very popular packages, like acquisition, were copied and republished under almost identical names.

Impersonation

In this particular case, the name was acqusition (notice the missing “i”). While a program won’t be fooled by such a trick, humans are likely to fall into the trap. Since the fake packages were copied from the original ones, they expose the same API. Unless you dive into the code or spot the different name, you won’t notice a thing.

acqusition package page

In a context where developers edit their dependency files by hand, there’s a small chance that a typo will match a name an attacker has registered. The probability of any single typo is quite low, but given the sheer number of Python projects, for example, it’s very likely to happen somewhere.
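To make this concrete, here is a minimal sketch of how a team could flag dependency names that look like a misspelling of a package they actually intend to use. It relies on Python’s standard difflib module. The KNOWN_PACKAGES list, the suspicious_names helper and the 0.85 similarity cutoff are our own assumptions for illustration, not something taken from the advisory, and legitimate packages with similar names would also be flagged.

```python
import difflib

# Hypothetical allowlist: the packages this project really intends to use.
KNOWN_PACKAGES = ["acquisition", "urllib3", "requests", "setuptools"]


def suspicious_names(requirements, known=KNOWN_PACKAGES, cutoff=0.85):
    """Return (name, lookalike) pairs for names that are close to,
    but not identical to, a known package name."""
    known_lower = [k.lower() for k in known]
    findings = []
    for name in requirements:
        candidate = name.strip().lower()
        if not candidate or candidate in known_lower:
            continue  # empty line or an exact, expected name
        # Keep only the single best match above the similarity cutoff.
        matches = difflib.get_close_matches(candidate, known_lower, n=1, cutoff=cutoff)
        if matches:
            findings.append((candidate, matches[0]))
    return findings


if __name__ == "__main__":
    for name, lookalike in suspicious_names(["acqusition", "urllib", "requests", "flask"]):
        print("%s looks a lot like %s, please double-check" % (name, lookalike))
```

A check like this is only a heuristic: it helps a reviewer notice a near-miss, but it cannot tell whether the similar-looking package is actually malicious.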

If something can happen, then it will happen if the number of trials is large enough. There are several well-known references for this idea: the infinite monkey theorem, or Murphy’s law.

“Anything that can go wrong will go wrong”
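The arithmetic behind this intuition is simple. If a single edit has a probability p of containing an exploitable typo, and edits are independent, then the probability of at least one such typo across n edits is 1 - (1 - p)^n. The sketch below computes it; the value of p and the numbers of edits are purely hypothetical figures chosen for illustration, not measurements.

```python
def prob_at_least_one(p, n):
    """Probability that an event with per-trial probability p happens
    at least once in n independent trials."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical figures: a 1-in-10,000 chance of an exploitable typo per edit.
for n in (1000, 10000, 100000):
    print(n, round(prob_at_least_one(1e-4, n), 3))
```

Even with a very small p, the result approaches 1 as n grows, which is exactly what an attacker who registers a misspelled name is counting on.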

Risk

The risk is obvious. While the fake packages detected by SK-CSIRT contain “benign malware”, some information was still leaked to an unknown destination:

“The malicious code added to the fake package is executed as soon as the developer or system administrator installs the package (which is often done with administrator privileges).

The executed code in identified samples is only used to report the following information, using a HTTP request to a remote server at http://121.42.217.44:8080/ :

  • name and version of the fake package
  • user name of the user who installs the package
  • hostname

The clear text data may look like this: Y:urllib-1.21.1 admin testmachine”

Fortunately, this is not critical information, but the outcome could have been very different, and catastrophic for some companies. This kind of attack is resistant to code review because of the way the human brain works: we tend to see what we already know (pattern recognition), so a typo like “acqusition” can go unnoticed while reviewing the source code.

We can easily imagine a fake package intercepting parameters in HTTP requests and sending usernames and clear-text passwords to a remote server in real time. Most websites hosted on the Internet are protected by a firewall, but admins often “forget” to filter outbound traffic.

In 2013, Benjamin Smith created a bunch of Ruby gems to demonstrate the risk of installing untrusted packages. In one simple demo, he asked the audience to install a specific gem during his talk. Most attendees ran the command gem install [malicious gem here] without hesitation, and most of the time with sudo, making things even worse. Personal data started to appear on Benjamin’s screen in real time as the audience installed the gem.

The malicious software can sneak in through one of two places:

  1. The package pre/post scripts
  2. The package code itself

The former is not an issue for package managers like go get for Go, since they don’t run pre/post scripts. The latter, however, is an issue for everyone, no matter what language they use.
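For Python source packages, the install-time script is typically setup.py, which runs when the package is installed. As a rough illustration of what a reviewer could automate, the sketch below parses a script with the standard ast module and lists imports of network-related modules, which are unusual in an install script. The NETWORK_MODULES set and the flagged_imports helper are hypothetical names of ours, and the check is easy to evade (for example with dynamic imports), so treat it as a first filter and not as a detector.

```python
import ast

# Hypothetical list of modules that rarely belong in an install script.
NETWORK_MODULES = {"socket", "urllib", "urllib2", "httplib", "http", "requests"}


def flagged_imports(source):
    """Return the sorted top-level module names imported by `source`
    that appear in NETWORK_MODULES."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in NETWORK_MODULES:
                found.add(top_level)
    return sorted(found)


if __name__ == "__main__":
    sample = "import urllib.request\nfrom setuptools import setup\n"
    print(flagged_imports(sample))
```

Note that ast.parse only accepts source that is valid for the Python version running the check, so an older Python 2 script may need to be inspected with a matching interpreter.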

The npm ecosystem was affected by a similar security issue last August, when Oscar Bolmsten found some malicious code in a package.

We could go on and on with many other examples in various languages, but you get the idea.

Solution

We’ve talked about firewalls, and about restricting access from your servers to unknown external IPs (especially from the DB server, which should be completely isolated from the outside). Failed outgoing connections should be monitored, and action taken accordingly. Do not leave outgoing traffic unsupervised.
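As an illustration of the monitoring side, here is a minimal sketch that checks a list of observed outgoing connections against an allowlist of networks, using Python’s standard ipaddress module. The ALLOWED_NETWORKS values and both helper functions are assumptions made up for this example; in practice the filtering itself belongs in your firewall, and the connection list would come from its logs.

```python
import ipaddress

# Hypothetical allowlist: an internal range plus one approved external range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def is_allowed(destination, allowed=ALLOWED_NETWORKS):
    """Return True if the destination IP belongs to one of the allowed networks."""
    address = ipaddress.ip_address(destination)
    return any(address in network for network in allowed)


def unexpected_destinations(connections, allowed=ALLOWED_NETWORKS):
    """Filter (ip, port) pairs, keeping those outside the allowlist."""
    return [(ip, port) for ip, port in connections if not is_allowed(ip, allowed)]


if __name__ == "__main__":
    observed = [("10.0.0.5", 5432), ("121.42.217.44", 8080)]
    for ip, port in unexpected_destinations(observed):
        print("unexpected outgoing connection to %s:%d" % (ip, port))
```

With an allowlist like this, the address reported in the advisory would stand out immediately, since nothing on a typical server has a reason to contact it.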

A tool like Gemnasium will never replace human common sense when it comes to taking action. The only thing preventing these packages from being installed (as root?) is you.

Nevertheless, Gemnasium can help you identify fake packages. All versions of the malicious PyPI packages are now marked as containing an advisory. So if you have accidentally installed one of these packages, Gemnasium will notify you immediately and strongly suggest that you remove it.
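If you want a quick local check in the meantime, the sketch below lists the distributions installed in the current Python environment and compares them with a set of known fake names. It assumes Python 3.8 or later (for importlib.metadata). The FAKE_PACKAGES set only contains the two names mentioned in this article, so it is deliberately incomplete; the full list is in the SK-CSIRT advisory. The helper names are ours.

```python
from importlib import metadata

# Only the two fake names mentioned in this article; the advisory lists more.
FAKE_PACKAGES = {"urllib", "acqusition"}


def find_fake_packages(installed_names, fake=FAKE_PACKAGES):
    """Return the sorted, lowercased names that match a known fake package."""
    return sorted({name.lower() for name in installed_names if name and name.lower() in fake})


def installed_distribution_names():
    """Return the names of the distributions installed in this environment."""
    return [dist.metadata.get("Name") for dist in metadata.distributions()]


if __name__ == "__main__":
    hits = find_fake_packages(installed_distribution_names())
    if hits:
        print("Known fake packages installed: " + ", ".join(hits))
    else:
        print("No known fake packages found in this environment.")
```

Run it with the same interpreter (or virtualenv) that your application uses, since each environment has its own set of installed packages.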