Serving ruby gems, the paranoid way

As I wrote in a previous blog post, there are good reasons to be paranoid with Ruby gems: they may have been hacked and “enhanced” with malicious code. It would be great if we could check every gem that we want to install, including their dependencies. You may think “this is not practical at all”, and you are probably right. But still, I wanted to give this idea a try and learn about the challenges that people will face if they want to review their gems before installation.

Let’s consider a company whose business is about making web applications. The tech team is divided in two:

  • a development team that writes the company software, leveraging Ruby gems
  • a security team that focuses on security issues

The security team is in charge of reviewing all the gems needed to run the company applications. This policy could bring a lot of tension between the two teams, so I hope that members of both teams enjoy having coffee breaks together.

A check point

The security team wants to ensure that all the gem dependencies used by the company software are safe. So they set up a kind of check point: a process in which all gems needed by the development team are reviewed and the unsafe ones filtered out.

Since the company does not trust the rubygems.org source anymore, the development team is not allowed to download gems directly from there. If they need to do so, they have to use a sandbox environment, which could be a self-contained virtual machine.

Once they have been successfully reviewed, the gems will be available for both development and production environments. Over time, the development team will have access to more and more safe gems to work with.

The workflow to validate a new gem could look like this:

  1. the development team thinks they need new gems that have not been reviewed yet
  2. the sandbox environment is used to experiment with those gems
  3. the exact names and versions of each gem to review are identified
  4. the gems and all their dependencies are reviewed by the security team
  5. if deemed safe, the gems are made available on the internal gem server

In other words, the security team acts as a middleman for rubygems.org.
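The workflow above boils down to keeping an allowlist of exact gem releases. Here is a minimal sketch of that idea; the `GemRelease` and `Checkpoint` names are hypothetical and only serve the illustration, and a real check point would persist its list somewhere rather than keep it in memory.

```ruby
require "set"

# Hypothetical record for one exact gem release (name plus version).
GemRelease = Struct.new(:name, :version) do
  def to_s
    "#{name}-#{version}"
  end
end

# Hypothetical in-memory check point: the set of releases that the
# security team has already approved.
class Checkpoint
  def initialize
    @approved = Set.new
  end

  # Step 5: a reviewed release becomes available to everyone.
  def approve(release)
    @approved << release
  end

  def approved?(release)
    @approved.include?(release)
  end

  # Steps 3 and 4: from the releases the developers asked for, keep
  # only the ones that still need a review.
  def pending_review(requested)
    requested.reject { |release| approved?(release) }
  end
end

checkpoint = Checkpoint.new
checkpoint.approve(GemRelease.new("json", "1.8.0"))

requested = [
  GemRelease.new("json", "1.8.0"),
  GemRelease.new("sinatra", "1.4.3"),
]

# Prints "sinatra-1.4.3", the only release not approved yet.
puts checkpoint.pending_review(requested).map(&:to_s)
```

Working with exact versions matters here: approving json 1.8.0 says nothing about any other json release.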

The development team should be able to do its job the normal way, using bundler and the gem command-line tool, with minimal annoyances.

The built-in gem server

To serve Ruby gems, the security team first considers using the gem server command. It comes with rubygems itself so there’s nothing special to install.

The gem server command serves all the gems installed on the machine; so it’s very easy to add a new gem and its dependencies with gem install.

The gem server runs on webrick and can only serve one client at a time. It does no caching and is not compatible with Rack. So it will not run under more powerful application servers such as puma or unicorn. Fortunately, most rubygems clients cache the data on their side.

Let’s grab a debian-compatible server and try to run the gem server under its own user account:

$ sudo adduser --home /var/lib/gem-server gem-server

$ sudo su - gem-server

$ export GEM_HOME=$HOME/gems
$ export GEM_PATH=$HOME/gems

$ gem fetch json -v 1.8.0
# security team checks json 1.8.0 gem
$ gem install json -v 1.8.0

$ gem server
Server started at http://0.0.0.0:8808

Note that I had to review and install a first gem before running the server, otherwise it would have complained about missing directories.

From here, many things could be improved:

  • add an SSL-enabled proxy with nginx; it’s all about security, after all
  • run the gem server from the user’s crontab, init scripts or a supervisor tool
  • set up environment variables in the user’s profile

Even without these steps, the gem server works out of the box.

Client setup

Each member of the development team must set up their rubygems client so that gems are only fetched from the company’s private gem server. This can be done using the gem sources command.

$ gem sources --add http://checkpoint:8808
http://checkpoint:8808 added to sources

$ gem sources --remove http://rubygems.org/
http://rubygems.org/ removed from sources

$ gem sources --list
http://checkpoint:8808

All these settings are stored in the .gemrc configuration file, making it very easy to share them with team mates.
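To share the same setup with team mates, the content of a minimal .gemrc can also be generated. This is only a sketch: the `gemrc_for` helper is hypothetical, and `checkpoint` is the host name used in the commands above. The one assumption about RubyGems itself is that it stores its source list under the `:sources` key of this YAML file.

```ruby
require "yaml"

# Assumption: "checkpoint" is the host name of the internal gem server,
# as in the commands above.
CHECKPOINT_URL = "http://checkpoint:8808"

# Hypothetical helper: build the content of a minimal .gemrc that lists
# a single source. The file is YAML with symbol keys such as :sources.
def gemrc_for(source_url)
  { sources: [source_url] }.to_yaml
end

# Print the file content; redirect it to ~/.gemrc to use it.
puts gemrc_for(CHECKPOINT_URL)
```

Writing the file by hand works just as well. The point is that the whole client setup fits in a few lines that can be kept under version control.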

Everything seems OK, but remember that most rubygems clients maintain a cache of downloaded gems, and this cache may already contain plenty of code that we do not trust. So it’s best to move to a new setup.

# store new gems into user's HOME directory
$ export GEM_HOME=$HOME/gems

# fetch new gems from there only
$ export GEM_PATH=$HOME/gems

# cleanup
$ rm -rf $GEM_HOME
$ mkdir $GEM_HOME

Now we are ready to go: no local gem, and the only remote gem is a trusted one.

$ gem list --local

*** LOCAL GEMS ***

$ gem list --remote

*** REMOTE GEMS ***

json (1.8.0)

Switching to the user account running the gem server, we can check that our rubygems client has been talking to our internal gem server, as expected:

$ gem server
Server started at http://0.0.0.0:8808
localhost - - [02/Sep/2013:09:51:40 CEST] "GET /latest_specs.4.8.gz HTTP/1.1" 200 75
- -> /latest_specs.4.8.gz
localhost - - [02/Sep/2013:09:58:11 CEST] "GET /latest_specs.4.8.gz HTTP/1.1" 200 75
- -> /latest_specs.4.8.gz

Add trusted gems

We are now done with the setup. Now, the development team wants to build a new web application using sinatra version 1.4.3. The security team should fetch, unpack and review the gem. If deemed safe, the gem can then be installed and shared using gem server.

# fetch the gem archive
$ gem fetch sinatra -v 1.4.3
Fetching: sinatra-1.4.3.gem (100%)
Downloaded sinatra-1.4.3

# extract it
$ gem unpack sinatra-1.4.3.gem
Unpacked gem: '/var/lib/gem-server/sinatra-1.4.3'

# make extensive review
$ vim sinatra-1.4.3/sinatra.gemspec sinatra-1.4.3/Rakefile

# install to share with others... but wait!
$ gem install sinatra -v 1.4.3^C

But something is wrong: gem install will install sinatra along with its dependencies, yet we have checked none of those!

The standard gem fetch command does not fetch the dependencies, so I have written a small rubygems plugin called rubygems-deep_fetch. gem deep_fetch fetches a gem along with its dependencies, and ignores the dependencies that are already in your cache.

Equipped with gem deep_fetch, the security team goes back to hard work. They can ignore the packages that are already in the cache as they have already been checked. They just want to fetch and review the missing ones.

# fetch the gem and its missing dependencies
$ gem deep_fetch sinatra --version 1.4.3
Fetching: sinatra-1.4.3.gem (100%)
Downloaded sinatra-1.4.3
Fetching: rack-1.5.2.gem (100%)
Downloaded rack-1.5.2
Fetching: rack-protection-1.5.0.gem (100%)
Downloaded rack-protection-1.5.0

# unpack everything
$ gem unpack *gem
Unpacked gem: '/home/fabien/tmp/rack-1.5.2'
Unpacked gem: '/home/fabien/tmp/rack-protection-1.5.0'
Unpacked gem: '/home/fabien/tmp/sinatra-1.4.3'

# review everything
$ vim */*gemspec

# install everything
$ gem install sinatra -v 1.4.3
Fetching: rack-1.5.2.gem (100%)
Fetching: tilt-1.4.1.gem (100%)
Fetching: rack-protection-1.5.0.gem (100%)
Successfully installed rack-1.5.2
Successfully installed tilt-1.4.1
Successfully installed rack-protection-1.5.0
Successfully installed sinatra-1.4.3
4 gems installed

# share
$ gem server
Server started at http://0.0.0.0:8808

Fortunately, rack, sinatra and tilt are all small gems, so the security team was able to review all of them within a reasonable time. That would be different for complex gems like rails, obviously.
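The “deep fetch” idea is a walk over the dependency graph that skips what has already been reviewed. The sketch below illustrates that idea only; it is not the code of the rubygems-deep_fetch plugin. The `DEPENDENCIES` table is written by hand for the example, using the versions from the transcript above, whereas a real tool would read the dependencies from the gem specifications.

```ruby
require "set"

# Hand-written dependency data, for illustration only. The versions are
# the ones that appear in the transcript above.
DEPENDENCIES = {
  "sinatra-1.4.3"         => ["rack-1.5.2", "rack-protection-1.5.0", "tilt-1.4.1"],
  "rack-protection-1.5.0" => ["rack-1.5.2"],
  "rack-1.5.2"            => [],
  "tilt-1.4.1"            => [],
}.freeze

# Walk the dependency graph from `root` and return every release that
# is not in `already_reviewed`, the root included. Reviewed releases are
# not expanded, on the assumption that their own dependencies were
# reviewed along with them.
def releases_to_review(root, already_reviewed, graph = DEPENDENCIES)
  pending = []
  seen = Set.new(already_reviewed)
  queue = [root]

  until queue.empty?
    release = queue.shift
    next if seen.include?(release)

    seen << release
    pending << release
    queue.concat(graph.fetch(release, []))
  end

  pending
end

# With rack already reviewed, three releases remain:
# sinatra-1.4.3, rack-protection-1.5.0 and tilt-1.4.1.
puts releases_to_review("sinatra-1.4.3", ["rack-1.5.2"])
```

Skipping reviewed releases keeps the review queue short over time: the more gems the check point has approved, the fewer new packages each request brings in.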

Playing with bundler

So far, we have only played with the rubygems client. However, the development team is more likely to use bundler instead. They have updated the Gemfile for the new sinatra-based web application they are working on:

source "http://checkpoint:8808"

gem "sinatra"

Bundler is very good at caching, so to avoid cache effects every developer was asked to clean up their .bundler directory beforehand.

$ bundle config path ~/.bundler
Settings for `path` in order of priority. The top value will be used
  Set for the current user (/home/fabien/.bundle/config): "/home/fabien/.bundler"

$ rm -rf /home/fabien/.bundler

Now we’re clean. Let’s run bundle twice to do some benchmarking.

$ time bundle
Fetching gem metadata from http://checkpoint:8808/.
Fetching full source index from http://checkpoint:8808/
Installing rack (1.5.2)
Installing rack-protection (1.5.0)
Installing tilt (1.4.1)
Installing sinatra (1.4.3)
Using bundler (1.1.5)
Your bundle is complete! It was installed into /home/fabien/.bundler

real    0m1.000s
user    0m0.688s
sys     0m0.084s

$ time bundle
...

real    0m0.481s
user    0m0.448s
sys     0m0.028s

The logs show some interesting things on the server side.

$ gem server
Server started at http://0.0.0.0:8808
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /api/v1/dependencies?gems=rack,rack-protection,tilt,sinatra HTTP/1.1" 404 289
- -> /api/v1/dependencies?gems=rack,rack-protection,tilt,sinatra
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /specs.4.8.gz HTTP/1.1" 200 157
- -> /specs.4.8.gz
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /prerelease_specs.4.8.gz HTTP/1.1" 404 293
- -> /prerelease_specs.4.8.gz
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/rack-1.5.2.gemspec.rz HTTP/1.1" 200 554
- -> /quick/Marshal.4.8/rack-1.5.2.gemspec.rz
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/rack-protection-1.5.0.gemspec.rz HTTP/1.1" 200 764
- -> /quick/Marshal.4.8/rack-protection-1.5.0.gemspec.rz
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/sinatra-1.4.3.gemspec.rz HTTP/1.1" 200 493
- -> /quick/Marshal.4.8/sinatra-1.4.3.gemspec.rz
localhost - - [02/Sep/2013:11:00:48 CEST] "GET /quick/Marshal.4.8/tilt-1.4.1.gemspec.rz HTTP/1.1" 200 615
- -> /quick/Marshal.4.8/tilt-1.4.1.gemspec.rz
localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/rack-1.5.2.gem HTTP/1.1" 200 216576
- -> /gems/rack-1.5.2.gem
localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/rack-protection-1.5.0.gem HTTP/1.1" 200 15872
- -> /gems/rack-protection-1.5.0.gem
localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/tilt-1.4.1.gem HTTP/1.1" 200 42496
- -> /gems/tilt-1.4.1.gem
localhost - - [02/Sep/2013:11:00:49 CEST] "GET /gems/sinatra-1.4.3.gem HTTP/1.1" 200 333312
- -> /gems/sinatra-1.4.3.gem

Bundler first tries to query the dependency API but it is unsuccessful since the feature is not available in the standard gem server. As a consequence, bundler falls back to retrieving the full index to resolve the gem dependencies on the client side.
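The fallback seen in the log can be sketched in a few lines. This is a simplification and not bundler’s actual code: the helper names are hypothetical, and the only facts taken from the log are the `/api/v1/dependencies?gems=...` URL and the 404 answer of the standard gem server.

```ruby
require "net/http"
require "uri"

# Build the dependency API URL that bundler requested in the log above.
def dependency_api_uri(base_url, gem_names)
  uri = URI.join(base_url, "/api/v1/dependencies")
  uri.query = "gems=#{gem_names.join(',')}"
  uri
end

# Hypothetical helper: choose how to resolve dependencies from the HTTP
# status of the dependency API request. Anything but a 200 means the
# client has to download the full index instead.
def resolution_strategy(status_code)
  status_code == 200 ? :dependency_api : :full_index
end

# Probe a gem server. This needs network access to that server, so it
# is defined here but not called.
def probe(base_url, gem_names)
  response = Net::HTTP.get_response(dependency_api_uri(base_url, gem_names))
  resolution_strategy(response.code.to_i)
end

puts dependency_api_uri("http://checkpoint:8808", %w[rack rack-protection tilt sinatra])
puts resolution_strategy(404) # the answer given by the standard gem server
```

The difference matters as the gem set grows: the dependency API answers for the requested gems only, while the full index covers every gem on the server.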

By the way, the Rubygems 2 client also knows about this new dependency API, but Rubygems 1.8 does not.

We also notice that the server was not queried on our second bundle run. That means that bundler is smart enough to cache the dependency resolution. No network connection is required when nothing has changed in the bundle. Very nice.

The gem server can work with bundler, but it will quickly hit its limits as the security team adds more gems to the trusted gems database. Do you remember how slow bundler felt before version 1.1? You got it.

Better gem serving with geminabox

geminabox makes it very easy to serve your own gems. It can be installed as a gem and has two main features:

  • a sinatra-based web application to host your gems
  • a plugin to add a new command to the gem tool

Once geminabox is installed, Rubygems is enhanced with a new gem inabox command. It expects *.gem arguments and behaves like the gem push command (that publishes to the official rubygems.org repository).

Usage of geminabox has already been covered in a few blog posts like Hosting your own rubygem server and Setting up a private ruby gems server.

The geminabox gem server is more efficient than gem server because it implements the dependency API. It is also compatible with Rack so it’s possible to run it using a modern web server (which can serve a lot faster than webrick). Rack compatibility makes it very easy to add SSL protection and HTTP authentication using middleware.

The security team is running the geminabox server under a dedicated user, using puma as application server:

$ sudo adduser --home /var/lib/geminabox geminabox
$ su geminabox
$ mkdir /var/lib/geminabox/data

$ puma --port 8808 config.ru
Puma starting in single mode...
* Version 2.5.1, codename: Astronaut Shoelaces
* Min threads: 0, max threads: 16
* Environment: development
* Listening on tcp://0.0.0.0:8808
Use Ctrl-C to stop

Here is a basic Rack config for the latest stable version:

require "rubygems"
require "rubygems/user_interaction"
require "geminabox"

Geminabox.data = "/var/lib/geminabox/data"
run Geminabox
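As an example of what Rack compatibility brings, here is a sketch of the same config.ru with HTTP basic authentication added through the `Rack::Auth::Basic` middleware. The two environment variable names are hypothetical, and the realm string is arbitrary. Basic authentication sends credentials in clear text, so it only makes sense behind the SSL-enabled proxy.

```ruby
require "rubygems"
require "rubygems/user_interaction"
require "geminabox"

Geminabox.data = "/var/lib/geminabox/data"

# Assumption: the credentials come from two hypothetical environment
# variables set for the geminabox user. ENV.fetch raises on the first
# request if either one is missing.
use Rack::Auth::Basic, "Gem check point" do |username, password|
  # Use both comparisons so that a wrong user name and a wrong password
  # take a similar amount of time to reject.
  user_ok = Rack::Utils.secure_compare(ENV.fetch("GEMINABOX_USER"), username)
  pass_ok = Rack::Utils.secure_compare(ENV.fetch("GEMINABOX_PASSWORD"), password)
  user_ok && pass_ok
end

run Geminabox
```

With such a setup, the clients would have to provide the credentials as well, typically as part of the source URL.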

Using geminabox rather than the standard gem server won’t break anything on the client side. However, it may feel faster with bundler and rubygems 2 clients thanks to its support of the bundler dependency API.

The performance gain is noticeable, even if it is sometimes difficult to measure on small gem sets. Starting from an empty bundler cache, using geminabox on the server side decreases our installation time from about 1000 ms to a little under 800 ms.

$ time bundle
Fetching gem metadata from http://checkpoint:8808/..
...

real    0m0.773s
user    0m0.684s
sys     0m0.064s

The security team now has to publish the approved gems using gem inabox followed by the package filename. They cannot install dependencies automatically using gem install, so they could really use some kind of “deep fetch”, like in our rubygems-deep_fetch plugin.

$ gem deep_fetch sinatra --version 1.4.3
...

# unpack and review new gems

$ gem inabox --host http://checkpoint:8808/ *gem
...
Gem tilt-1.4.1.gem received and indexed.

Geminabox also provides an administration web interface, so that the security team can unpublish the gems they don’t need or trust anymore.

The development team also gains a server to publish their own private gems. After all, this is what geminabox has been designed for.

How about a proxy?

I’ve experimented with gem server and geminabox to implement our check point. Along the way, it gave me a better understanding of the relationship between a gem server and its clients (i.e. rubygems and bundler), and it was a good reminder of the dependency API introduced with bundler 1.1.

Using similar techniques, it’s also possible to set up a proxy for rubygems.org, and even work off-line. The gem mirror is a rubygems plugin that aims to do so. But so far there is no open source project to set up an intelligent proxy in front of rubygems.org that could anticipate the upcoming needs of the clients. geminabox may evolve to become such a cache, as mentioned in a recent forum discussion, but this is just a guess. We’re still missing something as smart as apt-cacher-ng.