From Monolithic to Micro-services (Part 1 / 4)

Gemnasium’s founder, Philippe Lafoucrière, had the opportunity to attend and speak at the Web à Québec 2016 conference in Quebec City a month ago. The WAQ conference, as it is known, is the largest gathering of francophone entrepreneurs in America, with guests arriving not only from Québec, but also from France, Belgium, Africa, and the Caribbean.

Because not everyone could attend the event, and the topic shared by Philippe is essentially part of Gemnasium’s life story, we thought our readers might find it of interest.

Philippe’s presentation, co-delivered with Jean-Philippe Boily of Metrics Watch, covered Gemnasium’s journey from a monolithic single application using Ruby on Rails to a more distributed micro-services approach using Go. This journey is not uncommon, but the reasons for such a shift are not often publicly shared or explained thoroughly. Why was the shift made? What were the implications of the move? How was the transition performed? These are all questions that any developer might want answered before attempting such an endeavor. As this is a rather in-depth topic, we’ll split it up into several articles for easier consumption:

  • Why the Transition? (this article)
  • Criteria for Architecture
  • Making the Transition
  • Anatomy of a Micro-Service


What’s the Difference?

A monolithic application, in our definition, is a self-contained application, independent from other computing applications. A micro-services application is one in which the single application is made up of a suite of self-contained services, each running in its own process, and communicating with the rest using lightweight mechanisms.
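To make the definition concrete, here is a minimal sketch in Go of one self-contained service and a caller that talks to it over plain HTTP and JSON. It is an illustration only, not Gemnasium's actual code: the service name "billing", the /health route, and the helper names (newHandler, startService, checkHealth) are all assumptions made for this example. For brevity the service and its caller live in one program; in a real deployment each would run in its own process.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net"
	"net/http"
)

// healthResponse is a hypothetical JSON payload returned by the service.
type healthResponse struct {
	Service string `json:"service"`
	Status  string `json:"status"`
}

// newHandler builds the HTTP routes for one small, self-contained service.
func newHandler(service string) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		resp := healthResponse{Service: service, Status: "ok"}
		if err := json.NewEncoder(w).Encode(resp); err != nil {
			log.Printf("encode response: %v", err)
		}
	})
	return mux
}

// startService listens on an OS-assigned local port, serves in the
// background, and returns the base URL of the service.
func startService(service string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go http.Serve(ln, newHandler(service))
	return "http://" + ln.Addr().String(), nil
}

// checkHealth plays the role of another service calling this one
// through a lightweight mechanism (plain HTTP and JSON).
func checkHealth(baseURL string) (healthResponse, error) {
	var out healthResponse
	resp, err := http.Get(baseURL + "/health")
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

func main() {
	baseURL, err := startService("billing")
	if err != nil {
		log.Fatal(err)
	}
	health, err := checkHealth(baseURL)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s is %s\n", health.Service, health.Status) // prints "billing is ok"
}
```

The point of the sketch is the boundary: the caller knows only a URL and a JSON shape, not the service's internals.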

Performance

As is often the case, the primary impetus for making a change was performance related. As Gemnasium’s code base grew, it required more and more RAM to stand up, and horizontal scaling (adding more nodes) required a great deal more hardware, as the whole app had to be stood up again on every new node. As we used Resque to process background work, and each worker had to start up the whole app to function, gigabytes of RAM were invested in Resque workers. A move to Sidekiq was considered, but this seemed a band-aid solution.

Not only were system requirements increasing, but testing was also becoming horribly time-consuming. Real DB testing was a requirement, but our suite was taking longer and longer to run, with times upwards of 45 minutes towards the end. This led us to implement fast_specs, and try to get our testing suite back to a reasonable time of under 5 minutes. However, this was again dealing with the symptoms and not the ‘disease’.

Even starting the app was becoming problematic (in case of a crash, for instance). Rails often needed 30 seconds just to boot up completely.

Code-based Discoveries

Another reason for the need to make a change was Rails 4. As we migrated to Rails 4, we tried to evaluate the migration throughout our effort. Unfortunately, the migration was far from smooth, as we found several bugs not present with our Rails 3 implementation. In fixing these bugs, other parts of the code were falling apart. As a result, we could not truly evaluate the effort of migrating our code base - an ironic result, considering this is supposed to be Gemnasium’s specialty.

During this migration task, we discovered that we had been guilty of stacking code for years. To put this in perspective, Gemnasium was imagined in 2010, and our first commit was in early January 2011. While stacking code proved an easy practice for us, because all the models, classes, etc. we needed were available right away in the app, it meant a LOT of code. Many features cut across 20 classes. Modifying the code was very difficult, because we could not immediately see the boundaries of a given feature.

Another reason for making the change was a desire to have more of the features managed by the database (PostgreSQL). Unfortunately, raw SQL does not often play well with ActiveRecord.

The Human Factor

One of the biggest problems that arose from our becoming a bloated monolithic application was onboarding. New employees were subjected to a rigorous - far too rigorous to be practical - learning curve after being hired. Remember, when Gemnasium was founded, Docker was not available. We used Vagrant VMs, with some success and failure, but the VMs became so heavy that it was soon decided to get back to local development. New developers had to install (and learn, if any were unfamiliar) RVM, then Ruby, then PostgreSQL, then set up PostgreSQL (accounts, etc.), then Redis, and so on. Due to the joys of native extensions, we wasted time figuring out why the same setup was not functioning correctly on a given developer’s machine, often with the only difference being that one was using Linux, and the other a Mac.

Eventually, Docker helped us standardize the setup and versions. However, Docker (and docker-compose) itself was young and buggy at this point. To make a long story short, the learning curve was steep for each new developer, and often hampered by problems within their working environment - learning processes from start to finish is difficult when those processes are constantly interrupted by crashes or other system failures.

Before writing their first lines of code, new developers had to read a lot of code and documentation, because everything tended to be related to other features. Creating a resource often meant dealing with Stripe, our DB, our workers, etc.

Enterprise

Last but certainly not least, the final impetus for us to transition to a micro-services approach was the pressure from our clients to deliver an Enterprise version of Gemnasium. The Enterprise version would be an on-premise installation, enabling it to be fully integrated with the client’s network and systems.

Our Rails app itself comes with a RAM requirement of 3 to 6GB (see the consumption for one container below), plus an additional 256MB to 1GB of RAM for each individual Resque worker added. Once you tack on the requirements for the database, Redis, etc., we had to tell these clients that they would need at least 16GB of RAM in order to run our product.
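As a rough illustration of how these figures add up, here is a small worked calculation in Go. The per-app and per-worker ranges come from the numbers above; the worker count (4) and the allowance for the database, Redis, and the operating system (2 to 4GB) are assumptions picked purely for the example, not measurements from our installation.

```go
package main

import "fmt"

// rangeMB is a min/max memory estimate in megabytes.
type rangeMB struct{ Min, Max int }

// add sums two estimates bound by bound.
func (r rangeMB) add(o rangeMB) rangeMB {
	return rangeMB{r.Min + o.Min, r.Max + o.Max}
}

// times scales an estimate by a number of identical processes.
func (r rangeMB) times(n int) rangeMB {
	return rangeMB{r.Min * n, r.Max * n}
}

// estimate returns the total RAM range for the Rails app, a given
// number of Resque workers, and everything else on the machine.
func estimate(workers int, other rangeMB) rangeMB {
	app := rangeMB{3 * 1024, 6 * 1024} // 3 to 6GB, from the article
	worker := rangeMB{256, 1024}       // 256MB to 1GB per worker, from the article
	return app.add(worker.times(workers)).add(other)
}

func main() {
	// Assumptions for illustration: 4 workers, 2 to 4GB for DB, Redis, and OS.
	total := estimate(4, rangeMB{2 * 1024, 4 * 1024})
	fmt.Printf("%.1f to %.1f GB\n",
		float64(total.Min)/1024, float64(total.Max)/1024) // prints "6.0 to 14.0 GB"
}
```

Under these assumed inputs the upper bound already approaches 16GB, which is why a smaller machine could not safely be recommended.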

Moreover, we would have had to ship our entire codebase, with all our algorithms. Because Ruby is interpreted, this meant that we would need to obfuscate the code before each release.

Finally, the Enterprise version would almost certainly be quite a bit different from our SaaS offering. If we had stuck with a monolithic single app, this would have required a great deal of branching, or a lot of if/then modelling in our code to handle the different situations.

This Enterprise version is currently underway, and is being developed in tandem with our switch to a micro-services approach.

Our Experience

All of these considerations played a big role in forcing us to rethink our methods for developing Gemnasium. It was time to make a tough decision: we decided to explode the app into several pieces called microservices. Each microservice would be in charge of one group of tasks (auth, billing, color resolving, etc.).

The decision to change over was implemented over the course of the past year, with complete buy-in from the entire team. As this post is being written, we are still migrating the code, extracting features one by one to our new microservices. So far, it has been an immensely positive experience, and we have all fallen in love with Go. Our past experiences with compiled languages such as C and C++ have made this transition feel almost like coming home.

Cautionary Notes

Bear in mind that while this transition was right for us at this point in our development, this article is not a ringing endorsement for micro-services or Go. We firmly believe that a Rails application is faster to bootstrap, and is a more cohesive starting point for a new project. A new boilerplate in Go will take longer to stand up, and will have a lot more lines of code. However, in Gemnasium’s case, our ever-expanding app was becoming too big for Rails, and after a certain point, Ruby was no longer the right choice for us. Careful analysis brought us to our decision, and it should be a part of yours.

Stay tuned for our next article, in which we will cover the genesis of our transition project, including how we started our implementation, and how we decided on the tools to use.

Thanks for reading!