From Monolithic to Micro-services, Part 3

Genesis: Making the Transition

For those who have been following this blog series, you’ll know of Gemnasium founder Philippe Lafoucrière’s visit to the WAQ conference in Quebec City, and his appearance as a guest speaker alongside Jean-Philippe Boily of Metrics Watch. That conversation provided the impetus for this series, which covers Gemnasium’s shift from a monolithic app to a micro-services approach. If these topics are new to you, I recommend reviewing the previous two entries.

Having outlined the reasons for the transition and the criteria we used to select our toolset, it is now time to cover how we actually made the move.

Choosing a Starting Point

Now that we had our toolset, OpenShift, it was time to decide on a process, and on what to move first. This was not an easy decision, as micro-services architecture was completely new to us. We wanted a quick win to validate the choices made thus far, give us confidence in our selected path, and work out our processes going forward. We also wanted to be sure that nothing would be broken by our efforts, so we had to start with a non-critical feature. We did, however, want to begin with something identifiable as distinctly Gemnasium.

Once we considered these factors (a signature service that was not critical to the app), the answer as to what should be moved first became obvious: badges. Since the beginning of the project, Gemnasium had provided badges to our users: small, portable images indicating the status of a project. These badges were typically embedded in project README files, so each hit on a project page was a hit on our stack. Thanks to shields.io, such badges have become pretty standard, and their appearance has been unified across heterogeneous services such as Travis CI, Coveralls, Jenkins, and others.

Example:

https://gemnasium.com/mathiasbynens/breach_core.svg

This feature was small, relatively simple, and had clear code boundaries within our monolithic Rails app. Badge rendering was done in Ruby on Rails, within the main app, and while it wasn’t slow, it certainly wasn’t as fast as it could be (roughly 30 to 50 ms per hit when the cache was invalidated). And if anything went wrong with the new badges server, we could simply remove the routing to the micro-service within seconds and be back to the original state of the app. These factors made badges the perfect test case for our new approach.
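To give a sense of how small that boundary is, here is a minimal Go sketch of what a stand-alone badge endpoint can look like. The route shape, the status lookup, and the SVG markup are illustrative assumptions, not Gemnasium’s actual code:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// lookupStatus would query the database for a project's status;
// it is hard-coded here to keep the sketch self-contained.
func lookupStatus(owner, repo string) (label, color string) {
	return "up-to-date", "#4c1"
}

// badgeHandler serves GET /<owner>/<repo>.svg with a small status image.
func badgeHandler(w http.ResponseWriter, r *http.Request) {
	if !strings.HasSuffix(r.URL.Path, ".svg") {
		http.NotFound(w, r)
		return
	}
	path := strings.TrimSuffix(strings.TrimPrefix(r.URL.Path, "/"), ".svg")
	parts := strings.SplitN(path, "/", 2)
	if len(parts) != 2 {
		http.NotFound(w, r)
		return
	}
	label, color := lookupStatus(parts[0], parts[1])
	w.Header().Set("Content-Type", "image/svg+xml")
	fmt.Fprintf(w, `<svg xmlns="http://www.w3.org/2000/svg" width="110" height="20">
  <rect width="110" height="20" rx="3" fill="%s"/>
  <text x="55" y="14" fill="#fff" text-anchor="middle"
        font-family="Verdana" font-size="11">%s</text>
</svg>`, color, label)
}

func main() {
	http.HandleFunc("/", badgeHandler)
	http.ListenAndServe(":8080", nil)
}
```

The whole feature fits in one handler plus one lookup, which is exactly what made it safe to carve out of the monolith first.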

Modus Operandi

Having chosen the test service to break out, we should take a moment to explain the original app configuration and the changes we ended up making. As a ‘monolithic’ app, this was the (very simplified) original configuration:


As you can see, we provided load balancing and redundancy within the application, but it was a self-contained, single entity. Our database servers were mirrored, ensuring a full copy of our data was always available. Two load-balanced front-end servers provided our services, with the load balancer deciding how to distribute traffic between them. It was a simple, elegant solution, very suitable for a smaller application. Unfortunately, growth happens, and sometimes the answer is not simply to throw more resources at the problem; but we already covered the reasons for the change in the previous article.

Step 1:

To begin our migration to the new architecture, we first had to stand up our OpenShift cluster and connect it through our firewall to our stand-alone app. The red wires in the diagram represent the flow of web traffic through our application. Once communication was open between the OpenShift cluster and our application servers, we were ready to make our first transition.

Step 2:

Next, we had to route traffic through the new OpenShift cluster. This helped us validate that OpenShift could work as our new front end and handle the traffic our application generated, while ensuring there would be little to no downtime in the transition. This included connecting the databases to the new cluster and testing that the data was accessible through OpenShift. Both clusters had to operate in tandem, or the badge service could not safely be removed or replaced without disruption.
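In our case the split happened at the load balancer and the OpenShift router, not in application code, but conceptually it amounts to path-based routing: badge requests go to the new cluster, everything else to the monolith. A hedged Go sketch of that idea, with placeholder hostnames:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Placeholder backends; in reality the split lived at the edge,
	// not in a Go program.
	monolith, _ := url.Parse("http://rails-app.internal:3000")
	badges, _ := url.Parse("http://badges.openshift.internal:8080")

	toMonolith := httputil.NewSingleHostReverseProxy(monolith)
	toBadges := httputil.NewSingleHostReverseProxy(badges)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Only badge images are peeled off to the new cluster; deleting
		// this branch instantly restores the original routing.
		if strings.HasSuffix(r.URL.Path, ".svg") || strings.HasSuffix(r.URL.Path, ".png") {
			toBadges.ServeHTTP(w, r)
			return
		}
		toMonolith.ServeHTTP(w, r)
	})
	http.ListenAndServe(":80", nil)
}
```

The key property is that the fallback path is a one-line change, which is what made the experiment reversible within seconds.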


Step 3:

Now we could create badges as a micro-service and add it to the OpenShift cluster. After just a few days of learning Go, we had badges working as a micro-service.

The results were impressive; not only was the rendering fast, but the resource consumption was remarkably low.

The time to render a badge (for both .png and .svg files) was so short that we still haven’t needed to implement a cache around the service. Rather than a 10 to 50 ms hit, rendering now takes between 1 and 5 ms. The low resource consumption (as evidenced in the screenshot above) allowed us to run two containers in a load-balancing configuration for redundancy, in case a node failed in our new cluster. The impact on the main app was also immediately visible: we had fewer requests to handle, and our CPUs had more resources to spend on other features and pages.

Continuous Effort, Continuous Deployment

With our first service moved and already showing results, we knew we had made the right choice of toolset. Still, moving to OpenShift from our more ‘conventional’ architecture was far from a slam dunk: every component had to be put into a Docker image and run as a container. It took a great deal of time, but after a few months our new architecture was ready.

The moment had arrived, and it was time to deploy our statically compiled Go program (we’ll explain in part 4 how we went from this executable to something running in our cluster). We now had our own Continuous Delivery platform, without a single line of code leaving our infrastructure, which was a big win. OpenShift proved smart enough to deploy only running instances of our services, and only when required. We added probes to the badges service, so that traffic would be routed to it if (and only if) its health status was green. Updates use a rolling strategy, in which new containers are started, checked, and added to our load balancer (again, only if their status is green) before the old versions are removed.
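On the service side, a probe only needs an endpoint that tells OpenShift whether the container is ready to receive traffic. A minimal sketch of what such a handler can look like in Go; the /healthz path and the readiness check are assumptions, not our exact implementation:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// ready flips to true once startup checks (config loaded, database
// reachable, and so on) have passed.
var ready atomic.Bool

// healthzHandler backs the readiness probe: the router only keeps this
// container in the load balancer while the endpoint returns 200.
func healthzHandler(w http.ResponseWriter, r *http.Request) {
	if !ready.Load() {
		http.Error(w, "not ready", http.StatusServiceUnavailable)
		return
	}
	w.Write([]byte("ok"))
}

func main() {
	// In a real service this is set only after dependencies are verified.
	ready.Store(true)
	http.HandleFunc("/healthz", healthzHandler)
	http.ListenAndServe(":8080", nil)
}
```

During a rolling update, a new container that never reports green simply never receives traffic, so a bad build cannot take the service down.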

This meant not only safe deploys, but truly zero-downtime deploys. As we were producing new versions several times a day, this was a hard requirement.

Since then, every change approved in our master branches (plural, because we now have more than one project, thanks to ongoing efforts to deliver Gemnasium Enterprise) has been deployed directly to production within a few seconds, saving us a great deal of time. Having every component in Docker will also make shipping to our clients easier and safer.

Stay tuned for part 4 of this journey, where we’ll cover the anatomy of a micro-service, and how we went from a simple executable to a fully integrated set of services running on a clustered server.