How We Used Amazon Web Services to Save $84k Per Year (And Improve Performance!) with a Heroku Hosted Ruby on Rails App

iRonin IT Team•2017-05-26

iRonin, case study, real estate, web, devops, development, security, infrastructure, aws, heroku

84K in savings a year? Yup, that’s how much we managed to save this one company by migrating their app from Heroku over to AWS and reconfiguring their code – all because we were fine tuning performance. Moral of the story here is: make sure the team you have on board to write your code are good at what they do – otherwise you might just need to get our DevOps and software developers team in to help sort things out later...

We love Heroku, you just need to use it right

Don’t get us wrong – we ❤️ Heroku! Heroku is an awesome `PaaS` (Platform as a Service). It’s a pleasure to work with and allows you to quickly bring web applications to users. You simply connect your code repository to Heroku and deploy your app with a single click. Heroku's application containers are called Dynos, and they allow you to scale your application as needed, either by simply moving a slider or automatically with an internal or external autoscaling service.

It’s super easy to deploy, easy to scale (given the right requirements and implementation), and everything works out of the box. It makes life as a Ruby on Rails developer really easy (it was built specifically for Ruby) - but you can also use it with other platforms like Java and Node.js now, too.

However, one of Heroku’s issues is that it still takes skill and work to keep the cost down as the deployed platform grows. Due to Heroku’s constraints (like the 30-second limit on the duration of a single web request, which means large file transfers need to go straight to S3 instead), some apps will simply be more expensive to host on Heroku.

Other platforms give you more control over parameters by giving lower level access to the underlying infrastructure - like Amazon Web Services (AWS). With AWS you can set request timeout limits, which eliminates this particular issue, and can potentially save a lot of money.

Rapidly growing real-estate property maintenance app in need of TLC

One of our clients, a real-estate enterprise, has a dedicated real-estate property maintenance app based on Ruby on Rails. In the app, users - other real-estate companies managing dozens to hundreds of real-estate properties - can manage inspections, evaluations, and work orders, all in real-time from their desktop and mobile devices.

The property maintenance platform allows users to stay on top of their repairs and preventative maintenance - to optimize their asset values and time spent on maintenance activities. We didn’t build the app ourselves, but we were brought on board later to help.

The iRonin team took over to support the project with:

New features development
Day-to-day maintenance
Code improvements
AWS infrastructure setup and maintenance (DevOps)
Improving overall performance
Security audit

Unfortunately, the app had a history of love affairs with too many developers who paid little regard to what they were doing! The app was written against most of the good practices of writing maintainable and extendable Ruby on Rails code, and the Ruby and Ruby on Rails versions were outdated and would shortly run out of security support. This, together with the mixing of backend and frontend code, made it very difficult to upgrade the key libraries. A bit of a nightmare to say the least! The main issue, however, was the lack of regard shown to Heroku’s constraints and guidelines for writing performance-driven web applications.

Once the application started to grow rapidly beyond a certain number of users and amount of data, we noticed that hosting it on Heroku would make it difficult to scale up at a reasonable cost. The crux of the problem was that some requests were taking too long for Heroku - including one of the most important user actions in the app, in fact. These were simply terminated before they completed successfully - extremely frustrating for the users!

We were caught out by this problem a couple of weeks into a planned few months of code rewriting and improvements - not to mention that we had to handle daily feature and bug fix requests. Due to the poorly written legacy code and the critical condition of the system, there was no way to reduce the request times within a couple of days.

TL;DR: Currently the platform has more than a million users; over a thousand companies use it daily to manage the whole housing process for their tenants. As the product scaled rapidly, web requests were taking too long and terminating before completion, leaving users frustrated.

We love Amazon Web Services too, and we have the skill to use it right

While we love Heroku, and choose it for the right applications, we love Amazon Web Services too.

AWS is a great alternative to Heroku when you want to fine-tune details in the web server’s underlying infrastructure. It’s far better suited to tweaking - Heroku’s all about simplicity, whereas AWS allows far more control.

AWS is Infrastructure as a Service (`IaaS`), as opposed to `PaaS`, and in general requires more know-how to set up and run, as not everything is done for you like on Heroku. Amazon Web Services provides a plethora of services for building web applications - file storage (something not available on Heroku at all), databases (relational, document, key-value stores), domain hosting, and push notifications. Heroku itself hosts some of their infrastructure on AWS! You can even build serverless platforms with AWS that are designed to scale from the start.

Step 1: Fine-tuning Heroku Dynos for concurrency and performance of web and background job processes

As more and more users started using the real-estate property management app in question, the platform became slower and required more manual maintenance work from sys admins to keep it healthy (e.g. restarting some services when they started to choke). Web requests would time out, background job queues would quickly fill up, and angry platform users were calling - we had to act quickly!

Usually, when you’re talking performance issues with Heroku, you have only a handful of options, and the usual answer is to throw more Dynos at the problem (which results in much higher bills).

Dynos are containers that run your processes. They can either run on shared hosts, such as the `Standard-1x` and `Standard-2x` Dynos, or on dedicated ones, with the `Performance-M` and `Performance-L` flavors. Each of these is priced differently, according to the specs. You can have up to 100 Dynos for your web application, but you’re limited to just 10 of the `Performance-M` and `Performance-L` types.

At the beginning, this is exactly what we tried to do - change the Dynos configuration. Since the app contains a few additional service layers with 6-7 background workers (where one of them was delegating to the others) we had to make sure scaling one Dyno wouldn't slow down or block any others.

We achieved partial success by fine-tuning the type and number of Dynos. Ruby on Rails can handle web requests with different web servers (like WEBrick, Passenger, Puma, Thin, or Unicorn). Some web servers can handle multiple requests with a single process (`Puma`, `Thin`, or `Passenger` in thread mode), others can’t (`Unicorn`, `Passenger` in process mode). The same is true of Sidekiq, a Ruby library for handling background tasks: a single Sidekiq process can run multiple workers. However, the code we encountered in the app was not thread safe, so we couldn’t even use the multi-request single process web servers here.
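To make "not thread safe" a bit more concrete, here is a minimal sketch of the general kind of problem. It is not code from the app in question; the class names and the `hammer` helper are hypothetical. It shows shared mutable state that is only safe under a threaded server or a multi-threaded Sidekiq process when it is guarded by a lock:

```ruby
# Hypothetical illustration, not taken from the app described here.

# A counter shared by every thread in the process, with no locking.
class UnsafeCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Read-modify-write with no lock: two threads can read the same
  # value, and one of the updates may then be lost.
  def increment
    @count += 1
  end
end

# The same counter guarded by a Mutex, so only one thread at a time
# can run the read-modify-write step.
class SafeCounter
  attr_reader :count

  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end
end

# Call `increment` from several threads at once, roughly the way a
# threaded web server runs several requests inside one process.
def hammer(counter, threads: 8, per_thread: 1_000)
  Array.new(threads) do
    Thread.new { per_thread.times { counter.increment } }
  end.each(&:join)
  counter.count
end
```

With `SafeCounter`, `hammer` always returns `threads * per_thread`. With `UnsafeCounter` there is no such guarantee, which is why a codebase full of this pattern has to stay on one-request-per-process servers until it is cleaned up.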

The key instead is in the configuration - how many workers should a single process have? You need to look at the number of CPUs (and number of cores) used by each Dyno, as well as check memory usage and I/O blocking of a given worker to judge how many of them can run on a single Dyno.
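As a rough sketch of that judgment call, the helper below estimates an upper bound on worker processes from the memory and CPU of one Dyno or instance. The method name, its parameters, and the numbers in the usage note are assumptions for illustration; real values should come from measuring your own workers.

```ruby
# Rough upper bound on worker processes for one Dyno or instance.
# Every input is an assumption to be replaced with real measurements.
def max_workers(cores:, memory_mb:, worker_memory_mb:, io_wait_ratio: 0.0, reserved_mb: 0)
  unless (0...1).cover?(io_wait_ratio)
    raise ArgumentError, "io_wait_ratio must be >= 0 and < 1"
  end

  # Memory bound: how many workers fit in RAM after a safety reserve.
  by_memory = ((memory_mb - reserved_mb) / worker_memory_mb).floor

  # CPU bound: a worker that spends part of its time blocked on I/O
  # only keeps a core busy for the rest, so workers can share cores.
  by_cpu = (cores / (1.0 - io_wait_ratio)).floor

  # The tighter of the two bounds wins; always allow at least one worker.
  [[by_memory, by_cpu].min, 1].max
end
```

For example, with hypothetical figures of 2 cores, 2560MB of RAM, 512MB per worker, a 256MB reserve, and workers that wait on I/O half the time, `max_workers(cores: 2, memory_mb: 2560, worker_memory_mb: 512, io_wait_ratio: 0.5, reserved_mb: 256)` gives 4: memory and CPU both cap out at four workers.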

Step 2: Moving the PostgreSQL database from Heroku to the Amazon Relational Database Service (RDS)

After a while a new problem arose - we couldn’t keep adding Dynos to support more workers indefinitely. Heroku’s database plans allow up to 500 concurrent connections, and the more background workers we had, the more connections they opened.

We tried to use the pgbouncer buildpack, but even after disabling prepared statements it didn't work for us (we were getting `ActiveRecord::StatementInvalid` errors for `array` fields).

Basically, at this point we hit a wall - when we scaled Dynos up, we reached the 500-connection limit. When we scaled them down, the application slowed down. We had to act quickly in order to keep the application running under a high load without significant downtime!
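The arithmetic behind that wall can be sketched as follows. The Dyno counts match the Heroku setup described in this article, but the number of processes per Dyno and connections per process are hypothetical values chosen only to show how quickly the total grows:

```ruby
# Connection limit quoted in this article for the Heroku database plan.
HEROKU_CONNECTION_LIMIT = 500

# Worst-case number of open database connections: every process on
# every Dyno may hold up to `connections_per_process` connections.
def total_connections(groups)
  groups.sum do |group|
    group[:dynos] * group[:processes_per_dyno] * group[:connections_per_process]
  end
end

def within_limit?(groups, limit: HEROKU_CONNECTION_LIMIT)
  total_connections(groups) <= limit
end

# Dyno counts from the article; the per-Dyno process counts and
# per-process connection counts are ASSUMPTIONS for illustration.
EXAMPLE_FLEET = [
  { name: "Standard-2X",   dynos: 40, processes_per_dyno: 1, connections_per_process: 5 },
  { name: "Performance-L", dynos: 8,  processes_per_dyno: 8, connections_per_process: 5 },
  { name: "Performance-M", dynos: 4,  processes_per_dyno: 2, connections_per_process: 5 }
].freeze

puts "Connections needed: #{total_connections(EXAMPLE_FLEET)}"
puts "Fits under the limit? #{within_limit?(EXAMPLE_FLEET)}"
```

Under these assumed figures the fleet needs 560 connections, so it does not fit under a 500-connection cap, and every extra Dyno makes the gap wider.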

This is when we decided to move only the app's database out of Heroku - this was something we could do fast as it was loosely coupled to other infrastructure components. Interestingly, Heroku’s databases are hosted on AWS, too, and the only thing required by the web app was changing the database URL and credentials. We immediately went with Amazon RDS with constant replication (to have real-time backup in case of a failure) and automatic full daily backups.
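To show why the switch is so small on the application side, here is a sketch that assumes the app reads its connection settings from a single `DATABASE_URL`-style string, as is conventional on Heroku. The helper name, the endpoint, and the credentials are all made up for illustration:

```ruby
require "uri"

# Split a DATABASE_URL-style string into its connection settings.
def database_settings(url)
  uri = URI.parse(url)
  {
    scheme:   uri.scheme,
    host:     uri.host,
    port:     uri.port || 5432, # fall back to the PostgreSQL default port
    database: uri.path.sub(%r{\A/}, ""),
    username: uri.user,
    password: uri.password
  }
end

# Hypothetical RDS endpoint and credentials, for illustration only.
EXAMPLE_RDS_URL =
  "postgres://app_user:secret@example-db.abc123.us-east-1.rds.amazonaws.com:5432/app_production"

puts database_settings(EXAMPLE_RDS_URL)[:host]
```

Everything the app needs to know about the database lives in that one string, so pointing it at a different host is a configuration change rather than a code change.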

Unfortunately, there was no way for us to migrate easily by pointing an AWS RDS slave at a Heroku master to replicate the data. Heroku simply doesn’t allow external databases to connect to it this way. So, instead of the quick, easy route, we had to create a dump of the data on Heroku and then manually move it into Amazon RDS - which made for 6 hours of downtime. However, since we had planned and prepped in advance, we turned on the maintenance mode during night hours and moved the data out. Since some of the iRonin team members are in a different time zone, we were able to do it during light traffic time and without a hassle for the team.

We created a really powerful instance (16 cores, 64GB of RAM) so we could tweak worker settings without worrying that the database could be the bottleneck - we could have more than 5000 concurrent connections in this setup without any problem 😎.

Amazon RDS’s Multi-AZ (Multi Availability Zone) feature allowed us to scale the instance up and down without any downtime - so we could focus purely on tweaking the application and workers. Amazon RDS gives access to database parameters that you would usually set in a local PostgreSQL’s `postgresql.conf` file. This meant that we could tweak lower-level Postgres settings if needed - which comes in handy at times like this.

Step 3: Migrating the application from Heroku to Amazon Web Services

After the migration we could have many more database connections, but we still couldn’t achieve the performance we wanted - which is where Amazon Web Services comes in.

On Heroku, Dynos are placed on instances that are shared with Dynos from other apps (unless you use a `performance` Dyno). If you are unlucky and your Dyno is placed on an instance with other resource-heavy Dynos, your Dyno's performance can suffer. On AWS, on the other hand, we could have 1-2 instances with multiple processes running and use all of each instance's power.

Another problem we had to solve was requests with synchronous file uploads. With the 30-second timeout on our Heroku app, these requests threw an error for users with slower connections. The app should upload files directly to `S3` without going through our backend - but it wasn't done like this before we took over the project.

It was important for the client to fix this as soon as possible. Increasing the timeout would give us room to focus on rewriting the file upload part. On AWS, we could increase the timeout without any problems - because we had direct access to the nginx configuration. It's not a fix for the problem by any means. However, unblocking the application for users bought the client some time to calm down angry customers - and bought us time to work on improvements. Coding is faster in a stress-free environment! 🤓

Since moving to AWS would kill 2 birds with 1 stone, we started experimenting. We set up our background workers on AWS OpsWorks to check if they could perform better than the ones on Heroku. The difference here was significant - our AWS instance could perform ~80,000 app tasks (including a lot of PDF processing) in a few hours, in stark contrast to the few days it took on Heroku (single `r4.2xlarge` EC2 instance vs 40x “Standard-2X dynos” and 2x 4 “Performance-M Dynos”).

This was our point of no return. We set up a production stack with 2 web instances (with a load balancer on top) and a single worker instance. We also had to migrate `Redis` and `Elasticsearch` - however, it wasn't complicated, since `ElastiCache` allows us to initialize a `Redis` instance from a snapshot on `S3` (we just uploaded the `Redis` dump there). We already had tasks to reindex the search database, so this wasn't a problem either.

After the migration we finally had everything on AWS infrastructure: the database, Redis, and the application with its worker layers. In order to provision instances on AWS OpsWorks, the iRonin team prepared all the necessary cookbooks and deployment scripts used to automate setting up the servers for releasing new versions.

Cost savings and performance improvements

Before starting the move to AWS, bills from Heroku looked like this:

Standard-2X ($50 / month each): 40 dynos = $2000 / month
Performance-L ($500 / month each): 8 dynos = $4000 / month
Performance-M ($250 / month each): 4 dynos = $1000 / month
RedisGreen: $269 / month
Postgres Premium6: $3500 / month

For a total of $10,769 / month.

After the migration, costs from AWS were significantly lower:

m4.xlarge ($157 / month each): 2 instances = $314 / month (web instances)
c3.2xlarge ($307 / month each): 1 instance = $307 / month (for all background workers)
cache.m3.medium: $65 / month
Postgres db.m4.10xlarge (40 vCPU and 160GB RAM with Multi-AZ and SSD provisioned to 2000 IOPS): $5875 / month

Which came out to $6561 / month - and we still had better performance!

That's a solid $50,500 of savings per year! Holy cow!
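For anyone who wants to check the numbers, here is the same arithmetic as a small script. The line items are copied from the two lists above; the constant and method names are just for this sketch:

```ruby
# Monthly line items in USD, copied from the Heroku bill above.
HEROKU_ITEMS = {
  "Standard-2X x 40"   => 40 * 50,
  "Performance-L x 8"  => 8 * 500,
  "Performance-M x 4"  => 4 * 250,
  "RedisGreen"         => 269,
  "Postgres Premium6"  => 3500
}.freeze

# Monthly line items in USD, copied from the first AWS setup above.
AWS_ITEMS = {
  "m4.xlarge x 2 (web)"      => 2 * 157,
  "c3.2xlarge x 1 (workers)" => 1 * 307,
  "cache.m3.medium (Redis)"  => 65,
  "RDS Postgres"             => 5875
}.freeze

def monthly_total(items)
  items.values.sum
end

def yearly_saving(before, after)
  (monthly_total(before) - monthly_total(after)) * 12
end

puts "Heroku: $#{monthly_total(HEROKU_ITEMS)} / month"                    # 10769
puts "AWS:    $#{monthly_total(AWS_ITEMS)} / month"                       # 6561
puts "Saved:  $#{yearly_saving(HEROKU_ITEMS, AWS_ITEMS)} / year"          # 50496
```

The exact difference is $4208 per month, or $50,496 per year, which is where the rounded $50,500 comes from.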

After running the above configuration on AWS for a few months, we noticed that we hadn’t even used all the available resources.

After a couple of weeks we were able to scale the database instances down and save even more 💵 :

Switched from provisioned SSD to regular SSD (we only needed 415 IOPS, and regular SSD provides 1050 IOPS for a 350GB volume)
Changed the instance to db.r3.4xlarge (16 vCPU, 122GB RAM, with Multi-AZ)
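The 1050 figure follows from how baseline performance scales with volume size. The sketch below assumes that "regular SSD" here means AWS General Purpose SSD (gp2) storage, which provides a baseline of 3 IOPS per GB with a floor of 100 IOPS; it ignores burst credits and the upper cap, and the method names are made up for this example:

```ruby
# Assumption: "regular SSD" = General Purpose SSD (gp2) storage.
GP2_IOPS_PER_GB = 3   # baseline IOPS per GB of volume size
GP2_MIN_IOPS    = 100 # baseline never drops below this

# Baseline IOPS for a gp2 volume of the given size.
# Burst behaviour and the maximum cap are deliberately left out.
def gp2_baseline_iops(size_gb)
  [size_gb * GP2_IOPS_PER_GB, GP2_MIN_IOPS].max
end

# Is the baseline enough for the observed workload?
def general_purpose_enough?(size_gb:, needed_iops:)
  gp2_baseline_iops(size_gb) >= needed_iops
end

puts gp2_baseline_iops(350)                                    # 1050
puts general_purpose_enough?(size_gb: 350, needed_iops: 415)   # true
```

With a 350GB volume the baseline is more than double the 415 IOPS we actually used, so paying for 2000 provisioned IOPS was unnecessary.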

The price for the full production stack should be around $3717 / month.

This saves us a further $2850 / month - or $34,200 per year!
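Putting the three monthly totals quoted in this article side by side gives the overall figure. This is only a sketch of the arithmetic; the constant names are ours, and the text rounds the results slightly:

```ruby
# Monthly totals in USD, as quoted in this article.
HEROKU_TOTAL      = 10_769 # original Heroku bill
AWS_INITIAL_TOTAL = 6_561  # first AWS setup
AWS_TUNED_TOTAL   = 3_717  # after scaling the database down (approximate)

def yearly(monthly_amount)
  monthly_amount * 12
end

further_saving = AWS_INITIAL_TOTAL - AWS_TUNED_TOTAL # 2844 per month
overall_saving = HEROKU_TOTAL - AWS_TUNED_TOTAL      # 7052 per month

puts "Further saving: $#{yearly(further_saving)} / year" # 34128
puts "Overall saving: $#{yearly(overall_saving)} / year" # 84624
```

The exact results are $34,128 and $84,624 per year, which the article rounds to $34,200 and $84k.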

Since we rolled out the new implementation, there’s been nothing but good reports from users about how smoothly the app is running! Not only this, but there has been an uptick in the number of new users picking up the app. And when your users are happy, and your user base is growing, you know you’re on the right track.

Conclusion - $84k worth of savings per year and happy users!

We love all hosting platforms and we believe all infrastructure providers have their merits - be it Amazon Web Services, Heroku, DigitalOcean or Linode; their features vary, as well as their price points and best use cases.

If you feel the infrastructure hosting fees for your web applications are excessive, feel free to contact us for an estimate of how much we could save you.

This may be an extreme case, but we were able to save a mind-blowing $84,000 per year 🙌🏼 and bring significant performance improvements at a critical moment, without first spending weeks on rewriting the app (a rewrite that is still in progress).

We also encourage you to contact us for code quality and web app security inspections - we are serious about both and we can demonstrate it's something worth looking into. Poor code can trip you up tomorrow, or in the future.
