How We’re Speeding Up DocumentCloud

Posted
May 2nd, 2016

Tags
Code

Author
Anthony DeBarros

Improving DocumentCloud’s speed and reliability has been a major focus of our team’s development time in the last year, with as much attention spent on back-end processing as on more-visible changes such as re-styled notes, a mobile-ready page embed, and enhancements to our WordPress plugin.

We thought you’d like to know about some of that work — from caching to a beefier database server to compressing the files we serve — and how it plays out every day in faster load times, a more stable platform, and improved performance during high-traffic situations.

Here’s a rundown:

Content Delivery Network: Last fall, we began serving PDFs, text, and images from Amazon’s CloudFront content delivery network instead of directly from Amazon’s S3 storage. You might have noticed the change when we switched URL subdomains from s3.* to assets.*, as in https://assets.documentcloud.org/documents/282753/lefler-thesis.pdf.

One benefit of using CloudFront is that we now serve files from multiple places around the globe. That means our users and readers in Europe, Asia and South America now see faster load times. We’re very happy to support better performance this way as our worldwide user base grows.
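If you still have older links lying around, the subdomain switch can be sketched as a small rewrite. This is only an illustration: the post says the subdomain changed from s3.* to assets.*, so the full old hostname below is an assumption, and `to_cdn_url` is a hypothetical helper name, not part of DocumentCloud's code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed hostnames: the post only states that the subdomain moved
# from s3.* to assets.*; the full old hostname is a guess.
OLD_HOST = "s3.documentcloud.org"
NEW_HOST = "assets.documentcloud.org"


def to_cdn_url(url: str) -> str:
    """Return the URL with the old storage host swapped for the CDN host."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url  # leave unrelated URLs untouched
    # Keep scheme, path, query and fragment; replace only the host.
    return urlunsplit(parts._replace(netloc=NEW_HOST))


print(to_cdn_url("https://s3.documentcloud.org/documents/282753/lefler-thesis.pdf"))
```

Only the host changes; the document path stays the same, which is why the switch was mostly invisible to readers.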

Compressed assets: DocumentCloud assets are now served compressed, a step developers had requested to improve performance. That means web browsers don’t have to download as many bytes to load our viewer and the code around it. This speeds up viewing and saves us some money on data costs — double win.
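To get a feel for why compression matters, here is a minimal sketch that measures how many bytes a repetitive, JavaScript-like payload takes before and after compression. The post does not say which compression scheme we serve, so gzip here is an assumption, and the sample payload is made up for illustration.

```python
import gzip


def gzip_savings(payload: bytes) -> tuple[int, int]:
    """Return (raw size, gzipped size) in bytes for the given payload."""
    compressed = gzip.compress(payload)
    # Compression is lossless: the browser gets the same bytes back.
    assert gzip.decompress(compressed) == payload
    return len(payload), len(compressed)


# Made-up sample standing in for viewer code; real assets will differ.
sample = b"function renderPage(doc) { return doc.pages.map(drawPage); }\n" * 200
original_size, compressed_size = gzip_savings(sample)
print(f"{original_size} bytes raw, {compressed_size} bytes gzipped")
```

Text-heavy assets such as JavaScript and CSS tend to compress well because they repeat themselves a lot; already-compressed formats usually gain far less.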

Caching: We’ve improved caching on our main application server, which frees it up to do other work. We did this by using its NGINX web server as a reverse proxy cache, which keeps track of requests for documents and other resources and saves the responses. If we see multiple requests for the same document or resource, we serve the response from the cache rather than making the server generate the response all over again.

Previously, the server relied on Ruby on Rails’ built-in page caching. While that worked well in most instances, it wasn’t able to handle resources over a certain length, typically those containing a large number of options and parameters (such as certain API calls). Using NGINX as a reverse proxy cache removes that limit and considerably improves caching coverage and performance. For example, our new caching setup came in very handy when publicity around the Panama Papers pushed traffic to documentcloud.org to 25 times its usual level.
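The idea behind a reverse proxy cache can be sketched in a few lines of Python. This is a toy model, not our NGINX configuration: `ResponseCache`, `slow_app`, and the `/api/search.json` path are hypothetical names chosen for illustration, and a real cache would also handle expiry and size limits.

```python
from typing import Callable, Dict


class ResponseCache:
    """Toy reverse proxy cache: saves responses keyed by the full request URL."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self._origin = origin            # the application server behind the cache
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1               # repeat request: serve the saved response
            return self._store[url]
        self.misses += 1                 # first request: ask the application
        body = self._origin(url)
        self._store[url] = body
        return body


calls = []


def slow_app(url: str) -> bytes:
    """Stand-in for the application generating a response from scratch."""
    calls.append(url)
    return f"response for {url}".encode()


cache = ResponseCache(slow_app)
# A long request with many parameters is cached like any other.
long_url = "/api/search.json?" + "&".join(f"opt{i}=1" for i in range(100))
for _ in range(3):
    cache.get(long_url)
print(cache.hits, cache.misses, len(calls))
```

Because the key is the whole request, options and parameters included, a long request is no harder to cache than a short one, and the application only does the work once.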

Database upgrade: Finally, we recently upgraded our database to PostgreSQL 9.4 and moved it to a more powerful server. Our database server gets a considerable workout recording data about each document uploaded, processed and served, and as our user base grew we were starting to hit limits on memory, storage and processing. Now, we have room to spare.

We have plenty of work ahead as we embark on improving the data model for handling accounts and create a new workflow for signing up. But we hope these less-visible back-end improvements help you get your work done faster and improve your readers’ experience with DocumentCloud.

One Response to 'How We’re Speeding Up DocumentCloud'


  1. Great work!

    Looking forward to easier deployment. I suggest the gitlab omnibus package model as it appears nearly flawless.

    Thanks for all your hard work! Kudos!

    Rebecca

    Rebecca Wise

    3 May 16 at 6:52 pm
