Performance & Usage at Instagram

Instagram Engineering

Mar 4, 2016


At Instagram, we treat performance as a feature, and as we build products we constantly think about how to make things faster and more efficient. We’ve found that improving performance can actually drive usage, and that even small changes can improve the experience of using the service. Here, we explore one case study where identifying and fixing low-hanging fruit reduced network usage, improved client-side reliability, reduced server-side latency, and in turn improved app-wide user experience.

Background

Before a mobile client downloads actual media (i.e. photos or videos) from our CDNs, it must first fetch JSON metadata (“media bundles”) from our Django webserver endpoints. Depending on the endpoint, the compressed response is typically 10–60 kB. Each bundle contains information like media id, metadata about the author, the number of likes, the caption, and the most recent comments (called “preview” or “summary” comments).

{
  "status": "ok",
  "items": [{
    "id": "###",
    "author": {
      "id": "###",
      "profile_pic_url": …
    },
    "like_count": 500,
    "caption": "tbt",
    "comments": [{
      "id": "###",
      "text": "Great pic!",
      "user": {
        "id": "###",
        "profile_pic_url": …
      }
    },
    {(another comment)}, …
    ]
  },
  {(another media bundle)}, …
  ]
}
Simplified illustrative example of a media bundle JSON

When you open Instagram to the main feed, you will notice that you only see up to three preview comments below each photo (in addition to the caption). In grid view (e.g. in the Search Explore tab or user profiles), no preview comments are visible at all.

3 preview comments per item are visible in feed, and no preview comments are visible in grid view.

However, we had been sending up to 20 comments to the client with each bundle. Originally, this was intended as an optimization to make the “View all comments” screen load faster. But when viewed holistically, this now seems like a poor trade-off for these reasons:

  • Media are viewed more commonly than their comments, and we should optimize for the common case.

  • Comment bundles are particularly heavy:

      • They contain the comment text itself as well as its id, timestamp, and author metadata (including profile picture URL).
      • The authors and content of the comments are usually unique from comment to comment, so compression performs poorly.

  • Generating profile picture URLs is a CPU-inefficient operation because we must dynamically compute the correct CDN URL. The more comments we load, the more profile picture URLs we need to generate.

  • When a user clicks on “View all # comments” we ask the server for new comments anyway!
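The compression point above is easy to demonstrate. Here is a toy sketch (our own illustration, not Instagram’s actual payloads) that gzips a list of 20 genuinely unique comment objects versus the same comment repeated 20 times:

```python
import gzip
import json

# Hypothetical comment objects: bodies and author metadata are mostly
# unique, so gzip finds few repeated substrings to exploit.
unique_comments = [
    {"id": i,
     "text": f"comment number {i} with its own distinct wording {i * 7}",
     "user": {"id": i * 31,
              "profile_pic_url": f"https://cdn.example.com/{i * 31}.jpg"}}
    for i in range(20)
]
repetitive_comments = [unique_comments[0]] * 20  # same comment, 20 times

raw_unique = json.dumps(unique_comments).encode()
raw_repeat = json.dumps(repetitive_comments).encode()

# Compressed-to-raw size ratio: lower means compression helped more.
ratio_unique = len(gzip.compress(raw_unique)) / len(raw_unique)
ratio_repeat = len(gzip.compress(raw_repeat)) / len(raw_repeat)

print(f"unique comments:     {ratio_unique:.2f}")
print(f"repetitive comments: {ratio_repeat:.2f}")
```

The repetitive payload compresses dramatically better, which is exactly why real comment lists (all-unique text and authors) stay heavy even after compression.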

For these reasons, reducing the maximum number of summary comments in each media bundle seemed like an obvious thing to do. But it was still unclear how much of a user-facing impact it would have. After all, this only makes a difference on the order of tens of kilobytes per payload, a difference that is dwarfed by the size of photo or video files. We hypothesized that the impact on network latency would be negligible: if a user were on a connection slow enough that downloading a few extra kilobytes mattered, just about any Internet service would probably be too difficult to use anyway. But, considering the possible bandwidth and CPU savings, we decided to run an experiment to see whether there were in fact any user-facing effects.

The experiment

We ran an A/B experiment that reduced the maximum number of summary comments in each bundle from 20 to 5. This dropped the median response size of the main newsfeed endpoint from 15 KB to 10 KB, while the median response size of the “Explore Posts” endpoint dropped from 46 KB to 23 KB. The drop is even more pronounced at higher percentiles: the 95th-percentile response size of the main feed endpoint fell from 32 KB to 16 KB.
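Mechanically, the change amounts to capping the embedded comments list before serializing each bundle. A minimal sketch of how such an A/B gate might look (the function and constant names here are our own invention, not Instagram’s internal code):

```python
MAX_SUMMARY_COMMENTS_CONTROL = 20  # the old limit
MAX_SUMMARY_COMMENTS_TEST = 5     # the experimental limit

def trim_summary_comments(bundle, in_test_group):
    """Return a copy of a media bundle with its embedded preview
    comments capped according to the user's experiment group."""
    cap = (MAX_SUMMARY_COMMENTS_TEST if in_test_group
           else MAX_SUMMARY_COMMENTS_CONTROL)
    trimmed = dict(bundle)
    # Comments are assumed newest-first; keep only the first `cap`.
    trimmed["comments"] = bundle["comments"][:cap]
    return trimmed

bundle = {"id": "123", "comments": [{"id": i} for i in range(20)]}
print(len(trim_summary_comments(bundle, in_test_group=True)["comments"]))   # 5
print(len(trim_summary_comments(bundle, in_test_group=False)["comments"]))  # 20
```

Because the full comment list is fetched fresh when the user opens “View all # comments,” the cap changes only what rides along with the feed payload, not what the user can ultimately see.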

Results

As expected, reducing the payload by a few kilobytes had no perceptible impact on network latency. But it had a surprising impact on memory usage: reducing the average memory footprint of each feed screen significantly improved the stability of the entire app. Android out-of-memory (OOM) errors dropped 30%! We hypothesize that this difference between platforms stems from the Android market: some Android phones ship with very little RAM, and correspondingly high memory pressure.

Median CPU usage on our most popular endpoints, like the main feed endpoint, dropped 20%! This translated into a median savings of 30ms in server-side wall time (and thus reduced end-to-end latency), and at the 95th percentile, we saved 70ms in server-side wall time. That makes a difference!

Infra improvements

When we launched this across all our users, CPU across our entire Django fleet dropped about 8% and egress dropped about 25%. Egress is a measure of site health, and such a drop would normally be alarming. But in this case, it’s a good sign that we’re reducing the load on our infrastructure!

Usage improvements

During the A/B test, app-wide impressions across all mobile platforms increased by 0.7%, and likes increased by 0.4%. This was driven by increases in impressions across all surfaces: for instance, “Explore Photos” impressions increased over 3%, and user profile scrolls increased 2.7%. These trends continued over time, confirming that good performance brings users back.

Percent increase in user profile scrolls over 3-month period

Takeaways

  • Question baked-in assumptions. In this case, we asked, “Why do we send 20 comments per media bundle?” Sometimes, questioning baked-in assumptions can lead to identifying low-hanging fruit.
  • Measure. Before optimizing, take time to understand the potential impact. We saw how heavy comments were when we inspected the payload, and profiling led us to the realization that we could save CPU.
  • Optimize for the most common case. A cardinal rule of optimization. Here, we consciously chose to optimize media loads rather than comment loads.
  • Do the simple thing. “Do the simple thing first” is dogma at Instagram. After identifying the potential problem, we chose the simplest and most obvious course of action. And despite its simplicity, it yielded big results.
  • Empathize. This is oft-repeated at Facebook. We use powerful phones on powerful networks, so this change was personally imperceptible. Yet it still impacted many people. Again, it’s worth noting that the observed improvements on Android outpaced those on iOS. This makes sense — Android phones tend to be cheaper and less powerful.
  • Follow your nose. Here at Instagram NY, we’re chiefly responsible for ranking media (for instance, we personalize and rank media for “Explore Photos”). So a performance optimization like this wasn’t directly related to our work. But our intuition told us that this would be worth pursuing, and one of the best things about working at Instagram is that joining a specific team doesn’t constrain which parts of the code base we can touch. Within reason, we have the latitude to pursue anything we think is worthwhile.

Thanks to Lisa Guo, Hao Chen, Tyler Kieft, Jimmy Zhang, Kang Zhang, and William Liu.
