Static Analysis at Scale: An Instagram Story

Published in

Instagram Engineering

13 min readAug 15, 2019

Instagram Server is entirely Python powered.

Well, mostly. There’s also some Cython, and our dependencies include a fair amount of C++ code exposed to Python as C extensions.

Our server app is a monolith, one big codebase of several million lines and a few thousand Django endpoints [1], all loaded up and served together. A few services have been split out of the monolith, but we don’t have any plans to aggressively break it up.

And it’s a fast-moving monolith; we have hundreds of engineers shipping hundreds of commits every day. We deploy those commits continuously, every seven minutes, typically deploying to production around a hundred times per day. We aim to keep less than an hour of latency between a commit landing in master and going live in production. [2]

It’s really, really difficult to keep this massive monolithic codebase, hurtling along at hundreds of commits per day, from devolving into complete chaos. We want to make Instagram a place where all our engineers can be productive and ship useful features quickly!

This post is about how we’ve used linting and automated refactoring to help manage the scale of our Python codebase. In the next few weeks, we’ll share more details of other tools and techniques we’ve developed to manage other aspects of our codebase’s quality.

If you’re interested in trying out some of the ideas mentioned in this post, we recently open-sourced LibCST which serves as the heart of many of our internal linting and automated refactoring tools.

[1] For more about how we use Django, watch the talk “Django at Instagram” we gave at Django Under the Hood conference in 2016.
[2] For more on our continuous deployment, watch the talk “Releasing the world’s largest Python site every seven minutes” we gave at PyCon 2019.

Linting: Documentation On-Demand

Linting helps developers find and diagnose issues or anti-patterns they may not have known about as they encounter them. This is important for us, because with hundreds of engineers it becomes increasingly difficult to disseminate these ideas.

Linting is just one of many forms of static analysis that we use within Instagram. The naive way of implementing a lint rule is with a regular expression. Regular expressions are easy to write, but Python is not a regular language, and so it’s difficult (or sometimes impossible) to robustly match a pattern in the code with a regular expression.

On the other end of the spectrum, we have tools like mypy and Pyre, two Python static type checkers, which can do whole-program analysis. Instagram uses Pyre. These tools are powerful, but hard to extend and customize.

When we talk about linting at Instagram, we’re usually referring to simple abstract syntax tree based lint rules. This is how we typically write custom lint rules for Instagram’s server codebase.

When Python executes a module, it starts by running a parser over the source code. That implicitly creates a parse tree, a kind of concrete syntax tree (CST). This parse tree is a lossless representation of the input source code. Every detail, like comments, parenthesis, and commas are preserved in this tree. We could regenerate the original source code using this tree.

Python’s parse tree (a kind of concrete syntax tree), as generated by lib2to3.

Unfortunately, this creates a complex tree, making it hard to extract the semantics we care about.

Python compiles the parse tree into an abstract syntax tree, or AST. This is a lossy conversion, and details about “syntactic trivia”, like comments, parenthesis, and commas are thrown away. However, the semantics of the code are preserved.

Python’s abstract syntax tree, generated by the ast module.

At Instagram, we’ve developed LibCST, which provides the best of both worlds. It provides a lossless representation like a concrete syntax tree (CST), but still makes it easy to extract semantics like an abstract syntax tree (AST).

LibCST’s concrete syntax tree representation.

Our lint rules use LibCST’s syntax tree to match patterns in code. As a high level representation, the syntax tree is easy to inspect and removes the problem of dealing with a non-regular language.

Let’s say that you’ve got a circular dependency in your module due to a type-only import. Python lets you solve this by putting type-only imports behind an if TYPE_CHECKING: guard to avoid actually importing anything at runtime.

# value imports
from typing import TYPE_CHECKING
from util import helper_fn

# type-only imports
if TYPE_CHECKING:
    from circular_dependency import CircularType

Later, someone adds another type-only dependency to the code and guards it. However, they might fail to realize that there was already a type guard in the module.

# value imports
from typing import TYPE_CHECKING
from util import helper_fn

# type-only imports
if TYPE_CHECKING:
    from circular_dependency import CircularType

if TYPE_CHECKING:  # Whoops! Duplicate type guard!
    from other_package import OtherType

We can prevent this redundancy with a lint rule!

We’ll start by initializing a counter of type checking blocks we’ve encountered.

class OnlyOneTypeCheckingIfBlockLintRule(CstLintRule):
    def __init__(self, context: Context) -> None:
        super().__init__(context)
        self.__type_checking_blocks = 0

Then, when we encounter a type-checking conditional, we’ll increment the counter, and verify that there is no more than one type checking block. If this happens, we’ll generate a warning at that location by calling our report helper.

def visit_If(self, node: cst.If) -> None:
    if node.test.value == "TYPE_CHECKING":
        self.__type_checking_blocks += 1
        if self.__type_checking_blocks > 1:
            self.context.report(
                node,
                "More than one 'if TYPE_CHECKING' section!"
            )

These lint rules work by traversing LibCST’s tree, and collecting information. In our linter, this is done using the visitor pattern. As you may have noticed, rules override visit and leave methods associated with a node’s types. Those visitors get called in a specific order.

class MyNewLintRule(CstLintRule):
    def visit_Assign(self, node):
        ...  # called first

    def visit_Name(self, node):
        ...  # called for each child

    def leave_Assign(self, name):
        ...  # called after all children

Visit methods are called before any of the node’s children are visited. Leave methods are called after all of the node’s children are visited.

Our philosophy at Instagram is to “do the simple thing first”. Our first custom lint rules were implemented in a single file, with a single visitor, using shared state.

A single file, with a single visitor, using shared state.

The single visitor class had to be aware of the state and logic of all of our unrelated lint rules, and it wasn’t always clear what state corresponded with what rule. This works fine when you’ve got a few custom lint rules, but we have nearly a hundred custom lint rules, which made the single-visitor pattern unmaintainable.

It’s difficult to figure out what state and logic is associated with each check being performed.

Of course, one possible solution to this problem is to define multiple visitors and to have each visitor re-traverse the entire tree each time. However, this would incur a large performance penalty, and the linter must remain fast.

Each lint rule can re-traverse the tree, executing in sequence over the file. However, this frequent re-traversal would incur a large performance penalty.

Instead, we took inspiration from linters in other language ecosystems, like JavaScript’s ESLint, and developed a centralized visitor registry.

A centralized visitor registry. We can efficiently determine which nodes each lint rule cares about, saving time on nodes they don’t care about.

When a lint rule is initialized, all of the rule’s method overrides are stored in the registry. When we traverse the tree, we look up all of the registered visitors and call them. If a method is not implemented, we don’t need to call it.

This reduces the computation cost of every new lint rule. Though we usually run the linter over the small number of recently changed files, we can run all of our new lint rules in parallel over Instagram’s whole server codebase in just 26 seconds.

Once we had a performant framework in place, we built a testing framework that works to enforce best practices by requiring tests for both false-positives, and false-negatives.

class MyCustomLintRuleTest(CstLintRuleTest):
    RULE = MyCustomLintRule
    
    VALID = [
        Valid("good_function('this should not generate a report')"),
        Valid("foo.bad_function('nor should this')"),
    ]
    
    INVALID = [
        Invalid("bad_function('but this should')", "IG00"),
    ]

Lint Fatigue

With nearly a hundred custom rules, pedantic lints can quickly turn into a waste of time for developers. Spending time fixing style nits and deprecated coding patterns gets in the way of more important progress.

We’ve found that with too many nags engineers start to ignore all lints, even the important ones. At a certain point, it doesn’t matter what good advice we present, it just gets ignored.

Let’s say we needed to deprecate a function named ‘fn’ for a better named function called called ‘add’. Unless developers are made aware of the fact that ‘fn’ is deprecated, they won’t know not to use it. Even worse, they won’t know what to use instead. So, we can create a lint. But any sufficiently large codebase is bound to have plenty of other lints already. Chances are that this important lint will get lost in the noise.

With too many nitpicks, the signal can get lost in the noise.

So, what can we do about it?

We can automatically fix many issues found by lint. Much like lint itself is documentation on demand, auto-fixers can provide fixes on-demand. Given the sheer number of developers at Instagram, it’s not feasible to teach each developer all of our best practices. Adding auto-fixers allows us to educate and on-board developers to new best practices on-demand. Auto-fixers also allow us to preserve developer focus by removing monotonous changes. Essentially, auto-fixers are more actionable and educational than simple lint warnings.

So, how do you build an auto-fixer? Syntax tree based linting gives us the offending node. There’s no need to duplicate discovery logic since the lint rule itself already exists! Since we know the node we want to replace, and we know its location in source, we can replace it with the updated function name safely! This works great for fixing individual lint violations as they are introduced, but when we introduce a new lint rule we may have hundreds of existing violations of it. Can we proactively resolve all the existing cases at once?

Codemods

A codemod is just a scriptable way to find issues and make changes to source code. Think of a codemod as a refactor on steroids: It can be as simple as renaming a variable in a function or as complex as rewriting a function to take a new argument. It uses the exact same concept as a lint, but instead of alerting the developer, it can take action automatically.

So, why would you write a codemod instead of a lint? In this example, we want to deprecate the use of get_global. We could use a lint, but the fix time is unbounded and spread across multiple developers. Even with an auto fixer in place, it can be some time before all code is upgraded.

We want to deprecate the use of get_global using instance variables instead.

To fix this, in addition to linting against get_global, we can also write a codemod! At Instagram, we believe that deprecated patterns and APIs left to slowly wither away take focus away from developers and reduce code readability. We’d rather proactively remove deprecated code than let it disappear over time. Given the sheer size of the code and number of active developers, this often means automating deprecations. If we can remove deprecated patterns from our code quickly, we keep all of Instagram productive.

Okay, so how do you actually codemod? How do you replace just the text you care about, while preserving comments, spacing, and everything else? You can use a concrete syntax tree like LibCST to surgically modify code while preserving comments and spacing. So, if I wanted to rename “fn” to “add” in the below tree, I would update the Name node to have the value “add” instead of “fn” and then write the tree back to disk!

We can perform a codemod by updating the Name node from “fn” to “add” before writing the tree back to disk. Check out LibCST’s documentation for a more detailed example.

Now that we know a bit about codemods, let’s look at a practical example. Instagram has been working hard on getting to a fully typed codebase, and we’ve made a lot of progress using codemods.

Given a body of untyped functions, we can try to generate return types by simple inference! For example, if a function returns only one primitive type, we assign it that return type. If a function returns the result of a boolean operation such as a comparison or “isinstance”, we assign it a bool return type. We found that in practice this is a pretty safe operation to perform across Instagram’s codebase.

Well, what if a function doesn’t explicitly return or it implicitly returns None? if a function doesn’t explicitly return a type, assign it a None type. Unlike the last example, this can be more dangerous due to common developer patterns. For example, I might throw a NotImplemented exception in a base class method and return a string in all subclass overrides. Its important to note that all of these techniques are a heuristic, but they turn out to be right often enough to be useful!

Enhancing Codemods with Pyre

Let’s take it a step further. Instagram uses Pyre, a full-program static type checker similar to mypy, to verify types in our codebase. What if we used Pyre’s output in order to make this codemod more powerful? As you can see in the below Pyre output snippet, Pyre gives us basically everything we need to fix up annotations automatically!

$ pyre
 ƛ Found 2 type errors!
testing/utils.py:7:0 Missing return annotation [3]: Returning `SomeClass` but no return type is specified.
testing/utils.py:10:0 Missing return annotation [3]: Returning `testing.other.SomeOtherClass` but no return type is specified.

When pyre runs it constructs a detailed analysis of control flow in each function. So, it can sometimes have a pretty good guess as to what an unannotated function should be returning. This means that if pyre thinks that a function returns a simple type, then we assign it that return type. However, now we potentially have to handle imports. Which means we have to know if something is already imported or is defined locally. I’ll touch a little bit on how we figure that out later.

What benefit do we get from automatically adding easy to infer types? Well, types are documentation! If a function is fully typed, developers don’t have to read the code to understand its pre- and post-conditions.

def get_description(page: WikiPage) -> Optional[str]:
    if page.draft:
        return None
    return page.metadata["description"]    # <- what type is this?

Lots of us have run into Python code like this, and Instagram’s code is no exception. If get_description was not typed, I might have to look at multiple source modules to figure out what exactly this function returns. Even for simpler functions that are easy to infer, function definitions with types are much easier for humans to read.

Also, Pyre doesn’t evaluate correctness inside a function body unless the function itself is fully annotated. In the below example, it would be nice to know that our call to some_function is going to crash before it gets to production.

def some_function(in: int) -> bool:
    return in > 0

def some_other_function():
    if some_function("bla"):  # <- should get a type violation
        print("Yay!")

In this case we will end up learning the hard way because some_other_function is missing a return annotation! If we had auto-inferred a “None” return type using our heuristic, we would find out about this issue before it causes a problem. This example is obviously contrived, but the issue remains an important one for Instagram. If you have millions of lines of code, you end up missing seemingly obvious problems like this during reviews.

At Instagram, these simple inferences got us an almost 10% boost to functions typed. That adds up to thousands upon thousands of functions that a human no longer has to fix up. The benefits of better typed code are obvious, but leads to yet another benefit: Having a fully-typed codebase unlocks even more advanced codemods.

If we trust our type annotations then we can use Pyre to unlock additional possibilities. Lets look back at our function renaming example. What if the function we’re renaming is a class method, not a global function?

If we pair type information we get from pyre with a rename codemod, we can suddenly fix up the call sites as well as the definition! In this example, since we know the left hand side of a.fn, we know that it is safe to rename to a.add.

More Advanced Static Analysis

Python has four scope types: global, class, function, and comprehension.

We can unlock more powerful codemods with scope analysis. Remember the problem from earlier, where adding type hints meant that we had to potentially add imports? If you have scope analysis then you know what types used in the file come from imports, which ones are defined locally, and which types are missing. Similarly, if you know that a global variable is shadowed by a function argument, you can avoid accidentally changing it if you are renaming that global variable.

One thing has become clear in our quest to fix all of the problems at Instagram: Finding the code we want to modify is often more important than the modification itself. Often, people want to do simple things, such as renaming a function, adding an argument to a method or splitting a module up. None of these is hard to do, but with the size of our codebase it becomes impossible for a human to find every line that needs modifying. That’s why pairing the ability to codemod with robust static analysis is so important. It allows us to narrow down what pieces of the code we should consider in order to make codemods safer and more powerful.

Many thanks to my team members: Jennifer Taylor (who co-wrote this article), Mark Vismonte, Jimmy Lai, Ray Zeng, Carl Meyer, Anirudh Padmarao, Keith Blaha, and Lee Bierman.

If you want to learn more about this work or are interested joining one of our engineering teams, please visit our careers page, follow us on Facebook or on Twitter.