Let your code type-hint itself: introducing open source MonkeyType

Instagram Engineering
Dec 14, 2017

Today we are excited to announce that we’re open-sourcing MonkeyType, our tool for automatically adding type annotations to your Python 3 code by tracing the types it sees at runtime.

Motivation

At Instagram we have hundreds of engineers working on well over a million lines of Python 3. Every day we have new engineers joining the team from other projects and other languages who need to ramp up quickly and get productive in our codebase. And we’re constantly shipping new code into production, every few minutes, all day long, every day. So we’re keen to make our code easier for new developers to read and understand, as well as more amenable to static analysis that shrinks the domain of possible bugs. Type annotations and static type checking fit that bill.

Writing new code with type annotations is easy enough; most of our engineers are eager to do that. But the returns on static type checking are low until we reach a critical mass of type-annotated code, especially in our core frameworks and libraries. In other words: we have a lot of existing code that needs type annotations added.

Our first forays into manually adding type annotations were discouraging. It can take hours to annotate a single module, sometimes painstakingly tracing through multiple layers of function calls and objects to understand the possible types at some call site. (This is, of course, the same pain that anyone trying to maintain that function might experience; that’s why we want to add type annotations!)

So we built MonkeyType. Instead of guessing or spelunking for the right types, let your test suite or (better!) your production system tell you what the real types are.

Usage

Sounds great! I’ve run pip install monkeytype. What's next?

Before MonkeyType can tell us anything useful, we need to let it trace some function calls. The simplest way to do this is with monkeytype run, which runs any Python script under MonkeyType tracing. For instance, you can easily run your test suite under MonkeyType:

$ monkeytype run runtests.py

(Or monkeytype run `which pytest`, or whichever test runner you prefer.)

While your tests are running, MonkeyType inspects the argument types and return/yield type of every function call, recording them in a database. (By default it keeps them in a local SQLite database, but like just about everything MonkeyType does, this is configurable.)
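To make the idea concrete, here is a minimal standard-library sketch of recording argument and return types through a profile hook and storing them in SQLite. It is not MonkeyType's implementation: the names open_store and record_types and the table schema are hypothetical, and the sketch ignores generators and cannot tell an exception apart from a function returning None.

```python
import json
import sqlite3
import sys


def open_store(path=":memory:"):
    """Create a SQLite table for call traces (illustrative schema only)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traces "
        "(func TEXT, arg_types TEXT, return_type TEXT)"
    )
    return conn


def record_types(conn, func, *args, **kwargs):
    """Run func under a profile hook, storing one row per traced Python call."""
    pending = {}  # frame -> {argument name: type name}, filled on "call"

    def hook(frame, event, arg):
        code = frame.f_code
        if event == "call":
            # Positional arguments are already bound in f_locals at call time.
            names = code.co_varnames[:code.co_argcount]
            pending[frame] = {
                n: type(frame.f_locals[n]).__name__
                for n in names
                if n in frame.f_locals
            }
        elif event == "return" and frame in pending:
            # For "return" events, arg is the value being returned.
            arg_types = pending.pop(frame)
            conn.execute(
                "INSERT INTO traces VALUES (?, ?, ?)",
                (code.co_name, json.dumps(arg_types), type(arg).__name__),
            )

    previous = sys.getprofile()
    sys.setprofile(hook)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(previous)  # always restore the earlier hook


def add_tax(price, rate):
    return price * (1 + rate)
```

Calling record_types(open_store(), add_tax, 100, 0.2) would leave a row in the traces table describing add_tax as taking an int and a float and returning a float.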

Of course, your test suite may not provide the best type information — sometimes tests use fakes instead of the real types, and we’ve found plenty of cases where type checking revealed that our tests were accidentally passing in different types from production. So if you don’t want to annotate based on your test suite, you can record call traces from production runtime. For this use case, MonkeyType provides a context manager API:

from monkeytype import trace

with trace():
    # ...

If you need even more flexibility, you can create your own CallTracer, install it with sys.setprofile(), and remove it when you're ready.

Once you've got some call traces recorded, you can generate a stub file for any module:

$ monkeytype stub some.module
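As a rough illustration of how recorded traces can become a stub signature, the sketch below merges the types seen across several calls into one annotation per argument. The helpers render_type and render_stub are hypothetical and much simpler than what MonkeyType actually does; they only show the general shape of the step.

```python
def render_type(types):
    """Render the set of observed types as an annotation string."""
    names = sorted(
        "None" if t is type(None) else t.__name__ for t in set(types)
    )
    if len(names) == 1:
        return names[0]
    if "None" in names and len(names) == 2:
        other = [n for n in names if n != "None"][0]
        return f"Optional[{other}]"
    return "Union[" + ", ".join(names) + "]"


def render_stub(func_name, traces):
    """Render a one-line stub signature from (arg_types, return_type) traces."""
    arg_types = {}  # argument name -> observed types, in signature order
    returns = []
    for args, ret in traces:
        for name, typ in args.items():
            arg_types.setdefault(name, []).append(typ)
        returns.append(ret)
    params = ", ".join(
        f"{name}: {render_type(types)}" for name, types in arg_types.items()
    )
    return f"def {func_name}({params}) -> {render_type(returns)}: ..."


# Two hypothetical traces of the same function with differing types.
example_traces = [
    ({"user_id": int, "name": str}, bool),
    ({"user_id": str, "name": str}, type(None)),
]
print(render_stub("find", example_traces))
```

The more varied the recorded calls, the wider the resulting unions, which is one reason traces from production tend to give more honest annotations than traces from tests alone.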

If the stub looks reasonable and you want to apply the annotations directly to your code, MonkeyType will do that too:

$ monkeytype apply some.module

Review the now-annotated code in some/module.py, correct the annotations if needed, and commit it!

Because of the backing-store design, you don't have to record traces and annotate all in one go. You can collect traces into the database over a long period of time and gradually annotate more and more modules from the collected data as you're ready to do so.

With configurable type rewriters, you can easily tweak MonkeyType’s generated type annotations for your preferred type annotation style or specific edge cases in your codebase.
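The idea behind a type rewriter can be sketched with plain functions that take a type and return a (possibly simplified) type. This example does not use MonkeyType's own rewriter classes or configuration hooks; the function names and the max_size threshold are assumptions made for illustration.

```python
from typing import Any, Union, get_args, get_origin


def rewrite_large_union(typ, max_size=3):
    """Replace a Union with more than max_size members by Any."""
    if get_origin(typ) is Union and len(get_args(typ)) > max_size:
        return Any
    return typ


def numeric_union_to_float(typ):
    """Collapse Union[int, float] to float, a common style preference."""
    if get_origin(typ) is Union and set(get_args(typ)) == {int, float}:
        return float
    return typ


def apply_rewriters(typ, rewriters):
    """Apply each rewriter in order, feeding the result to the next one."""
    for rewrite in rewriters:
        typ = rewrite(typ)
    return typ


rewriters = [rewrite_large_union, numeric_union_to_float]
simplified = apply_rewriters(Union[int, float], rewriters)
```

Chaining small single-purpose rewriters keeps each rule easy to test and lets a codebase pick only the ones that match its annotation style.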

For lots more details on customization options, refer to the documentation.

Open Source

Check out the code, give it a spin, and let us know what you think! We’re looking forward to your bug reports and suggested improvements. If you’d like to contribute, we have a list of suggested starter tasks in the issue tracker.

MonkeyType does require Python 3.6+, and generates only Python 3 style type annotations (no type comments). If you aren’t quite there yet, you may want to start with our PyCon 2017 keynote on how we migrated those million lines of code to Python 3 last year.

How we use MonkeyType at Instagram

We choose a small random sample of production web requests and turn on MonkeyType tracing for them via a Django middleware. Traces are stored in and retrieved from SCUBA, Facebook’s data analysis platform, using a custom CallTraceStore. We run tracing constantly, so we are always adding new type traces from production. Since production code changes frequently, this keeps our traces up to date.
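The sampling pattern can be sketched without Django as a middleware factory that wraps a request handler and enables tracing for a random fraction of requests. This is not Instagram's middleware: sampling_middleware, fake_trace, the sample_rate default, and the injectable rng are all hypothetical. In a real setup the trace argument would be a tracing context manager such as the trace() API shown earlier, configured to write to your own store.

```python
import random
from contextlib import contextmanager

traced_requests = []  # stand-in for a real trace store


@contextmanager
def fake_trace(request):
    """Hypothetical stand-in for a real tracing context manager."""
    traced_requests.append(request)
    yield


def sampling_middleware(get_response, sample_rate=0.01,
                        trace=fake_trace, rng=random.random):
    """Wrap get_response so roughly sample_rate of requests run under trace."""

    def middleware(request):
        if rng() < sample_rate:
            with trace(request):
                return get_response(request)
        # Unsampled requests take the normal path with no tracing overhead.
        return get_response(request)

    return middleware
```

Keeping the sample rate small limits the overhead tracing adds to production traffic, while running continuously still accumulates a broad set of traces over time.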

Whenever an engineer wants to add type annotations to a module, they just run monkeytype stub and then monkeytype apply, fix any type errors revealed by the new annotations, and submit a diff!

Results

With MonkeyType’s help, we’ve annotated over a third of the functions in our codebase, and we’re already seeing type checking catch many bugs that would otherwise likely have shipped to production. Race you to 100%!

Carl Meyer is an engineer on Instagram’s infrastructure team.
