Hardening Attack Surfaces with Formally Proven Binary Format Parsers

Nikhil Swamy (RiSE: Research in Software Engineering, Microsoft Research, Redmond, WA, USA)
Tahina Ramananandro (RiSE)
Aseem Rastogi (Microsoft Research, Bengaluru, Karnataka, India)
Irina Spiridonova (MSR Central Engineering, Microsoft Research, Redmond, WA, USA)
Haobin Ni (Department of Computer Science, Cornell University, Ithaca, NY, USA)
Dmitry Malloy (Microsoft, Redmond, WA, USA)
Juan Vazquez (Microsoft)
Michael Tang (Microsoft)
Omar Cardona (Microsoft)
Arti Gupta (Microsoft)

PLDI 2022

Final version at the ACM Digital Library
Presentation by Tahina Ramananandro at PLDI 2022
- PDF Slides
Paper preprint
GitHub repository (EverParse)

With an eye toward performance, interoperability, or legacy concerns, low-level system software often must parse binary encoded data formats. Few tools are available for this task, especially since the formats involve a mixture of arithmetic and data dependence, beyond what can be handled by typical parser generators. As such, parsers are written by hand in languages like C, with inevitable errors leading to security vulnerabilities.

Addressing this need, we present EverParse3D, a parser generator for binary message formats that yields performant C code backed by fully automated formal proofs of memory safety, arithmetic safety, functional correctness, and even double-fetch freedom to prevent certain kinds of time-of check/time-of-use errors. This allows systems developers to specify their message formats declaratively and to integrate correct-by-construction C code into their applications, eliminating several classes of bugs.

EverParse3D has been in use in the Windows kernel for the past year. Applied primarily to the Hyper-V network virtualization stack, the formats of nearly 100 different messages spanning four protocols have been specified in EverParse3D and the resulting formally proven parsers have replaced prior handwritten code. We report on our experience in detail.