Compiler/Interpreter Fuzzing

PyRTFuzz (CCS'23)

The Python runtime consists of the interpreter and the language's runtime libraries.
[CPython] Since 2008, more than 1,000 bug-related issues have been reported annually, and the number of bugs reported per year has consistently remained close to 2,000 in the last five years.
[CPython] Our analysis revealed that most bugs (86.8%) occurred in the Python runtime libraries, while the remaining 13.2% occurred in the Python interpreter core.
[CPython] Furthermore, out of 165 modules extracted from the CPython source code, 164 modules were found to have reported bugs.
Generating semantically and syntactically correct programs is already covered by prior work (see CodeAlchemist).
Remaining challenges: 1) existing approaches pay insufficient attention to how these runtime APIs are used; 2) they run the generated programs without varying inputs; 3) a comprehensive approach to testing the Python runtime should address both the interpreter core and the runtime libraries, as well as the interactions between the two.
Phase 1: Runtime API Description Extraction: Static extraction (AST) -> Untyped API description -> Dynamic refinement (unittest) -> Typed API description
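The Phase 1 pipeline can be illustrated with a minimal sketch. This is not PyRTFuzz's implementation: the sample module, the dictionary layout of an "API description", and both helper names are assumptions made for illustration. The static step walks the AST to list public functions and their parameter names (untyped); the dynamic step fills in types from one observed call, standing in for what running a unit test would reveal.

```python
import ast

# Hypothetical library source standing in for a runtime module.
SAMPLE_SOURCE = '''
def repeat(text, times=2):
    return text * times

def _private(x):
    return x

class Box:
    def put(self, item):
        self.item = item
'''


def extract_untyped_apis(source):
    """Statically collect public functions/methods and their parameter names."""
    tree = ast.parse(source)
    apis = {}

    def record(func, prefix=""):
        if func.name.startswith("_"):
            return  # skip private helpers
        params = [a.arg for a in func.args.args if a.arg != "self"]
        apis[prefix + func.name] = {p: None for p in params}  # None = type unknown

    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            record(node)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    record(item, prefix=node.name + ".")
    return apis


def refine_with_observed_call(apis, name, observed_args):
    """Fill in parameter types from one observed call (e.g. seen in a unit test)."""
    typed = dict(apis[name])
    for param, value in zip(typed, observed_args):
        typed[param] = type(value).__name__
    return typed


untyped = extract_untyped_apis(SAMPLE_SOURCE)
typed_repeat = refine_with_observed_call(untyped, "repeat", ("ab", 3))
```

Here `untyped` maps each public API to parameter names with unknown types, and `typed_repeat` records `str` and `int` for the two parameters of `repeat`.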
Phase 2: Specification generation (Basic (OO/PO) + Extend (While/For/If/Call/With)) -> Python code generation (top-down wrapping, opt for API coverage/APP diversity/APP validity, with seamless data transfer).
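One way to picture the code generation step in Phase 2 is a small generator that wraps an API call in the control-flow blocks named by a specification. The sketch below is an assumption-laden illustration, not the paper's generator: the templates, the list-of-block-kinds specification format, and `generate_app` are hypothetical, and only `for`, `if`, and `while` wrappers are shown. The specification lists blocks from the outermost to the innermost, and the generated function takes `data` as its input so that a fuzzer can vary it.

```python
import textwrap

# Hypothetical templates for some "extend" block kinds; {body} is the wrapped code.
TEMPLATES = {
    "for": "for _ in range(2):\n{body}",
    "if": "if data is not None:\n{body}",
    "while": "count = 0\nwhile count < 1:\n    count += 1\n{body}",
}


def generate_app(api_call, wrappers):
    """Wrap `api_call` in the listed blocks (outermost first) and emit a function."""
    code = "results.append(" + api_call + ")"
    # Build from the innermost block outward so wrappers[0] ends up outermost.
    for kind in reversed(wrappers):
        code = TEMPLATES[kind].format(body=textwrap.indent(code, "    "))
    body = "results = []\n" + code + "\nreturn results"
    return "def app(data):\n" + textwrap.indent(body, "    ") + "\n"


source = generate_app("len(data)", ["for", "if"])

# Compiling first checks that the generated program is syntactically valid.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
output = namespace["app"]("abc")
```

Because the `for` template loops twice and the `if` guard passes for a non-`None` input, `output` holds the API result twice.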
Phase 3: Instrumentation (C + Python code), Custom Mutations (of input values).
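The custom mutation of input values in Phase 3 could look roughly like the following sketch, which assumes that mutation is guided by the Python type of each value. The function name and the specific mutation choices (boundary integers, NUL bytes, long strings) are illustrative assumptions and are not taken from the paper.

```python
import random


def mutate_value(value, rng):
    """Return a mutated copy of `value` that keeps its Python type."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return not value
    if isinstance(value, int):
        return value + rng.choice([-1, 1, 2**31, -(2**31)])
    if isinstance(value, float):
        return value * rng.choice([-1.0, 0.0, 1e308])
    if isinstance(value, str):
        pos = rng.randrange(len(value) + 1)
        return value[:pos] + rng.choice(["\x00", "A" * 64, "\u00e9"]) + value[pos:]
    if isinstance(value, bytes):
        pos = rng.randrange(len(value) + 1)
        return value[:pos] + rng.choice([b"\x00", b"\xff" * 8]) + value[pos:]
    if isinstance(value, list):
        return [mutate_value(v, rng) for v in value]
    return value  # unknown types are passed through unchanged


rng = random.Random(0)  # seeded so a run can be reproduced
mutated = mutate_value(["ab", 3, True], rng)
```

Keeping the type stable means the mutated input still satisfies the typed API description, so more executions reach the library code instead of failing on an early type check.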

JSfunfuzz: 2007, industrial, generation-based, SpiderMonkey JavaScript engine
LangFuzz: 2012, Usenix Security
TreeFuzz: 2016, industrial, generation-based
JVM testing: 2016, PLDI
Skyfire: 2017, IEEE S&P, generation-based
Fuzzilli: 2018, mutation-based, JS engine
DeepSmith: 2018, ISSTA, machine-learning-based
JVM testing: 2019 ICSE
CodeAlchemist: 2019, NDSS, generation-based, JS engine, both semantically and syntactically correct
Superion: 2019, ICSE, mutation-based