The finalizer of an object is invoked as soon as the refcount of the
object becomes 0. There are at most thousands of ways to trigger finalizers.
Weak reference prevents objected from not being collected.
As soon as an exception is raised, the control flow goes to the main loop
via setjmp/longjmp without unwinding (causing memory leaks => refcount-based
GC won't work => it needs mark-and-sweep-based GC) or explicity error
propagation (refcount-based GC is working => no memory leaks).
Exceptions are handled before executing each bytecode (after stack unwinding).
try: finally: can have double returns.
Generators have a lot of exception handling issues because generators handle
heap-based stacks as well.
Type is defined by its behavior that is defined by magic methods (duck typing).
No security policy is enforced when using magic methods.
Language
interoperability:
Language interoperability is the capability of two different programming
languages to natively interact as part of the same system and operate on the
same kind of data structures. There are three methods for interoperability.
Object models: COM (COM is a very ambiguous term. It can be a COM
format, a COM
port, and Common Object
Model, a binary interface introduced by Microsoft); Virtual Machine (a
specialised intermediate language that different languages compile down to):
Java Virtual Machine; foreign function interfaces (FFI), that is in the format
of a wrapper library, also called bindings: JNI/JNA/SWIG.
Challenges when implementing language interoperability
Object model differences: e.g., permit multiple inheritance or not
Memory model differences: e.g., have garbage collector or not
Mutability: e.g., immutable or not
Favocado (NDSS'21): JS engines are embedded in commercial software, e.g.,
Adobe Acrobat Reader, Foxit PDF Reader, Chromium, and WebKit. Even without
considering the binding layers, it is difficult to effectively fuzz JavaScript
engines in the first place. To generate syntactically correct JavaScript code as
test cases, modern JavaScript engine fuzzers use context-free grammars or
existing semantically correct test cases. While no JavaScript fuzzers generate
fully semantically correct code as test cases, some fuzzers can generate test
cases in a semantic-aware manner. However, the percentage of rejected test cases
that are generated by these semantic-aware fuzzers is still a significant
problem. Another challenge for effectively fuzzing the binding layer is the
enormous input space, e.g., in Chromium, there are more than 1,000 DOM binding
objects. Each DOM object may have a multitude of methods and properties, some
of which may require hard-to-satisfy arguments such as other DOM objects.
Drawbacks of Favocado
From Minerva: Although Favocado also generates semantically-correct test
cases, its typebased relations introduce redundant relations between APIs.
Besides, its conservative generation strategy only choose a few interface
objects in each test case, hindering the fuzzer from state space
exploration.
From SAGE: Favocado extracts semantic information from browsers’ source
code, it doesn't make good use of them. Its semantic information is
maintained in a JSON-like structure instead of CFG, which make it difficult
to effectively generate highly-structured inputs. Additionally, its flawed
implementation makes it challenging to bypass syntactic checks. Favocado can
only explore a limited set of browser backend logic.
Cooper (NDSS'22):
In this paper, we propose cooperative mutation, which modifies both the
script code and the program native input to trigger bugs in binding code. We
develop three novel techniques to enable practical cooperative mutation on
popular scripting languages: we first cluster objects into semantics similar
classes to reduce the mutation space of native inputs; then, we
statistically infer the relationship between script code and object classes
based on a large number of executions; at last, we use the inferred
relationship to select proper objects and related script code for targeted
mutation. We applied our tool, COOPER, on three popular systems that
integrate scripting languages, including Adobe Acrobat, Foxit Reader and
Microsoft Word. COOPER successfully found 134 previously unknown bugs.
TypeOracle (ICSE'23):
Full name: Operand-Variation-Oriented Differential Analysis for Fuzzing
Binding Calls in PDF Readers
The parameter types of a binding call are inferred by executing the
binding call with different values of different types and investigating
which types cause an expected effect on the instruction operands. The
inferred type information is used to guide the test generation in fuzzing.
Gramatron (ISSTA'21)
A fuzzer must generate syntactically valid inputs to fuzz beyond the
application parser. Previous fuzzers employ parse trees with CFG for input
generation and mutation. Existing grammar-aware fuzzers are ineffective at
synthesizing complex bug triggers due to biased sampling and localized
small-scale mutations. Gramatron performs an automatic two-step
transformation on the CFG to create a grammar automaton. First, it
transforms the CFG into its GNF which performs the grammar restructuring.
Second, it converts the GNF of the grammar into an automaton. Gramatron can
encode any input that abides by the grammar as an automaton walk. Grammar
automatons restructure the grammar enabling the fuzzer to perform unbiased
sampling from the input state space. This enable the fuzzer to generate
inputs with higher diversity more frequently. We also redesign the mutation
operators to operate on grammar automatons and perform aggressive changes
efficiently to discover bugs with complex triggers.