Paralegal’s dependency graphs are deigned to soundly reflect the dependency relationships of safe Rust. The analysis is a modified version of Flowistry (Crichton et.al. from PLDI 2022) and a formal proof of soundness for Flowistry is provided in their paper. The following paragraphs describe the Semantic Program Dependence Graph (SPDG), created with the Flowistry analysis, and its soundness properties. The descriptions are simplified for terseness with no loss in applicability.
Soundness: Both Flowistry and Paralegal statically model dependency relationships of data in a Rust program. Such a dependency relation is sound if each “true” dependency between any two data $a$ and $b$ in the program is reflected in the model. A dependency is “true” if there exists at least one execution trace where a change in the value of only $a$ would entail a change in the value of $b$.
Completeness: Analogously, the dependency model is complete if each “true” independence is reflected in the model.
SPDG: is a program dependence graph (PDG) that models the control- and data-flow of a program as nodes, representing program locations, and edges, representing dependencies, respectively labeled “control” or “data”. In the SPDG nodes additionally carry a set of semantically meaningful markers which the policies reference.
A snippet of the delete controller for the Plume web application
Paralegal’s extracted SPDG for the snippet above.
The nodes of Paralegal’s SPDG are call paths that uniquely identify a function call site relative to the entry point of the analysis. For visual aid a short example is shown on the left.
The SPDG captures both data and control flow and as a result has two types of edges. $n_1 \xrightarrow{\mathsf{data}} n_2$ describes a direct data dependency between the nodes $n_1$ and $n_2$ (blue in the example), meaning there is a value produced by $n_1$, some part of which which is received by $n_2$ with no other function call $n_3$ modifying it in between. Control flow edges, $n_1 \xrightarrow{\mathsf{control}} n_2$ (green in the example), mean that a value created by $n_1$ directly determines whether $n_2$ executes.
There are two more node types in the SPDG, a singleton “return” type node that represents what is returned from the function serving as entry point to the analysis and a “argument” type nodes that represent inputs to the entry point function (if any). For the purposes of soundness, we can disregard these two node types, as they act the same as call site nodes. Currently all non-function call operations are represented in the SPDG only as their data and control flow influence, i.e., as edges.
Paralegal’s SPDG construction is underpinned by the same reasoning employed in Flowistry. Whereas Flowistry constructs a matrix that associates each program location with all of its transitive dependencies however, Paralegal constructs a PDG that associates only locations of function calls with their direct dependencies. We have not yet formally proven that the same Soundness guarantees for safe Rust that was proven for Flowistry transfer to Paralegals PDG construction. As such this section discusses the soundness guarantees that Paralegal is engineered to achieve and we defer their formal proof to future work
With respect to a value created by a node $n_s$ the SPDG:
Soundness is only guaranteed for effects expressed in safe Rust or captured by type signatures and only within the program. In particular the following compromise soundness:
Data structures such as RefCell
and Mutex
allow mutation through immutable references. Those “true” dependencies are not captured.
let rc = RefCell::new(src);
let bm = rc.borrow_mut();
*bm = unseen;
read(rc.borrow());
In this small example the “true” influence from the unseen
value on read()
is lost (dotted line).
graph TD
src -->|"new()"| rc
rc -->|"borrow_mut()"| bm
unseen --> bm
src -->|"borrow()"| read["read()"]
bm -..-> read