probably.pgcl.ast: the abstract syntax tree (AST) and all the types that define it.
It is also possible to calculate weakest pre-expectations of programs (see probably.pgcl.wp) for some specific types of programs.
>>> from probably.pgcl.compiler import compile_pgcl
>>> code = '''# this is a comment
... # A program starts with the variable declarations.
... # Every variable must be declared.
... # A variable is either of type nat or of type bool.
... bool f;
... nat c; # optional: provide bounds for the variables. The declaration then looks as follows: nat c [0,100]
...
... # Logical operators: & (and), || (or), not (negation)
... # Operator precedences: not > & > ||
... # Comparing arithmetic expressions: <=, <, ==
...
... skip;
...
... while (c < 10 & f) {
...     {c := unif(1,10)} [0.8] {f:=true}
... }'''
>>> compile_pgcl(code)
Program(variables={'f': BoolType(), 'c': NatType(bounds=None)}, constants={}, parameters={}, instructions=[SkipInstr(), WhileInstr(cond=BinopExpr(operator=Binop.AND, lhs=BinopExpr(operator=Binop.LT, lhs=VarExpr('c'), rhs=NatLitExpr(10)), rhs=VarExpr('f')), body=[ChoiceInstr(prob=RealLitExpr("0.8"), lhs=[AsgnInstr(lhs='c', rhs=DUniformExpr(start=NatLitExpr(1), end=NatLitExpr(10)))], rhs=[AsgnInstr(lhs='f', rhs=BoolLitExpr(True))])])])
This module provides an API that combines parsing and type-checking all at once.
It is just a thin wrapper around the parser, type-checker, and constant substitution.
For our AST types, most “sum types” have a normal Python superclass (e.g. TypeClass for types), but also a Union type (e.g. Type).
Prefer using the Union type, as it allows for exhaustiveness checking by mypy.
In some rare cases, it’s necessary to tell mypy that an object of a superclass type is actually in the respective union.
Use cast methods, e.g. InstrClass.cast() to go from InstrClass to Instr.
Also note that isinstance checks must be done against the superclasses, because Python’s Union does not yet work with isinstance.
Bounds are only preserved for variables.
Values of bounded types are considered as unbounded until they are assigned to a bounded variable.
That is to say, bounds are lost in expressions such as in the example below:
>>> from probably.pgcl.parser import parse_pgcl, parse_expr
>>> from probably.pgcl.check import get_type
>>> program = parse_pgcl("nat x [1,5]")
>>> get_type(program, parse_expr("x + 5"))
NatType(bounds=None)
In the AST, a bunch of different things that look like program expressions are
lumped together. Formally, let a state \(\sigma \in \Sigma\) consist of a
value for each variable, represented by a function of type
\(\sigma : \text{Var} \to \text{Value}\).
First, there are state expressions that, given a state \(\sigma\), compute
some value to be used later in the program itself: \(\Sigma \to
\text{Value}\). There are two types of expressions we call monadic expressions:
UniformExpr and CategoricalExpr. They map a state to a
distribution of values: \(\Sigma \to \text{Dist}[\Sigma]\). And
expectations are also expressed using Expr, but they are actually a
mapping of states to expected values: \(\Sigma \to \mathbb{R}\).
A decimal literal (used for probabilities) is an expression.
It is represented by either a Decimal (created from decimal
literals), or by a Fraction (created from a fraction of natural numbers).
Infinity is represented by Decimal('Infinity').
Warning
Note that the Decimal representation is not exact under arithmetic operations.
That is fine if it is used just as the representation of a decimal literal.
For calculations, please use to_fraction().
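The difference is easy to observe with Python's standard library alone (this is an independent illustration, not part of probably's API):

```python
from decimal import Decimal
from fractions import Fraction

# Decimal arithmetic runs with finite precision (28 significant digits
# by default), so division introduces rounding that does not cancel out.
d = Decimal("1") / Decimal("3") * Decimal("3")
print(d == Decimal("1"))  # False: the result is 0.999...9, not 1

# Fraction arithmetic is exact.
f = Fraction(1, 3) * 3
print(f == 1)  # True
```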
Return a list of all recursive operands of the same operator.
This really only makes sense if the operator is associative (see
Binop.is_associative()). Throws an error if the operator is not
associative.
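A minimal sketch of such a flattening, written on a hypothetical tuple encoding of binary expressions rather than probably's AST classes:

```python
def flatten(op, expr):
    """Collect all recursive operands of the associative operator `op`.

    An expression here is either a leaf value or a tuple (op, lhs, rhs).
    """
    if isinstance(expr, tuple) and expr[0] == op:
        return flatten(op, expr[1]) + flatten(op, expr[2])
    return [expr]

# ((a + b) + (c + d)) flattens to [a, b, c, d]
expr = ("+", ("+", "a", "b"), ("+", "c", "d"))
print(flatten("+", expr))  # ['a', 'b', 'c', 'd']
```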
Chooses a random integer within the (inclusive) interval.
As monadic expressions (see Expressions), uniform choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
Return the distribution of possible values as a list along with
probabilities. For the uniform distribution, all probabilities are equal
to \(\frac{1}{\text{end} - \text{start} + 1}\).
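As a sketch, the same distribution can be tabulated with exact fractions (the function name here is illustrative, not probably's API):

```python
from fractions import Fraction

def uniform_distribution(start, end):
    """List (value, probability) pairs of the discrete uniform
    distribution on the inclusive interval [start, end]."""
    prob = Fraction(1, end - start + 1)
    return [(value, prob) for value in range(start, end + 1)]

dist = uniform_distribution(1, 4)
print(dist)  # each of 1..4 paired with Fraction(1, 4)
print(sum(p for _, p in dist) == 1)  # probabilities sum to one
```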
Chooses a random real number within the (inclusive) interval.
As monadic expressions (see Expressions), uniform choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
As monadic expressions (see Expressions), geometric choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
Chooses a random geometrically distributed integer.
As monadic expressions (see Expressions), geometric choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
As monadic expressions (see Expressions), geometric choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
Chooses a random logarithmically distributed integer.
As monadic expressions (see Expressions), logarithmic choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
Chooses one of a list of expressions to evaluate, where each expression has
some assigned probability. The sum of probabilities must always be exactly one.
Each choice is represented by an (expression, probability) pair.
As monadic expressions (see Expressions), categorical choice
expressions are only allowed as the right-hand side of an assignment
statement and not somewhere in a nested expression.
A substitution expression that applies a mapping from variables to expressions
to the target expression.
An important invariant is that all substitutions must be well-typed, i.e.
assign expressions to variables of the same type. Thus the expression with
the substitutions applied has the same type as the expression before.
It is not available in the pGCL program language, but can be generated by
program transformations to represent expectations, such as the weakest
preexpectation of a program (see probably.pgcl.wp).
An instruction that does not modify the program state, but only increases
the runtime by the value of the expression in the current state. Its only
use is its translation to TickExpr by weakest pre-expectations.
Updates the current distribution according to the observation (forward analysis only).
May result in an error if the observed condition has probability zero.
Want to calculate the weakest pre-expectation of a program?
You’re in the right module! You can read all about weakest pre-expectations of
probabilistic programs in [kam19].
The basic function is loopfree_wp(). It calculates weakest
pre-expectations of loop-free programs. one_loop_wp() can calculate
weakest pre-expectations of programs that consist of exactly one loop (see
One Big Loop). general_wp() is applicable to all pGCL programs, but
can produce extraordinarily ugly outputs.
Therefore the weakest pre-expectation of a program with tick instructions
can be obtained by simply ignoring all probably.pgcl.ast.TickExpr in
the returned expectation, i.e. replacing them all by zero.
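The idea of erasing tick terms can be sketched on a toy tuple encoding of expressions (illustrative names, not probably's AST or walk API):

```python
def drop_ticks(expr):
    """Replace all tick terms by zero. An expression here is a number,
    ("tick", e) for a runtime bookkeeping term, or ("+", lhs, rhs)."""
    if not isinstance(expr, tuple):
        return expr
    if expr[0] == "tick":
        return 0  # ignore the runtime contribution entirely
    op, lhs, rhs = expr
    return (op, drop_ticks(lhs), drop_ticks(rhs))

print(drop_ticks(("+", 5, ("tick", 1))))  # ('+', 5, 0)
```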
At the moment, the returned expression is a tree and not a DAG.
Subexpressions that occur multiple times are deepcopied, even though that
is not strictly necessary. For example, jump:=unif(0,1);t:=t+1
creates an AST where the second assignment occurs twice, as different Python objects,
even though the two substitutions generated by the uniform expression could reuse it.
We do this because the probably.pgcl.substitute module cannot yet handle non-tree ASTs.
Wraps an expectation \(f\) (represented by
probably.pgcl.ast.Expr) and holds a reference to a variable
\(v\). Together they represent an expectation transformer, a function
\(\Phi : \mathbb{E} \to \mathbb{E}\) mapping expectations to
expectations.
An expectation is applied (\(\Phi(g)\)) by replacing \(f\) by
\(g\).
Transform the given expectation with this expectation transformer.
The result is the modified internal expression with the variable
replaced by the post-expectation, i.e. self.expectation.
Calling this method will change this object!
If substitute is set to False, then the expectation will not
be copied, but the same reference will be reused. Take care to
deepcopy the result before running any further substitutions!
The \(\mathrm{wp}\) semantics of loops require a least fixed-point, which is
undecidable to compute in general. We do not represent least fixed-points explicitly in
our AST (see probably.pgcl.ast), but instead represent the weakest
pre-expectation transformer of a single loop (without any more nested loops)
with optional initialization by a special LoopExpectationTransformer.
For such programs with exactly one loop, one_loop_wp() can calculate the
LoopExpectationTransformer.
The expectation transformer for a pGCL program with exactly one loop and
optional initialization assignments before the loop.
See one_loop_wp().
A loop’s expectation transformer is represented by the initialization
assignments themselves, an expectation transformer for the body, and a
done term which is the expectation for loop termination.
Note that done is not represented by an expectation transformer, but
rather just an expectation (probably.pgcl.ast.Expr). This is
because the term in the weakest pre-expectation semantics for termination of
the while loop never contains the post-expectation \(f\) (it is just
multiplied with it). A simpler representation allows for more convenient
further use. The \(\mathrm{wp}\) semantics of a while loop are shown
below:
Calculate the weakest pre-expectation transformer of a program that consists
of exactly one while loop with some optional assignments at the beginning.
That means the only supported programs are of the following form:
Linear arithmetic expressions \(e \in \mathsf{AE}_\text{lin}\) adhere to the grammar
\[\begin{aligned}
e \rightarrow &\quad n \in \mathbb{N} \quad && \text{\small{}(constants)} \\
\mid &\quad x \in \mathsf{Vars} && \text{\small{}(variables)} \\
\mid &\quad e + e && \text{\small{}(addition)} \\
\mid &\quad e \dot{-} e && \text{\small{}(monus)} \\
\mid &\quad n \cdot e && \text{\small{}(multiplication by constants)}
\end{aligned}\]
Monus means subtraction truncated at zero [4]. In Probably, we just use
probably.pgcl.ast.Binop.MINUS (and do not distinguish between minus
and monus).
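In a language with ordinary signed integers, monus can be sketched as subtraction clamped at zero:

```python
def monus(a: int, b: int) -> int:
    """Monus: subtraction truncated at zero, so the result
    stays within the natural numbers."""
    return max(a - b, 0)

print(monus(5, 3))  # 2
print(monus(3, 5))  # 0, not -2
```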
Linear Boolean expressions \(\varphi \in \mathsf{BE}_\text{lin}\) adhere to the grammar
\[\begin{aligned}
\varphi \rightarrow &\quad e < e \quad && \text{\small{}(strict inequality of arithmetic expressions)} \\
\mid &\quad e \leq e && \text{\small{}(inequality of arithmetic expressions)} \\
\mid &\quad e = e && \text{\small{}(equality of arithmetic expressions)} \\
\mid &\quad \varphi \land \varphi && \text{\small{}(conjunction)} \\
\mid &\quad \varphi \lor \varphi && \text{\small{}(disjunction)} \\
\mid &\quad \neg \varphi && \text{\small{}(negation)} \\
\mid &\quad \mathsf{true} \\
\mid &\quad \mathsf{false}
\end{aligned}\]
Linear pGCL programs consist of statements which only ever use linear
arithmetic and linear Boolean expressions. Probabilistic choices must be done
with a constant probability expression.
The set \(\mathsf{Exp}_\text{lin}\) of linear expectations is given by the grammar
\[\begin{aligned}
f \rightarrow &\quad e\quad && \text{\small{}(arithmetic expression)} \\
\mid &\quad \infty && \text{\small{}(infinity)} \\
\mid &\quad r \cdot f && \text{\small{}(scaling)} \\
\mid &\quad [\varphi] \cdot f && \text{\small{}(guarding)} \\
\mid &\quad f + f && \text{\small{}(addition)}
\end{aligned}\]
where \(e \in \mathsf{AE}_\text{lin}\) is a linear arithmetic expression,
\(r \in \mathbb{Q}_{\geq 0}\) is a non-negative rational, and where
\(\varphi \in \mathsf{BE}_\text{lin}\) is a linear Boolean expression.
>>> from .parser import parse_pgcl
>>> program = parse_pgcl("bool b; while (true) { }")
>>> check_is_linear_program(program)
>>> program = parse_pgcl("while (true) { x := 2 * x }")
>>> check_is_linear_program(program)
>>> program = parse_pgcl("while (true) { x := x * x }")
>>> check_is_linear_program(program)
CheckFail(location=..., message='Is not a linear expression')
Linear expressions do not multiply variables with each other.
However, they may contain multiplication with constants or Iverson brackets.
Division is also not allowed in linear expressions.
Parameters:
context (Optional[Program]) – The context in which the expression is to be evaluated. Literals that are
parameters according to this context are not considered variables. Pass None if
no context is required. If the context is not None, it must not contain any constants
(see probably.pgcl.substitute).
>>> from .ast import *
>>> from .parser import parse_expectation
>>> nat = NatLitExpr(10)
>>> check_is_linear_expr(None, BinopExpr(Binop.TIMES, nat, nat))
>>> check_is_linear_expr(None, BinopExpr(Binop.TIMES, nat, VarExpr('x')))
>>> check_is_linear_expr(None, BinopExpr(Binop.TIMES, VarExpr('x'), VarExpr('x')))
CheckFail(location=..., message='Is not a linear expression')
>>> check_is_linear_expr(None, parse_expectation("[x < 6] * x"))
>>> check_is_linear_expr(None, parse_expectation("[x * x]"))
CheckFail(location=..., message='Is not a linear expression')
>>> check_is_linear_expr(None, parse_expectation("x/x"))
CheckFail(location=..., message='General division is not linear (division of constants is)')
Check whether the program consists of one big while loop with optional
assignment statements before the loop.
Every program can be converted into a program with just one big loop and a bunch
of initialization assignments before the loop using
probably.pgcl.cfg.program_one_big_loop().
>>> from.parserimportparse_pgcl>>> program=parse_pgcl("")>>> check_is_one_big_loop(program.instructions)CheckFail(location=..., message='Program must contain exactly one loop')>>> program=parse_pgcl("while (true) { while (true) { } }")>>> check_is_one_big_loop(program.instructions)CheckFail(location=..., message='Program must contain exactly one loop')>>> program=parse_pgcl("x := 5; while (true) { x := 2 * x }")>>> check_is_one_big_loop(program.instructions)isNoneTrue>>> program=parse_pgcl("x := 5; while (true) { x := 2 * x }")>>> check_is_one_big_loop(program.instructions,allow_init=False)CheckFail(location=..., message='Program must contain exactly one loop')
For most of the grammar, a simple run of the Lark parser suffices. The only
slightly tricky part is parsing of categorical choice expressions. To avoid
ambiguous grammar, we parse those as normal expressions of PLUS operators.
Individual weights attached to probabilities are saved in a temporary
LikelyExpr type which associates a single expression with a probability. Later
we collect all LikelyExpr and flatten them into a single
CategoricalExpr. LikelyExpr never occur outside of this parser.
Utilities to traverse (“walk”) an AST, i.e. visiting each node in the AST.
There are two directions (bottom-up and top-down) which you can select using Walk.
We also have probably.util.ref.Ref and probably.util.ref.Mut to represent mutable references to values.
These allow a walk of an AST while modifying it.
E.g. constant folding can be implemented as a bottom-up walk where each callback simplifies the current node.
A simple example with walk_expr() is below.
The returned iterable contains Mut[Expr] objects.
Use probably.util.ref.Mut.val to access the current node.
This framework allows easy and flexible traversal of ASTs while also changing direction along the way: Using probably.util.ref.Mut, we can modify the AST and then continue traversal with the new AST.
This makes it hard to forget a recursive traversal call, which could easily happen if every traversal were implemented manually.
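As a self-contained illustration of such a bottom-up rewrite, here is constant folding on a tiny tuple encoding of expressions (standing in for probably's AST classes and Mut references):

```python
def fold_constants(expr):
    """Bottom-up constant folding. An expression here is an int
    literal, a variable name, or a tuple (op, lhs, rhs)."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    # visit the children first (bottom-up), then simplify this node
    lhs, rhs = fold_constants(lhs), fold_constants(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs if op == "+" else lhs * rhs
    return (op, lhs, rhs)

# (2 + 3) * x simplifies to 5 * x
print(fold_constants(("*", ("+", 2, 3), "x")))  # ('*', 5, 'x')
```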
Check if the right-hand side type is compatible with the left-hand side type.
This function essentially defines the implicit type coercions of the pGCL language.
Compatibility is not only important for which values can be used as the right-hand side of an assignment, but compatibility
rules are also checked when operators are used.
The only rules for compatibility currently implemented are:
Natural numbers are compatible with variables of every other type of number (bounded or unbounded).
Otherwise the types must be exactly equal for values to be compatible.
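A minimal sketch of these two rules, using stand-in type classes rather than probably's actual AST types:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class NatType:
    bounds: Optional[Tuple[int, int]] = None

@dataclass(frozen=True)
class BoolType:
    pass

def is_compatible(lhs, rhs) -> bool:
    """rhs is compatible with lhs if both are natural number types
    (bounds are ignored), or if the types are exactly equal."""
    if isinstance(lhs, NatType) and isinstance(rhs, NatType):
        return True  # bounded and unbounded naturals mix freely
    return lhs == rhs

print(is_compatible(NatType((1, 5)), NatType()))  # True
print(is_compatible(NatType(), BoolType()))       # False
```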
>>> from .parser import parse_pgcl
>>> program = parse_pgcl("bool x; x := true")
>>> check_instr(program, program.instructions[0])
>>> program = parse_pgcl("nat x; x := true")
>>> check_instr(program, program.instructions[0])
CheckFail(location=..., message='Expected value of type NatType(bounds=None), got BoolType().')
>>> program = parse_pgcl("const x := true; x := true")
>>> check_instr(program, program.instructions[0])
CheckFail(location=..., message='x is not a variable.')
>>> program = parse_pgcl("nat x [1,5]; x := 3")
>>> check_instr(program, program.instructions[0])
>>> program = parse_pgcl("nat x [1,5]; nat y; x := x + 3")
>>> check_instr(program, program.instructions[0])
>>> program = Program(list(), dict(), dict(), dict(), list())
>>> check_expression(program, RealLitExpr("1.0"))
CheckFail(location=..., message='A program expression may not return a probability.')
This module does variable substitution for expressions, expectations, and programs.
The purpose of this module is elimination of SubstExpr by applying the corresponding substitutions.
Note
The program must be well-typed.
In particular, variables must be defined before they are used and substitutions must be type-safe.
Warning
The input AST must be a tree and objects may not occur multiple times. This is a limitation of the current implementation.
General DAGs are not supported yet.
Warning
Substitutions reuse AST objects, and do not copy unnecessarily.
Thus, AST objects may turn up multiple times in the AST.
For later mutation of the AST, you may want to pass deepcopy = True to prevent spooky action at a distance.
Symbolic variables
Each substitution function takes a symbolic parameter that is a set of variables to be treated as symbolic.
Symbolic variables are placeholders for arbitrary expressions, and thus it is not known which variables they contain.
Therefore the substitution cannot be done at this time.
Instead, a substitution applied to a symbolic variable will wrap the variable in a SubstExpr that contains all substitutions applied to this variable.
>>> from probably.util.ref import Mut
>>> from .ast import SubstExpr, NatLitExpr, VarExpr
>>> from .parser import parse_expr

Simple substitution without symbolic variables:

>>> expr_ref = Mut.alloc(SubstExpr({"x": NatLitExpr(2)}, parse_expr("x + y")))
>>> substitute_expr(expr_ref)
>>> str(expr_ref.val)
'2 + y'

The same substitution with `y` being symbolic:

>>> expr_ref = Mut.alloc(SubstExpr({"x": NatLitExpr(2)}, parse_expr("x + y")))
>>> substitute_expr(expr_ref, symbolic=set("y"))
>>> str(expr_ref.val)
'2 + ((y)[x/2])'
The implementation is much more sophisticated than the simple approach of doubly-recursive bottom-up substitution application.
For comparison, the simple approach is available in tests/test_substitution.py and used to test this implementation.
In this module, we implement a different approach using generated local variables for each substitution.
First the binding phase, top-down (implemented in _Binder.bind()):
For each substitution expression and each substitution x/y, generate a new local variable t.
Run all current substitutions on y, generating y’.
Now remember two substitutions: x/t and t/y’.
Recurse into the substitutions’ children.
Then the resolve phase (direction is irrelevant, implemented in _Binder.resolve()):
On the generated substitutions for local variables, replace all simple ones of the form t1/t2, resolving t2 recursively. Now the object references to be inserted in the AST are final (not necessarily nested objects).
Now we update the objects to be inserted in the AST, i.e. all replacements of the form t1/r for general expressions r are updated by replacing all locals t_i occurring in r. All substitutions are now final.
Replace all locals in the expression itself.
Except for step 1 of the resolve phase, all phases clearly take linear time.
Step 1 of resolve is only run for simple chains of substitutions with no nested locals (e.g. generated from x[x/y][y/z]), and the step also amortizes lookups.
So basically we have linear runtime.
This implementation was very loosely inspired by an article [5] on “locally nameless” substitution based on the awesomely titled paper “I am not a Number — I am a Free Variable” [6].
Apply substitutions in this expression/expectation.
After execution, the expr_ref contains no SubstExpr anywhere except directly around symbolic variables.
Expressions and expectations become ugly after wp calculation?
Your program has dumb expressions a toddler could simplify?
This is the right place for you!
The multiplication operator (*) is implemented, and returns the product
of two objects. However, at most one of the operands may have a
non-empty set of substitutions.
The multiplication operator is implemented (*), but only one operand may
include substitutions. This is always the case for transformers generated
from weakest pre-expectation semantics (see probably.pgcl.wp).
This module can build up control-flow graphs (CFGs) of pGCL programs and
decompile them back to programs again. The main purpose of this module is the
transformation of a program with arbitrarily nested loops into a program with
one big loop (program_one_big_loop()).
>>> from probably.pgcl.compiler import compile_pgcl
>>> program = compile_pgcl("nat x; while (x <= 3) { while (x = 0) { x := x + 1; }; if (x = 0) { x := x + 1; x := x + 2; x := x + 3; } {} }")
>>> graph = ControlFlowGraph.from_instructions(program.instructions)
>>> graph.render_dot().source
'digraph...'
You can see the control-flow graph for this program visualized using GraphViz
below. It was created with the ControlFlowGraph.render_dot() method. See
its documentation for more information on how to visualize a control-flow graph.
Flip the condition (by negation using
probably.pgcl.simplify.simplifying_neg()) and switch both branches
of this terminator. The operation does not change the semantics of the
terminator.
A BasicBlock has an attached marker type that is used for
back-conversion from a control-flow graph to a structured program, for
pretty-printing the graph, and for debugging.
All successor branches of forward blocks eventually converge on the same
block. These are created from any statement except loops. Loop head nodes
are the basic blocks created for the loop condition. These are the first
entry into a loop from the program start. Trampoline blocks are created by
the one_big_loop() transformation only.
A basic block in the control-flow graph consists of a series of
(unconditional) assignments that are executed in sequence, followed by a
Terminator that says where to go next (based on some condition).
To implement one_big_loop(), we cannot allow non-local gotos. Instead,
we build a jump table that branches on a variable.
Let’s look at a simple example. We start with two blocks: The x:=42
block (call it B1), and the y:=11 block (call it B2).
We’ll connect B1 and B2 with the jump table so that B2 executes after
B1. We add one trampoline for B1 and let start enter at that
trampoline. And then let B1 continue to a new trampoline to B2. Note
that the generated jump table is linear in the number of branches. See
finalize() as to why.
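The effect of the jump table can be sketched as an ordinary Python loop that dispatches on a pc variable (block names and values follow the example above; this is an illustration, not probably's generated output):

```python
def run_one_big_loop():
    """Simulate a jump table: every block sets pc to its successor,
    and one big loop dispatches on pc until the exit state."""
    state = {"x": 0, "y": 0, "pc": 1}
    while state["pc"] != 0:          # pc = 0 means "program end"
        if state["pc"] == 1:         # trampoline into B1
            state["x"] = 42          # block B1
            state["pc"] = 2          # continue at B2's trampoline
        elif state["pc"] == 2:
            state["y"] = 11          # block B2
            state["pc"] = 0          # exit
    return state

print(run_one_big_loop())  # {'x': 42, 'y': 11, 'pc': 0}
```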
The goto3 block is unreachable and we write a self-loop basic block
that never exits to indicate that.
The generated jump table intentionally uses only strict equality checks
against pc. While it is easily possible to generate a binary tree of
comparisons using \(\leq\) (therefore logarithmic size in the number
of jumps), we decide against it. Inequalities make it harder to see
which conditions exclude each other and this makes debugging e.g.
weakest pre-expectation generation (see probably.pgcl.wp) a bit
harder.
Given a control-flow graph from a pGCL program with arbitrarily nested
loops, modify the graph in such a way that there is only one big loop that
branches to states of the loop. Use pc_var as the variable name for the
program counter variable pc in the generated jump table (it must not be
used elsewhere, but should be declared in the program already). For a CFG
from a well-structured pGCL program, the result is basically one big loop.
It is assumed all loop heads (see BlockType) are properly
annotated.
How does it work? This function basically just replaces each jump to a loop
header, as well as the program entry and exit, with a trampoline
(JumpTable.trampoline()).
Let’s work through a (contrived) example. The program below uses two nested
(but redundant) loops.
After one_big_loop(), we get a much more complicated diagram (see
below). Now all loops go through the jump table basic blocks that branch on
pc.
A subtlety can be found in the image above: The first branch of the
generated jump table points to the program end and its terminator has been
flipped (Terminator.flip()) so that the true branch goes into the
loop and the false branch exits it. We do this to exactly reproduce
the shape of a generated while loop. Also note that the goto5 is
unreachable. See JumpTable documentation for more information.
Unfortunately we cannot directly translate this control-flow graph back into
a well-structured pGCL program. By “direct translation” I mean translation
to a pGCL program while using each statement exactly once. Consider for
example the pc:=1 basic block. It is obviously needed before the big
loop to enter the proper state, but it is also needed after the inner while
loop exits to return back to the outer loop (br(x)27).
reduce_cfg_to_instrs() has extra functionality to handle this case and
will appropriately duplicate some trampoline blocks.
Reduce a control-flow graph to a program (i.e. list of instructions). The
given graph must have the same shape as one generated by
ControlFlowGraph.from_instructions(). This is the case for graphs
transformed by one_big_loop().
How Does It Work?
Recreating a structured program from a control-flow graph is a frighteningly
complex and not yet well studied problem. There is an ancient theorem called
Structured Program Theorem that essentially states “flowcharts”
(basically the class of control-flow graphs we care about) can be reduced to
structured programs in a very simple way with pretty unreadable output.
It is listed under “folk version of the theorem”, generating a single while
loop [1]:
This was not done here. Instead, we use a slightly smarter algorithm that
reconstructs hierarchical control structure using Dominator trees. Let’s
define some terms.
node \(d\) dominates a node \(n\)
A node \(d\) dominates a node \(n\) if every path from the entry
node of the control-flow graph to \(n\) must go through d. A node
\(d\) strictly dominates a node \(n\) if it dominates
\(n\) and \(n \neq d\).
dominance frontier
The dominance frontier of a node \(d\) is the set of all nodes
\(n\) such that \(d\) dominates an immediate predecessor of
\(n\), but does not strictly dominate \(n\).
region
A region is defined by an entry edge and exit edge, where the entry
edge dominates the exit edge, the exit edge post-dominates the entry
edge, and any cycle that includes one also includes the other.
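The dominator sets underlying these definitions can be computed with a classic dataflow fixpoint. A hedged sketch on a tiny CFG given as an adjacency dict (illustrative, unrelated to probably's internal representation):

```python
def dominators(succ, entry):
    """Iteratively compute dominator sets of a CFG given as a dict
    mapping each node to its list of successors."""
    nodes = set(succ)
    pred = {n: {m for m in nodes if n in succ[m]} for n in nodes}
    dom = {n: set(nodes) for n in nodes}  # start from "everything"
    dom[entry] = {entry}
    changed = True
    while changed:  # shrink until a fixpoint is reached
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = set.intersection(*preds) | {n} if preds else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# Diamond CFG: A -> B, A -> C, B -> D, C -> D
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
dom = dominators(cfg, "A")
print(dom["D"])  # {'A', 'D'}: A dominates D, but neither B nor C does
```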
For example, the basic block for the condition of an if statement dominates
all blocks in its true and false branches. Blocks for successor
statements to that conditional statement are also dominated. The same thing
holds for a while loop. So for CFGs generated from structured programs (and
specifically for those generated by
ControlFlowGraph.from_instructions()), it suffices to concatenate
statements into a sequence as long as each statement dominates the next one.
The set of basic blocks in such a sequence is called a region
(_Region, see also [fcd16]).
Fortunately, this implementation can be restricted to these
program-generated CFGs and those modified by one_big_loop(). That
means we can create regions without problems for almost all basic blocks.
The only difficulty are generated trampoline blocks. We handle those
separately in an ugly method (_join_regions()).
Restrictions and Generalizations
It is not completely clear what the above restrictions on input CFGs mean
formally. Ignoring trampoline blocks, the restrictions are more strict than
just having reducible CFGs [2]. On the other hand, a generalized
algorithm is a hard task, particularly for CFGs with arbitrary goto. The
relooper of Emscripten was an attempt in a production JS-to-LLVM
compiler [zak11]. Iozzelli has revisited the problem for the Cheerp
compiler using a dominator tree-based approach [ioz19]. In their article,
some ideas implemented here are discovered independently (e.g. our
JumpTable under the name “dispatcher”). Their general algorithms
may be of interest for an improvement of this method [3] at the cost of a
much more complex implementation, although the current one hopefully
suffices.
Use pc_var as the name of the generated variable (see JumpTable).
It must not be used already and will be added to the program’s declarations
by this function.