- Overall algorithm is
- identify identical subtrees in an AST, robust to alpha-renaming
- specifically hashing, as applied to Common Subexpression Elimination (CSE)
- 4 Types of equivalence for subexpressions
- Syntactic equivalence
- α-equivalence (alpha-equivalence)
- Focus of this paper
\x.x+y
equivalent to\p.p+y
- Graph equivalence
(let x=e1 in let y=e2 in x+y)
equivalent to(let y=e2 in let x=e1 in x+y)
equivalent to(e1+e2)
- let-expressions express lifetimes and evaluation order
- Semantic equivalence
(3+x+4)
equivalent to(x+7)
equivalent to(7+x)
- undecidable
- To avoid false positives (name overloading) in syntactic equality, assume that expressions are preprocessed “so that every binding site binds a distinct variable name”
- de Bruijn indexing is a form of nameless representation, to be insensitive to alpha-renaming
- Negative: terms need to be re-indexed when substituted into other expressions
- According to [14] http://reports-archive.adm.cs.cmu.edu/anon/2002/CMU-CS-02-120.pdf
- “Our experiment provides evidence, at least, that mixing [de Bruijn terms and Named terms] does not appear to be efficient enough to be effective.”
- “Given an expression e, in which every binding site binds a distinct variable name, identify all equivalence classes of subexpressions of e, where two subexpressions are equivalent if and only if they are α-equivalent. We wish to achieve this goal in a way that is: [compositional and sub-quadratic]”
- e-summary is “the result of processing an expression”
- “two subexpressions are alpha-equivalent iff their e-summaries are equal”
- Full algorithm
- Goal: convert an
Expression
into anESummary
, a pair ofStructure
andVariableMap
Structure
is the expression with anonymized variablesVariableMap
is a map of free variables and their directions to locate their position in theStructure
- These directions also form a tree (
PosTree
)
- These directions also form a tree (
- Time-complexity is reduced by tweaking the
VariableMap
merge in theApp
case to tend towards unbalanced trees. - The naive algorithm is
and the full algorithm is ESummary
s are only useful because they can be hashed- Hashing
Structure
is trivial, butVariableMap
requires leveraging a hash combiner that is associative, commutative, and invertible- They use
XOR
which despite being a cryptographically weak hash combiner, still achieves low hash collisions - Invertibility (involution?) is leveraged when removing an element from the map
- They use
- Goal: convert an
- This algorithm is generally faster compared to (Charguéraud, 2012) in synthetic and practical (ML expressions) benchmarks
Charguéraud, A. (2012). The Locally Nameless Representation. Journal of Automated Reasoning, 49(3), 363–408. https://doi.org/10.1007/s10817-011-9225-2
Maziarz, K., Ellis, T., Lawrence, A., Fitzgibbon, A., & Peyton Jones, S. L. (2021). Hashing modulo Alpha-Equivalence. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 960–973. https://doi.org/10.1145/3453483.3454088