Symbolic Execution by Overloading __bool__
December 24, 2024

Symbolic Execution by Overloading __bool__

A few months ago I saw a talk about buildit, https://buildit.so/ A very clever project that implements staged metaprogramming as a C++ library. I like the core principles of using a mainstream language and not having to modify the compiler. That’s it, bro. One thing I learned is a neat trick to make non-overloadable syntax overloadable.

In hindsight, the interesting observation seems clear (and the best one is), that bool conversion yes Overloadable by writing __bool__ Functions on the Z3 category. With a little bit of skill, you can log all paths with a fairly pure piece of python code. In this way, you can simply implement symbolic execution of python code without the usually expected tedious operation or symbolic reflection of python code into pure z3 expressions.

I noticed some time after patching Yuan Okamir Metaprogramming in python z3 has a lot of the flavor of staged metaprogramming (surprised? maybe not much to observe).

Some metaprogramming frameworks or styles look very different from unstaged code. You explicitly call all sorts of strange functions to construct code objects.

A very cool style of staged metaprogramming takes non-staged code and adds some comments to get staged code. The more similar the staged code is to the original code, the better. It may require some language-level functionality such as quoting, overloading, or introspection to achieve this.

A classic example of staged metaprogramming is the expansion of power functions (Ershov). The following is a simple example of an unstaged recursive power function operating on regular Python integers.

def mypow(x : int, n : int):
    if n == 0:
        return 1
    else:
        return x * mypow(x, n-1)
    
assert mypow(2, 3) == 8
mypow(2,3)

You can use codes very similar to codegen code strings. python F string Features (are being constantly improved. I hope tag string) set up some good quoting mechanisms. What’s very cute is that the one below is exactly the same as the one above. Let’s explain the parameters n are called “compile time”/static and parameters x Known as “runtime”/dynamic.

def Code(t):return str
def mypow(x : Code(int), n : int) -> Code(int):
    if n == 0:
        return "1"
    else:
        return f"{x} * {mypow(x, n-1)}"

assert mypow("x",3)  == "x * x * x * 1"
mypow("x",3)

If we think of z3 expressions as “code”, something very similar but more structural happens. This actually gives us a tree to work with.

import z3
def code(typ):
    if typ == int:
        return z3.IntSort()
def mypow(x : Code(int), n : int) -> Code(int):
    if n == 0:
        return 1
    else:
        return x * mypow(x, n-1)

x = z3.Int("x")
assert mypow(x, 3).eq(x*(x*(x*1)))
mypow(x,3)

x·x·x·1

The “static” compile-time role is played by regular Python values, and the “dynamic” role is played by Z3 expressions. You can mix things up between compile time and run time, more or less in the solver. Things like unrolling loops have counterparts in unrolling quantifiers. Because z3 uses python overloading, the same code can be used in different ways through simple annotations.

I think there are two design points worth noting about symbolic execution/. Do you put all branches into the solver or just a single path/trace across many queries. The differences between these methods can be viewed as phase confusion. Is the return value of the branch condition a static Boolean value or a dynamic Boolean value?

Some things in python are indeed not overloadable. if-then-else piece, while,chain comparison, and or not None of the operators can be overloaded. So you need to change things around to use z3.If. This is a bit disappointing.

overload __bool__

But actually, you can overload these functions indirectly. Can be overloaded when the condition is not a Boolean value __bool__ Call a function on a category. You can patch one to z3.

This alone won’t fully help you achieve your goal, you’ll need to execute the relevant function multiple times to explore all paths. This is a version of symbolic execution, which is interesting in its own right.

You can probably do the same thing using the C++ Z3 bindings.

I find this way of using python code as a DSL particularly interesting. For example, I’m trying to use the “application” subset of python for logic, similar to how ACL2 uses the subset of common lisp https://www.philipzucker.com/applicative_python/ But I’m iterating over a python AST and it’s an ugly and verbose thing.

I’ve seen similar issues in the tensor compilation or MLIR world. People want python grammar, at least for fun. However, getting a maintainable system is quite difficult.

import random
from z3 import *
def symexec(*vs, limit=100):
    def wrapper(f):
        trace = []
        traces = []
        # a shared solver being pushed and popped is probably more efficient
        s = Solver()
        def branch(self):
            # branch gets called on every branch (every time a z3 expression is asked to be converted to a concrete bool)
            s.push()
            s.add(And(trace)) # TODO: I could move push pop around to avoid this full push
            # flip a coin. Probably that means any individual run will end.
            # If there are no loops, being more deterministic is fine.
            if random.random() < 0.5:
                take = True
            else:
                take = False
            # Is it possible to take the branch?
            s.add(self == take)
            res = s.check()
            s.pop()
            if res == sat:
                # It was possible to take the branch
                trace.append(self == take)
                return take
            else:
                # it was not possible to take the branch
                trace.append(self == (not take))
                return not take
        BoolRef.__bool__ = branch # monkey patch in the __bool__ overload
        for i in range(limit):
            if s.check() == unsat: # If no more branches possible, stop
                break
            trace = [] # reset the trace
            res = f(*vs) # run the function
            # res = z3.simplify(res) # might be nice.
            traces.append((trace,res)) # record the result of the run
            s.add(Not(And(trace))) # disallow exact trace again
        BoolRef.__bool__ = None
        return traces
    return wrapper




@symexec(z3.Int("x"))
def foo(x):
    if x > 3:
        if x == 4:
            return x
        else:
            return x - 2
    else:
        return x*3
    
foo
[([(x > 3) == True, (x == 4) == True], x),
 ([(x > 3) == True, (x == 4) == False], x - 2),
 ([(x > 3) == False], x*3)]

We can get details like comparator connections

@symexec(Int("x"), Int("y"))
def comparison_example(x,y):
    return y - 3 < x < y + 4 or x > 3
comparison_example
[([(y - 3 < x) == True, (x < y + 4) == True], x < y + 4),
 ([(y - 3 < x) == False, (y - 3 < x) == False], x > 3),
 ([(y - 3 < x) == True, (x < y + 4) == False], x > 3)]

or match the statement

@symexec(Int("x"))
def matcher(x):
    match x:
        case 0:
            return 1
        case 2:
            return x + 14
        case _:
            return x * 2 
matcher
[([(x == 0) == False, (x == 2) == False], x*2),
 ([(x == 0) == False, (x == 2) == True], x + 14),
 ([(x == 0) == True], 1)]

or bounded while

@symexec(Int("x"))
def bwhile(x):
    if x > 0 and x < 10:
        acc = 0
        while x > 0:
            x -= 2
            acc += x
        return acc
bwhile
[([(x > 0) == False], None),
 ([(x > 0) == True, (x < 10) == True, (x > 0) == True, (x - 2 > 0) == False],
  0 + x - 2),
 ([(x > 0) == True,
   (x < 10) == True,
   (x > 0) == True,
   (x - 2 > 0) == True,
   (x - 2 - 2 > 0) == False],
  0 + x - 2 + x - 2 - 2),
 ([(x > 0) == True,
   (x < 10) == True,
   (x > 0) == True,
   (x - 2 > 0) == True,
   (x - 2 - 2 > 0) == True,
   (x - 2 - 2 - 2 > 0) == True,
   (x - 2 - 2 - 2 - 2 > 0) == False],
  0 + x - 2 + x - 2 - 2 + x - 2 - 2 - 2 + x - 2 - 2 - 2 - 2),
 ([(x > 0) == True, (x < 10) == False], None),
 ([(x > 0) == True,
   (x < 10) == True,
   (x > 0) == True,
   (x - 2 > 0) == True,
   (x - 2 - 2 > 0) == True,
   (x - 2 - 2 - 2 > 0) == True,
   (x - 2 - 2 - 2 - 2 > 0) == True,
   (x - 2 - 2 - 2 - 2 - 2 > 0) == False],
  0 +
  x - 2 +
  x - 2 - 2 +
  x - 2 - 2 - 2 +
  x - 2 - 2 - 2 - 2 +
  x - 2 - 2 - 2 - 2 - 2),
 ([(x > 0) == True,
   (x < 10) == True,
   (x > 0) == True,
   (x - 2 > 0) == True,
   (x - 2 - 2 > 0) == True,
   (x - 2 - 2 - 2 > 0) == False],
  0 + x - 2 + x - 2 - 2 + x - 2 - 2 - 2)]

We can also seek infinite while, but then the output will be incomplete.

@symexec(Int("x"), limit=5)
def bfor(x):
    acc = 0
    while x > 0:
        x -= 1
        acc += x
    return acc
bfor
[([(x > 0) == False], 0),
 ([(x > 0) == True, (x - 1 > 0) == False], 0 + x - 1),
 ([(x > 0) == True,
   (x - 1 > 0) == True,
   (x - 1 - 1 > 0) == True,
   (x - 1 - 1 - 1 > 0) == False],
  0 + x - 1 + x - 1 - 1 + x - 1 - 1 - 1),
 ([(x > 0) == True,
   (x - 1 > 0) == True,
   (x - 1 - 1 > 0) == True,
   (x - 1 - 1 - 1 > 0) == True,
   (x - 1 - 1 - 1 - 1 > 0) == False],
  0 + x - 1 + x - 1 - 1 + x - 1 - 1 - 1 + x - 1 - 1 - 1 - 1),
 ([(x > 0) == True,
   (x - 1 > 0) == True,
   (x - 1 - 1 > 0) == True,
   (x - 1 - 1 - 1 > 0) == True,
   (x - 1 - 1 - 1 - 1 > 0) == True,
   (x - 1 - 1 - 1 - 1 - 1 > 0) == False],
  0 +
  x - 1 +
  x - 1 - 1 +
  x - 1 - 1 - 1 +
  x - 1 - 1 - 1 - 1 +
  x - 1 - 1 - 1 - 1 - 1)]

If I want to be careful, I should make sure the limit is not reached and return Forgotten if the result is None.

Techniques can be combined with hypotheses for concolic testing.

I might be able to recapture the loop by checking the stack of calls and noticing that we’ve returned to the position we saw previously __bool__. But it would be awkward. Perhaps one could also do this by logging more information in the reload, but that would require more work.

https://docs.python.org/3/library/codecs.html#codecs.register
https://docs.python.org/3/reference/import.html
https://news.ycombinator.com/item?id=41322758 python preprocessor. Demon macro.

I’ve seen people do some special things in jupyter. I really don’t like these

Nellie Astor. Rewrite if elif else -> if rewrite return null? Overloaded types work except and or not to And Or Not If override x <= y <= z

Some MLIR and other pytonh syntax using DSL

exo compiler and others.
https://cap.csail.mit.edu/sites/default/files/research-pdfs/Codon-%20A%20Compiler%20for%20High-Performance%20Pythonic%20Applications%20and%20DSLs.pdf Codon conversion to llvm
https://github.com/exaloop/codon

iftheelse can be converted to x if c else. No.

chocolate codon mypyc

Constructive style

Thermometer continuation

Dialectics and inquiry. Pierre Marie Pedro

Symbolic execution through overloading Bollinger value

Honestly, it’s probably possible to use C++ in this way. Symbolic execution of C++ is not easy. Of course Klee. But Klee worked really hard.

Symcc has some evaluation atmosphere

Partial assessment using the same idea? Tracking starts by setting up some variables. Konklik

Actually detecting the loop seems difficult because we don’t reach the loop again with the same expression.

Concolic By using assumptions. Decorator “partial evaluation”

What does Rosette really offer relative to python + z3? I’ve never been clear on this https://docs.racket-lang.org/rosette-guide/index.html
It does go beyond racket making dsl stuff if you’re interested in that.

Can be switched to non-based form.

It can be useful to tag code strings with additional metadata. If the type doesn’t make sense, you want it to fail early. But then you can’t use fstrings. 🙁

from dataclasses import dataclass
from typing import TypeVar
T = TypeVar('T')
@dataclass
class Code[T]:
    typ : T
    code : str

    def __add__(self):

def mypow(x:int,n:int) -> int:
    assert n >= 0
    if n == 0:
        return 1
    else:
        x * mypow(x,n-1)

# accumulator version?

# string version
# strings are a universal but somewhat structure free rep of code.
Code = str
def mypow2(n:int, x:Code) -> Code:
    if x == 0:
        return "1"
    else:
        f"{x} * {mypow2(x,n-1)}"

mypow = lambda x,n: 1 if n <= 0 else x * mypow(x,n-1)

mypow = Function("mypow", IntSort(), IntSort(), IntSort())
mypow_def = ForAll([x,n], mypow(n, x) == If(n <= 0, 1, x * mypow(n-1, x)))

# Partially evaled
def mypow(x:ExprRef, n:int) -> ExprRef:
    if n == 0:
        return IntVal(1)
    else:
        return x * mypow(x,n-1)


class Bool():
    def __bool__(self):
        print("asked")
        return True
    

b = Bool()
if b:
    print("yes")
else:
    print("no")

import z3

ctx = None
path = []
s = Solver()
def record_bool():
    s.check()
    path.append()

z3.BoolRef.__bool__ = lambda self, 
def reify(f, vs):
    global ctx
    ctx = [m]
    while True:
        retval = f(vs)
        
    ctx = None

@reify
def example(x):
    if x == 7:
        return 3
    else:
        if x > 14:
            return 4
        else:
            return 5
from z3 import *
import kdrag.smt as smt
traces = []
trace  = []
def reset():
    global trace

    trace = []
def record(self):
    s = Solver()
    for tr,res in traces:
        s.add(smt.Not(smt.And(tr))) # we can't repeat the same trace.
    s.add(smt.And(trace)) # currently this is true
    s.add(self) # bias towards taking the branch
    res = s.check()
    if res == sat:
        trace.append(self == True)
        return True
    else:
        trace.append(self == False)
        return False
BoolRef.__bool__ = record
x = Int("x")

def foo(x):
    if x > 3:
        if x == 4:
            return x
        else:
            return x-1
    else:
        return x*3

while True:
    trace = []
    res = foo(x)
    if len(traces) > 0 and all( c1.eq(c2) for c1,c2 in zip(traces[-1][0],trace)):
        break 
    traces.append((trace,res))
traces

# maybe a brute force is better?
# could rejoin cases
# could be way more efficient with sat solver queries.
# loop detection?

Hmm. In fact, maybe doing it in order is important to solve the problem correctly? No, I don’t think so.

[v0] []
[v0] [([(v0 > 3) == True, (v0 == 4) == True], v0)]
[v0] [([(v0 > 3) == True, (v0 == 4) == True], v0), ([(v0 > 3) == False], v0*3)]





[([(v0 > 3) == True, (v0 == 4) == True], v0),
 ([(v0 > 3) == False], v0*3),
 ([(v0 > 3) == True, (v0 == 4) == False], v0 - 2)]
@symexec(IntSort())
def comp(x):
    return x + 3 < x < x - 4 or x > 3
comp
[v0] []





[([(v0 + 3 < v0) == False, (v0 + 3 < v0) == False], v0 > 3)]

Some thoughts on compression to make the output prettier.

We might not be able to compile into an if then else tree, so the way we arrange it might be very different.

def lookup_trie(trie, key):
    node = trie
    for k in key:
        node = node.get(k)
        if node is None:
            return None
    return node

def store_trie(trie, key, value):
    node = trie
    for k in key:

        if k not in node:
            node[k] = {}
        node = node[k]
    node["result"] = value

def trace_trie(traces):
    trie = {}
    for trace,res in traces:
        print(trie)
        store_trie(trie, trace, res)
    return trie

def compress_trie(traces):
    # If only one path, compress away that node
    # If all paths have same result, compress away.
    # probably best to do bottom up
    pass

t = {}
store_trie(t, "abc" ,3 )

trace_trie(foo)
t
{}
{(v0 > 3) == True: {(v0 == 4) == True: {'result': v0}}}
{(v0 > 3) == True: {(v0 == 4) == True: {'result': v0}}, (v0 > 3) == False: {'result': v0*3}}



---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Cell In[29], line 34
     31 t = {}
     32 store_trie(t, "abc" ,3 )
---> 34 trace_trie(foo)
     35 t


Cell In[29], line 22, in trace_trie(traces)
     20 for trace,res in traces:
     21     print(trie)
---> 22     store_trie(trie, trace, res)
     23 return trie


Cell In[29], line 13, in store_trie(trie, key, value)
     10 node = trie
     11 for k in key:
---> 13     if k not in node:
     14         node[k] = {}
     15     node = node[k]


TypeError: 'NoneType' object is not callable

2024-12-24 03:48:11

Leave a Reply

Your email address will not be published. Required fields are marked *