FilterHN

Accidentally writing a SAT solver

135 points

by max__dev

7 months ago

| past

| 9 comments

| blog.danielh.cc

| HN

▲

porcoda

7 months ago

[-]

Interesting post, but I’m not sure this really speaks to what goes into actually writing what would be considered a “fast” SAT solver. It seems more like a post about how SAT pops up in a lot of places if you look at them right. For the state of the art in what constitutes fast solvers, the annual SAT competition papers are quite interesting to read if you’re interested in the techniques people come up with to make them fast. A few years ago I was working through Knuth’s satisfiability book and writing my own solvers, and was always amazed how stunningly fast the SAT competition winners were compared to the ones I’d code up.

▲

dang

7 months ago

[-]

Ok, we've taken the fast bit out of the title above. It's still a good post!

▲

ComplexSystems

7 months ago

[-]

SAT turns up everywhere because it's almost universal kind of problem. Since it is NP complete, everything in NP can be transformed into an instance of SAT. Since P is a subset of NP, everything in P can be also be turned into an instance of SAT. Nobody knows if things in PSPACE can be, though.

▲

butokai

7 months ago

[-]

Add to this that propositional logic (the language in which we express SAT) is a versatile language to code problems in. Finding cliques in a graph is also NP complete, but it is less natural to use it as a language to code other problems.

▲

riku_iki

7 months ago

[-]

> Add to this that propositional logic (the language in which we express SAT) is a versatile language to code problems in

is it though? You can't express some basic loop in propositional logic, right?

▲

viraptor

7 months ago

[-]

Loops are part of a coded solution, not the problem. With SAT you encode the problem / solution space itself.

▲

CaptainNegative

7 months ago

[-]

> Since P is a subset of NP, everything in P can be also be turned into an instance of SAT.

This statement is kind of trivial. The same is true for any language (other than the empty language and the language containing all strings). The reduction is (1) hardcode the values of one string, y, that is in the language and another string, z, that is not in the language (2) solve the problem on the given input x in polynomial time poly(x) (3) return y if x is to be accepted and z otherwise.

The total running time is at most poly(x)+O(|y|+|z|) which is still poly(x) since |y| and |z| are hardcoded constant values.

▲

layer8

7 months ago

[-]

One would assume there are some low-hanging fruits that make up the bulk of the speed-ups, but maybe it’s really a huge pile of small incremental improvements?

▲

ted_dunning

7 months ago

[-]

My impression is that going from a very simple case like this to a moderately fast solver involves bringing in a fair number of intuitive improvements like memoization of anti-patterns and heuristic reordering of the search.

But getting a really fast solver requires multiple strategies that don't work well together (so they have to work semi-independently). This leads to other problems related to managing shared resources like how much you should let different strategies cache information at the expense of how much other strategies can cache information. Tuning all of these trade-offs is exceedingly difficult to do well since it depends a lot on the types of problems that you need to solve.

▲

accurrent

7 months ago

[-]

SAT shows up in a lot of problems. Im doing my PhD in multi-agent robotics after spending some time working on real life multirobot deployments. Ive been frustrated because most roboticists I talk to think SAT is a dead end, but we have been having insane advances in solver speeds over the years. I guess everyone is obsessed about the ML hypetrain right now, but where sat shines is when we need to orchestrate at a task level. I feel theres defintely work to be done to bridge both worlds.

▲

imtringued

7 months ago

[-]

Actually quadratic programming is all the rage these days since computers have gotten fast enough that you can run QP solvers in your control loop.

▲

anonymousDan

7 months ago

[-]

On a related note, anyone have any advice for getting started with something like Z3?

▲

sevensor

7 months ago

[-]

It really helps to have a hard problem to solve in the first place. Try scheduling all the classes in a university timetable into available classrooms, subject to constraints like maximum seating and not double booking an instructor. Now try to add constraints like “art classes must be in a studio.” Then to make it really fun, “no more than one fourth of the Electrical Engineering classes may be taught outside of the EE building.”

▲

mkl

7 months ago

[-]

And even better, consecutive tutorials must be in the same room or nearby rooms (tutors and teaching materials can't teleport), and there are two or three simultaneous sequences of tutorials in big courses (so consecutive tutorials actually can be in distant rooms if they're in different sequences), and the number of students requires that the university's rooms be in use 90% of the time, and medium to large lecture theatres must be in use 98% of the time.

This is an annual battle I engage in. The software sucks.

▲

sevensor

7 months ago

[-]

It does, but better scheduling will only go so far at an institution that underinvests in classroom space. Between “looks fancy” and “meets instructional needs,” donors prefer to have their names on “looks fancy” every time.

▲

mkl

7 months ago

[-]

It's a public university, donors are not a significant source of funding, and government funding has not increased commensurately with the number of students, which has gone up dramatically in recent years (mainly due to relative cost of living, so no guarantee it will last).

▲

adsharma

7 months ago

[-]

I find both z3 native syntax (smt 2, lispish) and z3py hard to use.

Here's an alternative syntax that uses python3 types. Works by transpiling to smt 2.

https://gist.github.com/adsharma/45fbb065a8fe793030e8360daeb...

https://github.com/py2many/py2many/blob/main/tests/cases/dem...

▲

Jtsummers

7 months ago

[-]

https://theory.stanford.edu/~nikolaj/programmingz3.html - This one was useful for me to get started with it a while ago.

▲

drdrey

7 months ago

[-]

https://microsoft.github.io/z3guide/docs/logic/intro/

▲

constructum

7 months ago

[-]

If you want to use Z3 with a .NET language, there is also this very useful file with examples of how to use the bindings:

https://github.com/Z3Prover/z3/blob/master/examples/dotnet/P...

The examples are in C#, but easy to adapt to other languages. I found them quite useful to write a program for symbolic execution in F# that calls Z3 to simplify conditionals, here is its interface to Z3:

https://github.com/constructum/asm-symbolic-execution/blob/m...

The bindings are perhaps a bit cumbersome, but rather easy to understand and work very well. Once you appropriately wrap what you need, you can forget about the bindings as well as avoiding SMT-LIB completely.

▲

cchianel

7 months ago

[-]

For constraint programming solvers, you need to define a model (i.e. what are the variables that the solver can change). Typically, a good model naturally enforces hard constraints. For instance, consider the employee scheduling problem, where you have a list of shifts that need to be assigned a single employee. Two possible models for it are:

- Use a boolean variable that is true if and only if a particular employee is assigned to a particular shift. For 2 shifts (A, B) and 2 employees (Amy, Beth), the variables would be Amy_A, Amy_B, Beth_A, Beth_B

- Use an int variable, where each employee is mapped to a number. For 2 shifts (A, B) and 3 employees (Amy, Beth, Carl), the variables would be A, B (which will have value 0 for Amy, 1 for Beth, 2 for Carl).

Using an int variable is usually better, since it automatically encodes the "each shift must have exactly one employee constraint" which would otherwise need to be added. That being said, sometimes the boolean model is used so a SAT solver can be used instead of a Integer Linear Programming Solver.

Typically, for theorem based solvers (such as Z3 or OR Tools), you add a group of similar constraints in a loop where you iterate through relevant variables. For instance, to add constraints for overlapping shifts, you would have a directory mapping each shift to the shifts its overlaps, and add a not equals constraint for each pair (since if they are equal, they have the same employee, and employees usually are unable to be at two places at the same time).

  for shift in shift_vars:
      for overlapping_shift in overlapping_shifts[shift]:
          solver.add(shift != overlapping_shift)

There are also local search solvers, such as Timefold, which allows you use your domain objects and functions directly in your constraints. For instance, the above constraint would look like this:

  @planning_entity
  @dataclass
  class Shift:
      employee: Annotated[Employee, PlanningVariable]
      start: datetime
      end: datetime
      
      def overlaps(self, other: 'Shift') -> bool:
          return self.start < other.end and other.start < self.end
  
  def no_overlapping_shifts(constraint_factory: ConstraintFactory):
      return (constraint_factory.for_each(Shift)
                                .join(Shift,
                                      Joiners.equal(lambda shift: shift.employee))
                                .filter(lambda a, b: a.overlaps(b))
                                .penalize(HardSoftScore.ONE_HARD)
                                .as_constraint('Overlapping Shift'))

Disclosure: I work for Timefold

▲

porcoda

7 months ago

[-]

Read up on smt-lib: learning how to encode problems in that is a good way to start. The Python z3 bindings are a good starting point to play with it too.

▲

Klaus23

7 months ago

[-]

https://smt.st/SAT_SMT_by_example.pdf

▲

3eb7988a1663

7 months ago

[-]

I wish I had something to offer, but I think there is little available but grit. The solvers are magical in what they can do, but structuring a problem into the DSL is an exercise in pain. Seemingly few available public examples of patterns you can crib.

▲

Neywiny

7 months ago

[-]

Huh. As a UMD grad who faced this same problem, it's an interesting post. What I'll say is that their ECE (which CS does not fall into, idk how they do it) department advisors gave us updated 4 year plans every semester or so. It meant I never had to worry about not having the right classes to graduate. And we had to get permission for every class which while annoying meant my advisor looked over every major-specific class. I can't even count how many people I know or know who know that wasted upwards of years on classes that didn't count. None of that for me.

On the other hand, I remember for my acceptance (which wasn't too the ECE program) I had to pick classes before I could confirm going to UMD? I don't remember why but I remember panicking because here I am, a high schooler still, picking college classes that would set my next 4 years. I was even more terrified when my first ever class not only was I late because it was in the back of a basement, but the instructor made some comment about it not being the right class for engineers or something. It was, though. You fix this and other problems: 1. I started using the UMD provided schedule planner and some 3rd party ones, with multiple backups for when they'd fill up 2. I made an app that showed most buildings' floor plans completely offline. No more wandering around during stressful times. 3. I did try and make a tree of classes using prerequisite days scraped from the SoC, but it wasn't a regular expression so I gave up immediately.

The blogger like CMSC430 and I agree it was a good class though I had a different professor.

▲

RestartKernel

7 months ago

[-]

I really like the styling of this blog. It's nice on the eyes, gets out of the way, and the collapsed containers for extra info is a nice touch. There's a bit of layout shift though, but that's about it.

▲

andai

7 months ago

[-]

I'm on mobile too, I disabled JS in my browser to test it out, the site loads fine and the expanding boxes work too (I think it's the <details> tag).

▲

jkaptur

7 months ago

[-]

> As a result, in order to determine if a formula is satisfiable, first convert it to conjunctive normal form, then convert the new formula into a course catalog.

I know this is a consequence of NP-completeness and so on and so forth, but I also find it a funny and charming way to phrase it. Once we've solved the fundamental problem (what courses to take), we're able to solve simple specializations and derivatives (boolean satisfiability).

▲

baol

7 months ago

[-]

Probably worth mentioning that there are well-known linear time algorithms to construct a solution for n-queen problem https://en.wikipedia.org/wiki/Eight_queens_puzzle#Existence_...

▲

Arcuru

7 months ago

[-]

True, finding one solution is easy but finding all the solutions can be a fun little optimization challenge.

I made a repo many years ago with a bunch of grab bag solutions for comparisons [1]; from dumb brute force to DLX (Knuth's Dancing Links) and a multithreaded bitwise backtracking algorithm. And one where I just hardcoded the answers because all the counts up to 27 are known.

So I'm all for just jumping to the existing known solutions, but it seems like the OP is having fun while they learn a little bit. They seem to just be a college freshman.

[1] - https://github.com/arcuru/nqueens

▲

Halian

7 months ago

[-]

For some reason, I thought this would have to do with the standardized test, lol.

▲

dang

7 months ago

[-]

[stub for offtopicness]

▲