Accidentally writing a SAT solver
135 points
28 days ago
| 9 comments
| blog.danielh.cc
| HN
porcoda
20 days ago
[-]
Interesting post, but I’m not sure this really speaks to what goes into actually writing what would be considered a “fast” SAT solver. It seems more like a post about how SAT pops up in a lot of places if you look at them right. For the state of the art in what constitutes fast solvers, the annual SAT competition papers are quite interesting to read if you’re interested in the techniques people come up with to make them fast. A few years ago I was working through Knuth’s satisfiability book and writing my own solvers, and was always amazed how stunningly fast the SAT competition winners were compared to the ones I’d code up.
reply
dang
20 days ago
[-]
Ok, we've taken the fast bit out of the title above. It's still a good post!
reply
ComplexSystems
20 days ago
[-]
SAT turns up everywhere because it's almost universal kind of problem. Since it is NP complete, everything in NP can be transformed into an instance of SAT. Since P is a subset of NP, everything in P can be also be turned into an instance of SAT. Nobody knows if things in PSPACE can be, though.
reply
butokai
20 days ago
[-]
Add to this that propositional logic (the language in which we express SAT) is a versatile language to code problems in. Finding cliques in a graph is also NP complete, but it is less natural to use it as a language to code other problems.
reply
riku_iki
20 days ago
[-]
> Add to this that propositional logic (the language in which we express SAT) is a versatile language to code problems in

is it though? You can't express some basic loop in propositional logic, right?

reply
viraptor
20 days ago
[-]
Loops are part of a coded solution, not the problem. With SAT you encode the problem / solution space itself.
reply
CaptainNegative
20 days ago
[-]
> Since P is a subset of NP, everything in P can be also be turned into an instance of SAT.

This statement is kind of trivial. The same is true for any language (other than the empty language and the language containing all strings). The reduction is (1) hardcode the values of one string, y, that is in the language and another string, z, that is not in the language (2) solve the problem on the given input x in polynomial time poly(x) (3) return y if x is to be accepted and z otherwise.

The total running time is at most poly(x)+O(|y|+|z|) which is still poly(x) since |y| and |z| are hardcoded constant values.

reply
layer8
20 days ago
[-]
One would assume there are some low-hanging fruits that make up the bulk of the speed-ups, but maybe it’s really a huge pile of small incremental improvements?
reply
ted_dunning
20 days ago
[-]
My impression is that going from a very simple case like this to a moderately fast solver involves bringing in a fair number of intuitive improvements like memoization of anti-patterns and heuristic reordering of the search.

But getting a really fast solver requires multiple strategies that don't work well together (so they have to work semi-independently). This leads to other problems related to managing shared resources like how much you should let different strategies cache information at the expense of how much other strategies can cache information. Tuning all of these trade-offs is exceedingly difficult to do well since it depends a lot on the types of problems that you need to solve.

reply
accurrent
20 days ago
[-]
SAT shows up in a lot of problems. Im doing my PhD in multi-agent robotics after spending some time working on real life multirobot deployments. Ive been frustrated because most roboticists I talk to think SAT is a dead end, but we have been having insane advances in solver speeds over the years. I guess everyone is obsessed about the ML hypetrain right now, but where sat shines is when we need to orchestrate at a task level. I feel theres defintely work to be done to bridge both worlds.
reply
imtringued
20 days ago
[-]
Actually quadratic programming is all the rage these days since computers have gotten fast enough that you can run QP solvers in your control loop.
reply
anonymousDan
20 days ago
[-]
On a related note, anyone have any advice for getting started with something like Z3?
reply
sevensor
20 days ago
[-]
It really helps to have a hard problem to solve in the first place. Try scheduling all the classes in a university timetable into available classrooms, subject to constraints like maximum seating and not double booking an instructor. Now try to add constraints like “art classes must be in a studio.” Then to make it really fun, “no more than one fourth of the Electrical Engineering classes may be taught outside of the EE building.”
reply
mkl
20 days ago
[-]
And even better, consecutive tutorials must be in the same room or nearby rooms (tutors and teaching materials can't teleport), and there are two or three simultaneous sequences of tutorials in big courses (so consecutive tutorials actually can be in distant rooms if they're in different sequences), and the number of students requires that the university's rooms be in use 90% of the time, and medium to large lecture theatres must be in use 98% of the time.

This is an annual battle I engage in. The software sucks.

reply
sevensor
20 days ago
[-]
It does, but better scheduling will only go so far at an institution that underinvests in classroom space. Between “looks fancy” and “meets instructional needs,” donors prefer to have their names on “looks fancy” every time.
reply
mkl
19 days ago
[-]
It's a public university, donors are not a significant source of funding, and government funding has not increased commensurately with the number of students, which has gone up dramatically in recent years (mainly due to relative cost of living, so no guarantee it will last).
reply
adsharma
20 days ago
[-]
I find both z3 native syntax (smt 2, lispish) and z3py hard to use.

Here's an alternative syntax that uses python3 types. Works by transpiling to smt 2.

https://gist.github.com/adsharma/45fbb065a8fe793030e8360daeb...

https://github.com/py2many/py2many/blob/main/tests/cases/dem...

reply
Jtsummers
20 days ago
[-]
https://theory.stanford.edu/~nikolaj/programmingz3.html - This one was useful for me to get started with it a while ago.
reply
drdrey
20 days ago
[-]
reply
constructum
20 days ago
[-]
If you want to use Z3 with a .NET language, there is also this very useful file with examples of how to use the bindings:

https://github.com/Z3Prover/z3/blob/master/examples/dotnet/P...

The examples are in C#, but easy to adapt to other languages. I found them quite useful to write a program for symbolic execution in F# that calls Z3 to simplify conditionals, here is its interface to Z3:

https://github.com/constructum/asm-symbolic-execution/blob/m...

The bindings are perhaps a bit cumbersome, but rather easy to understand and work very well. Once you appropriately wrap what you need, you can forget about the bindings as well as avoiding SMT-LIB completely.

reply
cchianel
20 days ago
[-]
For constraint programming solvers, you need to define a model (i.e. what are the variables that the solver can change). Typically, a good model naturally enforces hard constraints. For instance, consider the employee scheduling problem, where you have a list of shifts that need to be assigned a single employee. Two possible models for it are:

- Use a boolean variable that is true if and only if a particular employee is assigned to a particular shift. For 2 shifts (A, B) and 2 employees (Amy, Beth), the variables would be Amy_A, Amy_B, Beth_A, Beth_B

- Use an int variable, where each employee is mapped to a number. For 2 shifts (A, B) and 3 employees (Amy, Beth, Carl), the variables would be A, B (which will have value 0 for Amy, 1 for Beth, 2 for Carl).

Using an int variable is usually better, since it automatically encodes the "each shift must have exactly one employee constraint" which would otherwise need to be added. That being said, sometimes the boolean model is used so a SAT solver can be used instead of a Integer Linear Programming Solver.

Typically, for theorem based solvers (such as Z3 or OR Tools), you add a group of similar constraints in a loop where you iterate through relevant variables. For instance, to add constraints for overlapping shifts, you would have a directory mapping each shift to the shifts its overlaps, and add a not equals constraint for each pair (since if they are equal, they have the same employee, and employees usually are unable to be at two places at the same time).

  for shift in shift_vars:
      for overlapping_shift in overlapping_shifts[shift]:
          solver.add(shift != overlapping_shift)
There are also local search solvers, such as Timefold, which allows you use your domain objects and functions directly in your constraints. For instance, the above constraint would look like this:

  @planning_entity
  @dataclass
  class Shift:
      employee: Annotated[Employee, PlanningVariable]
      start: datetime
      end: datetime
      
      def overlaps(self, other: 'Shift') -> bool:
          return self.start < other.end and other.start < self.end
  
  def no_overlapping_shifts(constraint_factory: ConstraintFactory):
      return (constraint_factory.for_each(Shift)
                                .join(Shift,
                                      Joiners.equal(lambda shift: shift.employee))
                                .filter(lambda a, b: a.overlaps(b))
                                .penalize(HardSoftScore.ONE_HARD)
                                .as_constraint('Overlapping Shift'))
Disclosure: I work for Timefold
reply
porcoda
20 days ago
[-]
Read up on smt-lib: learning how to encode problems in that is a good way to start. The Python z3 bindings are a good starting point to play with it too.
reply
Klaus23
20 days ago
[-]
reply
3eb7988a1663
20 days ago
[-]
I wish I had something to offer, but I think there is little available but grit. The solvers are magical in what they can do, but structuring a problem into the DSL is an exercise in pain. Seemingly few available public examples of patterns you can crib.
reply
Neywiny
19 days ago
[-]
Huh. As a UMD grad who faced this same problem, it's an interesting post. What I'll say is that their ECE (which CS does not fall into, idk how they do it) department advisors gave us updated 4 year plans every semester or so. It meant I never had to worry about not having the right classes to graduate. And we had to get permission for every class which while annoying meant my advisor looked over every major-specific class. I can't even count how many people I know or know who know that wasted upwards of years on classes that didn't count. None of that for me.

On the other hand, I remember for my acceptance (which wasn't too the ECE program) I had to pick classes before I could confirm going to UMD? I don't remember why but I remember panicking because here I am, a high schooler still, picking college classes that would set my next 4 years. I was even more terrified when my first ever class not only was I late because it was in the back of a basement, but the instructor made some comment about it not being the right class for engineers or something. It was, though. You fix this and other problems: 1. I started using the UMD provided schedule planner and some 3rd party ones, with multiple backups for when they'd fill up 2. I made an app that showed most buildings' floor plans completely offline. No more wandering around during stressful times. 3. I did try and make a tree of classes using prerequisite days scraped from the SoC, but it wasn't a regular expression so I gave up immediately.

The blogger like CMSC430 and I agree it was a good class though I had a different professor.

reply
RestartKernel
20 days ago
[-]
I really like the styling of this blog. It's nice on the eyes, gets out of the way, and the collapsed containers for extra info is a nice touch. There's a bit of layout shift though, but that's about it.
reply
andai
20 days ago
[-]
I'm on mobile too, I disabled JS in my browser to test it out, the site loads fine and the expanding boxes work too (I think it's the <details> tag).
reply
jkaptur
20 days ago
[-]
> As a result, in order to determine if a formula is satisfiable, first convert it to conjunctive normal form, then convert the new formula into a course catalog.

I know this is a consequence of NP-completeness and so on and so forth, but I also find it a funny and charming way to phrase it. Once we've solved the fundamental problem (what courses to take), we're able to solve simple specializations and derivatives (boolean satisfiability).

reply
baol
20 days ago
[-]
Probably worth mentioning that there are well-known linear time algorithms to construct a solution for n-queen problem https://en.wikipedia.org/wiki/Eight_queens_puzzle#Existence_...
reply
Arcuru
20 days ago
[-]
True, finding one solution is easy but finding all the solutions can be a fun little optimization challenge.

I made a repo many years ago with a bunch of grab bag solutions for comparisons [1]; from dumb brute force to DLX (Knuth's Dancing Links) and a multithreaded bitwise backtracking algorithm. And one where I just hardcoded the answers because all the counts up to 27 are known.

So I'm all for just jumping to the existing known solutions, but it seems like the OP is having fun while they learn a little bit. They seem to just be a college freshman.

[1] - https://github.com/arcuru/nqueens

reply
Halian
17 days ago
[-]
For some reason, I thought this would have to do with the standardized test, lol.
reply
dang
20 days ago
[-]
[stub for offtopicness]
reply
IshKebab
20 days ago
[-]
Backtracking is not a fast SAT solver.
reply
dang
20 days ago
[-]
reply
sangnoir
20 days ago
[-]
OTOH, you can't write a Boolean SAT solver without backtracking.
reply
efangs
20 days ago
[-]
sorry this is not fast
reply
dang
20 days ago
[-]
reply
thimkerbell
20 days ago
[-]
"* SAT solver * is a computer program which aims to solve the Boolean satisfiability problem", has nothing to do with the SAT aptitude tests.
reply