Show HN: Zig Topological Sort Library for Parallel Processing
108 points
1 day ago
| 9 comments
| github.com
I believe the best way to learn a language is by doing an in-depth project. This is my first Zig project, intended for learning the ropes of publishing a Zig package. It turned out to be quite solid and performant, if perhaps a bit over-engineered.

This little library is packed with the following features:

  - Building dependency graph from dependency data.
  - Performing topological sort on the dependency graph.
  - Generating dependence-free subsets for parallel processing.
  - Cycle detection and cycle reporting.
Cieric
1 day ago
[-]
I've been enjoying Zig myself quite a bit, and I'm fairly confident I could do some larger projects in it (excluding comptime, since it's missing features I sorely need for some of my current projects). I like it a bit more than C/C++ in a lot of cases, but I need it to be pushed just a tiny bit further before I can really dedicate effort toward large projects in it. I was even curious whether I could implement the features I need myself (there is even a proposal already), but got the answer "Just don't." (I don't blame Andrew; he already has a lot on his plate and I'm a stranger to him.) So I'm at the point of either waiting for the feature or forking, though I haven't decided what I'm going to do.

More on topic for the project though, did you have any project ideas for this? I think I could use it for my OpenCV node editor; I just did the naive method of marking all outputs dirty as their input nodes got reprocessed. I assume this would fix the potential problem of recomputing a node twice? I also see you mention "Cycle detection and cycle reporting," but what specifically happens if I do have a cycle? Does it just give up, or is it something like best effort?

reply
ww520
1 day ago
[-]
> More on topic for the project though, did you have any project ideas for this?

I implemented topological sort for bringing up and shutting down OS services in order some 20 years ago, but it was a quick-and-dirty job, never formally published as a library. I did it this time because it's a well-understood topic, so I could concentrate on learning the ropes of Zig and package publishing while doing something useful. And I went the extra mile to find dependence-free subsets for parallel processing and to detect cycles.

> I assume this would fix the potential problem of recomputing a node twice?

Yes. Topological sort is perfect for laying out the path of computation tasks to avoid redoing things.

> "Cycle detection and cycle reporting." but what specifically happens if I do have a cycle? Does it just give up or is it something like best effort?

It will keep going as a best effort when cycles are detected. It will produce a partial topologically sorted result and a set of the cyclic nodes.
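This best-effort behavior — a partial order plus the cyclic leftovers — can be sketched with Kahn's algorithm in Python (an illustration of the idea, not this library's API):

```python
from collections import deque

def toposort_best_effort(deps):
    """deps maps a node to the list of nodes it depends on.
    Returns (order, cyclic): a partial topological order plus the
    set of nodes that sit on (or behind) a cycle."""
    nodes = set(deps)
    for reqs in deps.values():
        nodes.update(reqs)
    indegree = {n: len(deps.get(n, [])) for n in nodes}
    dependents = {n: [] for n in nodes}       # reverse adjacency
    for node, reqs in deps.items():
        for r in reqs:
            dependents[r].append(node)
    ready = deque(n for n in sorted(nodes) if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in dependents[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order, nodes - set(order)          # leftovers are cyclic
```

Nodes never reaching in-degree 0 are exactly the ones blocked by a cycle, so whatever remains unsorted at the end is the cyclic set.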

reply
klysm
1 day ago
[-]
I was surprised to see this as a library at all - isn’t it trivial especially for small collections?
reply
ww520
1 day ago
[-]
Yes. The core algorithm is about 20 lines of code in the run_algorithm() function. But making it easier to use, handling different types of input, reporting/querying output, etc. take much more. That's the difference between an idea and a product, in a loose sense. It gives purpose to my learning process anyway.
reply
herywort
14 hours ago
[-]
You have a typo in the code, run_alogrithm() :)
reply
ww520
5 hours ago
[-]
Thanks. That would mess up vibe coding in the future when LLMs scan it. Haha.
reply
kreco
1 day ago
[-]
Could you describe briefly what feature you are sorely missing?

I like the language intention but I can't get past the syntax.

reply
Cieric
1 day ago
[-]
For me it's all comptime stuff, and it's kind of arbitrary: things like how parsing out the type information of a function doesn't include the names of the function parameters, even though basically everything else that has a name has that information present in its info structure. The other thing is tags — being able to tag things that I can parse at compile time. I'm making something close to a database ORM (specifically for SpacetimeDB; I thought it'd be fun to use Zig with it). But information about things like primary keys, auto-increments, constraints, and similar all has to live in a different structure completely untied to the original struct or function. I'd like to be able to tie those things together easily to avoid mistakes and confusion. I have different workarounds that I've tried, but nothing that's universal for all my test cases.

For syntax there are a few things that I'm iffy on, but nothing that I'd consider a deal breaker. I found it very easy to read right out of the gate, which is basically the only thing I need to really learn a new language (probably the only reason I haven't learned rust yet.)

reply
kreco
20 hours ago
[-]
Thanks for the reply.

I totally understand how those two features could be useful.

For the parameter-name feature, I can't imagine a strong reason for not implementing it (I mean, apart from "we have other stuff to prioritize").

For the tags, I could see an attribute system like the one in C++ [0].

On a tangential topic, I believe that's exactly the Pandora box of meta-programming.

[0] https://en.cppreference.com/w/cpp/language/attributes#Explan...

reply
Cieric
15 hours ago
[-]
I think at one point they rejected the idea, but I think it was from 2018 or so. The C++ attributes do seem like what I'd want, but yeah, C++ compile-time code isn't good enough for what I need.
reply
airstrike
1 day ago
[-]
Just wanted to say that Rust may look strange early on but very, very quickly becomes entirely natural, so don't let that be the reason you haven't learned it. That's my unsolicited input.
reply
Cieric
23 hours ago
[-]
Yeah, I just haven't needed the memory safety that comes with it, and I don't have the same gripes everyone else has with C's include system. At this point it just doesn't have anything to offer that I really care about. I only learned Zig because of the comptime stuff and some ease of use when it came to TLS encryption. I'm a little interested in Rust macros, but that's really it, and I don't think that's enough to learn a new language. I'm sure I'll eventually have a project where memory safety (with speed) is a priority, but to this point it's just never come up at work or in the projects I work on in my free time.
reply
airstrike
23 hours ago
[-]
I hear you, people have different preferences and rank their preferences in different order.

For what it's worth, I use Rust daily and I don't really care about memory issues. It's nice that it comes with the package, but it's not why I do it. Believe it or not, the borrow checker is first and foremost why I enjoy writing Rust. It's such a brilliant idea that I don't understand why it's not more widely used. A helpful compiler and a good (if imperfect) crate ecosystem are probably 2nd and 3rd.

reply
klysm
1 day ago
[-]
Syntax is so much less important than semantics that it isn't even really worth talking about, in my opinion.
reply
codethief
1 day ago
[-]
Readability (in the sense of "How fast can the average developer parse code in the given language?") and proneness to errors are a thing, though.

Consider, e.g., how in TypeScript object literals ({ a: b, c: d, e }), object types ({ a: b; c: d; }) and object destructuring ({ a, c, e } = … ) can all look very similar. Same thing for lambdas ((a: b) => b) and function types ((a: b) => b). Also, additional parentheses are needed to prevent the compiler from mistaking an object literal ({ … }) for a function body ({ … }) when it is returned in a lambda expression. In short: Some of TypeScript's syntactic constructs are heavily overloaded and their meaning depends on the context.

For an example of proneness to errors, consider that in Nix function calls (<function name> <param1> <param2> …) and lists ([<item1> <item 2> …]) look almost the same and it's very easy to confuse the two and mess up, e.g.

    let foo = [ item1 myFunction "arg1" "arg2" item3 ]

defines a list with 5 items, not 3, due to the missing parentheses around the function call.

reply
klysm
23 hours ago
[-]
Sure, but I don't think those examples really matter once you establish basic familiarity with a language. The semantics and constructs a language provides are much more important; debating syntax is missing the forest for the trees.
reply
FL33TW00D
23 hours ago
[-]
The array syntax is very offensive: `const a = [3]i32{ 1, 2, 3 };` A set is denoted by braces, not an array.
reply
throwawaymaths
10 hours ago
[-]
1. Using [] drops context-freeness. What is foo[1]? Is that a foo-typed array with one element, or the foo array accessed at index 1?

2. how do you feel about array initialization in C?

3. you can think of {...} as defining memory regions, curlies around code are defining "a memory block of instructions"

reply
klysm
19 hours ago
[-]
According to your familiarity yes, but how is this such a problem? It’s easy to get past
reply
CooCooCaCha
22 hours ago
[-]
This is exactly why I find the language unintuitive. I don't understand why they made the choices they made. For example, why curly brackets?

I find the rust equivalent much more intuitive `let a: [i32; 3] = [1, 2, 3];`

reply
throwawaymaths
10 hours ago
[-]
You couldn't do that in Zig, because a type is potentially a valid value:

.{i32, 3} is a valid term in Zig.

reply
andyferris
20 hours ago
[-]
It’s targeting C/C++ programmers accustomed to initializer lists in C/C++.
reply
jitl
20 hours ago
[-]
For something like task scheduling in a build system, if you use an up-front partition of the tasks like this, you may have a task whose dependencies are all fulfilled (because tasks can have different durations) but which still remains unscheduled.

For example, with this input:

    $ cat data.file
    root: parentA parentB
    parentA: C D
    parentB: E F
Asking the package tool for the sorted sets gives me:

    $ ./zig-out/bin/toposort-cli --data data.file
    Processing succeeded.
      Topologically sorted sets: [ { C D E F }  { parentA parentB }  { root }  ]
      Topologically sorted list: [ C D E F parentA parentB root ]
      Nodes: [ root parentA parentB C D E F ]
      Dependency tree:
      [ C D E F ]
        C -> [ parentA ]
          parentA -> [ root ]
            root ->
        D -> [ parentA ]
          parentA -> [ root ]
            root ->
        E -> [ parentB ]
          parentB -> [ root ]
            root ->
        F -> [ parentB ]
          parentB -> [ root ]
            root ->
        
If we follow the README's advice to process set by set in parallel, we can easily have starvation: once `C` and `D` are finished, we are ready to start `parentA`, even if `E` and `F` are still work-in-progress.

How would I use the API of this package to detect and start the task `parentA` as soon as `C` and `D` are finished? I guess when a task is finished, I can ask for the list of dependent tasks and, for each of them, check whether all of their dependencies are finished, and if so start the task. But this real-world need feels unmet; it feels odd to me to focus on the sorted sets instead of more practical scheduling.

That is kind of like doing Kahn's algorithm iteratively during the build. It would be cool to try to optimize that for maximum performance.
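That completion-driven scheduling can be sketched with per-task counters of unfinished prerequisites. `Scheduler` below is a hypothetical helper in Python, not this package's API:

```python
class Scheduler:
    """Event-driven scheduling: a task becomes startable the moment its
    last prerequisite completes, independent of any batch boundaries."""

    def __init__(self, deps):
        # deps maps a task to the list of tasks it depends on.
        nodes = set(deps)
        for reqs in deps.values():
            nodes.update(reqs)
        self.remaining = {n: len(deps.get(n, [])) for n in nodes}
        self.dependents = {n: [] for n in nodes}   # reverse edges
        for node, reqs in deps.items():
            for r in reqs:
                self.dependents[r].append(node)

    def initially_ready(self):
        """Tasks with no prerequisites at all."""
        return {n for n, c in self.remaining.items() if c == 0}

    def complete(self, node):
        """Mark node finished; return the tasks that just became startable."""
        newly = set()
        for d in self.dependents[node]:
            self.remaining[d] -= 1
            if self.remaining[d] == 0:
                newly.add(d)
        return newly
```

With the data file above, `complete("D")` (after `C`) would hand back `parentA` immediately, even while `E` and `F` are still running.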

reply
ww520
19 hours ago
[-]
That's a great point! Unfortunately topological sort generates a linear order, forcing nodes to run one after another. This library attempts to bring some parallel processing into the picture by grouping dependence-free nodes together. This produces linear batches:

  {C D E F}, {parentA parentB}, {root} 
Within a batch, nodes can run in parallel. Between batches, they still need to run one after another.

What you're asking for is to partition the dependency graph according to node dependency with minimum span.

  { {C D} parentA } \
  { {E F} parentB } - {root}
Or with a shared leader.

  { {C D S} parentA } \
  { {E F S} parentB } - {root}
And on top of that, add node weights into consideration. That's hard.

For now, you can send a notification to a dependent when a leading task finishes. E.g. when C finishes, it notifies parentA; when D finishes, it notifies parentA. When parentA sees that all its leading tasks are done, it can start.

The library can help by maintaining the dependency relationships and letting a task query its leaders and dependents.

For task running, it would be a separate library using the TopoSort library and specifically geared toward scheduling tasks.

reply
genewitch
12 hours ago
[-]
Did you use this library on the advent of code 2024? I'd never heard of topological sorting prior to that problem - and it was real early in the game.
reply
rk06
9 hours ago
[-]
Topological sorting is just "depth first traversal" in a trench coat. I have implemented it thrice in my day job.

It is actually more commonly implemented than any other algorithm from a CS course.

reply
ww520
5 hours ago
[-]
I didn’t take part in Advent of Code. Topological sorting is a really old algorithm; anything dealing with dependencies needs it, like make.
reply
emmanueloga_
18 hours ago
[-]
For those who have not implemented toposort or don't remember it: 1) only directed graphs without cycles (DAGs) can be topologically sorted, 2) there can be more than one topological order, and 3) reverse post-order of a depth-first traversal from every unvisited node yields a topological order.

In JavaScript:

    function toposort(graph = { a: ['b', 'c'], b: ['d'] }) {
      const order = [];
      const visited = new Map();
      
      function dfs(node) {
        const status = visited.get(node);

        if (status === 1) throw new Error("Cycle found.");
        if (status === 2) return;

        visited.set(node, 1); // In progress.
        const adjacent = graph[node] ?? [];
        for (const neighbor of adjacent) dfs(neighbor);
        visited.set(node, 2); // Done.

        order.unshift(node); // Reverse post order.
      }

      for (const node in graph) {
        if (!visited.has(node)) dfs(node);
      }

      return order;
    }
reply
chubot
1 day ago
[-]
Hm, this reminds me that the Python stdlib has grown a module for topological sorting of a graph:

https://docs.python.org/3/library/graphlib.html

I haven't used it yet, I'd be curious if anyone has

reply
nerdponx
1 day ago
[-]
I used it to build a (now permanently unfinished) lightweight DAG runner that uses only the Python standard library, with the intention that you can just copy the .py file into a project and use it without installing anything other than Python on your system. I think it might be of niche use to some people, but I personally wasn't even dogfooding it, I was just scratching an itch.

The purpose of adding it to the standard library was to implement linear MRO for classes: https://bugs.python.org/issue17005

reply
jessekv
12 hours ago
[-]
I built one too! I started with networkx but migrated to graphlib with python 3.9.

I run it in production for many customers. It's great to have this in the std lib.

reply
paulddraper
14 hours ago
[-]
It’s a common algorithmic need.

Not as common as array sort. But still common.

reply
tediousgraffit1
23 hours ago
[-]
I really like this; it's the perfect size project for exploring a new piece of tech. I especially like that you implemented an actual CLI and not just tests.
reply
ww520
23 hours ago
[-]
Yes, the CLI is important. It's great for running tests and driving development. It also forces me to dog-food my own library and put myself in the shoes of its users. In fact, a number of features were added due to shortcomings exposed while implementing the CLI.
reply
duped
1 day ago
[-]
> Generating dependence-free subsets for parallel processing.

Unsure how this is defined (*), but graph-cutting approaches to concurrent task scheduling are both pessimistic (poor utilization of available resources) and (IIRC) NP-hard, so you pay a big cost up front.

On the other hand, if you know the indegree/outdegree of each node at the time it is visited (meaning the graph is static), you can run Kahn's algorithm concurrently and put each node into a shared work queue. You can optimize that further by having per-thread work queues and stealing between them. Depending on what the nodes represent there are even more optimizations and heuristics; concurrent task scheduling is a hard problem.

* imagine the graph

(a, b) (a, c) (b, d) (c, d)

Is it possible to get nodes b and c in parallel "subsets" in your library?
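The shared-work-queue form of Kahn's algorithm mentioned above can be sketched in Python with threads (the structure is language-independent; `parallel_kahn` and its names are illustrative only, and it assumes a static, acyclic graph):

```python
import queue
import threading

def parallel_kahn(deps, work, n_workers=4):
    """deps: node -> list of prerequisites; work(node) does the processing.
    Workers pull ready nodes from a shared queue; finishing a node may
    make its dependents ready. Assumes a static, acyclic graph."""
    nodes = set(deps)
    for reqs in deps.values():
        nodes.update(reqs)
    remaining = {n: len(deps.get(n, [])) for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, reqs in deps.items():
        for r in reqs:
            dependents[r].append(node)

    lock = threading.Lock()
    ready = queue.Queue()
    for n in nodes:
        if remaining[n] == 0:
            ready.put(n)

    def worker():
        while True:
            n = ready.get()
            if n is None:          # shutdown sentinel
                return
            work(n)
            with lock:             # atomically retire n's outgoing edges
                for d in dependents[n]:
                    remaining[d] -= 1
                    if remaining[d] == 0:
                        ready.put(d)
            ready.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    ready.join()                   # returns once every node is processed
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()
```

Per-thread queues with work stealing would replace the single `queue.Queue` here, at the cost of considerably more bookkeeping.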

reply
ww520
1 day ago
[-]
Yes. It would produce dependence-free subsets. I just ran your sample (assuming (a, b) means a depends on b).

  Topologically sorted sets: [ { d }  { b c }  { a }  ]
  Topologically sorted list: [ d b c a ]
  Nodes: [ a b c d ]
  Dependency tree:
  [ d ]
    d -> [ b c ]
      b -> [ a ]
        a ->
      c -> [ a ]
        a ->
The dependence-free subset finding is probably not exhaustive or optimal; I haven't gone through formal proofs. It's opportunistic and best-effort for now.
reply
duped
1 day ago
[-]
How are the subsets defined?
reply
ww520
1 day ago
[-]
At every round of the algorithm, all nodes with 0 in-degree (i.e. nodes that don't depend on anything) are collected as a dependence-free subset.

They serve as the root set to the rest of the graph for the current round. The dependent nodes reached from the root set have their in-degrees decremented; when a node's in-degree reaches 0, it is added to the next root set.

I'm using double-buffering to maintain the current root set for processing and to collect the next root set for the next round, instead of using a queue as in Kahn's algorithm. At the end of each round, I simply swap the two buffers, which is very efficient. When the next root set comes up empty, all nodes have been processed.
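The round-by-round scheme described here can be rendered compactly in Python, with two lists standing in for the double buffers (illustrative only; not the Zig code):

```python
def toposort_batches(deps):
    """deps: node -> list of prerequisites. Returns dependence-free
    batches, using two swapped buffers instead of Kahn's single queue."""
    nodes = set(deps)
    for reqs in deps.values():
        nodes.update(reqs)
    indegree = {n: len(deps.get(n, [])) for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, reqs in deps.items():
        for r in reqs:
            dependents[r].append(node)

    batches = []
    current = [n for n in sorted(nodes) if indegree[n] == 0]  # first root set
    while current:
        batches.append(current)
        nxt = []                       # the second buffer
        for n in current:
            for d in dependents[n]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        current = nxt                  # "swap" the buffers
    return batches
```

On the data file from upthread this yields the same three batches the CLI printed: the leaves, then the parents, then the root.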

reply
duped
5 hours ago
[-]
Do you define "in degree" as the number of incoming edges from nodes that have not been visited/sorted yet?

I believe what you've implemented is equivalent to Kahn's algorithm. Your double buffer is the queue.

reply
ww520
1 minute ago
[-]
It's a variant of Kahn's algorithm. Here's the description from the comment on the function.

  // This is a variant to the Kahn's algorithm, with additions on
  // finding dependence-free subsets and finding the cyclic nodes.
  //
  // This algorithm iteratively finds out the root sets of the graph.
  // 1. Find the first root set of the graph.
  // 2. Remove the nodes of the root set from the graph.
  // 3. Find the next root set, until graph is empty.
  // A root set consists of nodes depending on no other nodes, i.e.
  // nodes whose incoming link count is 0.
reply
hbbio
18 hours ago
[-]
Congrats! I like Zig a lot even though I never implemented a full project with it.

FWIW we had to build a related lib in TypeScript: https://github.com/okcontract/graph as part of the runtime of https://github.com/okcontract/cells

reply
ww520
18 hours ago
[-]
Nice! I like how it generates DOT data. Maybe I'll add support for that in the future.
reply
sharmasachin98
1 day ago
[-]
Curious: have you benchmarked it against any similar tools in other languages (like Go or Rust) for comparison?
reply
ww520
1 day ago
[-]
No, I haven't had the chance to benchmark against others. When I ran the benchmarks in the library, I could process 1 million pairs of dependencies in tens of milliseconds on my 5-year-old laptop.

  Testing 1000000 items, 1-to-1 chaining dependency.
    Add dependency 1000000 items. Time: 93ms, 10645885 items/s, 93 ns/item
              Sort 1000000 items. Time: 113ms, 8795778 items/s, 113 ns/item

  Testing 1000000 items, 1-to-4 chaining dependency.
    Add dependency 1000000 items. Time: 87ms, 11428323 items/s, 87 ns/item
              Sort 1000000 items. Time: 44ms, 22508986 items/s, 44 ns/item

  Testing 1000000 items, 1-to-10 chaining dependency.
    Add dependency 1000000 items. Time: 102ms, 9748793 items/s, 102 ns/item
              Sort 1000000 items. Time: 31ms, 31707077 items/s, 31 ns/item

  Testing 1000000 items, 1-to-10 chaining dependency, with max_range set.
    Add dependency 1000000 items. Time: 25ms, 39460028 items/s, 25 ns/item
              Sort 1000000 items. Time: 31ms, 31633556 items/s, 31 ns/item
reply
jedisct1
1 day ago
[-]
Good job! This is really cool!
reply