[1] With weak memory hardware, it is effectively impossible to write correct programs without using atomic read-modify-write operations or memory barriers, because both compilers and hardware can reorder both reads and writes, to a maddening degree.
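To make that concrete, here is a minimal Go sketch (my own illustration, not from the article) of the usual fix: publishing a value through a plain flag would be a data race that the compiler and weak hardware are free to reorder, while an atomic flag establishes the ordering the footnote is talking about.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // Publishing a value to another goroutine. With a plain (non-atomic) flag
    // this would be a data race, and on weakly ordered hardware the consumer
    // could observe the flag set while `data` still looks uninitialized.
    // The atomic store/load pair establishes the happens-before edge that
    // makes the publication safe under the Go memory model.

    var (
        data  int
        ready atomic.Bool
    )

    func producer() {
        data = 42         // plain write, ordered before the atomic store below
        ready.Store(true) // release the value to the consumer
    }

    func consumer() {
        for !ready.Load() { // spin until the producer publishes
        }
        fmt.Println(data) // prints 42
    }

    func main() {
        go producer()
        consumer()
    }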
(I previously posted this comment as a reply to a comment that got flagged.)
I can see where the parent is coming from.
I think you can both be right - it can be valuable in any case.
That's not quite true. The article explicitly mentions problems with the interleaving of instructions between two processes (aka threads), which is the fundamental problem of concurrency. If you consider only a single thread in isolation, its program order is always maintained, even though compiler optimizations and hardware OOO execution actually execute the instructions in data-dependency order.
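A tiny Go sketch (mine, not the article's) of why that interleaving alone is already enough to break a program, no exotic reordering required:

    package main

    import (
        "fmt"
        "sync"
    )

    // Two goroutines each increment a shared counter without synchronization.
    // An increment is really load-add-store, so the steps of the two goroutines
    // interleave and updates get lost: the final value is usually well below
    // 200000. `go run -race` flags the data race as well.
    func main() {
        var counter int
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 100000; j++ {
                    counter++ // non-atomic read-modify-write
                }
            }()
        }
        wg.Wait()
        fmt.Println(counter) // usually < 200000
    }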
Consistency Model - https://en.wikipedia.org/wiki/Consistency_model
Again, with weak memory models, writes from another thread can be observed out of order due to hardware reordering, even without compiler optimizations; there can be behaviors that correspond to no valid interleaving of the two threads.
I am explicitly pointing out that what you (along with the author) are assuming, what is often called sequential consistency (https://en.wikipedia.org/wiki/Sequential_consistency), hasn't been true of hardware for decades. It's a common misconception that most systems obey sequential consistency. Your comment just seems to repeat that again, and it isn't true. See here (https://preshing.com/20120930/weak-vs-strong-memory-models/) for a deeper explanation.
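The classic demonstration is the store-buffering litmus test; here it is as a Go sketch (a deliberate data race, shown only to illustrate hardware behavior; it is not a reliable reproducer and the race detector will rightly complain):

    package main

    import (
        "fmt"
        "sync"
    )

    // Store-buffering litmus test. Under sequential consistency at least one of
    // r1/r2 must end up as 1, because every interleaving of the four statements
    // executes some store before the other thread's load. On weakly ordered
    // hardware (and on x86, thanks to store buffers) the outcome r1 == 0 &&
    // r2 == 0 is also possible, which corresponds to no valid interleaving.
    // Making x and y atomic (sequentially consistent in Go) forbids it.
    func main() {
        for i := 0; i < 1_000_000; i++ {
            var x, y, r1, r2 int
            var wg sync.WaitGroup
            wg.Add(2)
            go func() { defer wg.Done(); x = 1; r1 = y }()
            go func() { defer wg.Done(); y = 1; r2 = x }()
            wg.Wait()
            if r1 == 0 && r2 == 0 {
                fmt.Printf("non-SC outcome on iteration %d\n", i)
                return
            }
        }
        fmt.Println("no non-SC outcome observed this run")
    }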
Once you start thinking about a program as a sequence of loads/stores (i.e. reads/writes to shared memory) and note that pipelining/OOO/superscalar execution are techniques to parallelize these (and other) instructions for a single thread of control, you start getting an idea of how sequential order can be preserved even though the actual execution is not quite so sequential.
Some relevant details in a previous comment of mine - https://news.ycombinator.com/item?id=42422867
See also this book - https://news.ycombinator.com/item?id=42472334
I disagree. If we want people to understand, then we should teach computer systems bottom-up, so that they understand the bottom-level behaviors before being surprised later when our bad (i.e. wrong) abstractions leak. If you want to train programmers to accomplish parallel programming tasks by plugging together nice abstractions, that's all well and good, but that's a completely different kind of instruction than trying to teach them how things work. If we only do the latter then we'll never have experts again.
And while I will agree that mutexes and the like are useful in their own right, they are not the most fundamental for understanding concurrency.
It's not in most programming languages. A big part of the linked Wikipedia article is actually about when and how program order is allowed to be modified.
A great portion of the JVM memory model definition is dedicated to laying out exactly when and how read/write barriers are established and when/how ordering is enforced. Without any sort of barrier (in most cases) the JVM is fairly free to completely reorder program execution.
[1] https://github.com/amilajack/reading/blob/master/Computer_Sc...
Visualizing Concurrency in Go (2016)
The instruction pointer is synchronized across all the SIMD lanes, giving you fewer states to reason about.
Then GPUs mess that up by letting us run blocks/thread groups independently, but now GPUs have highly efficient barrier instructions that line everyone back up.
It turns out that SIMD's innate assurance of instruction synchronization at the SIMD-lane level is why warp-based / wavefront coding is so efficient, though, as none of those barriers are necessary anymore.
We use threads to solve all kinds of things, including 'More Compute'.
SIMD is limited to 'More Compute' (unable to process I/O like sockets concurrently, or other such thread patterns). But as it turns out, more compute is a problem that many programmers are still interested in.
Similarly, you can use Async patterns for the I/O problem (which seem to be more efficient than threads anyway).
--------
So when we think about a 2024-style program, you'd have SIMD for compute-limited problems (Neural Nets, Matrices, Raytracing). Then Async for Sockets, I/O, etc. etc.
Which puts traditional threads in this weird jack-of-all-trades position: not as good as SIMD methods for raw compute, not as good as Async for I/O. But threads do both.
Fortunately, there seem to be plenty of problems that involve both a lot of I/O and a lot of compute simultaneously.
IMHO the jury is still out on whether async I/O is worth it, either in terms of performance or the potential complexity that applications might incur in trying to do it via callback hell. Many programmers find synchronous I/O to be a really, really intuitive programming model, and the lowest levels of the software stack (i.e. syscalls) are almost always synchronous.
-------
But let's zoom into Raytracing for a minute. Intel's Raytracing (and indeed, the DirectX model of Raytracing) is for Ray Dispatches to be consolidated in rather intricate ways.
Intel will literally move the stack between SIMD lanes, consolidating rays into shared misses and shared hits (to minimize branch divergence).
There are some new techniques being presented in today's SIMD models that cannot easily be described by the traditional threading models.
GPUs in particular have a very hyperthread/SMT-like model where multiple true threads (aka instruction pointers) are juggled while waiting for RAM to respond.
Still, the intermediate organizational step where SIMD gives you a simpler form of parallelism is underrated and understudied IMO.
> How concurrency works: A text guide
This is a comforting idea, but enumerating every possible state of a real-world (integrated/distributed) software system is a fool's task.
Spend less time on formal verification (fancy word for perfectionism) and more time on risk analysis and cybernetics. Instead of praying that nothing ever goes wrong, plan for faults and design systems that integrate human and machine to respond rapidly and effectively.
"Correctable" is a much more practical target than "always correct".
The 1st edition divides the algorithms under two broad sections, viz. "Message Passing" (e.g. the chapter Mutual Exclusion) and "Shared Memory" (e.g. the chapter Mutual Exclusion II), while the 2nd edition removes these section headings and simply groups the algorithms under functional categories (e.g. both of the above under one chapter, Mutual Exclusion).
Book website - https://mitpress.mit.edu/9780262037662/distributed-algorithm...
So is the ability to compute the entire state space and prove that liveness and/or safety properties hold the single main basis of a model checker's effectiveness? Or are there other fundamental capabilities as well?
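For concreteness, the "compute the entire state space and check a property" part is essentially an exhaustive graph search. Here is a toy Go sketch (my own, greatly simplified) that enumerates every interleaving of a broken test-then-set "lock" and finds the mutual-exclusion violation; real model checkers add much more on top of this.

    package main

    import "fmt"

    // A toy explicit-state model checker: breadth-first search over every
    // reachable state of two processes running a (deliberately broken)
    // test-then-set lock, checking the safety property "both processes are
    // never in the critical section at once". Real model checkers add
    // liveness/fairness checking, partial-order and symmetry reduction,
    // symbolic states, counterexample traces, and so on.

    type state struct {
        pc   [2]int // per-process program counter: 0=test, 1=set, 2=critical, 3=done
        flag int    // the shared "lock" variable
    }

    // step advances proc by one statement, if it can move in state s.
    func step(s state, proc int) (state, bool) {
        switch s.pc[proc] {
        case 0: // test: proceed only if the lock looks free
            if s.flag == 0 {
                s.pc[proc] = 1
                return s, true
            }
            return s, false
        case 1: // set: acquire the lock (the test and the set are separate steps: the bug)
            s.flag = 1
            s.pc[proc] = 2
            return s, true
        case 2: // leave the critical section
            s.flag = 0
            s.pc[proc] = 3
            return s, true
        }
        return s, false // done
    }

    func main() {
        start := state{}
        seen := map[state]bool{start: true}
        queue := []state{start}
        for len(queue) > 0 {
            s := queue[0]
            queue = queue[1:]
            if s.pc[0] == 2 && s.pc[1] == 2 {
                fmt.Println("safety violated, both in critical section:", s)
                return
            }
            for proc := 0; proc < 2; proc++ {
                if next, ok := step(s, proc); ok && !seen[next] {
                    seen[next] = true
                    queue = append(queue, next)
                }
            }
        }
        fmt.Println("explored", len(seen), "states, no violation")
    }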
1. "It's not visual, it's text". Yeah, but: how many "visual" representations have no text? And there _are_ visuals in there: the depictions of state space. They include text (hard to see how they'd be useful without) but aren't solely so.
2. "Meh, verification is for well paid academics, it's not for the real world". First off, I doubt those "academics" are earning more than median sw devs, never mind those in the SV bubble. More importantly: there are well-publicised examples of formal verification being used for real-world code, see e.g. [1].
It's certainly true that verification isn't widespread. It has various barriers, from use of formal maths theory and presentation to the compute load arising from combinatorial explosion of the state space. Even if you don't formally verify, understanding the state space size and non-deterministic path execution of concurrent code is fundamentally important. As Dijkstra said [2]:
> our intellectual powers are rather geared to master static relations and that our powers to visualise processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static process and the dynamic program, to make the correspondence between the program (spread out in space) and the process (spread out in time) as trivial as possible.
He was talking about sequential programming: specifically, motivating the use of structured programming. It's equally applicable to concurrent programming though.
[1]: https://github.com/awslabs/aws-lc-verification
[2]: https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.p...
What does this mean and how does it excuse a 'visual' article being all text?
I doubt those "academics" are earning more than median sw devs
Do you think the point might have been more about the pragmatism of formal verification?
I find the BEAM/OTP process model relatively simple to use, and it moves some of the friction from logic bugs to a throughput problem.
Just... no. First off, I'll bet a vanishingly small number of application developers on BEAM languages do anything to manage their own worker pools. The BEAM does it for them: it's one of its core strengths. You also imply that "spinning up a new BEAM process" has significant overhead, either computationally or cognitively. That, again, is false. Spinning up processes is intrinsic to Erlang/Elixir/Gleam and other BEAM languages. It's encouraged, and the BEAM has been refined over the years to make it fast and reliable. There are mature, robust facilities for doing so - as well as communicating among concurrent processes during their execution and/or retrieving results at termination.
You've made clear before that you believe async/await is a superior concurrency model to the BEAM process-based approach. Or, more specifically, async/await as implemented in .Net [1]. Concurrent programming is still in its infancy, and it's not yet clear which model(s) will win out. Your posts describing .Net's async approach are helpful contributions to the discussion. Statements such as that quoted above are not.
[1]: https://news.ycombinator.com/item?id=40427935
--
EDIT: fixed typo.
Spawning 1M processes with Elixir is anywhere between ~2.7 and 4 GiB with significant CPU overhead. Alternatives are much more efficient at both: https://hez2010.github.io/async-runtimes-benchmarks-2024/tak... (a revised version).
Go, for example, probably does not want you to spawn goroutines that are too short-lived, as each one carries at least 2 KiB of allocations by default, which is not very kind to Go's GC (or any GC or allocator, really). So it expects you to be at least somewhat modest with such concurrency patterns.
In Erlang and Elixir the process model implementation follows similar design choices. However, Elixir lets you idiomatically await task completion - its approach to concurrency is much more pleasant to use and less prone to user error. Unfortunately, the per-process cost remains very Goroutine-like and is the highest among the concurrency abstractions implemented by other languages. Unlike Go, it is also very CPU-heavy: when a high number of processes are spawned and exit quickly, it takes quite a bit of overhead for the runtime to keep up.
Sure, BEAM languages are a poor fit for compute-intensive tasks without bridging to NIFs but, at least in my personal experience, the use cases for "task interleaving" are everywhere. They are just naturally missed because usually the languages do not make it terse and/or cheap - you need to go out of your way to dispatch operations concurrently or in parallel, so the vast majority doesn't bother, even after switching to languages where it is easy and encouraged.
(I'm not stating that the async/await model in .NET is the superior option, I just think it has good tradeoffs for the convenience and efficiency it brings. Perhaps a more modern platform will come up with an inverted approach where async tasks/calls are awaited by default without syntax noise, and the cost of doing "fork" (or "go" if you will) and then "join" is the same as letting the task continue in parallel and then awaiting it, without the downsides of the virtual threads we have today)
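For what it's worth, the explicit "fork"/"join" described above looks roughly like this in Go (a hedged sketch; fork, fetchA and fetchB are made-up names for illustration):

    package main

    import (
        "fmt"
        "time"
    )

    // Minimal fork/join: each call is dispatched concurrently ("fork") and its
    // result is collected from a channel ("join"). The point above stands: you
    // must opt into this plumbing explicitly, it is not how a call behaves by
    // default.
    func fork[T any](f func() T) <-chan T {
        ch := make(chan T, 1)
        go func() { ch <- f() }()
        return ch
    }

    func fetchA() string { time.Sleep(50 * time.Millisecond); return "A" }
    func fetchB() string { time.Sleep(50 * time.Millisecond); return "B" }

    func main() {
        a := fork(fetchA) // both calls start immediately and run concurrently
        b := fork(fetchB)
        fmt.Println(<-a, <-b) // joined here, roughly 50ms total instead of 100ms
    }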
What do you mean by "tightly interleaving"? It sounds like a technical term of some importance, but when I search the web it seems people are saying it means that the order of execution or data access is arbitrary, which you can let the built-in preemptive multitasking or a queue with a worker pool figure out for you.
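(For reference, the "queue with a worker pool" version is a few lines in Go; a generic sketch, not tied to any particular runtime or to the BEAM:)

    package main

    import (
        "fmt"
        "sync"
    )

    // A fixed pool of workers drains a queue (channel) of jobs. The order in
    // which jobs execute is arbitrary; the preemptive scheduler decides who
    // runs when, and callers only rely on every job eventually completing.
    func main() {
        jobs := make(chan int)
        results := make(chan int)
        var wg sync.WaitGroup

        for w := 0; w < 4; w++ { // four workers share one queue
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := range jobs {
                    results <- j * j // stand-in for real work
                }
            }()
        }

        go func() { // feed the queue, then signal "no more jobs"
            for i := 1; i <= 10; i++ {
                jobs <- i
            }
            close(jobs)
        }()

        go func() { // close results once every worker has exited
            wg.Wait()
            close(results)
        }()

        for r := range results {
            fmt.Println(r)
        }
    }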
In the alternatives you have in mind, how good is the monitoring, observability and IPC tooling? What's their hardware clustering story? I've done IPC in POSIX and that was absolutely dreadful so I hope you're working with something better than that at least.