I'm building a clarity-first language (compiles to C++)
35 points
4 days ago
| 17 comments
| github.com
| HN
HendrikHensen
3 hours ago
[-]
If this is to be a real, (relatively) widely-used language, I would make some tough choices on where to innovate, and where to just leave things the same.

One thing I noticed in the example is `num target`, especially because the focus is on "clarity". When I read the example, I was sure that `num` would be something like the JavaScript `Number` type. But to my surprise, it's just a 64-bit integer.

For an extremely long time, languages have had "int", "integer", "int64", and similar. If you aim for clarity, I would strongly advise keeping those names rather than inventing new words for them just because. This is both for familiarity (most programmers coming to your language will already be familiar with other languages that have "int(eger)") and for clarity ("int(eger)" is unambiguous: it is a well-defined term for a whole number, whereas "num" is ambiguous and "number" can mean any type of number, e.g. integer, decimal, imaginary, complex, etc.).

The clearest option is fully explicit data types, e.g. `int64` (signed), `uint64` (unsigned), `int32`, etc.

reply
hedayet
2 hours ago
[-]
[author here] That’s a good point. I can see why int might be clearer than num, especially given the long history of that naming. I’ll think about it.
reply
Nevermark
2 hours ago
[-]
Definitely int for signed numbers. But I would call it "int64".

Clarity means saying what you mean. The typename int64 could not be clearer that you are getting 64 bits.

This is consistent with your (num32 -->) "int32".

And it would remain consistent if you later add smaller or larger integers.

This also fits your philosophy of letting the developer decide and getting out of their way. I.e. don't use naming to somehow shoehorn in the "standard" int size. Even if you would often be right. Make/let the developer make a conscious decision.

Later, "int" could be a big integer, with no bit limit. Or the name will be available for someone else to create that.

I do like your approach.

(For unsigned, I would call them "nat32", "nat64", if you ever go there. I.e. unsigned int is actually an oxymoron. A sign is what defines an integer. Natural numbers are the unsigned ones. This would be a case of using the standard math term for its standard meaning, instead of the odd historical accident found in C. Math is more universal, has more lasting and careful terminology - befitting universal clarity. I am not a fan of new names for things for specialized contexts. It just adds confusion or distance between branches of knowledge for no reason. Just a thought.)

reply
jiggawatts
1 hour ago
[-]
I would recommend outright copying Rust.

Among other things, it's a systems programming language and hence its naming scheme is largely (if not entirely) compatible with modern C++ types.

I.e.:

    +----------------+-------------------------+------------------------------+
    | Rust           | Modern C++              | Notes                        |
    +----------------+-------------------------+------------------------------+
    | i8             | std::int8_t             | exact 8-bit signed           |
    | u8             | std::uint8_t            | exact 8-bit unsigned         |
    | i16            | std::int16_t            | exact 16-bit signed          |
    | u16            | std::uint16_t           | exact 16-bit unsigned        |
    | i32            | std::int32_t            | exact 32-bit signed          |
    | u32            | std::uint32_t           | exact 32-bit unsigned        |
    | i64            | std::int64_t            | exact 64-bit signed          |
    | u64            | std::uint64_t           | exact 64-bit unsigned        |
    | i128           | (no standard type)      | GCC/Clang: __int128          |
    | u128           | (no standard type)      | GCC/Clang: unsigned __int128 |
    | isize          | std::intptr_t           | pointer-sized signed         |
    | usize          | std::uintptr_t          | pointer-sized unsigned       |
    | f32            | float                   | IEEE-754 single precision    |
    | f64            | double                  | IEEE-754 double precision    |
    | bool           | bool                    | same semantics               |
    | char           | char32_t                | Unicode scalar value         |
    +----------------+-------------------------+------------------------------+
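
A minimal C++ sketch of the mapping in the table above. The alias names simply mirror the Rust column; `widen_sum` is a hypothetical helper added for illustration, and nothing here is taken from the ROX compiler.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical aliases: Rust-style names mapped onto the <cstdint> types.
using i8  = std::int8_t;
using u8  = std::uint8_t;
using i16 = std::int16_t;
using u16 = std::uint16_t;
using i32 = std::int32_t;
using u32 = std::uint32_t;
using i64 = std::int64_t;
using u64 = std::uint64_t;
using isize = std::intptr_t;   // pointer-sized signed
using usize = std::uintptr_t;  // pointer-sized unsigned
using f32 = float;
using f64 = double;

// The name states the width and signedness; the compiler can check both.
static_assert(sizeof(i64) == 8, "i64 is exactly 64 bits");
static_assert(std::is_signed<i64>::value, "i64 is signed");
static_assert(std::is_unsigned<u64>::value, "u64 is unsigned");

// Hypothetical helper: adds two 32-bit values in 64-bit arithmetic,
// so the result cannot overflow.
constexpr i64 widen_sum(i32 a, i32 b) {
    return static_cast<i64>(a) + static_cast<i64>(b);
}
```

With names like these, a signature such as `widen_sum(i32, i32) -> i64` answers the "is it signed, and how wide?" questions without a trip to the documentation.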
reply
amluto
5 hours ago
[-]
> 3. Values follow a strict rule: primitives pass by value, containers pass by read-only reference. This prevents accidental aliasing/mutation across scopes and keeps ownership implicit but predictable.

There are plenty of languages where functions cannot mutate their parameters or anything their parameters reference — Haskell is one example. But these languages tend to have the ability to (reasonably) efficiently make copies of most of a data structure so that you can, for example, take a list as a parameter and return that list with one element changed. These are called persistent data structures.

Are you planning to add this as a first-class feature? This might be complex to implement efficiently on top of C++’s object model — there’s usually a very specialized GC involved.
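
To make "persistent data structure" concrete, here is a minimal C++ sketch of an immutable singly linked list with structural sharing. All names (`Node`, `List`, `push_front`, `with_value_at`) are hypothetical, and it relies on `std::shared_ptr` reference counting rather than a tracing GC; production libraries use far more elaborate trees than this.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// An immutable list node. Nodes are shared between versions and never modified.
struct Node {
    int value;
    std::shared_ptr<const Node> next;
};

using List = std::shared_ptr<const Node>;

// O(1): builds a new head that points at the existing (shared) tail.
List push_front(int value, List tail) {
    return std::make_shared<Node>(Node{value, std::move(tail)});
}

// Returns a new list with the element at `index` replaced.
// Only the nodes before `index` are copied; everything after it is shared.
List with_value_at(const List& list, std::size_t index, int value) {
    if (!list) return nullptr;  // index past the end: nothing to replace
    if (index == 0) return push_front(value, list->next);
    return push_front(list->value, with_value_at(list->next, index - 1, value));
}
```

After `b = with_value_at(a, 1, 20)`, both `a` and `b` remain valid, and the nodes after index 1 are the same objects in memory for both lists.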

reply
hedayet
4 hours ago
[-]
[author here] ROX avoids implicit structural sharing and persistent data structures. Allocation and mutation are explicit - if I want a modified container, I construct one.

This is intentionally more resource-intensive. ROX trades some efficiency for simplicity and predictability.

The goal is clarity of logic and clarity of behavior, even at slightly higher cost. And future optimizations should preserve that model rather than hide it.
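
In C++ terms, the "construct a new container" approach described above might look like the following sketch. `with_replaced` is a hypothetical helper, not ROX output; the point is that the copy is a visible O(n) step rather than hidden sharing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a modified copy and leaves the input untouched.
std::vector<int> with_replaced(const std::vector<int>& items,
                               std::size_t index, int value) {
    std::vector<int> copy = items;  // explicit allocation and full copy
    if (index < copy.size()) {
        copy[index] = value;        // mutate only the new container
    }
    return copy;
}
```

The cost is one full copy per modification, which is the efficiency being traded away for predictability.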

reply
pjmlp
1 hour ago
[-]
Brainstorming a bit, you could get there via hazard or deferred pointers, but yeah, I guess it comes down to a specialized-GC kind of solution.
reply
cyber_kinetist
4 hours ago
[-]
I've seen some C++ libraries that implement persistent data structures, like immer (https://github.com/arximboldi/immer), but it seems to require the Boehm GC (which is notoriously slow, since it is a conservative GC and cannot exploit any of the specific semantics/runtime characteristics of the language you're making).
reply
eager_learner
5 hours ago
[-]
Comments like amluto's above are the reason my time spent on HN is not wasted.
reply
paulddraper
4 hours ago
[-]
I don’t see the relevance of special GC.

But yes you need immutable data structures designed for amortized efficient copy.

reply
nynx
4 hours ago
[-]
This is an interesting line in the readme:

> The language forces clarity — not ceremony.

I find this statement curious because a language like this, without the ability to build abstractions, forces exactly the opposite.

reply
saghm
3 hours ago
[-]
Yeah, this seems to be a common thing nowadays, although often with the value cited as "simplicity". I've always found it a bit odd because it seems to me like there are tradeoffs where making things at one level of granularity more clear or simple (or whatever you want to call it) will come at the cost of making things less clear and simple if you zoom in or out a bit at what the code is doing. Assembly is more "clear" in terms of what the processor is doing, but it makes the overall control flow and logic of a program less clear than a higher level language. Explicitly defining when memory is allocated and freed makes the performance characteristics of a program more clear, but it's "ceremony" compared to a garbage collected language that doesn't require manually handling that by default.

I think my fundamental issue with this sort of prioritization is that I think that there's a lot of value in being able to jump between different mental models of a program, and whether something is clear or absolutely ridden with "ceremony" can be drastically different depending on those models. By optimizing for exactly one model, you're making programs written in that language harder to think about in pretty much every other model while quickly hitting diminishing returns on how useful it is to try to make that one level of granularity even more clear. This is especially problematic when trying to debug or optimize programs after the initial work to write them is complete; having it be super clear what each individual line of code is doing in isolation might not be enough to help me ensure that my overall architecture isn't flawed, and similarly having a bunch of great high-level abstractions won't necessarily help me notice bugs that can live entirely in one line of code.

I don't think these are specific use cases that a language can just consider to be outside of the scope in the same way they might choose not to support systems programming or DSLs or whatever; programmers need to be able to translate the ideas of how the program works into code and then diff between them to identify issues at both a macro and micro level regardless of what types of programs they're working on.

reply
hedayet
3 hours ago
[-]
[author here] That’s a very good point - "not ceremony" was poorly phrased.

ROX does introduce more explicitness, which indeed introduces more ceremony. The goal isn’t to reduce keystrokes; it’s to reduce hidden behaviour.

A better framing would be: ROX prioritizes clarity over convenience. Explicitness may cost more keystrokes, but it eliminates hidden behavior. [README updated]

reply
raverbashing
1 hour ago
[-]
Yup exactly this

It's the "C is a simple language" BS again

Using a circular sawblade without the saw is as simple as it gets as well

The simpler it is, the more you get annoyed at it and the easier it is to shoot yourself in the foot with it, because the world is not perfect

Abstractions are great and I'm dying on this hill

"getError" what year is it again?

reply
norman784
1 hour ago
[-]
Most of the section "Why ROX exists" reminds me of Rust and Zig, which are both more explicit (Zig even more so, since it has no hidden allocations, while Rust hides them).

That said, I really miss all the i{8|16|32|64|128|size}, u{8|16|32|64|128|size} and f{32|64} in other languages. I think having num and num32 is a mistake (IMHO), and naming them like Rust/Zig provides more clarity (and is also more concise).

I find the "repeat" keyword odd, because most other languages use "for", but I can understand the reason and might be able to get used to it.

Otherwise, I always find new programming languages interesting: what new things they bring to the table and the lessons learned from other PLs.

reply
ivanjermakov
1 hour ago
[-]
> In ROX: <...> Nothing implicit happens behind your back.

> You write the logic. The language stays out of the way.

Writing business logic and everything being explicit are polar opposites. For the programming language to stay out of the way, it should more closely resemble a concise version of English with little to no language constructs.

reply
sesm
5 hours ago
[-]
> Lists are accessed only via .at()

If clarity is the goal, then data structures that support access by index should be called `arrays` or `vectors`

reply
hedayet
4 hours ago
[-]
[author here] What @joshuamorton said + my rationale was - for natural language users too, a "list" should be more intuitive than `array` or `vector`

I'm more than happy to be corrected though.

reply
anon291
3 hours ago
[-]
The idea of being inspired by natural language is completely at odds with also desiring clarity first
reply
rkeene2
4 hours ago
[-]
Also, why num/num32 for integer types, and no floating point type?
reply
hedayet
4 hours ago
[-]
[author here] Very good questions; I definitely would like to revisit num32 very shortly. I'd say the initial rationale for having num32 is not coherent right now, but I'll have to verify the impact of removing support for it.

We do have a floating point type. (It was missing from the type list in the readme. I have just updated that after seeing this comment. Thank you!)

reply
rurban
43 minutes ago
[-]
There cannot be any num32. num is a number, which can be fixed size integers, floating point numbers (of fixed size or not) or bigints. Some also add decimals

num32 being i32 or f32 makes no sense

reply
teiferer
3 hours ago
[-]
Well, clarity would be achieved with a name like u64. Is num signed? What's the range? Is it integers or floating point? All these things are hidden. With u64 there would be no questions open. (Well a few maybe, like overflow behavior, but can't have it all..)
reply
joshuamorton
4 hours ago
[-]
This is very language dependent. People coming from python or Java would call them lists.

Vectors are a mathematical concept unless you use c++.

reply
vlovich123
3 hours ago
[-]
In Java it’s called Vector / list refers to linked list. Python doesn’t have a linked list type so it’s kinda irrelevant. But also not every language has to be Algol centric even though Algol has largely dominated the design space of popular languages due to familiarity.
reply
brabel
2 hours ago
[-]
> In Java it’s called Vector / list refers to linked list

What?!! No! Vector is almost never used in Java code. When you need index-based access, ArrayList is the much more common one, and it does implement List. So I would agree with parent commenter that List is the equivalent in Java.

A List in Java is a container that allows iterating over items efficiently, but does not necessarily provide efficient random access: https://docs.oracle.com/javase/8/docs/api/java/util/List.htm...

If you care about why Vector is nearly never used: it is synchronized by default, making it slower and more complex than ArrayList. Most Java programmers would prefer to implement synchronization themselves in case multi-threading is required since it nearly always involves having to synchronize multiple list operations at the same time, which cannot be done with Vector.

It's the same reason no one uses StringBuffer, but StringBuilder.

reply
bsaul
1 hour ago
[-]
Side note: are we still going to see new languages appear after AI becomes the one that writes the code?

I'd say that for a new language to appear in that new world, it would need to offer new compile-time properties that AI could benefit from. Something like expressing general program properties / invariants that the compiler could check and the AI could iterate on.

reply
Knork-and-Fife
3 hours ago
[-]
If the only loop is `repeat i in range(start, end, step)` , how do you do a loop like "Keep reading from a buffer until it's empty"? I.e. any loop when you can't know the number of iterations needed when the loop starts?
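
For reference, the kind of loop being asked about looks like this in C++. This is a sketch of the loop shape only, not ROX syntax; `drain` is a hypothetical helper and the `std::deque<char>` merely stands in for "a buffer".

```cpp
#include <deque>
#include <string>

// Drains the buffer until it is empty. The number of iterations depends on
// the buffer's contents, so it cannot be written as range(start, end, step)
// without first knowing the size.
std::string drain(std::deque<char>& buffer) {
    std::string out;
    while (!buffer.empty()) {      // condition-driven, not count-driven
        out.push_back(buffer.front());
        buffer.pop_front();
    }
    return out;
}
```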
reply
hedayet
2 hours ago
[-]
Yes, support for unbounded loops is definitely on my roadmap towards v1.
reply
skottenborg
3 hours ago
[-]
Though it may be somewhat annoying, I think forcing named parameters would be a big win for clarity. Is this something you have considered?
reply
ad_hockey
3 hours ago
[-]
This looks interesting! As a Go user I definitely see the value in boring but predictable languages. Does Rox have any support for concurrency?
reply
leecommamichael
4 hours ago
[-]
I’d be curious to hear the author’s thoughts on Odin. Odin seems to have met many of the same goals as ROX. I am not implying the author shouldn’t keep going with their language.
reply
dusanstanojevic
4 days ago
[-]
Very interesting, I've read your readme and your core principles really resonate with me. How is memory managed?
reply
hedayet
4 days ago
[-]
Great question - we keep memory management intentionally simple.

1. There’s no manual memory management exposed at the language level (no pointers, no allocation APIs). I intend to keep it this way as long as possible.

2. Containers (list[T], dictionary[K,V]) compile directly to C++ STL types (std::vector, std::unordered_map).

3. Values follow a strict rule: primitives pass by value, containers pass by read-only reference. This prevents accidental aliasing/mutation across scopes and keeps ownership implicit but predictable.

Anything created in a scope is destroyed when that scope ends (standard C++ RAII). So in practice, memory management in Rox is C++ lifetime semantics underneath, but with a stricter surface language to reduce accidental complexity.
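
As a rough illustration of these rules, the C++ underneath might look something like the sketch below. This is an assumption about the shape of the generated code, not actual ROX compiler output; `sum_scaled` and `example` are hypothetical names, and `std::int64_t` is assumed for `num` based on the 64-bit integer description elsewhere in the thread.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Containers arrive by read-only reference, primitives by value.
std::int64_t sum_scaled(const std::vector<std::int64_t>& values,
                        std::int64_t factor) {
    std::int64_t total = 0;
    for (std::int64_t v : values) {
        total += v * factor;
    }
    // values.push_back(0);  // would not compile: the reference is const
    return total;
}

std::int64_t example() {
    std::vector<std::int64_t> values{1, 2, 3};                        // list[T]
    std::unordered_map<std::string, std::int64_t> ages{{"ada", 36}};  // dictionary[K,V]
    return sum_scaled(values, 2) + ages.at("ada");
}  // values and ages are destroyed here, when the scope ends (RAII)
```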

reply
teraflop
5 hours ago
[-]
That sounds like it's basically impossible to implement your own non-trivial data structures. You can only use the ones that are already in the standard library.

For instance, how would you represent a binary tree? What would the type of a node be? How would I write an "insert node" function, which requires that the newly-created node continues to exist after the function returns?

I'm not necessarily saying that this makes your language bad, but it seems to me that the scope of things that can be implemented is much much smaller than C++.
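
One common workaround in pointer-free settings is to store nodes in a flat container and link them by index. The C++ sketch below shows that technique under the "construct a new container" rule; whether ROX can express it (for example, whether it has user-defined record types) is not stated in the thread, and all names here are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Nodes live in a vector and refer to children by index (-1 means "no child").
struct TreeNode {
    int key;
    int left;
    int right;
};

struct Tree {
    std::vector<TreeNode> nodes;
    int root = -1;
};

// Returns a new tree that also contains `key`; the input is left untouched.
// The new node outlives this function because the container is returned by value.
Tree inserted(const Tree& tree, int key) {
    Tree result = tree;  // explicit copy
    const int new_index = static_cast<int>(result.nodes.size());
    result.nodes.push_back(TreeNode{key, -1, -1});
    if (result.root == -1) {
        result.root = new_index;
        return result;
    }
    int current = result.root;
    while (true) {
        TreeNode& node = result.nodes[current];
        int& child = key < node.key ? node.left : node.right;
        if (child == -1) {
            child = new_index;  // attach the new node as a leaf
            return result;
        }
        current = child;
    }
}

// Standard binary-search-tree lookup over the index links.
bool contains(const Tree& tree, int key) {
    int current = tree.root;
    while (current != -1) {
        const TreeNode& node = tree.nodes[current];
        if (key == node.key) return true;
        current = key < node.key ? node.left : node.right;
    }
    return false;
}
```

Each insert copies the whole node vector, so this answers "can it be represented?" rather than "can it be made as efficient as a pointer-based tree?".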

reply
anon291
3 hours ago
[-]
The best imperative language is Haskell do notation which offers everything you support here.
reply
OsrsNeedsf2P
5 hours ago
[-]
This is great. I look forward to more "strict" languages whose deterministic compilers will give LLMs a tight feedback loop to catch bugs.
reply
Panzerschrek
5 hours ago
[-]
Does it have destructors?
reply
hedayet
4 hours ago
[-]
[author here] as of today - no. I'm super keen to keep the concept of destructors and GC hidden from this language interface.
reply
globalnode
1 hour ago
[-]
one thing ive always wondered about these type of projects is how do you debug them? gdb at runtime? printf statements? i mean when im debugging python i mainly rely on print() and log files so i guess that would work. its been a long time since i used an ide to step through a program anyway, i think way back with borland turbo c/c++ i used to step through statements to see how it all worked but things are much too complex for that now.
reply
tovej
3 hours ago
[-]
You have to unwrap every array access? That does not feel clear to me at all. Also this would make every hot loop slower.
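
For readers unfamiliar with the pattern, "unwrapping every access" looks roughly like this in C++17. The sketch assumes ROX's `.at()` returns an optional-like result, which is inferred from this comment rather than confirmed; `at` and `sum_checked` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical checked accessor: empty result instead of out-of-bounds access.
std::optional<int> at(const std::vector<int>& items, std::size_t index) {
    if (index >= items.size()) return std::nullopt;
    return items[index];
}

// Every element access goes through the check and must be unwrapped.
int sum_checked(const std::vector<int>& items) {
    int total = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (auto value = at(items, i)) {
            total += *value;
        }
    }
    return total;
}
```

Whether the per-access check costs anything measurable in a hot loop depends on what the optimizer can prove about the index; nothing here has been benchmarked.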

The amount of safety features here seems excessive. The language is stricter than Rust. It's not very "clear" either. For some reason the author has decided to rename concepts that are familiar to programmers, making it more difficult for experienced programmers to switch (repeat instead of for, num instead of... float I assume?), but the language isn't really beginner-friendly either, due to the strict semantics.

This feels like vibe-coded slop. Why is this on the front page? HN has fallen off.

reply