Show HN: Lisp in C#
144 points
1 month ago
| 10 comments
| github.com
| HN
codr7
1 month ago
[-]
Author here.

I'm afraid I've been out of the C# loop too long to know what's fast and what isn't these days.

Now that maybe I have the attention of some serious C# nerds, any assistance in making this thing run faster would be much appreciated.

It's not terrible atm, given a managed host language, but I'm sure there are plenty of knobs left to turn.

See the benchmarks section in the README for more info, and the same benchmarks ported to Python in python/fib.py.

Oh, and there's some undocumented yet potentially useful stuff in /Libs; strings, terminal control and IO mainly.

reply
lloydatkinson
1 month ago
[-]
I haven't had chance to look over the code but I know that using Span for parsing might be a nice thing to try out.

Have you considered making it compile to IL? Or if not that, using the Roslyn API to compile it to C# which is then automatically faster as a result of being compiled. Then getting AOT (ahead-of-time compilation) at build time gives you improved perf basically for free.

I see neon has contributed a perf change, I know that he's an expert at using https://github.com/dotnet/BenchmarkDotNet so hopefully you'll be able to do some scientific tests soon.

I would love to see another language here for the *Common* Language Runtime (CLR, the core of .NET).

reply
WorldMaker
1 month ago
[-]
A fun middle ground to compiling to IL is using System.Linq.Expression<T> as a compiler. This was the middle ground that the "DLR" (Dynamic Language Runtime) used to great effect. System.Dynamic is still in the BCL and still useful, even though the dreams of IronPython and IronRuby and interop between them are sort of dead/dormant. System.Dynamic.DynamicMetaObject is still a fun thing to play with in how much power it gives you do some surprisingly optimized dynamic language things, all using System.Linq.Expression<T> as the higher level intermediate language than raw IL.
reply
codr7
1 month ago
[-]
Interesting, Linq.Expression actually looks doable within my complexity budget.
reply
pjc50
1 month ago
[-]
Bit of a tradeoff there; so long as it's using its own bytecode, the sharpl executable itself can easily be AOT. Once you start trying to create your own assembly at runtime and run that, it's a LOT of work to get the host to still AOT (because you have to include the dotnet runtime anyway to run the inner assembly!)

And AOT isn't a lot of free perf; it's mostly equivalent to JIT, the big advantage is faster and smaller startup.

(I have used the Roslyn compile API; it's pretty cool but you do have to do more setup. e.g. https://joshvarty.com/2016/01/16/learn-roslyn-now-part-16-th... )

reply
buybackoff
1 month ago
[-]
For small short-lived scripts the compilation (in any native form) time may be much bigger than bytecode execution time. And if a platform does not support dynamic code generation the end result is non-functional code or interpreter inside interpreter for LINQ expressions (as they can still run as interpreted, at least they try). See a comment in Jint readme about that.

I tried to compare some script languages implemented in C# with Roslyn script compilation. Same code to sum up 1M doubles. Roslyn takes almost 100ms to compile the code.

This may become especially painful when source code changes on every execution and reusing compilation result is not possible for some reason, e.g. user input or parametrized string template.

reply
codr7
1 month ago
[-]
Sure thing, Span is one of the features I haven't had time to dig into yet.

Parsing is a one time thing though.

Yes, different levels of compiling to C# are definitely on the table as options; for performance, but also to hopefully be able to generate self contained executables.

Even just generating my own bytecode for faster startup.

reply
codr7
1 month ago
[-]
By all means, keep them coming people :)

We just roughly doubled the speed by changing one line of code, that means there's plenty more fun before it gets really tricky.

My issue isn't really profiling, its not knowing enough about the platform to do anything constructive with the hot spots.

reply
mst
1 month ago
[-]
I don't know the platform and would -guess- that this won't actually help, but it's sufficiently fascinating I thought I'd share anyway: https://trout.me.uk/lisp/vlist.pdf (roughly "stitching together efficient cons lists out of arrays")

(and /lisp/ has an index if you want to see the other random lisp-related stuff I've collected copies of so I can find it again later)

reply
codr7
1 month ago
[-]
Interesting, thanks.
reply
neonsunset
1 month ago
[-]
https://github.com/codr7/sharpl/pull/1

Was: 686 98 1195

Now: 226 79 293 (with net9.0 preview: 201 70 269, another release another free >10%)

The reason for such a significant difference is that `ArrayStack<(int, int)>` only implements `IEnumerable<T>`, which prevented the Enumerable.Last(stack) call from seeing that the type has an indexer which can be used to quickly access the last element instead of traversing it in its entirety.

Now, it still requires JIT (or, in this case, compiler back-end and ILC) to reason about the actual type of ArrayStack to optimize away type tests, inline Last() call and devirtualize indexer access, but the better option is simply replacing it with just [^1] which does the same on any indexable type.

Generally speaking, it's recommended to use out of box collections whenever appropriate like Stack, List, etc. which already implement all the necessary interfaces which the standard library takes advantage of unless there's a specific need to do otherwise.

Also, it is always nice to have .net's aot emit "canonical" native binaries, so it took me about 45s to find the bottleneck by bumping up the numbers in benchmarks.sl and clicking "Sample" in macOS's Activity Monitor.

All in all, the code in the project is terse and thanks for showcasing it!

reply
codr7
1 month ago
[-]
Nice work, thanks!

I'll have a look later, the only part I didn't get was [^1], could you elaborate?

About the ArrayStack, I wanted a stack backed by a fixed array, because my suspicion was using a fixed array for fundamental collections would be faster. Hence the VM.Config object to set the max sizes.

There is also a DynamicArrayStack that supports reallocation, which is currently used by OrderedMap.

The plan is to eventually benchmark the alternatives, but start with the simplest thing that could possibly work.

reply
neonsunset
1 month ago
[-]
Here, calling `.Last()` on `ArrayStack<T>` traverses it from the start. `.Last()` is a static extension method defined on `System.Linq.Enumerable` class and has a signature `static T Last<T>(this IEnumerable<T> source)`.

Internally, `.Last()` will try to optimize for the common cases where a type implements `IList<T>` and uses an indexer to simply get the last element. However, because ArrayStack<T> does not implement IList<T>, .Last() does not know that this is possible, therefore costs O(n) as noted above.

Instead, we can simply use an index operator `[^1]` which gets the first element from end, which is short-hand for `[stack.Count - 1]`.

Other than that, it’s a good idea to lean towards out-of-box tools to avoid investing effort into reinventing another language within C# and use spans for slicing data types - you almost never need to call methods like Array.ConstrainedCopy - this is something quite ancient. The idiomatic way of copying a portion of array today is `source.AsSpan(start, length).CopyTo(dest)`, slicing destination as well if you need so. The prime slice types in .NET are Span<T> and ReadOnlySpan<T>, and can wrap memory of any origin.

reply
codr7
1 month ago
[-]
Just the kind of information/expertise I was looking for, awesome!
reply
zebracanevra
1 month ago
[-]
And you can find that specilisation here: https://source.dot.net/#System.Linq/System/Linq/Last.cs,80
reply
Capricorn2481
1 month ago
[-]
Why does the implementation of .Last not just check if .Count? It seems like there are things that don't implement IList but can still be indexed by the last entry?
reply
neonsunset
1 month ago
[-]
In a strongly typed compiled language, how would you do so (in a type-safe and performant way) only knowing that the type is some IEnumerable<T> implementation and not a particular shape that may or may not adhere to the `T this[int i]` and `.Count //of T` contract?
reply
Capricorn2481
1 month ago
[-]
Is using reflection for a quick property check that much of a performance hit? After all, avoiding it leads to footguns like this where someone didn't realize they were traversing an entire array.

I'm not well versed on this topic.

reply
neonsunset
1 month ago
[-]
Thank you too!
reply
adzm
1 month ago
[-]
Alternatively ArrayStack could be modified to implement IList right?
reply
codr7
1 month ago
[-]
Right, already tried that; Last() is still slightly slower (I guess because of more layers of indirection).
reply
diggan
1 month ago
[-]
Interesting .gitignore addition. As someone who basically never writes C# (besides for some mods for games sometimes), is that the default ephemeral files that Visual Studio Code/some other tool writes when dealing with C# projects? Seems like an awful amount of trash if that's the case.
reply
neonsunset
1 month ago
[-]
As description in PR indicates, it’s just a default catch-all gitignore that you can add with ‘dotnet new gitignore’ that covers all kinds of tools and build artifacts, I see no reason to customize it.

For large amounts of trash in project you would need to look for other languages, like JS ecosystem or certain Java-related projects :)

reply
diggan
1 month ago
[-]
> As description in PR indicates, it’s just a default catch-all gitignore that you can add with ‘dotnet new gitignore’ that covers all kinds of tools and build artifacts, I see no reason to customize it.

Ah, someone should just come up with a huge file so we can ignore every possible combination at this point :)

> like JS ecosystem

Heh, JS is probably the language that generates the least trash by default as it's just a index.html file + your single JavaScript file, nothing else required or generated at all, and now you have a interactive website :)

reply
WorldMaker
1 month ago
[-]
GitHub doesn't have it all in one huge file, but has a somewhat comprehensive template repo of many ecosystems: https://github.com/github/gitignore

(This is what is used to populate the templates if you ask GitHub to include a gitignore when creating a new repo, or if you add a new file to a Repo and name it .gitignore and get the template selector to show up.)

I believe `dotnet new gitignore` basically shares the same template, even, as this file: https://github.com/github/gitignore/blob/main/VisualStudio.g...

reply
diggan
1 month ago
[-]
Just seem a bit backwards to start with a gitignore that ignores every possible tool one could use with a particular language, rather than going directory/file/pattern by directory/file/pattern. Hence my proposal for these people to take the contents of the github/gitignore repository, fold it all into one big file they can reuse in all their repos, and call it a day.

Like why is there a `.DS_Store` (which is specific to macOS) in the `core.gitignore` file? That belongs in the global `.gitignore` macOS users should have in their home directory, rather than including it in project specific .gitignore.

But I digress, not exactly a huge problem and I won't lose any sleep over it :)

reply
WorldMaker
1 month ago
[-]
One of the problems in having a single gitignore to ignore all the possible things that there's no strict superset without overlaps. Visual Studio uses files called .user that contain user-specific metadata that should never be committed to source control and other systems might use .user for components that should be committed to source control. Breaking it down by language ecosystem seems an adequate compromise as few repos use multiple languages and when they do there's often a folder hierarchy to respect with nested gitignores.
reply
codr7
1 month ago
[-]
If only we could get a tiny Lisp loving push from the HN crew, wink, nudge.

I don't much like the odds of Lisp vs. Nintendo hardware kickstarters.

reply
incanus77
1 month ago
[-]
I am here for it.
reply
coder94
1 month ago
[-]
There's also Clojure CLR (lisp) https://github.com/clojure/clojure-clr
reply
codr7
1 month ago
[-]
Yeah, I admire Rich Hickey a lot, it took a lot of guts; and I learned a lot from Clojure and his talks; but I unfortunately find the language a bit too opinionated and dogmatic.
reply
davidelettieri
1 month ago
[-]
reply
codr7
1 month ago
[-]
Right, I recognize the name.

Curious though, I'm pretty sure it emits MSIL rather than its own bytecode like sharpl? So that would be one difference, in most cases an advantage because of performance.

The other obvious difference is I'm not aiming for any standards, quite the contrary; this is about being so fed up with the alternatives (including Scheme) that spending the rest of my life getting it just right looks like a reasonable deal.

reply
keithnz
1 month ago
[-]
years ago someone posted http://norvig.com/lispy.html here on HN

I wrote a lisp in C# based on that, it was only a 100+ ish lines of code. It was a great way to get into Lisp.

reply
dualogy
1 month ago
[-]
There's also Make-A-Lisp and, unlike most write-you-a-lisp/scheme-s out there, that one also covers TCO interpretation, quasiquotation/unquote for macros and their expansion, and goes up to self-hosting: https://github.com/kanaka/mal/

I just went through it over 2-3 days, great practice IMHO to do once in your life from start to finish: https://github.com/metaleap/go-lisp

reply
KineticLensman
1 month ago
[-]
Concur, although it took me two to three months, not days (was working full time is my excuse). Biggest grief point was getting macro expansion to work correctly; TCO worked almost first time, IIRC.
reply
codr7
1 month ago
[-]
Unquoting took me several implementations to wrap my head around and consistently get good enough. As did register allocation, which is more of a VM issue.
reply
KineticLensman
1 month ago
[-]
Interesting. I'm about to start doing a MAL-like again, but this time (I used C# before) using Rust and building a VM as in Crafting Interpreters rather than following the MAL guide. Macros are one of the things that I anticipate being a challenge.
reply
dualogy
1 month ago
[-]
AFAICT, their expansion still happens during an interpretation cycle — but when you "interpret" AST into your byte-code.
reply
codr7
1 month ago
[-]
Or compile, since it's usually a transformation.

I like to call that the emit phase, as in emitting byte code.

The interesting thing is that it only happens once, before the code is evaluated.

reply
codr7
1 month ago
[-]
Yes, I am aware.

I started out designing Forth interpreters, actually calculators and template engines, but Forth was a natural progression.

My design differs a lot from idiomatic Lisp implementations (likewise Forth), and I do sometimes wonder what it would look like if you started in that end and worked your way towards supporting all the features of sharpl.

reply
codr7
1 month ago
[-]
Having thought a bit about it, one motivation for choosing the route I did; rather than the more idiomatic one; is that it relies on pipeline of source transformations to get to optimal code.

The initial expansion is often pretty sloppy.

I don't do many source transformations, everything is designed to emit pretty optimal byte code on the first pass.

reply
default-kramer
1 month ago
[-]
Cool, thanks for showing. Are you planning for any sort of FFI/interop in either direction? I don't see it in your TODO, but I also don't understand why you would write it in C# unless you had this in mind.
reply
codr7
1 month ago
[-]
Given how easy it is to expose functionality from the host language using existing facilities [0], and how complicated a reflection based FFI could get; I would rather not, at least not right now. It's also one of those decisions that will drastically limit my choices for future evolution of the implementation.

The general idea is to use both languages together, as complements; not calling one from the other.

https://github.com/codr7/sharpl/blob/main/src/Sharpl/Libs/Co...

reply
default-kramer
1 month ago
[-]
Ah, this is actually what I had in mind by "interop". So from C# I can evaluate some Lisp code and then test that the result is `PairType` (for example)? Maybe this is so obvious it goes without saying, but I didn't see any examples of that.
reply
codr7
1 month ago
[-]
I understand.

Sure thing, VM.Eval("your code") returns a value if one is produced.

  if (vm.Eval("1:2") is Value v and v.Type == Libs.Core.Pair) {
    Console.WriteLine("Pair: " + v.Cast(Libs.Core.Pair));
  }
Have a look in the main Program.cs to see how to get a VM up and running.
reply
Zambyte
1 month ago
[-]
Cool! I didn't get a chance to run it but I dug around the code a little bit. I noticed there is a Macro class, but no mention of macros in the README. Are macros working?
reply
codr7
1 month ago
[-]
Within the host language for now, they more or less just need to be plugged in.

So far there has been no shortage of more important features at the scale of programs it's currently viable for.

reply
mst
1 month ago
[-]
Separate from anything else, I'm ... concerned ... at the idea of ints being iterable, because it seems like something I'd be much more likely to invoke accidentally than intentionally and then wonder wtf my program was doing. I'd prefer to have to write something like

    (reduce + (range 1 3) 0)
and if you find yourself wanting the natural number iteration regularly maybe

    (^upto (n) (range 1 (- n 1)))
as sugar.

This concern brought to you by e.g. the great pain induced by the difference between

    for x in "foo":
and

    for x in "foo",:
in python, for example.

It may turn out in practice that a lispy language and/or programmers who make different mistakes to me will make it not an issue ... but were it -my- project I'd probably comment out the iterator implementation for int and see if its absence annoyed me enough to decide it was worth bringing back.

(when perpetrating language myself I often find that some of my favourite bits of clever don't pass the 'annoyed me enough' test and end up as documentation examples or similar in the end instead ... hopefully you have better luck ;)

reply
kazinator
1 month ago
[-]
Integers are also iterable in TXR Lisp. But in a different way; you get an infinite sequence starting from the value.

  1> [mapcar list '(a b c) 10]
  ((a 10) (b 11) (c 12))
This crops up on a regular basis in my coding, removing verbosity; I don't regret the decision. It's one of the "take alongs": something to repeat in a future language.
reply
mst
1 month ago
[-]
I think I'd rather have e.g.

    [mapcar list '(a b c) [count-from 10]]
or so. They look great in examples, but once you're doing

    [mapcar list '(a b c) x]
then I still suspect that you'll confuse yourself every so often by passing the wrong x.

OTOH if you've got the entire codebase in your head that isn't really an issue; it's coming back to tweak something a few months later where I usually find such features give me an unexpectedly ventilated foot :)

reply
codr7
1 month ago
[-]
The duality of meaning (is it from or to the specified value) actually speaks against the feature for me, you just ruined it :)

Or fixed it, depending on which way you lean.

reply
kazinator
1 month ago
[-]
We can't get rid of plurality of meaning because syntax isn't semantics. The same syntax can be assigned completely different semantics.

In Lisps, we program syntax with different semantics regularly, so the plurality is part of the daily existence. (defclass a (b c)) and (list a (b c)) have the same shape, but are completely different things.

10 is a piece of syntax, but how do we assign its semantics when it is an argument to a parameter that expects a sequence? That is up in the air the same way. Is it an error, like in most Lisp-like languages? does it count up to 10 from 1? Below 10 from zero? From 10 ad infinitum?)

The choice becomes a paradigm in the language that everyone has to understand.

What is undesirable is inconsistencies. If certain functions counted up to the number, but others from the number, arbitrarily, that would be objectively worse than consistently doing one or the other.

reply
kazinator
1 month ago
[-]
It would not be often useful to have it count up from 1 or zero up to or below the value; I would not have designed it that way. In many situations you don't know the upper limit of what is enumerated; it comes implicitly from the lengths of other sequences or in other ways.

It's also less inefficient, because the value has to be converted to an iterator object that knows about the range, and keeps track of the state.

In TXR Lisp, certain objects are self-iterable, like characters, numbers and good old conses.

  1> (iter-begin "abc")
  #<seq-iter: a031a10>
  2> (iter-begin 3)
  3
  3> (iter-begin '(a b c))  
  (a b c)
To iterate a string, we need to obtain a seq-iter object, but for 3 and (a b c), we do not. With these objects, we have all the state we need in order to iterate.

  4> (iter-more 3)
  t
  5> (iter-item 3)
  3
  6> (iter-step 3)
  4
  7> (iter-more '(a b c))
  t
  8> (iter-item '(a b c))
  a
  9> (iter-step '(a b c))
  (b c)
  10> (iter-more *1)
  t
  11> (iter-item *1)
  #\a
  12> (iter-step *1)
  #<seq-iter: a031a10>
  13> (iter-item *1)
  #\b
iter-step may or may not destructively update its argument, so you always have to capture the return value and forget about the original.

You can see how for a list, iter-more is equivalent to (not (null ...)), iter-item is just car, and iter-step is just cdr.

There is also lazy processing in TXR Lisp. E.g. lazy mapcar which is mapcar*. The following will only read the first few lines of the syslog, returning instantly:

  14> (take 3 [mapcar* cons 1 (file-get-lines "/var/log/syslog")])
  ((1 . "Jul 23 00:09:54 sun-go systemd[1]: openvpn@client.service: Service hold-off time over, scheduling restart.")
   (2 . "Jul 23 00:09:54 sun-go systemd[1]: openvpn@client.service: Scheduled restart job, restart counter is at 1044619.")
   (3 . "Jul 23 00:09:54 sun-go systemd[1]: Stopped OpenVPN connection to client."))
reply
codr7
1 month ago
[-]
Counting from 0 up to a value is the standard for loop, or numbering things (possibly with using the value with an offset).

But like I said, I'm not so sure there is a right way to interpret it any more.

reply
codr7
1 month ago
[-]
Noted, I feel like I need more use cases and experience to say for sure.

Some way of forming ranges is planned anyways.

reply
mst
1 month ago
[-]
Yeah, 'int is iterable' triggered "oh that's cute!" immediately followed by "... and the exact sort of cute I often find irresistably tempting myself and then regret later."

My current thing in progress has basically all of its temptation points already spent because I decided I was going to make it fexpr based, but I'm very definitely having fun with that so far - "no special forms required" makes for some interesting possibilities.

(kernel-lisp and kernel-thesis in /lisp/ are worth looking at if fexprs sound interesting, though I'm on the pragmatics side of things and will not pretend I came anywhere close to understanding the $vau calculus part)

reply
codr7
1 month ago
[-]
A certain level of helpfulness is nice though, Ruby and to a somewhat lesser extent Perl get that part right. I guess Perl could be seen as a warning example of what happens if you go all in.

Yep, I've been on a ride down the fexpr implementation hole previously; which is another reason I've been delaying user macros; still processing the experience.

reply
mst
1 month ago
[-]
There's a bunch of stuff in perl that's best restricted to one-liner or short script usage - i.e. the sort of cases where you're using it for the early inspiration super-awk purposes.

Writing perl as a programming language is, I find, a very different dialect, and perhaps amusingly one in which I rely on function composition, closures and block scoping heavily due to also loving lisp - javascript's 'let' behaves very similarly to perl's 'my' and I find it ... difficult ... to deal with python or ruby for any length of time since their scoping is "stuff randomly pops into existence at function scope" and, just, aaaaa.

Perl of course can't really have macros, but it does have the ability to define custom (also block scoped :) keywords which can allow one to achieve similar results - see e.g. https://p3rl.org/Keyword::Declare for a demonstration built on top of that functionality (there's also a code rewriter called Babble but I got distracted so while it works, it's woefully underdocumented, keep forgetting to get back to that).

I've always had a fondness for the "building the language up towards the problem" style of programming (I wrote the first ever proof of concept for custom perl keywords, though that code has happily long since been obsoleted by people who knew what they were doing) which has led me to 'fexprs for making DSLs' since those are usually building up a config structure or similar which makes fexprs' being tricky to optimise less of an obstacle.

(also with fexprs I don't have to limit myself to 'block in front' ala perl or 'block at the end' ala ruby/elixir (though it's amazing how much mileage elixir gets out of its macros, Kernel.ex is well worth a read if you're so inclined))

reply
healeycodes
1 month ago
[-]
You can annotate the code blocks in the README to get generic lisp syntax highlighting.

```lisp

(+ 1 2)

```

reply
codr7
1 month ago
[-]
Thanks.

It's unfortunately enough of a bad fit for me to prefer no highlighting; my own fault, the price you pay for adding syntax.

reply
zczc
1 month ago
[-]
So, it is a shortcut around the tenth rule: "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp" [1]

[1] https://en.wikipedia.org/wiki/Greenspun's_tenth_rule

reply
codr7
1 month ago
[-]
Observant :)

Polyglot projects have been sort of a thing lately, because micro services made them doable I guess; but I feel the combination of a capable and portable host language and a scripting language implemented in that language captures the best of both worlds while adding nice super powers on top.

reply
pjmlp
1 month ago
[-]
You mean rebranding distributed computing and OS IPC for newer generations?
reply
codr7
1 month ago
[-]
It's the same old ideas, over and over again.

But it's not a circle, it's a spiral; we learn something new every time.

reply
pjmlp
1 month ago
[-]
Sometimes, and suddenly modular monoliths become a thing, as the microservices generation learns why we weren't doing SUN RPC, TOOL, DCOM and CORBA for every little piece of application functionality.
reply
codr7
1 month ago
[-]
We could definitely do a better job of learning from those who came before, it would save a lot of time and effort.

Evolution is messy, every combination has to be tried, every minor detail thoroughly investigated from all angles.

reply