The Easiest Way to Build a Type Checker
69 points
3 days ago
| 7 comments
| jimmyhmiller.com
| HN
thomasikzelf
4 hours ago
[-]
I recently implemented a Hindley Milner type checker. The algorithm itself is not necessarily super difficult (once you get your head around it ofcourse) but the way it is explained is. It seems like HM is mostly explained by people with a mathematics background that already know the proper notation. I wonder how much overlap there is between people that know the notation and do not know how HM works, probably not much.

Anyway nice demo. Bi-directional is already quite powerful!

reply
azhenley
6 hours ago
[-]
I recently made a toy type checker for Python and had a lot of fun.

https://austinhenley.com/blog/babytypechecker.html

reply
bbminner
3 hours ago
[-]
I have not looked into the HM algorithm much, but is there (an educational or performance wise) advantage over implementing a (dumb) SAT solver and expressing a problem as a SAT problem? It always seemed like the "natural representation" for this kind problem to me. Does knowing that these are types _specifically_ help you somehow / give you some unique insights that won't hold in other similar SAT problems?
reply
recursivecaveat
2 hours ago
[-]
Keep in mind one of the most important attributes of a good compiler is clearly explaining to the user what caused compilation failure and why. If you try to solve in a very abstract and general space it could be challenging to give an actionable error message.
reply
Quekid5
2 hours ago
[-]
Yup, that's basically it. "SAT says no" isn't a very useful error message.
reply
octachron
1 hour ago
[-]
For a bounded size of types of sub-expressions, HM inference is quasi-linear in the size of the program, because the constraints appearing in the HM algorithm are only equality between meta-variables.A NP-complete SAT solver is not really a good fit for this kind of simple constraints. Even more so when typechecking often represents a significant part of compilation time.

(Of course the tricky part of the definition above is that the size of types can theoretically be exponential in the size of a program, but that doesn't happen for programs with human-understandable types)

reply
remexre
2 hours ago
[-]
how would you encode a program like

    function f<T>(x: T) { return x; }
    function g(x: number) { return { a: f(x), b: f(x.toString()) }; }
in sat?

if that's easy, how about length and f in:

    function append<T>(xs: list<T>, ys: list<T>) {
      return match xs {
        Nil() -> ys,
        Cons(hd, tl) -> Cons(hd, append(tl, ys)),
      };
    }
    function flatten<T>(xs: list<list<T>>) {
      return match xs {
        Nil() -> Nil(),
        Cons(hd, tl) -> append(hd, flatten(xs)),
      };
    }
    function map<T, U>(f: (T) => U, xs: list<T>) {
      return match xs {
        Nil() -> Nil(),
        Cons(hd, tl) -> Cons(f(hd), tl),
      };
    }
    function sum(xs: list<number>) {
      return match xs {
        Nil() -> 0,
        Cons(hd, tl) -> hd + length(tl),
      };
    }
    function length<T>(xs: list<T>) { return sum(map((_) -> 1, xs)); }
    function f<T>(xs: list<list<T>>) {
      return length(flatten(xs)) === sum(map(length, xs));
    }
hm-style inference handles polymorphism and type application without a complicated sat encoding
reply
mrkeen
6 hours ago
[-]
I grabbed the code from the article and annotated it with the different cases from the famous picture*

  switch (expr.kind) {
    case "number"/"string"/"var":
      ... [Var]
    case "call":
      ... [App]
    case "function":
      throw new Error("...[Abs]")
    case "let":
      ... [Let]
Looks like most of the hard work's done, and probably wouldn't be too tricky to get [Abs] in there too!

* https://wikimedia.org/api/rest_v1/media/math/render/svg/10d6...

reply
nkrisc
5 hours ago
[-]
In the small example type checker given, would a function of type A -> B -> C be represented something like this?

    { kind: "function", arg: A, returnType: { kind: "function", arg: B, returnType: C}}
Or is that case simply not covered by this bare-bones example? I can't parse the code well enough just in my mind to tell if that would work, but I think it would work?

EDIT:

I just noticed the working demo at the bottom that includes an example of a multi-argument function so that answers whether it works.

reply
IshKebab
3 hours ago
[-]
Nice! I think it's pretty widely agreed that requiring type annotations at the function level is a good thing anyway. Apparently it's considered good practice in Haskell even though Haskell doesn't require it.

I've also worked with OCaml code that didn't do it and you lose a lot of the advantages of static typing. Definitely worse.

Rust got it right.

reply
dragonwriter
56 minutes ago
[-]
Type annotations on functions in Haskell (or similar languages) are useful for:

1. leveraging the type checker to verify (aspects of) the correctness of your function, and

2. communicating intent to humans

I've found in my own explorations with Haskell that its useful to write with functions with them, then verify that they work, and then remove them to see what the inferred would be (since it already compiled with the annotation, the inferred type will either be identical to or more general than the previously declared type), and generally (because it is good practice to have a declared type), replace the old declared type with the inferred type (though sometimes at this point also changing the name.)

reply
toolslive
2 hours ago
[-]
what if your IDE can show the type of any expression as a tooltip ? Would you still think the same?
reply
ufo
1 hour ago
[-]
In Haskell, type error messages are always like "types A and B should be equal, but they are not". The problem is that, without type annotations, the compiler cannot know if it is A or B that is wrong, which can result in confusing error messages.

For example, suppose that you have a bug in the body of a function, but did not provide a type annotation for it. The function might still compile but not with the type you want. The compiler will only notice something is amiss when you try to call the function and it turns out that the function's inferred type doesn't fit the call site.

Basically, global type inference in the absence of type annotations means that changes in one part of the file can affect inferred types very far away. In practice it's best to use type annotations to limit inference to small sections, so that type errors are reported close to what caused them.

reply
Quekid5
1 hour ago
[-]
I can't speak for the parent poster, but for global function declarations, yes, absolutely.

It's infuriating when a type error can "jump" across global functions just because you weren't clear about what types those functions should have had, even if those types are very abstract. So early adopters learned to sprinkle in type annotations at certain points until they discovered that the top-level was a good place. In OCaml this pain is somewhat lessened when you use module interface files, but without that... it's pain.

reply
Quekid5
1 hour ago
[-]
> I think it's pretty widely agreed that requiring type annotations at the function level is a good thing anyway. Apparently it's considered good practice in Haskell even though Haskell doesn't require it.

In Haskell-land: At the global scope, yes, that's considered good practice, especially if the function is exported from a module. When you just want a local helper function for some tail-recursive fun it's a bit of extra ceremony for little benefit.

(... but for Rust specifically local functions are not really a big thing, so... In Scala it can be a bit annoying, but the ol' subtyping inference undecidability thing rears its ugly head there, so there's that...)

reply
ufo
1 hour ago
[-]
Languages with local type inference can sometimes omit type annotations from lambdas, if that lambda is being returned or passed as an argument to another function. In those situations we know what the expected type of the argument should be and can omit it.
reply
Quekid5
49 minutes ago
[-]
Yeah, that's true and that's a good convenience even if it's not full inference. In the case of Scala, the parameter types may often be required, but at least the return type can be omitted, so there's that.
reply
tayo42
6 hours ago
[-]
I thought the implementation here was how hindley Milner worked? I guess not?
reply
tekknolagi
5 hours ago
[-]
No, HM is unification based and requires no annotations at all.
reply
skybrian
4 hours ago
[-]
Apparently it only gets away without annotations if the language doesn’t support subtyping? Here’s an explanation about why bidirectional type checking is better for that:

https://www.haskellforall.com/2022/06/the-appeal-of-bidirect...

It seems to me that type-checking that relies on global constraint-solving is usually a bad idea. Annotated function types result in less confusion about what a function does.

reply
ufo
1 hour ago
[-]
Indeed. Unification-based type inference doesn't work great when the type constraints are inequalities.
reply