> Language servers are powerful because they can hook into the language’s runtime and compiler toolchain to get semantically correct answers to user queries. For example, suppose you have two versions of a pop function, one imported from a stack library and another from a heap library. If you use a tool like the dumb-jump package in Emacs to jump to the definition of a call to pop, it might get confused about where to go because it doesn’t know which module is in scope at that point. A language server, on the other hand, should have access to this information and would not get confused.
You are correct that a language server will generally provide correct navigation/autocomplete, but a language server doesn’t necessarily need to hook into an existing compiler: it might be a latency-sensitive re-implementation of the compiler toolchain (rust-analyzer is the one I’m most familiar with, but the recent crop of new language servers tends to take this direction if the language’s compiler isn’t query-oriented).
> It is possible to use the language server for syntax highlighting. I am not aware of any particularly strong reasons why one would want to (or not want to) do this.
Since I spend a lot of time writing Rust, I’ll use Rust as an example: you can highlight a binding if it’s mutable or style an enum/struct differently. It’s one of those small things that makes a big impact once you get used to it: editors without semantic syntax highlighting (as it is called in the LSP specification) feel like they’re naked to me.
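For a concrete flavor, here is a tiny, hypothetical Rust snippet annotated with the kinds of distinctions semantic highlighting can draw; the exact styling is up to the editor and theme:

```rust
struct Point { x: i32, y: i32 } // `Point` can be styled as a struct type
enum Shape { Circle, Square }   // `Shape` can be styled differently, as an enum

fn demo() {
    let fixed = Point { x: 1, y: 2 }; // immutable binding: plain style
    let mut shape = Shape::Circle;    // mutable binding: rust-analyzer can mark it
    shape = Shape::Square;            // (e.g. underline it) everywhere it's used
    let _ = (fixed, shape);
}
```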
Wow! That is an incredibly good reason. Thank you very much for telling me something I didn’t know. :)
UPDATE: I've added a paragraph about this capability of rust-analyzer. Thank you again!
So when we just have AI write it, it means we've avoided the thinking part, and so the written article will be much less useful to the reader because there's no actual distillation of thought.
Using voice-to-article is a little better, and I do find that talking a thought out helps me see its problems, but writing it seems to do better.
There's also the problem that while it's easy to detect AI writing, it's hard to tell the difference between someone who thought it out by talking and had AI write it versus someone who did little thinking and still had AI write it. So as soon as you smell the whiff of AI writing, the reasonable expectation is that there's less distillation of thought.
If we know the text is hand-authored, then we have a signal that at least one person believed the content was important enough to put meaningful effort into creating it. That's a sign it might be worth reading.
If it's LLM-authored, then it might still be useful, or it might be complete garbage. It's hard to tell because we don't know if even the "author" was willing to invest anything into it.
Anyway, I wrote a little more about that here: https://lambdaland.org/posts/2025-08-04_artifical_inanity/
Intent matters a ton when reading or writing something.
Hmm, the strong reason could be latency and layout stability. Tree-sitter parses on the main thread (or a close worker) typically in sub-ms timeframes, ensuring that syntax coloring is synchronous with keystrokes. LSP semantic tokens are asynchronous by design. If you rely solely on LSP for highlighting, you introduce a flash of unstyled content or color-shifting artifacts every time you type, because the round-trip to the server (even a local one) and the subsequent re-tokenization takes longer than the frame budget.
The ideal hygiene could be something like this: tree-sitter provides the high-speed lexical coloring (keywords, punctuation, basic structure) instantly, and LSP paints the semantic modifiers (interfaces vs classes, mutable vs const) asynchronously, say 200ms later. Relying on LSP for the base layer makes the editor feel sluggish.
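A minimal sketch of that split, assuming a made-up `lexical_pass` in place of real tree-sitter machinery and a plain thread standing in for the LSP round-trip:

```rust
use std::thread;
use std::time::Duration;

// Layer 1: instant, purely lexical pass (what tree-sitter would give you).
fn lexical_pass(src: &str) -> Vec<(&str, &'static str)> {
    src.split_whitespace()
        .map(|tok| {
            let style = if tok == "let" || tok == "mut" { "keyword" } else { "plain" };
            (tok, style)
        })
        .collect()
}

fn main() {
    let src = "let mut count = 0 ;";

    // Paint the base layer synchronously, within the frame budget.
    println!("base layer:       {:?}", lexical_pass(src));

    // Layer 2: the semantic overlay arrives asynchronously, after the
    // (simulated) LSP round-trip, and the editor repaints on top of layer 1.
    let overlay = thread::spawn(|| {
        thread::sleep(Duration::from_millis(200)); // simulated server latency
        vec![("count", "variable.mutable")]        // semantic token + modifier
    });
    println!("semantic overlay: {:?}", overlay.join().unwrap());
}
```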
Tree-sitter has okay error correction, and that, along with its speed (as you mentioned) and flexible query language, makes it a winner both for quickly iterating on a working parser and, obviously, for integration into an actual editor.
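For a taste of that query language: highlight queries are S-expression patterns over named grammar nodes, conventionally shipped in a `highlights.scm` file. A sketch adapted from tree-sitter-rust's node names, embedded here as a Rust string:

```rust
// Patterns over tree-sitter-rust's node names: tag function names and the
// `mut` in `let mut ...`; captures like @function map to theme styles.
const HIGHLIGHTS: &str = r#"
(function_item name: (identifier) @function)
(let_declaration (mutable_specifier) @keyword)
"#;

fn main() {
    // In a real editor this string would be compiled with tree_sitter::Query
    // and run over the parse tree by a QueryCursor.
    println!("{HIGHLIGHTS}");
}
```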
Oh, and some language servers use tree-sitter to parse.
> pacman -Ssq tree-sitter
tree-sitter
tree-sitter-bash
tree-sitter-c
tree-sitter-cli
tree-sitter-javascript
tree-sitter-lua
tree-sitter-markdown
tree-sitter-python
tree-sitter-query
tree-sitter-rust
tree-sitter-vim
tree-sitter-vimdoc
Where's R, YAML, Golang, and several others?

For others, this is a suboptimal answer, but I’ve played with generating grammars with the latest LLMs and they are surprisingly good at doing this (in a few shots).
That being said, if you’re doing something more serious than syntax highlighting or shipping it in a product, you’ll want to spend more time on it.
https://github.com/tree-sitter/tree-sitter/wiki/List-of-pars...
awk bash bibtex blueprint c c-sharp clojure cmake commonlisp cpp css dart dockerfile elixir glsl gleam go gomod heex html janet java javascript json julia kotlin latex lua magik make markdown nix nu org perl proto python r ruby rust scala sql surface toml tsx typescript typst verilog vhdl vue wast wat wgsl yaml
[1]: https://github.com/tree-sitter-grammars/tree-sitter-yaml
Since it comes from the official `tree-sitter-grammars/tree-sitter-yaml` repo, it should be quick to integrate.
I use tree-sitter for developing a custom programming language; you still need an extra step to get from CST to AST, but the overall DevEx is much quicker than hand-rolling the parser.
Could you elaborate on what this involves? I'm also looking at using tree-sitter as a parser for a new language, possibly to support multiple syntaxes. I'm thinking of converting its parse trees to a common schema, which would be the target language.
I guess I don't quite get the difference between a concrete and abstract syntax tree. Is it just that the former includes information that's irrelevant to the semantics of the language, like whitespace?
An example: in a CST `1 + 0x1` might be represented differently than `1 + 1`, but they could be equivalent in the AST. The same could be true for syntax sugar: `let [x,y] = arr;` and `let x = arr[0]; let y = arr[1];` could be the same after AST normalization.
You can see why having just the AST might not be enough for syntax highlighting.
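A self-contained toy illustration of that lowering step (these are not tree-sitter's actual types): `0x1` and `1` stay distinct tokens with source ranges in the CST, but normalize to the same AST node:

```rust
// A CST keeps every token and its source range; the AST keeps only
// what semantic analysis needs.
#[derive(Debug)]
enum Cst {
    Binary { op: String, range: (usize, usize), lhs: Box<Cst>, rhs: Box<Cst> },
    IntLit { text: String, range: (usize, usize) },
}

#[derive(Debug, PartialEq)]
enum Ast {
    Add(Box<Ast>, Box<Ast>),
    Int(i64),
}

fn lower(cst: &Cst) -> Ast {
    match cst {
        Cst::Binary { lhs, rhs, .. } => {
            Ast::Add(Box::new(lower(lhs)), Box::new(lower(rhs)))
        }
        // Hex and decimal spellings normalize to the same value here.
        Cst::IntLit { text, .. } => {
            let value = if let Some(hex) = text.strip_prefix("0x") {
                i64::from_str_radix(hex, 16).unwrap()
            } else {
                text.parse().unwrap()
            };
            Ast::Int(value)
        }
    }
}

fn main() {
    let lit = |text: &str, at| Cst::IntLit { text: text.into(), range: at };
    let a = Cst::Binary {
        op: "+".into(),
        range: (0, 7),
        lhs: Box::new(lit("1", (0, 1))),
        rhs: Box::new(lit("0x1", (4, 7))),
    };
    let b = Cst::Binary {
        op: "+".into(),
        range: (0, 5),
        lhs: Box::new(lit("1", (0, 1))),
        rhs: Box::new(lit("1", (4, 5))),
    };
    assert_eq!(lower(&a), lower(&b)); // `1 + 0x1` and `1 + 1` agree after lowering
    println!("{:?}", lower(&a));
}
```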
As a side project I've been working on a simple programming language, where I use tree-sitter for the CST, but first normalize it to an AST before I do semantic analysis such as verifying references.
AST is just CST minus range info and simplified/generalised lexical info (in most cases).
Any tips for keeping the grammar sizes under control? I'm distributing a CLI tool that needs to support several languages, and I can see the grammars gradually bloating the binary size
I could build some clever thing where language packs are opt-in and distributed as WASM, maybe. But that could be complex
- I got a hint of language servers and tree-sitter thanks to this wonderfully written post, but it is still missing a lot of details, like what the protocol actually looks like, or what a standard language server or tree-sitter implementation looks like
- What are the other building blocks?
Let me be blunt: any article posted here should provide more information, or more in-depth analysis, than Wikipedia. Since I'm not a compiler person, I may be too harsh in suggesting that the article does not provide more in-depth analysis than the Wikipedia article (it is definitely shorter) -- I apologize if that's the case.