FilterHN

starkparker

1 month ago

[-]

This is also how you handle adding code blocks in GitHub comment suggestions, fwiw.

    ````suggestion
    This example should instead be:

    ```basic
    10 PRINT "LOL"
    20 GOTO 10
    ```
    ````

culi

1 month ago

[-]

Does this work with arbitrary depth? Can I do

  `````ts
  const Accordian = styled.details<{ isAprilFirst: boolean; }>`
  ````css
    accent: rebeccapurple;
    font-family: ${
  ```js
      props => props.isAprilFirst ? 'Comic Sans' : 'Time New Roman'
  ```
    };
  ````
  `;

  `````

hun3

1 month ago

[-]

Sure, as long as you don't run out of `s

Helmut10001

1 month ago

[-]

Yes, this is also how JupyterBook [1] does it (I think v1 uses the MyST Markdown parser). I found this to work excellently!

[1]: https://jupyterbook.org/

behnamoh

1 month ago

[-]

what if you want to show ````? should you add ````` tags then?

1 month ago

[-]

Yes; TFA contains enough explanation to make it clear how to extend this arbitrarily.

chuckadams

1 month ago

[-]

Markdown's parser seems to be a fascinating anomaly: a specification that consists entirely of exceptions and corner cases.

velcrovan

1 month ago

[-]

"Markdown" doesn't have a specification, only a syntax description which is ambiguous in many places, and a reference implementation written in Perl 22 years ago and totally neglected since.

CommonMark is a comprehensive specification which also has a reference implementation and a test suite.

metalliqaz

1 month ago

[-]

lol this is a great wording for something I've not been able to express before

I sometimes wonder... is Markdown's specification chaos the reason for its success? Maybe it was just barely enough spec to be usable but also small enough to allow anyone to make an implementation that seemed right. No qualifications to fail. Thus, it proliferated.

The xkcd[1] problem is a darn shame, though. At least CommonMark exists for people who want to point to a "Standard"

[1] https://xkcd.com/927/

TimorousBestie

1 month ago

[-]

Markdown is definitely a case of “worse is better” and it helped that it was half-canonicalizing ASCII formatting workarounds that had been in common use for decades.

1 month ago

[-]

… except its link syntax. That is an abomination that had never existed and should never have existed.

1 month ago

[-]

What’s wrong with the link syntax and what would an alternative be?

1 month ago

[-]

Two things wrong with it:

1. People have a hard time remembering the order and which characters to use. The most common error I see is (text)[href]. Spaces in between the [text] and (href) are also common.

2. ( and ) are URL code points: they can exist in URLs without being percent-encoded. This means that you can’t just paste or emit regularly serialised URLs, you also need to change any ( to %28 and ) to %29.

To fix it, you start out by using <> to delimit the URL instead of (). This fixes the serialised URL problem (< and > aren’t URL code points), and it was the traditional delimitation of URLs too, where delimitation was required. You could say that () was common too, but I’d argue that was just normal linguistic parenthesis rather than URL delimitation. Markdown even uses <> as URL delimitation already, but only for links with no custom text (called “automatic links” in Gruber’s definition, “autolinks” in CommonMark): “https://example.com” is just text, “<https://example.com>” makes it a link.

As for the way of adding a title attribute to a link, that shows that the parentheses aren’t even actually delimiting the URL. And then you get into the mess that’s […] and […][…], in addition to […](…). It’s an ill-considered mess.
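Point 2 above can be sketched in a few lines of Python. This is only an illustration: `markdown_link` is a hypothetical helper name, the link text is assumed to contain no square brackets, and some parsers (CommonMark among them) do accept balanced parentheses unencoded, so encoding both is just the conservative choice.

```python
def markdown_link(text: str, url: str) -> str:
    """Build an inline Markdown link, percent-encoding parentheses in the URL.

    Hypothetical helper for illustration; assumes `text` has no [ or ].
    """
    # "(" and ")" are legal in URLs but collide with Markdown's (href) delimiters,
    # so a URL pasted as-is can end the link early.
    safe_url = url.replace("(", "%28").replace(")", "%29")
    return f"[{text}]({safe_url})"


if __name__ == "__main__":
    print(markdown_link("Foo", "https://en.wikipedia.org/wiki/Foo_(bar)"))
```

With <> as the delimiter, the replace step would not be needed at all, since < and > never appear unencoded in a serialised URL.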

1 month ago

[-]

1. Seems unavoidable because there is no natural order for these two elements. It happens that html puts the link first but any time I’m writing I would put the link second.

2. I guess that’s fair. I think parentheses in urls are a bigger issue than parentheses in markdown though. Parentheses in urls often end up percent encoded whether they need to be or not.

> You could say that () was common too, but I’d argue that was just normal linguistic parenthesis rather than URL delimitation.

Right. I would say that markdown is inspired more by linguistic styles than markup styles.

You certainly could use <>, but I actually don’t think that matches the way people usually write urls in non-markup formatting. I find it interesting that Gruber chose to use that for autolinks, though.

I do appreciate you clarifying. I see what you mean, at least with respect to all the issues around #2. The more I use markdown, though, the more I come to appreciate this format, specifically because it’s so readable as text. I think it’s a reasonable trade-off.

1 month ago

[-]

> 1. Seems unavoidable because there is no natural order for these two elements.

There is a solution: restructure it so the href part is inside the link, rather than adjacent to it, which is the real problem. reStructuredText had `text <href>`_. In that form, the trailing underscore is a wart, but there’s more justification to it than it may initially seem (where Markdown can have [text] or [text][], reStructuredText’s equivalent is text_ or `text`_). For my own lightweight markup language, I’ve been using [text <href>] for a couple of years (and had {text <href>} before that).

> Parentheses in urls often end up percent encoded whether they need to be or not.

That’s what you can expect from something implementing RFC 3986; but these days, almost everything uses WHATWG’s URL Standard <https://url.spec.whatwg.org/>, under whose rules parentheses are not percent-encoded.

> I actually don’t think that matches the way people usually write urls in non-markup formatting

It isn’t any more, but it used to be very common.

1 month ago

[-]

I don’t really understand how reStructuredText’s approach fixes this. The fundamental thing is that you need to remember the order and the syntax. reStructuredText matches the order of Markdown (and so does not match html) but not the syntax. It also seems like a fine format.

> under whose rules parentheses are not percent-encoded.

I don’t know. I feel like half the time I end up with parens encoded. I’ll have to keep an eye on it. Maybe that behavior isn’t common anymore.

> [urls identified with <link> in plain text] used to be very common.

I don’t recall this being common but it’s possible I’m just not remembering.

__mharrison__

1 month ago

[-]

I've written (many) books in Markdown and still can't remember the syntax order for links or images.

Otherwise, it is a relatively smooth writing experience. Much better than Word, AsciiDoc, or LaTeX.

1 month ago

[-]

Anything that doesn't force you to remember arbitrary ordering - square brackets first? Or parentheses? It's the textual equivalent of plugging in usb upside down.

An alternative would be to simply use square brackets for both clauses of the link.

1 month ago

[-]

I think it’s a natural outgrowth of the way links are commonly provided in plaintext, like so much other markdown.

> The details can be found at my website (https://example.com).

The problem with this is that if you want to render this “pretty”, there’s no way to know whether the link should be “my website” or “website” or even the whole sentence. So you add brackets to clarify.

> The details can be found at [my website](https://example.com).

There are certainly alternatives but I don’t think any of them are more natural, or memorable for that matter.

1 month ago

[-]

My issue is remembering that the square brackets come first, not the parentheses. I do like asciidoc's method: https://example.com for bare link, or https://example.com[pretty text] if alternate text is desired

Edit: It took me a re-read to fully understand your comment, I can see how square brackets might be an incremental addition. This may also help remember the syntax, thanks!

1 month ago

[-]

Excerpt from my notes when I was deciding on a link syntax for my own lightweight markup language:

AsciiDoc doesn’t actually have a real link syntax; what it has is more or less a natural consequence of other syntax choices, but it isn’t actually URL-aware, and will mangle some less common URLs. Still, what you get is mostly this kind of thing:

• https://example.com[Link text]

• link:URL[Link text]

But woe betide you if you go beyond what it supports: its techniques when you need escaping are grotesque, monstrous horrors. Seriously, when you fall off the happy path, AsciiDoc is awful.

1 month ago

[-]

That asciidoc format also seems very reasonable.

The big issue isn’t specifically that markdown is wrong or right but that all these different systems are very inconsistent.

simonkagedal

1 month ago

[-]

Someone (maybe on this site) suggested to think of the bottom bars of the square brackets around the linked text to kind of frame the underline. Somehow worked really well for me, haven’t forgotten the syntax since.

1 month ago

[-]

I like this, thank you.

setopt

1 month ago

[-]

> An alternative would be to simply use square brackets for both clauses of the link.

For comparison, Org-mode uses [[LINK][DESCRIPTION]] instead of [DESCRIPTION](LINK).

1 month ago

[-]

This is great! Not an emacs user (as yet) but this and org-mode's /italic/ _underline_ *bold* +strike+ feel that much closer to the oft-touted "source looks kinda like formatting" ideal of markdown. Not sure why we ended up with the mediocre version as a de facto standard.

macintux

1 month ago

[-]

The only keyboard shortcut for org-mode and markdown-mode I consistently remember is C-c C-l for inserting links. Much easier to remember that than to remember the syntax, and the fact that both modes use the same easy-to-remember shortcut is a major win.

setopt

1 month ago

[-]

Sadly, most other keybindings differ between org-mode, markdown-mode, and auctex. I would have loved more consistency, and often end up typing the syntax instead of tripping up on keybindings.

PaulHoule

1 month ago

[-]

My feeling overall is that I can't get into flow writing Markdown, there are just enough things wrong that I never feel completely comfortable while doing it.

It seems that in the HTML 5 age there is some subset of HTML which should be completely satisfying for anyone. Maybe it is custom components that work like JSX (e.g. <footnote>) or something like tailwind. Editing HTML with one eye on a live view is more pleasant for me than anything else. Every kind of rich editor that looks like Microsoft Word (esp. Word!) comes across as a dull tool where selections, navigation, and applying styles almost work. There's got to be some kind of conceptual problem at the root of it all that makes fixing it like pushing around a bubble under the rug. I want to believe in Dreamweaver but 2-second latency to process keystrokes on AMD's best CPU from 2 years ago and the incredulous attitude Adobe support has about the problem make it a non-starter [1]

[1] If I ran an OS, failing to update the UI within 0.2 sec would get an immediate kill -9, and telemetry of the event would get you dropped from the app store not much later. I'm not saying rendering has to be settled in 0.2 sec, but there has to be some response that feels... responsive.

TimorousBestie

1 month ago

[-]

To my memory, people had been using [link](url) and similar styling for a long time on old web forums and even BBSes.

Be glad they didn’t adopt Everything2’s “pipe link” syntax: [link|url]. Or maybe it was [url|link]? It’s been well over two decades, I don’t remember anymore.

1 month ago

[-]

I have never heard of [text](url) being used before Markdown. “text (url)” or “text <url>”, leaving the reader to infer which words the URL applied to, sure; but among formats that provided a span for the link text, I think Markdown may have been the first to use a spelling of two adjacent delimited parts. [url text], [text|url], [url="url"]text[/url], et cetera, seen them all.

bloppe

1 month ago

[-]

Markdown succeeded because both the source code and the rendered HTML are readable. Other markups like restructured text don't look good in source form.

But ya, in order to look good in source form, but still handle arbitrary content, they had to add all these little exceptions and corner cases.

velcrovan

1 month ago

[-]

HTML and CSS were also chaotic at one point and it sucked ass.

Loosy goosy is fine for a hobby project but if you do anything with vanilla Markdown beyond simple links, headings and text, you quickly find yourself in a frustrating zone of incompatible hacks and syntax extensions.

SOLAR_FIELDS

1 month ago

[-]

I maintain knowledge bases in Obsidian compatible repositories and one thing that's been great is having a hand rolled validation schema that validates against the AST output produced by remark. I call it a "markdown body grammar". So I can at least prevent people from doing edge casey things at build time when they produce documents

mikepurvis

1 month ago

[-]

And Standard Markdown / Common Markdown / CommonMark was subject to a bunch of drama when it first emerged:

https://news.ycombinator.com/item?id=8271327

I generally like John Gruber and have been a DF reader for years, but I really never understood his perspective on this; I have trouble seeing it as much more than a "worse is better" kind of take.

chuckadams

1 month ago

[-]

Yeah I ultimately can't hate markdown, but it really was just specified more or less as "whatever markdown.pl does", and markdown.pl was not exactly the most rigorously engineered thing. Even bbcode of all things has more predictable structure to it. The commonmark/pandoc guy now has Djot, which is supposed to be a bit more sane, but I get the feeling it's probably too late :-/

nick238

1 month ago

[-]

Didn't know this. It kind of reminds me of MIME multipart messages (used in email attachments, MMS, etc.) where the header includes a "boundary" tag which the parser will look for to terminate the part. It feels strange, like it could be some injection risk where if the file knew what the boundary was going to be, it could desync the bounds and turn one malicious, inactive file into one or more bad files.

manwe150

1 month ago

[-]

I believe the spec intends that a decent mail handler is required to scan the text and make sure the delimiter is not present or pick a different one. Or use a different encoding (eg base64) to prevent conflicts if you want streaming ability. Although bugs of course could break the best of intentions
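The "scan the content and pick a different delimiter" idea can be sketched as follows. This is a simplified illustration, not a real MIME writer: `pick_boundary` and `join_parts` are hypothetical names, the framing omits part headers, and the parts are assumed to be raw bytes held in memory.

```python
import secrets


def pick_boundary(parts: list[bytes]) -> bytes:
    """Return a random boundary string that occurs in none of the parts."""
    while True:
        candidate = secrets.token_hex(16).encode("ascii")
        # Retry in the (very unlikely) case the candidate appears in the content.
        if all(candidate not in part for part in parts):
            return candidate


def join_parts(parts: list[bytes]) -> tuple[bytes, bytes]:
    """Frame parts MIME-style; returns (boundary, framed_message)."""
    boundary = pick_boundary(parts)
    chunks = [b"--" + boundary + b"\r\n\r\n" + part + b"\r\n" for part in parts]
    chunks.append(b"--" + boundary + b"--\r\n")  # closing delimiter
    return boundary, b"".join(chunks)
```

Because the sender chooses the boundary after seeing the content, a file cannot know it in advance; the injection risk only appears when a sender skips the scan or reuses a fixed boundary.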

codazoda

1 month ago

[-]

I might be able to use this, especially in LLMs where I ask them to give me things in code fences all the time. If I ask for markdown in a code fence, it all falls apart. If, however, I asked for markdown in a ~~~ code fence, or even ~~~~~, all would be right with the world, since they typically use ```.

qingcharles

1 month ago

[-]

I was debugging code last night that uses ~~~ as a delimiter in a string. At least, as you say, you can go crazy and use ~~~~~ to get around it.

Igor_Wiwi

1 month ago

[-]

I will use it as a rendering benchmark for mdview.io https://mdview.io/#mdv=N4IgbiBcCMA0IBMCGAXJUTADrhzWOAtgnjgMI...

1 month ago

[-]

All this complication seems to stem from the simple fact, that the fences don't have a recognizably distinct start and end marker. It's all "`" or "~", instead of one symbol at the start and another, different symbol at the end. And then going into the different numbers of backticks or tildes. Why add such ambiguity, that will only make it harder to parse things correctly? This immediately raises the question: "What if I start a block with 4 backticks and end it with 5?"

All these complications would have been avoidable with a more thought through design/better choices of symbols. For example one could have used brackets:

    [[[lang
    code here
    ]]]

And if one wanted to nest it, it should automatically work:

    [[[html
    html code
    [[[css
    css code
    ]]]
    [[[js
    js code
    ]]]
    html code
    ]]]

In case one wants to output literally "[[[" one could escape it using backslash, as usual in many languages.

In a parser that would be much simpler to parse. It is kind of like parsing S-expressions. There is no need for 4 backticks, 5, or any higher number. I don't want to sit there counting backticks in the document, to know what part of a nested code block some code belongs to. It's a silly design.
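A rough sketch of what such a parser could look like, assuming the `[[[lang` / `]]]` markers each sit on their own line and that a leading backslash escapes a literal marker line (both are my reading of the proposal, and `parse_blocks` is a hypothetical name):

```python
def parse_blocks(text: str) -> list:
    """Parse nested [[[lang ... ]]] blocks into a tree.

    Each block is a dict {"lang": str, "children": [...]}; children are
    plain text lines (str) or nested block dicts.
    """
    root = {"lang": None, "children": []}
    stack = [root]  # innermost open block is stack[-1]
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[[["):
            block = {"lang": stripped[3:].strip(), "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)
        elif stripped == "]]]":
            if len(stack) == 1:
                raise ValueError("unmatched closing ]]]")
            stack.pop()
        else:
            # A leading backslash escapes a literal [[[ or ]]] line.
            if stripped.startswith(("\\[[[", "\\]]]")):
                line = line.replace("\\", "", 1)
            stack[-1]["children"].append(line)
    if len(stack) != 1:
        raise ValueError("unclosed [[[ block")
    return root["children"]
```

The stack makes nesting depth a non-issue, but the cost shows up in the `else` branch: any pasted code that happens to contain a bare marker line has to be edited before it can be embedded.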

charles_f

1 month ago

[-]

Your solution for the problem described here is to escape with a different character. MD's is to add more special characters. Both are valid and exist in other languages, I wouldn't qualify one as better thought than the other - though since we're talking about text that I don't want modified, I prefer adding ticks rather than going into the text and escaping them one by one.

The complication doesn't stem from the lack of distinct start and end markers. What you are trying to solve for here is having multiple languages in a single block, with pretty colors on each. Seeing that HTML doesn't support nesting of pre tags (or rather doesn't render one embedded in the next), that would probably not work without producing something that is not pure HTML.

> In a parser that would be much simpler to parse

Parsing a variable number of ` is not more complex than looking ahead for a closing boundary. In fact, once you introduce escaping characters, you need to handle escaping of the escaping character, which is slightly more complex.
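For a sense of scale, here is a minimal fence scanner following the CommonMark rule that a closing fence must use the same character as the opening fence and be at least as long. It is a simplification under stated assumptions: `fenced_blocks` is a hypothetical name, and it ignores indentation stripping and container blocks such as lists and block quotes.

```python
import re

# Up to three spaces of indent, then a run of 3+ backticks or 3+ tildes,
# then an optional info string.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def fenced_blocks(text: str) -> list[tuple[str, str]]:
    """Return (info_string, body) for each top-level fenced code block."""
    blocks = []
    open_fence = None  # the opening run of fence characters while inside a block
    info, body = "", []
    for line in text.splitlines():
        m = FENCE.match(line)
        if open_fence is None:
            # Backtick fences may not have backticks in their info string.
            if m and not (m.group(1)[0] == "`" and "`" in m.group(2)):
                open_fence, info, body = m.group(1), m.group(2).strip(), []
        elif (m and m.group(1)[0] == open_fence[0]
              and len(m.group(1)) >= len(open_fence)
              and not m.group(2).strip()):
            # Same character, at least as long, nothing after it: closing fence.
            blocks.append((info, "\n".join(body)))
            open_fence = None
        else:
            body.append(line)
    if open_fence is not None:
        # An unclosed fence runs to the end of the document.
        blocks.append((info, "\n".join(body)))
    return blocks
```

Under this rule, a block opened with 4 backticks and closed with 5 is simply closed, while a 3-backtick line inside it is ordinary content.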

1 month ago

[-]

The syntax highlighting of the code of each language itself is not the problem. This post is about markdown. A typical markdown parser doesn't do syntax highlighting for code blocks. That's usually done by some other library, like for example pygments. The issue is about markdown syntax. What happens on another language's level does not concern the markdown parser.

charles_f

1 month ago

[-]

That's exactly my point, the solution you're discussing is about something else, and not relevant to what's discussed in this post.

1 month ago

[-]

The solution I describe merely serves as an easier-to-parse way of nesting code blocks. I don't mean it to serve any syntax highlighting purpose, which I understand to be your impression. That would only be an outcome for tools that act upon the AST generated by the parser: tools that can take code of a programming language and color it. That is not the job of a markdown parser, for which my idea is meant.

_ache_

1 month ago

[-]

So if syntax highlighting isn't a problem, the standard way of presenting a block of code in Markdown is to indent it.

Which is quick and easy to understand.

armchairhacker

1 month ago

[-]

> In case one wants to output literally "[[[" one could escape it using backslash, as usual in many languages.

Sometimes you want to paste a large region of code into a code block, and escaping the content is harder than fixing the start and end delimiters. This matters particularly in Markdown, where embedding large regions of code or text is common, whereas in other languages you’d put it in its own file.

So I still suggest the ability to change the number of open and close brackets. Then you’ll also need an implicit newline or other way to distinguish content that starts with an open bracket.

embedding-shape

1 month ago

[-]

Indeed! Last time I dealt with this exact problem in a toy application made for myself, I ended up making the markdown parser only read ```$LANG syntax, and making it assume just ``` is a closing tag, not accepting it as an opening tag. Made it easier for the pretty syntax formatter to do its job too, as it no longer has to figure out the language.

_ache_

1 month ago

[-]

Do you realize that your solution is basically to use a tag? Markdown was developed precisely to avoid using them.

The classic way in markdown to insert a block of code is to indent the code.

1 month ago

[-]

Well, if you want complex things like nested code blocks, then a kind of "tag" approach can be just the solution needed. Input-wise it doesn't really make a difference, whether I have to type "[[[" and "]]]" or "```" and again "```". Whether or not my idea is more like a tag doesn't seem to have any repercussions. Outsourcing ever more complexity into the parser, with bad design decisions however has a significant cost, which is making development of parsers and grammars difficult.

1 month ago

[-]

The point of avoiding tags is to improve the ergonomics: you don't have to remember tag names, use a separate delimiting syntax anyway to indicate where the tag name is, and then repeat the tag name when you close the block. Especially given that this is for a block-level construct anyway, simply using a bracketing syntax isn't causing any of those problems.

Indenting inline code requires a text editor that makes indentation ergonomic or else extra effort per line; and it doesn't mesh well with lists or block quotes.

mohsen1

1 month ago

[-]

I love hacker news! You learn something useful here and there.

I always used html elements like <pre /> and <code /> to go around this in the past

pratikdeoghare

1 month ago

[-]

I faced this problem when designing my own notation [1].

Solved it by surrounding code with more ticks than maximum number of consecutive ticks inside its text. This allows arbitrary nesting.

Postgres solves it by using `$something$ whatever $something$` [2].

[1] https://github.com/PratikDeoghare/brashtag

[2] https://www.postgresql.org/docs/current/sql-syntax-lexical.h...

1 month ago

[-]

> Solved it by surrounding code with more ticks than maximum number of consecutive ticks inside its text. This allows arbitrary nesting.

So, the same thing that Markdown does, as described in TFA?

zokier

1 month ago

[-]

I realize that it would be somewhat antithetical for markdown, but I increasingly feel that length-prefixing everything makes a lot of stuff easier at pretty low cost. Anything depending on delimiters or start/end tags inevitably ends up with difficult quoting rules or some other awkward scheme (like seen here).
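As one concrete instance of length-prefixing, here is a sketch using the netstring layout (`<length>:<bytes>,`). The function names are hypothetical, and the choice of netstrings is mine for illustration rather than something proposed in the thread.

```python
def encode(payload: bytes) -> bytes:
    """Frame a payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode(data: bytes) -> tuple[bytes, bytes]:
    """Read one framed payload from the front of data; return (payload, rest)."""
    head, sep, tail = data.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("missing length prefix")
    n = int(head)
    # The payload is taken by count, so its contents are never inspected.
    if len(tail) < n + 1 or tail[n:n + 1] != b",":
        raise ValueError("truncated or malformed frame")
    return tail[:n], tail[n + 1:]
```

The payload can contain colons, commas, or any number of backticks without quoting rules; the trade-off is that a human can no longer edit the text without recomputing the length.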

rednafi

1 month ago

[-]

Ah, YAML and Markdown, two beautiful accidents of tech. It still boggles my mind that we collectively couldn’t come up with a post hoc spec and fix all the warts with a strict parser for either of them. Sure, it would break quite a bit of existing stuff, but the pain would probably be worth it.

zokier

1 month ago

[-]

CommonMark is such a post hoc spec.

rednafi

1 month ago

[-]

Yet there’s GitHub-flavored Markdown, GitLab-flavored Markdown, and so on. Since people just write Markdown without caring about the parsers (nor should they), they break too often when you try to move them around. It’s even worse with YAML.

Silphendio

1 month ago

[-]

Both of those flavors are based on the CommonMark spec. There's just some extra features and some forbidden html.

runfaster2000

1 month ago

[-]

C# has this same model:

https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

epage

1 month ago

[-]

> In fact, a code fence need not consist of exactly three backticks or tildes. Any number of backticks or tildes is allowed, as long as that number is at least three

Unfortunately, some markdown implementations don't handle this well. We were looking at using code-fence like syntax in Rust and we were worried about people knowing how to embed it in markdown code fences, but bad implementations were the ultimate deal breaker. We switched to `---` instead, making basic cases look like yaml stream separators which are used for frontmatter.

bityard

1 month ago

[-]

`---` is already used in Markdown for horizontal rules?

_ache_

1 month ago

[-]

Yeap, along with `+++`, `**` and mixing, if I remember correctly. I don't understand the logic of using a non-standard syntax because some non-standard implementations may not render correctly.

Actually, yes, now you know for a fact that none of the Markdown implementations will render it correctly.

So, I guess, they used `~~~` instead and it was an error in OP post.

nicoburns

1 month ago

[-]

The problem here is that if you use ``` as a token in a non-markdown language, then it's going to be very hard to embed that code in a markdown code block. That problem doesn't happen with other syntax as it's already escaped by the code block. `---` inside a markdown code block will render as a literal `---`.

yencabulator

1 month ago

[-]

To embed content with multiple sequential backticks, use more backticks than the max run.
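That rule is short enough to write down. A minimal sketch, with `fence` as a hypothetical helper name; it counts the longest run anywhere in the content rather than only at line starts, which is more conservative than strictly necessary:

```python
import re


def fence(content: str, info: str = "", char: str = "`") -> str:
    """Wrap content in a code fence longer than any run of `char` inside it."""
    runs = re.findall(re.escape(char) + "+", content)
    longest = max((len(run) for run in runs), default=0)
    marker = char * max(3, longest + 1)  # never shorter than three
    return f"{marker}{info}\n{content}\n{marker}"


if __name__ == "__main__":
    inner = fence('10 PRINT "LOL"\n20 GOTO 10', "basic")
    print(fence(inner, "suggestion"))
```

Calling it on its own output, as in the usage lines, grows the fence by one character per nesting level.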

nicoburns

1 month ago

[-]

In theory, yes. But not all markdown implementations support this properly.

yencabulator

1 month ago

[-]

The CommonMark spec even has an example test case! The excuses for poor implementations are pretty thin.

lilyball

1 month ago

[-]

Not all markdown implementations are CommonMark

yencabulator

1 month ago

[-]

There's not much reason to be anything else than CommonMark + extensions.

lilyball

1 month ago

[-]

For new implementations, sure. But it's harder to change existing implementations (anything not already CommonMark-compatible will introduce unexpected changes to existing content if you switch to CommonMark), and especially for anything that's not being actively developed it's unlikely to ever change.

1 month ago

[-]

I hoped this would have some discussion of the design rather than simply saying how to do it, because I already knew (because it's come up on Stack Overflow / Stack Exchange meta a few times).

esafak

1 month ago

[-]

Markdown is so fragile. When you start combining tables, code blocks, and collapsible sections it falls apart. I wish there was a robust way to emit GFM from something more solid.

themk

1 month ago

[-]

Pandoc? It might help you. If you are programmatically generating content, you can emit the JSON intermediate format. If you are hand writing, you can use something more sane like djot.

pyrolistical

1 month ago

[-]

As long as you are not nesting it works pretty well.

For anything more complicated, just give up and use html

data_ders

1 month ago

[-]

TIL about triple curlies! mind blown

trvz

1 month ago

[-]

Markdown assumes the user won’t do anything silly, and I’m fine with that. Rather the people enabling such behaviour are annoying.

rasur

1 month ago

[-]

#+BEGIN_SRC lolcode

blah

#+END_SRC

org-mode to the rescue ;p

1 month ago

[-]

Though it does get a little funny, when you want to write org mode code blocks, that contain actually org mode code. If I recall correctly a "+" at the beginning of the line is then used for escaping?