Markdown's parser seems to be a fascinating anomaly: a specification that consists entirely of exceptions and corner cases.
lol this is a great wording for something I've not been able to express before
I sometimes wonder... is Markdown's specification chaos the reason for its success? Maybe it was just barely enough spec to be usable, but also small enough to let anyone make an implementation that seemed right. There was no conformance bar to fail. Thus, it proliferated.
The xkcd[1] problem is a darn shame, though. At least CommonMark exists for people who want to point to a "Standard".
[1] https://xkcd.com/927/
Yeah I ultimately can't hate markdown, but it really was just specified more or less as "whatever markdown.pl does", and markdown.pl was not exactly the most rigorously engineered thing. Even bbcode of all things has more predictable structure to it. The commonmark/pandoc guy now has Djot, which is supposed to be a bit more sane, but I get the feeling it's probably too late :-/
Markdown is definitely a case of “worse is better” and it helped that it was half-canonicalizing ASCII formatting workarounds that had been in common use for decades.
… except its link syntax. That is an abomination that had never existed and should never have existed.
My feeling overall is that I can't get into a flow writing Markdown; there are just enough things wrong that I never feel completely comfortable doing it.
It seems that in the HTML 5 age there is some subset of HTML which should be completely satisfying for anyone. Maybe it is custom components that work like JSX (e.g. <footnote>) or something like Tailwind. Editing HTML with one eye on a live view is more pleasant for me than anything else. Every kind of rich editor that looks like Microsoft Word (esp. Word!) comes across as a dull tool where selections, navigation, and applying styles almost work. There's got to be some kind of conceptual problem at the root of it all that makes fixing it like pushing around a bubble under the rug. I want to believe in Dreamweaver, but the 2-second latency to process keystrokes on AMD's best CPU from 2 years ago, and the incredulous attitude Adobe support has about the problem, make it a non-starter [1]
[1] If I ran an OS, failing to update the UI within 0.2 sec would get an immediate kill -9, and telemetry of the event would get you dropped from the app store not long after. I'm not saying rendering has to be settled in 0.2 sec, but there has to be some response that feels... responsive.
What’s wrong with the link syntax and what would an alternative be?
This is also how you handle adding code blocks in GitHub comment suggestions, fwiw.
What if you want to show ````? Should you use ````` fences then?
Yes, this is also how JupyterBook [1] does it (I think v1 uses the MyST Markdown parser). I found this to work excellently!
[1]: https://jupyterbook.org/
I might be able to use this, especially in LLMs where I ask them to give me things in code fences all the time. If I ask for markdown in a code fence, it all falls apart. If, however, I asked for markdown in a ~~~ code fence, or even ~~~~~, all would be right with the world, since they typically use ```.
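A minimal sketch of that idea, assuming a hypothetical helper that wraps text for an LLM prompt or answer: if the text already contains backtick fence lines, switch the outer fence to tildes. It only chooses the character; text that mixes both kinds of fences would still need a longer outer fence.

```python
import re

# A line that starts (after up to three spaces) with three or more backticks,
# roughly what CommonMark treats as a backtick fence line.
BACKTICK_FENCE_LINE = re.compile(r"^ {0,3}`{3,}", re.MULTILINE)


def pick_fence_char(text: str) -> str:
    """Return "~" if text already contains backtick fence lines, else "`"."""
    return "~" if BACKTICK_FENCE_LINE.search(text) else "`"


def wrap_for_prompt(text: str, lang: str = "markdown") -> str:
    """Wrap text in a three-character fence using the character picked above.

    Hypothetical helper: it does not handle text containing both backtick
    and tilde fences; that case needs a longer outer fence instead.
    """
    fence = pick_fence_char(text) * 3
    return f"{fence}{lang}\n{text}\n{fence}"
```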
I was debugging code last night that uses ~~~ as a delimiter in a string. At least, as you say, you can go crazy and use ~~~~~ to get around it.
I love Hacker News! You learn something useful here and there.
In the past I always used HTML elements like <pre> and <code> to get around this.
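For the raw-HTML route, the code still has to be HTML-escaped. In CommonMark, a block that starts with `<pre` runs until a line containing `</pre>` and is passed through without Markdown parsing, so fence lines inside it are harmless, but `<`, `>` and `&` in the code would still be read as HTML. A minimal sketch using Python's standard `html.escape` (the function name is made up):

```python
import html


def to_pre_block(code: str) -> str:
    """Wrap raw code in <pre><code> for embedding in Markdown as raw HTML.

    html.escape turns &, < and > (and, by default, both quote characters)
    into entities, so the browser shows the code literally.
    """
    return "<pre><code>" + html.escape(code) + "</code></pre>"
```

Escaping also means a literal `</pre>` inside the code becomes `&lt;/pre&gt;` and cannot end the block early.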
I will use it as a rendering benchmark for mdview.io https://mdview.io/#mdv=N4IgbiBcCMA0IBMCGAXJUTADrhzWOAtgnjgMI...
I faced this problem when designing my own notation [1].
Solved it by surrounding code with more ticks than the maximum number of consecutive ticks inside its text. This allows arbitrary nesting.
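A minimal sketch of that rule applied to Markdown fences (the helper names are hypothetical): find the longest run of backticks in the content and make the fence one longer, with CommonMark's minimum of three. This is also why showing four backticks calls for a five-backtick fence. Counting runs anywhere in the text is stricter than CommonMark needs, since only a run at the start of a line can close a block, but it is simple and never too short.

```python
import re


def fence_for(text: str, char: str = "`") -> str:
    """Return a fence of `char` longer than any run of `char` inside text.

    CommonMark requires at least three characters, and a closing fence must be
    at least as long as the opening one, so no line of the content can end
    the block early.
    """
    runs = re.findall(re.escape(char) + "+", text)
    longest = max((len(run) for run in runs), default=0)
    return char * max(3, longest + 1)


def fenced(text: str, info: str = "") -> str:
    """Wrap text in a fenced code block that is safe for its contents."""
    fence = fence_for(text)
    return f"{fence}{info}\n{text}\n{fence}"
```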
Postgres solves it by using `$something$ whatever $something$` [2].
[1] https://github.com/PratikDeoghare/brashtag
[2] https://www.postgresql.org/docs/current/sql-syntax-lexical.h...
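Dollar quoting works the same way with a tag instead of a count: you pick a delimiter that does not occur in the body. A minimal sketch, assuming a made-up tag scheme ($$, then $q0$, $q1$, ...); PostgreSQL itself only requires the tag to follow unquoted-identifier rules and contain no dollar sign. The check looks at body + delimiter rather than the body alone, so a body ending in "$" cannot combine with the closing delimiter into an earlier match.

```python
import itertools


def dollar_quote(body: str, base: str = "q") -> str:
    """Quote body as a PostgreSQL dollar-quoted string constant.

    Tries the empty tag ($$) first, then hypothetical tags base0, base1, ...
    `base` must start with a letter or underscore so the tag is a valid
    unquoted identifier.
    """
    tags = itertools.chain([""], (f"{base}{i}" for i in itertools.count()))
    for tag in tags:
        delim = f"${tag}$"
        # The first occurrence of delim in body + delim must be the closing
        # one; otherwise the lexer would end the string early.
        if (body + delim).find(delim) == len(body):
            return f"{delim}{body}{delim}"
    raise AssertionError("unreachable: the tag sequence is infinite")
```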
All this complication seems to stem from the simple fact that the fences don't have a recognizably distinct start and end marker. It's all "`" or "~", instead of one symbol at the start and another, different symbol at the end. And then there are the different numbers of backticks or tildes. Why add such ambiguity, when it will only make it harder to parse things correctly? This immediately raises the question: "What if I start a block with 4 backticks and end it with 5?"
All these complications would have been avoidable with a more thought-through design and a better choice of symbols. For example, one could have used brackets:
[[[lang
code here
]]]
And if one wanted to nest it, it should automatically work:
[[[html
html code
[[[css
css code
]]]
[[[js
js code
]]]
html code
]]]
In case one wants to output literally "[[[" one could escape it using backslash, as usual in many languages.
In a parser that would be much simpler to parse. It is kind of like parsing S-expressions. There is no need for 4 backticks, 5, or any higher number. I don't want to sit there counting backticks in the document to know which part of a nested code block some code belongs to. It's a silly design.
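To make the bracket proposal concrete, here is a minimal sketch of a parser for it. Everything beyond the comment is an assumption: markers must start a line, a line consisting of ]]] closes the innermost open block, and a backslash before a marker makes the line literal, with one backslash removed so that the escape character itself can be escaped (the extra rule that any escaping scheme brings with it).

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    lang: str
    # Each child is either a line of text (str) or a nested Block.
    children: list = field(default_factory=list)


MARKERS = ("[[[", "]]]")


def parse_brackets(text: str) -> Block:
    r"""Parse the hypothetical [[[lang ... ]]] syntax into a tree of Blocks.

    A line "\[[[" is the literal text "[[[", and "\\[[[" is the literal
    text "\[[[": one leading backslash is removed per level of escaping.
    """
    root = Block(lang="")
    stack = [root]  # innermost open block is stack[-1]
    for line in text.splitlines():
        unescaped = line.lstrip("\\")
        if unescaped != line and unescaped.startswith(MARKERS):
            stack[-1].children.append(line[1:])  # drop one backslash
        elif line.startswith("[[["):
            block = Block(lang=line[3:].strip())
            stack[-1].children.append(block)
            stack.append(block)
        elif line.rstrip() == "]]]":
            if len(stack) == 1:
                raise ValueError("']]]' without a matching '[[['")
            stack.pop()
        else:
            stack[-1].children.append(line)
    if len(stack) > 1:
        raise ValueError(f"{len(stack) - 1} block(s) left unclosed")
    return root
```

Nesting falls out of the stack, as with S-expressions; the cost moves to the content, where every literal marker line has to be escaped by hand.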
Your solution to the problem described here is to escape with a different character. MD's is to add more special characters. Both are valid and exist in other languages; I wouldn't call one better thought out than the other. Though since we're talking about text that I don't want modified, I prefer adding ticks rather than going into the text and escaping them one by one.
The complication doesn't stem from a lack of distinct start and end markers. What you are trying to solve here is the case where you have multiple languages in a single block and want pretty colors for each. Seeing that HTML doesn't support nesting of pre tags (or rather, doesn't render one embedded in another), that would probably not work without producing something that is not pure HTML.
> In a parser that would be much simpler to parse
Parsing a variable number of backticks is not more complex than looking ahead for a closing boundary. In fact, once you introduce an escape character, you also need to handle escaping of the escape character itself, which is slightly more complex.
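A minimal sketch of how little that lookahead needs, following the CommonMark rule for fenced code blocks: the closing fence must use the same character and be at least as long as the opening one, indented by at most three spaces and followed only by whitespace. So a block opened with four backticks is closed by five, but not by three. This is simplified (content indentation and container blocks such as lists and block quotes are ignored), and the function name is made up.

```python
import re

# Up to three spaces, then 3+ backticks or 3+ tildes, then the info string.
OPENING = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def read_fenced_block(lines: list[str], start: int) -> tuple[str, list[str], int]:
    """Read the fenced code block opening at lines[start].

    Returns (info_string, content_lines, index_after_block). An unclosed
    fence runs to the end of the input, as in CommonMark.
    """
    m = OPENING.match(lines[start])
    # A backtick fence's info string may not contain backticks.
    if m is None or (m.group(1)[0] == "`" and "`" in m.group(2)):
        raise ValueError("not an opening fence")
    fence, info = m.group(1), m.group(2).strip()
    closing = re.compile(
        r"^ {0,3}" + re.escape(fence[0]) + "{" + str(len(fence)) + r",}[ \t]*$"
    )
    for i in range(start + 1, len(lines)):
        if closing.match(lines[i]):
            return info, lines[start + 1:i], i + 1
    return info, lines[start + 1:], len(lines)
```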
The syntax highlighting of the code of each language itself is not the problem. This post is about markdown. A typical markdown parser doesn't do syntax highlighting for code blocks; that's usually done by some other library, for example Pygments. The issue is about markdown syntax. What happens at another language's level does not concern the markdown parser.
That's exactly my point, the solution you're discussing is about something else, and not relevant to what's discussed in this post.
So if syntax highlighting isn't a problem, the standard way of presenting a block of code in Markdown is to indent it, which is quick and easy to understand.
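A minimal sketch of that approach (hypothetical helper name): prefix every line with four spaces. Two CommonMark details matter here: an indented code block cannot interrupt a paragraph, so it needs a blank line before it, and it has no info string, so there is nowhere to put a language name for a highlighter.

```python
def indent_code(code: str) -> str:
    """Turn raw code into a Markdown indented code block.

    Non-empty lines get four leading spaces; empty lines stay empty, which
    CommonMark allows between indented chunks. The leading blank line keeps
    the block from being read as a continuation of a preceding paragraph.
    """
    body = "\n".join("    " + line if line else "" for line in code.splitlines())
    return "\n" + body + "\n"
```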
> In case one wants to output literally "[[[" one could escape it using backslash, as usual in many languages.
Sometimes you want to paste a large region of code into a code block, and escaping the content is harder than fixing the start and end delimiters. This matters particularly in Markdown, where embedding large regions of code or text is common, whereas in other languages you’d put it in its own file.
So I still suggest the ability to change the number of open and close brackets. Then you’ll also need an implicit newline or some other way to distinguish content that starts with an open bracket.
Indeed! The last time I dealt with this exact problem, in a toy application made for myself, I ended up making the markdown parser read only ```$LANG as an opening fence and treat a bare ``` as a closing tag, never as an opening one. That made it easier for the pretty syntax formatter to do its job too, as it no longer has to figure out the language.
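A minimal sketch of that toy rule, with hypothetical names: three backticks followed by a language name open a block, a bare fence only ever closes one, and a bare fence outside a block is left as ordinary text. The trade-off is that a bare fence inside a block always ends it, so the rule does not help with nesting; what it buys is that the language is always known.

```python
import re

FENCE = "`" * 3
# Three backticks immediately followed by a language name such as python or c++.
LANG_OPENING = re.compile(r"^`{3}(\w[\w+-]*)\s*$")


def split_blocks(text: str) -> list[tuple[str, str]]:
    """Split text into ("text", line) items and (lang, code) blocks."""
    out: list[tuple[str, str]] = []
    lang, buf = None, []
    for line in text.splitlines():
        if lang is None:
            m = LANG_OPENING.match(line)
            if m:
                lang, buf = m.group(1), []  # only a fence with a language opens
            else:
                out.append(("text", line))  # includes a stray bare fence
        elif line.strip() == FENCE:
            out.append((lang, "\n".join(buf)))  # a bare fence only closes
            lang = None
        else:
            buf.append(line)
    if lang is not None:
        raise ValueError(f"unclosed {lang} block")
    return out
```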
Do you realize that your solution is basically to use a tag, which is exactly what Markdown was developed to avoid?
The classic way in markdown to insert block of code is to indent the code.
I realize that it would be somewhat antithetical to markdown, but I increasingly feel that length-prefixing everything makes a lot of stuff easier at pretty low cost. Anything depending on delimiters or start/end tags inevitably ends up with difficult quoting rules or some other awkward scheme (like the one seen here).
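One established length-prefixed format is D. J. Bernstein's netstrings: the payload's byte length in decimal, a colon, the payload, and a comma ("12:hello world!,"). A minimal Python sketch, simplified (it does not enforce every rule of the format, such as rejecting leading zeros), shows why no quoting is needed: the decoder never looks inside the payload, so backticks, brackets or backslashes pass through untouched.

```python
def encode(payload: bytes) -> bytes:
    """Encode one netstring: b"<length>:<payload>,"."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode(data: bytes) -> tuple[bytes, bytes]:
    """Decode the netstring at the front of data; return (payload, rest)."""
    colon = data.find(b":")
    if colon <= 0 or not data[:colon].isdigit():
        raise ValueError("missing or invalid length prefix")
    start = colon + 1
    end = start + int(data[:colon])
    # The payload is sliced by length, never scanned, so it needs no escaping.
    if data[end:end + 1] != b",":
        raise ValueError("truncated netstring or missing trailing comma")
    return data[start:end], data[end + 1:]
```

The price is the one the comment admits: a length prefix is easy for programs and unpleasant for humans typing prose, which is why it sits so awkwardly with markdown.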
Markdown assumes the user won’t do anything silly, and I’m fine with that. Rather, it’s the people enabling such behaviour who are annoying.
TIL about triple curlies! mind blown
> In fact, a code fence need not consist of exactly three backticks or tildes. Any number of backticks or tildes is allowed, as long as that number is at least three
Unfortunately, some markdown implementations don't handle this well. We were looking at using code-fence-like syntax in Rust, and while we were worried about whether people would know how to embed it in markdown code fences, bad implementations were the ultimate deal breaker. We switched to `---` instead, making basic cases look like YAML stream separators, which are used for frontmatter.
`---` is already used in Markdown for horizontal rules?
The problem here is that if you use ``` as a token in a non-markdown language, then it's going to be very hard to embed that code in a markdown code block. That problem doesn't happen with other syntax as it's already escaped by the code block. `---` inside a markdown code block will render as a literal `---`.
Yeap, along with `+++`, `**` and mixing, if I remember correctly.
I don't understand the logic of using a non-standard syntax because some non-standard implementations may not render correctly.
Actually, yes: now you know for a fact that none of the Markdown implementations will render it correctly.
So, I guess, they used `~~~` instead and it was an error in OP post.
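For reference on the horizontal-rule question, CommonMark's thematic-break rule is small enough to write as one regular expression: up to three spaces of indentation, then three or more of the same character, which must be -, _ or *, optionally separated by spaces or tabs. A minimal sketch (the names are made up); it checks a single line in isolation, whereas a real parser also treats a --- line directly under paragraph text as a setext heading underline, and leaves it alone inside a code block.

```python
import re

THEMATIC_BREAK = re.compile(
    r"^ {0,3}(?:(?:-[ \t]*){3,}|(?:_[ \t]*){3,}|(?:\*[ \t]*){3,})$"
)


def is_thematic_break(line: str) -> bool:
    """True if line alone would be a CommonMark thematic break (<hr>)."""
    return THEMATIC_BREAK.match(line) is not None
```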
#+BEGIN_SRC lolcode
blah
#+END_SRC
org-mode to the rescue ;p
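Org mode doesn't count characters either. As far as I know, its answer to nesting is comma-escaping: inside a src or example block, a line that starts with * or #+ (or an already-escaped ,* or ,#+) gets a leading comma, and Org strips one comma when it reads the block back, so an inner #+END_SRC can't close the outer block. A minimal Python sketch of that round trip, assuming that rule (the function names are made up, and this is not Org's own code):

```python
import re

# After optional indentation: "*" (headline) or "#+" (keyword / block line),
# possibly preceded by commas from earlier escaping.
NEEDS_ESCAPE = re.compile(r"^([ \t]*)(,*(?:\*|#\+))", re.MULTILINE)
ESCAPED = re.compile(r"^([ \t]*),(,*(?:\*|#\+))", re.MULTILINE)


def org_escape(code: str) -> str:
    """Add one comma in front of lines Org would otherwise interpret."""
    return NEEDS_ESCAPE.sub(r"\1,\2", code)


def org_unescape(code: str) -> str:
    """Remove one level of comma escaping."""
    return ESCAPED.sub(r"\1\2", code)


def src_block(lang: str, code: str) -> str:
    """Wrap code in an Org src block, escaping lines that could end it early."""
    return f"#+BEGIN_SRC {lang}\n{org_escape(code)}\n#+END_SRC"
```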