Cool Things I Learned From Reading The CommonMark Spec For Markdown
About a year ago, I finally enabled a limited set of Markdown syntax in my blog comments. This feature is powered by the Flexmark Java library, which converts Markdown into HTML that I then validate with the OWASP AntiSamy project. Ever since I got that all working, I've been enamored with the idea of switching my underlying publishing workflow over to Markdown instead of the XStandard WYSIWYG editor. As such, I sat down over the weekend and read the CommonMark Specification, which is the spec that Flexmark is trying to follow. In doing so, I learned some pretty cool stuff about Markdown; or, at least, about CommonMark Markdown.
View this code in my Flexmark-0.42.6-With-ColdFusion project on GitHub.
My primary exposure to Markdown comes in the form of "README.md" documents on GitHub, which means that I know the basic formatting abilities: creating titles, paragraphs, lists, and links. But, I've never really sat down and built a quality mental model for what Markdown can do. As such, the stuff I learned over the weekend may be common knowledge for some; but, for me, it was new and exciting stuff!
Instead of trying to clearly articulate all of the new concepts, I'm just going to put them right into a Markdown document. I've embedded the explanations within HTML comments - which, incidentally, is something that I had no idea I could do in Markdown:
<!--
LESSON: HTML comments can be embedded right in the markdown. This can be used to
interrupt other Markdown constructs. It could also be used to embed meta-data right
into the content to be consumed programmatically after rendering. Example:
-->
<!-- VIDEO: 23vid7f3 -->
<!--
LESSON: Style tags can be embedded right in the markdown.
-->
<style type="text/css">
body {
font-size: 16px ;
}
</style>
<!--
LESSON: Script tags can be embedded right in the markdown.
-->
<script type="text/javascript">
console.log( "Oh chickens!" );
</script>
<script type="text/app-data">
{
id: 12345,
name: "Thing"
}
</script>
<!--
LESSON: Code-fences can use more than 3-ticks to start and end a fenced-block. This
allows for embedded ticks to be used. The end just has to have AT LEAST as many ticks
as the start.
-->
````````txt
This has embedded ``` ticks.
````````
<!--
LESSON: Inline code-spans can use more than 1-tick to start and end a span. This
allows for embedded ticks to be used. The end just has to have the same number of ticks
as the start.
-->
This ```an embedded ` tick ``` right here.
<!--
LESSON: Pre tags can be embedded right in the markdown.
-->
<pre>
This markdown will **not be interpreted**.
</pre>
<!--
LESSON: Markdown within HTML Blocks will be interpreted (as markdown) if the inner-
content is separated by a blank line.
-->
<div data-id="container">

This markdown **will be interpreted**!

</div>
<!--
LESSON: Link references can contain optional Title attributes.
-->
[mylink]: https://www.bennadel.com "This is a groovy link title!"
Hey, why don't you [click me][mylink].
<!--
LESSON: Link references can be rendered by label alone. This renders the link label
as the link text.
-->
[BenNadel.com]: https://www.bennadel.com "A blog on things and stuff."
You should really check out [BenNadel.com].
<!--
LESSON: Link references work with images.
--
NOTE: I am also using the Flexmark Attributes Extension to assign WIDTH to the image.
This is not a native part of the CommonMark Specification.
-->
[myimage]: ./goose-duck.jpg "Isn't she great?!"
![Goose Duck][myimage]{width="100"}
<!--
LESSON: List items that are right next to each other will be embedded right in an LI.
However, list items that are separated by a blank line will be embedded in P tags.
-->
* I will be wrapped in a `p` inside an `li`.

* I will be wrapped in a `p` inside an `li`.
<!--
LESSON: HTML comments can interrupt two sibling lists.
-->
* In list one.
* In list one.
<!-- -->
* In list two.
* In list two.
<!--
LESSON: Partial-word emphasis works with **, but not with __.
-->
Holy**chickens**!, this is fan__freakin__tastic!
<!--
LESSON: Absolute links wrapped in < and > will get auto-linked.
-->
Check out <https://www.bennadel.com> for more info.
<!--
LESSON: Two trailing spaces on a line will create a hard break, <br/>.
-->
This will all
be on one
line.

But, this will  
be on three  
different lines.
<!--
LESSON: Anything that looks like HTML will be kept as-is. If it is inline, it will
be wrapped in `p` tags. If it goes across lines, it will be treated as a block element.
-->
Checkout <MyInlineElement></MyInlineElement>.
<!--
CAUTION: If you try to wrap an attribute onto another line, this gets treated as an inline
element. I am not sure if this is part of the spec; or, if this is just Flexmark.
-->
<MyBlockElement data-id="123" data-value="something">
</MyBlockElement>
<!--
Self-closing tags appear to be treated as block elements if they are defined by
themselves on a SINGLE LINE. Again, I am not sure if this is the spec or just how
Flexmark is implementing it.
-->
<MyBlockElement data-id="123" data-value="something" />
Now, if I load that Markdown content and parse it with the Flexmark 0.42.6 library, I get the following HTML output:
NOTE: I've added a few line-breaks just to separate the individual lessons.
<!--
LESSON: HTML comments can be embedded right in the markdown. This can be used to
interrupt other Markdown constructs. It could also be used to embed meta-data right
into the content to be consumed programmatically after rendering. Example:
-->
<!-- VIDEO: 23vid7f3 -->
<!--
LESSON: Style tags can be embedded right in the markdown.
-->
<style type="text/css">
body {
font-size: 16px ;
}
</style>
<!--
LESSON: Script tags can be embedded right in the markdown.
-->
<script type="text/javascript">
console.log( "Oh chickens!" );
</script>
<script type="text/app-data">
{
id: 12345,
name: "Thing"
}
</script>
<!--
LESSON: Code-fences can use more than 3-ticks to start and end a fenced-block. This
allows for embedded ticks to be used. The end just has to have AT LEAST as many ticks
as the start.
-->
<pre><code class="language-txt">This has embedded ``` ticks.
</code></pre>
<!--
LESSON: Inline code-spans can use more than 1-tick to start and end a span. This
allows for embedded ticks to be used. The end just has to have the same number of ticks
as the start.
-->
<p>This <code>an embedded ` tick</code> right here.</p>
<!--
LESSON: Pre tags can be embedded right in the markdown.
-->
<pre>
This markdown will **not be interpreted**.
</pre>
<!--
LESSON: Markdown within HTML Blocks will be interpreted (as markdown) if the inner-
content is separated by a blank line.
-->
<div data-id="container">
<p>This markdown <strong>will be interpreted</strong>!</p>
</div>
<!--
LESSON: Link references can contain optional Title attributes.
-->
<p>Hey, why don't you <a href="https://www.bennadel.com" title="This is a groovy link title!">click me</a>.</p>
<!--
LESSON: Link references can be rendered by label alone. This renders the link label
as the link text.
-->
<p>You should really check out <a href="https://www.bennadel.com" title="A blog on things and stuff.">BenNadel.com</a>.</p>
<!--
LESSON: Link references work with images.
--
NOTE: I am also using the Flexmark Attributes Extension to assign WIDTH to the image.
This is not a native part of the CommonMark Specification.
-->
<p><img src="./goose-duck.jpg" alt="Goose Duck" title="Isn't she great?!" width="100" /></p>
<!--
LESSON: List items that are right next to each other will be embedded right in an LI.
However, list items that are separated by a blank line will be embedded in P tags.
-->
<ul>
<li>
<p>I will be wrapped in a <code>p</code> inside an <code>li</code>.</p>
</li>
<li>
<p>I will be wrapped in a <code>p</code> inside an <code>li</code>.</p>
</li>
</ul>
<!--
LESSON: HTML comments can interrupt two sibling lists.
-->
<ul>
<li>In list one.</li>
<li>In list one.</li>
</ul>
<!-- -->
<ul>
<li>In list two.</li>
<li>In list two.</li>
</ul>
<!--
LESSON: Partial-word emphasis works with **, but not with __.
-->
<p>Holy<strong>chickens</strong>!, this is fan__freakin__tastic!</p>
<!--
LESSON: Absolute links wrapped in < and > will get auto-linked.
-->
<p>Check out <a href="https://www.bennadel.com">https://www.bennadel.com</a> for more info.</p>
<!--
LESSON: Two trailing spaces on a line will create a hard break, <br/>.
-->
<p>This will all
be on one
line.</p>
<p>But, this will<br />
be on three<br />
different lines.</p>
<!--
LESSON: Anything that looks like HTML will be kept as-is. If it is inline, it will
be wrapped in `p` tags. If it goes across lines, it will be treated as a block element.
-->
<p>Checkout <MyInlineElement></MyInlineElement>.</p>
<!--
CAUTION: If you try to wrap an attribute onto another line, this gets treated as an inline
element. I am not sure if this is part of the spec; or, if this is just Flexmark.
-->
<MyBlockElement data-id="123" data-value="something">
</MyBlockElement>
<!--
Self-closing tags appear to be treated as block elements if they are defined by
themselves on a SINGLE LINE. Again, I am not sure if this is the spec or just how
Flexmark is implementing it.
-->
<MyBlockElement data-id="123" data-value="something" />
This is really cool stuff! When I first started entertaining the idea of updating my home-grown blogging platform to use Markdown, I was worried that Markdown wouldn't be sufficiently expressive. But, I'm already seeing that it provides me with just about every hook that I need. And, the most glaring omission - being able to add CSS classes and HTML attributes - is something that I can easily enable with the Attributes Extension in Flexmark.
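As a rough illustration of the meta-data hook from the first lesson, the following Java sketch (Java because Flexmark itself is a Java library) pulls "KEY: value" HTML comments back out of the rendered HTML. It is not part of the demo project: the MetaComments class, its extract() method, and the assumption that keys are upper-case tokens followed by a single-line value are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Flexmark or the demo project): collects
// "<!-- KEY: value -->" comments from an HTML string.
final class MetaComments {

    // Assumption: KEY is an upper-case token and the value does not span lines
    // (the "." in the pattern does not match line breaks).
    private static final Pattern META =
        Pattern.compile("<!--\\s*([A-Z][A-Z0-9_]*):\\s*(.*?)\\s*-->");

    // Returns the key/value pairs in document order. If a key appears more
    // than once, the last value wins.
    static Map<String, String> extract(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher matcher = META.matcher(html);
        while (matcher.find()) {
            meta.put(matcher.group(1), matcher.group(2));
        }
        return meta;
    }

    public static void main(String[] args) {
        String html = "<p>Intro</p>\n<!-- VIDEO: 23vid7f3 -->\n<p>Outro</p>";
        System.out.println(extract(html)); // prints {VIDEO=23vid7f3}
    }
}
```

Note that a one-line "LESSON: ..." comment would also match this pattern, so a real implementation would probably restrict the set of accepted keys.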
Migrating from my WYSIWYG editor to Markdown seems completely feasible!
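One practical wrinkle in such a migration is that existing code samples may already contain backticks. Going by the code-fence and code-span lessons above, a delimiter that is longer than any run of backticks inside the content is a safe choice. Here is a minimal Java sketch of that calculation; the BacktickFence class and its method names are hypothetical and are not part of Flexmark.

```java
// Hypothetical helper: picks backtick delimiters that cannot be closed
// early by backticks inside the content being wrapped.
final class BacktickFence {

    // Length of the longest run of consecutive backticks in the text.
    static int longestBacktickRun(String text) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '`') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    // Builds a string of n backticks.
    static String ticks(int n) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < n; i++) {
            builder.append('`');
        }
        return builder.toString();
    }

    // Fence for a fenced code block: at least 3 ticks, and longer than any
    // run of ticks inside the code.
    static String fenceFor(String code) {
        return ticks(Math.max(3, longestBacktickRun(code) + 1));
    }

    // Delimiter for an inline code span: longer than any run of ticks inside.
    static String spanDelimiterFor(String code) {
        return ticks(longestBacktickRun(code) + 1);
    }

    public static void main(String[] args) {
        String block = "This has embedded ``` ticks.";
        String fence = fenceFor(block); // four ticks
        System.out.println(fence + "txt\n" + block + "\n" + fence);

        String span = "an embedded ` tick";
        String delimiter = spanDelimiterFor(span); // two ticks
        // The padding spaces keep the content away from the delimiters.
        System.out.println("This " + delimiter + " " + span + " " + delimiter + " right here.");
    }
}
```

This is deliberately conservative: it always goes one tick longer than the longest embedded run, even in cases where a shorter delimiter would also work.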
For completeness, I'll share the ColdFusion code that is being used to power this demo. First, I have to set up the JAR file mappings and create the JavaLoader for my ColdFusion application:
component
    output = false
    hint = "I provide the application settings and event handlers."
    {

    // Define the application.
    this.name = hash( getCurrentTemplatePath() );
    this.applicationTimeout = createTimeSpan( 0, 0, 10, 0 );
    this.sessionManagement = false;

    // Setup the application mappings.
    this.directory = getDirectoryFromPath( getCurrentTemplatePath() );
    this.mappings[ "/" ] = this.directory;
    this.mappings[ "/flexmark" ] = ( this.directory & "vendor/flexmark-0.42.6/" );
    this.mappings[ "/javaloader" ] = ( this.directory & "vendor/javaloader-1.2/javaloader/" );
    this.mappings[ "/javaloaderfactory" ] = ( this.directory & "vendor/javaloaderfactory/" );

    // ---
    // PUBLIC METHODS.
    // ---

    /**
    * I initialize the application.
    *
    * @output false
    */
    public boolean function onApplicationStart() {

        // In order to prevent memory leaks, we're going to use the JavaLoaderFactory to
        // instantiate our JavaLoader. This will keep the instance cached in the Server
        // scope so that it doesn't have to continually re-create it as we test our
        // application configuration.
        application.javaLoaderFactory = new javaloaderfactory.JavaLoaderFactory();

        // Create a JavaLoader that can access the Flexmark 0.42.6 JAR files.
        // --
        // NOTE: This list of JAR files contains the CORE Flexmark functionality plus
        // the Attributes extension. Flexmark is configured such that each extension is
        // packaged as a separate, optional set of JAR files.
        application.flexmarkJavaLoader = application.javaLoaderFactory.getJavaLoader([
            expandPath( "/flexmark/flexmark-0.42.6.jar" ),
            expandPath( "/flexmark/flexmark-ext-attributes-0.42.6.jar" ),
            expandPath( "/flexmark/flexmark-formatter-0.42.6.jar" ),
            expandPath( "/flexmark/flexmark-util-0.42.6.jar" )
        ]);

        // Indicate that the application has been initialized successfully.
        return( true );

    }

}
Then, within my index-file, I create an instance of the Flexmark core library, read in the Markdown file, parse it, and then output the HTML to the response:
<cfscript>

    // Read-in our markdown file.
    markdown = fileRead( expandPath( "./content.md" ) );

    // Create some of our Class definitions. We need this in order to access some static
    // methods and properties.
    AttributesExtensionClass = application.flexmarkJavaLoader.create( "com.vladsch.flexmark.ext.attributes.AttributesExtension" );
    HtmlRendererClass = application.flexmarkJavaLoader.create( "com.vladsch.flexmark.html.HtmlRenderer" );
    ParserClass = application.flexmarkJavaLoader.create( "com.vladsch.flexmark.parser.Parser" );

    // Create our options instance - this dataset is used to configure both the parser
    // and the renderer.
    options = application.flexmarkJavaLoader.create( "com.vladsch.flexmark.util.options.MutableDataSet" ).init();

    // Define the extensions we're going to use. In this case, the only extension that
    // I want to add is the Attributes Extension. This allows me to use {...} postfix
    // syntax in order to append attributes to the preceding element.
    // --
    // NOTE: If you want to add more extensions, you will need to download more JAR files
    // and add them to the JavaLoader class paths.
    options.set(
        ParserClass.EXTENSIONS,
        [
            AttributesExtensionClass.create()
        ]
    );

    // Create our parser and renderer - both using the options.
    // --
    // NOTE: In the demo, I'm re-creating these on every page request. However, in
    // production I would probably cache both of these inside of some Abstraction
    // (such as MarkdownParser.cfc) which would, in turn, get cached inside the
    // application scope.
    parser = ParserClass.builder( options ).build();
    renderer = HtmlRendererClass.builder( options ).build();

    // Parse the markdown into an AST (Abstract Syntax Tree) document node.
    document = parser.parse( javaCast( "string", markdown ) );

    // Render the AST (Abstract Syntax Tree) document into an HTML string.
    html = renderer.render( document );

</cfscript>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>
Using Flexmark 0.42.6 To Parse Markdown Into HTML in ColdFusion
</title>
</head>
<body>
<h1>
Using Flexmark 0.42.6 To Parse Markdown Into HTML in ColdFusion
</h1>
<h2>
Rendered Output:
</h2>
<hr />
<cfoutput>#html#</cfoutput>
<hr />
<h2>
Rendered Markup:
</h2>
<pre class="language-html"
><code class="language-html"
><cfoutput>#encodeForHtml( html )#</cfoutput
></code
></pre>
<!-- For our fenced code-block syntax highlighting. -->
<link rel="stylesheet" type="text/css" href="./vendor/prism-1.14.0/prism.css" />
<script type="text/javascript" src="./vendor/prism-1.14.0/prism.js"></script>
</body>
</html>
This post was primarily a note-to-self; but, hopefully you learned something new and exciting about Markdown. Or, at least, about the CommonMark Markdown specification. These features won't be available everywhere. And, some contexts, like GitHub, use a different set of Markdown rules altogether.
Reader Comments
@All,
I had to rename the content.md to be content.md.txt. It seems that if you render a GitHub gist for a Markdown file, the gist actually tries to render the interpreted content, not the raw content.
@All,
As a follow-up, I wanted to noodle on some ways to embed more "widget" type content into the Markdown. Flexmark does provide a way to extend the Parser / Renderer; but, that's outside my experience level and requires writing some hefty Java classes. As such, I am falling-back to using some RegEx in a post-processing step:
www.bennadel.com/blog/3616-considering-ways-to-embed-widgets-in-my-markdown-using-flexmark-0-42-6-and-coldfusion.htm
This way, I can convert something like:
... into something like:
It's not a perfectly elegant solution; but, I think it will give me what I need.
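For readers who want a feel for what a regex post-processing step of this kind could look like, here is a minimal Java sketch. It is not the code from the linked post: it reuses the VIDEO comment from the demo Markdown as the placeholder, and the WidgetPostProcessor class, the video-widget class name, and the data-video-id attribute are all made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical post-processor: swaps "<!-- VIDEO: id -->" placeholders in
// already-rendered HTML for a widget container element.
final class WidgetPostProcessor {

    // Assumption: video ids are plain alphanumeric tokens.
    private static final Pattern VIDEO =
        Pattern.compile("<!--\\s*VIDEO:\\s*([A-Za-z0-9]+)\\s*-->");

    // Replaces every placeholder; "$1" refers to the captured video id.
    static String expandVideos(String html) {
        return VIDEO.matcher(html).replaceAll(
            "<div class=\"video-widget\" data-video-id=\"$1\"></div>"
        );
    }

    public static void main(String[] args) {
        String rendered = "<p>Before</p>\n<!-- VIDEO: 23vid7f3 -->\n<p>After</p>";
        System.out.println(expandVideos(rendered));
    }
}
```

Because the replacement runs on the HTML string after rendering, it needs no custom Flexmark parser or renderer classes, at the cost of being blind to context (for example, it would also rewrite a placeholder that appears inside a script tag).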