Finding Shallow HTML Comment Nodes In The DOM Using TreeWalker

By Ben Nadel

Published 2014-04-14 in JavaScript / DHTML

The other day, I started playing around with the TreeWalker API as a way to iterate over HTML comment nodes contained within a given DOM (Document Object Model) node. When I first started tinkering, it didn't look like there was any way to perform a "shallow" search (i.e., only look at child nodes). However, I now realize that if I widen my net of node types, I can perform a shallow search for comment nodes.

In my first approach, I was only looking at comment nodes. This made it difficult to constrain the search to the immediate children of the root node. Sure, I could use the filter() method to reject any comment node whose parent was not the root node; but, that would still require iterating over a deep-search of the comments, which felt sub-optimal.
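For illustration, that first approach might look roughly like the sketch below. The helper name createDirectChildFilter is hypothetical, and the numeric constants are assumed stand-ins that mirror the values of NodeFilter.FILTER_ACCEPT and NodeFilter.FILTER_SKIP so that the snippet can also run outside of a browser.

```javascript
// Numeric values of NodeFilter.FILTER_ACCEPT and NodeFilter.FILTER_SKIP.
var FILTER_ACCEPT = 1;
var FILTER_SKIP = 3;

// Hypothetical helper: I build a filter that only accepts nodes whose parent
// is the given root node. Every other node is skipped.
function createDirectChildFilter( rootNode ) {

	function filter( node ) {

		return( ( node.parentNode === rootNode ) ? FILTER_ACCEPT : FILTER_SKIP );

	}

	// Same cross-browser workaround that the demo later in this post uses.
	filter.acceptNode = filter;

	return( filter );

}
```

In a browser, the returned filter would be handed to document.createTreeWalker() along with NodeFilter.SHOW_COMMENT. The walker would still have to visit every comment in the sub-tree and call the filter on each one, which is exactly the part that felt sub-optimal.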

The key to a shallow search for comment nodes does require filtering; but, it also requires a looser search. Instead of just searching for comment nodes, I have to search for both comment nodes and element nodes. Then, I need to use the filter method to skip over any element nodes (and thereby prevent the TreeWalker from following deep tree branches).

As you're filtering nodes in the TreeWalker, there are actually two different forms of "skip":

FILTER_SKIP - Value to be returned by NodeFilter.acceptNode() for nodes to be skipped by the NodeIterator or TreeWalker object. The children of skipped nodes are still considered. This is treated as "skip this node but not its children".
FILTER_REJECT - Value to be returned by the NodeFilter.acceptNode() method when a node should be rejected. The children of rejected nodes are not visited by the NodeIterator or TreeWalker object; this value is treated as "skip this node and all its children".

Notice that "Reject" will prevent the TreeWalker from going down into a given element node. We can use this to our benefit; if we start searching for comment nodes and element nodes, but reject all elements, it will keep the search shallow and will only find comments.
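To make the difference between "skip" and "reject" concrete, here is a small, simplified model of a filtered walk. It is not how the browser implements the TreeWalker; the walk() function and the plain-object "nodes" are assumptions made purely for illustration, and the constants mirror the numeric values of the NodeFilter constants.

```javascript
// Numeric values of the NodeFilter constants.
var FILTER_ACCEPT = 1;
var FILTER_REJECT = 2;
var FILTER_SKIP = 3;

// Hypothetical, simplified model of a filtered depth-first walk. I only
// mirror how the three return values affect which nodes get visited.
function walk( node, filter, results ) {

	var children = ( node.childNodes || [] );

	for ( var i = 0 ; i < children.length ; i++ ) {

		var child = children[ i ];
		var verdict = filter( child );

		if ( verdict === FILTER_ACCEPT ) {

			results.push( child );

		}

		// ACCEPT and SKIP both descend into the children; REJECT does not.
		if ( verdict !== FILTER_REJECT ) {

			walk( child, filter, results );

		}

	}

	return( results );

}

// Plain objects standing in for DOM nodes (1 = element, 8 = comment).
var tree = {
	nodeType: 1,
	childNodes: [
		{ nodeType: 8, nodeValue: "Comment 1" },
		{ nodeType: 1, childNodes: [ { nodeType: 8, nodeValue: "Comment 2" } ] },
		{ nodeType: 8, nodeValue: "Comment 4" }
	]
};

// Deep search: skip elements, but still look inside of them.
var deepComments = walk( tree, function( node ) {
	return( ( node.nodeType === 8 ) ? FILTER_ACCEPT : FILTER_SKIP );
}, [] );

// Shallow search: reject elements, which prunes their entire sub-tree.
var shallowComments = walk( tree, function( node ) {
	return( ( node.nodeType === 8 ) ? FILTER_ACCEPT : FILTER_REJECT );
}, [] );
```

In this model, the deep search collects all three comments, while the shallow search only collects the two comments that sit directly under the root.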

To see this in action, take a look at the following code:

<!doctype html>
<html>
<head>
	<meta charset="utf-8" />

	<title>
		Finding Shallow Comment Nodes In The DOM Using TreeWalker
	</title>
</head>
<body>

	<h1>
		Finding Shallow Comment Nodes In The DOM Using TreeWalker
	</h1>

	<!-- Comment 1: In the Body. -->

	<div>

		<!-- Comment 2: In a nested Div. -->

		<div>

			<!-- Comment 3: In a double-nested Div. -->

		</div>

	</div>

	<!-- Comment 4: Back up in that Body. -->


	<script type="text/javascript" src="../../vendor/jquery/jquery-2.0.3.min.js"></script>
	<script type="text/javascript">

		if ( ! document.createTreeWalker ) {

			throw( new Error( "Browser does not support createTreeWalker()." ) );

		}


		// I find the comment nodes in the given root node.
		function findComments( rootNode, isDeep ) {

			// I filter the nodes as they are encountered by the TreeWalker.
			function filter( node ) {

				// Always accept comments.
				if ( node.nodeType === 8 ) {

					return( NodeFilter.FILTER_ACCEPT );

				}

				// If the search is Deep, then simply skip this Element node.
				if ( isDeep ) {

					return( NodeFilter.FILTER_SKIP );

				}

				// If the search is Shallow, then reject this element node. This will
				// skip the current element node AND the entire sub-tree contained
				// within this element node.
				return( NodeFilter.FILTER_REJECT );

			}

			// IE and other browsers differ in how the filter method is passed into the
			// TreeWalker. Mozilla takes an object with an "acceptNode" key. IE takes the
			// filter method directly. To work around this difference, we will define the
			// acceptNode function as a property of itself.
			filter.acceptNode = filter;

			// When creating the TreeWalker, we want to look at both the comment nodes
			// and the element nodes. Even for a shallow search, we need the element
			// nodes in order to provide a way to skip any nested DOM branches.
			// --
			// NOTE: The last argument is a deprecated, optional parameter. However,
			// in IE, the argument is not optional and therefore must be included.
			var treeWalker = document.createTreeWalker(
				rootNode,
				( NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_ELEMENT ),
				filter,
				false
			);

			var comments = [];

			// Collect the comments.
			while ( treeWalker.nextNode() ) {

				comments.push( treeWalker.currentNode );

			}

			return( comments );

		}


		// -------------------------------------------------- //
		// -------------------------------------------------- //


		// Find all the comments that are children of the Body tag.
		var comments = findComments( document.body, false );

		$( comments ).each(
			function() {

				$( "<p></p>" )
					.text( this.nodeValue )
					.insertAfter( this )
				;

			}
		);

	</script>

</body>
</html>

As you can see, we configure the TreeWalker to find both comment nodes and element nodes. We always accept comment nodes. For element nodes, a deep search "skips" them (so their children are still visited), while a shallow search full-on "rejects" them to prevent a deep walk of the DOM.

So, performing a shallow search for comment nodes is possible with the TreeWalker. But, I'm not sure I would ever use it for shallow searching. There's a lot of cruft here, a lot of logic, and the overhead of calling a function on each encountered node. And, when compared to simply grabbing the child nodes and plucking the comments (what jQuery.fn.comments() does), I'm not sure the TreeWalker represents a "win" in this context.
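For comparison, the "grab the child nodes and pluck the comments" alternative could be sketched as follows. The function name findShallowComments is hypothetical, and this is not necessarily how jQuery.fn.comments() is written; it only assumes that the root node exposes a childNodes collection.

```javascript
// Hypothetical helper: I return the comment nodes that are direct children
// of the given node, without using a TreeWalker.
function findShallowComments( rootNode ) {

	// The childNodes collection is array-like, so we can borrow the
	// Array filter method to pluck out the comment nodes (nodeType 8).
	return Array.prototype.filter.call(
		rootNode.childNodes,
		function( node ) {

			return( node.nodeType === 8 );

		}
	);

}
```

In the demo above, calling findShallowComments( document.body ) should find the same two body-level comments as the shallow TreeWalker search, with no filter callback and no pruning logic.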

Short link: https://bennadel.com/2608

Reader Comments

Edward J Beckett Dec 13, 2014 at 4:08 PM

@Ben..

Now this is some great stuff .... Pure JS DOM traversal feels a bit cleaner than using a library... (a lot more boilerplate... but cool to do ...)

Edward J Beckett Dec 13, 2014 at 7:53 PM

@Ben...

Moreover... TreeWalker is very fast... Here's a jsperf comparing jQuery's remove to a native implementation... The TreeWalker implementation is much faster...

http://jsperf.com/treewalker-remove
