jSoup Error: Index Out Of Bounds For Length
Over on my Feature Flags Book site, I'm starting to move some of the content behind a pay-wall; and, to do this, I'm using jSoup to replace multiple content paragraphs with a single purchase notice paragraph within designated chapters. However, in my first approach to this algorithm, I was getting the following jSoup error:
Index 1 out of bounds for length 0
The error isn't terribly helpful; but, I believe what's happening here is that when I remove an element from the jSoup DOM (Document Object Model) using an .empty() call, jSoup is not breaking the parent-child relationship to the removed elements. This then causes an issue when I go to re-append the removed elements back into the same parent.
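To make that hypothesis concrete, here is a minimal sketch in plain Java (the language jSoup is written in) of how a stale parent pointer could produce this kind of error. It is a toy model and not jSoup's source: ToyNode, siblingIndex, and both method bodies are invented for illustration, and the leading whitespace node is only an assumption about why the index in the error might be 1.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleParentDemo {

    // Toy stand-in for a DOM node; NOT jSoup's actual implementation.
    static class ToyNode {
        final String name;
        ToyNode parent;     // back-pointer to the owning node
        int siblingIndex;   // cached position within parent.children
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String name) {
            this.name = name;
        }

        // Detach the child from its old parent (if any), then attach it here.
        ToyNode appendChild(ToyNode child) {
            if (child.parent != null) {
                // Trusts the cached index; throws if the old parent's list
                // no longer contains an entry at that position.
                child.parent.children.remove(child.siblingIndex);
            }
            child.parent = this;
            child.siblingIndex = children.size();
            children.add(child);
            return this;
        }

        // Hypothesized flaw: clears the list but leaves every child's
        // parent pointer (and cached index) untouched.
        ToyNode empty() {
            children.clear();
            return this;
        }
    }

    public static void main(String[] args) {
        ToyNode body = new ToyNode("body");
        ToyNode whitespace = new ToyNode("#text"); // assumed leading whitespace
        ToyNode paragraph = new ToyNode("p");
        body.appendChild(whitespace).appendChild(paragraph); // paragraph at index 1

        try {
            body.empty().appendChild(paragraph);
        } catch (IndexOutOfBoundsException error) {
            System.out.println("Re-append failed: " + error.getMessage());
        }
    }
}
```

In this model, the paragraph still believes it lives at index 1 of the body, so the re-append tries to remove index 1 from a list that is now empty.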
I can reproduce this error with a simple jSoup demo using this HTML document:
<body>
<p>jSoup + ColdFusion = Noice!</p>
</body>
To reproduce the error with ColdFusion (Lucee CFML), I'm going to .empty() the body and then re-append the single p element:
<cfscript>
	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;
	paragraph = body.firstElementChild();
	// Remove all the children from the BODY and then try to re-add the paragraph.
	body
		.empty()
		.appendChild( paragraph )
	;
	// Output resultant HTML to the page.
	echo( body.outerHtml() );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {
		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];
		return( createObject( "java", className, jarPaths ) );
	}
</cfscript>
And, when we run this ColdFusion code, we get the following error:
Index 1 out of bounds for length 0
For anyone Googling to get here, this is the stacktrace that I get:
lucee.runtime.exp.NativeException: Index 1 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:372)
at java.base/java.util.ArrayList.remove(ArrayList.java:536)
at org.jsoup.helper.ChangeNotifyingArrayList.remove(ChangeNotifyingArrayList.java:37)
at org.jsoup.nodes.Node.removeChild(Node.java:504)
at org.jsoup.nodes.Node.setParentNode(Node.java:482)
at org.jsoup.nodes.Node.reparentChild(Node.java:563)
at org.jsoup.nodes.Element.appendChild(Element.java:577)
To fix this error, we need to call .remove() on the p element before we try to re-append it to the body:
<cfscript>
	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;
	paragraph = body.firstElementChild();
	// In order to re-append the paragraph back into the document, we have to first BREAK
	// THE PARENT RELATIONSHIP to the body. We can do that by calling remove() on the
	// paragraph itself.
	paragraph.remove();
	// Remove all the children from the BODY and then try to re-add the paragraph.
	body
		.empty() // Remove any remaining non-element nodes (ex, comments).
		.appendChild( paragraph )
	;
	// Output resultant HTML to the page.
	echo( body.outerHtml() );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {
		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];
		return( createObject( "java", className, jarPaths ) );
	}
</cfscript>
The only difference in this version of the code is that I'm calling paragraph.remove() before adding the node back into the DOM. Whatever this is doing behind the scenes, it is properly breaking the parent-child relationship in a way that calling .empty() does not.
ASIDE: Some jSoup methods, like .children(), return a list of Element nodes called Elements. This list has its own .remove() method that will call .remove() on all of the nodes in the collection.
I don't know enough about jSoup, or the intention of these methods, to call this a "bug"; but, I will say that it seems unexpected to me. In fact, I would expect an .empty() method to be little more than a short-hand implementation for looping over all the child-nodes and calling .remove() on them in turn.
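That expectation can be sketched in plain Java with a toy node class. This is only an illustration of the idea, not jSoup's implementation: ToyNode and all of its methods are hypothetical names, and the sketch assumes that remove() clears the pointer on both sides of the relationship.

```java
import java.util.ArrayList;
import java.util.List;

public class DetachingEmptyDemo {

    // Toy stand-in for a DOM node; NOT jSoup's actual implementation.
    static class ToyNode {
        ToyNode parent;
        final List<ToyNode> children = new ArrayList<>();

        // Break the parent-child relationship in both directions.
        void remove() {
            if (parent != null) {
                parent.children.remove(this);
                parent = null;
            }
        }

        ToyNode appendChild(ToyNode child) {
            child.remove(); // does nothing when the child is already detached
            child.parent = this;
            children.add(child);
            return this;
        }

        // empty() as short-hand for "remove every child". Iterate over a copy,
        // because remove() mutates the live children list.
        ToyNode empty() {
            for (ToyNode child : new ArrayList<>(children)) {
                child.remove();
            }
            return this;
        }
    }

    public static void main(String[] args) {
        ToyNode body = new ToyNode();
        ToyNode paragraph = new ToyNode();
        body.appendChild(paragraph);

        body.empty().appendChild(paragraph); // no exception in this model
        System.out.println(body.children.size());     // prints 1
        System.out.println(paragraph.parent == body); // prints true
    }
}
```

Because each child is detached through remove(), nothing is left pointing at the emptied parent, and the re-append behaves like a first-time append.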
Reader Comments
Thanks Ben, good catch! I have fixed this in jsoup and it'll be in the next release (1.16.2).
See bug #2013.
Please do feel free to raise issues directly on the jsoup tracker -- whether it's a hardline "bug" or just a rough edge, am always happy for feedback.
@Jonathan,
Wow, thanks for knocking that out! 🔥 As I was looking at the stacktrace, I saw a number of core Java calls, so I wasn't sure if this was something in jSoup itself, or something in the way Java's ArrayList worked. Glad to see it was only a simple for-loop change on your end.
jSoup is awesome! I'm using it more and more these days. 💪
Hi Ben
Just thinking off the top of my head, so this could be a completely idiotic suggestion, but couldn't you just create a deep copy of that object, like:
Example:
@Charles,
To be honest, I don't know how duplicate() plays with Java objects. We're consuming this stuff in ColdFusion; but, the jSoup library is ultimately a Java library; and, I'm not sure how "deep" the "duplicate" logic will run. Meaning, if the issue we have here is with parent-child pointers being left in place, it's very possible that duplicate() will just copy-over the same pointers into the new structure.
Now, that said, it does look like jSoup has a deep clone() method. So, it's very possible that this does exactly what you are suggesting it would do. I'd be curious to see if this would have an effect - I'm assuming it would.
All good thoughts!
Great article, Ben! Your detailed exploration of the jSoup error and the fix is incredibly helpful. It's not uncommon to encounter such issues while working with libraries like jSoup, and your solution will undoubtedly save others a lot of troubleshooting time. It's also fantastic to see how responsive the jSoup team is to address issues promptly. Keep up the great work, and thanks for sharing your insights and solutions with the community! 👍🔥
A fix for this exact issue (1.16.2) was released last night.
https://jsoup.org/news/release-1.16.2
Your blog is also listed in the notes:
https://github.com/jhy/jsoup/issues/2013