Ask Ben: Finding XML Nodes That Have Children With The Given Case-Insensitive Phrase
Posted February 11, 2009 at 10:01 AM
Okay, so how about this one. (BTW I love that I found this site and can ask all my stupid questions). I am bringing in an RSS feed in XML and parsing it. Now I want to pull only the articles that pertain to some keyword. Like, oh, say, COLDFUSION. .... What I want to do is pull only the articles with the search term in the title. I can do this of course by looping over the XML, but is it possible with XPATH? I'm betting it is, but I have just started into XPATH, XSLT, XSQL for Oracle. Does "IN" or "CONTAINS" work?
Yes, XPath does have a contains() function and it is, in fact, the way we are going to find your RSS feed items (at least initially). First, though, let's build a test XML Feed structure:
<!--- Define the XML feed. --->
<cfxml variable="xmlFeed">

    <items>
        <item>
            <title>I Love ColdFusion</title>
            <description>ColdFusion is amazing!</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>I Want To Swim In A Pudding Bath</title>
            <description>Author talks about why it would be awesome to swim around in a bathtub full of pudding.</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>I Think ColdFusion Knocked Up My Daughter</title>
            <description>Author described a conspiracy theory in which he thinks his ColdFusion application server impregnated his daughter in an attempt to spawn a race of super humans with amazing back-end processing!</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>Christina Cox Is A Hottie</title>
            <description>Author talks about actress Christina Cox and what makes her such a hottie.</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>COLDFusion Is So Hot!</title>
            <description>Author describes what makes ColdFusion such a hot technology.</description>
            <link>http://www.bennadel.com</link>
        </item>
    </items>

</cfxml>
As you can see here, some of the Title tags contain "ColdFusion", some of them do not. Now, we don't want to find the Title tag, right? What we want to do is find the Item node that has the child node, Title, whose text value contains the phrase ColdFusion. To do this, we can leverage the power of XPath predicates (statements that must evaluate to true for a node to be returned in an XmlSearch() result set):
//item[ contains( title/text() , 'ColdFusion' ) ]
Here, the "//item" is telling us to get all the item nodes anywhere within the document. Then our conditional search predicate:
[ contains( title/text() , 'ColdFusion' ) ]
... requires that the given node being examined (item) must have a title child tag whose text() value contains the phrase "ColdFusion". Fairly straightforward, right? Let's put this into action:
<!---
    Get all ITEM nodes that have a Title child whose text
    value (text()) contains the text "ColdFusion".
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( title/text() , 'ColdFusion' ) ]"
    ) />

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    <!--- CFOutput is needed for the expression to be evaluated. --->
    <cfoutput>#xmlItemNode.Title.XmlText#<br /></cfoutput>

</cfloop>
When we run this code, we get the following output:
I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
It sort of worked - it did find two correct items, but it missed this one:
COLDFusion Is So Hot!
The problem here is that XML and XPath, unlike ColdFusion itself, are very much case-sensitive. Whereas in ColdFusion, "ColdFusion" is equal to "COLDFusion", XPath and XmlSearch() see these as two distinct values.
So, what can we do about this? Well, if you look at the library of XPath functions, you will see that it does have methods for converting values to upper or lower case:
- lower-case()
- upper-case()
This would be great, but the problem you will quickly find if you try to use them is that these functions are part of XPath 2.0 and are not supported by ColdFusion 8's XPath / XmlSearch() engine. So, what can we do if we want to start performing case-insensitive searches? I don't think there's any one correct answer for this, so I'll just share the first thing that popped into my mind.
What we can do is create a lowercase version of the title text and store it back into the XML document in a way that 1) doesn't ruin the content for further use and 2) can be searched on using XPath and XmlSearch(). To do this, what I'm going to do is loop over the title tags and store the lowercase title as an attribute back into the title tag itself. Then, once that is done, I am going to perform the XPath search again using the title tag's "lcase" attribute rather than the XML Text value:
<!--- Gather all of the title nodes. --->
<cfset arrTitleNodes = XmlSearch(
    xmlFeed,
    "//item/title"
    ) />

<!---
    Loop over each title and store a lowercase attribute of
    its value that can be searched on in a case-insensitive
    manner.
--->
<cfloop
    index="xmlTitleNode"
    array="#arrTitleNodes#">

    <!--- Store lowercase text into attribute. --->
    <cfset xmlTitleNode.XmlAttributes[ "lcase" ] = LCase(
        XmlFormat( xmlTitleNode.XmlText )
        ) />

</cfloop>


<!---
    Get all ITEM nodes that have a Title child whose LCASE
    attribute contains the lowercase "coldfusion" value.
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( title/@lcase, 'coldfusion' ) ]"
    ) />

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    <!--- CFOutput is needed for the expression to be evaluated. --->
    <cfoutput>#xmlItemNode.Title.XmlText#<br /></cfoutput>

</cfloop>
Notice that this time, we are searching for "coldfusion," not "ColdFusion." There's a little bit more overhead here, but now, when we run this code, we get the following output:
I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!
With the aid of this lowercase attribute, we are successfully finding all case-versions of ColdFusion.
Of course, if we are going to loop over the Title tags, we might as well just perform the text search using ColdFusion and grab the appropriate nodes in the first pass. In the following code, as we loop over the Title tags, we are going to perform a case-insensitive ColdFusion text search. If the title has the right text, we are going to grab its parent node, the target Item node, and add it to our array of matching nodes:
<!--- Gather all of the title nodes. --->
<cfset arrTitleNodes = XmlSearch(
    xmlFeed,
    "//item/title"
    ) />

<!--- Create an array of item nodes. --->
<cfset arrItemNodes = [] />


<!---
    Loop over each title and check to see if the text contains
    the phrase ColdFusion - since we are checking in ColdFusion,
    we don't have to worry about case.
--->
<cfloop
    index="xmlTitleNode"
    array="#arrTitleNodes#">

    <!--- Check for phrase. --->
    <cfif FindNoCase( "ColdFusion", xmlTitleNode.XmlText )>

        <!--- Add parent node (Item) to array. --->
        <cfset ArrayAppend(
            arrItemNodes,
            xmlTitleNode.XmlParent
            ) />

    </cfif>

</cfloop>


<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    <!--- CFOutput is needed for the expression to be evaluated. --->
    <cfoutput>#xmlItemNode.Title.XmlText#<br /></cfoutput>

</cfloop>
When we run the code this time, we get the following output:
I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!
Again, we gather all of the appropriate matches for "ColdFusion" without having to do any additional XPath / XmlSearch() calls.
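If you need this in more than one place, the loop above could be wrapped in a small user-defined function so that the phrase is not hard-coded. This is only a rough sketch: the function name getItemsByTitlePhrase() and its argument names are made up for illustration, and it assumes the same item / title structure as the test feed.

```cfml
<!---
    Hypothetical helper (the name and arguments are assumptions,
    not part of ColdFusion): returns the ITEM nodes whose Title
    text contains the given phrase, ignoring case.
--->
<cffunction
    name="getItemsByTitlePhrase"
    access="public"
    returntype="array"
    output="false">

    <cfargument name="xmlDoc" type="any" required="true" />
    <cfargument name="phrase" type="string" required="true" />

    <!--- Local variables (declared up front for CF8). --->
    <cfset var arrMatches = ArrayNew( 1 ) />
    <cfset var arrTitles = XmlSearch( arguments.xmlDoc, "//item/title" ) />
    <cfset var xmlTitleNode = "" />

    <cfloop index="xmlTitleNode" array="#arrTitles#">

        <!--- FindNoCase() does the case-insensitive comparison. --->
        <cfif FindNoCase( arguments.phrase, xmlTitleNode.XmlText )>
            <cfset ArrayAppend( arrMatches, xmlTitleNode.XmlParent ) />
        </cfif>

    </cfloop>

    <cfreturn arrMatches />
</cffunction>

<!--- Example call against the test feed. --->
<cfset arrItemNodes = getItemsByTitlePhrase( xmlFeed, "coldfusion" ) />
```

Run against the test feed, this should return the same three item nodes as the loop above, and the returned nodes are still references into the original document.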
This would all be made so much easier if ColdFusion would simply support case-conversion methods in XPath, but for now, I hope that something here may have helped.
Reader Comments
There's another couple of options here Ben:
<cfset aNoCase1 = xmlSearch(xmlFeed, "//item[contains(translate(title/text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'coldfusion')]")>
It's a bit long-winded, but it works.
This next one conditionally works... it's OK for looking up the count of results, but as it transforms the XML, one has to be cautious with what one does with the results:
<cfset aNoCase2 = xmlSearch(lcase(xmlFeed), lcase("//item[contains(title/text(), 'COLDFusion')]"))>
Another note here is that the nodes in the resultant array are not references to the original nodes, they're references to a separate XML doc which is created by the lcase(xmlFeed) operation. So one cannot update the nodes in the array and expect to see the updates in the original doc (like one usually would). So this one comes with some caveats, but if those are not a concern: it's an adequate approach.
--
Adam
@Adam,
Very nice tip on translate(). I have never used that before. Yes, tedious, but it works. As far as the LCase() of the entire XML document, I actually considered going down that path. But, then, my concern was getting back to the original reference in the first document.
They just need to go ahead and support lower-case() :)
Great post. I've also found that if the xml has a schema listed but is not valid the xml search fails even if the elements exist. If I deleted the schema ref (in the string xml prior to xmlParse) the search worked fine. Sure, you would think I should be using valid xml (against the schema) but the thing is I did not control the xml being returned from this web service and it wasn't. I did not see why xmlSearch should care. If the search works then return data dang you.
@RyanTJ,
I believe validation is an optional part of the XML parsing. But, to be honest, I have never used any schema validation explicitly. I cannot offer any better advice on that matter.
@RyanTJ,
.... all to say, yeah, if it can parse the XML, why does it care :(
Great post, Ben! I think XPath and XSL are often underused, and I always dig your posts on how to get more mileage out of them.
Your examples ("I Think ColdFusion Knocked Up My Daughter"??) are as twisted and borderline-inappropriate as always. Rock on, Mr. Nadel!
RyanTJ, could you pls clarify what you're saying here about xmlSearch() failing? Maybe paste some sample code?
Ben: could you please drop me an email offline (it's just about this lower-case / upper-case stuff, and CF's support for it).
Cheers.
--
Adam
It's not about them implementing anything. XPath 1 just doesn't have those functions, the XPath engine they use (Xalan) is XPath 1 compliant.
They'd need to use an XPath 2 compatible library instead, and that means switching to Saxon because that's the only implementation in Java unfortunately.
People seem to think that Macrodobe actually implement this stuff. They don't. The Regex engine is Apache ORO, the XML stuff is Apache Xerces and Xalan.
>It's not about them implementing anything.
Well, Elliott, it would be about them implementing Saxon instead of Xalan, wouldn't it? So it's everything about them implementing something, isn't it?
>People seem to think that Macrodobe actually implement this stuff.
Yes. They seem to think Adobe implements third-party libraries to get the work done. They also seem to think that perhaps other capabilities might present themselves if CF's chosen XML solution was a different one, possibly one in keeping with the times.
All of which is spot on.
You're the only one confused around here, mate.
--
Adam
@Adam, @Elliott,
I don't want to start attacking ColdFusion or Adobe here. When I say stuff about wishing they would implement it, I'm just generically saying, "That would be a cool feature to have." I don't mean much more than that.
Hi Ben
I don't think there's any way anything you said could've been construed as an attack against anything or anyone. Everything you said is spot on, valid, and I'm sure is something Adobe are giving at least some consideration to.
--
Adam
"How I Became An XSLT Junkie" :)
I'm finding XSLT/XPATH etc. so much easier to use than parsing, looping, and handling errors in the XML with straight ColdFusion.
I told my DBA to have Oracle return XML results to me. But now we are looking at XSQL. Meanwhile the die-hard Java, C#, VB programmers are going nutso wacko. (Or were they always that way?)
Seriously, I have scrapped my RSS integrator for websites and replaced it with a much simpler but more powerful XSLT version.
Have you read the book "ColdFusion Brain Freeze"?
@Don,
XSLT is definitely a powerful thing. While there is certainly a learning curve to XSLT, when you get it in your head, it can be a great way to transform XML.
I don't know that book, but I will look it up.
Ben,
You have been an amazing resource for me as I grow my skills and this specific article is pretty close to what I'm looking for, but my question is what if you need a case insensitive search of a node?
Specifically you are expecting people to send xml to you a certain way but you can't trust they won't do contactINFO or contactinfo instead of contactInfo.
The attribute trick you showed here won't work in this case because it's the NODE itself that we can't find properly.
Any thoughts?
Erick
@Erick,
You'd have to create a UDF or something that traverses the XML tree doing case-insensitive searching. Right now, there's really no way with XPath that I can see to do this.
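As a rough idea of what such a traversal UDF might look like, here is a minimal sketch. The function name findNodesByNameNoCase() is made up for illustration, and it simply compares each node's XmlName as-is, so it assumes the node names carry no namespace prefix.

```cfml
<!---
    Hypothetical helper (name and arguments are assumptions):
    walks the XML tree from the given node down and returns every
    node whose name matches the target, ignoring case.
--->
<cffunction
    name="findNodesByNameNoCase"
    access="public"
    returntype="array"
    output="false">

    <cfargument name="xmlNode" type="any" required="true" />
    <cfargument name="nodeName" type="string" required="true" />

    <!--- Local variables (declared up front for CF8). --->
    <cfset var arrMatches = ArrayNew( 1 ) />
    <cfset var arrChildMatches = "" />
    <cfset var xmlChild = "" />
    <cfset var xmlMatch = "" />

    <!--- Check the current node. --->
    <cfif (CompareNoCase( arguments.xmlNode.XmlName, arguments.nodeName ) EQ 0)>
        <cfset ArrayAppend( arrMatches, arguments.xmlNode ) />
    </cfif>

    <!---
        Recurse into the children. Arrays are passed by value in
        ColdFusion, so each call returns its own matches and we
        copy them into the local array.
    --->
    <cfloop index="xmlChild" array="#arguments.xmlNode.XmlChildren#">

        <cfset arrChildMatches = findNodesByNameNoCase(
            xmlChild,
            arguments.nodeName
            ) />

        <cfloop index="xmlMatch" array="#arrChildMatches#">
            <cfset ArrayAppend( arrMatches, xmlMatch ) />
        </cfloop>

    </cfloop>

    <cfreturn arrMatches />
</cffunction>

<!--- Example call; xmlDoc stands in for your parsed XML document. --->
<cfset arrContactNodes = findNodesByNameNoCase( xmlDoc.XmlRoot, "contactinfo" ) />
```

This should pick up contactINFO, contactinfo, and contactInfo alike, at the cost of walking every node in the document.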



