Code Kata: Getting Initials For A Name In Lucee CFML 5.3.8.201
At work, we have many user interfaces (UI) that use initials instead of full names, such as the "face pile" widget. The current technique for extracting initials from names is rather simplistic: we grab the set of tokens defined by the RegEx pattern, \S+, and then pluck out the leading character of each match. As a code kata, I wanted to see if I could author a slightly more clever user defined function (UDF) that could take a name and return initials in Lucee CFML 5.3.8.201.
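For reference, that simplistic token-based technique can be sketched roughly as follows. This is only an illustration of the idea, not our actual production code; the function name getSimpleInitials() is hypothetical:

```cfml
<cfscript>
	// ASSUMPTION: hypothetical helper that mirrors the "simplistic" technique. It
	// grabs every run of non-space characters and keeps the first character of each.
	public string function getSimpleInitials( required string name ) {
		var initials = "";
		for ( var token in name.reMatch( "\S+" ) ) {
			initials &= token.left( 1 );
		}
		return( initials );
	}
	echo( getSimpleInitials( "Dr. Leo Spaceman" ) ); // DLS
	echo( "<br />" );
	echo( getSimpleInitials( "F. Murray Abraham" ) ); // FMA
</cfscript>
```

Note how prefixes and already-abbreviated tokens ("Dr.", "F.") end up contributing an initial of their own, which is the behavior I wanted to improve upon.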
There's a large variety in the way people format their names. And, I really only understand the English versions - I have no insight into how names are formatted in other languages. On top of that, I don't have a perfect sense of what people would want their initials to be. As such, I don't have a solid target for my initial-calculating algorithm.
For example, I'm assuming that a person who generally uses an initial for their first name:
"F. Murray Abraham"
... probably wants their initials to be "MA" and not "FA". After all, they are going out of their way to reduce the use of their first name in common practice.
But, what about someone like:
"J.J. Abrams"
Using the same logic, his initials would just be "A". Is that correct? Honestly, I have no idea. Maybe it is? Only J.J. Abrams can answer that question since the formatting of one's name is as much a personal preference as it is a standard.
Then, we start to throw in prefixes like "Dr." and suffixes like "Jr.", "Esq.", and "IV" and things get even more fuzzy. I don't have any of those in my name, so I have no real instinct for their inclusion or exclusion. But, I'm erring on the side of exclusion since they are already abbreviations. And, I'm assuming that we don't want to double down on abbreviations.
With that said, here's the ColdFusion algorithm that I came up with for reducing a name down to a set of - at most 2 - initials:
<cfscript>
	// Let's try to get the initials for a variety of common English name formats.
	names = [
		"Ben Nadel",
		"Fuzzy Wuzzy III",
		"William Stanley Preston, Esq.",
		"Samuel L. Jackson",
		"F. Murray Abraham",
		"Ludwig van Beethoven",
		"Robert Downey Jr.",
		"Mary Stewart Masterson",
		"April O'Neil",
		"Dr. Leo Spaceman",
		"Sting",
		"Julia Louis-Dreyfus",
		"Randolph Severn ""Trey"" Parker III",
		// And then some stuff that isn't that common.
		"J.J. Abrams",
		"J. K. Rowling",
		"LL Cool J",
		"Dr.",
		"Jr.",
		"-",
		"@"
	];
	for ( name in names ) {
		echo( "#name# .... #getInitials( name )# <br />" );
	}
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I return the best-guess initials for the given name. Returns between zero and 2
	* characters.
	*/
	public string function getInitials( required string name ) {
		// If the name contains a double-quoted value, let's try to honor this as the
		// person's preferred name. To keep things simple, in the case of a quote, we'll
		// just grab everything after the first quote character.
		var preferredName = ( name.find( '"' ) )
			? name.listRest( '"' )
			: name
		;
		var baseName = preferredName
			.trim()
			// Remove any comma-based suffixes like ", Esquire".
			.listFirst( "," )
			// Remove any token that is followed by a period. Let's assume that all
			// already-abbreviated values should be EXCLUDED from the further abbreviation
			// of the name down into a set of initials. Note that this will also take care
			// of things like "Dr." and "Jr.".
			.reReplace( "\S+\.", "", "all" )
			// Remove Roman numeral generational indicators, like "VIII". Note that this
			// is a CASE-SENSITIVE match so that we don't accidentally match on short,
			// foreign names.
			.reReplace( "\s(I|II|III|IV|V|VI|VII|VIII|IX|X)$", "" )
			// Remove any punctuation. Primarily, we want to collapse compound terms into
			// a single token that won't confuse the word-boundary (\b) matcher.
			.reReplace( "[[:punct:]]+", "", "all" )
		;
		// EDGE CASE: If creating the base name left us with no character data, let's
		// revert back to the original name and just strip out punctuation. This way, we
		// have a better chance of returning SOMETHING visual.
		if ( baseName == "" ) {
			baseName = name.reReplace( "[[:punct:]]+", "", "all" );
		}
		// Get any non-space character that is preceded by a word-boundary match. Since we
		// removed all punctuation, this SHOULD match only valid printable characters.
		var letters = baseName
			.trim()
			.ucase()
			.reMatchNoCase( "\b\S" )
		;
		switch ( letters.len() ) {
			case 0:
				return( "" );
				break;
			case 1:
				return( letters.first() );
				break;
			default:
				return( letters.first() & letters.last() );
				break;
		}
	}
</cfscript>
The algorithm is heavily based on Regular Expression (RegEx) patterns and Lists (the unsung heroes of ColdFusion). But, overall, I don't think it's too complicated. And, when we run this ColdFusion code, we get the following output:
Ben Nadel .... BN
Fuzzy Wuzzy III .... FW
William Stanley Preston, Esq. .... WP
Samuel L. Jackson .... SJ
F. Murray Abraham .... MA
Ludwig van Beethoven .... LB
Robert Downey Jr. .... RD
Mary Stewart Masterson .... MM
April O'Neil .... AO
Dr. Leo Spaceman .... LS
Sting .... S
Julia Louis-Dreyfus .... JL
Randolph Severn "Trey" Parker III .... TP
J.J. Abrams .... A
J. K. Rowling .... R
LL Cool J .... LJ
Dr. .... D
Jr. .... J
- ....
@ ....
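To see how a single name moves through those steps, here's a small sketch that replays the same chain of operations one call at a time. The input name is a made-up composite (it isn't in the test list above) and the intermediate variable names are hypothetical; the comments show the values I'd expect at each stage:

```cfml
<cfscript>
	// ASSUMPTION: a made-up name that combines a prefix, a hyphenated surname, a
	// generational indicator, and a comma-based suffix.
	name = "Dr. Julia Louis-Dreyfus III, Esq.";
	// Drop the comma-based suffix: "Dr. Julia Louis-Dreyfus III"
	afterComma = name.trim().listFirst( "," );
	// Drop already-abbreviated tokens: " Julia Louis-Dreyfus III"
	afterAbbreviations = afterComma.reReplace( "\S+\.", "", "all" );
	// Drop the trailing Roman numeral: " Julia Louis-Dreyfus"
	afterNumerals = afterAbbreviations.reReplace( "\s(I|II|III|IV|V|VI|VII|VIII|IX|X)$", "" );
	// Collapse punctuation: " Julia LouisDreyfus"
	afterPunctuation = afterNumerals.reReplace( "[[:punct:]]+", "", "all" );
	// Leading character of each remaining word: [ "J", "L" ]
	letters = afterPunctuation.trim().ucase().reMatch( "\b\S" );
	echo( letters.first() & letters.last() ); // JL
</cfscript>
```

Collapsing the hyphen before the word-boundary match is what keeps "Louis-Dreyfus" down to a single initial instead of two.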
Honestly, I'm pretty satisfied with the outcome here. But, once again, I really only have an understanding of how English names work. I have no idea if this is meaningful for foreign names, let alone names that operate outside of the English alphabet.
That said, I do believe that generating initials is as much an opinion of the parent application as it is a single standard. As such, I do think it makes sense for any given ColdFusion application to have its own implementation of this algorithm that it can evolve over time.
Anyway, I haven't been doing too much back-end work the last few weeks; so, I just wanted a little ColdFusion juice to keep the brain lubricated.