Now Using utf8mb4 To Enable Emoji In My Comments

By Ben Nadel

Published 2020-03-25 in SQL — Comments (17)

This post is mostly here so that I can test the comment feature in production. I just updated my blog_comment table to use the utf8mb4 character set, which means that it can now support Emoji characters:

ALTER TABLE `blog_comment`
	MODIFY COLUMN `content` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
	MODIFY COLUMN `content_markdown` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
;

From what I understand, the utf8mb4 character set enables support for characters in the Astral Plane, which is where many (but not all) emoji characters live. Specifically, it supports Emjoi that require 4-bytes to encode.

On it's own, utf8mb4 doesn't enable Emoji support in my running MySQL server. I also had to tell the INSERT statement to use the appropriate character set:

SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

INSERT INTO blog_comment 
(
	content,
	content_markdown,

	-- .... truncated for demo ....

I believe that there is a way to configure the MySQL server to use those settings by default; however, that's beyond my experience at this point. So, rather than go down that rabbit-hole this morning, I'm just using SET NAMES, which I know works.

And now, hopefully, I'll be able to add Emoji in the comments below!

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/3794

Reader Comments

Ben Nadel Mar 25, 2020 at 7:52 AM

15,902 Comments

And then, there were emoji:

😆
⛄️
💩

... and lo, it was good.

Ben Nadel Mar 25, 2020 at 7:53 AM

15,902 Comments

Hmm, the comment was inserted; but, the page hung. Trying again.

Ben Nadel Mar 25, 2020 at 7:57 AM

15,902 Comments

Huh, it looks like the key-casing on my returned Struct suddenly changed? That's..... wonky.

Ben Nadel Mar 25, 2020 at 7:58 AM

15,902 Comments

Well, anyway, it works now. 👍

Jamie Krug Mar 25, 2020 at 1:47 PM

3 Comments

❣

Ben Nadel Mar 25, 2020 at 1:53 PM

15,902 Comments

@Jamie,

Woot! It's alive!!

It looks like something in the last Adobe CF Hotfix messed with my key-casing.

Martin Mädler Mar 25, 2020 at 4:25 PM

1 Comments

I like! ⭐️😁✅👾💩🤡

Casey Flynn Mar 26, 2020 at 1:01 AM

1 Comments

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

^ I read this once a year to refresh my memory. If my understanding is correct, unicode is a "character set" and utf8mb4 is an "encoding."

Unicode has "code points" to represent 1,112,064 different characters across all the world's languages (and some random stuff like emojis too). utf8 can store most of the code points of unicode (using up to 3 bytes), and utf84mb can store them all (including emojis, using up to 4 bytes).

This is how you can say hello in Mandarin/UTF8/binary:
$ echo -n "你好" | xxd -b
11100100 10111101 10100000 11100101 10100101 10111101

因為我學中文和我住在台灣，我知道這個。😅

Ben Nadel Mar 26, 2020 at 6:54 AM

15,902 Comments

@Casey,

Ha, I know that I've read that article a handful of times as well :D Somehow, this stuff never quite plants itself permanently in my brain, and I have to re-learn aspects of this over and over. I think you are right about the "character set" vs. "encoding". ... I think 🤷‍♂️

Chris G Mar 26, 2020 at 10:39 AM

247 Comments

Yay🎉🥳 ... Hmmm didn't like the party hat emoji?

Ben Nadel Mar 26, 2020 at 11:23 AM

15,902 Comments

@Chris,

Hmm, yeah, that second comes through as one weird little binary chip or something :D

Chris G Mar 26, 2020 at 12:54 PM

247 Comments

It showed up on my laptop, but it's a weird chip on my phone...which is the device I used to comment. Strange. Here's a screenshot of it working...

https://screencast.com/t/T4bdQmQMJk81

Pete Freitag Mar 26, 2020 at 4:24 PM

20 Comments

Nice 🤓!

I posted about this a few weeks ago as well. It gives a bit of an explanation: https://www.petefreitag.com/item/895.cfm

From my experience you don't need to do the SET NAMES thing, you can instead alter the DB schema

ALTER TABLE tableName
DEFAULT CHARACTER SET utf8mb4,
MODIFY columnName varchar(200)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
MODIFY anotherColumn text
CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

Ben Nadel Mar 26, 2020 at 4:34 PM

15,902 Comments

@Pete,

I modified the schema - it might be this piece that I am still missing:

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

Though, I feel like we tried that at work, and we still needed to include the SET NAMES stuff. I'll have to play around.

Ben Nadel Apr 5, 2020 at 8:01 AM

15,902 Comments

@All,

As a quasi-follow-up to this post, I started to think more deeply about how I might further leverage Emoji in my blog. The native chr() and asc() functions in ColdFusion only work for the Unicode range 0x0000 to 0xffff (or, 0 to 65,535). Once you go above that into the "supplemental" character range, these functions no longer work. As such, I believe you have to dip down into the Java layer:

www.bennadel.com/blog/3804-printing-emoji-characters-from-unicode-codepoints-in-lucee-cfml-5-3-5-92.htm

I know that, these days, you can embed Emoji directly within your CFML code. But, that just feels janky to me (probably cause I'm an old dog learning new tricks). As such, I'm really interesting in being able to render Emoji characters from the Unicode CodePoint.

Ben Nadel Jul 24, 2021 at 6:37 AM

15,902 Comments

@All,

Being able to store emoji is one thing; making the emoji easier to use is another thing. I want to add some emoji short-cuts to the blog. And, as a precursor to that, I needed to figure out how to render emoji glyphs using codepoints:

www.bennadel.com/blog/4084-rending-emoji-glyphs-using-hexadecimal-codepoints-in-javascript.htm

Turns out, the String.fromCodePoint() function is perfect for this.

Ben Nadel Mar 1, 2022 at 1:16 PM

15,902 Comments

I finally got around to upgrading my MySQL database version and my table schemas. Now, everything runs on utf8mb4. And, as @Pete pointed out, I can remove the SET NAMES stuff from my SQL statements as long as I have have characterEncoding=UTF-8 in my connection string. I also took this as an opportunity cleanup that connection string as well (as this tends to rot over time).

www.bennadel.com/blog/4218-upgrading-my-coldfusion-blog-from-mysql-5-7-10-to-mysql-8-0-28.htm

Oh my chickens, this post is old!

Hit me up on Twitter if you want to discuss it further.