Skip to main content
Ben Nadel at the React NYC Meetup (Oct. 2018) with: Nir Kaufman
Ben Nadel at the React NYC Meetup (Oct. 2018) with: Nir Kaufman

VARCHAR(Length) Limit Refers To Characters, Not Bytes, In MySQL

By
Published in , Comments (1)

When you define a varchar field in a MySQL database table, you can provide a length limit, ex varchar(255). For as long as I can remember, I thought this limit referred to the number of bytes that could be stored in the field. And, to be fair, back when I was only consuming ASCII characters, this assumption was coincidentally true—one ASCII character is represented by one byte. In reality, this length limit refers to the number of characters that can be stored in a field, regardless of how many bytes are needed to represent said character string.

This is an important distinction to understand because once you go beyond the basic ASCII character-set, you start dealing with variable-length character representations. That is, you start dealing with characters, such as emojis, that are represented under the hood by a series of bytes, not just a single byte.

Fortunately, if you define a varchar field to have a limit of 255, it doesn't matter which characters you insert into that field. If you insert 255 ASCII characters, it may only require 255 bytes of storage. And, if you insert 255 emoji characters, it may require far more than 255 bytes; but, you won't have to know or care about that (for the most part).

To see this in action, let's define a simple MySQL database table with a single varchar field:

CREATE TABLE `varchar_test` (
  `id` tinyint unsigned NOT NULL,
  `value` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

This varchar field can hold 10 characters, regardless of how many bytes that requires. To demonstrate, let's use ColdFusion to insert two rows: one with 10 ASCII characters and one with 10 multi-byte emojis:

<!--- Define 10-characters of 1-byte ASCII data. --->
<cfset asciiData = repeatString( "A", 10 ) />

<!--- Define 10-characters of multi-byte UTF-8 data ("raising hands" emoji). --->
<cfset emoji = canonicalize( "&##x1f64c;", true, true ) />
<cfset emojiData = repeatString( emoji, 10 ) />

<!--- Insert both character-strings into database using VARCHAR(10). --->
<cfquery name="results">
	TRUNCATE TABLE
		varchar_test
	;
	INSERT INTO
		varchar_test
	SET
		id = 1,
		value = <cfqueryparam value="#asciiData#" cfsqltype="cf_sql_varchar" />
	;
	INSERT INTO
		varchar_test
	SET
		id = 2,
		value = <cfqueryparam value="#emojiData#" cfsqltype="cf_sql_varchar" />
	;
</cfquery>

When we run this ColdFusion code, it runs without error. And, once the data is stored, we can look in the MySQL database to see how it's being represented:

SELECT
	id,
	value,

	-- Get the BYTES required to store the value.
	LENGTH( value ) AS byteLength,

	-- Get the CHARACTERS required to store the value.
	CHAR_LENGTH( value ) AS charLength
FROM
	varchar_test
;

And when we run this MySQL, we get the following results:

The results of a MySQL query.

Both the ASCII data and the emoji data have been stored successfully. And, both report back as 10 characters according to the CHAR_LENGTH() function. But, while the ASCII value has a byte length of 10, the emoji value has a byte length of 40—four times the limit of the field. Clearly the varchar(length) limit refers to the characters, not the bytes.

Which is great because it means that, as application developers, we don't have to worry so much about what data the user is entering.

In fact, it seems that the maxlength field on a <textarea> control is even more restrictive. So, if we have a textarea with a maxlength="10", we may not be able to enter 10 emojis into that field even if we can store 10 emoji characters in the underlying database.

To see this in a more full-circle way, I'm going to query for the two records we inserted above; and then, try to copy-paste the values into a textarea control with a maxlimit:

<cfquery name="results" returnType="array">
	SELECT
		id,
		value
	FROM
		varchar_test
	;
</cfquery>

<!--- Ensure the browser will render non-ascii character-sets. --->
<cfcontent type="text/html; charset=utf-8" />
<cfoutput>
	<dl>
		<div>
			<dt>
				<strong>Database ASCII:</strong>
			</dt>
			<dd>
				#results[ 1 ].value#
			</dd>
			<dd>
				#len( results[ 1 ].value )# chars
			</dd>
			<dd>
				#arrayLen( charsetDecode( results[ 1 ].value, "utf-8" ) )# bytes
			</dd>
			<dd>
				<input maxlength="10" />
			</dd>
		</div>
		<div style="margin-top: 1rem ;">
			<dt>
				<strong>Database EMOJI:</strong>
			</dt>
			<dd>
				#results[ 2 ].value#
			</dd>
			<dd>
				#len( results[ 2 ].value )# chars
			</dd>
			<dd>
				#arrayLen( charsetDecode( results[ 2 ].value, "utf-8" ) )# bytes
			</dd>
			<dd>
				<input maxlength="10" />
			</dd>
		</div>
	</dl>
</cfoutput>

If I run this ColdFusion page and then copy-paste the rendered database value into the relevant <textarea>, I get the following output:

Browser screenshot show text and input controls.

Notice that while I can copy-paste the entire ASCII value into the <textarea>, I can only copy-paste half the emoji value into the <textarea> despite it having the same "max length" as the underlying MySQL database field. Notice also that ColdFusion seems to think that the 10 character emoji value is actually 20 characters (which is apparently what the browser thinks as well, hence the truncation).

Ultimately, the browser is more restrictive than the MySQL database. Which means that if we use the same maxlength in the browser as we do varchar(length) in the database, we should never have to worry about the user entering more text information that the given row can store.

Want to use code from this post? Check out the license.

Reader Comments

15,902 Comments

Over on Facebook, Seb Duggan had a good point to watch out for the multi-emoji patterns. Where two or more emojis are combined to visually represent a single glyph. I don't know much about these - if they're the same behavior as special ligature behaviors? But, regardless, if I were to replace my emoji in my demo with a multi-emoji glyph, the demo would result in an field overflow / truncation.

That said, the <textarea maxlength> would also take this into account. So, as long as you have a maxlength that is less than or equal to the varchar(length) constraint, you should be fine.

Post A Comment — I'd Love To Hear From You!

Post a Comment

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel