What is UTF-8 encoding? A walkthrough for non-programmers

Written by: Darrielle Evans

Updated: 11/20/25

Early in my career, I worked as a technical consultant on the development team of a global health and wellness brand during a scaled digital transformation, which included an international website rollout for both consumers and their distributors. The project was very complex.

The content had to meet local regulations, support multiple languages, and be delivered carefully through the custom headless CMS we developed on Adobe Experience Manager. It quickly became apparent how easily things could break when characters like ñ, ç, or Chinese ideographs weren’t encoded correctly.

Although I had some prior knowledge of encoding, I quickly realized just how foundational UTF-8 is to building websites that work across borders.

In this post, I’ll break down what UTF-8 actually is, why it matters for anyone working on web projects, and how it quietly powers the multilingual, global digital experiences we use daily. Before I get started, I do recommend that you deepen your understanding by reviewing the basics of Unicode, as it’s the standard that makes UTF-8 possible.

Table of Contents

What is UTF-8?
UTF-8: The Character Set in Web Development
Difference Between UTF-8 and UTF-16

What is UTF-8?

UTF-8 is an encoding system for Unicode. UTF-8 stands for “Unicode Transformation Format - 8-bit,” because it stores text in 8-bit units (bytes). It can translate any Unicode character into a unique sequence of one to four bytes, and it can translate those bytes back into the original character.

To understand everything about UTF-8, I’ll walk you through the basics first.

How Computers Store Information

In order to store information, computers use a binary system. In binary, all data is represented in sequences of 1s and 0s. The most basic unit of binary is a bit, which is just a single 1 or 0. The next largest unit of binary, a byte, consists of 8 bits. An example of a byte is “01101011.”
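If you want to poke at bytes yourself, Python (used here purely for illustration) can convert between a byte’s numeric value and its eight-bit pattern. This minimal sketch uses the example byte from the paragraph above; the helper names are my own.

```python
def to_bits(value: int) -> str:
    """Return an integer from 0 to 255 as an 8-character string of 1s and 0s."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    return format(value, "08b")  # zero-padded to 8 binary digits

def from_bits(bits: str) -> int:
    """Parse a string of 1s and 0s back into an integer."""
    return int(bits, 2)

print(from_bits("01101011"))  # 107
print(to_bits(107))           # 01101011
```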

Every digital asset you’ve ever encountered — from software to mobile apps to websites to Instagram stories — is built on this system of bytes, which are strung together in a way that makes sense to computers.

When we refer to file sizes, we’re referencing the number of bytes. For example, a kilobyte is roughly one thousand bytes, and a gigabyte is roughly one billion bytes.

Text is one of many assets that computers store and process. Text is made up of individual characters, each of which is represented in computers by a string of bits. These strings are assembled to form digital words, sentences, paragraphs, romance novels, and so on.

ASCII: Converting Symbols to Binary

The American Standard Code for Information Interchange (ASCII) was an early standardized encoding system for text. Encoding is the process of converting characters in human languages into binary sequences that computers can process.

ASCII’s library includes every upper-case and lower-case letter in the Latin alphabet (A, B, C…), every digit from 0 to 9, some common symbols (like /, !, and ?), and a handful of invisible control characters. It assigns each of these characters a unique number from 0 to 127 (written below as a three-digit code) and a unique byte.

ASCII Character Table

The table below shows examples of ASCII characters with their associated codes and bytes.

CHARACTER	ASCII CODE	BYTE
A	065	01000001
a	097	01100001
B	066	01000010
b	098	01100010
Z	090	01011010
z	122	01111010
0	048	00110000
9	057	00111001
!	033	00100001
?	063	00111111
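You can regenerate the rows of this table with a few lines of Python. This is a sketch for illustration: Python’s built-in ord() returns a character’s Unicode code point, which matches its ASCII code for the first 128 characters, and the helper name ascii_row is hypothetical.

```python
def ascii_row(ch: str) -> tuple[str, str, str]:
    """Return (character, three-digit ASCII code, 8-bit byte) for one character."""
    code = ord(ch)  # equals the ASCII code for characters 0-127
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return ch, format(code, "03d"), format(code, "08b")

for ch in "AaBbZz09!?":
    print("\t".join(ascii_row(ch)))
```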

Just as characters combine into words and sentences in human language, bytes combine into text in a file. So the sentence “The quick brown fox jumps over the lazy dog.” (including its final period) would be represented in ASCII binary as:

01010100 01101000 01100101 00100000 01110001 01110101 01101001 01100011 01101011 00100000 01100010 01110010 01101111 01110111 01101110 00100000 01100110 01101111 01111000 00100000 01101010 01110101 01101101 01110000 01110011 00100000 01101111 01110110 01100101 01110010 00100000 01110100 01101000 01100101 00100000 01101100 01100001 01111010 01111001 00100000 01100100 01101111 01100111 00101110

That doesn’t mean much to us humans, but it’s a computer’s bread and butter.
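To reproduce that binary string yourself, you can encode the sentence as ASCII and print each byte as eight binary digits. A minimal Python sketch, with a hypothetical helper name:

```python
def ascii_binary(text: str) -> str:
    """Encode text as ASCII and show each byte as 8 binary digits."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return " ".join(format(byte, "08b") for byte in data)

print(ascii_binary("The quick brown fox jumps over the lazy dog."))
```

The sentence has 44 characters, so the output contains 44 groups of eight bits.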

How many ways can a character be represented in ASCII?

ASCII was originally designed as a 7-bit system, which means it can represent 128 unique characters (values 0–127). That covers the English alphabet, numbers, punctuation, and some control characters like carriage return and line feed.

A common misconception is that ASCII uses 8 bits (a full byte), which would allow for 256 characters. In reality, standard ASCII only ever defined 128. The “extra” bit in an 8-bit byte was often used for error checking, formatting, or left unused.
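You can see the 7-bit limit directly: every ASCII byte has its highest bit set to 0, and Python’s strict ASCII codec refuses any character outside the 128-character range. A minimal sketch (fits_in_ascii is a hypothetical helper):

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has an ASCII code from 0 to 127."""
    try:
        text.encode("ascii")  # the strict ASCII codec rejects anything above 127
    except UnicodeEncodeError:
        return False
    return True

# In each ASCII byte, the 8th (highest) bit is always 0, so every value is below 128.
print(all(byte < 128 for byte in "Hello!".encode("ascii")))  # True
print(fits_in_ascii("Hello!"))    # True
print(fits_in_ascii("Jalapeño"))  # False: ñ is not in ASCII
```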

Later, different systems did take advantage of the full 8-bit range to create “extended ASCII” sets with up to 256 characters. But because each system defined those extra slots differently, compatibility issues were common. For example, the byte value 130 displays as “é” under the code page of the original IBM PC (code page 437) but as “‚” (a single low quotation mark) under the Western European Windows code page (Windows-1252).

These kinds of conflicts highlighted the need for a universal standard that could store every symbol, in every language, consistently.
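Python ships codecs for several legacy code pages, which makes this kind of conflict easy to reproduce. The sketch below decodes the single byte 130 with two of them, cp437 (the original IBM PC code page) and cp1252 (the Western European Windows code page). These two are picked for illustration; other legacy systems assign yet other characters to that byte.

```python
raw = bytes([130])  # one byte with the decimal value 130 (hex 0x82)

for codec in ("cp437", "cp1252"):
    char = raw.decode(codec)
    print(f"{codec}: {char!r} (U+{ord(char):04X})")
```

Same byte, two different characters: the file itself never says which interpretation is right, which is exactly the problem Unicode set out to solve.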

Unicode: A Way to Store Every Symbol, Ever

ASCII was fine when we were only thinking about English, but once the internet went global, it just couldn’t keep up. That’s where Unicode comes in. Instead of cramming characters into a limited set of slots, Unicode gives every symbol its own unique identifier, called a code point. Think of it like giving every letter, number, or emoji its own street address — no matter where you are, you’ll always know exactly what it is.

Unicode has space for over 1.1 million code points, which is more than enough to cover every language, past and present, plus extras like math symbols, currency signs, and emojis. It’s the reason text doesn’t fall apart when you switch between countries or devices.

Unicode Character Table

Now, I’ll give you some examples of text characters and their matching code points. Each code point is written as “U+” (for “Unicode”) followed by a hexadecimal number, usually four to six digits long, that uniquely identifies the character.

CHARACTER	CODE POINT
A	U+0041
a	U+0061
0	U+0030
9	U+0039
!	U+0021
Ø	U+00D8
ڃ	U+0683
ಚ	U+0C9A
𠜎	U+2070E
😁	U+1F601
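If you want to look up a code point yourself, Python’s ord() returns it as a number, which you can format in the familiar U+ notation. A minimal sketch with a hypothetical helper name:

```python
def code_point(ch: str) -> str:
    """Format a single character's Unicode code point in U+ notation."""
    return f"U+{ord(ch):04X}"  # hexadecimal, padded to at least four digits

for ch in ["A", "a", "0", "9", "!", "Ø", "ڃ", "ಚ", "𠜎", "😁"]:
    print(ch, code_point(ch))
```

Going the other way, chr(0x1F601) turns a code point number back into the character.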

If you want to learn how code points are generated and what they mean in Unicode, check out this in-depth explanation.

So, with Unicode, we now have a standardized way of representing every character used by every human language in a single library. This solves the problem of multiple, conflicting labeling systems for different languages: any computer on Earth can use Unicode.

But Unicode alone doesn’t store words in binary. Computers need a way to translate Unicode into binary so that its characters can be stored in text files.

Here’s where UTF-8 comes in.

UTF-8: The Character Set in Web Development

UTF-8 is the most common character encoding used on the internet today. In fact, the current HTML standard requires UTF-8 for new documents, and over 98% of all websites (probably including yours) store characters this way.

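You’ll also see UTF-8 show up in common data formats like XML and JSON. XML parsers assume UTF-8 when a document has no encoding declaration or byte order mark, and the current JSON specification (RFC 8259) requires UTF-8 for JSON exchanged between systems outside a closed ecosystem. In practice, UTF-8 is the standard for web data transfer.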

That’s why I recommend making sure all your site pages and databases are using UTF-8. Most content management systems and website builders will save files in UTF-8 automatically, but it’s still worth double-checking, especially if you’re redesigning your site. A redesign is the perfect time to audit your encoding settings and confirm everything lines up with modern web standards.

How to Check and Update Your Site’s Encoding Settings

Making sure your site is using UTF-8 isn’t complicated. Here are a few ways you can confirm or update your settings if necessary.

1. Check your HTML <head> tag.

Look for a meta tag like this:

<meta charset="UTF-8">

If it’s missing or shows another encoding (like ISO-8859-1), update it to UTF-8.
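If you’d rather check a saved copy of a page programmatically, here’s a minimal sketch using Python’s built-in html.parser module. It only looks for the `<meta charset>` form of the declaration (not the older `http-equiv="Content-Type"` form, and not the charset a server sends in its HTTP headers), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional

class CharsetFinder(HTMLParser):
    """Collects the value of any <meta charset="..."> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.charsets: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "meta":
            for name, value in attrs:
                if name == "charset" and value:
                    self.charsets.append(value.strip())

def declared_charset(html: str) -> Optional[str]:
    """Return the first <meta charset> value found, or None if there isn't one."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charsets[0] if finder.charsets else None

page = '<html><head><meta charset="UTF-8"><title>Demo</title></head></html>'
print(declared_charset(page))  # UTF-8
```

Encoding names are case-insensitive, so compare with something like `declared_charset(page).lower() == "utf-8"`.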

2. Review your CMS settings.

WordPress: UTF-8 is the default, but you can confirm under Settings > Reading or by checking your wp-config.php file for DB_CHARSET set to utf8 or utf8mb4.

Other platforms (Squarespace, Wix, Shopify, etc.) usually enforce UTF-8 automatically, but it’s still good to review the documentation or encoding settings.

3. Check your database.

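If your site pulls from a database (like MySQL), make sure the tables and columns are set to utf8mb4. MySQL’s older utf8 character set stores at most three bytes per character, so it can’t hold emojis and other four-byte characters; utf8mb4 supports the full range of Unicode characters.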
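Before converting a column, it can help to know whether your content actually contains four-byte characters, the ones MySQL’s older three-byte utf8 can’t store. This minimal sketch is a hypothetical helper, not a MySQL feature; it simply measures each character’s UTF-8 length in Python.

```python
def four_byte_chars(text: str) -> list[str]:
    """Return characters that need 4 bytes in UTF-8 (code points above U+FFFF)."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]

print(four_byte_chars("Ñandú 😁 café"))  # ['😁']
```

Accented letters like ñ and é need only two bytes, so they fit in either MySQL character set; the emoji is what requires utf8mb4.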

4. Test your pages.

To test your pages, you can use Google Chrome. Open your site in Chrome, right-click, and choose View Page Source. If you see UTF-8 in the meta tag and your characters (especially special ones like accents or emojis) display correctly, then you are all set.

When it comes to HTML, your site needs to tell the browser or other software that it’s using UTF-8, or the text won’t render correctly. That’s what the <meta charset="UTF-8"> tag does: it signals how to translate the file’s bytes back into readable characters.
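Here’s what goes wrong when that signal is missing or wrong. This minimal sketch encodes text as UTF-8 and then decodes the same bytes as if they were Latin-1 (ISO-8859-1), one common way that garbled sequences like “Ã©” end up on a page.

```python
original = "Café"
utf8_bytes = original.encode("utf-8")  # é becomes two bytes: 0xC3 0xA9

wrong = utf8_bytes.decode("latin-1")   # each byte read as its own character
right = utf8_bytes.decode("utf-8")     # bytes read back as intended

print(wrong)  # CafÃ©
print(right)  # Café
```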

UTF-8 Character Table

Below is the same character table from above, with each character’s UTF-8 encoding added. Notice how some characters are represented as just one byte, while others use two, three, or four.

CHARACTER	CODE POINT	UTF-8 BINARY ENCODING
A	U+0041	01000001
a	U+0061	01100001
0	U+0030	00110000
9	U+0039	00111001
!	U+0021	00100001
Ø	U+00D8	11000011 10011000
ڃ	U+0683	11011010 10000011
ಚ	U+0C9A	11100000 10110010 10011010
𠜎	U+2070E	11110000 10100000 10011100 10001110
😁	U+1F601	11110000 10011111 10011000 10000001
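You can reproduce the binary column of this table by encoding each character as UTF-8 and printing every resulting byte. A minimal Python sketch with a hypothetical helper name:

```python
def utf8_binary(ch: str) -> str:
    """Show a character's UTF-8 bytes as space-separated 8-bit patterns."""
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))

for ch in ["A", "Ø", "ಚ", "😁"]:
    print(ch, f"U+{ord(ch):04X}", utf8_binary(ch))
```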

Understanding UTF-8 Character Conversion to Bytes

I’ve shown in the table above how some characters take just one byte while others need more. But why does UTF-8 give one byte for some characters and up to four for others? The answer is simple: to preserve memory.

This spatial efficiency is one of UTF-8’s biggest advantages. If every Unicode character always used four bytes, a simple English text file would be four times larger than it needs to be.
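For the curious, the rule UTF-8 uses to decide how many bytes a character needs is short enough to write out by hand: the larger the code point, the more bytes, and each byte carries marker bits (110, 1110, 11110 for the first byte, 10 for the rest) so a reader knows where a character starts and ends. This is an illustrative re-implementation for learning purposes only; real code should just call `str.encode("utf-8")`, and this sketch skips checks a production encoder needs, such as rejecting surrogate code points.

```python
def encode_utf8(code_point: int) -> bytes:
    """Hand-rolled UTF-8 encoder for a single code point (illustration only)."""
    if code_point < 0x80:        # up to 7 bits:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:   # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("beyond the Unicode range")

# The hand-rolled version agrees with Python's built-in encoder.
for ch in "AØಚ😁":
    assert encode_utf8(ord(ch)) == ch.encode("utf-8")
```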

Here’s a quick example:

“Hello world” → 11 bytes (all single-byte characters)
“Bonjour à tous” → 15 bytes (14 characters, but the accented “à” takes two bytes while the rest take one)

UTF-8’s flexibility means you get the best of both worlds: compact file sizes for everyday text, with the ability to represent virtually any character when you need it.
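You can check those byte counts yourself and compare them with an encoding that always spends four bytes per character (UTF-32). This Python sketch uses the “-le” (little-endian) codec variant so Python doesn’t prepend a byte order mark, which would add four extra bytes to the UTF-32 count; byte_sizes is a hypothetical helper name.

```python
def byte_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, UTF-32 size) in bytes, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))

for text in ["Hello world", "Bonjour à tous"]:
    utf8_size, utf32_size = byte_sizes(text)
    print(f"{text!r}: {utf8_size} bytes in UTF-8, {utf32_size} bytes in UTF-32")
```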

Are there other encoding systems besides UTF-8?

UTF-8 may be the dominant standard today, but it isn’t the only encoding system that exists. A few others you’ll come across, mostly in older files or legacy systems, include:

ASCII. The original 7-bit system, supporting just 128 characters (English letters, digits, and basic punctuation).
ISO-8859-1 (Latin-1). An extended version of ASCII that added support for Western European characters like ñ or ü. This was the default for early versions of HTML.
UTF-16. Another Unicode encoding that uses two bytes for most characters but can extend to four bytes for less common ones. It’s still used internally by some programming languages like Java and C#.
UTF-32. A fixed-width encoding where every character takes four bytes. Easy for computers to process but very inefficient in terms of file size, so it’s rarely used for web content.
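To see those limits in practice, this sketch tries to encode a few sample strings with each encoding in the list and reports the byte count, or “can’t” when the encoding has no way to represent a character. It relies on Python’s codec names (`ascii`, `latin-1`, `utf-8`, `utf-16-le`, `utf-32-le`); the `-le` variants skip the byte order mark, and the sample words are my own.

```python
from typing import Optional

ENCODINGS = ["ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"]

def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Byte length of text in the given encoding, or None if it can't be represented."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

for text in ["cat", "niño", "猫", "😁"]:
    sizes = {enc: encoded_size(text, enc) for enc in ENCODINGS}
    print(text, {enc: ("can't" if size is None else size) for enc, size in sizes.items()})
```

ASCII fails as soon as it meets ñ, Latin-1 fails on the Chinese character and the emoji, and the three Unicode encodings handle everything at different sizes.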

These systems paved the way for modern encoding, but each had limitations. ASCII and ISO-8859-1 couldn’t represent every language. UTF-16 and UTF-32 could, but they require more storage for English and other Latin-script text. UTF-8 strikes the balance: compact for common characters, yet flexible enough to handle every symbol in Unicode.

Difference Between UTF-8 and UTF-16

Both UTF-8 and UTF-16 are ways of encoding the same Unicode characters. They differ in how they store them.

UTF-8 uses a variable-length system where each character takes one to four bytes. Common characters like English letters need only one byte, while less common symbols take more. This makes UTF-8 efficient for text written mostly in Latin characters, like English, and keeps file sizes small.

UTF-16 uses two bytes for most characters and four bytes for the rest (such as most emojis and rarer Chinese characters). That can make it more compact for languages written mostly in non-Latin scripts, like Chinese or Hindi, whose characters take three bytes each in UTF-8 but only two in UTF-16. For plain English text, though, it uses twice as much memory as UTF-8.
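A quick way to compare the two is to measure the same short phrases in each encoding. The sample phrases below are my own, chosen for illustration, and utf8_vs_utf16 is a hypothetical helper; real pages mix in HTML markup, spaces, and digits, which are single bytes in UTF-8, so measure your actual content before deciding.

```python
def utf8_vs_utf16(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes); "utf-16-le" avoids counting a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "English": "Hello, world",
    "Chinese": "你好，世界",
    "Hindi": "नमस्ते दुनिया",
}

for language, text in samples.items():
    utf8_size, utf16_size = utf8_vs_utf16(text)
    print(f"{language}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

The Hindi sample includes a space, which costs one byte in UTF-8 but two in UTF-16, so mixed real-world text narrows the gap.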

I’ll never forget when I had to really dig into this difference during a client project. We were pulling text data from an older Windows system, and half the characters were coming through as unreadable boxes. At first, I thought the file was corrupted. In actuality, the source was exporting everything in UTF-16, but our site was expecting UTF-8. That was a long day, but I learned a very valuable lesson about the difference between the two and how it can mess with how content appears if not set up correctly.
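Files exported as UTF-16 often start with a byte order mark (BOM), two bytes that also reveal the byte order. This minimal sketch checks for one and converts the text to UTF-8. It’s a hypothetical helper that assumes the export includes a BOM, which not every system writes, and it treats anything else as UTF-8 already.

```python
import codecs

def to_utf8(raw: bytes) -> bytes:
    """Convert UTF-16 input that starts with a byte order mark (BOM) to UTF-8.

    Input without a UTF-16 BOM is returned unchanged, on the assumption that it
    is already UTF-8. (A UTF-32 little-endian BOM also begins with FF FE; this
    sketch does not handle that case.)
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # reads the BOM to pick the byte order, then drops it
        return text.encode("utf-8")
    return raw

exported = "Año nuevo".encode("utf-16")  # Python's "utf-16" codec writes a BOM
print(to_utf8(exported).decode("utf-8"))  # Año nuevo
```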

UTF-8 vs. UTF-16 Character Table

UTF-8 and UTF-16 use different algorithms to map code points to binary strings. As shown in the character table below, the binary output for a given character looks different in each:

CHARACTER	UTF-8 BINARY ENCODING	UTF-16 BINARY ENCODING (LITTLE-ENDIAN BYTE ORDER)
A	01000001	01000001 00000000
𠜎	11110000 10100000 10011100 10001110	01000001 11011000 00001110 11011111
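UTF-16 stores each 16-bit unit as two bytes, and systems disagree about which byte comes first (the byte order, or “endianness”). The UTF-16 column for 𠜎 above uses little-endian order, where the low byte of each unit comes first. This sketch prints both orders using Python’s `utf-16-le` and `utf-16-be` codecs, which don’t add a byte order mark; the helper name is my own.

```python
def binary(data: bytes) -> str:
    """Show bytes as space-separated 8-bit patterns."""
    return " ".join(format(byte, "08b") for byte in data)

for ch in ["A", "𠜎"]:
    print(ch)
    print("  UTF-8:    ", binary(ch.encode("utf-8")))
    print("  UTF-16 LE:", binary(ch.encode("utf-16-le")))
    print("  UTF-16 BE:", binary(ch.encode("utf-16-be")))
```

Because 𠜎 (U+2070E) lies beyond U+FFFF, UTF-16 stores it as a “surrogate pair”: two 16-bit units, four bytes in total.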

When should I use UTF-8?

For most websites, UTF-8 is the clear choice. It keeps memory use low by storing common characters, like English letters and numbers, in a single byte. By comparison, UTF-16 needs two bytes for each of those same characters. That means an English text file saved in UTF-16 would be about twice the size of one saved in UTF-8.

Another advantage is backward compatibility. The first 128 characters in Unicode line up exactly with ASCII, and UTF-8 encodes them as the very same single bytes ASCII uses. Because of that, any ASCII file is also a valid UTF-8 file, so UTF-8 can read and display older ASCII content without breaking, making it an easy fit for the modern web while still honoring the systems that came before it.
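Because of that overlap, bytes written by an old ASCII-only system decode to the same text whether you read them as ASCII or as UTF-8. This Python sketch checks all 128 ASCII values:

```python
ascii_bytes = bytes(range(128))  # every possible ASCII value, 0 through 127

as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")

print(as_ascii == as_utf8)                  # True
print("Hello".encode("utf-8") == b"Hello")  # True: same single bytes
```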

When should I use UTF-16?

UTF-16 makes sense in a smaller set of cases, mainly for sites or systems that use languages filled with non-Latin characters. For scripts like Chinese or Hindi, UTF-8 typically needs three bytes per character, while UTF-16 gets away with just two. That difference can make UTF-16 more efficient for text where those multi-byte characters are the norm.

For the vast majority of websites though, especially those centered on English or other Latin-based languages, UTF-8 is still the most practical and efficient option.

Pro tip: If your pages are filled with ABCs and 123s, I’d recommend sticking with UTF-8.

Here’s my summary of why and how UTF-8 encoding is important.

The more I’ve learned about UTF-8, the more I see it as one of those invisible details that quietly makes the Internet feel seamless. Most of the time, I don’t even think about it while I’m coding — until something breaks. A garbled character or a bloated file size is usually my reminder that encoding isn’t set up correctly.

That’s why I make it a habit to confirm my projects are using UTF-8 from the start. It doesn’t take long, but it saves me from headaches later and ensures that my work is accessible to anyone, anywhere. I think of UTF-8 as one of those quiet essentials: You don’t notice it when it’s right, but you’ll definitely notice when it’s wrong.

Editor's note: This post was originally published in August 2020 and has been updated for comprehensiveness.
