
Introduction
Most developers use strings every day without thinking too much about how text is represented, how computers interpret it, or how it is stored in memory. For many everyday tasks, that is completely fine.
But once you start working heavily with text, dealing with large strings, complex Unicode input, or performance-critical code, those details begin to matter.
For a long time, text was simple. ASCII defined a small set of characters, each represented by a single byte. Counting characters was trivial. Indexing was straightforward. One byte meant one character.
That model no longer holds.
In Swift, String APIs often raise questions like:
- Why is count not constant time?
- Why cannot strings be indexed by integers?
- Why do characters behave differently than expected?
These behaviors can be surprising at first, but they are not accidental. They reflect the reality of modern text.
Unicode expanded text far beyond ASCII. What users perceive as a single character may consist of multiple Unicode scalars and multiple encoded bytes. Supporting this correctly has deep implications for how strings are represented and manipulated.
In this article, we will start with ASCII to establish the original mental model, then move to Unicode and encodings, and finally explore how these foundations shape Swift’s String API. We will connect the theory to concrete behaviors such as string views, character counting, and index-based traversal.
The goal is to replace confusion with understanding. Once the underlying model is clear, Swift’s String APIs stop feeling complex and start feeling deliberate.
Key Insights You’ll Learn
- Unicode defines what characters are, but not how they are stored in memory.
- Encodings like UTF-8, UTF-16, and UTF-32 define how Unicode code points are represented as bytes.
- Swift strings are Unicode-correct by design and prioritize user-visible characters over raw bytes.
- A Swift Character is a grapheme cluster, not a byte, and not a Unicode scalar.
- The same string can be viewed as UTF-8 code units, Unicode scalars, or characters, depending on the task.
- String.count is an O(n) operation, because counting characters requires traversing grapheme clusters.
- Swift uses String.Index instead of integers to make correct text traversal explicit and safe.
ASCII
ASCII was one of the first standards for representing text in computers. It was introduced in the 1960s and defines 128 characters. These include English letters, digits, punctuation, and a few control characters.
ASCII uses values from 0 to 127 and fits into a single byte.
Examples:
- A → 65, Z → 90
- a → 97, z → 122
- 0 → 48, 9 → 57
This worked well for English, but it stopped there. There was no room for non-Latin alphabets, accented letters, or symbols used in other languages. As computers became global, this limitation became a real problem.
Unicode
Unicode was introduced in 1991 to solve the limitations of ASCII. Instead of focusing on bytes, Unicode defines characters. Each character is assigned a unique number called a code point.
Unicode answers one question: what character is this? It does not define how characters are stored in memory.
A key design decision was compatibility. That’s why the first 128 Unicode code points are exactly the same as ASCII.
Because of this:
- ASCII text is valid Unicode text.
- Unicode could replace ASCII as the global standard without breaking existing systems.
Examples (code points are written in hexadecimal):
- Latin: A → U+0041, a → U+0061, Z → U+005A, z → U+007A
- Cyrillic: Б → U+0411, Г → U+0413, З → U+0417, Я → U+042F
- Chinese: 你 → U+4F60, 好 → U+597D, 中 → U+4E2D, 文 → U+6587
- Symbols: @ → U+0040, # → U+0023, $ → U+0024, % → U+0025, & → U+0026
- Emojis: 🙂 → U+1F642, 😀 → U+1F600, 🚀 → U+1F680, 🔥 → U+1F525
Unicode also allows a single visible character to be composed of multiple code points. For example, e + ◌́ = é.
This character can be represented in two ways:
- As a single code point: U+00E9 (é).
- As two code points: U+0065 (e) and U+0301 (combining acute accent).
Both forms look identical on screen, but they are different sequences internally. This property is fundamental to modern text processing and explains why strings are more complex than they first appear.
Encodings
So far, we talked about characters and code points. Now we need to answer a different question: how are these code points stored as bytes? That is what encodings are for.
An encoding defines how Unicode code points are represented in memory or on disk. Unicode defines what a character is. Encodings define how that character is stored.
The most common Unicode encodings are UTF-8, UTF-16, and UTF-32. They all represent the same Unicode code points. The difference is how many bytes they use and how those bytes are arranged.
- UTF-8: variable-length encoding that uses 1 to 4 bytes per code point. Smaller code point values use fewer bytes. ASCII characters fit into a single byte.
- UTF-16: variable-length encoding based on 16-bit units. Most code points use 2 bytes. Larger code points use 4 bytes.
- UTF-32: fixed-length encoding. Every code point uses 4 bytes. Simple to reason about, but inefficient in memory usage.
The same character A (U+0041) encoded using different encodings can take a different number of bytes:
- UTF-8 → 1 byte
- UTF-16 → 2 bytes
- UTF-32 → 4 bytes
The character stays the same. Only the byte representation changes.
Encodings use more bytes when the code point value is larger. That’s why different characters take different amounts of space. Characters added later to Unicode usually have higher code point values, which is why they tend to take more bytes in UTF-8.
Examples in UTF-8:
- A (U+0041) → 1 byte
- é (U+00E9) → 2 bytes
- 你 (U+4F60) → 3 bytes
- 😀 (U+1F600) → 4 bytes
Why UTF-8 dominates
UTF-8 is the most widely used Unicode encoding today.
- It is backward compatible with ASCII, which made the adoption easy.
- It is compact for common text, especially English and source code.
- It works well with existing byte-oriented systems and protocols.
Because of this, UTF-8 became the default encoding for the web.
This transition is visible on Apple platforms as well. Objective-C String APIs were built around UTF-16, while Swift moved to UTF-8 as the default encoding.
Unicode scalars and Grapheme clusters
So far, we talked about characters, code points, and encodings. Now we need to be more precise about what a “character” actually means.
Unicode separates code points from user-visible characters.
Unicode scalars
A Unicode scalar is a single Unicode code point.
Scalars are fundamental units defined by Unicode and encoded by UTF-8, UTF-16, or UTF-32.
Encodings operate on Unicode scalars. They convert scalar values into bytes and back.
At this level, text looks simple. Each scalar is just a number that maps directly to bytes.
Why scalars are not enough
The problem is that users do not interact with scalars.
A single user-visible character may consist of multiple Unicode scalars.
For example:
- The flag emoji 🇺🇸 looks like one character, but it is composed of two scalars: 🇺 (U+1F1FA) + 🇸 (U+1F1F8).
- The family emoji 🧑🧑🧒🧒 looks like one character, but it is composed of seven scalars: 🧑 (U+1F9D1) + zero-width joiner (U+200D) + 🧑 (U+1F9D1) + zero-width joiner (U+200D) + 🧒 (U+1F9D2) + zero-width joiner (U+200D) + 🧒 (U+1F9D2).
If scalars were treated as characters, many basic text operations would no longer match user expectations.
Unicode scalars are precise, but they are too low-level to model characters as people perceive them.
Grapheme clusters
To bridge this gap, Unicode defines grapheme clusters.
A grapheme cluster represents a single user-visible character. It may consist of one Unicode scalar or multiple Unicode scalars combined together.
From the user’s perspective, each of the following is one character: “A”, “é”, “你”, “🇺🇸”, “ 🧑🧑🧒”.
Even though internally, they may contain very different scalar sequences.
This distinction explains several properties of modern text processing:
- Counting characters is not trivial.
- Indexing into strings is not straightforward.
- A “character” does not have a fixed size in memory.
This complexity is not specific to any programming language. It is a direct consequence of Unicode’s design.
How Swift models text
Swift’s String is designed to be Unicode-correct by default. Swift does not treat text as bytes or code points. It models text the way users perceive it.
A String in Swift is a collection of characters. A Character represents a grapheme cluster, which is a single user-visible character.
The choice favors correctness over convenience and directly shapes how strings behave in Swift.
The three views of String
A Swift String can be viewed at different levels of abstraction. Each view represents the same text, but it iterates over a different unit.
Swift exposes this through three main views:
- UTF-8 View (String.UTF8View)
- Unicode Scalar View (String.UnicodeScalarView)
- Character View (String) with elements of type Character
Here is how to access them:
let greeting = "Hello, world! 👋"
let utf8: String.UTF8View = greeting.utf8
let scalars: String.UnicodeScalarView = greeting.unicodeScalars
let characters: [Character] = Array(greeting)
// or
let characters: String = greeting // iterating yields Character
UTF-8 View (String.UTF8View)
This view iterates over the string’s UTF-8 code units. Each element represents a single byte of UTF-8 data (UInt8).
This is the lowest-level view and is useful when working directly with byte-oriented data:
- binary data
- network protocols
- file formats
- performance-sensitive text processing
A single user-visible character may correspond to multiple UTF-8 code units.
Unicode Scalar View (String.UnicodeScalarView)
This view iterates over Unicode scalars, where each element corresponds to a single Unicode code point (Unicode.Scalar).
This view sits between bytes and characters.
It is useful when you need to:
- inspect or transform exact Unicode values
- reason about normalization or scalar-level properties
- process text without grouping into user-visible characters
A single user-visible character may consist of multiple Unicode scalars.
Character View (Character)
Iterating over a String directly yields Character values.
In Swift, a Character represents a grapheme cluster, which corresponds to a single user-visible character.
This is the highest-level view. It prioritizes correctness and user expectations over raw performance.
A grapheme cluster does not have a fixed size. It may consist of multiple scalars and therefore multiple UTF-8 code units.
Why .count is not O(1)
In many languages, getting the length of a string is a constant-time operation. In Swift, String.count is not O(1).
This is not a performance oversight. It is a direct result of how Swift models text.
In Swift, String.count returns the number of characters in the string. A Character represents a grapheme cluster, not a byte, and not a Unicode scalar.
Grapheme clusters do not have a fixed size. A single character may consist of one or more Unicode scalars and multiple UTF-8 code units.
Because of this, Swift cannot know how many characters a string contains without iterating through the string and identifying grapheme cluster boundaries.
As a result, String.count is an O(n) operation.
Consider the same example string used earlier:
let greeting = "Hello, world! 👋"
greeting.count // 15 - Characters
greeting.unicodeScalars.count // 15 - Unicode scalars
greeting.utf8.count // 18 - UTF-8 code units (bytes)
Each view answers a different question. The values may differ, even though the underlying string is the same.
Now consider a more subtle example:
var word = "cafe"
word // cafe
word.count // 4 - Characters
word.unicodeScalars.count // 4 - Unicode scalars
word.utf8.count // 4 - UTF-8 code units (bytes)
// Appending a combining accent at the scalar level
word.unicodeScalars.append("\u{0301}")
word // café
word.count // 4 - Characters
word.unicodeScalars.count // 5 - Unicode scalars
word.utf8.count // 6 - UTF-8 code units (bytes)
Visually, the string still looks like it has 4 characters, and count still returns 4. But an extra Unicode scalar was added, so unicodeScalars.count increases. That scalar is encoded as additional bytes, so utf8.count increases as well.
This example shows why counting characters can not be done by looking at bytes or scalars alone. Swift must scan the string to determine grapheme cluster boundaries.
Swift could make String.count faster by defining it in terms of bytes or scalars. But that would break the guarantee that count reflects what users perceive as characters. Instead, Swift chooses correctness over convenience.
The important takeaway is not to avoid count, but to understand what it does. Once you do, its performance characteristics make sense.
Why String.Index exists
In many languages, strings are indexed by integers. You can access the first character with index 0, the second with 1, and so on.
Swift does not allow this.
This is not a design accident. It is a consequence of how Swift models text.
In Swift, a String is a collection of characters, and a Character represents a grapheme cluster. Grapheme clusters do not have a fixed size and do not occupy a predictable number of bytes.
Because of this, there is no constant-time (O(1)) way to jump to the nth character in a string. The string must be scanned from the beginning to determine where each character boundary lies.
An integer index would suggest that characters are fixed-size and randomly accessible. They are not.
That is why Swift uses String.Index.
A String.Index represents a position between characters in a specific string. It always points to a valid character boundary and is tied to the string it belongs to.
Consider the following example:
let greeting = "Hello, world! 👋"
let firstIndex = greeting.startIndex
let secondIndex = greeting.index(after: greeting.startIndex)
let seventhIndex = greeting.index(greeting.startIndex, offsetBy: 7)
let lastIndex = greeting.index(before: greeting.endIndex)
let firstCharacter = greeting[firstIndex] // "H"
let secondCharacter = greeting[secondIndex] // "e"
let seventhCharacter = greeting[seventhIndex] // "w"
let lastCharacter = greeting[lastIndex] // "👋"
Although index(_:offsetBy:) takes an integer offset, it does not provide random access. Advancing an index by n characters is an O(n) operation. Swift must move through the string one character at a time, determining grapheme cluster boundaries at each step. This is the same reason String.count is an O(n) operation.
It is also important to note that endIndex represents the position after the final character in the string.
Swift provides a small set of index-manipulation APIs for moving forward and backward through a string, as well as measuring distances between indices. All of them operate at the character level and have linear performance characteristics.
Swift could have exposed integer indexing and documented pitfalls. Instead, it chose an API that makes incorrect assumptions impossible. The result is a string model that is Unicode-correct, safe by construction, and explicit about its performance characteristics.
Swift’s string APIs are often seen as complex at first. But that complexity comes from treating text as users see it, not as raw bytes. Once you understand how Unicode, grapheme clusters, and indexing fit together, Swift’s design becomes predictable and intentional.
Thanks for reading! Follow my Medium profile for more practical Swift and iOS content.
Understanding Swift Strings: Unicode, Encoding, and Performance was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.