Day 012: Strings, Runes, Bytes & UTF-8
Month 1 · Week 2
🎯 Learning Objective
Reason about strings as immutable UTF-8 bytes, and convert correctly between bytes, runes, and strings.
📚 Topics
- Strings as immutable UTF-8 byte sequences
- Bytes vs runes · `range` decoding
- `unicode/utf8`, `strings`, `strings.Builder`
📖 Reading / Sources
- Go Blog: Strings, bytes, runes and characters
- pkg.go.dev: `unicode/utf8`
- Learning Go ch.3 (Strings, Runes, Bytes)
📝 Notes
- A string is an immutable sequence of bytes, conventionally UTF-8. You cannot assign `s[i] = ...` → [[immutable-strings]].
- Indexing a string yields a single byte (`uint8`), which may be only part of a multi-byte rune. `len(s)` counts bytes, not characters.
- `range` over a string decodes UTF-8: the index is the byte offset where each rune starts, and the value is a rune (`int32` code point) → [[range-string]].
- `utf8.RuneCountInString(s)` gives the number of code points; `[]rune(s)` gives one element per code point (allocates) and lets you index/slice by code point.
- `[]byte(s)` and `string(b)` convert with a copy (strings are immutable, so the two cannot share mutable storage).
- `string(65)` converts a code point → `"A"`, not `"65"`; use `strconv` for numbers → [[string-int-trap]].
- A "character" the user sees (grapheme) can be multiple runes (combining marks, emoji ZWJ). Rune ≈ code point, not always a visible glyph.
- Build strings in a loop with `strings.Builder` (amortized growth, no allocation per `+`); `+` in a hot loop is O(n²) → [[strings-builder]].
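A minimal sketch of three of the notes above: conversion copies, byte vs rune indexing, and `strings.Builder`. The helper names `replaceFirstByte` and `joinWords` are hypothetical, made up for this illustration; only the standard-library calls are real.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceFirstByte returns a copy of s whose first byte is c.
// Hypothetical helper: it shows that []byte(s) copies, so s is untouched.
func replaceFirstByte(s string, c byte) string {
	if s == "" {
		return s
	}
	b := []byte(s)   // copy of the string's bytes
	b[0] = c         // fine on the copy; s[0] = c would not compile
	return string(b) // converted back into a new immutable string
}

// joinWords concatenates words with sep using strings.Builder,
// so the loop appends into one growing buffer instead of
// building a new string on every iteration.
func joinWords(words []string, sep string) string {
	var sb strings.Builder
	for i, w := range words {
		if i > 0 {
			sb.WriteString(sep)
		}
		sb.WriteString(w)
	}
	return sb.String()
}

func main() {
	s := "héllo"
	fmt.Println(replaceFirstByte(s, 'H'), s)             // Héllo héllo
	fmt.Printf("%#x %c\n", s[1], []rune(s)[1])           // 0xc3 é
	fmt.Println(joinWords([]string{"a", "b", "c"}, "-")) // a-b-c
}
```

`s[1]` is only the first byte of the two-byte encoding of `é`, while `[]rune(s)[1]` is the whole code point. For plain joining, `strings.Join` already exists in the standard library; `joinWords` is spelled out here only to show the Builder pattern.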
💻 Code Examples
```go
s := "héllo, 世界"
fmt.Println(len(s))                    // 14 bytes
fmt.Println(utf8.RuneCountInString(s)) // 9 runes
for i, r := range s {                  // i = byte offset, r = rune
	fmt.Printf("%d:%c ", i, r)
}
```
Full code: `examples/month-01/strings-runes/main.go` · Run: `go run ./examples/month-01/strings-runes`
🏋️ Exercises / Practice
| Exercise | Status | Link |
|---|---|---|
| Reverse a string by rune (+ palindrome) | ✅ | exercises/month-01/week-2/runereverse |
🐛 Mistakes Made
- Reversed a string by bytes → multi-byte runes turned into garbage. Converted to `[]rune` first.
- Used `string(n)` to stringify an int → got a stray character. Switched to `strconv.Itoa`.
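Both mistakes can be reproduced in a few lines. This is a sketch, not the actual exercise solution in `runereverse`: the function names `reverseRunes`, `reverseBytes`, and `isPalindrome` are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// reverseRunes reverses s by code point, keeping multi-byte runes intact.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// reverseBytes is the buggy byte-wise version, kept only for contrast.
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// isPalindrome reports whether s reads the same in both directions,
// compared code point by code point.
func isPalindrome(s string) bool {
	return s == reverseRunes(s)
}

func main() {
	fmt.Println(reverseRunes("héllo"))                   // olléh
	fmt.Println(utf8.ValidString(reverseBytes("héllo"))) // false
	fmt.Println(isPalindrome("世界世"))                     // true

	n := 65
	fmt.Println(string(rune(n)), strconv.Itoa(n)) // A 65
}
```

Byte-wise reversal swaps the two bytes of `é`, so the result is no longer valid UTF-8. Writing `string(rune(n))` makes the "code point, not digits" meaning explicit when that conversion is really what is wanted.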
❓ Open Questions
- How do I count user-perceived characters (graphemes)? (Needs grapheme segmentation, outside the stdlib.)
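The standard library cannot answer this question, but it can show why rune counting is not enough. The sketch below compares two spellings of what renders as one `é`; the helper `counts` is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the code point count of s.
// Neither number is a grapheme count.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combining := "e\u0301"  // e followed by COMBINING ACUTE ACCENT

	fmt.Println(counts(precomposed))      // 2 1
	fmt.Println(counts(combining))        // 3 2
	fmt.Println(precomposed == combining) // false
}
```

Both strings display as one character, yet they differ in bytes, in runes, and under `==`. Counting what the user sees needs grapheme segmentation, and treating the two spellings as equal needs normalization; both live outside the stdlib.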
🧠 Active Recall (answer without looking)
1. Q: What does indexing `s[i]` return, and what does `range s` yield?
   A: `s[i]` is a single byte; `range s` yields the byte offset and the decoded rune (code point).
2. Q: Why is `string(72)` equal to `"H"` and not `"72"`?
   A: Converting an integer to string interprets it as a Unicode code point (72 = 'H'); use `strconv.Itoa` for the digits.
🪶 Feynman Reflection
A Go string is a read-only ribbon of bytes. Most letters take one byte, but accented and CJK characters take several. Indexing reads one byte off the ribbon; range is smart enough to read whole characters (runes) and tell me where each begins.
🕳️ Knowledge Gaps
- Normalization (NFC/NFD) and `golang.org/x/text`: note as future, non-stdlib.
✅ Summary
I understand bytes vs runes, can count and reverse strings correctly under UTF-8, and use strings.Builder for efficient concatenation.
⏭️ Next Steps / Prep for Tomorrow
- Day 013: structs and struct tags.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜ |
Suggested commit: feat(examples): strings, runes, bytes, utf-8 (day 012)