Day 012 — Strings, Runes, Bytes & UTF-8

Month 1 · Week 2

🎯 Learning Objective

Reason about strings as immutable UTF-8 bytes, and convert correctly between bytes, runes, and strings.

📚 Topics

Strings as immutable UTF-8 byte sequences
Bytes vs runes · range decoding
unicode/utf8, strings, strings.Builder

📖 Reading / Sources

Go Blog — Strings, bytes, runes and characters
pkg.go.dev — unicode/utf8
Learning Go ch.3 (Strings, Runes, Bytes)

📝 Notes

A string is an immutable sequence of bytes, conventionally UTF-8. You cannot assign s[i] = ... → [[immutable-strings]].
Indexing a string yields a single byte (uint8), which may be only part of a multi-byte rune. len(s) counts bytes, not characters.
range over a string decodes UTF-8: the index is the byte offset where each rune starts, and the value is a rune (int32 code point) → [[range-string]].
utf8.RuneCountInString(s) gives the number of code points; []rune(s) gives one element per code point (allocates) and lets you index/slice by character.
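[]byte(s) and string(b) each make a copy, because the string's bytes must stay immutable. string(65) treats the integer as a code point and yields "A", not "65"; go vet flags string(n) for an int variable n, so write string(rune(n)) when a code point is intended and use strconv for numbers → [[string-int-trap]].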
A "character" the user sees (grapheme) can be multiple runes (combining marks, emoji ZWJ). Rune ≈ code point, not always a visible glyph.
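Build strings in a loop with strings.Builder, which appends into a growing buffer (amortized O(1) per write) instead of allocating a new string on every +; repeated + in a hot loop copies O(n²) bytes in total → [[strings-builder]].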
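The indexing and conversion notes above can be checked with a small sketch. The helper names byteAndRuneAt and mutateCopy are made up for illustration and are not part of the journal's example code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneAt (hypothetical helper) returns the byte at byte index i
// and the rune at rune index i, to contrast the two kinds of indexing.
func byteAndRuneAt(s string, i int) (byte, rune) {
	return s[i], []rune(s)[i] // []rune(s) allocates a new slice
}

// mutateCopy (hypothetical helper) edits a []byte copy of s and returns
// the untouched original alongside the changed copy.
func mutateCopy(s string) (original, changed string) {
	b := []byte(s) // copy of the string's bytes
	b[0] = 'J'     // legal: b is a mutable slice, s is not affected
	return s, string(b)
}

func main() {
	s := "héllo"
	b, r := byteAndRuneAt(s, 1)
	fmt.Printf("s[1] = %#x, []rune(s)[1] = %c\n", b, r) // 0xc3 is only the first byte of é
	fmt.Println(len(s), utf8.RuneCountInString(s))      // 6 5

	orig, changed := mutateCopy(s)
	fmt.Println(orig, changed) // héllo Jéllo
}
```

Writing s[0] = 'J' directly would not compile, which is why the edit goes through a []byte copy.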

💻 Code Examples

// Excerpt; needs imports "fmt" and "unicode/utf8".
s := "héllo, 世界"
fmt.Println(len(s))                    // 14 bytes
fmt.Println(utf8.RuneCountInString(s)) // 9 runes
for i, r := range s {                  // i = byte offset, r = rune
    fmt.Printf("%d:%c ", i, r)
}
// prints: 0:h 1:é 3:l 4:l 5:o 6:, 7:  8:世 11:界

Full code: examples/month-01/strings-runes/main.go · Run: go run ./examples/month-01/strings-runes
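The strings.Builder note from the Notes section can be sketched as follows. joinRepeated is a hypothetical helper invented for this example, and the Grow call is an optional pre-sizing hint rather than something the notes require.

```go
package main

import (
	"fmt"
	"strings"
)

// joinRepeated (hypothetical helper) writes n copies of word separated
// by sep into a single Builder instead of concatenating with +.
func joinRepeated(word, sep string, n int) string {
	if n <= 0 {
		return ""
	}
	var sb strings.Builder
	sb.Grow(n * (len(word) + len(sep))) // optional: reserve capacity up front
	for i := 0; i < n; i++ {
		if i > 0 {
			sb.WriteString(sep)
		}
		sb.WriteString(word)
	}
	return sb.String()
}

func main() {
	fmt.Println(joinRepeated("世界", "-", 3)) // 世界-世界-世界
}
```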

🏋️ Exercises / Practice

Exercise	Status	Link
Reverse a string by rune (+ palindrome)	✅	exercises/month-01/week-2/runereverse
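One possible shape for this exercise is sketched below. It is not the solution stored under exercises/month-01/week-2/runereverse; the function names are assumptions, and the palindrome check is deliberately naive (case-sensitive, no normalization, no skipping of spaces or punctuation).

```go
package main

import "fmt"

// reverseRunes reverses s code point by code point, so multi-byte
// runes stay intact.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// isPalindrome reports whether s reads the same forwards and backwards
// when compared rune by rune.
func isPalindrome(s string) bool {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		if r[i] != r[j] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(reverseRunes("héllo, 世界"))  // 界世 ,olléh
	fmt.Println(isPalindrome("\u00e9t\u00e9")) // true (été with precomposed é)
	fmt.Println(isPalindrome("世界"))          // false
}
```

Reversing by rune still splits graphemes that span several code points, such as a letter followed by a combining mark.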

🐛 Mistakes Made

Reversed a string by bytes → multi-byte runes turned into garbage. Converted to []rune first.
Used string(n) to stringify an int → got a stray character. Switched to strconv.Itoa.
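The second mistake is easy to reproduce in a few lines. The sketch below contrasts the two conversions; asDigits and asCodePoint are illustrative names only.

```go
package main

import (
	"fmt"
	"strconv"
)

// asDigits formats n as its decimal digits.
func asDigits(n int) string {
	return strconv.Itoa(n)
}

// asCodePoint interprets n as a Unicode code point. The explicit rune
// conversion makes the intent clear and keeps go vet quiet.
func asCodePoint(n int) string {
	return string(rune(n))
}

func main() {
	n := 72
	fmt.Println(asDigits(n))    // 72
	fmt.Println(asCodePoint(n)) // H
}
```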

❓ Open Questions

How do I count user-perceived characters (graphemes)? (Needs grapheme segmentation, outside the stdlib.)
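A stdlib-only sketch shows why rune counts do not answer this question: the same visible "é" can be one code point or two. The counts helper is hypothetical, and the example only demonstrates the gap; it does not perform grapheme segmentation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts (hypothetical helper) returns the byte length and rune count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combining := "e\u0301"  // e followed by COMBINING ACUTE ACCENT

	b1, r1 := counts(precomposed)
	b2, r2 := counts(combining)
	fmt.Println(b1, r1)                   // 2 1
	fmt.Println(b2, r2)                   // 3 2
	fmt.Println(precomposed == combining) // false: different byte sequences
}
```

Both strings render as one user-perceived character, yet neither len nor utf8.RuneCountInString reports 1 for the combining form.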

🧠 Active Recall (answer without looking)

1. Q: What does indexing s[i] return, and what does range s yield?
   A: s[i] is a single byte; range s yields the byte offset and the decoded rune (code point).
2. Q: Why is string(72) equal to "H" and not "72"?
   A: Converting an integer to string interprets it as a Unicode code point (72 = 'H'); use strconv.Itoa for the digits.

🪶 Feynman Reflection

A Go string is a read-only ribbon of bytes. Most letters take one byte, but accented and CJK characters take several. Indexing reads one byte off the ribbon; range is smart enough to read whole characters (runes) and tell me where each begins.
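To make "tell me where each begins" concrete, here is a rough hand-written equivalent of what range does, using utf8.DecodeRuneInString. runeStarts is a name invented for this sketch; in real code range is the simpler choice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts (hypothetical helper) walks s one rune at a time and
// records the byte offset where each rune begins.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		// size is the number of bytes the rune at s[i:] occupies
		// (1 for invalid UTF-8, so the loop always advances).
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("héllo, 世界")) // [0 1 3 4 5 6 7 8 11]
}
```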

🕳️ Knowledge Gaps

Normalization (NFC/NFD) and golang.org/x/text: parked for a later day; neither is part of the stdlib.

✅ Summary

I understand bytes vs runes, can count and reverse strings correctly under UTF-8, and use strings.Builder for efficient concatenation.

⏭️ Next Steps / Prep for Tomorrow

Day 013: structs and struct tags.

Time spent	Difficulty	Confidence
90 min	🟦🟦⬜⬜⬜	🟦🟦🟦⬜⬜

Suggested commit: feat(examples): strings, runes, bytes, utf-8 (day 012)