
Day 012 — Strings, Runes, Bytes & UTF-8


🎯 Learning Objective

Reason about strings as immutable UTF-8 bytes, and convert correctly between bytes, runes, and strings.

📚 Topics

  • Strings as immutable UTF-8 byte sequences
  • Bytes vs runes · range decoding
  • unicode/utf8, strings, strings.Builder

📖 Reading / Sources

📝 Notes

  • A string is an immutable sequence of bytes, conventionally UTF-8. You cannot assign s[i] = ... → [[immutable-strings]].
  • Indexing a string yields a single byte (uint8), which may be only part of a multi-byte rune. len(s) counts bytes, not characters.
  • range over a string decodes UTF-8: the index is the byte offset where each rune starts, and the value is a rune (int32 code point) → [[range-string]].
  • utf8.RuneCountInString(s) gives the number of code points; []rune(s) gives one element per code point (allocates) and lets you index/slice by code point.
  • []byte(s) and string(b) convert with a copy (strings are immutable). string(65) converts a code point → "A", not "65" — use strconv for numbers → [[string-int-trap]].
  • A "character" the user sees (grapheme) can be multiple runes (combining marks, emoji ZWJ). Rune ≈ code point, not always a visible glyph.
  • Build strings in a loop with strings.Builder (amortized growth, no new string per +); repeated + in a hot loop copies the accumulated string each time, so it is O(n²) overall → [[strings-builder]].
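A minimal sketch of the conversion and Builder notes above, using only the standard library. The helper names joinNumbers and upperFirstByte are hypothetical, chosen for illustration, and are not taken from the linked example code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinNumbers (hypothetical helper) builds "0,1,2,..." with a
// strings.Builder instead of repeated + concatenation.
func joinNumbers(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(strconv.Itoa(i)) // digits of i, not a code point
	}
	return b.String()
}

// upperFirstByte (hypothetical helper) shows the copy-modify-convert
// pattern: the string cannot be changed in place, so edit a []byte copy.
// It only handles an ASCII first letter.
func upperFirstByte(s string) string {
	b := []byte(s) // copy; s itself is untouched
	if len(b) > 0 && b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // converted back to a string
}

func main() {
	s := "hello"
	fmt.Println(upperFirstByte(s), s) // Hello hello

	// Code point vs digits. go vet flags string(n) for an int n,
	// so the code point conversion is spelled string(rune(n)).
	fmt.Println(string(rune(65)), strconv.Itoa(65)) // A 65

	fmt.Println(joinNumbers(5)) // 0,1,2,3,4
}
```

If the final size is roughly known, calling b.Grow(n) on the Builder before the loop may avoid some intermediate reallocations.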

💻 Code Examples

s := "héllo, 世界"
fmt.Println(len(s))                    // 14 bytes
fmt.Println(utf8.RuneCountInString(s)) // 9 runes
for i, r := range s {                  // i = byte offset, r = rune
    fmt.Printf("%d:%c ", i, r)
}

Full code: examples/month-01/strings-runes/main.go · Run: go run ./examples/month-01/strings-runes
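To contrast byte indexing with rune-aware access on the same string, here is a small sketch. firstRune and runeAt are hypothetical helpers written for this note; they are not necessarily what examples/month-01/strings-runes contains.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (hypothetical helper) decodes the first rune of s and
// reports how many bytes it occupies.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

// runeAt (hypothetical helper) returns the n-th code point (0-based)
// by walking the string with range, without allocating a []rune.
func runeAt(s string, n int) (rune, bool) {
	for _, r := range s {
		if n == 0 {
			return r, true
		}
		n--
	}
	return 0, false
}

func main() {
	s := "héllo, 世界"

	fmt.Printf("%#x\n", s[1])        // 0xc3: first byte of é, not a character
	fmt.Printf("%c\n", []rune(s)[1]) // é

	r, size := firstRune(s[8:]) // byte offset 8 is where 世 starts
	fmt.Printf("%c %d\n", r, size) // 世 3

	r, ok := runeAt(s, 7)
	fmt.Printf("%c %t\n", r, ok) // 世 true
}
```

runeAt is O(n) per lookup; converting to []rune once is usually the simpler choice when many positions are needed.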

🏋️ Exercises / Practice

Exercise | Status | Link
Reverse a string by rune (+ palindrome) | | exercises/month-01/week-2/runereverse

🐛 Mistakes Made

  • Reversed a string by bytes → multi-byte runes turned into garbage. Converted to []rune first.
  • Used string(n) to stringify an int → got a stray character. Switched to strconv.Itoa.
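The first mistake can be reproduced and fixed in a few lines. This is a sketch under the assumption that reversing by code point is the goal; it is not necessarily the solution in exercises/month-01/week-2/runereverse, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseBytes is the buggy version: it swaps individual bytes,
// which scrambles the bytes of every multi-byte rune.
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// reverseRunes converts to []rune first, so each code point moves as a unit.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// isPalindrome compares s with its rune-wise reverse (case-sensitive,
// no normalization or punctuation handling).
func isPalindrome(s string) bool {
	return s == reverseRunes(s)
}

func main() {
	s := "héllo, 世界"
	fmt.Println(reverseRunes(s))                   // 界世 ,olléh
	fmt.Println(utf8.ValidString(reverseBytes(s))) // false: no longer valid UTF-8
	fmt.Println(isPalindrome("été"))               // true
}
```

Reversing by rune still splits graphemes that span several code points (for example a letter followed by a combining mark), so it is correct per code point rather than per visible character.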

❓ Open Questions

  • How do I count user-perceived characters (graphemes)? (Needs grapheme segmentation, outside the stdlib.)
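The standard library cannot count graphemes, but it can show why rune counts are not enough. In this sketch, sizes is a hypothetical helper, and the sample strings are written with escapes so the exact code points are visible.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes (hypothetical helper) reports the byte length and the
// code point count of s. Neither is a grapheme count.
func sizes(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	composed := "\u00e9"                   // é as one code point: 2 bytes, 1 rune
	decomposed := "e\u0301"                // e + combining acute: 3 bytes, 2 runes
	family := "\U0001F469\u200D\U0001F467" // woman + ZWJ + girl: 11 bytes, 3 runes

	for _, s := range []string{composed, decomposed, family} {
		b, r := sizes(s)
		fmt.Printf("%d bytes, %d runes\n", b, r)
	}

	// Both forms typically render as the same glyph, yet they are
	// different byte sequences.
	fmt.Println(composed == decomposed) // false
}
```

Each of the three strings is typically displayed as a single user-perceived character, while the rune counts are 1, 2, and 3.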

🧠 Active Recall (answer without looking)

  1. Q: What does indexing s[i] return, and what does range s yield?
     A: s[i] is a single byte; range s yields the byte offset and the decoded rune (code point).

  2. Q: Why is string(72) equal to "H" and not "72"?
     A: Converting an integer to string interprets it as a Unicode code point (72 = 'H'); use strconv.Itoa for the digits.

🪶 Feynman Reflection

A Go string is a read-only ribbon of bytes. ASCII letters take one byte each, but accented and CJK characters take several. Indexing reads one byte off the ribbon; range is smart enough to read whole code points (runes) and tell me where each begins.

🕳️ Knowledge Gaps

  • Normalization (NFC/NFD) and golang.org/x/text — note as future, non-stdlib.

✅ Summary

I understand bytes vs runes, can count and reverse strings correctly under UTF-8, and use strings.Builder for efficient concatenation.

⏭️ Next Steps / Prep for Tomorrow

  • Day 013: structs and struct tags.

Time spent | Difficulty | Confidence
90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜

Suggested commit: feat(examples): strings, runes, bytes, utf-8 (day 012)