Day 113 — Protocol Buffers & .proto Files

Month 5 · Week 1

🎯 Learning Objective

Write a proto3 .proto schema fluently and understand exactly how its messages are encoded on the wire so I can reason about compatibility.

📚 Topics

  • proto3 syntax: syntax, package, message, service, scalar types
  • Field numbers, the wire format (varint / length-delimited), and forward/backward compatibility

📖 Reading / Sources

📝 Notes

  • A .proto file is an IDL: a language-neutral contract. protoc turns it into Go (or any language) structs + accessors → [[protobuf]].
  • Field numbers are the identity on the wire; names and declaration order are irrelevant. Numbers 1–15 cost one tag byte; 16–2047 cost two. Assign the most frequently set fields numbers in the 1–15 range.
  • Never reuse or renumber a field. When removing one, retire its number and name with reserved 4, 7; and reserved "old_name"; so nobody resurrects the number with a different type.
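  • proto3 scalar semantics: singular scalars have no explicit presence by default, so a field set to its zero value (0, "", false) is indistinguishable from "unset" and is not serialized. Add the optional keyword to get explicit presence, or use a wrapper/oneof. How presence is exposed depends on the generated API: Go's default generated structs make the field a pointer (nil means unset), while other languages and Go's Opaque API generate Has-style accessors such as HasName().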
  • Wire types (low 3 bits of each tag): 0 VARINT (ints/bool/enum), 1 I64 (fixed64/double), 2 LEN (string/bytes/message/packed), 5 I32 (fixed32/float). tag = (field_number << 3) | wire_type → [[wire-format]].
  • Varints are base-128, little-endian, MSB = continuation bit. Negative int32/int64 always take 10 bytes; use sint32/sint64 (ZigZag) for frequently-negative numbers.
  • Compatibility is structural: unknown fields are skipped using their wire type, so old readers tolerate new fields. int32, int64, uint32, uint64, and bool are wire-compatible with each other (all are plain varints); switching between e.g. int32 and sint32 is not, because sint32 is ZigZag-encoded.
  • A service block declares RPC methods; it generates client/server interfaces but produces no wire bytes itself — covered Day 115.
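The tag, varint, and ZigZag rules in the notes above can be checked with a few lines of Go. This is a minimal sketch using only the standard library; the helper names (appendVarint, zigzag64, tag) are made up for illustration and are not part of any protobuf package.

```go
package main

import "fmt"

// appendVarint appends v as a base-128 varint: 7 payload bits per byte,
// least-significant group first, high bit set on every byte except the last.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// zigzag64 maps signed values to unsigned ones so that small magnitudes
// stay small: 0->0, -1->1, 1->2, -2->3, ...
func zigzag64(n int64) uint64 {
	return uint64(n<<1) ^ uint64(n>>63)
}

// tag packs a field number and a 3-bit wire type into the key that
// precedes every value on the wire.
func tag(fieldNumber, wireType uint64) uint64 {
	return fieldNumber<<3 | wireType
}

func main() {
	fmt.Printf("varint(150)    = % x\n", appendVarint(nil, 150))
	fmt.Printf("tag(15, LEN)   = % x\n", appendVarint(nil, tag(15, 2)))
	fmt.Printf("tag(16, LEN)   = % x\n", appendVarint(nil, tag(16, 2)))

	var n int32 = -1
	// int32 is sign-extended to 64 bits before being written as a varint.
	fmt.Println("int32 -1 bytes: ", len(appendVarint(nil, uint64(int64(n)))))
	// sint32 applies ZigZag first, so -1 becomes 1.
	fmt.Println("sint32 -1 bytes:", len(appendVarint(nil, zigzag64(int64(n)))))
}
```

Running it shows 150 encoded as 96 01, a one-byte tag for field 15 (7a) versus a two-byte tag for field 16 (82 01), and -1 costing 10 bytes as int32 but 1 byte as sint32.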

💻 Code Examples

syntax = "proto3";

package user.v1;
option go_package = "github.com/nabin747/go-from-zero/gen/userv1;userv1";

// A user record. Field numbers — not names — are the wire identity.
message User {
  int32  id    = 1;            // VARINT
  string name  = 2;            // LEN
  string email = 3;            // LEN
  optional string nickname = 4; // explicit presence: unset is distinguishable from ""
  repeated string roles = 5;   // one LEN record per element (strings are never packed)
  reserved 6, 7;               // never reuse retired numbers
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
}

message GetUserRequest { int32 id = 1; }

Wire-format mechanics (varint, zigzag, tags, length-delimited fields) rebuilt with the stdlib: examples/month-05/protowire/main.go · Run: go run ./examples/month-05/protowire
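To make the schema above concrete, here is a rough, independent sketch (not the repository's protowire example) that hand-encodes the id, name, and roles fields of User. The function names are hypothetical, and a real program would use generated code plus a protobuf runtime instead.

```go
package main

import "fmt"

// appendVarint appends v as a base-128 varint.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendTag writes the field key: (field_number << 3) | wire_type.
func appendTag(b []byte, field, wireType uint64) []byte {
	return appendVarint(b, field<<3|wireType)
}

// appendString writes a LEN field: tag, byte length as a varint, raw bytes.
func appendString(b []byte, field uint64, s string) []byte {
	b = appendTag(b, field, 2)
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// encodeUser hand-encodes fields 1 (id), 2 (name), and 5 (roles) of User.
// Zero-valued id and name are skipped, mirroring proto3 implicit presence.
func encodeUser(id int32, name string, roles []string) []byte {
	var b []byte
	if id != 0 {
		b = appendTag(b, 1, 0)
		b = appendVarint(b, uint64(int64(id))) // int32 is sign-extended to 64 bits
	}
	if name != "" {
		b = appendString(b, 2, name)
	}
	for _, r := range roles { // repeated string: one LEN record per element
		b = appendString(b, 5, r)
	}
	return b
}

func main() {
	fmt.Printf("% x\n", encodeUser(150, "Ann", []string{"admin"}))
}
```

The output reads as three records: 08 96 01 (field 1, VARINT, 150), 12 03 41 6e 6e (field 2, LEN, "Ann"), and 2a 05 61 64 6d 69 6e (field 5, LEN, "admin").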

🏋️ Exercises / Practice

Exercise | Status | Link
Implement protobuf varints + zigzag by hand | | exercises/month-05/week-1/varint

🐛 Mistakes Made

  • Assumed that a field declared as string name = 2; and set to "" would still appear on the wire, but proto3 omits zero-valued singular scalars. Reached for optional to get presence.
  • Renumbered a field while editing the schema, breaking an old client. Lesson: only ever add new numbers, and mark retired ones as reserved.
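The first mistake is easy to reproduce in miniature. The sketch below models implicit presence with a plain string and explicit presence with a *string; both encoders are hypothetical stand-ins, written under the assumption that the tag and the length each fit in a single byte.

```go
package main

import "fmt"

// appendString writes a LEN field whose tag and length each fit in one byte
// (field number <= 15, len(s) < 128), which is enough for this sketch.
func appendString(b []byte, field byte, s string) []byte {
	b = append(b, field<<3|2, byte(len(s)))
	return append(b, s...)
}

// encodeName models `string name = 2;` (implicit presence):
// the empty string is treated as unset and never written.
func encodeName(name string) []byte {
	if name == "" {
		return nil
	}
	return appendString(nil, 2, name)
}

// encodeNickname models `optional string nickname = 4;` (explicit presence):
// nil means unset, while a pointer to "" is still written as an empty field.
func encodeNickname(nick *string) []byte {
	if nick == nil {
		return nil
	}
	return appendString(nil, 4, *nick)
}

func main() {
	empty := ""
	fmt.Printf("name \"\"           -> %d bytes\n", len(encodeName("")))
	fmt.Printf("nickname unset    -> %d bytes\n", len(encodeNickname(nil)))
	fmt.Printf("nickname set \"\"   -> % x\n", encodeNickname(&empty))
}
```

Only the last case produces bytes (22 00: the tag for field 4 with wire type LEN, followed by a zero length), which is what lets a reader tell "set to empty" apart from "unset".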

❓ Open Questions

  • When is a oneof better than several optional fields for modeling presence? (Leaning: oneof when the fields are mutually exclusive and you want a single set field.)

🧠 Active Recall (answer without looking)

  1. Q: On the wire, what identifies a field: its name or its number?
     A: Its number. Names exist only in the .proto; the encoded bytes carry (field_number << 3) | wire_type. That's why renaming a field is safe but renumbering is not.

  2. Q: In proto3, why might a field you set never show up in the serialized bytes?
     A: Singular scalars have no explicit presence: if the value equals the zero value (0/""/false) it is treated as unset and skipped. Use optional (or a message/oneof) when you must distinguish "set to zero" from "unset".

🪶 Feynman Reflection

A .proto file is a contract written once and compiled into every language. On the wire there are no field names, just numbered (tag, value) pairs where the tag packs the field number and a 3-bit type. Because readers skip unknown numbers by type, you can grow a schema forever — as long as you never reuse a number.
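The "skip unknown numbers by type" idea can be sketched as a tiny reader. It assumes an old client that only knows fields 1 to 3 and is handed a message that also carries a newer field 9; the function names are illustrative, and the deprecated group wire types (3 and 4) are simply rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// readVarint decodes one varint; it returns the value and the number of
// bytes consumed, or 0 bytes if the input is truncated or too long.
func readVarint(b []byte) (uint64, int) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		v |= uint64(b[i]&0x7f) << (7 * uint(i))
		if b[i] < 0x80 {
			return v, i + 1
		}
	}
	return 0, 0
}

// valueSize reports how many bytes a value occupies, based only on its wire type.
func valueSize(wireType uint64, b []byte) (int, error) {
	switch wireType {
	case 0: // VARINT
		if _, n := readVarint(b); n > 0 {
			return n, nil
		}
	case 1: // I64
		if len(b) >= 8 {
			return 8, nil
		}
	case 2: // LEN: varint length prefix, then that many bytes
		l, n := readVarint(b)
		if n > 0 && uint64(len(b)-n) >= l {
			return n + int(l), nil
		}
	case 5: // I32
		if len(b) >= 4 {
			return 4, nil
		}
	}
	return 0, errors.New("malformed or unsupported field")
}

// knownFields walks a message and returns the field numbers this "old"
// reader recognizes (1..3), stepping over everything else.
func knownFields(b []byte) ([]uint64, error) {
	var seen []uint64
	for len(b) > 0 {
		key, n := readVarint(b)
		if n == 0 {
			return nil, errors.New("bad tag")
		}
		b = b[n:]
		field, wireType := key>>3, key&7
		size, err := valueSize(wireType, b)
		if err != nil {
			return nil, err
		}
		if field >= 1 && field <= 3 {
			seen = append(seen, field)
		}
		b = b[size:] // unknown fields are skipped without being understood
	}
	return seen, nil
}

func main() {
	// id=150, name="Ann", plus field 9 (LEN, "hi") that this reader does not know.
	msg := []byte{0x08, 0x96, 0x01, 0x12, 0x03, 'A', 'n', 'n', 0x4a, 0x02, 'h', 'i'}
	seen, err := knownFields(msg)
	fmt.Println(seen, err)
}
```

The reader reports fields 1 and 2 and steps over field 9 using nothing but its wire type, which is why adding a new field number does not break old clients.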

🕳️ Knowledge Gaps

  • Packed vs unpacked encoding of repeated scalars across proto2/proto3 defaults.
  • Any, oneof, and well-known types (Timestamp, Duration) — revisit when modeling richer messages.

✅ Summary

I can author a proto3 schema, choose field numbers deliberately, reason about presence semantics, and decode the varint/length-delimited wire format by hand.

⏭️ Next Steps / Prep for Tomorrow

  • Day 114: drive protoc/buf to generate Go code from this schema and wire up the toolchain.

Time spent | Difficulty | Confidence
90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜

Suggested commit: docs(journal): protobuf and .proto files (day 113)