Day 113: Protocol Buffers & .proto Files
Month 5 · Week 1
🎯 Learning Objective
Write a proto3 .proto schema fluently and understand exactly how its messages are encoded on the wire so I can reason about compatibility.
📚 Topics
- proto3 syntax: `syntax`, `package`, `message`, `service`, scalar types
- Field numbers, the wire format (varint / length-delimited), and forward/backward compatibility
📖 Reading / Sources
📝 Notes
- A `.proto` file is an IDL: a language-neutral contract. `protoc` turns it into Go (or any language) structs + accessors → [[protobuf]].
- Field numbers are the identity on the wire; names and declaration order are irrelevant. Numbers 1–15 cost one tag byte; 16–2047 cost two. Keep the 1–15 range for hot (frequently set) fields.
- Never reuse or renumber a field. Removing one? `reserved 4, 7; reserved "old_name";` so nobody resurrects the number with a different type.
- proto3 scalar semantics: singular scalars have no explicit presence by default. A field set to its zero value (`0`, `""`, `false`) is indistinguishable from "unset" and is not serialized. Add the `optional` keyword to get presence (in Go: a pointer field in the open-struct API, `HasName()` in the Opaque API), or use a wrapper/oneof.
- Wire types (low 3 bits of each tag): `0` VARINT (ints/bool/enum), `1` I64 (fixed64/double), `2` LEN (string/bytes/message/packed), `5` I32 (fixed32/float). `tag = (field_number << 3) | wire_type` → [[wire-format]].
- Varints are base-128, little-endian, MSB = continuation bit. Negative `int32`/`int64` always take 10 bytes; use `sint32`/`sint64` (ZigZag) for frequently-negative numbers.
- Compatibility is structural: unknown fields are skipped using their wire type, so old readers tolerate new fields. `int32` ↔ `int64` ↔ `uint32` ↔ `bool` are wire-compatible (values that do not fit the narrower type are truncated); switching between e.g. `int32` and `sint32` is not.
- A `service` block declares RPC methods; it generates client/server interfaces but produces no wire bytes itself. Covered Day 115.
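The tag, varint, and ZigZag rules in the notes above can be checked with a few lines of stdlib-only Go. This is a minimal sketch, not the generated protobuf runtime: `appendVarint`, `appendTag`, `encodeInt32`, and `zigzag32` are hypothetical helper names chosen for illustration.

```go
package main

import "fmt"

// Wire types, as listed in the notes.
const (
	wireVarint = 0
	wireI64    = 1
	wireLen    = 2
	wireI32    = 5
)

// appendVarint appends v as a base-128 varint: 7 payload bits per byte,
// least-significant group first, MSB set on every byte except the last.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendTag appends the field key: (field_number << 3) | wire_type.
func appendTag(b []byte, fieldNum, wireType int) []byte {
	return appendVarint(b, uint64(fieldNum)<<3|uint64(wireType))
}

// encodeInt32 encodes an int32 the way plain int32/int64 fields do:
// sign-extended to 64 bits first, so negatives fill all 10 bytes.
func encodeInt32(n int32) []byte {
	return appendVarint(nil, uint64(int64(n)))
}

// zigzag32 maps signed to unsigned so small magnitudes stay small:
// 0→0, -1→1, 1→2, -2→3, ...
func zigzag32(n int32) uint64 {
	return uint64(uint32(n<<1) ^ uint32(n>>31))
}

func main() {
	fmt.Printf("tag(1, VARINT) = % x\n", appendTag(nil, 1, wireVarint)) // 08
	fmt.Printf("tag(15, LEN)   = % x\n", appendTag(nil, 15, wireLen))   // 7a
	fmt.Printf("tag(16, LEN)   = % x\n", appendTag(nil, 16, wireLen))   // 82 01
	fmt.Printf("varint(150)    = % x\n", appendVarint(nil, 150))        // 96 01
	fmt.Println("int32(-1) bytes: ", len(encodeInt32(-1)))              // 10
	fmt.Printf("sint32(-1)     = % x\n", appendVarint(nil, zigzag32(-1))) // 01
}
```

Field 16 is the first number whose key no longer fits in 7 bits, which is where the second tag byte comes from.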
💻 Code Examples
syntax = "proto3";

package user.v1;

option go_package = "github.com/nabin747/go-from-zero/gen/userv1;userv1";

// A user record. Field numbers, not names, are the wire identity.
message User {
  int32 id = 1;                  // VARINT
  string name = 2;               // LEN
  string email = 3;              // LEN
  optional string nickname = 4;  // explicit presence (Go: *string in the open-struct API, HasNickname() in the Opaque API)
  repeated string roles = 5;     // one LEN record per element (strings are never packed)
  reserved 6, 7;                 // never reuse retired numbers
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
}

message GetUserRequest {
  int32 id = 1;
}
Wire-format mechanics (varint, zigzag, tags, length-delimited fields) rebuilt with the stdlib: `examples/month-05/protowire/main.go` · Run: `go run ./examples/month-05/protowire`
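Separately from that file (whose contents are not reproduced here), the following is a minimal sketch of how the first three fields of `User` could be encoded by hand. The `user` struct, `appendString`, and `marshalUser` are hypothetical names for illustration, not `protoc` output; the sketch also shows that zero-valued singular scalars produce no bytes.

```go
package main

import "fmt"

// user mirrors fields 1-3 of the User message.
// Hand-written for illustration; real generated code looks different.
type user struct {
	ID    int32  // field 1, VARINT
	Name  string // field 2, LEN
	Email string // field 3, LEN
}

// appendVarint appends v as a base-128 varint (low 7 bits first).
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendString writes tag + length + bytes for a LEN field.
// An empty string is the proto3 zero value, so nothing is written.
func appendString(b []byte, fieldNum int, s string) []byte {
	if s == "" {
		return b
	}
	b = appendVarint(b, uint64(fieldNum)<<3|2) // wire type 2 = LEN
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// marshalUser encodes the struct in field-number order.
func marshalUser(u user) []byte {
	var b []byte
	if u.ID != 0 { // zero value is skipped, same as for strings
		b = appendVarint(b, 1<<3|0) // field 1, wire type 0 = VARINT
		b = appendVarint(b, uint64(int64(u.ID)))
	}
	b = appendString(b, 2, u.Name)
	b = appendString(b, 3, u.Email)
	return b
}

func main() {
	// Email is left empty, so field 3 never reaches the wire.
	fmt.Printf("% x\n", marshalUser(user{ID: 150, Name: "al"})) // 08 96 01 12 02 61 6c
	// An all-zero message encodes to zero bytes.
	fmt.Println(len(marshalUser(user{}))) // 0
}
```

Reading the output left to right: `08` is the key for field 1 (VARINT), `96 01` is 150, `12` is the key for field 2 (LEN), `02` is the length, and `61 6c` is "al".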
🏋️ Exercises / Practice
| Exercise | Status | Link |
|---|---|---|
| Implement protobuf varints + zigzag by hand | ✅ | exercises/month-05/week-1/varint |
🐛 Mistakes Made
- Assumed a `string name = 2;` set to `""` would still appear on the wire. proto3 omits zero-valued singular scalars. Reached for `optional` to get presence.
- Renumbered a field while editing the schema, breaking an old client. Lesson: only ever add new numbers; mark the old ones `reserved`.
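The renumbering mistake is easy to reproduce with a toy reader. The sketch below assumes a hypothetical "old client" (`parseOldUser`) that only knows `id = 1` and `name = 2` and skips everything else by wire type. It is hand-written for illustration and is not how the generated Go code is structured.

```go
package main

import (
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated message")

// readVarint decodes one varint and returns the value and bytes consumed.
func readVarint(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		v |= uint64(b[i]&0x7f) << (7 * uint(i))
		if b[i] < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errTruncated
}

// oldUser is what the old reader understands, plus a log of skipped fields.
type oldUser struct {
	ID      int32
	Name    string
	Unknown []int // field numbers the reader did not recognize
}

func parseOldUser(b []byte) (oldUser, error) {
	var u oldUser
	for len(b) > 0 {
		key, n, err := readVarint(b)
		if err != nil {
			return u, err
		}
		b = b[n:]
		field, wt := int(key>>3), int(key&7)

		// The wire type alone says how many bytes the value occupies.
		var val uint64
		var payload []byte
		switch wt {
		case 0: // VARINT
			if val, n, err = readVarint(b); err != nil {
				return u, err
			}
			b = b[n:]
		case 1, 5: // I64 / I32: fixed width
			width := 8
			if wt == 5 {
				width = 4
			}
			if len(b) < width {
				return u, errTruncated
			}
			b = b[width:]
		case 2: // LEN
			size, m, err := readVarint(b)
			if err != nil || uint64(len(b)-m) < size {
				return u, errTruncated
			}
			payload, b = b[m:m+int(size)], b[m+int(size):]
		default:
			return u, fmt.Errorf("unsupported wire type %d", wt)
		}

		switch {
		case field == 1 && wt == 0:
			u.ID = int32(val)
		case field == 2 && wt == 2:
			u.Name = string(payload)
		default: // unknown number: already skipped, just record it
			u.Unknown = append(u.Unknown, field)
		}
	}
	return u, nil
}

func main() {
	// New writer added field 9: the old reader still gets id and name.
	added := []byte{0x08, 0x96, 0x01, 0x12, 0x02, 'a', 'l', 0x4a, 0x01, 'x'}
	u, _ := parseOldUser(added)
	fmt.Printf("%d %q %v\n", u.ID, u.Name, u.Unknown) // 150 "al" [9]

	// New writer renumbered name from 2 to 3: the name silently disappears.
	renumbered := []byte{0x08, 0x96, 0x01, 0x1a, 0x02, 'a', 'l'}
	u, _ = parseOldUser(renumbered)
	fmt.Printf("%d %q %v\n", u.ID, u.Name, u.Unknown) // 150 "" [3]
}
```

Adding a field degrades gracefully, while renumbering produces no error at all: the old reader simply sees an empty name, which is why the breakage is easy to miss.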
❓ Open Questions
- When is a `oneof` better than several `optional` fields for modeling presence? (Leaning: `oneof` when the fields are mutually exclusive and at most one should be set at a time.)
🧠 Active Recall (answer without looking)
1. Q: On the wire, what identifies a field: its name or its number?
   A: Its number. Names exist only in the `.proto`; the encoded bytes carry `(field_number << 3) | wire_type`. That's why renaming a field is safe but renumbering is not.
2. Q: In proto3, why might a field you set never show up in the serialized bytes?
   A: Singular scalars have no explicit presence: if the value equals the zero value (`0` / `""` / `false`) it is treated as unset and skipped. Use `optional` (or a message/oneof) when you must distinguish "set to zero" from "unset".
🪶 Feynman Reflection
A .proto file is a contract written once and compiled into every language. On the wire there are no field names, just numbered (tag, value) pairs where the tag packs the field number and a 3-bit type. Because readers skip unknown numbers by type, you can grow a schema forever — as long as you never reuse a number.
🕳️ Knowledge Gaps
- Packed vs unpacked encoding of `repeated` scalars across proto2/proto3 defaults.
- `Any`, `oneof`, and well-known types (`Timestamp`, `Duration`): revisit when modeling richer messages.
✅ Summary
I can author a proto3 schema, choose field numbers deliberately, reason about presence semantics, and decode the varint/length-delimited wire format by hand.
⏭️ Next Steps / Prep for Tomorrow
- Day 114: drive `protoc`/`buf` to generate Go code from this schema and wire up the toolchain.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜ |
Suggested commit: docs(journal): protobuf and .proto files (day 113)