Hashing data structures that contain variable-length fields is error-prone: naive string concatenation loses field boundaries, so distinct inputs such as ("ab", "c") and ("a", "bc") serialize to the same bytes and therefore collide. Three strategies are compared: delimiters (simple, but fragile when a field can contain the delimiter or other special characters), length prefixes (the standard approach in protocols such as Protobuf), and length postfixes (better suited to streaming and nested structures). Mixing strategies is discouraged unless done carefully. Fixed-size integers are recommended for encoding lengths, whereas variable-length encodings such as VarInt add overhead and complexity. Dynatrace's open-source Hash4j library implements the length-postfix strategy in Java, providing methods such as putString, putByteArray, and putOrderedIterable that handle variable-length fields safely and support nested structures.
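
To make the collision and the fixes concrete, here is a minimal sketch in plain Java that serializes a list of string fields four ways: naive concatenation, a single delimiter byte, a fixed-size 4-byte length prefix, and a fixed-size 4-byte length postfix. It uses only JDK classes and is not Hash4j's implementation; the class and method names (FieldEncoding, concat, delimited, lengthPrefixed, lengthPostfixed) are hypothetical and chosen for illustration. Two inputs that serialize to equal bytes collide under any hash function, so comparing the byte arrays is enough to expose the problem.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: four ways to turn variable-length string fields into
// the byte sequence fed to a hash function. Plain JDK only, not Hash4j.
final class FieldEncoding {

    private FieldEncoding() {}

    // Naive concatenation: field boundaries are lost.
    static byte[] concat(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            writeUtf8(out, field);
        }
        return out.toByteArray();
    }

    // Delimiter strategy: one separator byte between fields. Ambiguous as soon as
    // a field itself contains the separator (this sketch does no escaping).
    static byte[] delimited(byte separator, String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.write(separator);
            }
            writeUtf8(out, fields[i]);
        }
        return out.toByteArray();
    }

    // Length-prefix strategy: fixed-size 4-byte big-endian length, then the bytes.
    static byte[] lengthPrefixed(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            writeInt(out, bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // Length-postfix strategy: the bytes, then a fixed-size 4-byte length.
    // The length is only needed after the content, and a decoder can still
    // split the fields unambiguously by reading from the end.
    static byte[] lengthPostfixed(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            writeInt(out, bytes.length);
        }
        return out.toByteArray();
    }

    private static void writeUtf8(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    // Writes a fixed-size (4-byte, big-endian) integer, as opposed to a VarInt.
    private static void writeInt(ByteArrayOutputStream out, int value) {
        byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        // "ab" + "c" and "a" + "bc" both become "abc".
        System.out.println(Arrays.equals(concat("ab", "c"), concat("a", "bc"))); // true
        // "a,b" + "c" and "a" + "b,c" both become "a,b,c".
        System.out.println(Arrays.equals(
                delimited((byte) ',', "a,b", "c"), delimited((byte) ',', "a", "b,c"))); // true
        // Length information keeps the two inputs apart.
        System.out.println(Arrays.equals(lengthPrefixed("ab", "c"), lengthPrefixed("a", "bc"))); // false
        System.out.println(Arrays.equals(lengthPostfixed("ab", "c"), lengthPostfixed("a", "bc"))); // false
    }
}
```

Running main should print true, true, false, false: the first two encodings map different field lists to identical bytes, while both length-based encodings keep them distinct. The postfix variant has the practical advantage that a writer only needs a field's length after emitting its content, which is what makes it convenient for streaming and nested data.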

11-minute read, from dynatrace.com
Table of contents
- A common pitfall: Concatenating variable-length fields
- Strategy 1: Delimiters
- Strategy 2: Length prefixes
- Strategy 3: Length postfixes
- What about mixing strategies?
- Is the length information needed for all variable-length fields?
- How to serialize the length information
- Hash4j’s approach
- Effective serialization strategies for hashing variable-length fields
