BSON

Binary JSON, MongoDB's storage format

BSON (Binary JSON) is the binary-encoded data format MongoDB uses for storage and over-the-wire communication. It keeps JSON’s shape at the API level, with the same object/array hierarchy, but is binary on disk and on the network.

Why a binary format when JSON exists:

  • Type fidelity. JSON has only string, number, boolean, null, object, array. BSON adds Date, Binary, ObjectId, Decimal128, Regex, Timestamp, Int32 vs Int64. Round-trips through BSON preserve the exact type.
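  • Self-describing length headers. Every BSON document, array, string, and binary value is prefixed with its length, and every other type has a fixed size implied by its type code, so a reader can skip over a field without decoding it. JSON text has to be scanned character by character to find where a value ends.
  • Decimal arithmetic. BSON’s Decimal128 type stores exact decimal numbers with up to 34 significant digits, which matters for financial data where binary floating-point doubles cannot represent values such as 0.1 exactly.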
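
To make the length-header point concrete, here is a minimal sketch of a reader that finds one top-level field and skips every other value by its size. It uses only Python's standard library and handles a subset of type codes; the names FIXED_SIZES, value_size, and find_field are hypothetical helpers for illustration, not part of any MongoDB driver.

```python
import struct

# Fixed payload sizes in bytes for a subset of BSON type codes.
FIXED_SIZES = {
    0x01: 8,   # double
    0x07: 12,  # ObjectId
    0x08: 1,   # boolean
    0x09: 8,   # UTC datetime
    0x0A: 0,   # null
    0x10: 4,   # int32
    0x11: 8,   # timestamp
    0x12: 8,   # int64
    0x13: 16,  # Decimal128
}


def value_size(type_code: int, data: bytes, pos: int) -> int:
    """Return how many bytes the value starting at pos occupies."""
    if type_code in FIXED_SIZES:
        return FIXED_SIZES[type_code]
    (length,) = struct.unpack_from("<i", data, pos)
    if type_code == 0x02:           # string: int32 length + bytes (incl. NUL)
        return 4 + length
    if type_code in (0x03, 0x04):   # document / array: length counts itself
        return length
    if type_code == 0x05:           # binary: int32 length + subtype byte + payload
        return 4 + 1 + length
    raise ValueError(f"type 0x{type_code:02x} is not handled in this sketch")


def find_field(data: bytes, wanted: str):
    """Return (type_code, raw_value_bytes) for a top-level field, or None."""
    (total,) = struct.unpack_from("<i", data, 0)
    pos = 4
    while pos < total - 1:          # the last byte is the 0x00 terminator
        type_code = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        size = value_size(type_code, data, pos)
        if name == wanted:
            return type_code, data[pos:pos + size]
        pos += size                 # skip the value without decoding it
    return None


# Hand-built BSON for {"s": "hello", "x": 1}: 25 bytes in total.
doc = (
    struct.pack("<i", 25)
    + b"\x02s\x00" + struct.pack("<i", 6) + b"hello\x00"
    + b"\x10x\x00" + struct.pack("<i", 1)
    + b"\x00"
)
print(find_field(doc, "x"))  # (16, b'\x01\x00\x00\x00')
```

The string value for s is never decoded on the way to x; the reader only reads its 4-byte length and jumps past it.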

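BSON is often larger than the equivalent compact JSON because of the length headers, explicit type tags, and fixed-width numbers, though the difference depends heavily on the data. The benefit is access speed: MongoDB still walks a document’s elements in order to find a field, but it can skip each unwanted value by its length instead of parsing it.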

Most MongoDB drivers (Node, Python, Java, Go) automatically convert between BSON and the language’s native types. You rarely interact with BSON directly unless you’re writing a driver or reading the .bson files that mongodump writes outside MongoDB itself.

The 16 MB document limit: a single BSON document stored in MongoDB cannot exceed 16 megabytes. This is a hard MongoDB-server constraint, not a BSON-spec limit: the format itself uses 32-bit signed lengths and could theoretically reach 2 GB. The cap keeps a single document from consuming excessive RAM or network bandwidth, and it discourages degenerate use cases such as storing entire books or image binaries inside a document. A payload that outgrows the limit needs either GridFS, which splits it into chunks across multiple documents, or a redesign where large blobs live in object storage and the document holds only a reference.
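
The chunking idea behind GridFS can be sketched in a few lines. This is an illustration only, not the driver's GridFS implementation: split_into_chunks and reassemble are hypothetical helpers, and the chunk documents simply borrow the files_id / n / data field names and the 255 KiB default chunk size that GridFS uses. In practice you would use the GridFS support that ships with your driver.

```python
CHUNK_SIZE = 255 * 1024               # GridFS's default chunk size (255 KiB)
MAX_BSON_DOCUMENT = 16 * 1024 * 1024  # server-side cap on a single document


def split_into_chunks(file_id, payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield chunk documents shaped like GridFS chunk records."""
    for n, start in enumerate(range(0, len(payload), chunk_size)):
        yield {
            "files_id": file_id,      # reference to the parent file document
            "n": n,                   # zero-based chunk sequence number
            "data": payload[start:start + chunk_size],
        }


def reassemble(chunks) -> bytes:
    """Concatenate chunk payloads in sequence order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(20 * 1024 * 1024)     # 20 MiB, too large for one document
chunks = list(split_into_chunks("report-1", payload))
print(len(payload) > MAX_BSON_DOCUMENT)  # True
print(len(chunks))                       # 81
print(reassemble(chunks) == payload)     # True
```

Each chunk stays far below the 16 MB cap, so the payload's total size is bounded only by how many chunk documents you are willing to store.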

Why BSON’s ObjectId is 12 bytes, not a UUID: MongoDB’s default _id is a 12-byte ObjectId composed of a 4-byte Unix timestamp in seconds, a 5-byte random value generated once per process (it replaces the older machine-id/PID split), and a 3-byte incrementing counter. Because the timestamp comes first, ObjectIds sort roughly by creation time, so the newest documents land at the end of the _id index; a random UUID v4 has no such ordering. Like a UUID, an ObjectId is generated on the client without any central coordination. The other trade-off is size: 12 bytes vs 16 for a UUID, which over a billion documents saves about 4 GB of raw primary-key storage. Some newer designs use UUID v7, which is also time-ordered, when identifiers need to be shared with other databases.

Related: IEEE 754, BigInt. Reference: BSON Specification.
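
The 4 + 5 + 3 byte layout can be sketched with the standard library. This is an illustrative generator, not the one your driver uses: new_object_id and creation_time are hypothetical names, and the sketch assumes the timestamp and counter are written big-endian so that byte order matches creation order.

```python
import os
import struct
import time
from typing import Optional

_PROCESS_RANDOM = os.urandom(5)                   # generated once per process
_counter = int.from_bytes(os.urandom(3), "big")   # counter starts at a random value


def new_object_id(now: Optional[float] = None) -> bytes:
    """Build a 12-byte ObjectId-style value: timestamp + random + counter."""
    global _counter
    seconds = int(time.time() if now is None else now)
    _counter = (_counter + 1) % (1 << 24)         # the 3-byte counter wraps around
    return (
        struct.pack(">I", seconds)                # 4-byte big-endian Unix timestamp
        + _PROCESS_RANDOM                         # 5-byte per-process random value
        + _counter.to_bytes(3, "big")             # 3-byte incrementing counter
    )


def creation_time(oid: bytes) -> int:
    """Recover the Unix timestamp (seconds) from the first four bytes."""
    (seconds,) = struct.unpack(">I", oid[:4])
    return seconds


older = new_object_id(now=1_700_000_000)
newer = new_object_id(now=1_700_000_001)
print(len(older))             # 12
print(creation_time(older))   # 1700000000
print(older < newer)          # True: byte order follows creation time

# Key-size arithmetic from the text: 4 bytes saved per key, a billion keys.
print((16 - 12) * 1_000_000_000)  # 4000000000 bytes, about 4 GB
```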

Worked example

The JSON document {"x": 1} is 8 bytes as text. The same document in BSON is 0C 00 00 00 10 78 00 01 00 00 00 00, which is 12 bytes: a 4-byte little-endian total length, a 1-byte type code (0x10 = int32), the field name x as a null-terminated string (2 bytes), 4 bytes for the integer value, and a 1-byte document terminator. JSON is shorter for tiny payloads, but BSON wins on traversal speed and on documents with non-string types. A real-world example: a financial transaction with a Decimal128 price of 123.45 round-trips exactly through BSON. Through plain JSON it becomes a binary double, which cannot represent 123.45 exactly, so arithmetic on it can drift in JavaScript (0.1 + 0.2 evaluates to 0.30000000000000004). The Decimal128 representation is 16 bytes regardless of the magnitude.
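
The byte layout above can be reproduced by hand. The following sketch encodes a flat document of int32 fields using only Python's struct module; encode_int32_document is a hypothetical helper written for this example and covers nothing beyond int32 values.

```python
import json
import struct


def encode_int32_document(fields: dict) -> bytes:
    """Encode a flat dict of str -> int32 values as a BSON document."""
    body = b""
    for name, value in fields.items():
        body += b"\x10"                          # type code: int32
        body += name.encode("utf-8") + b"\x00"   # field name as a C string
        body += struct.pack("<i", value)         # little-endian int32 value
    total = 4 + len(body) + 1                    # length prefix + elements + terminator
    return struct.pack("<i", total) + body + b"\x00"


encoded = encode_int32_document({"x": 1})
print(encoded.hex(" "))            # 0c 00 00 00 10 78 00 01 00 00 00 00
print(len(encoded))                # 12
print(len(json.dumps({"x": 1})))   # 8
```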

When and why it matters

If you build an app on MongoDB and persist money, store it as Decimal128, not as a JavaScript Number serialised to JSON. A Mongoose field declared as Number is typically stored as a BSON double when the value has a fractional part, which silently introduces IEEE 754 rounding on sums like 0.10 + 0.20. The same applies to dates: a date stored as a string (“2026-01-15”) carries no time zone and is compared as text, so range queries across timezone boundaries can match the wrong documents. A BSON Date stores UTC milliseconds since the Unix epoch, so range queries compare instants and can use B-tree indexes efficiently. When migrating data into or out of MongoDB, use mongoexport --jsonFormat=canonical rather than relaxed mode to preserve BSON type information through the JSON intermediate. Reference: MongoDB Manual: BSON Types.
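
The rounding problem and the canonical-mode fix can both be shown with the standard library. In this sketch Python's decimal.Decimal stands in for Decimal128 (an assumption: its default precision is 28 digits rather than 34), and to_canonical / from_canonical are hypothetical helpers that mimic the $numberDecimal wrapper canonical Extended JSON uses for Decimal128 values.

```python
import json
from decimal import Decimal

# Binary doubles cannot represent 0.10 or 0.20 exactly, so the sum drifts.
float_total = 0.10 + 0.20
print(float_total)            # 0.30000000000000004
print(float_total == 0.30)    # False

# Decimal arithmetic keeps the digits as written.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total)                      # 0.30
print(decimal_total == Decimal("0.30"))   # True


def to_canonical(value: Decimal) -> dict:
    """Wrap a Decimal the way canonical Extended JSON tags Decimal128."""
    return {"$numberDecimal": str(value)}


def from_canonical(doc: dict) -> Decimal:
    """Unwrap a {"$numberDecimal": "..."} object back into a Decimal."""
    return Decimal(doc["$numberDecimal"])


wire = json.dumps({"price": to_canonical(Decimal("123.45"))})
print(wire)                               # {"price": {"$numberDecimal": "123.45"}}
restored = from_canonical(json.loads(wire)["price"])
print(restored == Decimal("123.45"))      # True
```

Because the price travels as a tagged string, no JSON parser along the way gets a chance to turn it into a double.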

Frequently asked questions

What is BSON?
BSON (Binary JSON) is the binary-encoded serialisation format MongoDB uses for storing and transmitting documents. It extends JSON with additional data types — including dates, binary data, ObjectId, and 64-bit integers — and is designed for fast traversal and in-place updates.
What types does BSON support that JSON does not?
BSON adds: Date (64-bit count of milliseconds since the Unix epoch), ObjectId (12-byte unique ID), Int32, Int64, Decimal128 (for exact decimals), Binary, Regular Expression, Timestamp, and the deprecated Undefined type. JSON only has string, number, boolean, null, array, and object.
Is BSON more space-efficient than JSON?
Not necessarily. Like JSON, BSON repeats field names in every document, and it adds a type byte per field, length prefixes, and fixed-width numbers, which often makes it larger than compact JSON. Its advantage is speed: length-prefixed strings, documents, and arrays let a reader learn a value’s size in O(1), so it can skip fields quickly and update fixed-width values in place without re-serialising the whole document.
What is a MongoDB ObjectId and how is it constructed?
An ObjectId is a 12-byte BSON value used as the default _id. It encodes a 4-byte Unix timestamp, a 5-byte random value generated once per process, and a 3-byte incrementing counter, which makes it roughly sortable by creation time and unique in practice without a central coordinator.

Published May 15, 2026 · Last reviewed May 31, 2026