

Lossless compression

Compression that preserves every byte


Lossless compression reduces file size while preserving every byte of the original: decompressing produces output bit-identical to the input. The tradeoff is smaller savings than lossy compression. Typical reductions are 30-70%, though highly redundant text can shrink much further and random or already-compressed data barely shrinks at all.
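Python's standard zlib module implements DEFLATE, so a round-trip through it is a quick way to see the bit-identical guarantee in practice. This is a minimal sketch. The sample log line is invented for illustration.

```python
import zlib

def roundtrip(data: bytes, level: int = 6) -> bytes:
    """Compress with zlib (DEFLATE), then decompress; return the reconstructed bytes."""
    compressed = zlib.compress(data, level)
    return zlib.decompress(compressed)

# Hypothetical, highly repetitive input
original = b"timestamp=2026-05-15 level=INFO msg=request served\n" * 200
restored = roundtrip(original)

assert restored == original          # bit-identical, not merely "similar"
print(len(original), "bytes in,", len(zlib.compress(original)), "bytes compressed")
```

Equality on the raw bytes is the whole definition of "lossless": no tolerance or similarity metric is involved.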

How it works: lossless algorithms find statistical patterns (repeated substrings, predictable sequences) and encode them with shorter representations. Two classic families:

  • Dictionary-based (LZ77, LZ78, LZW): build a dictionary of seen substrings and emit back-references. LZ77 is the basis of DEFLATE, the format used by gzip, ZIP, and PNG; LZW is used by GIF.
  • Entropy coding (Huffman, arithmetic coding, ANS): assign shorter binary codes to more frequent symbols. Typically combined with dictionary methods; DEFLATE, for example, Huffman-codes the output of its LZ77 stage.
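The entropy-coding idea can be sketched with a textbook Huffman construction: repeatedly merge the two least frequent subtrees, prefixing "0" and "1" to their codes. This is an illustrative sketch over characters, not the canonical-code, length-limited variant that DEFLATE actually specifies.

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict[str, str]:
    """Build a Huffman code: more frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if len(freq) == 1:                          # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far});
    # the unique tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)       # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "abracadabra"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded), "bits vs", 8 * len(text), "bits as 8-bit bytes")
```

Because no code is a prefix of another, the bit string decodes unambiguously. The frequent "a" gets a 1-bit code and the rare "c" and "d" get 3-bit codes.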

Common lossless formats:

  • PNG — images (uses DEFLATE)
  • FLAC — audio (preserves 16-24 bit PCM, typically 50-60% the size of WAV)
  • ZIP, gzip, Brotli, Zstandard — general data
  • WebP and AVIF — both support lossless modes
  • Git pack files — source-code repository storage

Use lossless when you need bit-perfect reproduction, when the content will be edited further, or when the file is text or structured data, where changing even one byte can change the meaning, so lossy compression is not an option.

The information-theoretic ceiling: Claude Shannon’s 1948 paper established that no lossless code can bring the average compressed length below the source’s entropy, the average information per symbol. For random data (random bytes, encrypted ciphertext, already-compressed files), the entropy is at or near the maximum of 8 bits per byte, and lossless compression achieves essentially zero savings. This is why “gzip image.jpg” gains almost nothing: the JPEG bytes already look random to a compressor. The corollary: if your compression ratio is suspiciously good on data that should be high-entropy, you’ve probably found a bug.
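A rough way to see the ceiling is to estimate entropy from byte frequencies and compare it with what zlib achieves. This sketch computes only the order-0 (context-free) estimate, -Σ p·log2(p) over byte values. That estimate ignores repetition, so it overstates the true entropy of repetitive text that LZ-style coders exploit. For random bytes, however, it correctly lands near 8 bits per byte. The sample inputs are made up.

```python
import math
import os
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical order-0 entropy in bits per byte: -sum(p * log2(p)) over byte values."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

samples = {
    "repetitive text": b"GET /index.html 200\n" * 5000,   # hypothetical access-log line
    "random bytes": os.urandom(100_000),                 # maximal entropy
}
for name, data in samples.items():
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: {byte_entropy(data):.2f} bits/byte, zlib size ratio {ratio:.3f}")
```

On the random input, the zlib ratio comes out at or slightly above 1.0, which is the "essentially zero savings" case.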

Lossless on lossy data: when does it pay? A common confusion is converting a 128 kbps MP3 to FLAC and expecting better audio quality. The MP3 encoder has already discarded information; FLAC just losslessly preserves the degraded version. For audio that originated as PCM (16-bit on CDs, commonly 24-bit for studio masters), FLAC is the right archival choice. For audio that originated lossy, transcoding to FLAC only inflates the file. The general rule: store the master in the highest-quality lossless format the source supports; deliver via the best lossy format the consumer can play. Related: DEFLATE, lossy, entropy. Reference: Shannon CE, A Mathematical Theory of Communication (Bell Syst Tech J, 1948).

Worked example: compressing a 10 MB log file

A typical 10 MB application log (JSON lines with timestamps, level, message, repeated field names) is highly redundant. Numbers from one benchmark, with every algorithm run on the same input: gzip default level ≈ 1.6 MB (84% reduction, 0.2 s encode), Brotli level 6 ≈ 1.1 MB (89%, 0.5 s), Zstandard level 3 ≈ 1.3 MB (87%, 0.05 s), Zstandard level 19 ≈ 0.9 MB (91%, 1.8 s). By contrast, 10 MB of random bytes from /dev/urandom comes out at essentially 10 MB under every algorithm, usually slightly larger because of header and block overhead; high-entropy data is incompressible. PNG images, which are already DEFLATE-compressed, shrink only another 1-3% under gzip -9. This is why HTTP servers typically skip Content-Encoding: gzip on PNG/JPEG/MP4 responses to save CPU.
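To reproduce the shape of this comparison without third-party packages, the sketch below uses Python's standard gzip, bz2 and lzma (xz) modules in place of Brotli and Zstandard. It runs them on a synthetic JSON-lines log and on the same number of random bytes. The log format, sizes and levels are assumptions for illustration, and the absolute numbers will differ from the benchmark quoted above.

```python
import bz2
import gzip
import json
import lzma
import os
import random
import time

def make_log(n_lines: int, seed: int = 0) -> bytes:
    """Hypothetical JSON-lines log: repeated field names, few levels, templated messages."""
    rng = random.Random(seed)
    levels = ["INFO", "INFO", "INFO", "WARN", "ERROR"]   # skewed, as real logs tend to be
    lines = []
    for i in range(n_lines):
        record = {
            "ts": f"2026-05-15T12:{(i // 60) % 60:02d}:{i % 60:02d}Z",
            "level": rng.choice(levels),
            "msg": f"request served in {rng.randint(1, 500)} ms",
            "user_id": rng.randint(1000, 9999),
        }
        lines.append(json.dumps(record))
    return ("\n".join(lines) + "\n").encode()

def measure(name, compress, decompress, data: bytes) -> int:
    """Time one compressor, verify the exact round-trip, and report the saving."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data            # lossless: exact round-trip
    saved = 1 - len(packed) / len(data)
    print(f"{name:<8} {len(packed):>10,} B  {saved:6.1%} saved  {elapsed:.3f} s")
    return len(packed)

compressors = {
    "gzip -6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}

log = make_log(20_000)
noise = os.urandom(len(log))                     # same size, maximal entropy
for label, data in [("log", log), ("random", noise)]:
    print(f"--- {label}: {len(data):,} bytes")
    for name, (comp, decomp) in compressors.items():
        measure(name, comp, decomp, data)
```

Expect large savings on the log and a small negative saving on the random bytes, where container overhead outweighs the nonexistent redundancy.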

Choosing an algorithm in 2026

For web delivery: Brotli, supported in every major browser since 2017, with gzip as a fallback for legacy clients. Precompress static assets at a high Brotli quality, since encoding happens once at build time, and use a moderate quality (around 4-6) for responses compressed on the fly, where encode time adds latency. For internal storage and pipelines: Zstandard, which sits at or near the compression-ratio-vs-speed Pareto frontier across most of its levels and is widely adopted. GNU tar supports it (--zstd), the Linux kernel can use it for kernel images and modules, RocksDB supports it as a compression type, and Arch Linux ships packages as .pkg.tar.zst. For archival of irreplaceable masters: still use a container that carries a strong checksum (xz with --check=sha256, or zip with its CRC-32 plus an external SHA-256). The compression algorithm itself does not detect bitrot. Reference: RFC 8878, Zstandard Compression and the application/zstd Media Type.
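The archival advice can be sketched with Python's lzma module, which writes .xz files and accepts an integrity-check type. The helper names archive and restore are hypothetical. The external digest stands in for whatever manifest or sidecar file you keep next to the archive.

```python
import hashlib
import lzma

def archive(data: bytes) -> tuple[bytes, str]:
    """Compress to .xz with an embedded SHA-256 check; also return an external SHA-256."""
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_SHA256)
    external = hashlib.sha256(data).hexdigest()   # store alongside the archive
    return packed, external

def restore(packed: bytes, expected_sha256: str) -> bytes:
    """Decompress and verify; raise if either checksum layer detects corruption."""
    data = lzma.decompress(packed)                # raises lzma.LZMAError on corrupt input
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("external SHA-256 mismatch: archive or manifest is corrupt")
    return data
```

Two layers are deliberate: the embedded check catches damage to the .xz file itself, while the external digest also catches a wrong file being restored under the right name.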

Frequently asked questions

What is lossless compression?
Lossless compression reduces file size using algorithms (like DEFLATE, LZ77, or Huffman coding) that encode redundancy, allowing the original data to be reconstructed exactly. No information is discarded.
What are common examples of lossless formats?
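PNG and WebP-lossless for images, FLAC and ALAC for audio, and ZIP and gzip for files are all lossless: decompressing them always yields data bit-for-bit identical to the original. GIF's LZW compression is also lossless, but the format is limited to a 256-colour palette, so converting a full-colour image to GIF discards colour information before compression even starts.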
What is the difference between lossless and lossy compression?
Lossless compression preserves every bit; lossy compression discards information the encoder deems imperceptible (JPEG quantisation, MP3 frequency masking) to achieve higher compression ratios. Lossy files cannot be perfectly restored.
When should I choose lossless over lossy?
Use lossless for source assets, documents, code, and anything that will be edited or re-compressed — repeated lossy re-encoding accumulates artefacts. Use lossy for delivery formats (web images, streaming audio) where file size matters more than perfect fidelity.


Published May 15, 2026 · Last reviewed May 31, 2026