Why Linux Archives Are Confusing (And How to Think About Them)
If you've ever downloaded software from a Linux package repository or received a server backup file, you've encountered extensions like .tar.gz, .tar.bz2, .tar.xz, or just .tgz. These aren't just formats — they're the result of combining two separate tools that do different jobs.
The confusion comes from this separation: TAR is an archiver (it bundles files together into one file), and GZ / BZ2 / XZ are compressors (they reduce file size). They're chained together, which is why you get compound extensions like .tar.gz. Understanding this distinction makes the whole ecosystem click.
This guide covers what each format does, how to create and extract archives on Linux, macOS, and Windows, when to use each compression method, and how to convert between formats when you receive a file in one format and need another.
The Formats Explained
TAR (Tape Archive)
TAR is over 40 years old. Originally designed for writing files to magnetic tape, it's now the standard Unix bundling tool. TAR does no compression — it just concatenates files into a single stream with metadata (filenames, permissions, ownership). A .tar file is often larger than the sum of its contents because of header overhead.
TAR's role today: bundle multiple files/directories into a single file, then compress that bundle with a separate tool.
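To make the two-step model concrete, here is a minimal sketch that bundles a directory with tar, compresses the bundle with gzip as a separate step, and then checks that the result lists the same members as a one-step tar -czf archive. The directory and file names are made up for illustration.

```shell
# Scratch directory with two sample files (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/project"
echo "hello" > "$workdir/project/a.txt"
echo "world" > "$workdir/project/b.txt"

# Step 1: bundle only, no compression.
tar -cf "$workdir/two-step.tar" -C "$workdir" project
# Step 2: compress the bundle with a separate tool.
# gzip replaces two-step.tar with two-step.tar.gz.
gzip "$workdir/two-step.tar"

# One-step equivalent: tar runs the gzip step itself (-z).
tar -czf "$workdir/one-step.tar.gz" -C "$workdir" project

# Both archives should list the same members.
two_step_list=$(tar -tzf "$workdir/two-step.tar.gz")
one_step_list=$(tar -tzf "$workdir/one-step.tar.gz")
if [ "$two_step_list" = "$one_step_list" ]; then
    echo "same contents"
fi

rm -rf "$workdir"
```

The -C option tells tar to change into that directory before adding files, which keeps the paths stored inside the archive short and relative.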
GZ (GNU Zip / gzip)
Gzip is the most common compression layer for TAR archives. It uses the DEFLATE algorithm — the same algorithm inside ZIP files. GZ compression is fast, widely supported, and produces reasonable compression ratios. .tar.gz (also written .tgz) is the most common Linux software distribution format.
Typical compression ratio vs uncompressed: 40–70% reduction for mixed content.
BZ2 (bzip2)
Bzip2 uses the Burrows-Wheeler transform algorithm, which achieves better compression than DEFLATE but at the cost of speed. BZ2 compression takes 2–5x longer than GZ and requires more memory. The files are generally 5–15% smaller than equivalent GZ archives.
.tar.bz2 was the dominant format for Linux source distributions in the 2000s and 2010s. You still encounter it frequently with older software distributions.
When to use BZ2: When file size matters and compression speed doesn't — like distributing software releases where you compress once and thousands of people download.
XZ (LZMA2)
XZ uses the LZMA2 algorithm, which achieves significantly better compression than both GZ and BZ2. A .tar.xz archive is typically 20–30% smaller than the same .tar.gz archive. The trade-off: XZ is the slowest to compress — sometimes 5–10x slower than gzip.
Modern Linux kernel source releases ship as .tar.xz for this reason. The kernel project compresses once; millions of developers download. Saving 20% on a 120 MB archive is meaningful at scale.
| Format | Extension | Algorithm | Speed | Compression | Memory Use |
|---|---|---|---|---|---|
| TAR only | .tar | None | Fastest | None | Low |
| TAR+GZ | .tar.gz/.tgz | DEFLATE | Fast | Good | Low |
| TAR+BZ2 | .tar.bz2 | BWT + Huffman coding | Slow | Better | Medium |
| TAR+XZ | .tar.xz | LZMA2 | Very slow | Best | High |
| ZIP | .zip | DEFLATE | Fast | Good | Low |
Creating Archives
Creating TAR.GZ Archives
# Archive a single directory
tar -czf archive.tar.gz /path/to/directory/
# Archive multiple files/directories
tar -czf archive.tar.gz file1.txt file2.txt /path/to/dir/
# Archive with verbose output (shows files being added)
tar -czvf archive.tar.gz /path/to/directory/
# Archive while excluding specific files
tar -czf archive.tar.gz --exclude="*.log" --exclude=".git" /path/to/directory/
Flag breakdown: -c creates, -z uses gzip compression, -f specifies the output filename, -v verbose.
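A quick way to confirm that --exclude did what you intended is to list the archive right after creating it. The sketch below builds a small throwaway directory (all names are hypothetical) and checks the listing.

```shell
# Sample tree: one real file, one log file, and a .git directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/site/.git"
echo "page"  > "$workdir/site/index.html"
echo "noise" > "$workdir/site/debug.log"
echo "ref"   > "$workdir/site/.git/HEAD"

# Create the archive, skipping log files and the .git directory.
tar -czf "$workdir/site.tar.gz" --exclude="*.log" --exclude=".git" -C "$workdir" site

# List the members to verify the exclusions took effect.
listing=$(tar -tzf "$workdir/site.tar.gz")
echo "$listing"

rm -rf "$workdir"
```

Placing the --exclude options before the paths to archive is the safest ordering, since some tar versions apply exclusions only to the paths that follow them.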
Creating TAR.BZ2 Archives
# Same as tar.gz but replace -z with -j
tar -cjf archive.tar.bz2 /path/to/directory/
# Verbose
tar -cjvf archive.tar.bz2 /path/to/directory/
Creating TAR.XZ Archives
# Replace -z or -j with -J (capital J)
tar -cJf archive.tar.xz /path/to/directory/
# Verbose
tar -cJvf archive.tar.xz /path/to/directory/
# With maximum compression (very slow)
XZ_OPT=-9 tar -cJf archive.tar.xz /path/to/directory/
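XZ_OPT is an environment variable read by the xz program, so it only takes effect when tar runs xz as a separate process (as GNU tar does); other tar implementations may not honor it. A more portable sketch is to pipe tar's output into xz yourself, which also lets you pass any xz option directly. The sample data below is generated for illustration, and the block skips itself if xz is not installed.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data"
seq 1 20000 > "$workdir/data/numbers.txt"

xz_status="failed"
if command -v xz >/dev/null 2>&1; then
    # "-f -" makes tar write the uncompressed stream to stdout.
    # xz -9 selects the highest preset; -T0 uses all available CPU cores.
    tar -cf - -C "$workdir" data | xz -9 -T0 > "$workdir/data.tar.xz"
    # xz -t tests the compressed file's integrity without extracting it.
    xz -t "$workdir/data.tar.xz" && xz_status="ok"
else
    xz_status="xz not installed"
fi
echo "$xz_status"

rm -rf "$workdir"
```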
Extracting Archives
Extracting is simpler because modern tar auto-detects the compression format:
# Extract any .tar.* archive (auto-detects gzip/bz2/xz)
tar -xf archive.tar.gz
tar -xf archive.tar.bz2
tar -xf archive.tar.xz
# Extract to a specific directory
tar -xf archive.tar.gz -C /target/directory/
# List contents without extracting
tar -tf archive.tar.gz
# Extract a specific file from the archive
tar -xf archive.tar.gz path/inside/archive/file.txt
Pro Tip: Always list the archive contents with tar -tf before extracting. Some archives don't have a top-level directory, meaning extraction will dump files directly into your current directory. If that's the case, create a subdirectory and extract there.
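The check described in the tip can be scripted. The two helper functions below are hypothetical names, not part of tar: one counts the distinct top-level entries in an archive, and the other extracts into a subdirectory named after the archive when there is more than one. Treat it as a sketch rather than a hardened tool.

```shell
# count_top_level ARCHIVE: print the number of distinct top-level entries.
count_top_level() {
    tar -tf "$1" | sed 's|^\./||' | cut -d/ -f1 | sort -u | grep -c .
}

# safe_extract ARCHIVE DEST: extract into DEST, or into DEST/<archive-name>
# when the archive has no single top-level directory.
safe_extract() {
    archive=$1
    dest=$2
    if [ "$(count_top_level "$archive")" -gt 1 ]; then
        name=$(basename "$archive")
        name=${name%%.tar*}   # drop .tar, .tar.gz, .tar.xz, ...
        name=${name%.tgz}     # drop .tgz
        dest="$dest/$name"
    fi
    mkdir -p "$dest"
    tar -xf "$archive" -C "$dest"
}

# Demo: an archive whose members sit at the root with no wrapping directory.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"
echo 1 > "$workdir/src/one.txt"
echo 2 > "$workdir/src/two.txt"
tar -czf "$workdir/loose.tar.gz" -C "$workdir/src" one.txt two.txt

top_level_count=$(count_top_level "$workdir/loose.tar.gz")
safe_extract "$workdir/loose.tar.gz" "$workdir/out"
extracted=$(ls "$workdir/out/loose")
echo "top-level entries: $top_level_count"
echo "$extracted"

rm -rf "$workdir"
```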
Converting Between Archive Formats
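You'll often receive a .tar.bz2 and need .tar.gz (for software that only accepts gzip), or a .tar.xz and need to convert for a system with old tooling. The tar bundle inside stays the same; only the compression layer changes. To convert without writing an intermediate uncompressed .tar to disk, pipe the decompressor straight into the new compressor: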
# Convert tar.gz to tar.bz2 (decompress gz, re-compress bz2)
gunzip -c archive.tar.gz | bzip2 > archive.tar.bz2
# Convert tar.bz2 to tar.gz
bunzip2 -c archive.tar.bz2 | gzip > archive.tar.gz
# Convert tar.xz to tar.gz
xz -dc archive.tar.xz | gzip > archive.tar.gz
# Convert tar.gz to tar.xz (best compression)
gunzip -c archive.tar.gz | xz -c > archive.tar.xz
These pipelines are memory-efficient — they stream data rather than decompressing to disk first. For very large archives (multi-GB), this approach avoids running out of disk space.
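After converting, it is worth confirming that the tar stream inside the new archive is byte-for-byte the same as the original. One way to sketch that check is to decompress both archives to stdout and compare checksums. The example uses cksum because it is available on most Unix-like systems, uses generated sample data, and skips itself if bzip2 is missing.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/docs"
seq 1 5000 > "$workdir/docs/report.txt"
tar -czf "$workdir/archive.tar.gz" -C "$workdir" docs

if command -v bzip2 >/dev/null 2>&1; then
    # Convert: stream the decompressed tar straight into bzip2.
    gunzip -c "$workdir/archive.tar.gz" | bzip2 > "$workdir/archive.tar.bz2"

    # Checksum the decompressed tar stream from each archive.
    sum_gz=$(gunzip -c "$workdir/archive.tar.gz" | cksum)
    sum_bz2=$(bzip2 -dc "$workdir/archive.tar.bz2" | cksum)

    if [ "$sum_gz" = "$sum_bz2" ]; then
        convert_status="identical"
    else
        convert_status="MISMATCH"
    fi
else
    convert_status="bzip2 not installed"
fi
echo "$convert_status"

rm -rf "$workdir"
```

Only delete the original archive once the checksums match.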
For archiving on Windows without the command line, use our RAR to ZIP converter for common archive format conversions, or the full document converter hub for file format needs.
Extracting on Windows and macOS
Windows
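Windows 11 version 23H2 and later extract .tar, .tar.gz, .tar.bz2, and .tar.xz natively in File Explorer, alongside .zip (right-click → Extract All). On earlier Windows versions, File Explorer handles only .zip, so the tar-based formats need a third-party tool: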
- 7-Zip (free): handles all of these formats, including .tar.bz2 and .tar.xz
- WinRAR: handles most formats but isn't free after the trial
From PowerShell or Command Prompt (Windows 10 version 1803 and later):
# Extract tar.gz
tar -xzf archive.tar.gz -C C:\target\directory\
# List contents
tar -tf archive.tar.gz
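Windows 10 (since version 1803) and Windows 11 ship a tar.exe based on bsdtar. It is a standalone program rather than a PowerShell feature, and it accepts the same core flags as the Unix command.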
macOS
macOS includes tar (a bsdtar build) with gzip, bzip2, and xz support, so the core commands shown for Linux work unchanged; a few GNU-specific options may behave differently. For GUI extraction, Archive Utility (built-in) handles .tar.gz and .tar.bz2. For .tar.xz, install the xz utility via Homebrew: brew install xz.
Archive Best Practices for Backups
For long-term backup archiving, format choice matters more than for distribution:
Use .tar.xz when:
- You're archiving once and storing for years
- Storage space matters
- Backup files are large (hundreds of MB or more)
- You can tolerate slow compression
Use .tar.gz when:
- You're creating backups automatically (cron jobs, scripts)
- Speed matters and storage is cheap
- The archive will be transferred over a network frequently
Avoid .tar.bz2 for new work: .tar.xz compresses better and usually decompresses faster, though it is slower to compress (see the benchmark below). It has largely superseded BZ2 for scenarios where you'd choose BZ2 over GZ.
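For the automated case, a scheduled backup can be sketched as a small function that writes a dated .tar.gz and verifies it before reporting success. The function name backup_dir and the paths in the demo are hypothetical; adapt them to your own layout.

```shell
# backup_dir SOURCE_DIR DEST_DIR
# Writes DEST_DIR/<name>-YYYY-MM-DD.tar.gz and prints its path on success.
backup_dir() {
    src=$1
    dest=$2
    stamp=$(date +%Y-%m-%d)
    out="$dest/$(basename "$src")-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps stored paths relative to the source's parent directory.
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # gzip -t checks the compressed stream for corruption.
    gzip -t "$out" || return 1
    echo "$out"
}

# Demo with throwaway data.
workdir=$(mktemp -d)
mkdir "$workdir/appdata"
echo "settings" > "$workdir/appdata/config.ini"

backup_path=$(backup_dir "$workdir/appdata" "$workdir/backups")
backup_listing=$(tar -tzf "$backup_path")
echo "$backup_listing"

rm -rf "$workdir"
```

Because the function returns a non-zero status on any failure, a cron job or wrapper script can detect a bad backup and alert instead of silently keeping a corrupt file.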
For archiving mixed document/file libraries, see our guide on the best format for archiving documents for additional considerations around format longevity and tooling support.
File Size Comparison (Real-World Example)
To illustrate the trade-offs, here's a real benchmark on a 500 MB directory of source code:
| Format | Size | Compression Time | Ratio vs TAR |
|---|---|---|---|
| .tar (none) | 515 MB | 0.3s | 1.0× |
| .tar.gz | 128 MB | 4.2s | 0.25× |
| .tar.bz2 | 118 MB | 18.1s | 0.23× |
| .tar.xz | 98 MB | 47.3s | 0.19× |
| .zip | 131 MB | 5.1s | 0.25× |
XZ achieves the smallest file size (19% of the uncompressed .tar) but takes about 11× longer than gzip (47.3s vs 4.2s). For a one-time archive, 47 seconds is trivial. For an automated backup running every hour, that overhead matters.
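Numbers like these depend heavily on the data, so it helps to run a comparison on your own files. The sketch below compresses one .tar with each compressor that happens to be installed and prints the size and the ratio against the uncompressed tar. The generated sample data is a stand-in; point the tar command at a real directory for meaningful results.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sample"
# Repetitive text stands in for real content.
seq 1 50000 > "$workdir/sample/data.txt"

tar -cf "$workdir/sample.tar" -C "$workdir" sample
tar_size=$(wc -c < "$workdir/sample.tar" | tr -d ' ')

report=""
for tool in gzip bzip2 xz; do
    # Skip compressors that are not installed on this machine.
    command -v "$tool" >/dev/null 2>&1 || continue
    # -c writes to stdout, so the original .tar is left untouched.
    size=$("$tool" -c "$workdir/sample.tar" | wc -c | tr -d ' ')
    ratio=$(awk -v s="$size" -v t="$tar_size" 'BEGIN { printf "%.2f", s / t }')
    report="$report$tool $size $ratio
"
done

echo "tar $tar_size 1.00"
printf '%s' "$report"

rm -rf "$workdir"
```

To compare speed as well, prefix each compression command with the shell's time keyword.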
ZIP vs TAR+GZ: When to Use Which
ZIP and .tar.gz are often interchangeable for end users, but there are differences that matter in technical contexts:
TAR+GZ advantages:
- Better compression (compresses the entire archive as a single stream; ZIP compresses each file independently)
- Preserves Unix file permissions, ownership, and symbolic links
- Better for distributing software on Linux/macOS
ZIP advantages:
- Random access — you can extract a single file without decompressing the whole archive
- Native support in Windows and macOS without extra tools
- Better for sharing with Windows users
For a broader comparison of archive formats including RAR and 7z, see our ZIP vs RAR vs 7Z guide.
Frequently Asked Questions
What does tar: Error opening archive: Failed to open 'file.tar.gz': No such file mean?
The file path is wrong. Run ls in the directory to confirm the filename and path. Paths are case-sensitive on Linux — Archive.tar.gz and archive.tar.gz are different files.
Can I add files to an existing TAR archive?
Yes, but only to uncompressed .tar files. You can't append to compressed archives (.tar.gz, .tar.bz2) directly. To add a file to a compressed archive, you'd need to decompress it, append, and re-compress. For frequently-modified archives, keep an uncompressed .tar and compress to .tar.gz only when distributing.
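A minimal sketch of that workflow, using hypothetical file names: keep an uncompressed .tar, append with -r, and produce a compressed copy only when you need to distribute it.

```shell
workdir=$(mktemp -d)
echo "first"  > "$workdir/first.txt"
echo "second" > "$workdir/second.txt"

# Start with an uncompressed archive containing one file.
tar -cf "$workdir/notes.tar" -C "$workdir" first.txt
# -r appends to an existing uncompressed archive.
tar -rf "$workdir/notes.tar" -C "$workdir" second.txt
# Compress a copy for distribution; -c leaves notes.tar in place
# so more files can be appended later.
gzip -c "$workdir/notes.tar" > "$workdir/notes.tar.gz"

appended_listing=$(tar -tzf "$workdir/notes.tar.gz")
echo "$appended_listing"

rm -rf "$workdir"
```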
Why does my .tar.gz file show as 0% compression?
This happens when you compress already-compressed content (JPEG images, MP3 audio, ZIP files, etc.). These formats are already compressed and won't benefit from gzip. The archive will be slightly larger than the source files because of TAR headers. In this case, use .tar without compression and let the individual file formats handle their own compression.
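You can reproduce this effect without any media files. In the sketch below, random bytes from /dev/urandom stand in for already-compressed data such as JPEG or MP3, since both are effectively incompressible. The script prints the raw, .tar, and .tar.gz sizes so you can compare them.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/media"
# 200,000 random bytes: a stand-in for already-compressed content.
head -c 200000 /dev/urandom > "$workdir/media/blob.bin"

tar -cf  "$workdir/media.tar"    -C "$workdir" media
tar -czf "$workdir/media.tar.gz" -C "$workdir" media

raw_size=$(wc -c < "$workdir/media/blob.bin" | tr -d ' ')
tar_size=$(wc -c < "$workdir/media.tar" | tr -d ' ')
gz_size=$(wc -c < "$workdir/media.tar.gz" | tr -d ' ')
echo "raw=$raw_size tar=$tar_size tar.gz=$gz_size"

rm -rf "$workdir"
```

The .tar comes out larger than the raw file because of headers and padding, and gzip can only win back that padding, not shrink the data itself.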
Is .tgz the same as .tar.gz?
Yes. .tgz is just a shorter extension for .tar.gz. The file format is identical.
How do I extract a specific directory from a large archive?
tar -xf archive.tar.gz --strip-components=1 archive-name/subdir/
The --strip-components=1 option removes the first path component (here archive-name/) from each extracted path, so the files land in subdir/ rather than archive-name/subdir/.
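Here is a self-contained sketch of that command against a throwaway archive, so you can see which paths end up on disk. The directory names mirror the placeholder names used in the command; the member path is given without a trailing slash, which matches the directory and everything under it.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/archive-name/subdir" "$workdir/archive-name/other" "$workdir/out"
echo "wanted"   > "$workdir/archive-name/subdir/keep.txt"
echo "unwanted" > "$workdir/archive-name/other/skip.txt"
tar -czf "$workdir/archive.tar.gz" -C "$workdir" archive-name

# Extract only archive-name/subdir, dropping the leading "archive-name/".
tar -xf "$workdir/archive.tar.gz" -C "$workdir/out" --strip-components=1 archive-name/subdir

# Show what was actually written, relative to the output directory.
extracted_files=$(cd "$workdir/out" && find . -type f | sort)
echo "$extracted_files"

rm -rf "$workdir"
```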
Conclusion
TAR, GZ, BZ2, and XZ each serve specific purposes in the Unix toolchain. For most batch archiving and software distribution needs, .tar.gz remains the pragmatic choice — it's fast, universally supported, and well-understood. For size-critical distributions, .tar.xz delivers meaningfully better compression. BZ2 still works but XZ has superseded it for new projects.
On the conversion side, piped decompression/recompression handles format changes without needing temporary disk space. And for situations where you need to process archives at scale or convert between formats as part of a larger batch processing workflow, the TAR command-line interface is both powerful and composable.



