ADVANCED FILE SYSTEMS

btrfs openzfs

Btrfs

Btrfs is a modern copy on write (CoW) filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Btrfs is designed to be a multipurpose filesystem, scaling well on very large block devices.

OpenZFS

OpenZFS is a combined file system and logical volume manager include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair.

  • Extent based file storage
  • 2^64 byte == 16 EiB maximum file size (practical limit is 8 EiB due to Linux VFS)
  • Space-efficient packing of small files
  • Space-efficient indexed directories
  • Dynamic inode allocation
  • Writable snapshots, read-only snapshots
  • Subvolumes (separate internal filesystem roots)
  • Checksums on data and metadata (crc32c)
  • Compression (zlib and LZO)
  • Integrated multiple device support
  • File Striping
  • File Mirroring
  • File Striping+Mirroring
  • Single and Dual Parity implementations (experimental, not production-ready)
  • SSD (flash storage) awareness (TRIM/Discard for reporting free blocks for reuse) and optimizations (e.g. avoiding unnecessary seek optimizations, sending writes in clusters, even if they are from unrelated files. This results in larger write operations and faster write throughput)
  • Efficient incremental backup
  • Background scrub process for finding and repairing errors of files with redundant copies
  • Online filesystem defragmentation
  • Offline filesystem check
  • In-place conversion of existing ext3/4 file systems
  • Seed devices. Create a (readonly) filesystem that acts as a template to seed other Btrfs filesystems. The original filesystem and devices are included as a readonly starting point for the new filesystem. Using copy on write, all modifications are stored on different devices; the original is unchanged.
  • Subvolume-aware quota support
  • Send/receive of subvolume changes
  • Efficient incremental filesystem mirroring
  • Batch, or out-of-band deduplication (happens after writes, not during)
  • ZFS does away with partitioning,  available disks (of any size) are used to the best of their ability.
  • Compression can be used to increase bandwidth. 
  • ZFS is 128-bit, meaning it is very scalable. (e.g. 16 exabyte limit).
  • State-of-the-art unsurpassed data reliability (including silent corruption detection and automatic data repair, if available) from end to end, thanks to transparent checksums in parent block pointers.
  • Elimination of the RAID5 write hole that usually requires battery-powered disk technology or UPS technology to keep RAID arrays consistent in case of power failures
  • Online array reconstruction / reassemble, in a fraction of the time usually required to reconstruct an array, thanks to reconstruction dependent on contents rather than on disk topology
  • Online check (scrub) and online resilver of faulty, to verify and preserve the validity of on-disk data in the face of silent disk data corruption
  • Automatic deduplication and compression of data, selectable per volume or filesystem according to administrator policy
  • Dynamic striping across all devices to maximize throughput
  • Copy-on-write design makes most disk writes sequential
  • Multiple block sizes, automatically chosen to match workload
  • Explicit I/O priority with deadline scheduling
  • Globally optimal I/O sorting and aggregation
  • Multiple independent prefetch streams with automatic length and stride detection for maximum streaming performance
  • Unlimited, instantaneous read/write snapshots
  • Parallel, constant-time directory operations

Comparison between Btrfs and ZFS

From one point of view, the two file systems are very similar: they are copy-on-write checksummed file systems with multi-device support and writable snapshots. From other points of view, they are wildly different: file system architecture, development model, maturity, license, and host operating system, among other things. Btrfs organizes everything on disk into a btree of extents containing items and data. ZFS organizes everything on disk into a tree of block pointers, with different block sizes depending on the object size. btrfs checksums and reference-counts extents, ZFS checksums and reference-counts variable-sized blocks. Both file systems write out changes to disk using copy-on-write – extents or blocks in use are never overwritten in place, they are always copied somewhere else first. Btrfs is more suitable to storage than that of ZFS. One of the major problems with the ZFS approach – “slabs” of blocks of a particular size – is fragmentation. Each object can contain blocks of only one size, and each slab can only contain blocks of one size. You can easily end up with, for example, a file of 64K blocks that needs to grow one more block, but no 64K blocks are available, even if the file system is full off nearly empty slabs of 512 byte blocks, 4K blocks, 128K blocks, etc. In contrast, the items-in-a-btree approach is extremely space efficient and flexible. Defragmentation is an ongoing process – repacking the items efficiently is part of the normal code path preparing extents to be written to disk. Doing checksums, reference counting, and other assorted metadata busy-work on a per-extent basis reduces overhead and makes new features (such as fast reverse mapping from an extent to everything that references it) possible.