denis7774@gmail.com
Denis Reva
zhuyifei1999@gmail.com
YiFei Zhu
DwarFS is a read-only file system with a focus on achieving very high compression ratios in particular for very redundant data.
This probably doesn't sound very exciting, because if it's redundant, it should compress well. However, I found that other read-only, compressed file systems don't do a very good job at making use of this redundancy. See here for a comparison with other compressed file systems.
DwarFS also doesn't compromise on speed and for my use cases I've found it to be on par with or perform better than SquashFS. For my primary use case, DwarFS compression is an order of magnitude better than SquashFS compression, it's 4 times faster to build the file system, it's typically faster to access files on DwarFS and it uses less CPU resources.
Distinct features of DwarFS are:
* Clustering of files by similarity using a similarity hash function. This makes it easier to exploit the redundancy across file boundaries.
* Segmentation analysis across file system blocks in order to reduce the size of the uncompressed file system. This saves memory when using the compressed file system and thus potentially allows for higher cache hit rates as more data can be kept in the cache.
* Highly multi-threaded implementation. Both the file system creation tool as well as the FUSE driver are able to make good use of the many cores of your system.
* Optional experimental Python support to provide custom filtering and ordering functionality.
mhx/dwarfs