Abstract:
Deduplication is a data reduction technique that compresses highly redundant datasets and can effectively reduce the storage overhead caused by wasted space in storage systems. In contrast to previous studies, which were mainly based on small-scale static snapshots or snapshots with short coverage time, this study uses large-scale snapshots with long coverage time, with which the highest deduplication ratio can be achieved. The large-scale snapshots are selected from a shared user file system. The characteristics of the backup datasets are studied from the perspectives of files, data blocks, and users, and the advantages and disadvantages of different data partitioning methods and strategies are analyzed. The results provide a reference for the design of future deduplication systems.
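
As an illustration of the deduplication ratio mentioned above, the following minimal sketch computes it for a single byte string. It assumes fixed-size partitioning into 4 KiB blocks, SHA-256 fingerprints, and the common definition of the ratio as logical size divided by stored size after deduplication. The function name `dedup_ratio` and the parameter `chunk_size` are hypothetical and are not taken from the study.

```python
import hashlib


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return logical size / stored size under fixed-size chunking.

    Assumption: a block is a duplicate if its SHA-256 digest has
    already been seen; only the first copy of each block is stored.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    unique_sizes = {}  # fingerprint -> size of the stored block
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        unique_sizes.setdefault(fingerprint, len(chunk))

    stored = sum(unique_sizes.values())
    # Empty input stores nothing; report a neutral ratio of 1.0.
    return len(data) / stored if stored else 1.0


if __name__ == "__main__":
    # Three identical blocks plus one distinct block: 4 logical, 2 stored.
    sample = b"A" * 4096 * 3 + b"B" * 4096
    print(dedup_ratio(sample))
```

Other partitioning methods, such as whole-file or content-defined chunking, would change only how `data` is split into blocks; the ratio is computed in the same way.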