For questions about applying a task to only one of several identical copies of data (files or blocks of data on a filesystem, or strings in a text), or about ignoring every copy after the first to save space or time.
Deduplication is the elimination or disregarding of identical copies of data during processing. It mainly occurs in two contexts:
- Space saving and speedup in file storage and transfer systems. This can mean scanning a filesystem for multiple copies of the same file and removing all but one of them. At a lower level, the same applies to blocks of data on the filesystem. Alternatively, it can mean identifying files or data blocks that have already been encountered while transferring or backing up data, and skipping the duplicates to reduce backup size or transfer volume.
- Eliminating or preventing repeated copies of a (sub)string in a larger string or text file. Here the task may be to scan a string for multiple instances of a given substring or, when appending text to a file or string, to identify which parts of the new text are already present at the destination and skip them on output.
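As an illustration of the file-level case, the following is a minimal sketch (not a reference implementation) of the common "group by size, then by content hash" approach. The function names `file_digest` and `find_duplicate_files` are hypothetical. It assumes that two files with equal SHA-256 digests are identical (a byte-by-byte comparison could confirm this), and it only reports duplicate groups rather than deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return sorted groups of paths under `root` whose contents are identical."""
    # Pass 1: group regular files by size; files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_digest[file_digest(path)].append(path)

    # Keep only digests seen more than once, i.e. actual duplicates.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

A caller would typically keep the first path of each group and delete the others, or replace them with hard links. The size pass exists because hashing is the expensive step, and most files can be ruled out without reading them.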
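For the text case, a common simplification is to deduplicate at line granularity. The sketch below assumes exactly that; arbitrary substrings would need a different matching strategy. The hypothetical `append_new_lines` returns the existing text with only those lines of the new text appended that are not already present, and it also skips lines repeated within the new text itself, preserving their original order.

```python
def append_new_lines(existing_text, new_text):
    """Append lines of `new_text` not already present in `existing_text`.

    Lines repeated within `new_text` are also added only once, in first-seen order.
    """
    seen = set(existing_text.splitlines())  # lines already at the destination
    added = []
    for line in new_text.splitlines():
        if line not in seen:
            seen.add(line)
            added.append(line)
    if not added:
        return existing_text
    # Ensure the appended block starts on its own line.
    sep = "" if existing_text == "" or existing_text.endswith("\n") else "\n"
    return existing_text + sep + "\n".join(added) + "\n"
```

For example, `append_new_lines("a\nb\n", "b\nc\nc\n")` yields `"a\nb\nc\n"`. To apply this to a file, read its current contents, call the function, and write the result back. Using a set keeps each membership check roughly constant-time.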