Data deduplication, often called "intelligent compression" or "single-instance storage", is a process that uses matching logic to find and eliminate duplicate records (dupes). It reduces storage needs by eliminating redundant data and replacing it with a pointer to the single unique copy.
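To make the "pointer to the unique copy" idea concrete, here is a minimal Python sketch. It is a toy, not how any particular product works: the class name SingleInstanceStore, its methods, and the sample file names are all made up for illustration, and using a SHA-256 digest to spot identical content is an assumption of this sketch.

```python
import hashlib


class SingleInstanceStore:
    """Toy store (hypothetical): each distinct payload is kept once."""

    def __init__(self):
        self.blocks = {}    # digest -> payload (the unique copies)
        self.pointers = {}  # name -> digest (the pointer to the unique copy)

    def put(self, name, payload: bytes):
        digest = hashlib.sha256(payload).hexdigest()
        # Store the payload only if this content has not been seen before.
        self.blocks.setdefault(digest, payload)
        self.pointers[name] = digest

    def get(self, name) -> bytes:
        # Follow the pointer back to the single stored copy.
        return self.blocks[self.pointers[name]]


store = SingleInstanceStore()
store.put("report_v1.txt", b"quarterly numbers")
store.put("report_copy.txt", b"quarterly numbers")  # same content, new name
store.put("notes.txt", b"meeting notes")

unique_copies = len(store.blocks)    # 2 payloads actually stored
logical_files = len(store.pointers)  # 3 names pointing at them
print(unique_copies, logical_files)
```

Three names are stored but only two payloads, because the second report is just another pointer to the first one's data.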
Data deduplication offers other benefits. Lower storage space requirements save money on disk expenditures. The more efficient use of disk space also allows for longer disk retention periods, so recovery time objectives (RTO) can be met from disk for longer and the need for tape backups is reduced. Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication, and disaster recovery.
Deduping a table is a three-step process:
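Step 1: Copy the non-duplicates (unique tuples) into a new table
-- MySQL syntax. MySQL has no SELECT ... INTO <table>, so use CREATE TABLE ... AS SELECT.
-- Selecting * while grouping by one column only runs with ONLY_FULL_GROUP_BY disabled;
-- MySQL then keeps an arbitrary row from each group.
CREATE TABLE new_table AS
SELECT *
FROM old_table
GROUP BY [column to remove duplicates by];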
Step 2: Delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: Rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;
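If you want to try the three steps without a database server, here is a rough sketch using Python's built-in sqlite3 module. The sample rows and the column names column1 and column2 are assumptions made up for the demo. SQLite syntax differs a little from the statements above: the rename is spelled ALTER TABLE ... RENAME TO, and MIN(column1) is used so that the surviving row in each group is explicit rather than arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data: "alice" and "bob" each appear twice in column2.
cur.execute("CREATE TABLE old_table (column1 INTEGER, column2 TEXT)")
cur.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob")],
)

# Step 1: copy one row per distinct column2 value into a new table.
cur.execute(
    """
    CREATE TABLE new_table AS
    SELECT MIN(column1) AS column1, column2
    FROM old_table
    GROUP BY column2
    """
)

# Step 2: drop the table that still holds the duplicates.
cur.execute("DROP TABLE old_table")

# Step 3: give the new table the old name.
cur.execute("ALTER TABLE new_table RENAME TO old_table")
conn.commit()

rows = cur.execute(
    "SELECT column1, column2 FROM old_table ORDER BY column1"
).fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob'), (4, 'carol')]
```

Note that a table created from a SELECT does not carry over the indexes or constraints of the original, so on a real table those would need to be recreated after the rename.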
The code above works fine if you have to remove the duplicate records. What if you only have to mark the duplicate records, not delete them? The variant below uses SQL Server (T-SQL) syntax, where a table name starting with # is a temporary table.
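Step 1: Collect the column2 values that occur more than once into a temporary table. (Copying one row per value, as in the delete version, would make the join in Step 2 match every row.)
SELECT column2
INTO #new_table
FROM old_table
GROUP BY column2
HAVING COUNT(*) > 1;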
Step 2: Set the flag column (column3) on every row of the old table that matches the temporary table
UPDATE old_table
SET old_table.column3 = 1
FROM old_table INNER JOIN #new_table
ON old_table.column2 = #new_table.column2;
Step 3: Drop the temporary table
DROP TABLE #new_table;
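Here is a rough, runnable sketch of the marking approach using Python's built-in sqlite3 module. The sample rows are made up, and the sketch takes one reading of "mark the duplicate records": it flags every row whose column2 value appears more than once. SQLite has no # temporary tables or UPDATE ... FROM join in older versions, so the sketch assumes CREATE TEMP TABLE and a WHERE ... IN subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data; column3 is the duplicate flag, 0 by default.
cur.execute(
    "CREATE TABLE old_table "
    "(column1 INTEGER, column2 TEXT, column3 INTEGER DEFAULT 0)"
)
cur.executemany(
    "INSERT INTO old_table (column1, column2) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob")],
)

# Step 1: collect the column2 values that occur more than once.
cur.execute(
    """
    CREATE TEMP TABLE new_table AS
    SELECT column2
    FROM old_table
    GROUP BY column2
    HAVING COUNT(*) > 1
    """
)

# Step 2: flag every row whose column2 value is in that list.
cur.execute(
    """
    UPDATE old_table
    SET column3 = 1
    WHERE column2 IN (SELECT column2 FROM new_table)
    """
)

# Step 3: the helper table is no longer needed.
cur.execute("DROP TABLE new_table")
conn.commit()

flags = cur.execute(
    "SELECT column1, column2, column3 FROM old_table ORDER BY column1"
).fetchall()
print(flags)
```

In this run the two "alice" rows and the two "bob" rows end up with column3 = 1, while the single "carol" row keeps 0 and no rows are deleted.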