Skip to main content

Data DeDuping!

Data deduplication, often called "intelligent compression" or "single-instance storage", is a process that uses matching logic to eliminate file records that are duplicates (dupes). It is a method of reducing storage needs by eliminating redundant data and replacing it with a pointer to the unique data copy

Data deduplication offers other benefits. Lower storage space requirements will save money on disk expenditures. The more efficient use of disk space also allows for longer disk retention periods, which provides better recovery time objectives (RTO) for a longer time and reduces the need for tape backups. Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication, and disaster recovery.

Deduping is a 3 step process

Step 1: Move the non duplicates (unique tuples) into a temporary table
SELECT * into new_table
FROM old_table
WHERE 1
GROUP BY [column to remove duplicates by];

Step 2: Delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;

Step 3: Rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;

The above set of code works fine if you have to remove duplicate code. What if you just have to mark the duplicate records, but not delete the records

Step 1: Move the non duplicates (unique tuples) into a temporary table
SELECT * into #new_table
FROM old_table
WHERE 1
GROUP BY [column to remove duplicates by];

Step 2: Update the column of the old table to mark it as duplicate
UPDATE old_table
SET old_table.column3 = 1
FROM old_table INNER JOIN #new_table
ON old_table.column2 = #new_table.column2
   
Step 3: Delete the new table
DROP TABLE #new_table


Comments

Popular posts from this blog

JavaScript Interview Questions

This is a compilations of all the interview questions related to Javascript that i have encountered.  Q: Difference between window.onload and onDocumentReady? A: The onload event does not fire until every last piece of the page is loaded, this includes css and images, which means there’s a huge delay before any code is executed. That isnt what we want. We just want to wait until the DOM is loaded and is able to be manipulated. onDocumentReady allows the programmer to do that. Q:  What is the difference between == and === ? A: The == checks for value equality, but === checks for both type and value. Few examples: "1" == 1; // value evaluation only, yields true "1" === 1; // value and type evaluation, yields false "1" == true; // "1" as boolean is true, value evaluation only, yields true "1" === false; // value and type evaluation, yields false Q: What does “1″+2+5 evaluate to? What about 5 + 2 +...

Adding a linked Server using the GUI

Adding a linked Server using the GUI There are two ways to add another SQL Server as a linked server.  Using the first method, you need to specify the actual server name as the “linked server name”.  What this means is that everytime you want to reference the linked server in code, you will use the remote server’s name.  This may not be beneficial because if the linked server’s name changes, then you will have to also change all the code that references the linked server.  I like to avoid this method even though it is easier to initially setup.  The rest of the steps will guide you through setting up a linked server with a custom name: To add a linked server using SSMS (SQL Server Management Studio), open the server you want to create a link from in object explorer. In SSMS, Expand Server Objects -> Linked Servers -> (Right click on the Linked Server Folder and select “New Linked Server”) Add New Linked Server The “New Linked Server” Dialog a...

Lookup!!

LOOKUP As the name suggests, Excel gives us the option to lookup for a number or text in a specific area which needs to be stated. If the value is found the corresponding value or text is returned The syntax for LOOKUP is as follows; =LOOKUP( lookup_value , lookup_vector , result_vector )       In the diagram, column D contains varying salaries, against which there is a company car in column E which corresponds to each salary. For example, a £20030 salary gets a Golf, a £35000 salary gets a Scorpio. A LOOKUP formula can be used to return whatever car is appropriate to a salary figure that is entered. In this case, the lookup_value is the cell where the salary is entered (B13), the lookup_vector is the salary column (D3:D11), and the result_vector is the car column (E3:E11). Hence the formula; =LOOKUP(B13,D3:D11,E3:E11) Typing 40000 in cell B13 will set the lookup_value. LOOKUP will search through the lookup_vector to find the matching salary, and return th...