Skip to main content

Data DeDuping!

Data deduplication, often called "intelligent compression" or "single-instance storage", is a process that uses matching logic to eliminate file records that are duplicates (dupes). It is a method of reducing storage needs by eliminating redundant data and replacing it with a pointer to the unique data copy

Data deduplication offers other benefits. Lower storage space requirements will save money on disk expenditures. The more efficient use of disk space also allows for longer disk retention periods, which provides better recovery time objectives (RTO) for a longer time and reduces the need for tape backups. Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication, and disaster recovery.

Deduping is a 3 step process

Step 1: Move the non duplicates (unique tuples) into a temporary table
SELECT * into new_table
FROM old_table
WHERE 1
GROUP BY [column to remove duplicates by];

Step 2: Delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;

Step 3: Rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;

The above set of code works fine if you have to remove duplicate code. What if you just have to mark the duplicate records, but not delete the records

Step 1: Move the non duplicates (unique tuples) into a temporary table
SELECT * into #new_table
FROM old_table
WHERE 1
GROUP BY [column to remove duplicates by];

Step 2: Update the column of the old table to mark it as duplicate
UPDATE old_table
SET old_table.column3 = 1
FROM old_table INNER JOIN #new_table
ON old_table.column2 = #new_table.column2
   
Step 3: Delete the new table
DROP TABLE #new_table


Comments

Popular posts from this blog

Insufficient access rights to perform the operation. (Exception from HRESULT: 0x80072098)

While accessing the active directory (AD) and authorization manager (AZMAN) , If you get “   Insufficient access rights to perform the operation. (Exception from HRESULT: 0x80072098)  “ message check the    account that is being used to get the LDAP query from AD .  ERROR DETAILS Exception Details:  System.Runtime.InteropServices.COMException: Insufficient access rights to perform the operation. (Exception from HRESULT: 0x80072098) Source Error: Line 154:    'Session("FullName") = System.Security.Principal.WindowsIdentity.GetCurrent.Name.ToString() Line 155: Line 156:    If Not User.IsInRole("Role1") Then Line 157:          Response.Redirect("./Login.aspx") Line 158:    End If  Stack Trace : .... SOLVE IT Steps to do check the app pool rights: Click on the website name that you are having problem with in IIS  In the right panel you will see 'Basic Settings'. Click It. Select the specific pool option and enter the name of the ac

Sql Server database Read_Only / Read_Write

The ALTER DATABASE command allows a database administrator to modify SQL Server databases and their files and filegroups. This includes permitting the changing of database configuration options. Why Read Only ? When you need to ensure that the data is a database is not modified by any users or automated processes, it is useful to set the database into a read-only mode. Once read-only, the data can be read normally but any attempts to create, updated or delete table rows is disallowed. This makes the read-only mode ideal when preparing for data migration, performing data integrity checking or when the data is only required for historical reporting purposes. Make Database Read Only USE  [master] GO ALTER DATABASE  [TESTDB]  SET  READ_ONLY  WITH  NO_WAIT GO Make Database Read/Write USE  [master] GO ALTER DATABASE  [TESTDB]  SET  READ_WRITE  WITH  NO_WAIT GO In case you get the following error message make the database single user: Msg 5070, Level 16, Stat

Do's and Don't SQL

Do's: Writing comments whenever something is not very obvious, as it won’t impact the performance.  (--) for single line  (/*…*/) to mark a section Use proper indentation Use Upper Case for all SQL keywords. SELECT, UPDATE, INSERT, WHERE, INNER JOIN, AND, OR, LIKE. Use BEGIN... END block for multiple statements in conditional code  Use Declare and Set in beginning of Stored procedure Create objects in same database where its relevant table exists otherwise it will reduce network performance. Use PRIMARY key in WHERE condition of UPDATE or DELETE statements as this will avoid error possibilities. If User table references Employee table than the column name used in reference should be UserID where User is table name and ID primary column of User table and UserID is reference column of Employee table. Use select column name instead of select * Use CTE (Common Table Expression); its scope is limited to the next statement in SQL query, instead of temporary tables and der