Non-primarykey records delete

IxiRancid

Well-known member
Joined
Jun 16, 2004
Messages
104
Location
Europe
Is there a way to find/match non-primarykey records in two different tables?
problem:

TableA - contains records (John, Dhough, )
TableB - contains same records as TableA (John, Dhough, D)
both tables do not have primary keys

when a transaction is made a record from TableA is replicated into TableB, however it has a filed marked D, matching record from TableB must be then deleted.

Major problem:
as the rows are identical (except D), no primary key, but there could be up to 25 SAME, IDENTIC records in TableB:
1. John, Dhough,
2. John, Dhough,
3. John, Dhough,

So, which one to delete?

Ive been looking at some RID (record identifier), but cant find anywhere if these RIDs are perhaps equal to some byte?
 
The easiest way is to use a primary key.
ALL tables should have a primary key, or at least a unique. How else can you distinguish between two records?

Make sure you have primary keys for your tables and your problems are gone. Along with a lot of other hard-to-trace anomalities ;)

HTH
/Kejpa
 
Yes I know, however this is a 15 year old DB2 system built by outsourcing company and this wasnt meant to be a big transactional base dynamicaly used, more as a storage.
So now we got this problem... I though that something could be done using RIDs or any other suggestions.
This is an issue however: records in TableB ARE ALL EQUAL, so why even bother which one to delete? Just delete one and that doesnt have any effect on others ;)
 
Dont know much about DB2 and RIDs.
By using simple SQL you cant distinguish between identical records. If you try to delete one you deleteem all.
Some database engines support "top"/"limit" or other keyword for restricting the number of affected records, the DB2 syntax and options am I unfortunately unaware of.

/Kejpa
 
Hmm... I wonder how DB2 or SQL actualy inserts records? With no primary key or grouping specified it probably just adds row one after another? However in what order does a normal SELECT * FROM TABLE return these rows?
Ill try tomorrow.
Thanks for suggestions!
 
As I said, Ive never used DB2 my experience is from Access, SQL and mySQL.
If you insert three records they appear as 1, 2, 3. If you then delete #2 and insert a 4th and a 5th the order usually will be 1, 4, 3, 5. Records are inserted where there is "empty" space, if there is no primary key defined. So theres no telling which row was inserted last :(

HTH
/Kejpa
 
If this were me, Id add an "identity" column to your table first. You can make this field up yourself if you want - just make sure its a unique value for every row.

Then youre left with deleting the duplicates. That shouldnt be that hard. What I generally do is something like the following. This assumes you have two fields you want to match on, FirstName and LastName:
Code:
SELECT MIN(MyNewID) AS MyNewID, FirstName, LastName
INTO #temp
FROM Table1
GROUP BY FirstName, LastName

DELETE FROM Table1 WHERE MyNewID NOT IN (SELECT MyNewID FROM #temp)

This puts all matching records in a temp table (works in SQL Server, not sure about DB2). The GROUP BY will get you a single row - you said you dont care which one. The MIN(MyNewID) is to, again, get a single ID per matching set of values. I chose MIN but you could use any aggregate function. You said it didnt matter, so I prefer MIN - it may find it faster if the table has an index.

If you CANT add a column to the existing table, Id suggest first copying the table to a new table where you CAN add the unique column. Do your cleanup there then TRUNCATE the original table and copy everything back.

-ner
 
Nice and constructive suggestion, however there are numerous applications that use this table and the other replicated on SQL. I dont think this would work in a short term...

@Kejpa: thats interesting, didnt know this is the way it works. Does this mean that SQL actualy has some reference where a record was ("empty space")?
---
If you insert three records they appear as 1, 2, 3. If you then delete #2 and insert a 4th and a 5th the order usually will be 1, 4, 3, 5
---
 
In some sense yes.
Access use diskspace for this, a small database where you have made a number of deletes, inserts and updates can easily be over 10MB then when you compact it it shrinks back to fit a diskette.
SQL server (SQL is a language you talk to databases with) uses some other means for detecting "freed space"
Havent studied how mySQL does it, I nowadays always use primaries (<LOL> yeah, right!)

/Kejpa
 
In SQL Server with a clustered index, the engine stores rows on the disk in the order defined by the clustered index. The clustered index does not have to match the primary key, but often does. If you had a clustered index then inserting "row 2" where 4 million rows currently exist (1, followed by 3 through 4 million) would take a LONG time as the engine would have to move rows 3 through 4 million "up" on the disk to make room for row 2.

To go back to your original question - you need to identify and delete duplicate rows but you cant modify the table? The only solution that comes to mind is to use a cursor to loop over the rows and delete inside the loop. That seems really nasty compared to adding a column for a one time cleanup task - you could always remove the column if youre worried it will cause issues.

-ner
 
Heh :)
it is nasty, specialy with the deadline we have. But well probably use outsourcing for that. And yes, we cannot change the table structure.

This was another issue: as I said it really doesnt matter which (identical) record we delete, but we would need to make an logging table which records and when were deleted. That wouls solve this issue, but this is a large operation and we have limited resources in networking and daily deadlines itself.

Ill let you know what the solution was/will be :)
 
Back
Top