We are migrating from 5.5 enterprise mysql to 8.0 mysql enterprise edition. And we are migrating version by version like from 5.5 to 5.6 then 5.6 to 5.7 and lastly from 5.7 to 8.0 also os migration is there from the rhel5. 4 to 8.4 it's all okay but client want data validation value by value. Normal data validations like row counts, and checksum is not sufficient to client as data is senstive. Suggest any good tool, script or query to check value to value data validation
2 Answers
I would use pt-table-sync.
This is a free tool from Percona. It's a command-line tool, implemented in Perl. You use it by giving arguments describing connections to both the original database instance and the new database instance. It compares row by row (really small groups of rows at a time), and then generates a diff, so that you could make the new instance have the same data as the original.
You can use the tool to execute the change to synchronize the data, or optionally you can make it only report the diff, and do nothing to change data. Read the documentation I linked to for more information.
-
is this tool capable enough to validate row by row with 2 TB of data size?Sameer Bille– Sameer Bille2022-03-30 17:17:08 +00:00Commented Mar 30, 2022 at 17:17
-
1Yes, it works by iterating over the primary key of the table, comparing small groups of rows. There's no reason it would not work regardless of table size, it will just take time. I have used it on some pretty big tables (hundreds of GB), although I can't claim that I've used it for a 2TB database. Anyway, that is the tool I can recommend.Bill Karwin– Bill Karwin2022-03-30 17:20:10 +00:00Commented Mar 30, 2022 at 17:20
-
We are working on python script for compare data. Tool is ignored :(Sameer Bille– Sameer Bille2022-04-07 06:53:52 +00:00Commented Apr 7, 2022 at 6:53
-
Ok then, good luck!Bill Karwin– Bill Karwin2022-04-07 14:21:58 +00:00Commented Apr 7, 2022 at 14:21
(This addresses the Comment about the format of DATETIME and TIMESTAMP changing.)
There are (based on documentation) two ways that the internal format is changed:
If you dump (eg, via
mysqldump) the data (from the old version) and load it on the new server with a newer version, the format change is changed without any hint that it is happening. This is because of converting from the old internal format to a string, then to the new internal format.If you "upgrade in place", then the upgrade process takes the time to rebuild each table that has a datetime or timestamp. (There may be some option for delaying the format change, but I am not familiar with the details.)
The bottom line is that format change is automatic, though may add time to the upgrade.
The internal structure of an InnoDB tables representation on disk makes it essentially impossible to do anything akin to a byte-by-byte file comparison. (This statement is valid even when there is no change to an internal data format.)
You can, however, copy all the mysql files from one disk to another, then verify byte-by-byte that the files were still intact. This is a simple way to create a new Replica. However, once mysqld touches a table, the simple byte comparison is rendered invalid.