I recently had a task: parse SQL statements and check them against a custom specification, using Python's re module and sqlparse.
For example:
CREATE TABLE `student_info` (
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_name` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'username',
`stu_class` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'class',
`stu_num` INT (11) NOT NULL DEFAULT '0' COMMENT 'study number',
`stu_score` SMALLINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'total',
`tuition` DECIMAL (5, 2) NOT NULL DEFAULT '0' COMMENT 'fee',
`phone_number` VARCHAR (20) NOT NULL DEFAULT '0' COMMENT 'mobile',
`create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'record created time',
`update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'record updated time',
`status` TINYINT NOT NULL DEFAULT '1' COMMENT 'some comment',
PRIMARY KEY (`id`),
UNIQUE KEY uniq_stu_num (`stu_num`),
KEY idx_stu_score (`stu_score`),
KEY idx_update_time_tuition (`update_time`, `tuition`)
) ENGINE = INNODB charset = utf8mb4 COMMENT 'Student table';
I am trying to match this statement with a regex that enforces the following rules:
- Every field must have a COMMENT.
- The table must have a PRIMARY KEY, and the PRIMARY KEY column must be AUTO_INCREMENT.
- Every field must have a DEFAULT value.
- ENGINE must be INNODB.
- charset must be utf8mb4.
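To make the table-level rules concrete, here is a minimal sketch of how they could be checked one at a time with small regexes instead of one large pattern. The function name check_table_rules and the shortened sample statement are made up for illustration, and the sketch assumes each column definition sits on its own line, as in the statement above. The per-field rules (COMMENT, DEFAULT) are not covered here because they need the column lines split out first.

```python
import re

# Shortened, hypothetical version of the statement, for illustration only.
SQL = """CREATE TABLE `student_info` (
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_name` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'username',
PRIMARY KEY (`id`)
) ENGINE = INNODB charset = utf8mb4 COMMENT 'Student table';"""


def check_table_rules(sql):
    """Return a list of rule violations (empty list if all rules pass)."""
    problems = []
    flags = re.IGNORECASE

    # Rule: the table must have a PRIMARY KEY on an AUTO_INCREMENT column.
    pk = re.search(r"PRIMARY\s+KEY\s*\(\s*`?([\w\-]+)`?\s*\)", sql, flags)
    if pk is None:
        problems.append("missing PRIMARY KEY")
    else:
        # Assumption: the column definition starts its own line with `name`.
        name = re.escape(pk.group(1))
        col = re.search(rf"^\s*`{name}`.*$", sql, flags | re.MULTILINE)
        if col is None or not re.search(r"\bAUTO_INCREMENT\b", col.group(0), flags):
            problems.append("PRIMARY KEY column is not AUTO_INCREMENT")

    # Rule: ENGINE must be INNODB (compared case-insensitively).
    engine = re.search(r"ENGINE\s*=\s*(\w+)", sql, flags)
    if engine is None or engine.group(1).upper() != "INNODB":
        problems.append("ENGINE must be INNODB")

    # Rule: charset must be utf8mb4.
    charset = re.search(r"CHARSET\s*=\s*([\w\-]+)", sql, flags)
    if charset is None or charset.group(1).lower() != "utf8mb4":
        problems.append("charset must be utf8mb4")

    return problems


print(check_table_rules(SQL))  # prints []
```

Each rule is an independent search, so the order in which the clauses appear in the statement does not matter.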
The regex pattern I am using is:
create\s+table\s*`\w*`\s*\(\n\s*`([\w\-_]*)`\s*([\w]*).*(auto_increment)([\n\s\w()',`]*)(primary key)\s*\(`([\w\-_]*)`\).*\n.*engine\s*=\s*(InnoDB).*charset\s*=\s*([\w\-]*);
to capture all the key information in groups and process it later.
[Regex Demo]
But I cannot capture each individual field's information, possibly because of the order of the groups. Can someone fix this regex, or give me a hint?
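For reference, this is roughly the result I am after, written as a two-step sketch rather than a single pattern: first match each column line, then pull COMMENT, DEFAULT and AUTO_INCREMENT out of the remainder of that line. The names COLUMN_RE and parse_columns and the shortened statement are hypothetical, and the sketch assumes one column definition per line with a backquoted column name. It would also be fooled by a COMMENT whose text contains the word DEFAULT.

```python
import re

# Shortened, hypothetical version of the statement, for illustration only.
SQL = """CREATE TABLE `student_info` (
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_name` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'username',
`tuition` DECIMAL (5, 2) NOT NULL DEFAULT '0' COMMENT 'fee',
PRIMARY KEY (`id`),
KEY idx_stu_name (`stu_name`)
) ENGINE = INNODB charset = utf8mb4 COMMENT 'Student table';"""

# A column line: backquoted name, type keyword, then everything up to the
# optional trailing comma. Key/constraint lines do not start with a backquote.
COLUMN_RE = re.compile(
    r"^\s*`(?P<name>[\w\-]+)`\s+(?P<type>\w+)(?P<rest>.*?),?\s*$",
    re.MULTILINE,
)


def parse_columns(sql):
    """Return one dict per column definition found in the statement."""
    columns = []
    for m in COLUMN_RE.finditer(sql):
        rest = m.group("rest")
        comment = re.search(r"COMMENT\s+'([^']*)'", rest, re.IGNORECASE)
        default = re.search(r"DEFAULT\s+('[^']*'|\w+)", rest, re.IGNORECASE)
        columns.append({
            "name": m.group("name"),
            "type": m.group("type"),
            "comment": comment.group(1) if comment else None,
            "default": default.group(1) if default else None,
            "auto_increment": bool(
                re.search(r"\bAUTO_INCREMENT\b", rest, re.IGNORECASE)
            ),
        })
    return columns


for column in parse_columns(SQL):
    print(column)
```

With the columns in this shape, the per-field rules become simple checks on each dict (for example, that comment is not None), independent of clause order.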