I recently got a task: parse SQL statements and check them against a custom specification, using Python's re module and sqlparse.

For example:

CREATE TABLE `student_info` (
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_name` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'username',
`stu_class` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'class',
`stu_num` INT (11) NOT NULL DEFAULT '0' COMMENT 'study number',
`stu_score` SMALLINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'total',
`tuition` DECIMAL (5, 2) NOT NULL DEFAULT '0' COMMENT 'fee',
`phone_number` VARCHAR (20) NOT NULL DEFAULT '0' COMMENT 'mobile',
`create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'record created time',
`update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'record updated time',
`status` TINYINT NOT NULL DEFAULT '1' COMMENT 'some comment',
PRIMARY KEY (`id`),
UNIQUE KEY uniq_stu_num (`stu_num`),
KEY idx_stu_score (`stu_score`),
KEY idx_update_time_tuition (`update_time`, `tuition`)
) ENGINE = INNODB charset = utf8mb4 COMMENT 'Student table';

I am trying to validate this statement with a regex against the following specification:

  • Every field must have a COMMENT
  • The table must have a PRIMARY KEY, and the PRIMARY KEY must be AUTO_INCREMENT
  • Every field must have a DEFAULT value
  • ENGINE must be INNODB
  • charset must be utf8mb4

I use a regex pattern like:

create\s+table\s*`\w*`\s*\(\n\s*`([\w\-_]*)`\s*([\w]*).*(auto_increment)([\n\s\w()',`]*)(primary key)\s*\(`([\w\-_]*)`\).*\n.*engine\s*=\s*(InnoDB).*charset\s*=\s*([\w\-]*);

to capture all the key information in groups and process it later.
[Regex Demo]

But I cannot capture the information for every single field, maybe because of the order. Can someone fix that regex, or just give me a hint?

  • I disagree with requiring an AUTO_INCREMENT PK.
    – Rick James
    Commented Aug 25, 2017 at 5:35
  • Observe how lame the 'required' comments are.
    – Rick James
    Commented Aug 25, 2017 at 5:36
  • Thanks for replying. The specification comes from another DBA; I am just trying to implement it. In this case the comments are only there for testing, for convenience.
    – Evilat
    Commented Aug 25, 2017 at 6:24

1 Answer

Note: Doing everything with a single regex is not a good idea.
Note: You can validate a string with regexes in multiple steps.

So a starting point that I can think of is:

Step 1: Check the whole CREATE command:

"^\s*create\s+table\s*`([a-z]\w+)`\s*\(([\s\S]+)\)\s*
 engine\s*=\s*innodb\s+
 charset\s*=\s*utf8mb4\s+
 comment\s'[^']+'\s*;\s*$
"giu

\1 = name of table
\2 = body of create statement

[Regex Demo]
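
For Python's re module, Step 1 could be sketched roughly as below. This is only a sketch under some assumptions: the line breaks in the pattern above are taken to be layout only, so the pattern is compiled with re.VERBOSE (which ignores whitespace inside the pattern), and re.IGNORECASE stands in for the i flag. The names STEP1, SAMPLE_SQL and split_create_table are made up for this illustration, and SAMPLE_SQL is a shortened version of the table from the question.

```python
import re

# Step 1 pattern from above. re.VERBOSE ignores the layout whitespace,
# re.IGNORECASE replaces the "i" flag.
STEP1 = re.compile(
    r"""
    ^\s*create\s+table\s*`([a-z]\w+)`\s*\(([\s\S]+)\)\s*
    engine\s*=\s*innodb\s+
    charset\s*=\s*utf8mb4\s+
    comment\s'[^']+'\s*;\s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Shortened version of the statement from the question (illustrative only).
SAMPLE_SQL = """
CREATE TABLE `student_info` (
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_num` INT (11) NOT NULL DEFAULT '0' COMMENT 'study number',
PRIMARY KEY (`id`),
UNIQUE KEY uniq_stu_num (`stu_num`)
) ENGINE = INNODB charset = utf8mb4 COMMENT 'Student table';
"""


def split_create_table(sql):
    """Return (table_name, body), or None if the statement fails Step 1."""
    match = STEP1.match(sql)
    if match is None:
        return None
    return match.group(1), match.group(2)


result = split_create_table(SAMPLE_SQL)
print(result[0] if result else "statement rejected")
```

The body returned as the second item is what Step 2 works on.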

Step 2: Check the structure of the body of the CREATE command (\2 from Step 1):

"([\s\S]+)\s*
 primary\s+key\s*\(\s*`([a-z]\w*)`\s*\)\s*
 (,\s*unique\s+key\s+uniq_\w+\s*\(`([a-z]\w*)`\))?
 (,\s*key\s+idx_\w+\s*\((\s*,?\s*`([a-z]\w*)`\s*)+\)\s*)+
"giu

\1 = fields info
\2 = primary key field name
\4 = unique key field name
\7 = keys field name

[Regex Demo]
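
A possible Python version of Step 2, again compiled with re.VERBOSE and re.IGNORECASE as an assumption about how the pattern is meant to be read. STEP2, BODY and split_body are hypothetical names, and BODY is a shortened body for illustration. One thing to keep in mind: when a group is repeated, Python's re keeps only the last text it matched, so \7 gives the last column of the last KEY and not all of them.

```python
import re

# Step 2 pattern from above, applied to the body captured in Step 1.
STEP2 = re.compile(
    r"""
    ([\s\S]+)\s*
    primary\s+key\s*\(\s*`([a-z]\w*)`\s*\)\s*
    (,\s*unique\s+key\s+uniq_\w+\s*\(`([a-z]\w*)`\))?
    (,\s*key\s+idx_\w+\s*\((\s*,?\s*`([a-z]\w*)`\s*)+\)\s*)+
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Shortened body (illustrative only).
BODY = """
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_num` INT (11) NOT NULL DEFAULT '0' COMMENT 'study number',
`stu_score` SMALLINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'total',
PRIMARY KEY (`id`),
UNIQUE KEY uniq_stu_num (`stu_num`),
KEY idx_stu_score (`stu_score`),
KEY idx_num_score (`stu_num`, `stu_score`)
"""


def split_body(body):
    """Return the captured parts as a dict, or None if the body fails Step 2."""
    match = STEP2.search(body)
    if match is None:
        return None
    return {
        "fields_info": match.group(1),      # everything before PRIMARY KEY
        "primary_key": match.group(2),
        "unique_key": match.group(4),       # None when there is no UNIQUE KEY
        "last_key_column": match.group(7),  # repeated group: last match only
    }


parts = split_body(BODY)
print(parts["primary_key"] if parts else "body rejected")
```

If all index columns are needed, one option is to run a second, simpler pattern over the text after the field definitions instead of relying on \7.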

Step 3: Check fields info

"(`([a-z]\w*)`\s+
 (timestamp|(tiny|small|)int(\s+\(\s*\d+\s*\))?(\s+unsigned)?|varchar\s*\(\d+\)|decimal\s*\(\s*\d+\s*,\s*\d+\s*\)))\s+
 not\s+null\s+
 (auto_increment|default\s+('[^']*'|current_timestamp))\s+
 comment\s+'[^']+',
"giu

\2 = field name

[Regex Demo]
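
Since the Step 3 pattern matches one field at a time, in Python it could be applied with finditer, roughly like this. This is a sketch, not a full validator: FIELD, FIELDS_INFO and check_fields are hypothetical names, and finding the fields that fail relies on the extra assumption that every field definition starts on its own line with a backtick-quoted name.

```python
import re

# Step 3 pattern from above; one match per valid field definition.
FIELD = re.compile(
    r"""
    (`([a-z]\w*)`\s+
    (timestamp|(tiny|small|)int(\s+\(\s*\d+\s*\))?(\s+unsigned)?|varchar\s*\(\d+\)|decimal\s*\(\s*\d+\s*,\s*\d+\s*\)))\s+
    not\s+null\s+
    (auto_increment|default\s+('[^']*'|current_timestamp))\s+
    comment\s+'[^']+',
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Field definitions in the style of the question; the last one has no COMMENT.
FIELDS_INFO = """
`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',
`stu_name` VARCHAR (10) NOT NULL DEFAULT '' COMMENT 'username',
`tuition` DECIMAL (5, 2) NOT NULL DEFAULT '0' COMMENT 'fee',
`create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'record created time',
`no_comment` INT (11) NOT NULL DEFAULT '0',
"""


def check_fields(fields_info):
    """Return (valid_names, invalid_names) for the field definitions."""
    valid = [m.group(2) for m in FIELD.finditer(fields_info)]
    # Assumption: each definition starts a line with a backtick-quoted name.
    declared = re.findall(r"^\s*`(\w+)`", fields_info, re.MULTILINE)
    invalid = [name for name in declared if name not in valid]
    return valid, invalid


valid, invalid = check_fields(FIELDS_INFO)
print("valid:", valid)
print("invalid:", invalid)
```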

Step 4: Check the field names from Step 2 against the field names from Step 3,
and check the primary key field name from Step 2 against \1 of the regex below:

"
 `([a-z]\w*)`.+auto_increment
"giu

[Regex Demo]
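
The cross-check of Step 4 is plain Python once the earlier steps have produced their names. A minimal sketch, with hypothetical names (AUTO_INCREMENT, cross_check, FIELDS_INFO) and with the name group written as `([a-z]\w*)`, the same form used in Steps 2 and 3:

```python
import re

# A field definition that carries AUTO_INCREMENT. "." does not cross
# newlines, so the name and the keyword must be on the same line.
AUTO_INCREMENT = re.compile(r"`([a-z]\w*)`.+auto_increment", re.IGNORECASE)

# Shortened field definitions (illustrative only).
FIELDS_INFO = (
    "`id` INT (11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary',\n"
    "`stu_num` INT (11) NOT NULL DEFAULT '0' COMMENT 'study number',\n"
)


def cross_check(fields_info, field_names, primary_key, key_columns):
    """Return a list of problems; an empty list means Step 4 passed."""
    problems = []
    # Every column named in a key must be one of the validated fields.
    for column in [primary_key, *key_columns]:
        if column not in field_names:
            problems.append(f"key column `{column}` is not a validated field")
    # The AUTO_INCREMENT field must be the primary key field.
    match = AUTO_INCREMENT.search(fields_info)
    if match is None or match.group(1) != primary_key:
        problems.append(f"primary key `{primary_key}` is not AUTO_INCREMENT")
    return problems


print(cross_check(FIELDS_INFO, ["id", "stu_num"], "id", ["stu_num"]))
```

Returning a list of messages instead of a boolean makes it easier to report every violated rule at once.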


If you want to allow the engine part and the charset part in either order, the regex changes to:

^\s*create\s+table\s*`([a-z]\w+)`\s*\(([\s\S]+)\)\s*
 ((engine\s*=\s*innodb\s+)(charset\s*=\s*utf8mb4\s+)?|(charset\s*=\s*utf8mb4\s+)(engine\s*=\s*innodb\s+)?)?
 comment\s'[^']+'\s*;\s*
$

[Regex Demo]
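
Another way to handle the order, sketched here in Python with lookaheads ((?=...)), is to check each option separately against the text that follows the closing parenthesis of the field list. Note that this is a different technique from the alternation above, and that in this sketch both options are required. ENGINE, CHARSET, OPTIONS and table_options_ok are hypothetical names, and the function assumes the options text has already been cut out of the statement.

```python
import re

ENGINE = r"\bengine\s*=\s*innodb\b"
CHARSET = r"\bcharset\s*=\s*utf8mb4\b"

# Two lookaheads anchored at the same position: each one scans the whole
# options text on its own, so the order of the options does not matter.
OPTIONS = re.compile(
    rf"(?=.*{ENGINE})(?=.*{CHARSET})",
    re.IGNORECASE | re.DOTALL,
)


def table_options_ok(options):
    """True if the options text sets ENGINE=InnoDB and charset=utf8mb4."""
    return OPTIONS.match(options) is not None


print(table_options_ok("ENGINE = INNODB charset = utf8mb4 COMMENT 'Student table';"))
print(table_options_ok("charset=utf8mb4 engine=InnoDB;"))
```

A caveat: the lookaheads scan the whole text, so a table COMMENT that happens to contain something like engine=innodb would also satisfy them. For stricter checks a real SQL parser is the safer choice.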

HTH

  • Thanks @shA.t ;) I'll update code later, and good night from CN
    – Evilat
    Commented Aug 24, 2017 at 16:29
  • Hi, sorry, one more question: sometimes the options after the field list come in a different order, like engine=xx charset=xxx or charset=xxx engine=xxx. How can I handle this (use ?=)? Thanks again.
    – Evilat
    Commented Aug 24, 2017 at 16:55
  • OMG! It will be so painful. Please check this regex ;).
    – shA.t
    Commented Aug 24, 2017 at 17:09
