I've added a JSON column, metadata_json, to the table digilearning_support_files, and I'm trying to create an index on a value that will be stored in it.
In the MySQL client terminal, the following works fine:
mysql> ALTER TABLE digilearning_support_files ADD INDEX metadata_json_category (( CAST(metadata_json->>"$.category" as CHAR(255)) COLLATE utf8mb4_bin )) USING BTREE;
Query OK, 0 rows affected (0.19 sec)
Records: 0 Duplicates: 0 Warnings: 0
and I can see that it will be used, with an EXPLAIN query:
mysql> EXPLAIN SELECT * FROM digilearning_support_files WHERE metadata_json->>"$.category" = 'Lesson Plan'\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: digilearning_support_files
partitions: NULL
type: ref
possible_keys: metadata_json_category
key: metadata_json_category
key_len: 1023
ref: const
rows: 1
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
So EXPLAIN shows that MySQL will use the index to test the value of the 'category' key in the stored JSON. So far, so good.
I wrote a helper method for Rails migrations that sets up an index this way. But when the statement runs through Rails, it fails on the collation:
>> ActiveRecord::Base.connection.execute("ALTER TABLE digilearning_support_files ADD INDEX metadata_json_category (( CAST(metadata_json->>\"$.category\" as CHAR(255)) COLLATE utf8mb4_bin )) USING BTREE")
ActiveRecord::StatementInvalid: Mysql::Error: COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8mb3': ALTER TABLE digilearning_support_files ADD INDEX metadata_json_category (( CAST(metadata_json->>"$.category" as CHAR(255)) COLLATE utf8mb4_bin )) USING BTREE
Why does the MySQL client accept this statement, while the exact same SQL fails when run from Rails?
As an experiment, I changed the Rails statement to use utf8mb3_bin instead. That let me add the index, but EXPLAIN shows it won't be used, and I'm fairly sure utf8mb3 is the wrong choice for JSON anyway.
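To repeat that EXPLAIN check from the Rails side instead of the mysql client, a small helper like the one below could be used. It is a minimal sketch: `index_used?` is a hypothetical name, and it assumes the connection responds to `select_all` with rows convertible to hashes keyed by EXPLAIN's column names, as `ActiveRecord::Base.connection` does.

```ruby
# Hypothetical helper: runs EXPLAIN for +sql+ on +connection+ and reports
# whether MySQL chose +index_name+ (the "key" column) for any row of the plan.
# Assumes +connection+ behaves like ActiveRecord::Base.connection, whose
# select_all result converts to an Array of Hashes via to_a.
def index_used?(connection, sql, index_name)
  plan = connection.select_all("EXPLAIN #{sql}").to_a
  plan.any? { |row| row["key"] == index_name }
end
```

From a Rails console, passing `ActiveRecord::Base.connection`, the SELECT from the EXPLAIN above, and `"metadata_json_category"` should return true only when the plan's key column names that index.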
This is the full definition of the metadata_json column; note that Collation is NULL:
Field: metadata_json
Type: json
Collation: NULL
Null: YES
Key:
Default: NULL
Extra:
Privileges: select,insert,update,references
However, some other columns in the table do use utf8mb3:
Field: exclude_locales
Type: varchar(255)
Collation: utf8mb3_general_ci
Null: YES
Key:
Default: NULL
Extra:
Privileges: select,insert,update,references
This is the database default collation:
mysql> SELECT DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = 'e_learning_resource_development' LIMIT 1;
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8mb4_0900_ai_ci |
+------------------------+
1 row in set (0.00 sec)
I'm wondering whether the NULL collation means it can resolve to different things depending on the environment, which would explain why the mysql terminal and Rails behave differently for the same query.
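One way to test that theory is to compare the session character set variables of the two clients. As far as I understand, a CAST(... AS CHAR) without a CHARACTER SET clause takes its character set from character_set_connection, which is a per-session setting rather than a column property. The sketch below reads those variables through a connection; `charset_session_variables` is a hypothetical helper, and it assumes `select_rows` returns `[name, value]` pairs as `ActiveRecord::Base.connection` does.

```ruby
# Session variables that can differ between the mysql client and Rails.
# character_set_connection / collation_connection apply to string literals
# and to CAST(... AS CHAR) written without a CHARACTER SET clause.
CHARSET_VARIABLES = %w[
  character_set_client
  character_set_connection
  collation_connection
].freeze

# Hypothetical helper: returns those session variables as a Hash.
# Assumes +connection+ responds to select_rows like ActiveRecord's
# connection, returning an Array of [Variable_name, Value] rows.
def charset_session_variables(connection)
  names = CHARSET_VARIABLES.map { |name| "'#{name}'" }.join(", ")
  rows = connection.select_rows(
    "SHOW SESSION VARIABLES WHERE Variable_name IN (#{names})"
  )
  rows.to_h
end
```

Comparing its result in a Rails console with `SHOW SESSION VARIABLES LIKE '%_connection';` in the mysql client should show whether the sessions differ. If database.yml sets `encoding: utf8`, MySQL treats that as the alias for utf8mb3.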
COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8mb3'. This doesn't specify just the collation but the encoding of the text as well. Is the column's collation utf8 or utf8mb3, perhaps? utf8mb3 is deprecated and uses up to 3 bytes to encode characters; utf8mb4 uses up to 4.
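Following that hint, one option is to name the character set inside the CAST itself (`CAST(... AS CHAR(255) CHARACTER SET utf8mb4)`), so the COLLATE clause no longer depends on whichever character set the session uses. The sketch below is a hypothetical version of such a migration helper, not the asker's actual one; `json_key_index_sql` and its parameters are illustrative, the JSON key is interpolated unquoted (so it assumes a simple key name), and the collation check relies on the simplification that MySQL collation names start with their character set name.

```ruby
# Hypothetical migration helper: builds ALTER TABLE ... ADD INDEX for a
# functional index on one JSON key. The CAST names its CHARACTER SET
# explicitly, so the result no longer depends on character_set_connection.
def json_key_index_sql(table, column, key, index_name,
                       length: 255, charset: "utf8mb4", collation: "utf8mb4_bin")
  # Simplified check: collation names begin with their character set name
  # (utf8mb4_bin belongs to utf8mb4), so reject obvious mismatches early.
  unless collation.start_with?("#{charset}_")
    raise ArgumentError, "collation #{collation} does not belong to #{charset}"
  end

  expression = %Q{CAST(#{column}->>"$.#{key}" AS CHAR(#{length}) CHARACTER SET #{charset}) COLLATE #{collation}}
  "ALTER TABLE #{table} ADD INDEX #{index_name} ((#{expression})) USING BTREE"
end
```

In a migration this might be called as `execute json_key_index_sql("digilearning_support_files", "metadata_json", "category", "metadata_json_category")`. Re-running the EXPLAIN from the question afterwards would confirm whether the optimizer still picks the index.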