CREATE INDEX
This is a legacy Apache Ignite documentation
The new documentation is hosted here: https://ignite.apache.org/docs/latest/
Create an index on the specified table.
CREATE [SPATIAL] INDEX [[IF NOT EXISTS] indexName] ON tableName
(columnName [ASC|DESC] [,...]) [(index_option [...])]
index_option := {INLINE_SIZE size | PARALLEL parallelism_level}
Parameters
indexName
- the name of the index to be created.ASC
- specifies ascending sort order (which is the default)DESC
- specifies descending sort order.SPATIAL
- specifies to create the spatial index. Presently, only geometry types are supported.IF NOT EXISTS
- do not throw an error if an index with the same name already exists. The database checks indexes' names only not considering columns types or count.index_option
- additional options for the index creation:INLINE_SIZE
- specifies index inline size in bytes. Depending on the size, Ignite will place the whole indexed value or a part of it directly into index pages, thus omitting extra calls to data pages and increasing queries' performance. The index inlining is enabled by default and the size is pre-calculated automatically based on a table structure. It's generally not recommended but setting the size to0
will disable the inlining capability. Refer to the Index Inlining section for more details.PARALLEL
- specifies the number of threads to be used in parallel for index creation. The more threads are set the faster the index will be created and built. If the value exceeds the number of CPUs, then it will be decreased to the number of cores. If the parameter is not specified, then the number of threads will be calculated as 25% of the CPU cores available.
Description
CREATE INDEX
constructs a new index on the specified table. Regular indexes are stored in the internal B+tree data structures. The B+tree gets distributed across the cluster as well as the actual data. A cluster node stores a part of the index for the data it owns.
If CREATE INDEX
is executed in runtime on the life data then the database will iterate over the specified columns synchronously indexing them. The rest of the DDL commands targeting the same table are blocked until CREATE INDEX
is in progress. DML commands execution is not affected and can be performed in parallel.
Schema changes applied by this command are persisted on disk if Ignite persistence is enabled. Thus, the changes can survive full cluster restarts.
A geospatial index is maintained by JTS Topology Suite and covered in respective geospatial documentation.
Index Inlining
Index inline size is internally limited by 2048
bytes. Values bigger than that will be silently reduced to 2048
.
If the INLINE_SIZE
parameter is not set, then Ignite will calculate the index size based on a table structure as follows -
If a table does not include variable-size columns like CHAR
, VARCHAR
or BINARY
, then the engine will accumulate the size of all the columns, adding an extra byte per column. If the variable-size columns are present in the table definition, then the index inline size is derived from the IGNITE_MAX_INDEX_PAYLOAD_SIZE
system property that has a default value of 10 bytes.
Indexes Tradeoffs
There are multiple things you should consider when choosing indexes for your application.
- Indexes are not free. They consume memory, also each index needs to be updated separately, thus the performance of write operations might drop if too many indexes are created. On top of that, if a lot of indexes are defined the optimizer might do more mistakes by choosing a wrong index while building the execution plan.
It is a bad strategy to index everything!
- Indexes are just sorted data structures (B+tree). If you define an index for the fields
(a,b,c)
then the records will be sorted first bya
, then byb
and only then byc
.
Example of Sorted Index
| A | B | C |
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |Any condition like
a = 1 and b > 3
can be viewed as a bounded range, both bounds can be quickly looked up in in log(N) time, the result will be everything between.The following conditions will be able to use the index:
a = ?
a = ? and b = ?
a = ? and b = ? and c = ?
Condition
a = ? and c = ?
is no better thana = ?
from the index point of view.
Obviously half-bounded ranges likea > ?
can be used as well.
-
Indexes on single fields are no better than group indexes on multiple fields starting with the same field (index on (a) is no better than (a,b,c)). Thus it is preferable to use group indexes.
-
When
INLINE_SIZE
option is specified, indexes holds a prefix of field data in the B+tree pages. This improves search performance by doing less row data retrievals, however substantially increases size of the tree (with a moderate increase in tree height) and reduces data insertion and removal performance due to excessive page splits and merges. It's a good idea to consider page size when choosing inlining size for the tree: each B-tree entry requires16 + inline-size
bytes in the page (plus header and extra links for the page).
Examples
Create a regular index :
CREATE INDEX title_idx ON books (title);
Create a descending index only if it does not exist :
CREATE INDEX IF NOT EXISTS name_idx ON persons (firstName DESC);
Create a compound index :
CREATE INDEX city_idx ON sales (country, city);
Create an index specifying data inline size:
CREATE INDEX fast_city_idx ON sales (country, city) INLINE_SIZE 60;
Create a geospatial index:
CREATE SPATIAL INDEX idx_person_address ON Person (address);
Note
Java, .NET, C++ users can define indexes with special configuration classes and annotations:
See Also
Updated over 4 years ago