Indexes are important as our Database grows in size. We can consider Indexes as Keys that will help us perform faster lookups of rows. In this blog, we will understand the different types of Index in MySQL, their use cases, various strategies to decide the index for a table, and finally the benefits of having Indexes. So, let's get started.
Different Types of Index
This is the default type of Index in MySQL which is based on the B-Tree data structure.
The B-Tree index doesn't need a full table scan to find the desired data hence it speeds up the data access process. Similar to how a B-Tree works, in this index, the search starts from the root node and goes to the leaf node. Each node contains the minimum and maximum values for the child nodes. We find the right pointer by looking at these values and reaching the leaf node that has the pointer to the data needed. The depth of the B-Tree depends on how big our data is.
B-Tree stores the columns in an ordered manner which makes them useful for searching over a range of data,
ORDER BY and
GROUP BY operations.
B-Tree does a lookup from the leftmost indexed columns. Let's take an example to understand how adding a B-Tree Index will help us.
Consider the example of an Employee Database where we want to search for employees.
Employee Table Schema
CREATE TABLE EMPLOYEE ( first_name varchar(40) not null, last_name varchar(40) not null, department varchar(40) not null, joining_date date not null );
Consider the following data inserted in the table,
Now that we have the schema defined, let's try to search for some Employees in the table and understand how the indexes can be defined and used.
Single Column Index
ALTER TABLE EMPLOYEE ( ADD INDEX first_name, ADD INDEX department, ADD INDEX joining_date );
SELECT * FROM EMPLOYEE WHERE first_name = 'Donna' and department = 'Engineering' -- Uses the Index on first_name
Even though there is an Index on the
department column, we only use the index on
first_name as the three individual indexes act as three separate tables.
Now when we run this query, MySQL will have to scan across 7 rows where the
first_name is Donna, imagine the same scan across a huge table where around 1 million people have a similar name. Adding a single-column index won't help a lot.
ALTER TABLE EMPLOYEE ( ADD INDEX (department, first_name, joining_date) );
The following table contains a composite index on
(department, first_name, joining_date).
SELECT * FROM EMPLOYEE WHERE department = 'Engineering' AND first_name = 'Donna' -- Uses the Index on (department, first_name)
As compared to single column index, this will be faster as the number of rows to scan will get decreased by using the composite index on
SELECT * FROM EMPLOYEE WHERE last_name = 'Hardy' -- Doesn't use any Index
SELECT * FROM EMPLOYEE WHERE first_name = 'Donna' AND joining_date > 2010-07-28 -- Doesn't use any Index -- since the lookup follows the leftmost side of indexed columns -- we are not mentioning the department -- hence composite index is of no use
SELECT * FROM EMPLOYEE WHERE department = 'Engineering' AND first_name LIKE '%S%' AND joining_date = 2010-07-28 -- Uses the Index on (department, first_name) -- since for first_name we have a range query -- it will only use first two columns of index
In the case of using indexes for range queries, we should try using the range condition field last in the index. This is because after applying the range query there is no way for the B-Tree to apply the next filter, hence we should keep the range condition column at last.
For the Multi-column Indexes in this Table, we have index search capabilities on
(department, first_name) and
(department, first_name, joining_date)
Consider the following query
SELECT * FROM EMPLOYEE WHERE department = 'Engineering' ORDER BY last_name
This will use the
department index to give us the records, and then perform a sort on the records to sort them by
last_name. This is because we didn't have any index on
last_name hence the records were not sorted by that column. This type of sort fetches the data in a temporary buffer before returning it (File Sort) which leads to extra computation.
This can be avoided by adding an index on
last_name. The new composite index for us would be
(department, first_name, joining_date, last_name)
Proper Indexing also helps in GROUP BY operation.
Consider an example like
SELECT * FROM EMPLOYEE WHERE department = 'Engineering' GROUP BY first_name
The records would already be sorted
department hence speeding up the filtering. After these results are returned they are also then sorted by
first_name due to the second part of the index, and so they are already grouped for us. This won't require any other sorting hence making our query fast.
Index selectivity is the ratio of a number of distinct indexed values(cardinality) to the total number of rows in the table(#T). The range for selectivity is from 1/#T to 1.
An index is said to be highly selective if for each value we have fewer rows. Having a highly selective index is good as it filters out more rows when searching for query matches.
Choosing the Right Order
Now that we have understood what Index Selectivity is, let's see how it helps in deciding the column order.
Consider the same Employee Table Schema with the multicolumn index on
(department, first_name, joining_date, last_name) . This composite index will perform better than a single column on queries but let's check if this is the perfect column order for the given set of values in the Employee Table.
Firstly, let's check the Index Selectivity of each of the four columns,
first_name: 3/9 = 0.33
Department: 4/9 = 0.44
last_name: 9/9 = 1
joining_date is a date field and we might want to do range queries on it, it is good to have it last in the order of index.
Based on the Selectivity, the ideal order looks like this
(last_name, department, first_name, joining_date).
If we see the Query Examples above the initial order we defined works well for us as we can use Indexes for most queries. Hence along with selectivity, the use case of the
WHERE clause also matters.
In Conclusion, the Index order for the Multi-Column Index depends on:
WHEREconditions to effectively lookup data
Rangeconditions in the query
Index Selectivitywhen unsure about the column order
The columns we want to
SELECTin the Query
It is important to note that the indexes shouldn't be optimized only for the WHERE clause but for the entire query.
Hash Indexes are used when we want to do exact lookups on every column in the table. Hash codes are generated for each row on the indexed columns. Since the original values are converted to short values based on the hash function, they are compact.
In the case of Hash Indexes, the index contains only pointers to the values rather than the entire values, hence MySQL uses the in-memory rows to fetch the values from the pointers.
Some of the drawbacks of Hash Indexes are
Since we are generating Hash using a function, we can't use it for partial matching of values. Example: If we have an Index on (A, B), we can't use this Index in a query that only uses column A.
In case of two or more rows having the same values for a column, when generating the Hash Index, it will lead to a collision. In such a case, the engine will traverse all the row pointers and read the values for comparison to find the right value leading to extra computation.
Hash Indexes are not very useful for columns that have low selectivity since they will lead to hash collisions.
They can't be used for sorting because they don't store rows in a sorted fashion.
Adaptive Hash Index
In the InnoDB storage engine when the values are being used frequently it creates a hash index on top of the B-Tree Index which leads to faster lookups. This is known as adaptive hash indexes.
Other Types of Indexes
Some other available Index Types include the Spatial Index which is used for geometry and geography types and Full-text Indexes which split the text into words and make an index over the words and not the whole text. This works faster for text searches when looking for specific words.
Benefits of Indexing
The benefits of indexing are measured on the three-star system
One Star if the Index reduces the rows that we have to scan.
Two Star if it improves performance by removing the overhead of sorting and temporary tables.
Three Star if all the required columns for querying are a part of the index.
Points to Remember while Indexing
Use of Functions in Queries
MySQL can use columns only if they are isolated in the query. We should avoid using functions in the WHERE clause.
SELECT * FROM EMPLOYEE WHERE department = 'Engineering' AND YEAR(joining_date) = 2010
The above query will only use the index
department even though an index is present on
joining_date. This is because we have enclosed
joining_date within a function.
In the case of multiple single-column indexes, MySQL sometimes uses multiple indexes by performing Index Merge. But it is not efficient since in the case of OR conditions, operations like sorting, grouping can utilize a lot of resources. In the case of AND conditions, it makes more sense to use a multicolumn index over multiple single-column indexes.
Other Things to Note
Indexes are implemented at the storage engine level and not the server level, hence differs from engine to engine.
Even though we use an ORM, we can't rely on it for indexing as they usually generate queries that satisfy the underlying logic.
Indexes come with drawbacks as well. There is a performance impact on
DELETEoperations. Every time we perform a write on a table, the indexes have to be maintained. Hence, it's important to choose Index wisely.
Always Test the indexes on the queries the application is performing by measuring the response time.
EXPLAINstatement to understand how the query is executing and reveal why it is slow. Explain statement shows values such as the indexes used, order used for joining the table, rows accessed, use of Filesort or not, and a temporary table used or not.
I hope this article helps you in starting to use Indexes in your MySQL Databases.
Did you find this article valuable?
Support Shloka Shah by becoming a sponsor. Any amount is appreciated!