Mastering MySQL JOIN: A Comprehensive Guide to Execution and Efficiency
Written on
Understanding MySQL JOIN Operations
This article delves into the complex execution of MySQL JOIN operations, providing insights on how to enhance database efficiency through strategic optimizations. Whether you're a beginner or an experienced developer, grasping the fundamentals of JOIN operations can significantly improve your database interactions.
Have you ever considered how MySQL executes JOIN operations or questioned your understanding of them? If you're unsure how to validate your insights, reflect on these questions: How does MySQL determine the driving table? Is it selected sequentially from left to right, starting with the first table? What factors influence the order of table joins in multi-table scenarios?
Before we dive deeper, it's vital to understand the concept of the driving table. In a multi-table JOIN query, the driving table is the first table that MySQL processes, also known as the base table. Records from this table are then used to connect with other tables.
The selection of the driving table is based on a principle that prioritizes the table with the smallest result set, provided this does not alter the final outcome. Although identifying the smallest result set is straightforward, assessing its non-impact on the final result can be intricate. However, patterns do emerge.
Typically, in a LEFT JOIN, the left table is chosen as the driving table, while in a RIGHT JOIN, the right table is preferred. For INNER JOINs, the table with the smaller result set is generally designated as the driving table. If there are still uncertainties, the EXPLAIN statement can be utilized to reveal the driving table, with the first table listed in the results being the driving table. However, it's important to note that EXPLAIN is not infallible; the execution plan may shift during actual execution.
Setting Up the Database
Currently, we are using MySQL version 5.7 with the InnoDB storage engine. The initial SQL for setting up the database is as follows:
DROP TABLE IF EXISTS tb_user;
CREATE TABLE tb_user (
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
user_name VARCHAR(50) NOT NULL,
gender TINYINT(1) NOT NULL,
created_at datetime NOT NULL,
updated_at datetime NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO tb_user(user_name, gender, created_at, updated_at) VALUES
('Girvan', 1, NOW(), NOW()),
('Paul', 0, NOW(), NOW()),
('Max', 1, NOW(), NOW()),
('Brogan', 0, NOW(), NOW()),
('Aydan', 1, NOW(), NOW()),
('Colm', 0, NOW(), NOW()),
('Jason', 1, NOW(), NOW()),
('Quillan', 0, NOW(), NOW()),
('Donnacha', 1, NOW(), NOW()),
('Iarla', 0, NOW(), NOW());
Next, we create the login log table:
DROP TABLE IF EXISTS tb_login_log;
CREATE TABLE tb_login_log (
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
user_name VARCHAR(50) NOT NULL,
ip VARCHAR(15) NOT NULL,
created_at datetime NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO tb_login_log(user_name, ip, created_at) VALUES
('Girvan', '192.168.1.101', '2024-01-01 12:01:01'),
('Girvan', '192.168.1.102', '2024-01-02 12:02:02'),
('Girvan', '192.168.1.103', '2024-01-03 12:03:33'),
('Paul', '192.168.1.104', '2024-01-04 12:00:01'),
('Max', '192.168.1.105', '2024-01-05 12:00:01'),
('Brogan', '192.168.1.106', '2024-01-06 12:00:01'),
('Aydan', '192.168.1.107', '2024-01-07 12:00:01'),
('Colm', '192.168.1.108', '2024-01-08 12:00:01');
To initiate the querying process, we can retrieve data from both tables:
SELECT * FROM tb_user;
SELECT * FROM tb_login_log;
JOIN Algorithms in MySQL
The execution of a single-table query is relatively straightforward. However, the JOIN algorithm in MySQL is based on the nested-loop principle and involves a variety of algorithms chosen based on specific conditions.
1. Simple Nested-Loop Join
The Simple Nested-Loop join (SNL) involves matching records one at a time in a sequential manner.
for each row in t1 matching range {
for each row in t2 matching reference key {
for each row in t3 {
if row satisfies join conditions, send to client}
}
}
While this approach is simple, it can be inefficient, with a time complexity that grows exponentially based on the number of records and joined tables.
2. Block Nested-Loop Join
Block Nested-Loop join (BNL) optimizes SNL by caching multiple rows from the driving table in the Join Buffer for efficient matching.
for each row in t1 matching range {
for each row in t2 matching reference key {
store used columns from t1, t2 in join buffer
if buffer is full {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client}
}
empty join buffer
}
}
}
3. Index Nested-Loop Join
The Index Nested-Loop join (INL) leverages the index of the driven table, matching records from the driving table one at a time, which reduces the number of comparisons required.
4. Batched Key Access
Batched Key Access (BKA) optimizes the INL by placing values from the driving table into the join buffer, sorting them, and then performing sequential matching with the index of the driven table.
SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';
Conclusion
Understanding the intricacies of the JOIN algorithms, utilizing the EXPLAIN statement, and being mindful of indexing can greatly enhance your ability to optimize MySQL queries. MySQL dynamically selects the most efficient algorithm based on the current database conditions, leading to improved performance in query execution.
If you find this information helpful and wish to support my efforts, please consider giving a clap. Your encouragement is greatly appreciated!