Optimizing MySQL Tables with Hundreds of Millions of Rows
Chapter 1: Understanding the Challenge
Recently, I faced a significant challenge with a database configured in a master-slave architecture. Each morning, an SLA (Service Level Agreement) alarm signaled a considerable synchronization delay between the master and slave databases. A delay of that size meant that switching from the master to the slave would be time-consuming, since additional time would be needed to ensure data consistency.
Upon further investigation, it became clear that a primary contributor to this issue was the high volume of slow queries in the database, that is, queries that take more than one second to execute. A nightly task that deletes data older than one month turned out to be a major source of these slow executions.
Section 1.1: Analyzing the Slow Queries
To get to the root of the problem, we used the pt-query-digest tool to analyze the mysql-slow.log file covering roughly the past week (148 hours).
pt-query-digest --since=148h mysql-slow.log | less
The analysis revealed that the total slow query execution time reached 25,403 seconds over the week, with the longest execution taking 266 seconds and an average of 5 seconds per slow SQL query, scanning approximately 17.66 million rows on average.
SELECT statements against arrival_record accounted for over 40,000 slow queries with an average response time of 4 seconds, while the DELETE against arrival_record was logged 6 times with an average response time of 258 seconds.
Section 1.2: Investigating Query Performance
The slow query logs for the select arrival_record operations exhibited similar patterns, with queries resembling the following:
SELECT count(*) FROM arrival_record WHERE product_id=26 AND receive_time BETWEEN '2019-03-25 14:00:00' AND '2019-03-25 15:00:00' AND receive_spend_ms>=0;
These queries scanned up to 56 million rows, leading to prolonged execution times.
After reviewing the execution plan, it was evident that the index IXFK_arrival_record was used, but the estimated number of scanned rows was alarmingly high, exceeding 30 million.
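That plan can be reproduced with a plain EXPLAIN on the query shape above (the literal values are simply the example taken from the slow log):
EXPLAIN SELECT count(*) FROM arrival_record WHERE product_id=26 AND receive_time BETWEEN '2019-03-25 14:00:00' AND '2019-03-25 15:00:00' AND receive_spend_ms>=0;
In the output, the key column shows which index was chosen and the rows column carries the estimate that exceeded 30 million.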
Section 1.3: Reassessing Index Efficiency
The existing composite index was of limited use: its leading product_id column has low cardinality, so it filters out few rows, and because the queries did not filter on station_no, MySQL could not use the index beyond that low-selectivity prefix.
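A quick way to verify this before settling on a new index is to compare column selectivity directly. The sketch below assumes the column names used elsewhere in this article and, given the table size, should be run against a test copy rather than production:
SHOW INDEX FROM arrival_record;  -- the Cardinality column shows the estimated number of distinct values for each indexed column
SELECT COUNT(DISTINCT product_id)/COUNT(*) AS product_id_selectivity, COUNT(DISTINCT receive_time)/COUNT(*) AS receive_time_selectivity FROM arrival_record;  -- full scan: use a test copy or a sampled subset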
To enhance performance, we proposed a new composite index centered on receive_time, which has higher selectivity, while also taking into account the other fields that appear in the filter conditions.
Chapter 2: Practical Optimization Strategies
Video Resources:
To see practical implementations of data loading and performance testing, check out these videos:
The first video, "I loaded 100,000,000 rows into MySQL (fast)," showcases efficient strategies for loading vast amounts of data into MySQL.
The second video, "How I add millions of rows to MySQL (for performance testing)," offers insights on performance testing with large datasets.
Section 2.1: Capturing SQL Queries
Using tcpdump, we captured the packets carrying SELECT statements against the table.
tcpdump -i bond0 -s 0 -l -w - dst port 3316 | strings | grep select | egrep -i 'arrival_record' > /tmp/select_arri.log
This revealed various conditions leveraging the first three fields of the composite index, suggesting a need for an index redesign.
Section 2.2: Implementing Index Changes
As part of our optimization, we decided to drop the existing composite index and create more efficient ones. The plan included the following steps (a DDL sketch follows the list):
- Remove the IXFK_arrival_record index.
- Create a new composite index idx_sequence_station_no_product_id.
- Establish a separate index for indx_receive_time(receive_time).
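A sketch of the corresponding DDL, assuming the column order implied by the new index name (the final order should follow the query patterns captured above):
ALTER TABLE arrival_record DROP INDEX IXFK_arrival_record;  -- if this index backs a foreign key, the constraint must be handled first or MySQL will refuse the drop
ALTER TABLE arrival_record ADD INDEX idx_sequence_station_no_product_id (`sequence`, station_no, product_id);
ALTER TABLE arrival_record ADD INDEX indx_receive_time (receive_time);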
Section 2.3: Testing and Results
We transferred the arrival_record table to a test environment to perform deletions and re-indexing operations. The results indicated significant improvements in execution times and reduced overhead during DDL operations.
Chapter 3: Summary of Best Practices
When dealing with large tables in MySQL, it is crucial to pay attention both to query response times and to the maintenance costs the table incurs. For DDL changes, take the table's characteristics into account, including concurrent access and foreign keys, and choose the change method accordingly.
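For example, adding the receive_time index can often be done as an online, in-place operation on MySQL 5.6 and later, keeping the table writable during the change; this is a hedged sketch rather than the exact command we ran, and support depends on the specific change and version:
ALTER TABLE arrival_record ADD INDEX indx_receive_time (receive_time), ALGORITHM=INPLACE, LOCK=NONE;  -- MySQL raises an error instead of silently locking if INPLACE or LOCK=NONE is not supported for this change
For tables with heavy write concurrency or foreign keys, tools such as pt-online-schema-change are another common option.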
For deleting significant amounts of data, implementing small batch deletions can alleviate pressure on the master instance and reduce master-slave latency.
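A minimal sketch of such a batched delete, assuming receive_time marks row age and using an illustrative batch size of 10,000:
DELETE FROM arrival_record WHERE receive_time < DATE_SUB(NOW(), INTERVAL 1 MONTH) LIMIT 10000;  -- repeat until no rows are affected, pausing briefly between batches so the slave can keep up
With the index on receive_time in place, each batch locates its rows cheaply instead of scanning the whole table.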
By adopting these strategies, we successfully optimized the performance of our MySQL database, ensuring efficient data handling and improved query response times.