Performance schema

performance_schema.events_statements_summary_by_digest

“The performance schema includes a set of tables that give information on how statements are performing “[3].

This post focuses on the events_statements_summary_by_digest table. The events_statements_summary_by_digest provides aggregated statement digest information. It normalizes and groups statements with the same “signature” and allow questions to be answered about the types of statements the mysql server is executing and also their frequency. Normalized statements in this case means Stripping whitespace in queries, stripping comments, relacing literals (integer and string inputs) with a “?” placeholder etc. Some examples of normalized statements when used in this contest include [1]:

  • Replacing literals (integer and string inputs) with a “?” placeholder
    • SELECT foo, bar FROM foobar WHERE foo = 100; becomes SELECT foo, bar FROM foobar WHERE foo = ?
    • For a single value INSERT statement, INSERT INTO foobar VALUES (100); becomes INSERT INTO foobar VALUES (?);
  • Lists of values are folded
    • Folding lists of IN values
      • SELECT foo FROM foobar WHERE bar IN (1, 2, 3) becomes SELECT foo FROM foobar WHERE bar IN (…)

The schema_name/digest fields are the grouping fields for this table and determine the summary of events for each record. Though not directly defined in the schema, the schema_name and digest fields act more like composite primary keys for records on the table. So the combination of these two fields makes each record in the table unique.

The DIGEST field contains a MD5 hashed value, while the DIGEST_TEXT is the corresponding normalized statement text string.

On a specific schema, the DIGEST_TEXT is distinct (hence the term grouping) since the use of the same query will be recorded only once, with the COUNT_STAR field providing a counter value of how often the statement occurs, or in other words, how many digests are recorded in a record. Extra info indicating when the query was first and last used is also provided by the FIRST_SEEN and LAST_SEEN fields.

Times units in Performance Schema are displayed in picosecond granularity. But you can try to change to readable values.

Limitations

The events_statements_summary_by_digest table is limited in size by the performance_schema_digests_size system variable. The size set on our kdb servers (one of our main databases) is 10,000 i.e, unto 10,000 rows can be stored in this table. I think 10,000 is the default for Percona Servers. You can use the parameter performance_schema_digests_size to check the size limit on your server [2]. You can run the following command in mysql to check the size:

The events_statements_summary_by_digest table does not record anything about prepared statements. The only recorded statements recorded are those of full text regular SQL.

 mysql> show variables like 'performance_schema_digests_size';

To change this variable, e.g, to 20,000, you can add the parameter in the my.cnf file under [mysqld] like:

performance_schema_digests_size = 20000

This is a read-only variable and therefore cannot be added using the set command dynamically.

If you want to reset your statistics in the performance_schema, you can run the command below:

mysql> TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;

 

Some Helpful query examples

Q1. List normalized queries with the highest latency in kkox

SELECT 
    IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    COUNT_STAR AS exec_count,
    SUM_NO_INDEX_USED AS no_index_used_count,
    SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
    SEC_TO_TIME(SUM_TIMER_WAIT/1000000000000) AS exec_time_total,
    SEC_TO_TIME(MAX_TIMER_WAIT/1000000000000) AS exec_time_max,
    (AVG_TIMER_WAIT/1000000000) AS exec_time_avg_ms,
    SUM_ROWS_SENT AS rows_sent,
    ROUND(SUM_ROWS_SENT / COUNT_STAR) AS rows_sent_avg,
    SUM_ROWS_EXAMINED AS rows_scanned,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed,
    DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest 
WHERE SCHEMA_NAME='dbname'
ORDER BY SUM_TIMER_WAIT DESC\G

Q 2. List all normalized statements that use temporary tables ordered by number of on disktemporary tables descending first, then by the number of memory tables.

SELECT 
    IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
    COUNT_STAR AS exec_count,
    SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
    SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
    ROUND(SUM_CREATED_TMP_TABLES / COUNT_STAR) AS avg_tmp_tables_per_query,
    ROUND((SUM_CREATED_TMP_DISK_TABLES / SUM_CREATED_TMP_TABLES) * 100) AS tmp_tables_to_disk_pct,
    DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_CREATED_TMP_TABLES > 0 AND SCHEMA_NAME='dbname'
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC\G

Q 3. List all normalized statements that have done sorts, ordered by sort_merge_passes, sort_scans and sort_rows, all descending.

SELECT 
    IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
    COUNT_STAR AS exec_count,
    SUM_SORT_MERGE_PASSES AS sort_merge_passes,
    ROUND(SUM_SORT_MERGE_PASSES / COUNT_STAR) AS avg_sort_merges,
    SUM_SORT_SCAN AS sorts_using_scans,
    SUM_SORT_RANGE AS sort_using_range,
    SUM_SORT_ROWS AS rows_sorted,
    ROUND(SUM_SORT_ROWS / COUNT_STAR) AS avg_rows_sorted,
    DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_SORT_ROWS > 0 AND SCHEMA_NAME='dbname'
ORDER BY SUM_SORT_MERGE_PASSES DESC, SUM_SORT_SCAN DESC, SUM_SORT_ROWS DESC\G

Q 4. List all normalized statements that use have done a full table scan ordered by the percentage of times a full scan was done, then by the number of times the statement executed

SELECT 
    IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
    COUNT_STAR AS exec_count,
    SUM_NO_INDEX_USED AS no_index_used_count,
    SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
    ROUND((SUM_NO_INDEX_USED / COUNT_STAR) * 100) no_index_used_pct,
    DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0 
    OR SUM_NO_GOOD_INDEX_USED > 0 AND SCHEMA_NAME='dbname'
ORDER BY no_index_used_pct DESC, exec_count DESC\G

Q 5. List all normalized statements that have raised errors or warnings.

SELECT 
    IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
    COUNT_STAR AS exec_count,
    SUM_ERRORS AS errors,
    (SUM_ERRORS / COUNT_STAR) * 100 as error_pct,
    SUM_WARNINGS AS warnings,
    (SUM_WARNINGS / COUNT_STAR) * 100 as warning_pct,
    DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_ERRORS > 0
   OR SUM_WARNINGS > 0 AND SCHEMA_NAME='dbname'
ORDER BY SUM_ERRORS DESC, SUM_WARNINGS DESC\G

Q 6. First, i would like to list queries with the largest number of executions i.e, that has been run the most.

SELECT
    DIGEST AS digest,
    SCHEMA_NAME as db,
    DIGEST_TEXT AS query,
    COUNT_STAR AS execution_count,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed
FROM performance_schema.events_statements_summary_by_digest 
WHERE SCHEMA_NAME='kkbox'
ORDER BY COUNT_STAR DESC\G;

Q 7. But now i would like to know which statements are executed the most, but do not utilize indexes?

SELECT 
    IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    DIGEST AS digest,
    SCHEMA_NAME as db,
    DIGEST_TEXT AS query,
    COUNT_STAR AS execution_count,
    SUM_NO_INDEX_USED AS no_index_used_count,
    SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
    SUM_ROWS_EXAMINED AS rows_scanned,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed
FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='kkbox' AND (SUM_NO_INDEX_USED > 0 OR SUM_NO_GOOD_INDEX_USED > 0)
ORDER BY COUNT_STAR DESC\G;

Q 8. But now i would like to know which statements scan the most rows in a table, yet run the least number of times on the server. I would also like to know if these same statements utilize indexes.

SELECT 
    IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    DIGEST AS digest,
    SCHEMA_NAME as db,
    DIGEST_TEXT AS query,
    COUNT_STAR AS execution_count,
    SUM_NO_INDEX_USED AS no_index_used_count,
    SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
    SUM_ROWS_EXAMINED AS rows_scanned,
    (SUM_ROWS_EXAMINED/COUNT_STAR) as avg_rows_scanned_per_exec,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed
FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='kkbox' AND (SUM_NO_INDEX_USED > 0 OR SUM_NO_GOOD_INDEX_USED > 0)
ORDER BY SUM_ROWS_EXAMINED DESC, COUNT_STAR\G;

Q 9. Now, i would like to list all statements that use temporary tables, ordered by number of disk temporary tables first, then the number of memory tables.

SELECT 
    IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
       DIGEST AS digest,
       SCHEMA_NAME AS db,
       DIGEST_TEXT AS query,
       COUNT_STAR AS execution_count,
       SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
       SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
       ROUND(SUM_CREATED_TMP_TABLES / COUNT_STAR) AS avg_tmp_tables_per_query,
       ROUND((SUM_CREATED_TMP_DISK_TABLES / SUM_CREATED_TMP_TABLES) * 100) AS tmp_tables_to_disk_pct,
       FIRST_SEEN AS first_executed,
       LAST_SEEN AS last_executed    
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_CREATED_TMP_TABLES > 0 AND SCHEMA_NAME='kkbox'
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC\G;

Q 10. I would now also like to list statements that produce errors.

SELECT 
    DIGEST AS digest,
    SCHEMA_NAME AS db,
    DIGEST_TEXT as query,
    COUNT_STAR AS execution_count,
    SUM_ERRORS AS errors_total,
    (SUM_ERRORS / COUNT_STAR) * 100 as error_pct,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed    
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_ERRORS > 0 AND SCHEMA_NAME='kkbox'
ORDER BY SUM_ERRORS DESC\G;

 

Normalized Statements

Q 1. List top 10 queries running the most

SELECT
    d.COUNT_STAR AS execution_count, 
    d.DIGEST, 
    c.SQL_TEXT AS original_query, 
    c.DIGEST_TEXT as normalized_query, 
    c.SOURCE AS source, 
    c.CURRENT_SCHEMA as db 
FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  
WHERE d.DIGEST=c.DIGEST AND c.SQL_TEXT IS NOT NULL
ORDER BY d.COUNT_STAR DESC limit 10\G

Q 2. List top 10 queries running the most, but not utilizing indexes.

SELECT 
    IF(d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    d.COUNT_STAR AS execution_count, 
    d.DIGEST, 
    c.SQL_TEXT AS original_query, 
    c.DIGEST_TEXT as normalized_query, 
    c.SOURCE AS source, 
    c.CURRENT_SCHEMA as db, 
    d.SUM_NO_GOOD_INDEX_USED AS total_bad_index, 
    d.SUM_NO_INDEX_USED AS total_no_index_used
FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  
WHERE d.DIGEST=c.DIGEST AND (d.SUM_NO_GOOD_INDEX_USED > 0 
OR d.SUM_NO_INDEX_USED > 0) AND c.SQL_TEXT IS NOT NULL
ORDER BY d.COUNT_STAR DESC LIMIT 10\G

Q 3. List top 10 queries scanning the most records without utilizing indexes.

SELECT 
    IF(d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    d.COUNT_STAR AS execution_count, 
    d.DIGEST, 
    c.SQL_TEXT AS original_query, 
    c.DIGEST_TEXT as normalized_query, 
    c.SOURCE AS source, 
    c.CURRENT_SCHEMA as db, 
    d.SUM_NO_GOOD_INDEX_USED AS total_bad_index, 
    d.SUM_NO_INDEX_USED AS total_no_index_used, 
    d.SUM_ROWS_EXAMINED AS total_rows_scanned
FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  
WHERE d.DIGEST=c.DIGEST AND (d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0) AND c.SQL_TEXT IS NOT NULL
ORDER BY d.SUM_ROWS_EXAMINED DESC, 
COUNT_STAR DESC LIMIT 10\G

Q 5. List top 10 queries scanning the most records without utilizing indexes.

SELECT 
    IF(d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    d.COUNT_STAR AS execution_count, 
    d.DIGEST, 
    c.SQL_TEXT AS original_query, 
    c.DIGEST_TEXT as normalized_query, 
    c.SOURCE AS source, 
    c.CURRENT_SCHEMA as db, 
    d.SUM_NO_GOOD_INDEX_USED AS total_bad_index, 
    d.SUM_NO_INDEX_USED AS total_no_index_used, 
    d.SUM_ROWS_EXAMINED AS total_rows_scanned
FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  
WHERE d.DIGEST=c.DIGEST AND (d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0) AND c.SQL_TEXT IS NOT NULL
ORDER BY d.SUM_ROWS_EXAMINED DESC, DESC LIMIT 10\G

Q 6. List the top 10 most update statements

SELECT
    digest_text,
    ((count_star/update_total) * 100) as percentage_of_all
FROM events_statements_summary_by_digest,
(SELECT sum(count_star) update_total
  FROM events_statements_summary_by_digest
  WHERE digest_text LIKE 'UPDATE%'
) update_totals
WHERE digest_text LIKE 'UPDATE%'
ORDER BY percentage_of_all DESC LIMIT 10

CONCLUSION

Using the Performance Schema, it is very easy to see what is actually running on your MySQL server. Though tools like mytop and commands like SHOW PROCESSLIST can help us to see what is happening on our mysql server in real time, the Performance Schema helps us to understand the different statements that run on our MySQL server over time. This makes the performance schema more helpful in planning  profiling and optimizing statements. A more detail review of the data provided by the performance schema would help us understand patterns of statement execution, and in return help us make decisions that would bring improvement on system scalability.

REFERENCES:
[1] http://www.markleith.co.uk/2012/07/04/mysql-performance-schema-statement-digests/
[2] https://mariadb.com/kb/en/mariadb/performance-schema-events_statements_summary_by_digest-table/
[3] https://www.vividcortex.com/blog/2014/11/03/mysql-query-performance-statistics-in-the-performance-schema/

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s