Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

MySQL has ranked as one of the leading open-source relational database management systems (RDBMS) for many years. It has various storage engines tailored to different performance, reliability, and functionality requirements.

Understanding these engines is crucial for database administrators, developers, and IT professionals.

In this tutorial, we’ll explain MySQL storage engines, their features, advantages, and applications.

2. Understanding Storage Engines

A storage engine is a component of an RDBMS that stores, retrieves, and manipulates data.

Storage engines define how to organize data on disk and execute queries. They oversee and process transactions, ensuring data security and integrity.

Storages can be optimized for various purposes, such as high-speed data access, transaction processing, or data archiving.

3. MySQL Storage Engines

The following command displays a list of available storage engines and their statuses:

SHOW ENGINES;

The result shows us the existing engines and indicates the available ones:

Engine Support Comment Transactions XA Savepoints
FEDERATED NO Federated MySQL storage engine NULL NULL NULL
MEMORY
YES Hash-based, stored in memory, useful for temporary tables NO NO NO
InnoDB
DEFAULT Supports transactions, row-level locking, and foreign keys YES YES YES
PERFORMANCE_SCHEMA YES Performance Schema NO NO NO
MyISAM YES MyISAM storage engine NO NO NO
MRG_MYISAM YES collection of identical MyISAM tables NO NO NO
BLACKHOLE YES /dev/null storage engine (anything you write to it disappears.) NO NO NO
CSV YES CSV storage engine NO NO NO
ARCHIVE YES Archive storage engine NO NO NO

4. Selecting a Storage Engine

When creating a table in MySQL, use the keyword ENGINE to select the storage engine:

CREATE TABLE table_name (
    <column_definition>
) ENGINE=<storage_engine>;

Here, <column_definition> outlines the attributes of each table’s columns: their names, data types, constraints, or other properties. ENGINE=<storage_engine> specifies the particular storage engine we want to use for the table.

MySQL uses the default storage engine if we don’t specify the engine in the CREATE TABLE statement.

4.1. Multiple Engines in the Same Database

In MySQL, we can use different storage engines for different tables in the same database. This option provides flexibility to optimize performance according to each table’s needs.

If our application requires the same features, we may not need different engines dedicated to specific tables. For example, if all tables need strong transactional support, using InnoDB for every table will simplify database management.

5. FEDERATED

FEDERATED tables can access data on remote MySQL servers, simplifying data integration across distributed systems. They are perfect for cases requiring data distribution without replication mechanisms.

We can leverage it to combine information from regional databases into a centralized reporting system.

6. MEMORY

The MEMORY storage engine provides high-speed data access. It stores data in RAM rather than on a disk. Therefore, we can read and write data amazingly fast.

However, data in MEMORY tables are lost when we restart the server. This volatility makes MEMORY unsuitable for permanent storage. This engine is suitable only for data we can regenerate easily or when data loss isn’t fatal for our application.

For example, we can use MEMORY for a real-time analytics dashboard that reports live metrics and statistics. Incoming data streams through MEMORY tables are stored and processed quickly. Hence, users can instantly access up-to-date information. If we restart the server and lose the current data, it doesn’t matter, as the new live data will come soon.

7. InnoDB: the Default MySQL Storage Engine

Since version 5.5, InnoDB has been the default storage engine for MySQL. It’s widely recognized for its robustness and flexibility. InnoDB provides full ACID compliance in transactions. This guarantees a high level of data integrity. It also guarantees users a consistent view of the data at any given time.

InnoDB supports row-level locking, which can help significantly reduce contention and enhance concurrency. It also supports foreign keys, which enforce referential integrity and maintain relationships between tables.

InnoDB’s fault-recovery mechanism ensures no data will be lost if the server fails. This reliability is important in production environments.

For instance, this engine is useful for online stores that handle many sales. It ensures all orders are processed correctly and keeps inventory counts accurate and consistent to prevent issues like double orders.

8. PERFORMANCE_SCHEMA

MySQL has a storage engine known as PERFORMANCE_SCHEMA. It’s a monitoring tool that offers a full picture of the server’s performance. This engine gathers performance data such as queries, transactions, I/O operations, and memory usage. It doesn’t store users or data, unlike the other storage engines.

PERFORMANCE_SCHEMA allows the database administrator to evaluate and optimize performance.

9. MyISAM

Before InnoDB, MySQL’s default storage engine was MyISAM. MyISAM is still available and can be used, but it’s not as complete and secure as InnoDB, which is ideal for many applications.

MyISAM uses table-level locking. This ensures faster reading. It can also be used for full-text indexing, which enables effective full-text searches on large datasets. It’s also helpful for applications with content management systems and document databases.

MyISAM is easy to set up but doesn’t support transactions or foreign keys. This restricts its potential for applications that need strong data integrity. A blog platform can benefit from its fast reading and full-text indexing capabilities.

10. MRG_MYISAM

MRG_MYISAM storage in MySQL manages multiple similar MyISAM tables as a single table (by merging them). For merging, the tables must share identical column structures. This means they should have the same column names, data types, and column order.

This engine is a good choice for managing large datasets partitioned across tables but queried as one.

The MRG_MYISAM engine supports MyISAM features such as fast read performance and table-level locking. It also enables the data to be distributed across several files for improved organization and is useful in applications requiring efficient data processing, such as logging systems or data warehousing.

However, MHG_MYISAM doesn’t support transactions or foreign key constraints. This is problematic for applications that require data integrity and transactional consistency. Additionally, since the engine uses table-level locking, write operations can become a bottleneck under high concurrency.

11. BLACKHOLE

The BLACKHOLE storage engine discards any data that’s written to it.

BLACKHOLE is an excellent choice for testing and debugging. It’s often used in replication setups and filters out unwanted data. We can also use it to produce dummy tables for testing purposes.

12. CSV

The CSV storage engine is designed for simplicity. It stores data in the comma-separated values (CSV) format.

CSV tables are easy to create and manipulate, so they’re ideal for quick data import and export tasks. However, CSV doesn’t support indexes or transactions. It’s limited to simple data storage and manipulation tasks.

Alternatively, when migrated from one system to another, a CSV table can serve to temporarily store data. The simplicity makes it easy to pull data from the source and import them into the target system.

13. ARCHIVE

This storage engine is suitable for large datasets. It stores them automatically in a compressed format. Compression reduces storage requirements and enhances I/O performance.

ARCHIVE is the ideal choice for historical data recording. It’s also optimized for mass data insertion, batch data processing, or data warehousing. However, ARCHIVE doesn’t support indexes. As a result, its design is simpler, but query performance isn’t as high as compared to other systems like MyISAM.

A practical use case for ARCHIVE is log storage since ARCHIVE can handle large volumes of data with little overhead.

14. Summary

The table below summarizes MySQL storage engines.

 

Storage engine Main Characteristics Use Cases
FEDERATED Accesses data on remote MySQL servers, no data replication Distributed data integration and centralized reporting systems.
MEMORY Data stored in RAM, high-speed access, volatility Temporary data storage, real-time analytics dashboards
InnoDB Default engine, supports transactions, row-level locking, foreign keys, ACID compliance, fault-recovery Transactional applications, e-commerce sites
PERFORMANCE_SCHEMA Monitoring server performance, doesn’t store user data Database performance evaluation and optimization
MyISAM Table-level locking, no transactions, supports full-text indexing Read-heavy applications, content management systems, document databases, blog platforms
MRG_MyISAM Merges multiple MyISAM tables, table-level locking, no transactions Large datasets partitioned across tables, logging systems, data warehousing
BLACKHOLE Discards written data, used for testing and debugging Replication setups, filtering unwanted data, creating dummy tables
CSV Stores data in CSV format, no indexes or transactions Quick data import/export tasks, temporary data storage during system migrations
ARCHIVE Compressed storage, for large volumes, no indexes Historical data recording, mass insertion, batch processing, log storage

Users often prefer InnoDB and MyISAM storage engines. InnoDB supports transactions, row-level locking, and other features such as foreign keys. These features maintain data integrity and reliability. On the other hand, MyISAM is most suitable for query performance. It offers full-text indexing and table-level locking for faster operations.

15. Conclusion

In this article, we explained the storage engines we can use in MySQL.

There are several engines in MySQL we can use. They serve different purposes, and choosing the most suitable one optimizes performance and ensures data integrity at the desired level.

InnoDB is usually a good choice for most cases because of its features and reliability. It’s also the default engine.