MySQL online DDL still blocking: ALGORITHM, LOCK, and the copy phase
Online DDL in MySQL is not lock-free. Even with ALGORITHM=INPLACE or ALGORITHM=INSTANT, and even with LOCK=NONE, every online DDL operation passes through a brief window where it upgrades its metadata lock (MDL) to exclusive. That window is short, but it is real, and it is the single most common reason an “online” schema change still stalls a busy production table.
This article explains the three-phase MDL model that online DDL uses, why ALGORITHM=INPLACE with LOCK=NONE can still block, what the copy phase actually does, and how third-party tools such as pt-online-schema-change and gh-ost change the picture. It assumes you already understand the broader MySQL mental model. See the hub page for the failure-pattern catalogue that frames metadata lock stalls as one of MySQL’s characteristic outage archetypes.
What online DDL actually promises
“Online” in MySQL terminology means concurrent DML (INSERT, UPDATE, DELETE) is permitted during the execution phase of the DDL. It does not mean the DDL acquires no exclusive locks. The promise is narrower than most operators assume:
ALGORITHM=INPLACEwithLOCK=NONEallows concurrent reads and writes during the bulk of the operation.ALGORITHM=INSTANTmodifies only data dictionary metadata and touches no table data.- Both still require metadata locks, and both still upgrade to an exclusive MDL at commit time.
If your expectation is “the DDL will never block another session,” that expectation is wrong. The correct expectation is “the DDL blocks for a very short time at the start and at the end, and during that window anything holding a shared MDL on the table will block the DDL, which in turn blocks everything queuing behind the DDL.”
For the full background on how metadata locks cascade into partial outages, see the hub page section on the metadata lock cascade pattern.
The three-phase MDL model
The MySQL 8.4 Reference Manual documents online DDL as three phases. Understanding the phases is the key to understanding every “online DDL still blocking” symptom.
flowchart TD A[Phase 1: Initialization] -->|shared upgradeable MDL| B[Phase 2: Execution] B -->|concurrent DML permitted for INPLACE/INSTANT| C[Phase 3: Commit table definition] C -->|exclusive MDL, brief| D[New table definition visible] D -->|all new DML on table resumes| E[Operation complete] C -->|any session holding shared MDL blocks commit| F[DDL waits, new DML queues behind DDL]
Phase 1: initialization
The DDL takes a shared, upgradeable metadata lock on the table. This lock is compatible with concurrent DML, which is why the operation can proceed while queries run. The server assesses whether the operation can be done INPLACE or must fall back to COPY.
Phase 2: execution
For ALGORITHM=INPLACE with LOCK=NONE, concurrent DML is permitted. InnoDB may build a new version of the table (or index) in the background while the original table continues to serve reads and writes. For ALGORITHM=INSTANT, this phase does effectively nothing to table data because only data dictionary metadata changes.
However, the manual notes that an exclusive metadata lock may be taken briefly during the execution phase, during statement preparation. Whether the upgrade happens depends on factors assessed in the initialization phase. If it is required, it is brief, but it is still exclusive.
Phase 3: commit table definition
This is where the “still blocking” symptom originates. The manual states that in the commit table definition phase, the metadata lock is upgraded to exclusive. The server must evict the old table definition from every session’s table cache and publish the new one. While the exclusive MDL is held:
- New DML against the table queues behind the exclusive request.
- Any session that still holds a shared MDL on the table prevents the DDL from acquiring the exclusive MDL.
- The DDL therefore waits for that shared MDL to release, and every new DML statement queues behind the DDL.
This is the metadata lock cascade, triggered by a DDL that is otherwise fully “online” during execution.
Why ALGORITHM=INPLACE with LOCK=NONE still blocks
LOCK=NONE governs what level of table access the DDL requires during execution. It does not remove the exclusive MDL requirement at commit. The two clauses operate at different layers:
LOCK=NONEcontrols InnoDB-level table access during the bulk of the operation.- Metadata locks are a server-level layer, separate from InnoDB row locks, and they apply regardless of the LOCK clause.
Bug #106480 (filed February 2022, last publicly modified May 2022) traced this to wait_while_table_is_used() in sql/sql_base.cc, which unconditionally upgrades to an exclusive MDL before commit regardless of the locking mode requested. The MySQL 8.4 manual text now acknowledges that an exclusive MDL may be taken briefly during execution and is taken at commit.
The practical consequence is that the presence of any long-running transaction holding a shared MDL on the table will block the DDL’s commit, and every new DML statement on the table will queue behind the DDL. The DDL is not stuck because of row locks or because the copy is slow. It is stuck because the server cannot get exclusive control of the table definition to swap in the new one.
Why ALGORITHM=INSTANT still blocks
ALGORITHM=INSTANT modifies only data dictionary metadata. No table data is copied, no rows are rebuilt, and concurrent DML is permitted during execution. Despite this, INSTANT still requires the exclusive MDL at commit. The same wait_while_table_is_used() code path applies.
INSTANT in MySQL 8.4 supports operations such as adding or dropping a column (at any position since 8.0.29), renaming a column, setting or dropping a column default, adding or dropping a virtual generated column, and changing index type. INSTANT is the default algorithm in MySQL 8.4. The server tries INSTANT first, then INPLACE, then COPY.
Always specify ALGORITHM explicitly. If you request ALGORITHM=INSTANT and the operation cannot be done INSTANT, MySQL returns an error rather than silently degrading to INPLACE or COPY. That error is cheap insurance against an unexpected full table rebuild.
One subtlety: ALGORITHM=INSTANT combined with LOCK=NONE, LOCK=SHARED, or LOCK=EXCLUSIVE produces an error. Only LOCK=DEFAULT is compatible with ALGORITHM=INSTANT.
The copy phase
ALGORITHM=COPY is the legacy path. It creates a temporary table, copies every row from the original, applies the schema change to the copy, and then swaps the new table in for the old. The copy phase blocks concurrent DML and holds an exclusive MDL for the entire operation, plus the brief commit-phase upgrade.
You can confirm a COPY operation by checking the rows-affected count. A nonzero rows-affected count after an ALTER indicates data was copied. INPLACE operations that do not rebuild the table report zero rows affected.
INPLACE rebuild operations
Some INPLACE operations rebuild the table internally even though they are classified as INPLACE rather than COPY. Examples include adding or dropping a primary key, changing a column type, and reordering columns. These operations consume disk space roughly equal to the original table size in the InnoDB temporary directory, controlled by innodb_tmpdir. They still permit concurrent DML when combined with LOCK=NONE, but the rebuild itself is more expensive than a metadata-only change.
The online alter log and the commit window
For ALGORITHM=INPLACE with LOCK=NONE, InnoDB buffers concurrent DML changes during execution in the online alter log, sized by innodb_online_alter_log_max_size (default 128 MB). These buffered changes are replayed at the end of the operation, during the exclusive MDL window at commit. The larger the log, the more replay work happens under the exclusive MDL, and the longer the commit window.
If the online alter log overflows during execution, the DDL fails and rolls back any uncommitted concurrent changes. Monitoring innodb_online_alter_log_max_size relative to write rate during a long INPLACE operation is important on write-heavy tables.
The INSTANT row-version limit
Every INSTANT ADD COLUMN or DROP COLUMN increments a per-table counter, visible as TOTAL_ROW_VERSIONS in INFORMATION_SCHEMA.INNODB_TABLES. In MySQL 8.0 through 8.4, the limit is 64. MySQL 9.1.0 raised the limit to 255.
When the limit is reached, MySQL silently degrades to INPLACE or COPY unless ALGORITHM is explicitly specified. If you always specify ALGORITHM=INSTANT, the server returns an error instead of silently downgrading, which is the safer operational behavior.
Monitor row versions with:
SELECT NAME, TOTAL_ROW_VERSIONS
FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE TOTAL_ROW_VERSIONS > 0
ORDER BY TOTAL_ROW_VERSIONS DESC;
Reset the counter with OPTIMIZE TABLE or ALTER TABLE ... ENGINE=InnoDB. Both rebuild the table and reset TOTAL_ROW_VERSIONS to zero.
pt-online-schema-change and gh-ost
Third-party schema change tools work around the copy phase by doing the copy outside of MySQL’s online DDL machinery, but they have their own MDL behavior.
pt-online-schema-change
pt-online-schema-change (pt-osc) creates a shadow table, copies rows in chunks, and keeps the shadow in sync with the original using INSERT, UPDATE, and DELETE triggers on the original table. Because of the trigger design, pt-osc cannot run on tables that already have triggers, and it requires exclusive access to the original table during trigger creation.
pt-osc holds metadata locks during the copy phase for operations such as trigger creation, the final table swap, and foreign key updates. Its lock_wait_timeout defaults to 60 seconds for metadata lock operations. If any session holds a long transaction on the table, pt-osc blocks at the same MDL layer as native online DDL.
gh-ost
gh-ost also holds MDL during the copy phase but uses row-based binlog streaming instead of triggers. This allows it to run on tables that already have triggers, which pt-osc cannot do. The trigger-free design changes some of the operational constraints but does not eliminate metadata lock requirements at the swap phase.
Both tools are useful when the operation cannot be done INPLACE or INSTANT, or when you need finer control over chunking, throttling, and pause-resume than native online DDL provides. Neither tool is lock-free.
Where this shows up in production
The most common production symptom is an ALTER TABLE that appears to hang, followed within minutes by a flood of application timeouts against the affected table. The pattern is:
- A DDL is issued during a window when a long transaction is open on the same table. The long transaction may be a monitoring query, a reporting query, an ORM that began a transaction without committing, or a
mysqldump --single-transactionbackup. - The DDL reaches commit and requests the exclusive MDL.
- The DDL waits for the long transaction to release its shared MDL.
- New DML statements on the table queue behind the DDL’s exclusive request.
- The connection pool fills with waiting sessions. Other tables continue to work, producing a confusing partial outage.
The DDL is not doing work during the wait. It is blocked at the MDL layer. CPU and disk I/O may be completely normal, which makes the incident harder to diagnose if you are only watching resource metrics.
Tradeoffs and when to use what
| Goal | Recommended approach | Tradeoff |
|---|---|---|
| Add or drop a column, rename a column, change index type | ALGORITHM=INSTANT | Still needs exclusive MDL at commit, but commit is fast |
| Add secondary index, change column type, rebuild primary key | ALGORITHM=INPLACE, LOCK=NONE | Concurrent DML allowed, but watch online alter log size and commit replay |
| Operation not supported by INPLACE | ALGORITHM=COPY or pt-osc / gh-ost | COPY blocks DML for the whole operation; pt-osc / gh-ost avoid that but add tooling complexity |
| Force failure instead of silent downgrade | Always specify ALGORITHM explicitly | Operation errors out if the requested algorithm is not supported |
Run DDL when no long transactions are open on the target table. This is the single most effective way to avoid the commit-phase MDL stall.
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
performance_schema.metadata_locks rows with LOCK_STATUS = 'PENDING' | Direct evidence of MDL queue building | More than 3 sessions pending on the same object, sustained |
SHOW PROCESSLIST state “Waiting for table metadata lock” | Sessions blocked behind a DDL or behind each other | Count rising on a single table |
INNODB_TRX oldest transaction age | Long transactions hold shared MDLs | Any OLTP transaction older than 60 seconds |
information_schema.INNODB_TABLES.TOTAL_ROW_VERSIONS | Approaching the 64-row-version ceiling before silent INPLACE/COPY fallback | Counter approaching 50 for frequently altered tables |
Innodb_buffer_pool_pages_dirty ratio during INPLACE rebuild | Rebuild dirties many pages, can pressure checkpoint | Sustained dirty ratio above 75% |
innodb_online_alter_log_max_size relative to write rate | Online alter log overflow aborts the DDL | Log filling during long INPLACE on write-heavy table |
How Netdata helps
- The MySQL collector exposes
Threads_running,Threads_connected, andQuestionsrate, which together reveal the moment a DDL starts to stall query processing. - Netdata’s correlation across the metadata lock cascade makes it easier to see the relationship between a rising MDL wait count and a corresponding drop in throughput on the affected table.
- Monitoring long-lived transactions via
INNODB_TRXage gives early warning before a DDL is issued against a table with an open transaction. - History list length tracking (
trx_rseg_history_len) catches the related failure mode where an idle transaction is also blocking purge. - Per-table query digest latency from
performance_schema.events_statements_summary_by_digestshows whether a stall is isolated to one table or system-wide. - Alerting on
performance_schema.metadata_lockspending counts detects the cascade before it reaches connection exhaustion.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL ERROR 1045 (28000): Access denied for user - diagnosis
- MySQL adaptive hash index latch contention: high CPU, low throughput
- MySQL authentication failure spike: brute force vs broken credential rotation
- MySQL binary logs filling the disk: expiry, lagging replicas, and purge
- MySQL InnoDB buffer pool hit ratio collapse: the cliff edge
- MySQL slow after restart: buffer pool warm-up and the cold cache
- MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks
- MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches
- MySQL connection exhaustion: detection, diagnosis, and prevention







