This repository was archived by the owner on Mar 1, 2026. It is now read-only.

Make parts of Rdb_transaction objects safe to access from other threads #1511

Closed
laurynas-biveinis wants to merge 1 commit into facebook:fb-mysql-8.0.32 from laurynas-biveinis:trx-obj-concurrency

@laurynas-biveinis

Contributor

Almost all accesses to `Rdb_transaction` objects come from the owning query-executing thread, but some fields are read from other threads, for example, to execute `SHOW ENGINE ROCKSDB STATUS`. This causes intermittent crashes, e.g. through a wild read of the snapshot pointer to get the snapshot timestamp.

Fix by moving all shared fields to the beginning of the class, documenting how each one is protected, and making them atomic as needed. For atomic fields, use only relaxed memory order accesses, which should result in the same compiled code as before. Where something is read through a pointer, cache that data in an atomic field instead and don't dereference the pointer from other threads: `Rdb_transaction::m_snapshot_ts` instead of `m_read_opts[USER_TABLE].snapshot->GetUnixTime()`, and `m_num_ongoing_bulk_load` instead of `m_bulk_load_ctx->num_bulk_load()`.
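A minimal C++ sketch of the caching pattern described above. It assumes a heavily simplified, hypothetical `TrxSketch` class and a stand-in `FakeSnapshot` type rather than the real `Rdb_transaction` and `rocksdb::Snapshot`. The owning thread dereferences the snapshot once and stores its timestamp in an atomic; other threads only ever load that atomic, never the pointer. Relaxed order is assumed to be enough because readers only need an untorn, reasonably recent value, not ordering with respect to other memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for rocksdb::Snapshot (not the real API).
struct FakeSnapshot {
  std::int64_t unix_time;
  std::int64_t GetUnixTime() const { return unix_time; }
};

// Hypothetical, heavily simplified transaction object.
class TrxSketch {
 public:
  // Safe from any thread: loads only the cached copy, never m_snapshot.
  std::int64_t get_snapshot_ts() const {
    return m_snapshot_ts.load(std::memory_order_relaxed);
  }
  unsigned get_num_ongoing_bulk_load() const {
    return m_num_ongoing_bulk_load.load(std::memory_order_relaxed);
  }

  // Owner thread only.
  void acquire_snapshot(std::int64_t now) {
    assert(m_snapshot == nullptr);
    m_snapshot = new FakeSnapshot{now};
    // Cache the timestamp so other threads never dereference m_snapshot.
    m_snapshot_ts.store(m_snapshot->GetUnixTime(), std::memory_order_relaxed);
  }

  // Owner thread only: reset the cached timestamp, then free the snapshot.
  void release_snapshot() {
    m_snapshot_ts.store(0, std::memory_order_relaxed);
    delete m_snapshot;
    m_snapshot = nullptr;
  }

  // Owner thread only: mirror the bulk-load context's counter.
  void set_num_ongoing_bulk_load(unsigned n) {
    m_num_ongoing_bulk_load.store(n, std::memory_order_relaxed);
  }

  ~TrxSketch() { delete m_snapshot; }

 private:
  // Shared fields first: written by the owner thread, read by any thread.
  std::atomic<std::int64_t> m_snapshot_ts{0};
  std::atomic<unsigned> m_num_ongoing_bulk_load{0};

  // Owner-thread-only fields: other threads must not touch these.
  FakeSnapshot *m_snapshot = nullptr;
};
```

A status-reporting thread would call only `get_snapshot_ts()` and `get_num_ongoing_bulk_load()`. If the owner releases the snapshot concurrently, the reader sees either the old timestamp or 0, but never touches freed memory.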

@facebook-github-bot

@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

```cpp
      statement_snapshot_type.store(snapshot_type::NONE,
                                    std::memory_order_relaxed);
      rdb->ReleaseSnapshot(m_read_opts[table_type].snapshot);
      m_read_opts[table_type].snapshot = nullptr;
```

Contributor

Lines 6231 and 6232 may also be unsafe for other threads (a use-after-free issue).

Suppose the current thread has finished executing `rdb->ReleaseSnapshot()` but has not yet executed `m_read_opts[table_type].snapshot = nullptr`, while another thread is calling `tx->get_snapshot_ts()` (https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.32/storage/rocksdb/ha_rocksdb.cc#L3987). The snapshot has already been released, but `m_read_opts[USER_TABLE].snapshot` may still reference the freed memory.

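Maybe change it to:

```cpp
      // Save the pointer first: clearing the field before the call would
      // pass nullptr to ReleaseSnapshot() instead of the snapshot.
      const auto *snapshot = m_read_opts[table_type].snapshot;
      m_read_opts[table_type].snapshot = nullptr;
      rdb->ReleaseSnapshot(snapshot);
```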
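The reordering suggested above can be sketched with the pointer held in a `std::atomic`, so that unpublishing it and fetching the old value happen in one step. This is a minimal sketch, assuming hypothetical `FakeDb`, `FakeSnapshot` and `SnapshotSlot` types in place of `rocksdb::DB`, `rocksdb::Snapshot` and `m_read_opts[table_type].snapshot`; in the actual code the field is a plain pointer inside `rocksdb::ReadOptions`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; the real code uses rocksdb::DB and rocksdb::Snapshot.
struct FakeSnapshot {
  long unix_time;
};

struct FakeDb {
  int released = 0;
  // Stand-in for DB::ReleaseSnapshot; treats nullptr as a no-op here.
  void ReleaseSnapshot(const FakeSnapshot *s) {
    if (s != nullptr) {
      ++released;
      delete s;
    }
  }
};

// Hypothetical holder for a snapshot pointer that other threads may observe.
class SnapshotSlot {
 public:
  void set(const FakeSnapshot *s) {
    assert(peek() == nullptr);  // owner must clear before setting again
    m_snapshot.store(s, std::memory_order_release);
  }

  const FakeSnapshot *peek() const {
    return m_snapshot.load(std::memory_order_acquire);
  }

  // Unpublish first, then release. exchange() hands back the old pointer,
  // so ReleaseSnapshot() gets the real snapshot rather than nullptr.
  void clear(FakeDb &db) {
    const FakeSnapshot *old =
        m_snapshot.exchange(nullptr, std::memory_order_acq_rel);
    db.ReleaseSnapshot(old);
  }

 private:
  std::atomic<const FakeSnapshot *> m_snapshot{nullptr};
};
```

This only narrows the window: a reader that loaded the pointer just before `exchange()` can still dereference it after it has been freed. Closing that gap needs a lock or a deferred-reclamation scheme, or, as this PR does for the timestamp, not dereferencing the pointer from other threads at all.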

Contributor Author

`get_snapshot_ts` does not race here, because this PR introduces a caching field for exactly this scenario.

That said, `m_read_opts[table_type].snapshot` is indeed accessed incorrectly in other contexts (see my e-mail) and will be fixed one way or another in a follow-up.

@facebook-github-bot


This pull request has been merged in 7b2d125.

laurynas-biveinis deleted the trx-obj-concurrency branch on January 16, 2025 08:53