Make parts of Rdb_transaction objects safe to access from other threads #1511
laurynas-biveinis wants to merge 1 commit into
Almost all accesses to Rdb_transaction objects come from the owning query-executing thread, but some fields are read from other threads, for example to execute SHOW ENGINE ROCKSDB STATUS. This causes intermittent crashes, e.g. from reading a wild snapshot pointer to get its timestamp. Fix by moving all shared fields to the beginning of the class, documenting their protection, and making them atomic as needed. For atomic fields, use only relaxed memory order accesses, which should result in the same compiled code as before. Where something was read through a pointer, cache that data in an atomic field instead and don't dereference the pointer from other threads: Rdb_transaction::m_snapshot_ts replaces m_read_opts[USER_TABLE].snapshot->GetUnixTime(), and m_num_ongoing_bulk_load replaces m_bulk_load_ctx->num_bulk_load().
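A minimal sketch of the caching pattern described above, not the actual patch. `FakeSnapshot` and `TxSketch` are hypothetical stand-ins for the RocksDB snapshot and Rdb_transaction, and using 0 to mean "no snapshot" is an assumption made only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a RocksDB snapshot; only the timestamp matters.
struct FakeSnapshot {
  int64_t unix_time;
};

class TxSketch {
 public:
  // Owner thread only: take a snapshot and publish its timestamp.
  void acquire_snapshot(int64_t now) {
    assert(m_snapshot == nullptr);
    m_snapshot = new FakeSnapshot{now};
    m_snapshot_ts.store(now, std::memory_order_relaxed);
  }

  // Owner thread only: clear the cached timestamp, then free the snapshot.
  void release_snapshot() {
    m_snapshot_ts.store(0, std::memory_order_relaxed);
    delete m_snapshot;
    m_snapshot = nullptr;
  }

  // Any thread: reads only the atomic cache, never dereferences m_snapshot.
  int64_t get_snapshot_ts() const {
    return m_snapshot_ts.load(std::memory_order_relaxed);
  }

  ~TxSketch() { delete m_snapshot; }

 private:
  // Shared field, kept at the start of the class: read by other threads,
  // hence atomic. Assumption: 0 means "no snapshot".
  std::atomic<int64_t> m_snapshot_ts{0};
  // Owner-thread-only field: must not be touched from other threads.
  FakeSnapshot *m_snapshot = nullptr;
};
```

A status reporter on another thread would call only `get_snapshot_ts()`. Relaxed ordering is enough here because the reader needs just the value itself and does not rely on it to synchronize access to any other memory.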
@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
statement_snapshot_type.store(snapshot_type::NONE,
                              std::memory_order_relaxed);
rdb->ReleaseSnapshot(m_read_opts[table_type].snapshot);
m_read_opts[table_type].snapshot = nullptr;
Lines 6231 and 6232 may also be unsafe for other threads (a use-after-delete issue).
Suppose the current thread has finished executing "rdb->ReleaseSnapshot()" but has not yet executed "m_read_opts[table_type].snapshot = nullptr", while another thread is calling tx->get_snapshot_ts() (https://github.com/facebook/mysql-5.6/blob/fb-mysql-8.0.32/storage/rocksdb/ha_rocksdb.cc#L3987). The snapshot has been released, but m_read_opts[USER_TABLE].snapshot may still reference the old memory.
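Maybe change it to the following, saving the pointer first so that the snapshot is still released:
const auto *snapshot = m_read_opts[table_type].snapshot;
m_read_opts[table_type].snapshot = nullptr;
rdb->ReleaseSnapshot(snapshot);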
get_snapshot_ts does not race here, because this PR introduces a caching field for exactly this scenario.
That said, m_read_opts[table_type].snapshot is indeed accessed incorrectly in other contexts (see my e-mail), and that will be fixed one way or another in a follow-up.
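To illustrate the window discussed in this thread, here is a hedged, single-threaded sketch. The hypothetical `between_steps` callback stands in for another thread reading the timestamp after the snapshot has been freed but before the pointer is reset. `TxRaceSketch`, `FakeSnapshot`, `observe_in_window`, and the choice to clear the cache before releasing are assumptions for this example, not code from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a snapshot object.
struct FakeSnapshot {
  int64_t unix_time;
};

struct TxRaceSketch {
  std::atomic<int64_t> m_snapshot_ts{0};  // shared, read by other threads
  FakeSnapshot *m_snapshot = nullptr;     // owner thread only

  void acquire(int64_t now) {
    m_snapshot = new FakeSnapshot{now};
    m_snapshot_ts.store(now, std::memory_order_relaxed);
  }

  // between_steps runs after the snapshot is freed but before the pointer
  // is nulled, i.e. inside the window described in the review comment.
  void release(const std::function<void()> &between_steps) {
    m_snapshot_ts.store(0, std::memory_order_relaxed);
    delete m_snapshot;  // m_snapshot is dangling from here...
    between_steps();    // ...while the simulated foreign read happens
    m_snapshot = nullptr;
  }

  // Reads only the cached value; the dangling pointer is never touched.
  int64_t get_snapshot_ts() const {
    return m_snapshot_ts.load(std::memory_order_relaxed);
  }
};

// Returns the timestamp an observer would see inside the window.
inline int64_t observe_in_window(int64_t now) {
  TxRaceSketch tx;
  tx.acquire(now);
  int64_t seen = -1;
  tx.release([&] { seen = tx.get_snapshot_ts(); });
  return seen;
}
```

With real threads and relaxed ordering, a concurrent reader might observe either the old timestamp or 0. Either way it only loads an integer and never dereferences the snapshot pointer, so the getter cannot hit freed memory.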
This pull request has been merged in 7b2d125.