Skip to content

Break out of epoll_wait when adding loop callbacks during event loop#2559

Open
benoit-nexthop wants to merge 1 commit into
facebook:mainfrom
benoit-nexthop:loopbreak
Open

Break out of epoll_wait when adding loop callbacks during event loop#2559
benoit-nexthop wants to merge 1 commit into
facebook:mainfrom
benoit-nexthop:loopbreak

Conversation

@benoit-nexthop

Copy link
Copy Markdown
Contributor

Thrift RPC Latency Bug

Summary

Thrift RPC calls experience multi-second latency (5-7 seconds) when they should complete in milliseconds. The root cause is that EventBase::runInLoop() adds callbacks to loopCallbacks_ without waking up the EventBase if it's blocked in eb_event_base_loop().

Environment

  • folly EventBase with libevent backend
  • fbthrift with Rocket protocol
  • WriteBatcher for batching writes

Reproducer

Here is minimal test case to be built in the FBOSS OSS tree demonstrates the issue:

/*
 * Minimal reproducer for Thrift RPC latency issue.
 * Expected: RPC call should complete in < 10ms
 * Observed: RPC call takes several seconds due to WriteBatcher callback delay
 */

#include <chrono>
#include <iostream>
#include <memory>
#include <thread>

#include <folly/init/Init.h>
#include <folly/io/async/AsyncSocket.h>
#include <folly/io/async/EventBase.h>
#include <thrift/lib/cpp2/async/RocketClientChannel.h>
#include <thrift/lib/cpp2/util/ScopedServerInterfaceThread.h>

#include "fboss/agent/if/gen-cpp2/FbossCtrl.h"

using namespace facebook::fboss;
using namespace apache::thrift;

// Simple handler that returns a fixed timestamp
class SimpleHandler : public FbossCtrlSvIf {
 public:
  void getConfigAppliedInfo(ConfigAppliedInfo& info) override {
    info.lastAppliedInMs() = 12345;
    info.lastColdbootAppliedInMs() = 0;
  }
};

int main(int argc, char** argv) {
  folly::init(&argc, &argv);

  std::cout << "Creating Thrift server..." << std::endl;

  // Use ScopedServerInterfaceThread which handles startup/shutdown properly
  auto handler = std::make_shared<SimpleHandler>();
  ScopedServerInterfaceThread server(handler, "::1", 0);

  auto port = server.getAddress().getPort();
  std::cout << "Server listening on port " << port << std::endl;

  // Create client using ScopedServerInterfaceThread's helper
  auto client = server.newClient<FbossCtrlAsyncClient>(
      nullptr, RocketClientChannel::newChannel);

  std::cout << "Making RPC call..." << std::endl;

  // Measure RPC latency
  auto start = std::chrono::steady_clock::now();

  ConfigAppliedInfo result;
  client->sync_getConfigAppliedInfo(result);

  auto end = std::chrono::steady_clock::now();
  auto duration =
      std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

  std::cout << "RPC call completed in " << duration.count() << " ms"
            << std::endl;
  std::cout << "Result: lastAppliedInMs = " << *result.lastAppliedInMs()
            << std::endl;

  if (duration.count() > 100) {
    std::cout << "ERROR: RPC took too long! Expected < 100ms" << std::endl;
  } else {
    std::cout << "OK: RPC latency is acceptable" << std::endl;
  }

  return duration.count() > 100 ? 1 : 0;
}

Save this file to fboss/cli/fboss2/test/thrift_latency_test.cpp and add this at the end of cmake/CliFboss2Test.cmake:

add_executable(thrift_latency_test
  fboss/cli/fboss2/test/thrift_latency_test.cpp
)

target_link_libraries(thrift_latency_test
  ctrl_cpp2
  Folly::folly
  FBThrift::thriftcpp2
)

Before the fix this test binary takes 5-7 seconds to execute, after the fix it takes a few milliseconds.

Root Cause Analysis

The Problem

When a Thrift RPC response is ready to be sent, the WriteBatcher calls runInLoop(this, true) to schedule the write. The thisIteration=true parameter indicates the callback should run in the current loop iteration.

However, the callback is only added to runOnceCallbacks_ if we're currently inside runLoopCallbacks(). Otherwise, it's added to loopCallbacks_ for the next iteration.

The issue occurs when:

  1. The EventBase is inside eb_event_base_loop() (blocked in epoll_wait)
  2. A socket event triggers a callback that calls runInLoop()
  3. runOnceCallbacks_ is null (we're not in runLoopCallbacks())
  4. The callback is added to loopCallbacks_
  5. After the socket callback returns, eb_event_base_loop() continues blocking
  6. The callback sits in loopCallbacks_ until epoll_wait times out (~5-6 seconds typically)

Code Flow (Before Fix)

eb_event_base_loop(EVLOOP_ONCE)  <- blocking in epoll_wait
  |
  +-> socket event fires
  |     |
  |     +-> WriteBatcher::enqueueWrite()
  |           |
  |           +-> runInLoop(this, true)
  |                 |
  |                 +-> runOnceCallbacks_ == nullptr
  |                 +-> loopCallbacks_.push_back(callback)  <- callback queued
  |
  +-> epoll_wait continues blocking  <- NO WAKEUP!
  |
  ... 5-6 seconds pass ...
  |
  +-> epoll_wait times out
  +-> eb_event_base_loop returns
  |
runLoopCallbacks()  <- finally processes the callback

Instrumentation

I sprinkled a ton of debug logging in folly and fbthrift to debug the issue:

EventBase.cpp

  • Log when entering eb_event_base_loop (blocking vs non-blocking)
  • Log when runInLoop adds to loopCallbacks_ vs runOnceCallbacks_
  • Log when runLoopCallbacks starts/ends

WriteBatcher-inl.h

  • Log when enqueueWrite calls runInLoop

Reproduction (Without Fix)

Debug Output

Excerpt of the output from the reproduction binary shared above with the extra debug logging I added:

[WRITEBATCHER tid=X] enqueueWrite: calling runInLoop (immediate)
[EVENTBASE tid=X] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=1)
... 6 seconds of silence on this thread ...
[EVENTBASE tid=X] loopMain: eb_event_base_loop returned 0
[EVENTBASE tid=X] runLoopCallbacks: starting with 2 callbacks

(See below for captured strace output showing the 6-second epoll_wait)

The Fix

Add a call to eb_event_base_loopbreak() when adding to loopCallbacks_ while the loop is running:

void EventBase::runInLoop(LoopCallback* callback, bool thisIteration, ...) {
  // ... existing code ...
  if (runOnceCallbacks_ != nullptr && thisIteration) {
    runOnceCallbacks_->push_back(*callback);
  } else {
    loopCallbacks_.push_back(*callback);
    // NEW: Wake up the EventBase if it's blocked in eb_event_base_loop()
    if (isRunning()) {
      evb_->eb_event_base_loopbreak();
    }
  }
}

This causes eb_event_base_loop() to return immediately, allowing the loop to process the pending callbacks. This if/else sequence was duplicated so rather than fixing this in the same way in two different places, I've factored this out in a private helper.

Results

Based on thrift_latency_test.cpp above, running on AMD EPYC 9655P:

Metric Before Fix After Fix
RPC Latency ~6500 ms ~1 ms
Test Duration ~6.5 seconds ~6 ms

Captured Traces

Before Fix (Issue Reproduction)

strace output

key events for thread 1910218, the server IO thread:

# Thread 1910218 returns from a short 6ms epoll_wait with an event
[pid 1910218] 13:50:40.269315 epoll_wait(10, [...], 32, 6) = 1

# While processing that event, WriteBatcher adds callbacks to loopCallbacks_
[WRITEBATCHER tid=1910218] enqueueWrite: calling runInLoop (immediate)
[EVENTBASE tid=1910218] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=1)
[EVENTBASE tid=1910218] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=0)

# PROBLEM: libevent goes back into epoll_wait INSIDE eb_event_base_loop!
# It doesn't know about our loopCallbacks_ - only folly knows about those.
[pid 1910218] 13:50:40.270073 epoll_wait(10,  <unfinished ...>

# 5 seconds later, epoll_wait times out (timeout was 5042ms)
[pid 1910218] 13:50:45.317251 <... epoll_wait resumed>[], 32, 5042) = 0

# NOW eb_event_base_loop returns and we finally process the callbacks
[EVENTBASE tid=1910218] loopMain: eb_event_base_loop returned 0
[EVENTBASE tid=1910218] runLoopCallbacks: starting with 2 callbacks

Debug output

Making RPC call...
[WRITEBATCHER tid=1911408] enqueueWrite: calling runInLoop (immediate)
[EVENTBASE tid=1911408] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=1)
[WRITEBATCHER tid=1911408] enqueueWrite: calling runInLoop (immediate)
[EVENTBASE tid=1911408] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=1)
RPC call completed in 7062 ms
ERROR: RPC took too long! Expected < 100ms

Observations

  1. WriteBatcher calls runInLoop(this, true): The thisIteration=1 parameter indicates
    the callback should run immediately in the current loop iteration.

  2. Callback goes to loopCallbacks_ instead of runOnceCallbacks_: The log shows
    runOnceCallbacks_=(nil) because we're NOT inside runLoopCallbacks() at this point.
    We're inside a libevent callback (epoll event handler), so the callback is queued.

  3. libevent goes back into epoll_wait after processing the event: This is the bug.
    The flow is:

    • eb_event_base_loop(EVLOOP_ONCE) calls epoll_wait
    • An event fires, epoll_wait returns
    • libevent calls the event handler (which triggers WriteBatcher)
    • WriteBatcher adds callbacks to folly's loopCallbacks_
    • Event handler returns to libevent
    • libevent goes back into epoll_wait because it doesn't know about folly's loopCallbacks_!
    • The 5042ms timeout expires
    • eb_event_base_loop finally returns
    • folly's runLoopCallbacks() processes the 2 queued callbacks
  4. The ~5 second delay is a single epoll_wait timeout: The timeout of 5042ms is set by
    libevent based on the next timer event. The callbacks sit in loopCallbacks_ for the
    entire duration because libevent has no way to know they're waiting.


After Fix

strace output

# WriteBatcher enqueues callback - now with eb_event_base_loopbreak() called
[pid 1939305] 13:53:45.777230 epoll_wait(26, [WRITEBATCHER tid=1939302] enqueueWrite: calling runInLoop (immediate)
[pid 1939305] 13:53:45.780801 epoll_wait(26, [WRITEBATCHER tid=1939302] enqueueWrite: calling runInLoop (immediate)

# RPC completes in just 8ms!
RPC call completed in 8 ms
OK: RPC latency is acceptable

Debug output

Making RPC call...
[WRITEBATCHER tid=1939834] enqueueWrite: calling runInLoop (immediate)
[EVENTBASE tid=1939834] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=1)
[WRITEBATCHER tid=1939834] enqueueWrite: calling runInLoop (immediate)
[EVENTBASE tid=1939834] runInLoop: adding to loopCallbacks_ (runOnceCallbacks_=(nil), thisIteration=1)
RPC call completed in 1 ms
OK: RPC latency is acceptable

Observations

  1. Same callback path: The callback still goes to loopCallbacks_ (since runOnceCallbacks_
    is still null), but now we also call eb_event_base_loopbreak().

  2. Immediate wakeup: The eb_event_base_loopbreak() call causes epoll_wait to return
    immediately, allowing the EventBase to process the pending callback.

  3. RPC latency reduced from ~6 seconds to ~1-8ms: this helps all OSS FBOSS users keep their sanity.

  4. No timeout waiting: The epoll_wait no longer blocks for the full timeout period.
    The loop breaks immediately and processes callbacks.


Conclusion

The fix adds a single call to eb_event_base_loopbreak() when adding callbacks to
loopCallbacks_ while the loop is running. This ensures the EventBase wakes up promptly
to process the callback instead of waiting for the next event or timeout.

This is a fix for any code path that uses runInLoop() with thisIteration=true
while inside an epoll callback handler, as the callback will otherwise be delayed until
the next event loop timeout (typically 5-7 seconds).

It has been reported that this issue has not been observed inside Meta, and I cannot explain why.

When `runInLoop()` is called while the EventBase loop is running (but not
inside `runLoopCallbacks`), the callback is added to `loopCallbacks_` but
the loop may be blocked in `epoll_wait` with a long timeout. This causes
significant latency as the callback won't be processed until the timeout
expires or another event occurs.

Fix this by calling `eb_event_base_loopbreak()` when adding callbacks to
`loopCallbacks_` while the loop is running. This breaks out of `epoll_wait`
so the callback can be processed promptly in the next loop iteration.

This is particularly important for Thrift clients where responses arrive
on a different thread and are queued via `runInLoop()`. Without this fix,
each RPC call could be delayed by up to the epoll timeout (e.g., 30s).

In OSS FBOSS this fixes the 5-7s delay seen on every Thrift call.  Those
can add up quickly, for example FBOSS CLI end-to-end tests now take 1.4s
instead of ~155s without this fix.
@meta-cla meta-cla Bot added the CLA Signed label Jan 16, 2026
@benoit-nexthop

Copy link
Copy Markdown
Contributor Author

I previously attempted to work around this issue in FBOSS with facebook/fboss#628, which contained a commit facebook/fboss@b66e380 that used server.setIdleTimeout() to make the event loop not wait so long, but I felt like this wasn't the right fix, and now I know why.

@benoit-nexthop

benoit-nexthop commented Jan 16, 2026

Copy link
Copy Markdown
Contributor Author

I amended my commit just above to change the condition from if (isRunning()) { to if (isRunning() && !inRunningEventBaseThread()) { because of failing unit tests, however now I realize that in doing so the fix becomes ineffective, because the calls are always coming from an EventBase thread. Maybe the unit tests need to be adjusted accordingly.

A typical call stack looks like this:

Thread 11 (Thread 0x7f4e50ff9640 (LWP 2941198) "ThriftIO0"):
#0  folly::EventBase::addLoopCallback (this=this@entry=0x7f4e3c000f80, callback=warning: RTTI symbol not found for class 'folly::EventBaseAtomicNotificationQueue<std::variant<apache::thrift::QueueReplyInfo, apache::thrift::StreamReplyInfo, apache::thrift::SinkConsumerReplyInfo, apache::thrift::TilePromiseReplyInfo, apache::thrift::BiDiStreamReplyInfo>, apache::thrift::ReplyInfoConsumer>' ..., thisIteration=<optimized out>) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/io/async/EventBase.cpp:867
#1  0x0000000000b61527 in folly::EventBase::runInLoop (this=0x7f4e3c000f80, callback=0x7f4e4803fb00, thisIteration=false, rctx=std::shared_ptr<folly::RequestContext> (empty) = {...}) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/io/async/EventBase.cpp:884
#2  0x0000000000cd754b in folly::EventBaseAtomicNotificationQueue<std::variant<apache::thrift::QueueReplyInfo, apache::thrift::StreamReplyInfo, apache::thrift::SinkConsumerReplyInfo, apache::thrift::TilePromiseReplyInfo, apache::thrift::BiDiStreamReplyInfo>, apache::thrift::ReplyInfoConsumer>::execute() ()
#3  0x0000000000cd6e78 in non-virtual thunk to folly::EventBaseAtomicNotificationQueue<std::variant<apache::thrift::QueueReplyInfo, apache::thrift::StreamReplyInfo, apache::thrift::SinkConsumerReplyInfo, apache::thrift::TilePromiseReplyInfo, apache::thrift::BiDiStreamReplyInfo>, apache::thrift::ReplyInfoConsumer>::handlerReady(unsigned short) ()
#4  0x0000000000b68aca in folly::EventHandler::libeventCallback (fd=<optimized out>, events=2, arg=0x7f4e4803fb28) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/io/async/EventHandler.cpp:160
#5  0x00007f4e629dd918 in event_persist_closure (ev=<optimized out>, base=0x7f4e3c001290) at /usr/src/debug/libevent-2.1.12-8.el9.x86_64/event.c:1623
#6  event_process_active_single_queue (base=base@entry=0x7f4e3c001290, activeq=0x7f4e3c0016e0, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at /usr/src/debug/libevent-2.1.12-8.el9.x86_64/event.c:1682
#7  0x00007f4e629df267 in event_process_active (base=0x7f4e3c001290) at /usr/src/debug/libevent-2.1.12-8.el9.x86_64/event.c:1783
#8  event_base_loop (base=0x7f4e3c001290, flags=<optimized out>) at /usr/src/debug/libevent-2.1.12-8.el9.x86_64/event.c:2006
#9  0x0000000000b60806 in folly::EventBase::loopMain (this=this@entry=0x7f4e3c000f80, flags=flags@entry=0, options=...) at /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/unique_ptr.h:191
#10 0x0000000000b6052d in folly::EventBase::loopBody (this=this@entry=0x7f4e3c000f80, flags=flags@entry=0, options=options@entry=...) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/io/async/EventBase.cpp:540
#11 0x0000000000b604db in folly::EventBase::loop (this=0x7f4e3c000f80) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/io/async/EventBase.cpp:501
#12 0x0000000000b6156f in folly::EventBase::loopForever (this=0x7f4e3c000f80) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/io/async/EventBase.cpp:811
#13 0x0000000000cc772a in folly::IOThreadPoolExecutor::threadRun (this=0x34d43a40, thread=...) at /var/FBOSS/tmp_bld_dir/repos/github.com-facebook-folly.git/folly/executors/IOThreadPoolExecutor.cpp:252
#14 0x0000000000bac982 in std::__invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__f=<optimized out>, __t=<optimized out>, __args=...) at /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:74
#15 0x00007f4e62090ae4 in std::execute_native_thread_routine (__p=0x7f4e4803e8d0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#16 0x00007f4e61d3f2fa in start_thread (arg=<optimized out>) at pthread_create.c:443
#17 0x00007f4e61dc3624 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

So I'm undoing this amendment for now.

meta-codesync Bot pushed a commit to facebook/fboss that referenced this pull request Jan 22, 2026
Summary:
This diff just added the experimental thrift_latency_test.cpp from facebook/folly#2559

With the oss build, this simple thrift call can take 6418ms
```
[root@0eb3f90dd235 fboss]# ./thrift_latency_test
Creating Thrift server...
I0116 22:30:33.090906 42865 ThriftServer.cpp:2060] Using randomly generated TLS ticket keys.
I0116 22:30:33.091259 42865 ThriftServer.cpp:988] Using resource pools on address/port [::1]:0: thrift flag: true, enable gflag: false, disable gflag: false, runtime actions:
I0116 22:30:33.091567 42865 ThriftServer.cpp:1538] Resource pools configured: 2
I0116 22:30:33.092444 42865 ThriftServer.cpp:746] ThriftServer listening on address/port: [::1]:37999
Server listening on port 37999
Making RPC call...
RPC call completed in 6418 ms
Result: lastAppliedInMs = 12345
ERROR: RPC took too long! Expected < 100ms
```

While in our internal build, the same test without using any FBOSS internal logic only take 16ms
```
❯❯❯ buck2 run fbcode//fboss/cli/fboss2/test:thrift_latency_test
Buck UI: https://www.internalfb.com/buck2/fc447728-e42e-45da-8e9b-a29ae80abeb5
Network: Up: 0B  Down: 0B
Command: run.
Time elapsed: 0.0s
BUILD SUCCEEDED - starting your binary
Creating Thrift server...
I0116 14:35:48.140404 328824 ThriftServer.cpp:2060] Using randomly generated TLS ticket keys.
I0116 14:35:48.142755 328824 ThriftServer.cpp:988] Using resource pools on address/port [::1]:0: thrift flag: true, enable gflag: false, disable gflag: false, runtime actions:
I0116 14:35:48.145723 328824 ThriftServer.cpp:1538] Resource pools configured: 2
I0116 14:35:48.281738 328824 ThriftServer.cpp:746] ThriftServer listening on address/port: [::1]:40581
Server listening on port 40581
Making RPC call...
RPC call completed in 16 ms
Result: lastAppliedInMs = 12345
OK: RPC latency is acceptable
```

This is a huge difference b.w OSS build and internal build: **6418ms vs 16ms**
As you can see in thrift_latency_test.cpp, we used `ScopedServerInterfaceThread` to spawn a simple fboss agent thrift server, then use `FbossCtrlAsyncClient` with `RocketClientChannel` to call a thrift function.
No FBOSS internal logic is used here.

I am suspecting there's performance discrepancy b/w our internal build and OSS build for fbthrift.

Based on the new comments from https://fb.workplace.com/groups/560979627394613/permalink/3541014286057784/
I asked Metamate to generate a similar test but use EpollBackend and IoUring to create a thrift server

Reviewed By: kevin645

Differential Revision: D90894498

fbshipit-source-id: c8d04d6434c7aca326bc33133d40f514ea8252bb
@benoit-nexthop

Copy link
Copy Markdown
Contributor Author

Related FBOSS change for context: facebook/fboss@43100ad — switching away from libevent to the epoll based backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants