[Accton][wedge800cact] Fix warmboot VerifyHostToQueueMappingClassID failure#1334

Open
BrandonCheng0121 wants to merge 1 commit into facebook:main from BrandonCheng0121:warmboot_stats_race

@BrandonCheng0121

Contributor

Regarding the T1 warmboot.AgentQueuePerHostL2Test.VerifyHostToQueueMappingClassID failure: the root cause is a stats race condition after warmboot.
The test reads the cached ACL stats within milliseconds of warmboot completing, but the background stats thread refreshes that cache only once per second, so the test sees stale values.

Solution:
Explicitly call updateStats() in SwSwitch::initialConfigApplied() after warmboot completes to immediately sync hardware stats to the software cache.
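
The race and the fix can be illustrated with a minimal, self-contained sketch. This is not FBOSS code: `FakeHardware`, `StatsCache`, and `readRightAfterWarmboot` are hypothetical names invented for illustration, and the sketch assumes the stale cached value is 0 while the hardware counter holds 42. Only the idea of calling `updateStats()` once before the first read comes from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a hardware ACL counter (not an FBOSS type).
struct FakeHardware {
  uint64_t aclPackets = 0;
};

// Hypothetical software cache. In the scenario described above, a background
// thread would call updateStats() roughly once per second; the mutex is here
// because the cache would be shared with that thread.
class StatsCache {
 public:
  explicit StatsCache(const FakeHardware& hw) : hw_(hw) {}

  // Copies the hardware counter into the software cache.
  void updateStats() {
    std::lock_guard<std::mutex> lock(mutex_);
    cachedAclPackets_ = hw_.aclPackets;
  }

  // Returns whatever the cache currently holds, fresh or stale.
  uint64_t getCachedAclPackets() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cachedAclPackets_;
  }

 private:
  const FakeHardware& hw_;
  mutable std::mutex mutex_;
  uint64_t cachedAclPackets_ = 0;  // assumed stale value before any refresh
};

// Models a reader that queries the cache right after warmboot, before the
// periodic refresh has had a chance to run.
uint64_t readRightAfterWarmboot(bool syncOnInit) {
  FakeHardware hw;
  hw.aclPackets = 42;  // assumed counter value present in hardware
  StatsCache cache(hw);
  if (syncOnInit) {
    cache.updateStats();  // one explicit sync before the first read
  }
  return cache.getCachedAclPackets();
}

// Small self-check of the two cases.
void checkSketch() {
  assert(readRightAfterWarmboot(false) == 0);   // stale read
  assert(readRightAfterWarmboot(true) == 42);   // synced read
}
```

Without the explicit sync, the reader observes the stale cached value. With it, the first read already reflects the hardware counter, and no wait for the periodic refresh is needed.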

Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run --files fboss/agent/SwSwitch.cpp fboss/agent/test/agent_hw_tests/AgentQueuePerHostL2Tests.cpp

Summary

After warmboot, a test that reads stats immediately gets stale cached values, because the background stats thread refreshes the cache only once per second.

Test Plan

Test command:
for i in {1..10}; do echo "=== run $i attempts ==="; time ./bin/run_test.py sai_agent --agent-run-mode mono --filter=AgentQueuePerHostL2Test.VerifyHostToQueueMappingClassID --skip-known-bad-tests "leaba/25.11.4210/25.11.4210/graphene202x" --enable-production-features g202x --config /opt/fboss/share/hw_test_configs/wedge800cact.agent.materialized_JSON --fruid-path /home/Go_FBOSS_Test/W800CA-Fix/./fboss-configs/fboss/oss/scripts/run_configs/fruid.json --mgmt-if eth0 --platform_mapping_override_path /home/Go_FBOSS_Test/W800CA-Fix/./fboss-configs/fboss/lib/platform_mapping_v2/generated_platform_mappings/wedge800cact_platform_mapping-2026-0418-v0.7-honglim_20260409-del_pie.json 2>&1 | tee 463e_W800CACT_VerifyHostToQueueMappingClassID_DUT35_DVT1_v5_$(date '+%Y-%m-%d-%H:%M')_$i.log; done

The test passed in all 10 consecutive runs.

@BrandonCheng0121 BrandonCheng0121 requested a review from a team as a code owner June 25, 2026 08:19
@meta-cla meta-cla Bot added the CLA Signed label Jun 25, 2026
@BrandonCheng0121 BrandonCheng0121 marked this pull request as draft June 25, 2026 08:25
@BrandonCheng0121 BrandonCheng0121 marked this pull request as ready for review June 25, 2026 08:45
@BrandonCheng0121

Contributor Author

@Tianyu-Meta
Please take a look at this PR, thanks.

@BrandonCheng0121 BrandonCheng0121 changed the title [Accton][wedge800cact] Fix warmboot VerifyHostToQueueMappingClassID f… Jun 25, 2026