
[Celestica] Support long duration test for AgentEnsembleLinkSanityTestDataPlaneFlood and ASIC-ASIC PRBS Tests#1066

Open
lihua-cls wants to merge 1 commit into facebook:main from lihua-cls:tahansb_link_stress_duration


@lihua-cls

Contributor

Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run
[INFO] Stashing unstaged files to /root/.cache/pre-commit/patch1775635465-3066528.
clang-format.............................................................Passed
shellcheck...........................................(no files to check)Skipped
shfmt................................................(no files to check)Skipped
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for merge conflicts................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed
[INFO] Restored changes from /root/.cache/pre-commit/patch1775635465-3066528.

Summary

Modify the tests below to support long-duration runs, e.g. a 48-hour continuous run:

Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity
AgentEnsembleLinkSanityTestDataPlaneFlood.warmbootIsHitLess
AgentEnsembleLinkSanityTestDataPlaneFlood.qsfpWarmbootIsHitLess

Solution

  1. Add a flag, --link_stress_duration, to both run_test.py and the single-binary run. It specifies how long the test case runs, in minutes.

    Note: An existing flag, --link_stress_test, already runs the PRBS test for 10 minutes. If both --link_stress_test and --link_stress_duration are specified, only the older --link_stress_test takes effect, so existing behavior is unchanged. If --link_stress_duration is not specified, behavior is likewise unchanged.

  2. Lengthen the PRBS check interval from 10 seconds to 3 minutes when the duration is more than 10 minutes.

  3. For the DataPlaneFlood test cases, pump traffic every 10 seconds during the test until the duration expires.

  4. In run_test.py, override test_run_timeout so that the test case does not time out.
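
The four points above can be sketched roughly as follows. This is an illustrative Python model, not the code in this PR: every function and constant name is hypothetical, TIMEOUT_MARGIN_S is an invented slack value, and treating a missing or non-positive duration as "not specified" is an assumption. The clock and sleep functions are passed in so the loop can be exercised without waiting.

```python
from typing import Callable, Optional

LEGACY_STRESS_MINUTES = 10     # --link_stress_test runs PRBS for 10 minutes
SHORT_PRBS_INTERVAL_S = 10     # previous PRBS check interval
LONG_PRBS_INTERVAL_S = 3 * 60  # interval when the run exceeds 10 minutes
FLOOD_PUMP_INTERVAL_S = 10     # traffic is pumped every 10 seconds
TIMEOUT_MARGIN_S = 600         # hypothetical slack for setup and teardown


def effective_duration_minutes(
    link_stress_test: bool, link_stress_duration: Optional[int]
) -> Optional[int]:
    """Return the run length in minutes, or None for the default behavior."""
    if link_stress_test:
        # The older flag wins when both are given.
        return LEGACY_STRESS_MINUTES
    if link_stress_duration and link_stress_duration > 0:
        return link_stress_duration
    return None


def prbs_check_interval_s(duration_minutes: Optional[int]) -> int:
    """Use the longer interval only for runs of more than 10 minutes."""
    if duration_minutes is not None and duration_minutes > 10:
        return LONG_PRBS_INTERVAL_S
    return SHORT_PRBS_INTERVAL_S


def resolve_run_timeout_s(
    default_timeout_s: int, duration_minutes: Optional[int]
) -> int:
    """Raise the runner timeout so a long run is not killed early."""
    if duration_minutes is None:
        return default_timeout_s
    return max(default_timeout_s, duration_minutes * 60 + TIMEOUT_MARGIN_S)


def pump_until_deadline(
    duration_minutes: int,
    pump: Callable[[], None],
    now: Callable[[], float],
    sleep: Callable[[float], None],
) -> int:
    """Call pump() every FLOOD_PUMP_INTERVAL_S until the duration expires."""
    deadline = now() + duration_minutes * 60
    pumps = 0
    while now() < deadline:
        pump()
        pumps += 1
        sleep(FLOOD_PUMP_INTERVAL_S)
    return pumps
```

In real use, now and sleep would be time.monotonic and time.sleep, and pump would be whatever call the test uses to inject traffic.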

Test Plan

  1. Run the single binary with link_stress_duration for the 3 cases; ensure the run duration matches the expected value.
  2. Run the single binary with link_stress_duration for other cases; ensure the parameter has no effect.
  3. Run the single binary without link_stress_duration; ensure behavior is the same as before.
  4. Use run_test.py to repeat steps 1-3; ensure the results are the same.
  5. Run a 24-hour duration test for the 3 cases; ensure they all pass.

Test Result

[       OK ] cold_boot.Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity (86569893 ms)
[       OK ] cold_boot.AgentEnsembleLinkSanityTestDataPlaneFlood.warmbootIsHitLess (86485485 ms)
[       OK ] cold_boot.AgentEnsembleLinkSanityTestDataPlaneFlood.qsfpWarmbootIsHitLess (86507338 ms)
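
As a quick sanity check on the numbers above, the reported millisecond counts can be converted to hours and compared with the 24-hour target from the test plan. The snippet below is only a convenience sketch; the helper name and dictionary are illustrative, and the values are copied from the result lines.

```python
MS_PER_HOUR = 3_600_000
TARGET_HOURS = 24  # duration used in test plan step 5

# Elapsed times copied from the gtest result lines above.
results_ms = {
    "Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity": 86569893,
    "AgentEnsembleLinkSanityTestDataPlaneFlood.warmbootIsHitLess": 86485485,
    "AgentEnsembleLinkSanityTestDataPlaneFlood.qsfpWarmbootIsHitLess": 86507338,
}


def ms_to_hours(ms: int) -> float:
    """Convert a gtest elapsed time in milliseconds to hours."""
    return ms / MS_PER_HOUR


for name, ms in results_ms.items():
    overhead_min = (ms - TARGET_HOURS * MS_PER_HOUR) / 60_000
    print(f"{name}: {ms_to_hours(ms):.2f} h ({overhead_min:.1f} min over target)")
```

Each run comes out a few minutes over 24 hours, which is consistent with a 24-hour stress duration plus setup and teardown time.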

Full logs in Gdrive

@lihua-cls lihua-cls requested review from a team as code owners April 8, 2026 08:11
@meta-cla meta-cla Bot added the CLA Signed label Apr 8, 2026
@togthoma

Contributor

@lihua-cls could you move the branch to the latest?

@lihua-cls lihua-cls force-pushed the tahansb_link_stress_duration branch from b50b5b2 to a7cb7df Compare April 24, 2026 05:23
@lihua-cls

Contributor Author

@lihua-cls could you move the branch to the latest?

Hi @togthoma
I've updated the branch to the latest.
BTW, there are two failing checks, but neither seems to be related to my code changes.

Thanks

@lihua-cls lihua-cls force-pushed the tahansb_link_stress_duration branch from a7cb7df to 440fea5 Compare April 28, 2026 08:16
@meta-codesync

meta-codesync Bot commented Apr 28, 2026

Contributor

@togthoma has imported this pull request. If you are a Meta employee, you can view this in D102812358.

