Skip to content

[Celestica] Leh800bcls: Support NPU-specific yamlConfig to allow differing polarity settings on multi-NPU platforms#1350

Open
gang-tao wants to merge 1 commit into
facebook:mainfrom
gang-tao:leh800bcls_agent_fix13
Open

[Celestica] Leh800bcls: Support NPU-specific yamlConfig to allow differing polarity settings on multi-NPU platforms#1350
gang-tao wants to merge 1 commit into
facebook:mainfrom
gang-tao:leh800bcls_agent_fix13

Conversation

@gang-tao

Copy link
Copy Markdown
Contributor

Pre-submission checklist

  • [ ✓ ] I've ran the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • [ ✓ ] pre-commit run
    clang-format.............................................................Passed
    shellcheck...........................................(no files to check)Skipped
    shfmt................................................(no files to check)Skipped
    trim trailing whitespace.................................................Passed
    fix end of files.........................................................Passed
    check yaml...........................................(no files to check)Skipped
    check json...........................................(no files to check)Skipped
    check for merge conflicts................................................Passed
    ruff check...........................................(no files to check)Skipped
    ruff format..........................................(no files to check)Skipped
    Prevent sai_impl in fboss manifest.......................................Passed

Summary

Support NPU-specific yamlConfig loading on multi-NPU platforms (like leh800bcls) to allow differing hardware configurations (such as distinct PHY polarities) per NPU.

Background

Due to hardware routing and layout differences on the new dual-NPU leh800bcls platform, NPU 0 and NPU 1 require different polarity settings (RX_POLARITY_FLIP and TX_POLARITY_FLIP in HSDK's yamlConfig).
Previously, SaiBcmPlatform::getHwConfig() always loaded configuration from asicConfig.common()->get_yamlConfig(), forcing both NPUs to apply the exact same hardware YAML settings. Under split agent setups, this makes it impossible to configure distinct polarities for different NPUs.

Changes

Refactored SaiBcmPlatform::getHwConfig() in fboss/agent/platforms/sai/SaiBcmPlatform.cpp:
- Retrieve the current platform's active logic switch index via the first-class HwAsic API getAsic()->getSwitchIndex(). This is guaranteed to be populated on both split agent test runners and unified software agents (e.g. fboss_sw_agent).
- Query the npuEntries map in PlatformConfig using the switch index.
- If an NPU-specific entry is found and of type cfg::AsicConfigEntry::Type::yamlConfig, extract and load its corresponding yamlConfig.
- Fall back to asicConfig.common()->get_yamlConfig() if no NPU-specific configuration is specified.

Test Plan

  1. With two different NPU configurations, all ports are up.
agent config:
  "platform": {
    "chip": {
      "asicConfig": {
        "common": {
          "yamlConfig": ""
        },
        "npuEntries": {
          "0": {
            "yamlConfig": "---\ndevice:\n    0:\n        PC_PM_CORE:\n            ?\n                PC_PM_ID: 1\n                ...
          },
          "1": {
            "yamlConfig": "---\ndevice:\n    0:\n        PC_PM_CORE:\n            ?\n                PC_PM_ID: 1\n                ...
          }
        }
      }
    }
  },

[root@localhost cfg]# grep -B2 yamlConfig agent.conf | cut -c 1-200
      "asicConfig": {
        "common": {
          "yamlConfig": ""
--
        "npuEntries": {
          "0": {
            "yamlConfig": "---\ndevice:\n    0:\n        PC_PM_CORE:\n            ?\n                PC_PM_ID: 1\n                CORE_INDEX: 0\n            :\n                RX_LANE_MAP_AUTO: 0\n
          },
          "1": {
            "yamlConfig": "---\ndevice:\n    0:\n        PC_PM_CORE:\n            ?\n                PC_PM_ID: 1\n                CORE_INDEX: 0\n            :\n                RX_LANE_MAP_AUTO: 0\n

[root@localhost agent_ensemble]# diff /dev/shm/fboss/hw_config_idx0 /dev/shm/fboss/hw_config_idx1
351,352c351,352
<                 RX_POLARITY_FLIP: 0xf2
<                 TX_POLARITY_FLIP: 0x31
---
>                 RX_POLARITY_FLIP: 0x0d
>                 TX_POLARITY_FLIP: 0xce
363,364c363,364
<                 RX_POLARITY_FLIP: 0xf2
<                 TX_POLARITY_FLIP: 0x11
---
>                 RX_POLARITY_FLIP: 0x0d
>                 TX_POLARITY_FLIP: 0xee
387,388c387,388
<                 RX_POLARITY_FLIP: 0xf2
<                 TX_POLARITY_FLIP: 0x11
---
>                 RX_POLARITY_FLIP: 0x0d
>                 TX_POLARITY_FLIP: 0xee
399,400c399,400
<                 RX_POLARITY_FLIP: 0xb
<                 TX_POLARITY_FLIP: 0x77
---
>                 RX_POLARITY_FLIP: 0xf4
>                 TX_POLARITY_FLIP: 0x88
411,412c411,412
<                 RX_POLARITY_FLIP: 0xb
<                 TX_POLARITY_FLIP: 0x77
---
>                 RX_POLARITY_FLIP: 0xf4
>                 TX_POLARITY_FLIP: 0x88
423,424c423,424
<                 RX_POLARITY_FLIP: 0xb
<                 TX_POLARITY_FLIP: 0x77
---
>                 RX_POLARITY_FLIP: 0xf4
>                 TX_POLARITY_FLIP: 0x88
435,436c435,436
<                 RX_POLARITY_FLIP: 0xb
<                 TX_POLARITY_FLIP: 0x37
---
>                 RX_POLARITY_FLIP: 0xf4
>                 TX_POLARITY_FLIP: 0xc8

[root@localhost agent_ensemble]# fb show port | grep Up | wc -l
466

  1. Using the previous/legacy configuration (i.e., placing yamlConfig in common) also works normally.
[root@localhost cfg]# grep -B2 yamlConfig agent.conf | cut -c 1-200
      "asicConfig": {
        "common": {
          "yamlConfig": "---\ndevice:\n    0:\n        PC_PM_CORE:\n            ?\n                PC_PM_ID: 1\n                CORE_INDEX: 0\n            :\n                RX_LANE_MAP_AUTO: 0\n
[root@localhost log]# fb show port | grep Up | wc -l
466
[root@localhost log]#

@gang-tao gang-tao requested a review from a team as a code owner June 30, 2026 13:58
@meta-cla meta-cla Bot added the CLA Signed label Jun 30, 2026
}
}
if (yamlConfig.empty()) {
yamlConfig = asicConfig.common()->get_yamlConfig();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if common yamlConfig is set to "". Then there will be no throw and code will not fallback to bcm yaml config

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point!
Added a check if (yamlConfig.empty()) { throw FbossError(...); } right after the fallback.
If the common entry is explicitly set to "", it successfully deserializes without throwing, but yamlConfig remains empty. The subsequent yamlConfig.empty() check will evaluate to true and explicitly throw FbossError, safely triggering the outer catch block to fall back to bcm.yamlConfig().
will update soon.

cfg::AsicConfigEntry::Type::yamlConfig) {
XLOG(INFO) << "Loading NPU-specific yamlConfig for switchIndex="
<< switchIndex;
yamlConfig = npuEntry->second.get_yamlConfig();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if this code block throws? you have a single catch try for multiple errors. if per npu entry is not present, you should use the common entry. if common entry is also not present, you should fallback to the bcm yaml

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great catch!
To address your concerns, I have wrapped the npuEntries loading in a nested try-catch block. If NPU entries throw or are missing, the code now safely logs a warning and still proceeds to attempt the common configuration rather than skipping directly to bcm.yamlConfig.
will update soon.

yamlConfig = npuEntry->second.get_yamlConfig();
} else {
XLOG(INFO) << "No NPU-specific yamlConfig found for switchIndex="
<< switchIndex << ", falling back to common config";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there is no common config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point!
I've added a defensive union type check (getType() == Type::yamlConfig) before accessing common to avoid potential bad_field_access crashes. If both NPU-specific and common configs are missing or invalid, we now explicitly throw FbossError to safely trigger the outer fallback to bcm.yamlConfig.
code will update soon.

…ering polarity settings on multi-NPU platforms
@gang-tao gang-tao force-pushed the leh800bcls_agent_fix13 branch from e643d0b to 45a9fa8 Compare July 1, 2026 01:18
@gang-tao

gang-tao commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@msomasundaran Thank you very much for your review. I have made the modifications based on your comments. Currently, the code has been tested separately, and it can be fully ported up using both the distinct YAML configurations for the two NPUs and the shared common YAML configuration.
Please help to take another look. If there are any further issues, please let me know, and I will continue to make revisions.

[root@localhost log]# fb show port | grep Up | wc -l
466

@gang-tao gang-tao requested a review from msomasundaran July 1, 2026 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants