Skip to content

[PlatformManager] allow pciDeviceConfigs diff across versioned PmUnits#1137

Open
aalamsi22 wants to merge 4 commits into
facebook:mainfrom
aalamsi22:versionedPciDeviceConfigsConfigs
Open

[PlatformManager] allow pciDeviceConfigs diff across versioned PmUnits#1137
aalamsi22 wants to merge 4 commits into
facebook:mainfrom
aalamsi22:versionedPciDeviceConfigsConfigs

Conversation

@aalamsi22

Copy link
Copy Markdown
Contributor

Summary

Allow a pmunit's pci Device content to differ across default and versioned PmUnits.

Depends on the changes here #1065 which add pmUnitVersion to versionedPmUnitConfigs

Test Plan

Testing on Typical SCM Configs

# weutil --eeprom scm
...
Product Production State: 1
Product Version: 2
Product Sub-Version: 3
...

Testing versioned pm unit (SCM) with modified configs

  "versionedPmUnitConfigs": {
    "SCM": [
      {
        "pmUnitConfig":         {
          "pluggedInSlotType": "SCM_SLOT",
          ...
          "pciDeviceConfigs": [
            {
              ...
              "i2cAdapterBlockConfigs": [
                {
                  "pmUnitScopedNamePrefix": "SCM_I2C_MASTER",
                  "deviceName": "i2c_master",
                  "csrOffsetCalc": "0x8000 + ({adapterIndex} - {startAdapterIndex})*0x80",
                  "startAdapterIndex": 0,
 ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶n̶u̶m̶A̶d̶a̶p̶t̶e̶r̶s̶"̶:̶ ̶2̶,̶
                  "numAdapters": 4,
                  "numBusesPerAdapter": 8
                }
              ],
            }
          ],
          "spiMasterConfigs": [],
          "infoRomConfigs": [
 ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶{̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶p̶m̶U̶n̶i̶t̶S̶c̶o̶p̶e̶d̶N̶a̶m̶e̶"̶:̶ ̶"̶S̶C̶M̶_̶F̶P̶G̶A̶_̶I̶N̶F̶O̶_̶R̶O̶M̶"̶,̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶d̶e̶v̶i̶c̶e̶N̶a̶m̶e̶"̶:̶ ̶"̶f̶p̶g̶a̶_̶i̶n̶f̶o̶_̶i̶o̶b̶"̶,̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶c̶s̶r̶O̶f̶f̶s̶e̶t̶"̶:̶ ̶"̶0̶x̶1̶0̶0̶"̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶}̶
          ],
        ...
        },
        "pmUnitVersion": {
          "productProductionState": 1,
          "productVersion": 2,
          "productSubVersion": 3
        }
      }
    ]
  },
    ...

Observe SCM_FPGA_INFO_ROM is not created anymore and 4 i2c adapters are created.

...
DataStore.cpp:171] Resolved / to versioned PmUnitConfig of SCM with version 1.2.3
PlatformExplorer.cpp:188] Exploring PmUnit SCM at /
PlatformExplorer.cpp:191] Exploring PCI Devices for PmUnit SCM at SlotPath /. Count 1
...
PciExplorer.cpp:444] Successfully created device SCM_I2C_MASTER_0 ...
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0@0 in SlotPath / to bus number 2 (i2c-2)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@1](http://cl/1) in SlotPath / to bus number 3 (i2c-3)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@2](http://cl/2) in SlotPath / to bus number 4 (i2c-4)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@3](http://cl/3) in SlotPath / to bus number 5 (i2c-5)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@4](http://cl/4) in SlotPath / to bus number 6 (i2c-6)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@5](http://cl/5) in SlotPath / to bus number 7 (i2c-7)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@6](http://cl/6) in SlotPath / to bus number 8 (i2c-8)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_0[@7](http://cl/7) in SlotPath / to bus number 9 (i2c-9)
PciExplorer.cpp:358] Creating device SCM_I2C_MASTER_1 ...
PciExplorer.cpp:444] Successfully created device SCM_I2C_MASTER_1 ...
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_1@0 in SlotPath / to bus number 10 (i2c-10)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_1[@1](http://cl/1) in SlotPath / to bus number 11 (i2c-11)
DataStore.cpp:34] Updating bus SCM_I2C_MASTER_1[@2](http://cl/2) in SlotPath / to bus number 12 (i2c-12)
...
PciExplorer.cpp:358] Creating device SCM_I2C_MASTER_2...
PciExplorer.cpp:444] Successfully created device SCM_I2C_MASTER_2...
...
PciExplorer.cpp:358] Creating device SCM_I2C_MASTER_3...
PciExplorer.cpp:444] Successfully created device SCM_I2C_MASTER_3...
...

ExplorationSummary.cpp:47] Explored MERU800BIA with 2 unexpected errors and 0 expected errors...
ExplorationSummary.cpp:54] =========== UNEXPECTED ERRORS ===========
ExplorationSummary.cpp:56] /[SCM_FPGA_INFO_ROM]: Failed to create symlink /run/devmap/fpgas/MERU_SCM_CPLD_INFO_ROM for DevicePath /[SCM_FPGA_INFO_ROM]. Reason: Couldn't find PciDeviceConfig for SCM_FPGA_INFO_ROM at /
ExplorationSummary.cpp:56] /[SCM_FPGA_INFO_ROM]: Failed to create symlink /run/devmap/inforoms/MERU_SCM_CPLD_INFO_ROM for DevicePath /[SCM_FPGA_INFO_ROM]. Reason: Couldn't find PciDeviceConfig for SCM_FPGA_INFO_ROM at /

Changing versioned SCM to a non-matching version

        "pmUnitVersion": {
 ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶p̶r̶o̶d̶u̶c̶t̶P̶r̶o̶d̶u̶c̶t̶i̶o̶n̶S̶t̶a̶t̶e̶"̶:̶ ̶1̶,̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶p̶r̶o̶d̶u̶c̶t̶S̶u̶b̶V̶e̶r̶s̶i̶o̶n̶"̶:̶ ̶3̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶p̶r̶o̶d̶u̶c̶t̶V̶e̶r̶s̶i̶o̶n̶"̶:̶ ̶2̶,̶
          "productProductionState": 4,
          "productVersion": 5,
          "productSubVersion": 6
        }

As expected, platform manager loads the default PmUnits

DataStore.cpp:182] Resolved / to default PmUnitConfig of SCM. No versioned config matches version 1.2.3

Testing on Typical SMB Configs

# weutil --eeprom smb
...
Product Production State: 1
Product Version: 2
Product Sub-Version: 3
...

Testing versioned pm unit (SMB) with modified configs. Add "TEST" suffix to pmUnitScopedNames

      "pciDeviceConfigs": [
        {
          ...
          "spiMasterConfigs": [
            {
              "fpgaIpBlockConfig": {
                "pmUnitScopedName": "SMB_SPI_MASTER0_TEST",
                ...
              },
              "spiDeviceConfigs": [
                {
                  "pmUnitScopedName": "SMB_SPI_MASTER0_DEVICE1_TEST",
                  ...
                }
              ]
            }
          ],
          "ledCtrlBlockConfigs": [
            {
              "pmUnitScopedNamePrefix": "OSFP_TEST",
              ...
            },
            {
              "pmUnitScopedNamePrefix": "QSFP_TEST",
              ...
            }
          ],
          "sysLedCtrlConfigs": [
            {
              "pmUnitScopedName": "SYSTEM_STATUS_LED_TEST",
              ...
            },
            ...
          ],
          "xcvrCtrlBlockConfigs": [
            {
              "pmUnitScopedNamePrefix": "OSFP_TEST",
              ...
            },
            {
              "pmUnitScopedNamePrefix": "QSFP_TEST",
              ...
            }
          ],
          ...
          "miscCtrlConfigs": [
 ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶{̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶"̶p̶m̶U̶n̶i̶t̶S̶c̶o̶p̶e̶d̶N̶a̶m̶e̶"̶:̶ ̶"̶S̶M̶B̶_̶A̶D̶C̶"̶,̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶.̶.̶.̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶}̶
          ]
        }
      ]

SMB_ADC doesn't get created as expected when the versioned pmunit is used.

...
PciExplorer.cpp:444] Successfully created device SYSTEM_STATUS_LED_TEST at ...
...
PciExplorer.cpp:444] Successfully created device OSFP_TEST_XCVR_CTRL_PORT_38 at
...
PciExplorer.cpp:444] Successfully created device SMB_SPI_MASTER0_TEST ...
PciExplorer.cpp:608] Completed initializing SpiDevice spi2006.0 as /dev/spidev2006.0 for SpiDevice SMB_SPI_MASTER0_DEVICE1_TEST
DataStore.cpp:100] Updating CharDevPath for /SMB_SLOT@0/[SMB_SPI_MASTER0_DEVICE1_TEST] to /dev/spidev2006.0
...
DataStore.cpp:171] Resolved /SMB_SLOT@0 to versioned PmUnitConfig of SMB with version 1.2.3
mmary.cpp:47] Explored MERU800BIA with 40 unexpected errors and 0 expected errors...
ExplorationSummary.cpp:54] =========== UNEXPECTED ERRORS ===========
ExplorationSummary.cpp:56] /SMB_SLOT@0/[OSFP_XCVR_CTRL_PORT_10]: Failed to create symlink /run/devmap/xcvrs/xcvr_ctrl_
ExplorationSummary.cpp:56] /SMB_SLOT@0/[OSFP_XCVR_CTRL_PORT_11]: Failed to create symlink /run/devmap/xcvrs/xcvr_ctrl_
...
ExplorationSummary.cpp:56] /SMB_SLOT@0/[SMB_SPI_MASTER0_DEVICE1]: Failed to create symlink /run/devmap/flashes/SMB_SPI_MASTER0_DEVICE1 for DevicePath /SMB_SLOT@0/[SMB_SPI_MASTER0_DEVICE1]
...

Hw & Sw Tests

Passed:
xgs_psamp_mod_test weutil_crc16_ccitt_test platform_helpers_platform_fs_utils_test platform_helpers_platform_utils_test platform_helpers_platform_name_lib_test async_logger_test transceiver_properties_manager_test rackmon_test weutil_fboss_eeprom_interface_test weutil_parser_utils_test platform_manager_data_store_test platform_manager_utils_test platform_manager_i2c_explorer_test platform_manager_cpld_manager_test platform_manager_pci_explorer_test platform_manager_device_path_resolver_test platform_manager_presence_checker_test thrift_node_tests thrift_cow_visitor_tests fboss2_framework_test fboss2_cmd_config_test fboss2_cmd_test fsdb_cgo_wrapper_test platform_manager_config_validator_test cross_config_validator_test platform_config_lib_config_lib_test runtime_config_builder_test platform_data_corral_sw_test fan_service_sw_test pci_device_check_test mac_address_check_test sensor_service_utils_test xcvr_lib_test build_from_xcvr_lib_test sensor_service_sw_test platform_manager_platform_explorer_test platform_manager_handler_test

platform_manager_hw_test ran against default & versioned PmUnit

platform_manager_hw_test
[       OK ] PlatformManagerHwTest.XcvrLedFiles (5306 ms)
[----------] 8 tests from PlatformManagerHwTest (39049 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (39049 ms total)
[  PASSED  ] 8 tests.
platform_hw_test
[       OK ] PlatformHwTest.PCIDevicesPresent (34 ms)
[----------] 2 tests from PlatformHwTest (512 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (512 ms total)
[  PASSED  ] 2 tests.
sensor_service_hw_test
[       OK ] SensorServiceHwTest.CheckAllSensors (93 ms)
[----------] 6 tests from SensorServiceHwTest (1728 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test suite ran. (1728 ms total)
[  PASSED  ] 6 tests.
data_corral_service_hw_test
[       OK ] DataCorralServiceHwTest.getUncachedFruid (450 ms)
[----------] 4 tests from DataCorralServiceHwTest (910 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (910 ms total)
[  PASSED  ] 4 tests.
weutil_hw_test
[       OK ] WeutilTest.getInfoJson (893 ms)
[----------] 4 tests from WeutilTest (1788 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (1788 ms total)
[  PASSED  ] 4 tests.
@somasun

somasun commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Same testing comment as #1136. I am also waiting for your response on #1138 before taking this PR in.

@aalamsi22 aalamsi22 force-pushed the versionedPciDeviceConfigsConfigs branch from e6910da to 45591a4 Compare June 8, 2026 20:25
@somasun

somasun commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

@lohithchittineni - can you please verify how this works with all the validation for the BlockConfigs you added.

@meta-codesync

meta-codesync Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@lohithchittineni has imported this pull request. If you are a Meta employee, you can view this in D109851161.

@lohithchittineni lohithchittineni left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds pciDeviceConfigs to the fields allowed to differ across versioned PmUnits. Since ledCtrlBlockConfigs/xcvrCtrlBlockConfigs are nested inside pciDeviceConfigs, a version can now ship an under-covering set of LED/xcvr ctrl blocks.

That breaks the coverage validators: isValidLedCtrlBlockXcvrCoverage and isValidXcvrCtrlBlockXcvrCoverage only iterate *config.pmUnitConfigs() and never look at versionedPmUnitConfigs(). Previously safe (versioned PCI had to match default), but now a version that drops an LED/xcvr block leaves xcvrs uncovered and still passes validation, surfacing only at runtime.

numXcvrs is a single version-invariant PlatformConfig field, so every version's coverage target is the same *config.numXcvrs(); only the resolved block set varies.

Suggested fix: validate each constructed config independently, not a merged set. The existing default-only coverage check stays. In addition, for each versionedPmUnitConfig, build the config the runtime would actually use (the default pmUnitConfigs map, but with the overridden PmUnit's pciDeviceConfigs swapped in for its default), and re-run the [1, numXcvrs] coverage check against that. Each version must independently cover all xcvrs; don't union default and versioned blocks together, since that could hide a version that drops coverage. Please also add ConfigValidator tests that exercise these versioned configs, including a case where a version under-covers xcvrs and is expected to fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

3 participants