[PlatformManager] Symlink creation to adapt to versioned devices #1138

Open
aalamsi22 wants to merge 1 commit into facebook:main from aalamsi22:versionedSymlinkCreation

@aalamsi22

Contributor

Summary

  • Add the ability to create symlinks to devices that only exist in versioned pmunits.
  • Avoid reporting "unexpected errors" on symlinks created by default pmunit (or other versioned pmUnits) when matching to a versioned PmUnit.
  • Testing (SW & HW) to account for the above two changes.
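
A minimal sketch of the first two bullets, assuming the validator only needs to know which device names each PmUnit config defines. `PmUnitConfig` and `device_known_anywhere` are hypothetical names for illustration, not the PlatformManager C++ API: a symlink target is accepted if the default config or any versioned config defines the device, and rejected only when no config does.

```python
from dataclasses import dataclass, field


@dataclass
class PmUnitConfig:
    """Hypothetical stand-in for a PmUnit config: only its device names."""
    device_names: set[str] = field(default_factory=set)


def device_known_anywhere(
    device_name: str,
    default_config: PmUnitConfig,
    versioned_configs: list[PmUnitConfig],
) -> bool:
    """Return True if the default or any versioned config defines the device."""
    if device_name in default_config.device_names:
        return True
    return any(device_name in cfg.device_names for cfg in versioned_configs)


# Device names taken from the test scenarios in this PR description.
default_scm = PmUnitConfig({"ONLY_EXISTS_IN_DEFAULT"})
versioned_scm = [PmUnitConfig({"DOES_NOT_EXIST_IN_DEFAULT"})]
```

Under this sketch, `DOES_NOT_EXIST_IN_DEFAULT` passes validation because a versioned config defines it, while `DOES_NOT_EXIST_ANYWHERE` is still rejected.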

Depends on the following in order:
[PlatformManager] allow pciDeviceConfigs diff across versioned PmUnits #1137
[PlatformManager] allow embeddedSensorConfigs diff across versioned PmUnits #1136
[PlatformManager] pmUnitVersion support for versionedPmUnitConfigs #1065

Testing

Testing on Versioned PmUnit

When a symlink points to a device that doesn't exist at all

FAILS BUILD
ConfigValidator.cpp:1043] Validating Symbolic links...
ConfigValidator.cpp:857] Invalid DeviceName DOES_NOT_EXIST_ANYWHERE at SlotPath /
FAILS RUNNING PLATFORM_MANAGER
ConfigValidator.cpp:1043] Validating Symbolic links...
ConfigValidator.cpp:857] Invalid DeviceName DOES_NOT_EXIST_ANYWHERE at SlotPath /

Matching to a versioned PmUnit

# weutil scm
...
Product Production State: 1
Product Version: 2
Product Sub-Version: 3
...

DataStore.cpp:171] Resolved / to versioned PmUnitConfig of SCM with version 1.2.3
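
The resolution step logged above can be sketched as a lookup keyed on the three EEPROM fields, falling back to the default config when nothing matches. This is an illustration only: `Version`, `resolve_pm_unit`, and the exact-match rule are assumptions, not the actual `DataStore` logic.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """EEPROM fields as reported by weutil (hypothetical container type)."""
    production_state: int
    version: int
    sub_version: int


def resolve_pm_unit(
    eeprom_version: Version,
    default_config: str,
    versioned_configs: dict[Version, str],
) -> tuple[str, bool]:
    """Return (config, matched_versioned).

    Assumes an exact match on all three fields; the real matching rule
    may differ.
    """
    match = versioned_configs.get(eeprom_version)
    if match is not None:
        return match, True
    return default_config, False


# Config values are placeholder labels, not real config contents.
configs = {Version(1, 2, 3): "SCM v1.2.3"}
matched = resolve_pm_unit(Version(1, 2, 3), "SCM default", configs)
fallback = resolve_pm_unit(Version(4, 5, 6), "SCM default", configs)
```

Here `matched` selects the versioned config for 1.2.3, and `fallback` falls through to the default for a non-matching version such as 4.5.6.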

Where devices

  • /[DOES_NOT_EXIST_IN_DEFAULT] only exists in the versioned SCM v1.2.3
  • /[ONLY_EXISTS_IN_DEFAULT] only exists in default SCM pmUnit

As expected, the ONLY_EXISTS_IN_DEFAULT device does not exist, and its symlink is not created:

# ls /run/devmap/sensors/ONLY_EXISTS_IN_DEFAULT
ls: cannot access '/run/devmap/sensors/ONLY_EXISTS_IN_DEFAULT': No such file or directory

The DOES_NOT_EXIST_IN_DEFAULT device and its symlink are both created:

# ls /run/devmap/sensors/DOES_NOT_EXIST_IN_DEFAULT
curr1_input  curr1_max        device  ...

Without the changes in this PR, both the build and platform_manager fail on /[DOES_NOT_EXIST_IN_DEFAULT]:

FAILS BUILD
ConfigValidator.cpp:857] Invalid DeviceName DOES_NOT_EXIST_IN_DEFAULT at SlotPath /

FAILS PLATFORM MANAGER
ConfigValidator.cpp:857] Invalid DeviceName DOES_NOT_EXIST_IN_DEFAULT at SlotPath /

And if /[DOES_NOT_EXIST_IN_DEFAULT] is removed, we still get "unexpected errors" on /[ONLY_EXISTS_IN_DEFAULT] when loading default pmunit.

ExplorationSummary.cpp:47] Explored ... with 1 unexpected errors and 0 expected errors...
ExplorationSummary.cpp:54] =========== UNEXPECTED ERRORS ===========
ExplorationSummary.cpp:56] /[ONLY_EXISTS_IN_DEFAULT]: Failed to create symlink /run/devmap/sensors/ONLY_EXISTS_IN_DEFAULT for DevicePath /[ONLY_EXISTS_IN_DEFAULT]. Reason: Could not find SysfsPath for /[ONLY_EXISTS_IN_DEFAULT]

How about loading default PmUnit? (Non-matching versioned PmUnit)

Resolved / to default PmUnitConfig of SCM. No versioned config matches version 4.5.6
...

# ls /run/devmap/sensors/ONLY_EXISTS_IN_DEFAULT
curr1_input  curr1_max        device ...

# ls /run/devmap/sensors/DOES_NOT_EXIST_IN_DEFAULT
ls: cannot access '/run/devmap/sensors/DOES_NOT_EXIST_IN_DEFAULT': No such file or directory

Hw & Sw Tests

Passed:
xgs_psamp_mod_test weutil_crc16_ccitt_test platform_helpers_platform_fs_utils_test platform_helpers_platform_utils_test platform_helpers_platform_name_lib_test async_logger_test transceiver_properties_manager_test rackmon_test weutil_fboss_eeprom_interface_test weutil_parser_utils_test platform_manager_data_store_test platform_manager_utils_test platform_manager_i2c_explorer_test platform_manager_cpld_manager_test platform_manager_pci_explorer_test platform_manager_device_path_resolver_test platform_manager_presence_checker_test thrift_node_tests thrift_cow_visitor_tests fboss2_framework_test fboss2_cmd_config_test fboss2_cmd_test fsdb_cgo_wrapper_test platform_manager_config_validator_test cross_config_validator_test platform_config_lib_config_lib_test runtime_config_builder_test platform_data_corral_sw_test fan_service_sw_test pci_device_check_test mac_address_check_test sensor_service_utils_test xcvr_lib_test build_from_xcvr_lib_test sensor_service_sw_test platform_manager_platform_explorer_test platform_manager_handler_test

platform_manager_hw_test ran against default & versioned PmUnit

platform_manager_hw_test
[       OK ] PlatformManagerHwTest.XcvrLedFiles (5306 ms)
[----------] 8 tests from PlatformManagerHwTest (39049 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (39049 ms total)
[  PASSED  ] 8 tests.
platform_hw_test
[       OK ] PlatformHwTest.PCIDevicesPresent (34 ms)
[----------] 2 tests from PlatformHwTest (512 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (512 ms total)
[  PASSED  ] 2 tests.
sensor_service_hw_test
[       OK ] SensorServiceHwTest.CheckAllSensors (93 ms)
[----------] 6 tests from SensorServiceHwTest (1728 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test suite ran. (1728 ms total)
[  PASSED  ] 6 tests.
data_corral_service_hw_test
[       OK ] DataCorralServiceHwTest.getUncachedFruid (450 ms)
[----------] 4 tests from DataCorralServiceHwTest (910 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (910 ms total)
[  PASSED  ] 4 tests.
weutil_hw_test
[       OK ] WeutilTest.getInfoJson (893 ms)
[----------] 4 tests from WeutilTest (1788 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (1788 ms total)
[  PASSED  ] 4 tests.
@aalamsi22 aalamsi22 requested a review from a team as a code owner April 29, 2026 01:50
@meta-cla meta-cla Bot added the CLA Signed label Apr 29, 2026
@aalamsi22 aalamsi22 changed the title [WIP][PlatformManager] Symlink creation to adapt to versioned devices Jun 3, 2026

@somasun somasun left a comment

Contributor

So far, our versioning has been such that all symlinks work on all versions of hardware. The device a symlink points to can differ by version (e.g. /run/devmap/sensors/A_SENSOR/temp_input could be a different I2C device on platform variant x than on variant y), but the symlinks exist on all versions of hardware.

With this PR, we could end up with a different set of symlinks depending on the hardware version. For example, /run/devmap/sensors/A_SENSOR/temp_input could exist only on variant x and not on variant y. This breaks some assumptions. For example, fan_service or fw_util does not check the hardware version before trying to resolve a symlink. Even if the device symlink you plan to support with this PR is never referenced by fan/sensor/fw_util or any other downstream consumer, nothing guarantees that this feature will not be used to create such device symlinks in the future. If our canary/release testing has only variant x hardware (and not variant y), we will miss such issues until they land in production.

I suggest retaining the exact same symlinks between all variants of any particular hardware.
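
The invariant suggested here (every variant exposes the same symlinks) could be checked mechanically. The following is a rough sketch, assuming the symlink paths for each variant are already available as sets; `symlink_set_mismatches` and the variant labels are hypothetical and not part of any existing validator.

```python
def symlink_set_mismatches(
    symlinks_by_variant: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each variant to the symlinks it lacks but another variant has.

    An empty result means all variants expose the same symlink set.
    """
    all_links: set[str] = set().union(*symlinks_by_variant.values())
    mismatches: dict[str, set[str]] = {}
    for variant, links in symlinks_by_variant.items():
        missing = all_links - links
        if missing:
            mismatches[variant] = missing
    return mismatches


# Variant names "x" and "y" follow the example in the comment above.
report = symlink_set_mismatches(
    {
        "x": {"/run/devmap/sensors/A_SENSOR"},
        "y": set(),
    }
)
```

A check of this kind would flag variant y at config-validation time, without needing variant y hardware in canary or release testing.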

@aalamsi22

Contributor Author

Fair enough, I'm not opposed to maintaining the same symlinks. However, a few challenges remain with that approach and need to be addressed some other way:

  • Any device that a versioned pmUnit creates but the default pmUnit doesn't can NOT have a symlink.
  • Any device that a default pmUnit creates but a versioned pmUnit doesn't will result in a symlink creation error and platform_manager "UNEXPECTED ERRORS" warning.

The second point can be addressed by acknowledging the errors as “Expected”. The first point is harder because it makes hwmon devices difficult for other services to reference, e.g. a versioned sensor config accessing devices created by a versioned platform_manager config. We can work around this by always making the default PmUnit the latest revision and accepting that any devices removed in later revisions won't have symlinks created on older revisions.
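
One possible reading of "acknowledging the errors as Expected", sketched under the assumption that the explorer knows both the devices in the matched config and the devices defined in the other configs for the same PmUnit. `classify_symlink_failure` is a hypothetical helper, not an existing platform_manager function.

```python
def classify_symlink_failure(
    device_name: str,
    matched_config_devices: set[str],
    other_config_devices: set[str],
) -> str:
    """Classify a failed symlink creation as "expected" or "unexpected".

    A failure is treated as expected only when the device is absent from
    the matched config but defined by some other config of the same PmUnit.
    """
    if device_name in matched_config_devices:
        # The device should have been created, so the failure is a real problem.
        return "unexpected"
    if device_name in other_config_devices:
        # The device belongs to a config that was not selected on this hardware.
        return "expected"
    # No config defines the device; validation should have caught this earlier.
    return "unexpected"


# Scenario from the PR description: versioned SCM matched, default not used.
result = classify_symlink_failure(
    "ONLY_EXISTS_IN_DEFAULT",
    matched_config_devices={"DOES_NOT_EXIST_IN_DEFAULT"},
    other_config_devices={"ONLY_EXISTS_IN_DEFAULT"},
)
```

With this policy, the `ONLY_EXISTS_IN_DEFAULT` failure from the exploration summary would be counted as expected rather than unexpected.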
