
Support multi-EFA instances with public IPs #3865

Merged
r4victor merged 4 commits into master from pr_efa_public_ips on May 8, 2026



r4victor commented May 8, 2026


Support launching AWS instances that have multiple EFA interfaces with public IPs. Previously, multi-EFA instances required public_ips: False because AWS cannot auto-assign a public IP when an instance is launched with multiple network interfaces. This PR removes that limitation: instead of relying on auto-assignment, public IPs are now explicitly allocated, assigned to the instance, and released.
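A minimal sketch of the explicit allocate/assign/release flow, assuming boto3's EC2 client methods (`allocate_address`, `associate_address`, `disassociate_address`, `release_address`) and assuming the public IP goes on the primary interface (network card 0, device index 0). The helper names `primary_interface_id`, `assign_public_ip`, and `release_public_ip` are hypothetical and not taken from this PR's code; the client is duck-typed so the logic can be exercised without AWS.

```python
from typing import Any, Protocol


class Ec2AddressApi(Protocol):
    # Subset of boto3 EC2 client methods used below (assumed client shape).
    def allocate_address(self, **kwargs: Any) -> dict: ...
    def associate_address(self, **kwargs: Any) -> dict: ...
    def disassociate_address(self, **kwargs: Any) -> dict: ...
    def release_address(self, **kwargs: Any) -> dict: ...


def primary_interface_id(instance: dict) -> str:
    """Return the ENI on network card 0 with device index 0 (hypothetical helper)."""
    for eni in instance["NetworkInterfaces"]:
        att = eni["Attachment"]
        if att["DeviceIndex"] == 0 and att.get("NetworkCardIndex", 0) == 0:
            return eni["NetworkInterfaceId"]
    raise ValueError("instance has no primary network interface")


def assign_public_ip(ec2: Ec2AddressApi, instance: dict) -> dict:
    """Allocate an Elastic IP and associate it with the primary ENI (hypothetical helper)."""
    alloc = ec2.allocate_address(Domain="vpc")
    try:
        assoc = ec2.associate_address(
            AllocationId=alloc["AllocationId"],
            NetworkInterfaceId=primary_interface_id(instance),
        )
    except Exception:
        # Release the address so a failed association does not leak it.
        ec2.release_address(AllocationId=alloc["AllocationId"])
        raise
    return {
        "PublicIp": alloc["PublicIp"],
        "AllocationId": alloc["AllocationId"],
        "AssociationId": assoc["AssociationId"],
    }


def release_public_ip(ec2: Ec2AddressApi, ip: dict) -> None:
    """Disassociate and release an address from assign_public_ip (hypothetical helper).

    If the instance is already terminated, AWS has already disassociated the
    address; a real implementation would need to tolerate that error.
    """
    ec2.disassociate_address(AssociationId=ip["AssociationId"])
    ec2.release_address(AllocationId=ip["AllocationId"])
```

Against real AWS, `ec2` would be `boto3.client("ec2", region_name="eu-north-1")` and `instance` one entry of `describe_instances()["Reservations"][0]["Instances"]`. An Elastic IP stays allocated after its instance terminates, which is why the sketch keeps an explicit release step.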

Tested by launching a p4d.24xlarge in eu-north-1: all EFA interfaces were configured and a public IP was assigned. Also tested the same setup with public_ips: false to check for regressions.
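One way to reproduce this kind of check is to inspect the `describe_instances` output. The sketch below is hypothetical (not part of the PR): it assumes the standard boto3 response shape, counts interfaces whose `InterfaceType` is `efa` or `efa-only`, and reads the public IP from the primary interface's `Association` entry.

```python
from typing import Optional, Tuple

# EC2 interface types that carry EFA traffic (assumed set for this check).
EFA_TYPES = {"efa", "efa-only"}


def summarize_interfaces(instance: dict) -> Tuple[int, Optional[str]]:
    """Return (number of EFA interfaces, public IP of the primary ENI or None).

    `instance` is one entry of describe_instances()["Reservations"][i]["Instances"].
    """
    efa_count = 0
    public_ip = None
    for eni in instance.get("NetworkInterfaces", []):
        if eni.get("InterfaceType") in EFA_TYPES:
            efa_count += 1
        att = eni.get("Attachment", {})
        # Treat network card 0 / device index 0 as the primary interface.
        if att.get("DeviceIndex") == 0 and att.get("NetworkCardIndex", 0) == 0:
            # An associated public or Elastic IP appears under "Association".
            public_ip = eni.get("Association", {}).get("PublicIp")
    return efa_count, public_ip
```

A passing check for the public-IP case sees the expected EFA count for the instance type plus a non-None public IP; with public_ips: false, the same EFA count with no public IP.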


r4victor commented May 8, 2026


Ran NCCL tests on 2x p4d.24xlarge with public IPs. The results are the same as with public_ips: False.

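#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1   182.04    0.00    0.00       0   181.35    0.00    0.00       0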
          16             4     float     sum      -1   179.80    0.00    0.00       0   176.53    0.00    0.00       0
          32             8     float     sum      -1   176.78    0.00    0.00       0   176.18    0.00    0.00       0
          64            16     float     sum      -1   176.98    0.00    0.00       0   175.23    0.00    0.00       0
         128            32     float     sum      -1   176.00    0.00    0.00       0   180.10    0.00    0.00       0
         256            64     float     sum      -1   176.22    0.00    0.00       0   178.06    0.00    0.00       0
         512           128     float     sum      -1   180.12    0.00    0.01       0   179.21    0.00    0.01       0
        1024           256     float     sum      -1   177.83    0.01    0.01       0   178.19    0.01    0.01       0
        2048           512     float     sum      -1   183.32    0.01    0.02       0   183.40    0.01    0.02       0
        4096          1024     float     sum      -1   187.05    0.02    0.04       0   182.93    0.02    0.04       0
        8192          2048     float     sum      -1   188.79    0.04    0.08       0   189.22    0.04    0.08       0
       16384          4096     float     sum      -1   202.46    0.08    0.15       0   200.69    0.08    0.15       0
       32768          8192     float     sum      -1   231.63    0.14    0.27       0   230.66    0.14    0.27       0
       65536         16384     float     sum      -1   239.24    0.27    0.51       0   234.09    0.28    0.52       0
      131072         32768     float     sum      -1   237.73    0.55    1.03       0   238.19    0.55    1.03       0
      262144         65536     float     sum      -1   253.59    1.03    1.94       0   254.91    1.03    1.93       0
      524288        131072     float     sum      -1   308.22    1.70    3.19       0   314.00    1.67    3.13       0
     1048576        262144     float     sum      -1   402.68    2.60    4.88       0   404.50    2.59    4.86       0
     2097152        524288     float     sum      -1   583.80    3.59    6.74       0   583.21    3.60    6.74       0
     4194304       1048576     float     sum      -1   859.03    4.88    9.15       0   863.63    4.86    9.11       0
     8388608       2097152     float     sum      -1   987.56    8.49   15.93       0   982.72    8.54   16.01       0
    16777216       4194304     float     sum      -1  1180.87   14.21   26.64       0  1181.65   14.20   26.62       0
    33554432       8388608     float     sum      -1  1514.40   22.16   41.54       0  1523.52   22.02   41.30       0
    67108864      16777216     float     sum      -1  2362.20   28.41   53.27       0  2347.93   28.58   53.59       0
   134217728      33554432     float     sum      -1  3995.74   33.59   62.98       0  4014.00   33.44   62.70       0
   268435456      67108864     float     sum      -1  7172.96   37.42   70.17       0  7125.56   37.67   70.64       0
   536870912     134217728     float     sum      -1  13368.9   40.16   75.30       0  13333.1   40.27   75.50       0
  1073741824     268435456     float     sum      -1  25979.1   41.33   77.50       0  25928.6   41.41   77.65       0
  2147483648     536870912     float     sum      -1  50919.0   42.17   79.08       0  50898.8   42.19   79.11       0
  4294967296    1073741824     float     sum      -1   101120   42.47   79.64       0   101064   42.50   79.68       0
  8589934592    2147483648     float     sum      -1   201525   42.62   79.92       0   201274   42.68   80.02       0
ip-172-31-5-190:163:274 [0] NCCL INFO comm 0x5980b4c3e8b0 rank 0 nranks 16 cudaDev 0 busId 101c0 - Destroy COMPLETE
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 22.2693
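The algbw and busbw columns follow the formulas documented for nccl-tests: algbw is the buffer size divided by the measured time, and for all-reduce busbw = algbw × 2(n−1)/n. With n = 16 ranks (nranks 16 in the log, i.e. 2 nodes × 8 GPUs), the factor is 1.875, which matches the ratio between the two columns in the table. A small sketch to recompute a row (the function name is illustrative):

```python
from typing import Tuple


def allreduce_bandwidths(size_bytes: int, time_us: float, nranks: int) -> Tuple[float, float]:
    """Recompute nccl-tests algbw and busbw for all-reduce, in GB/s (1 GB = 1e9 bytes)."""
    algbw = size_bytes / (time_us * 1e-6) / 1e9
    # nccl-tests all-reduce correction factor 2(n-1)/n, so busbw reflects per-link bandwidth.
    busbw = algbw * 2 * (nranks - 1) / nranks
    return algbw, busbw


# Last row of the table: 8 GiB buffer, out-of-place time 201525 us, 16 ranks.
algbw, busbw = allreduce_bandwidths(8589934592, 201525, 16)
print(f"{algbw:.2f} {busbw:.2f}")  # 42.62 79.92
```

Because busbw already normalizes for the rank count, comparing it between the public-IP and public_ips: False runs compares the EFA fabric throughput directly.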
r4victor merged commit e86c432 into master May 8, 2026
25 checks passed
r4victor deleted the pr_efa_public_ips branch May 8, 2026 11:11
