[ARK] Support gemm using sycl-tla#1968

Draft
Zhenzhong1 wants to merge 5 commits into main from zhenzhong/sycltla-gemm

Zhenzhong1 commented Jun 30, 2026

Contributor

Perf (oneDNN vs. matmul_sycl_tla, k = n = 4096):

==========================================================================================
dtype = torch.float16
==========================================================================================
m=   1 k=4096 n=4096 dt= torch.float16  match=True  max_diff=0.007812
  oneDNN        :    0.197 ms     0.170 TFLOPS
  matmul_sycl_tla:    0.196 ms     0.171 TFLOPS   speedup= 1.01x

m=   8 k=4096 n=4096 dt= torch.float16  match=True  max_diff=0.007812
  oneDNN        :    0.083 ms     3.252 TFLOPS
  matmul_sycl_tla:    0.193 ms     1.390 TFLOPS   speedup= 0.43x

m=  16 k=4096 n=4096 dt= torch.float16  match=True  max_diff=0.007812
  oneDNN        :    0.082 ms     6.581 TFLOPS
  matmul_sycl_tla:    0.189 ms     2.834 TFLOPS   speedup= 0.43x

m=  32 k=4096 n=4096 dt= torch.float16  match=True  max_diff=0.000000
  oneDNN        :    0.084 ms    12.739 TFLOPS
  matmul_sycl_tla:    0.185 ms     5.815 TFLOPS   speedup= 0.46x

m= 128 k=4096 n=4096 dt= torch.float16  match=True  max_diff=0.015625
  oneDNN        :    0.145 ms    29.631 TFLOPS
  matmul_sycl_tla:    0.225 ms    19.056 TFLOPS   speedup= 0.64x

m=1024 k=4096 n=4096 dt= torch.float16  match=True  max_diff=0.007812
  oneDNN        :    0.409 ms    84.034 TFLOPS
  matmul_sycl_tla:    0.592 ms    58.048 TFLOPS   speedup= 0.69x

==========================================================================================
dtype = torch.bfloat16
==========================================================================================
m=   1 k=4096 n=4096 dt=torch.bfloat16  match=True  max_diff=0.000000
  oneDNN        :    0.143 ms     0.234 TFLOPS
  matmul_sycl_tla:    0.250 ms     0.134 TFLOPS   speedup= 0.57x

m=   8 k=4096 n=4096 dt=torch.bfloat16  match=True  max_diff=0.000000
  oneDNN        :    0.103 ms     2.606 TFLOPS
  matmul_sycl_tla:    0.247 ms     1.087 TFLOPS   speedup= 0.42x

m=  16 k=4096 n=4096 dt=torch.bfloat16  match=True  max_diff=0.000000
  oneDNN        :    0.091 ms     5.873 TFLOPS
  matmul_sycl_tla:    0.227 ms     2.360 TFLOPS   speedup= 0.40x

m=  32 k=4096 n=4096 dt=torch.bfloat16  match=True  max_diff=0.000000
  oneDNN        :    0.085 ms    12.633 TFLOPS
  matmul_sycl_tla:    0.198 ms     5.414 TFLOPS   speedup= 0.43x

m= 128 k=4096 n=4096 dt=torch.bfloat16  match=True  max_diff=0.000000
  oneDNN        :    0.121 ms    35.589 TFLOPS
  matmul_sycl_tla:    0.179 ms    23.956 TFLOPS   speedup= 0.67x

m=1024 k=4096 n=4096 dt=torch.bfloat16  match=True  max_diff=0.000000
  oneDNN        :    0.407 ms    84.480 TFLOPS
  matmul_sycl_tla:    0.592 ms    58.017 TFLOPS   speedup= 0.69x
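
For anyone cross-checking the log above, here is a minimal sketch of how the TFLOPS and speedup columns can be reproduced. The benchmark script itself is not part of this description, so the formulas are assumptions: a GEMM is counted as 2*m*n*k floating-point operations, and speedup is taken as oneDNN time divided by matmul_sycl_tla time. Both are consistent with the printed values. The helper names `gemm_tflops` and `speedup` are hypothetical.

```python
def gemm_tflops(m: int, n: int, k: int, time_ms: float) -> float:
    """Throughput in TFLOPS, assuming 2*m*n*k FLOPs per GEMM (one multiply and one add per term)."""
    flops = 2 * m * n * k
    return flops / (time_ms * 1e-3) / 1e12


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Assumed convention: values below 1.0 mean the candidate is slower than the baseline."""
    return baseline_ms / candidate_ms


# (m, oneDNN ms, matmul_sycl_tla ms), copied from the float16 results above.
rows = [(1, 0.197, 0.196), (1024, 0.409, 0.592)]
for m, onednn_ms, tla_ms in rows:
    print(
        f"m={m:4d}  oneDNN={gemm_tflops(m, 4096, 4096, onednn_ms):7.3f} TFLOPS  "
        f"sycl_tla={gemm_tflops(m, 4096, 4096, tla_ms):7.3f} TFLOPS  "
        f"speedup={speedup(onednn_ms, tla_ms):5.2f}x"
    )
```

Because the logged times are rounded to three decimals, the recomputed TFLOPS can differ from the log in the last digits.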
Zhenzhong1 and others added 5 commits June 30, 2026 09:43
Signed-off-by: Zhenzhong1 <zhenzhong.xu@intel.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zhenzhong1 <zhenzhong.xu@intel.com>
Signed-off-by: Zhenzhong1 <zhenzhong.xu@intel.com>
Signed-off-by: Zhenzhong1 <zhenzhong.xu@intel.com>