The fabricated example below demonstrates a bizarre case I ran into. (If it matters, this is with Mma 14.1 on Apple ARM, but I see the same behavior with Mma 12.3.1 on Intel x86.)
On a whim, I tried replacing a matrix-like collection of memoized downvalues
n = 10000; sn = Floor[Sqrt[n]];
First@AbsoluteTiming[
Do[ m[i,j] = N[ (i+j)/n ], {i,0,n}, {j,0,sn} ];
]
First@AbsoluteTiming@
Table[ Product[ m[i,k] + m[n-i,k], {i,0,n-k^2} ], {k,0,sn} ]
0.982767
0.601142
with a 2-dimensional array:
First@AbsoluteTiming[
t1 = ConstantArray[ N[0], {1+n, 1+sn} ];
Do[ t1[[1+i,1+j]] = N[ (i+j)/n ], {i,0,n}, {j,0,sn}];
]
First@AbsoluteTiming@
Table[ Product[ t1[[1+i,1+k]] + t1[[1+n-i,1+k]], {i,0,n-k^2} ], {k,0,sn} ]
I did this under the assumption packed contiguously-stored arrays are inherently faster and require less memory, furthermore their use sets the stage for other types of subsequent improvements involving list-related built-ins (e.g., Table instead of ConstantArray/Do).
Indeed, with just the replacement shown above, the construction phase is already faster, but I was surprised by the dramatic slowdown in the calculation phase:
0.77669
15.733 (?!?)
Transposing the storage doesn't seem to matter:
First@AbsoluteTiming[
t2 = ConstantArray[N[0], {1+sn, 1+n}];
Do[ t2[[1+j,1+i]] = N[(i+j)/n], {j,0,sn}, {i,0,n}];
]
First@AbsoluteTiming@
Table[ Product[ t2[[1+k,1+i]] + t2[[1+k,1+n-i]], {i,0,n-k^2} ], {k,0,sn} ]
0.731359
15.5293
Since the calculation phase (in the transposed version) works with one row at a time, I can extract each row to then use 1-dimensional indexing:
First@AbsoluteTiming@
Table[ With[ {v = t2[[1+k]]}, Product[ v[[1+i]] + v[[1+n-i]], {i,0,n-k^2} ]], {k,0,sn} ]
This gets me back to the speed seen with memoized downvalues.
0.564264
Question: Why is 2-dimensional array access so slow compared to “2-dimensional” downvalue lookup, or compared to row-extraction followed by 1-dimensional array access?
Producthere. When I replaceProductbyTimes @@ Tablein the matrix-access version, things get a lot faster (but of course not as fast as the fasterProductversion as some array have to be created and filled first.) I think we cannot do anything about it. Please report this to Wolfram Support. I am pretty sure that this is not intended behavior. $\endgroup$ProductwithTimes @@ Tablealso gives me identical results and indeed eliminates the dramatic slowdown. Thanks for the insight! The slowdown apparently depends on the context in which matrix access is occurring, and thus has more to do withProductthan with matrix access! And not justProduct. I saw the same type of matrix-access slowdown in a different calculation involvingSum. I'll report to Wolfram Support and update here. $\endgroup$Product:First@AbsoluteTiming[Table[Tr[t1[[1 ;; 1 + n - k^2, 1 + k]] + t1[[1 + n ;; 1 + k^2 ;; -1, 1 + k]], Times], {k, 0, sn}]]$\endgroup$