Laser physics simulation

Question

This is for a computation for laser physics. The speed is quite slow, which seems to be delayed a lot by the three nested for loops with indices j,k, and l (with comments below). And the computation time seems to be proportional to the fourth power of nx, but I want to increase nx more to ~100.

I also guess the 7 indices Numpy Array S drags the computation a lot. It would be better if I can remove the three nested for loops. However, that seems to be impossible. Are there any ways to improve speed?

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import matplotlib as mpl
import math
from scipy import linalg, optimize
from scipy.integrate import quad
import scipy.integrate as integrate
import scipy.special as special
from matplotlib.colors import LogNorm
import time
from datetime import datetime
import copy

nz=70
nx=3
ny=nx
ntau=70
gamsp = 1/(160.*10.**-12)
Rincrease=10**0
R = Rincrease*400*10.**-9
nnu = 1.6*10.**25*Rincrease**-2
c = 3.*10.**8
n = nnu*math.pi*R**2
lambda0= 12.4/0.88*10**-10
delo = 4.*10.**-6
tau_max= 2./gamsp
z_max=3.*10**-4
L=z_max
dtau = tau_max/ntau
dx=2*R/nx
dy=dx
dz = z_max/nz
Omega=2*math.pi*c/lambda0
alpha=np.arctan(R/L)

S=np.empty((nx,ny,nz,nx,ny,nz,ntau),dtype='complex_')
S[:,:,:,:,:,:,0]=0
rho=np.empty((nx,ny,nz,ntau),dtype='complex_')


for i in range(nx):
    for j in range(ny):
        rho[i,j,:,0]=1 if math.sqrt((i-(nx-1)/2)**2+(j-(ny-1)/2)**2)<nx/2 else 0  
Intensitytrans=np.zeros((nx,ny,nz,ntau),dtype='complex_')
Intensity=np.zeros((nz,ntau),dtype='complex_')
Intensityn=np.zeros((nz,ntau),dtype='complex_')
photonn=np.zeros((nz),dtype='complex_')


g=np.zeros((2*(nx-1)+1,2*(ny-1)+1,2*(nz-1)+1),dtype ='complex_')
g2=np.zeros((2*(nx-1)+1,2*(ny-1)+1,2*(nz-1)+1),dtype ='complex_')

for i in range (2*(nx-1)+1): # store numerical values in g
    if i>nx-1:
        g[i]=copy.copy(np.conjugate(g[2*nx-2-i]))
    else:
        for j in range (2*(ny-1)+1):
            if j>ny-1:
                g[i,j]=copy.copy(np.conjugate(g[i,2*ny-2-j]))
            else:
                for k in range (2*(nz-1)+1):
                    if k>nz-1:
                        g[i,j,k]=copy.copy(np.conjugate(g[i,j,2*nz-2-k]))
                    else:
                        xvalue=copy.copy(-2*R+4*R*i/(2*(nx-1)))
                        yvalue=copy.copy(-2*R+4*R*j/(2*(nx-1)))
                        radius=copy.copy(math.sqrt(xvalue**2+yvalue**2))
                        zvalue=copy.copy(-L+2*L*k/(2*(nz-1)))
                        Funcre = "2*math.pi*np.sin(x)*(np.cos(x))**2*np.cos(Omega/c*({})*(np.cos(x)-1))*special.jv(0,Omega/c*{}*math.sin(x))".format(zvalue,radius)
                        Funcim = "2*math.pi*np.sin(x)*(np.cos(x))**2*np.sin(Omega/c*({})*(np.cos(x)-1))*special.jv(0,Omega/c*{}*math.sin(x))".format(zvalue,radius)
                        def fre(x):
                            fre=eval(Funcre)
                            return fre
                        def fim(x):
                            fim=eval(Funcim)
                            return fim
                        g[i,j,k]=quad(fre,0,alpha)[0]+1j*quad(fim,0,alpha)[0]

for i in range(2*(nz-1)+1):
    g2[:,:,i]=copy.copy(g[:,:,i])*(0 if i>nz else 1)
 

for i in range(ntau):
    i1=np.zeros((nx,ny,nz),dtype='complex_')
    i2=np.zeros((nx,ny,nz),dtype='complex_')
    if i<ntau-1:
        snoise1=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
        snoise2=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
        snoise=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
        s1=np.zeros((nx,ny,nz),dtype='complex_')
        s2=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
        s3=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
    for j in range(nx):
        for k in range(ny):
            for l in range(nz):  # THESE LOOPS MAY SLOW THE COMPUTATION
                gc=copy.copy(g[(nx-1)-j:2*(nx-1)+1-j,(ny-1)-k:2*(ny-1)+1-k,(nz-1)-l:2*(nz-1)+1-l])
                g2c=copy.copy(g2[(nx-1)-j:2*(nx-1)+1-j,(ny-1)-k:2*(ny-1)+1-k,(nz-1)-l:2*(nz-1)+1-l])
                i1[j,k,l]=np.sum(S[:,:,:,:,:,:,i]*np.tensordot(np.conjugate(g2c),g2c,axes=0))*(dx*dy*dz)**2
                i2[j,k,l]=np.sum((1+rho[:,:,:,i])/2*np.absolute(g2c)**2)*dx*dy*dz
                if i<ntau-1:
                     s1+=S[:,:,:,j,k,l,i]*np.conjugate(g2c)*dx*dy*dz
                     s2+=np.tensordot(np.conjugate(g2c)*rho[:,:,:,i],S[j,k,l,:,:,:,i],axes = 0)*dx*dy*dz
                     s3+=np.tensordot(S[:,:,:,j,k,l,i],g2c*rho[:,:,:,i],axes = 0)*dx*dy*dz
                     snoise1[:,:,:,j,k,l]=rho[:,:,:,i]*(1+rho[j,k,l,i])/2*np.conjugate(gc)
                     snoise2[j,k,l,:,:,:]=rho[j,k,l,i]*(1+rho[:,:,:,i])/2*np.conjugate(gc)
    Intensitytrans[:,:,:,i]=3/(32*math.pi*lambda0**2*delo)*gamsp*(nnu**2*i1+nnu*i2)
    if i<ntau-1:
        for j in range(nz):
            for k in range(nz):
                snoise[:,:,j,:,:,k]=snoise1[:,:,j,:,:,k] if j>k else snoise2[:,:,j,:,:,k]
        rho[:,:,:,i+1]=rho[:,:,:,i]+gamsp*(-2*(1+rho[:,:,:,i])/2-3*nnu/(8*math.pi)*2*np.real(s1))*dtau
        S[:,:,:,:,:,:,i+1]=S[:,:,:,:,:,:,i]+gamsp*(-S[:,:,:,:,:,:,i]+3/(16*math.pi)*(nnu*(s2+s3)+snoise))*dtau      
    

                
    
for i in range(ntau):
    for jz in range (nz):
        Intensity[jz,i]=np.sum(Intensitytrans[:,:,jz,i])/(nx*ny)
                
            

peak=np.empty((nz,2))
for i in range(nz):
    photonn[i]=np.sum(Intensity[i,:])/ntau*tau_max*delo*math.pi*R**2
    peak[i,0]=np.argmax(Intensity[i,:])/ntau
    peak[i,1]=i/nz
    if max(Intensity[i,:])!=0:
        Intensityn[i,:]=Intensity[i,:]/max(Intensity[i,:])

Why are your fre and fim functions redefined in a nested loop? As an eval, no less. Looping that much and that deep while you're working with numpy is probably a red flag by itself, but I'm quite confused what made you go this particular route for fre and fim. Have you profiled any of this? — Mast
– Mast ♦, Commented Feb 8, 2022 at 15:25
I intended to store numeric values for the array of g first, then use the value in the main loop where the comment is — user16308
– user16308, Commented Feb 8, 2022 at 15:32
I don't think the loops containing eval increase the computation time that much — user16308
– user16308, Commented Feb 8, 2022 at 15:37
The only output I need is 'Intensityn', 'peak', and 'photonn'. Any other intermediate values can be discarded! — user16308
– user16308, Commented Feb 8, 2022 at 18:01

spyr03 · Accepted Answer · 2022-02-08 18:34:50Z

The speed is quite slow, which seems to be delayed a lot by the three nested for loops with indices j,k, and l

This is a good start to finding out which part of the code is causing a bottleneck. Guessing which part is slow can be good for building up an intuition, so let's test the code and see what the slowdown is.

We can measure the performance of the code in a few ways. Since I have no understanding of the code, I would begin by throwing a general profiler at it. Python comes with a profiler called cProfile. Another profiler I find useful is pyinstrument.

Running the code through cProfile python -m cProfile -s time cr_laser_physics.py and killing it after a minute gave me a profile. The top of it looks like this

801051 function calls (786162 primitive calls) in 62.441 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   45.036   45.036   62.804   62.804 cr_laser_physics.py:1(<module>)
13081/8182   13.425    0.001   15.052    0.002 {built-in method numpy.core._multiarray_umath.implement_array_function}
    26476    1.239    0.000    1.612    0.000 {built-in method builtins.eval}
    12644    1.195    0.000    1.195    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     4719    0.226    0.000   13.868    0.003 numeric.py:943(tensordot)
  104/102    0.201    0.002    0.203    0.002 {built-in method _imp.create_dynamic}
      448    0.073    0.000    0.073    0.000 {method 'read' of '_io.FileIO' objects}
      448    0.072    0.000    0.072    0.000 {built-in method marshal.loads}
    14217    0.057    0.000    0.057    0.000 {method 'reshape' of 'numpy.ndarray' objects}
        1    0.034    0.034    0.034    0.034 {built-in method mkl._py_mkl_service.get_version}
...

Unfortunately, this doesn't look all that helpful. Since the majority of the time is spent in the module (45 seconds of it) and the remainder is a numpy method that we aren't calling directly, we haven't learned much. We can definitely say that the code overall is slow.

A google search lead me to this question. The problem they are having looks similar to ours, cProfile isn't being specific about where the performance problem is. The suggested answer is to split the code into more functions, so that cProfile can say which ones are slow.

So this looks like a good time to go through the code, and make whatever improvements we can. Some nice things to do would be

Remove unused imports. They add a small bit of noise to the profiler. Pylance suggests that most of the imports are not needed. You can always add them back later when needed. The special import is falsely marked as unused, because Pylance cannot tell that the strings of code will be run here. This is a good indicator that the code is doing something in a very strange way. Can this be simplified?
Put things into functions. While every function call will have an overhead, we need it so that cProfile can profile well.
Split long lines into smaller chunks. This will make it easier to see what parts of the code depend on what.

I won't implement all of these suggestions, but I'll try and outline what the file contents will look like.

# Imports
import copy
import math

import numpy as np
import scipy.special as special

from scipy.integrate import quad

# Global constants. (The usual convention is to make these all uppercase).
nz = 70
ny = nx = 3
ntau = 70
...

def main():
    S=np.empty((nx,ny,nz,nx,ny,nz,ntau),dtype='complex_')
    S[:,:,:,:,:,:,0]=0
    rho=np.empty((nx,ny,nz,ntau),dtype='complex_')
    ...


if __name__ == "__main__":
    main()

From this point, I would pick at the contents of the main method until it is split into as many functions as make sense. The hard part here is tracking which variables depend on which other variables.

Doing so, I end up with a main method that looks something like

def main():
    rho = make_rho()
    g = make_g()
    g2 = make_g2(g)

    Intensitytrans = big_loop(g, g2, rho)
    Intensity = make_intensity(Intensitytrans)

    Intensityn, photonn = smaller_loop(Intensity)

Here I've kept the variable the same, but moved as much code out into functions as I could. Each function follows a similar pattern of 'create variables -> loop to fill them with values -> return the variables referenced later'. As an example, here is make_g2.

def make_g2(g):
    g2=np.zeros((2*(nx-1)+1,2*(ny-1)+1,2*(nz-1)+1),dtype ='complex_')
    for i in range(2*(nz-1)+1):
        g2[:,:,i]=copy.copy(g[:,:,i])*(0 if i>nz else 1)
    return g2

Running the profiler on the code now gives: (cut to just show the 5 most expensive functions)

516776 function calls (504160 primitive calls) in 60.096 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   44.271   44.271   59.010   59.010 cr_laser_physics.py:75(big_loop)
19107/11944   12.793    0.001   14.640    0.001 {built-in method numpy.core._multiarray_umath.implement_array_function}
    19102    1.385    0.000    1.385    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    26460    0.803    0.000    1.060    0.000 {built-in method builtins.eval}
     7163    0.236    0.000   13.265    0.002 numeric.py:943(tensordot)

This tells us two pieces of information. You were right that most of the time (45 seconds when the code is run for 60 seconds). is spent in the heavily-nested loops. The other functions aren't noticably slow when compared to those loops, so we don't need to worry about them now.

The slow function looks like this

@profile
def big_loop(g, g2, rho):
    # This has a lot going on, can it be simplified?
    S=np.empty((nx,ny,nz,nx,ny,nz,ntau),dtype='complex_')
    S[:,:,:,:,:,:,0]=0

    Intensitytrans=np.zeros((nx,ny,nz,ntau),dtype='complex_')

    for i in range(ntau):
        i1=np.zeros((nx,ny,nz),dtype='complex_')
        i2=np.zeros((nx,ny,nz),dtype='complex_')
        if i<ntau-1:
            snoise1=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
            snoise2=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
            snoise=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
            s1=np.zeros((nx,ny,nz),dtype='complex_')
            s2=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
            s3=np.zeros((nx,ny,nz,nx,ny,nz),dtype='complex_')
        for j in range(nx):
            for k in range(ny):
                for l in range(nz):  # THESE LOOPS MAY SLOW THE COMPUTATION
                    gc=copy.copy(g[(nx-1)-j:2*(nx-1)+1-j,(ny-1)-k:2*(ny-1)+1-k,(nz-1)-l:2*(nz-1)+1-l])
                    g2c=copy.copy(g2[(nx-1)-j:2*(nx-1)+1-j,(ny-1)-k:2*(ny-1)+1-k,(nz-1)-l:2*(nz-1)+1-l])
                    i1[j,k,l]=np.sum(S[:,:,:,:,:,:,i]*np.tensordot(np.conjugate(g2c),g2c,axes=0))*(dx*dy*dz)**2
                    i2[j,k,l]=np.sum((1+rho[:,:,:,i])/2*np.absolute(g2c)**2)*dx*dy*dz
                    if i<ntau-1:
                        s1+=S[:,:,:,j,k,l,i]*np.conjugate(g2c)*dx*dy*dz
                        s2+=np.tensordot(np.conjugate(g2c)*rho[:,:,:,i],S[j,k,l,:,:,:,i],axes = 0)*dx*dy*dz
                        s3+=np.tensordot(S[:,:,:,j,k,l,i],g2c*rho[:,:,:,i],axes = 0)*dx*dy*dz
                        snoise1[:,:,:,j,k,l]=rho[:,:,:,i]*(1+rho[j,k,l,i])/2*np.conjugate(gc)
                        snoise2[j,k,l,:,:,:]=rho[j,k,l,i]*(1+rho[:,:,:,i])/2*np.conjugate(gc)
        Intensitytrans[:,:,:,i]=3/(32*math.pi*lambda0**2*delo)*gamsp*(nnu**2*i1+nnu*i2)
        if i<ntau-1:
            for j in range(nz):
                for k in range(nz):
                    snoise[:,:,j,:,:,k]=snoise1[:,:,j,:,:,k] if j>k else snoise2[:,:,j,:,:,k]
            rho[:,:,:,i+1]=rho[:,:,:,i]+gamsp*(-2*(1+rho[:,:,:,i])/2-3*nnu/(8*math.pi)*2*np.real(s1))*dtau
            S[:,:,:,:,:,:,i+1]=S[:,:,:,:,:,:,i]+gamsp*(-S[:,:,:,:,:,:,i]+3/(16*math.pi)*(nnu*(s2+s3)+snoise))*dtau

    return Intensitytrans

Since splitting this function up doesn't look easy, the next thing to do is use a different profiler. A line-by-line profiler should be more helpful here.

I installed line_profiler through pip, added @profile to specify the function to profile, and ran it using kernprof -l cr_laser_physics.py, killing it after 60 seconds. I then viewed the profile using python -m line_profiler cr_laser_physics.py.lprof. Keeping only the lines which contribute more than 5% of the time shows three lines are really slow.

Timer unit: 1e-06 s

Total time: 60.4413 s
File: cr_laser_physics.py
Function: big_loop at line 74

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    74                                           @profile
    75                                           def big_loop(g, g2, rho):
    97      2433   22926489.0   9423.1     37.9                      i1[j,k,l]=np.sum(S[:,:,:,:,:,:,i]*np.tensordot(np.conjugate(g2c),g2c,axes=0))*(dx*dy*dz)**2
   101      2433   18182798.0   7473.4     30.1                          s2+=np.tensordot(np.conjugate(g2c)*rho[:,:,:,i],S[j,k,l,:,:,:,i],axes = 0)*dx*dy*dz
   102      2433   18384856.0   7556.5     30.4                          s3+=np.tensordot(S[:,:,:,j,k,l,i],g2c*rho[:,:,:,i],axes = 0)*dx*dy*dz

Each of those lines are 'hot', they take a long time and are run many many times. I'm not too confident about what can be changed, so I'll leave some suggestions below.

i1[j,k,l] = np.sum(S[:,:,:,:,:,:,i]*np.tensordot(np.conjugate(g2c),g2c,axes=0))*(dx*dy*dz)**2

Both np.sum(S[:,:,:,:,:,:,i] and (dx*dy*dz)**2 will always give the same result, even when j, k, and l change. They can be done once before that triple-nested loop. I suspect that the sum is very costly to compute so often.

Each of the slow lines has a call np.conjugate(g2c). I would recommend doing this once and storing it in a variable.

The latter two slow lines are both adding a tensor product to a sum. Could you specify more axes to np.tensordot in order to remove some of the loop nesting? In general looping and summing in pure python is slower than an equivalent call to a numpy function.

Thank you for the detailed reply. np.sum(S[:,:,:,:,:,:,I] is invariant under change of j,k, and l, but g2c is not. I changed the code to store the value of np.conjugate(g2c). For now, I don't think the axes to np.tensordot can be specified further. After the revision, the speed increase by factor of ~2. I'm trying to increase further. — user16308
– user16308, Commented Feb 8, 2022 at 23:18
@user16308 that is good to hear. I failed to notice that the tensor product is inside np.sum, apologies. Since you can run the code to completition, maybe a follow up question with a profile from the machine you run it one would be good. The profiles I've produced might not be representitive of the code running on your machine (but hopefully they are decently close). — spyr03
– spyr03, Commented Feb 9, 2022 at 10:25

Stack Exchange Network

Laser physics simulation

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Laser physics simulation

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions