1,149 questions
1 vote · 0 answers · 42 views
Modin + Dask distributed: AttributeError: type object 'ABCMeta' has no attribute 'deploy_axis_func'
I'm trying to use Modin with a Dask LocalCluster to parallelize pandas DataFrame operations in a Django application (Python 3.13). Even with processes=False (thread-based workers, same process), the ...
1 vote · 0 answers · 37 views
IPv4 address encoding in parallel processing with dask_cudf
I am new to parallel processing with Dask. I have two columns of IPv4 address values in a loaded multi-partition DataFrame, and I cannot find a good method to encode them in order to train a ...
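A common way to turn IPv4 strings into numeric model features is to encode each address as its unsigned 32-bit integer value, or to split it into four octet features. The minimal sketch below shows only the per-value encoding in plain Python using the standard `ipaddress` module; the helper names `ipv4_to_int` and `ipv4_to_octets` are hypothetical, and applying them across the partitions of a dask_cudf DataFrame is an assumption not shown here.

```python
import ipaddress


def ipv4_to_int(address: str) -> int:
    """Hypothetical helper: encode a dotted-quad IPv4 string as an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def ipv4_to_octets(address: str) -> list[int]:
    """Hypothetical helper: split an IPv4 address into its four octets (0-255), one feature each."""
    # .packed is the 4-byte big-endian representation; iterating bytes yields ints.
    return list(ipaddress.IPv4Address(address).packed)


if __name__ == "__main__":
    print(ipv4_to_int("192.168.1.10"))     # 3232235786
    print(ipv4_to_octets("192.168.1.10"))  # [192, 168, 1, 10]
```

The single-integer form preserves ordering and subnet ranges, while the octet form gives a model four bounded features; which works better depends on the model being trained.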
1 vote · 0 answers · 26 views
Converting unchunked HDF5 to OME-Zarr with Dask
Hi, I'm converting HDF5 to OME-Zarr with Dask. Right now I'm using a small dataset with shape (150, 3768, 2008), approximately 4.5 GB in size. My target chunks are (64, 64, 64). I'm running ...
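The numbers in this question can be checked with a little arithmetic: how many output chunks the (64, 64, 64) target implies, and whether 4.5 GB matches the shape. This sketch assumes a 4-byte element type (for example float32), which is an inference from the stated size rather than something the question says.

```python
import math

shape = (150, 3768, 2008)  # dataset shape from the question
chunks = (64, 64, 64)      # target chunk shape from the question
itemsize = 4               # ASSUMPTION: 4-byte dtype (e.g. float32), inferred from ~4.5 GB

# Chunks per axis: the last chunk along an axis may be partial, hence ceil.
chunks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(chunks_per_axis)

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize

print(chunks_per_axis)      # (3, 59, 32)
print(n_chunks)             # 5664
print(total_bytes / 1e9)    # about 4.54 (GB)
print(chunk_bytes / 2**20)  # 1.0 (MiB per full chunk)
```

So the target layout produces roughly 5.7 thousand output chunks of about 1 MiB each, which is useful context when judging task counts and memory use during the conversion.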
3 votes · 1 answer · 91 views
Dask client connects successfully but no workers are available [closed]
I am using Dask for some processing. The client starts successfully, but I am seeing zero workers.
This is how I am creating the client:
client = Client("tls://localhost:xxxx")
This is the ...
3 votes · 1 answer · 75 views
Task works locally but errors on Dask cluster: "SystemError: error return without exception set"
I have the following code that passes an array to the task and submits it to a Dask cluster. The cluster runs in Docker with several Dask workers. Docker is started with:
scheduler:
docker run -d \
-...
3 votes · 0 answers · 91 views
How to optimize NetCDF files and Dask for processing long-term climatological indices with xclim (e.g. SPI using a 30-day rolling window)?
I am trying to analyze the 30-day standardized precipitation index for a multi-state region of the southeastern US for the year 2016. I'm using xclim to process a direct pull of gridded daily ...
0 votes · 0 answers · 57 views
Dask distributed stores old version of my code
I am analysing some data using Dask distributed on a SLURM cluster, working from a Jupyter notebook. I change my codebase frequently and run jobs. Recently, a lot of my jobs started to crash....
0 votes · 0 answers · 75 views
Why does the Dask dashboard become unresponsive over time?
I maintain a production Dask cluster. Every few weeks or so I need to restart the scheduler because it becomes progressively slower over time. The dashboard can take well over a minute to display the ...
1 vote · 1 answer · 57 views
Using Streamz.Dask, Matplotlib and a Tkinter window to display graphs and histograms in real time?
I already have code that uses a thread pool, Tkinter and Matplotlib to process signals that are written to a file by another process. The synchronization between the two processes is done by reading ...
0 votes · 1 answer · 95 views
Dask adaptive deployment in Azure Kubernetes
I am trying to deploy a Dask cluster with one scheduler and zero workers that scales the workers up as the workload requires. I found that adaptive deployment is the correct way to do this. I am using ...
1 vote · 0 answers · 119 views
Dask concat on multiple DataFrames with axis=1
I am new to Dask. While attempting to run concat on a list of DataFrames, I noticed it is consuming more time, resources, and tasks than expected. Here are the details of my run:
Scheduler (same as ...
0 votes · 1 answer · 296 views
How to Set Dask Dashboard Address with SLURMRunner (Jobqueue) and Access It via SSH Port Forwarding?
I am trying to run a Dask Scheduler and Workers on a remote cluster using SLURMRunner from dask-jobqueue. I want to bind the Dask dashboard to 0.0.0.0 (so it’s accessible via port forwarding) and ...
0 votes · 0 answers · 135 views
Initializing a local cluster in Dask takes forever
I'm trying out some things with Dask for the first time, and although I had it running a few weeks ago, I now can't get the LocalCluster to start. I've cut it off after running for 30 minutes at ...
0 votes · 0 answers · 136 views
dask_cuda problem with LocalCUDACluster
I am trying to get this code to work and then use it to train various models on two GPUs:
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
if __name__ == "__main__...
1 vote · 1 answer · 81 views
How to nest dask.delayed functions within other dask.delayed functions
I am trying to learn dask, and have created the following toy example of a delayed pipeline.
+-----+ +-----+ +-----+
| baz +--+ bar +--+ foo |
+-----+ +-----+ +-----+
So baz has a dependency on ...