I have a Python Celery application that uses Apache Spark for large-scale processing. Everything was going fine until today, when I received:
```
Exception in thread "main" java.nio.file.NoSuchFileException: /tmp/tmpkqdh2glm/connection16758665990471584352.info
```
I have tried everything, but it seems I am missing something. The failure is also intermittent: the job runs normally for a while, then goes back to throwing NoSuchFileException. Do you have any hint what I am doing wrong?

PySpark and the local Spark installation on the Celery machine are also 4.0.0, and the Spark master URL and driver bindAddress are set in the app (a sketch of the session setup follows the compose file). This setup was working perfectly fine until yesterday. Below is my docker-compose file.
```yaml
version: '3.9'
services:
  spark-master:
    image: bitnami/spark:4.0.0
    container_name: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_MASTER_HOST=spark-master
      - SPARK_MASTER_PORT=7077
      - SPARK_DRIVER_MEMORY=1g
      - SPARK_EXECUTOR_MEMORY=1g
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_WORKER_LOG_DIR=/app/.logs
      - SPARK_LOCAL_DIRS=/tmp/spark
      - SPARK_WORKER_DIR=/tmp/spark
      - SPARK_MASTER_DIR=/tmp/spark
    ports:
      - "8080:8080"
      - "7077:7077"
    networks:
      - network
    restart: always
    volumes:
      - logs:/app/.logs
      - spark-tmp:/tmp/spark
  spark-worker-1:
    image: bitnami/spark:4.0.0
    container_name: spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1g
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_WORKER_LOG_DIR=/app/.logs
      - SPARK_LOCAL_DIRS=/tmp/spark
      - SPARK_WORKER_DIR=/tmp/spark
    ports:
      - "8081:8081"
    networks:
      - network
    restart: always
    volumes:
      - logs:/app/.logs
      - spark-tmp:/tmp/spark
  celery-worker-1:
    container_name: celery-worker-1
    image: backend:latest
    command: celery -A utils.celery_utils worker --loglevel=info --concurrency=4
    env_file:
      - .env
    environment:
      - SPARK_DRIVER_HOST=celery-worker-1
      - CELERYD_PREFETCH_MULTIPLIER=1
      - SPARK_LOCAL_DIRS=/tmp/spark
    depends_on:
      - redis
      - spark-master
    networks:
      - network
    restart: always
    volumes:
      - logs:/app/.logs
      - spark-tmp:/tmp/spark

volumes:
  pgdata:
  logs:
  spark-tmp:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs

networks:
  network:
    driver: bridge
```
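For context, the Spark session in the Celery app is built roughly like this. This is a sketch, not the exact code: the app name and helper name are illustrative, but the master URL, driver host, and bindAddress are set as shown, matching the services above.

```python
from pyspark.sql import SparkSession

def get_spark_session() -> SparkSession:
    # Illustrative helper; the app name is a placeholder.
    # Master URL, driver host, and bindAddress match the
    # spark-master and celery-worker-1 services in the compose file.
    return (
        SparkSession.builder
        .appName("celery-spark-job")
        .master("spark://spark-master:7077")
        .config("spark.driver.host", "celery-worker-1")
        .config("spark.driver.bindAddress", "0.0.0.0")
        .config("spark.local.dir", "/tmp/spark")
        .getOrCreate()
    )
```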
So far, I have tried mounting the tmp directory as a shared volume and backing it with tmpfs, as shown in the compose file above.
I expected the data to be written via Spark, as it always does, but it crashes intermittently; the task that performs the write is sketched below.
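Roughly, the task looks like this. The module path, broker URL, and data paths are placeholders, and get_spark_session is the helper sketched above:

```python
from celery import Celery

from utils.spark_utils import get_spark_session  # hypothetical module holding the helper above

app = Celery("tasks", broker="redis://redis:6379/0")  # placeholder broker URL

@app.task
def process_batch(input_path: str, output_path: str) -> None:
    spark = get_spark_session()
    df = spark.read.parquet(input_path)              # illustrative read
    df.write.mode("overwrite").parquet(output_path)  # the write that intermittently fails
```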