
I have a Python Celery application that uses Apache Spark for large-scale processing. Everything was working fine until today, when I received:

Exception in thread "main" java.nio.file.NoSuchFileException: /tmp/tmpkqdh2glm/connection16758665990471584352.info

Below is my docker-compose file. I have tried everything, but it seems I am missing something. The failure is also intermittent: it works normally for a while and then goes back to throwing NoSuchFileException. Do you have any hint as to what I am doing wrong?

Both PySpark and the local Spark installation on the Celery machine are version 4.0.0, matching the cluster. The Spark master URL and the driver bindAddress are also set in the app. This setup was working perfectly fine until yesterday.
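
For context, the SparkSession is created inside the Celery task roughly like this (simplified sketch; the app name and option values are illustrative, not the exact ones from my code, but the hostnames match the compose services below):

from pyspark.sql import SparkSession

# Rough sketch of the session setup used inside the Celery task.
# The app name and scratch path are illustrative; the hostnames
# correspond to the docker-compose services below.
spark = (
    SparkSession.builder
    .appName("celery-spark-job")
    .master("spark://spark-master:7077")
    # The driver runs inside the celery-worker-1 container, so executors
    # must be able to reach it under that service name.
    .config("spark.driver.host", "celery-worker-1")
    .config("spark.driver.bindAddress", "0.0.0.0")
    # Scratch directory for driver-side shuffle/temp files.
    .config("spark.local.dir", "/tmp/spark")
    .getOrCreate()
)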

version: '3.9'

services:
  spark-master:
    image: bitnami/spark:4.0.0
    container_name: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_MASTER_HOST=spark-master
      - SPARK_MASTER_PORT=7077
      - SPARK_DRIVER_MEMORY=1g
      - SPARK_EXECUTOR_MEMORY=1g
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_WORKER_LOG_DIR=/app/.logs
      - SPARK_LOCAL_DIRS=/tmp/spark
      - SPARK_WORKER_DIR=/tmp/spark
      - SPARK_MASTER_DIR=/tmp/spark
    ports:
      - "8080:8080"
      - "7077:7077"
    networks:
      - network
    restart: always
    volumes:
      - logs:/app/.logs
      - spark-tmp:/tmp/spark

  spark-worker-1:
    image: bitnami/spark:4.0.0
    container_name: spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1g
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_WORKER_LOG_DIR=/app/.logs
      - SPARK_LOCAL_DIRS=/tmp/spark
      - SPARK_WORKER_DIR=/tmp/spark
    ports:
      - "8081:8081"
    networks:
      - network
    restart: always
    volumes:
      - logs:/app/.logs
      - spark-tmp:/tmp/spark

  celery-worker-1:
    container_name: celery-worker-1
    image: backend:latest
    command: celery -A utils.celery_utils worker --loglevel=info --concurrency=4
    env_file:
      - .env
    environment:
      - SPARK_DRIVER_HOST=celery-worker-1
      - CELERYD_PREFETCH_MULTIPLIER=1
      - SPARK_LOCAL_DIRS=/tmp/spark
    depends_on:
      - redis
      - spark-master
    networks:
      - network
    restart: always
    volumes:
      - logs:/app/.logs
      - spark-tmp:/tmp/spark

volumes:
  pgdata:
  logs:
  spark-tmp:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs

networks:
  network:
    driver: bridge

So far, I have tried mounting the /tmp/spark directory as a named volume and backing that volume with tmpfs.

I expected Spark to write the data as it always does, but it intermittently crashes with the exception above.

Comments:
  • Please post the full stack trace of the error if you can. It can help people get more context on how your process fails.
  • You can't mount your volume into both the driver and the workers. They need to be separated.
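
If the shared spark-tmp volume is indeed the problem, giving each service its own scratch volume might look roughly like this (an untested sketch in the same compose format; the volume names are made up for illustration):

services:
  spark-master:
    volumes:
      - spark-master-tmp:/tmp/spark    # scratch space for the master only
  spark-worker-1:
    volumes:
      - spark-worker-1-tmp:/tmp/spark  # scratch space for this worker only
  celery-worker-1:
    volumes:
      - celery-driver-tmp:/tmp/spark   # scratch space for the driver only

volumes:
  spark-master-tmp:
  spark-worker-1-tmp:
  celery-driver-tmp:

Each container then has its own scratch space, so one service cleaning up its temporary files cannot delete another service's files out from under it.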
