215 questions
1
vote
0
answers
67
views
Databricks always loads built-in BigQuery connector (0.22.2), can’t override with 0.43.x
I am using Databricks Runtime 15.4 (Spark 3.5 / Scala 2.12) on AWS.
My goal is to use the latest Google BigQuery connector because I need the direct write method (BigQuery Storage Write API):
option("...
1
vote
0
answers
98
views
What is the meaning of availability: SPOT_WITH_FALLBACK on AWS databricks [closed]
To reduce compute costs on Databricks, I changed the Databricks job bundle configuration as shown below:
Original
job_clusters:
- job_cluster_key: ...
new_cluster:
...
...
...
0
votes
1
answer
434
views
Error Accessing Volume Data in Databricks Trial: "Maximum Number of Retries Exceeded"
I’m currently learning Databricks using a trial account. I created a volume and successfully loaded data into it. However, when trying to access the file using Spark, I encountered the following error:...
0
votes
0
answers
85
views
Databricks delta live tables metadata comments for created columns file_path and last_modified_date
I have been getting my schema passed in with
custom_schema = create_StructType_schema (access_key,secret_access_key,schema_bucket_name,schema_folder,schema_file_name)
metadata_fp = {"comment"...
1
vote
1
answer
429
views
Databricks Unity Catalog Error: IAM role is not self-assuming when creating external location via Terraform
I'm setting up an external location in Databricks using Unity Catalog via Terraform. During terraform apply, I encounter the following error:
> 2025-05-12T21:28:43 Error: cannot create external ...
0
votes
0
answers
46
views
How can I schedule a Complete Python Project in Databricks
I have a simple Python project with the following structure:
root/
│── src/
│ ├── package_name/
│ │ ├── __init__.py
│ │ ├── main.py
│ │ ├── submodules1/
│ │ │ ├── ...
0
votes
0
answers
5
views
Assign a Search Case ID Number based on 2 indicators
I have website tracking data with session_ids, where each hit is recorded with the timestamp at which it occurred. I'm trying to create search cases with an ID number within those sessions. Every time ...
0
votes
0
answers
131
views
Why does databricks autoloader crash after error and how can I fix it?
We are using databricks autoloader to process parquet files into delta format. The job is scheduled to run once per day and the code looks like this:
def run_autoloader(table_name, checkpoint_path, ...
0
votes
1
answer
37
views
databricks JSONPath wildcard is missing results
I have a JSON structure in which I am trying to match all the cpe_match nodes using a JSONPath expression.
Using Databricks SQL, I have the following query, where "nodes" contains my JSON:
...
3
votes
0
answers
184
views
Are there any techniques to solve skew data in databricks?
I created skewed data to test a salting approach and tried three different solutions, but none achieved the desired results with a significant runtime improvement. Can you guide me on the best ...
0
votes
1
answer
106
views
Pyspark Databricks optimization techniques
Below is my code snippet.
spark.read.table('schema.table_1').createOrReplaceTempView('d1') # 400 million records
spark.read.table('schema.table_2').createOrReplaceTempView('d1') # 300 million records
...
0
votes
1
answer
827
views
Azure DataBricks Cluster usage metrics
Is there any way to log Azure Databricks cluster usage metrics such as CPU, memory, network, and throughput?
There were ways to do this before the 13.3 series; is there any way after the 13.3 series (...
1
vote
1
answer
278
views
Unable to create a workspace in databricks using AWS
I am trying to create a workspace in Databricks linked to AWS. It's failing on the last step.
It says:
MALFORMED_REQUEST: Failed storage configuration validation checks: List,Put,...
0
votes
2
answers
340
views
Read CSV with "§" as delimiter using Databricks autoloader
I'm very new to Spark Streaming and Auto Loader, and I have a question about how to get Auto Loader to read a text file with "§" as the delimiter. Below I tried reading the file as a ...
0
votes
1
answer
117
views
Password protection Excel(.xlsx) file using Python in Databricks
I want to password-protect an Excel file that is stored in an S3 bucket and save it back to S3. I tried openpyxl and xlsxwriter; each generates an xlsx file, but it opens without asking for ...