Newest 'aws-databricks' Questions

1 vote

0 answers

67 views

Databricks always loads built-in BigQuery connector (0.22.2), can’t override with 0.43.x

I am using Databricks Runtime 15.4 (Spark 3.5 / Scala 2.12) on AWS. My goal is to use the latest Google BigQuery connector because I need the direct write method (BigQuery Storage Write API): option(&...

Thilina

157

asked Nov 13 at 12:24

1 vote

0 answers

98 views

What is the meaning of availability: SPOT_WITH_FALLBACK on AWS databricks [closed]

To reduce the computing cost on Databricks, I changed the databricks job bundle configuration as below: Original job_clusters: - job_cluster_key: ... new_cluster: ... ... ...

Morgan Fisher

1

asked Nov 10 at 6:06

0 votes

1 answer

434 views

Error Accessing Volume Data in Databricks Trial: "Maximum Number of Retries Exceeded"

I’m currently learning Databricks using a trial account. I created a volume and successfully loaded data into it. However, when trying to access the file using Spark, I encountered the following error:...

Learn Hadoop

3,058

asked Jul 5 at 12:12

0 votes

0 answers

85 views

Databricks delta live tables metadata comments for created columns file_path and last_modified_date

I have been getting my schema passed in with custom_schema = create_StructType_schema (access_key,secret_access_key,schema_bucket_name,schema_folder,schema_file_name) metadata_fp = {"comment"...

Ethan Bonsall

1

asked Jun 9 at 15:24

1 vote

1 answer

429 views

Databricks Unity Catalog Error: IAM role is not self-assuming when creating external location via Terraform

I'm setting up an external location in Databricks using Unity Catalog via Terraform. During terraform apply, I encounter the following error: > 2025-05-12T21:28:43 Error: cannot create external ...

Olfa2

131

asked May 13 at 8:31

0 votes

0 answers

46 views

How can I schedule a Complete Python Project in Databricks

I have a simple Python project with the following structure: root/ │── src/ │ ├── package_name/ │ │ ├── __init__.py │ │ ├── main.py │ │ ├── submodules1/ │ │ │ ├── ...

NaineeL SoyantaR

39

asked Apr 1 at 3:40

0 votes

0 answers

5 views

Assign a Search Case ID Number based on 2 indicators

I have website tracking data that has session_ids, and the hits are presented with the timestamp at which they occurred. I'm trying to create search cases with an ID number within those sessions. Every time ...

RainGardner

1

asked Mar 20 at 11:00

0 votes

0 answers

131 views

Why does databricks autoloader crash after error and how can I fix it?

We are using databricks autoloader to process parquet files into delta format. The job is scheduled to run once per day and the code looks like this: def run_autoloader(table_name, checkpoint_path, ...

Boris

906

asked Feb 18 at 14:48

0 votes

1 answer

37 views

databricks JSONPath wildcard is missing results

I have a json structure that I am trying to match all the cpe_match nodes for, using a JSONPath expression. Using databricks sql, I have the following query, where "nodes" contains my json: ...

Neil P

3,250

asked Nov 26, 2024 at 16:52

3 votes

0 answers

184 views

Are there any techniques to solve skew data in databricks?

I created skewed data to test a salting approach and tried three different solutions, but none achieved the desired results with a significant runtime improvement. Can you guide me on the best ...

Learn Hadoop

3,058

asked Nov 16, 2024 at 10:28

0 votes

1 answer

106 views

Pyspark Databricks optimization techniques

Below is my code snippet. spark.read.table('schema.table_1').createOrReplaceTempView('d1') # 400 million records spark.read.table('schema.table_2').createOrReplaceTempView('d1') # 300 million records ...

Learn Hadoop

3,058

asked Nov 5, 2024 at 15:02

0 votes

1 answer

827 views

Azure DataBricks Cluster usage metrics

Is there any way to log Azure Databricks cluster usage metrics such as CPU, memory, network, and throughput? There were ways to do this before the 13.3 series; is there any way after the 13.3 series (...

MJ029

319

asked Nov 4, 2024 at 15:48

1 vote

1 answer

278 views

Unable to create a workspace in databricks using AWS

I am trying to create a workspace in Databricks linked to AWS. It's failing on the last step. It says: MALFORMED_REQUEST: Failed storage configuration validation checks: List,Put,...

Priyanka Gupta

39

asked Oct 29, 2024 at 9:01

0 votes

2 answers

340 views

Read CSV with "§" as delimiter using Databricks autoloader

I'm very new to spark streaming and autoloader and had a query on how we might be able to get autoloader to read a text file with "§" as the delimiter. Below I tried reading the file as a ...

beingmanny

11

asked Sep 19, 2024 at 14:44

0 votes

1 answer

117 views

Password protection Excel(.xlsx) file using Python in Databricks

I want to password-protect an Excel file that is stored in an S3 bucket and save it back to S3. I tried openpyxl and xlsxwriter; both generate the xlsx file, but it opens without asking for ...

MOHAMMED SALMAN

1

asked Sep 18, 2024 at 20:46
