2 votes
1 answer
106 views

I’m working with an Iceberg table in Impala named customer_fact, partitioned by the column created_at. The table contains duplicate rows based on customer_id, and I want to retain only the latest ...
Norah • 21
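
A minimal sketch for the dedup question above, assuming impyla is installed and that customer_fact is an Iceberg v2 table Impala can overwrite; it keeps the newest row per customer_id (by created_at), stages the result, and writes it back. The host, port, and staging table name are placeholders, and the column list must be expanded to the real schema.

```python
# A minimal sketch, assuming impyla is installed and Impala can read/write the
# table. Host, port and the staging table name are placeholders; expand the
# column list to the real schema before running anything like this.
from impala.dbapi import connect

DEDUP_STATEMENTS = [
    # Latest row per customer_id into a plain Parquet staging table.
    """
    CREATE TABLE customer_fact_dedup STORED AS PARQUET AS
    SELECT customer_id, created_at  /* , remaining columns */
    FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY customer_id
                                ORDER BY created_at DESC) AS rn
      FROM customer_fact
    ) ranked
    WHERE rn = 1
    """,
    # Replace the Iceberg table's data with the deduplicated copy.
    "INSERT OVERWRITE customer_fact SELECT * FROM customer_fact_dedup",
    "DROP TABLE customer_fact_dedup",
]

conn = connect(host="impala-host", port=21050)  # placeholder coordinator
cur = conn.cursor()
for stmt in DEDUP_STATEMENTS:
    cur.execute(stmt)
cur.close()
conn.close()
```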
1 vote
0 answers
35 views

We are running a sparklyr job that runs queries on a Cloudera CDP Hive cluster. The job sometimes stalls before a dbWriteTable call, doing nothing and running indefinitely. The job doesn't always ...
lrovere • 11
0 votes
2 answers
73 views

I'm setting up a CML session with 64 GB of RAM and 4 CPUs, then I set up a PySpark session with these configurations: spark = SparkSession.builder \ .appName("OptimizedSparkSession") \ ...
Perkūns
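
A hedged sketch of one way to size a PySpark session inside a 64 GB / 4-vCPU CML session; the memory and core values below are illustrative assumptions, not settings taken from the question.

```python
# A hedged sketch; values are illustrative for a 64 GB / 4-vCPU session,
# leaving headroom for the session process itself.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("OptimizedSparkSession")
    .config("spark.driver.memory", "48g")          # leave headroom below 64 GB
    .config("spark.driver.cores", "4")
    .config("spark.sql.shuffle.partitions", "64")  # small box, fewer partitions
    .getOrCreate()
)

# Verify which settings actually took effect.
print(spark.sparkContext.getConf().getAll())
```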
0 votes
3 answers
144 views

Could someone explain which is faster: loading a table with a SQL query that filters it in the query itself, or loading the full table and filtering it afterwards with PySpark functions? For example, this ...
Perkūns
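
A hedged sketch contrasting the two approaches from the question above; in many cases Catalyst pushes DataFrame filters into the scan, so both often produce the same physical plan, which is why .explain() is shown. The table and column names are placeholders.

```python
# A hedged sketch comparing SQL-side filtering with DataFrame-side filtering.
# Table/column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter_comparison").getOrCreate()

# 1) Filter expressed inside the SQL query.
df_sql = spark.sql("SELECT * FROM db.sales WHERE sale_date >= '2024-01-01'")

# 2) Load the table, then filter with DataFrame functions.
df_api = spark.table("db.sales").filter(F.col("sale_date") >= "2024-01-01")

# Compare the physical plans; identical pushed filters imply similar performance.
df_sql.explain()
df_api.explain()
```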
1 vote
1 answer
67 views

I am stuck on an error with the encoding of non-ASCII characters in FlowFile content in NiFi. I am processing the text with an ExecuteScript processor using Jython. The flow is a simple ...
alex • 13
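
A minimal Jython sketch for the ExecuteScript question above, assuming the standard NiFi scripting bindings (session, REL_SUCCESS); it reads and writes the FlowFile content explicitly as UTF-8, which is the usual fix when non-ASCII characters get mangled by the JVM default charset.

```python
# A minimal sketch for ExecuteScript with Jython, assuming the standard
# NiFi scripting bindings; decodes and re-encodes the content as UTF-8.
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class Utf8Callback(StreamCallback):
    def process(self, inputStream, outputStream):
        # Read the FlowFile content as UTF-8, not the platform default charset.
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # ... transform `text` here ...
        outputStream.write(bytearray(text.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, Utf8Callback())
    session.transfer(flowFile, REL_SUCCESS)
```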
0 votes
0 answers
45 views

I am working in Cloudera Data Science Workbench (CDSW) and have created a virtual environment named "testenv". I started a session and activated my virtual environment using: ...
Bini Yoni
0 votes
0 answers
37 views

NiFi's EncryptContent processor throws a "Can't use an RSA_SIGN key for encryption" error. I tried both .gpg and .asc key file formats.
Sam • 21
0 votes
2 answers
144 views

I want to programmatically retrieve the name of the script used by the current job that runs a Python script on the Cloudera ML platform. The __file__ magic variable doesn't work because in the background our ...
Mischa Lisovyi
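
A hedged sketch of standard-library fallbacks for when __file__ is not set (for example when a launcher exec()s the script); nothing here is specific to the Cloudera ML API.

```python
# A hedged sketch of fallbacks when __file__ is unset; standard library only.
import inspect
import os
import sys

def current_script_name():
    # 1) __file__ if the interpreter set it.
    candidate = globals().get("__file__")
    # 2) The script path the interpreter was invoked with.
    if not candidate and sys.argv and sys.argv[0]:
        candidate = sys.argv[0]
    # 3) The source file of the current frame, if it is a real file.
    if not candidate:
        candidate = inspect.getsourcefile(sys._getframe())
    return os.path.basename(candidate) if candidate else None

print(current_script_name())
```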
0 votes
1 answer
55 views

I have the following table in SQL:

ID  CreatedDate          OldValue       NewValue
1   18/11/2024 13:05:10  Open           Escalated
1   18/11/2024 14:05:10  Escalated      With Customer
1   18/11/2024 16:05:10  With Customer  Closed
2   20/...
MahdiJ • 1
0 votes
1 answer
45 views

I created a Cloudera cluster on AWS following these instructions https://docs.cloudera.com/cdp-public-cloud/cloud/getting-started/topics/cdp-deploy_cdp_using_terraform.html and these Terraform scripts https://...
VladS • 4,356
0 votes
1 answer
90 views

We have an enterprise Hadoop cluster installed on Linux servers in our organisation. I am trying to insert a CSV file into one of our Hive tables. The CSV file is on my local Windows machine. I am using ...
Pavan Sai Aravala
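
A minimal PySpark sketch, assuming the CSV has first been copied somewhere the cluster can read (for example with hdfs dfs -put from an edge node) and that Spark has Hive support enabled; the path and table names are placeholders.

```python
# A minimal sketch, assuming the CSV was first copied to HDFS and Spark has
# Hive support; path and table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("csv_to_hive")
    .enableHiveSupport()
    .getOrCreate()
)

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/user/me/input/data.csv")           # placeholder HDFS path
)

# Append into an existing Hive table; column order must match the table.
df.write.mode("append").insertInto("mydb.my_hive_table")  # placeholder table
```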
0 votes
1 answer
53 views

I have a requirement to gather run durations for the last 3 months for a particular Airflow job. In our CDE environment we use Airflow to call Spark DBT jobs; of late the run duration of the job ...
Anil_468
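
A hedged sketch using Airflow's stable REST API to pull run durations for roughly the last 3 months; the base URL, DAG id, and credentials are placeholders, and the API must be enabled and reachable from wherever this runs.

```python
# A hedged sketch against Airflow's stable REST API; URL, DAG id and
# credentials are placeholders.
from datetime import datetime, timedelta, timezone

import requests

BASE_URL = "https://<airflow-host>/api/v1"   # placeholder
DAG_ID = "spark_dbt_job"                     # placeholder DAG id
SINCE = (datetime.now(timezone.utc) - timedelta(days=90)).isoformat()

resp = requests.get(
    f"{BASE_URL}/dags/{DAG_ID}/dagRuns",
    params={"start_date_gte": SINCE, "limit": 100, "order_by": "start_date"},
    auth=("user", "password"),               # placeholder credentials
)
resp.raise_for_status()

for run in resp.json()["dag_runs"]:
    if run.get("start_date") and run.get("end_date"):
        start = datetime.fromisoformat(run["start_date"].replace("Z", "+00:00"))
        end = datetime.fromisoformat(run["end_date"].replace("Z", "+00:00"))
        print(run["dag_run_id"], end - start)
```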
0 votes
0 answers
265 views

I've been trying to send data from Kafka to Snowflake using the JDBC driver with Kafka Connect. Some details about the environment: Kafka is running in a Cloudera private cluster (Base 7.1.9). The ...
alex • 13
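
A hedged sketch of registering a JDBC sink connector through the Kafka Connect REST API for the question above, assuming the Confluent JDBC sink connector and the Snowflake JDBC driver are on the Connect worker's plugin path; the URLs, credentials, connector name, and topic are placeholders.

```python
# A hedged sketch posting a JDBC sink config to the Kafka Connect REST API;
# all names, URLs and credentials below are placeholders.
import json

import requests

CONNECT_URL = "http://connect-worker:8083"   # placeholder Connect worker
config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "events",                       # placeholder topic
    "connection.url": "jdbc:snowflake://<account>.snowflakecomputing.com/?db=MYDB&schema=PUBLIC",
    "connection.user": "KAFKA_LOADER",        # placeholder
    "connection.password": "********",
    "insert.mode": "insert",
    "auto.create": "true",
    "tasks.max": "1",
}

resp = requests.put(
    f"{CONNECT_URL}/connectors/snowflake-jdbc-sink/config",
    headers={"Content-Type": "application/json"},
    data=json.dumps(config),
)
resp.raise_for_status()
print(resp.json())
```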
1 vote
2 answers
98 views

I tried to insert data into Cloudera/Hive using SSIS. The connection from SSIS to Cloudera uses ODBC. I got an issue when executing the task: the generated insert script includes double ...
angga_sbs
1 vote
2 answers
797 views

I'm trying to access an Impala DB via SQLAlchemy. I have configured a DSN that allows me to connect to the DB when using pyodbc directly. However, when using SQLAlchemy I get an error: When using a db ...
ErnstW • 33
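
A hedged sketch using impyla's SQLAlchemy dialect instead of the ODBC DSN, since SQLAlchemy needs a registered dialect rather than a bare pyodbc connection; the host, port, and database are placeholders.

```python
# A hedged sketch; uses impyla's `impala://` SQLAlchemy dialect rather than
# the pyodbc DSN. Host, port and database are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("impala://impala-host:21050/default")  # placeholder URL

with engine.connect() as conn:
    for row in conn.execute(text("SELECT 1")):
        print(row)
```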
