2,500 questions
2 votes | 1 answer | 106 views
Impala INSERT OVERWRITE on Iceberg table does not remove duplicates despite using ROW_NUMBER()
I’m working with an Iceberg table in Impala named customer_fact, partitioned by the column created_at. The table contains duplicate rows based on customer_id, and I want to retain only the latest ...
1 vote | 0 answers | 35 views
Sparklyr job hangs
We are running a sparklyr job that runs queries on a Cloudera CDP Hive cluster.
The job sometimes hangs before a dbWriteTable call, doing nothing and running indefinitely.
The job doesn't always ...
0 votes | 2 answers | 73 views
Keep getting "ConnectionRefused" or "OOM" errors when working with PySpark in Cloudera Machine Learning
I'm setting up a CML session with 64 GB of RAM and 4 CPUs, then I set up a PySpark session with these configurations:
spark = SparkSession.builder \
.appName("OptimizedSparkSession") \
...
0 votes | 3 answers | 144 views
What's the difference between filtering a dataset within the SQL query when loading it into PySpark and filtering it with PySpark's filter function?
Could someone explain which is faster: loading a table with an SQL query and filtering it within the query, or loading the full table and filtering it afterwards with PySpark functions?
For example, this ...
1 vote | 1 answer | 67 views
ExecuteScript with Jython: 'ascii' codec can't encode character u'\ufffd' in position 28: ordinal not in range(128)
I am stuck on an error when encoding non-ASCII characters from FlowFile content in NiFi. I am processing the text with an ExecuteScript processor using Jython.
The flow is a simple ...
0 votes | 0 answers | 45 views
Cloudera Data Science Workbench Not Using Virtual Environment's Python
Question:
I am working in Cloudera Data Science Workbench (CDSW) and have created a virtual environment named "testenv". I started a session and activated my virtual environment using:
...
0 votes | 0 answers | 37 views
NiFi's EncryptContent processor throws "Can't use an RSA_SIGN key for encryption" error
NiFi's EncryptContent processor throws "Can't use an RSA_SIGN key for encryption" error. I tried both .gpg & .asc key file formats.
0 votes | 2 answers | 144 views
How to get the name of the script used in a job on the Cloudera ML platform
I want to programmatically retrieve the name of the script used in the current job that runs a Python script on the Cloudera ML platform.
The __file__ magic variable doesn't work, as in the background our ...
0 votes | 1 answer | 55 views
SQL Count Time Spent on Every Status
I have the following table in SQL:
ID | CreatedDate         | OldValue      | NewValue
1  | 18/11/2024 13:05:10 | Open          | Escalated
1  | 18/11/2024 14:05:10 | Escalated     | With Customer
1  | 18/11/2024 16:05:10 | With Customer | Closed
2  | 20/...
0 votes | 1 answer | 45 views
Does anyone know how to get started with the Manager API Java client?
I created a Cloudera cluster on AWS by following these instructions https://docs.cloudera.com/cdp-public-cloud/cloud/getting-started/topics/cdp-deploy_cdp_using_terraform.html and using these Terraform scripts https://...
0 votes | 1 answer | 90 views
How to load a CSV file into a Hive table using Python on a local Windows machine
We have an enterprise Hadoop cluster installed on Linux servers in our organisation. I am trying to insert a CSV file into one of our Hive tables. The CSV file is on my local Windows machine. I am using ...
0 votes | 1 answer | 53 views
Capture Airflow run duration
I have a requirement to gather the run duration (time) for the last 3 months for a particular Airflow job.
In our CDE environment we use Airflow to call Spark dbt jobs. Of late, the run duration of job ...
0 votes | 0 answers | 265 views
Kafka Connect to Snowflake connection via JDBC error
I've been trying to send data from Kafka to Snowflake using the JDBC driver with Kafka Connect.
Some details about the environment:
Kafka is running in a Cloudera private cluster (Base 7.1.9).
The ...
1 vote | 2 answers | 98 views
SSIS: remove quotation marks from the insert script sent to an ADO NET Destination
I tried to insert data into Cloudera/Hive using SSIS. The connection from SSIS to Cloudera uses ODBC.
I ran into an issue when executing the task: the script generated for the insert includes double ...
1 vote | 2 answers | 797 views
Issue with SQLAlchemy accessing an Impala database via a Cloudera ODBC DSN
I'm trying to access an Impala DB via SQLAlchemy. I have configured a DSN that allows me to connect to the DB when using pyodbc directly.
However, when using SQLAlchemy I get an error:
When using a db ...