149 questions
0 votes · 1 answer · 294 views
Subprocess Error When Trying to Pip Install Fastparquet on Windows 10 & Python 3.13
I am trying to pip install Fastparquet and get the error below. I have searched but cannot find anything on this specific issue. I've tried running CMD as administrator but that does not help. I've ...
1 vote · 0 answers · 131 views
Is there any cross-tool way to use Interval type in Parquet format?
Problem
One of the logical types defined in the Parquet file format specification is Interval, which represents time intervals (a.k.a. durations or time deltas). Here's part of what the documentation ...
1 vote · 1 answer · 667 views
Python: OSError: [Errno 22] Invalid argument, when trying to use pandas.read_parquet
I have this simple code
import pandas as pd
file = pd.read_parquet('file.rot', engine='fastparquet')
file.rot is a table of data (float numbers) with 5 columns.
When I run it, the error that appears is ...
1 vote · 0 answers · 2k views
NOT_SUPPORTED: Unsupported Trino column type (timestamp(3)) for Parquet column
I have a pandas dataframe that has a timedelta column.
df['dep_time'] = pd.to_timedelta(df.loc[:, 'dep_time'])
dataframe.dtypes shows this column as:
dep_time timedelta64[ns]
Next I save this ...
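A common workaround for query engines that reject interval/timedelta columns is to store the duration as a plain integer before writing the Parquet file. A minimal sketch, assuming a `dep_time` timedelta column like the one above (the sample values are illustrative):

```python
import pandas as pd

# Hypothetical frame mirroring the question's timedelta column.
df = pd.DataFrame({"dep_time": pd.to_timedelta(["01:30:00", "00:45:15"])})

# Store durations as plain integer seconds, a type that any Parquet
# reader (including Trino) can handle.
df["dep_time_s"] = df["dep_time"].dt.total_seconds().astype("int64")

print(df["dep_time_s"].tolist())  # [5400, 2715]
```

The integer column can then be written with `to_parquet` as usual and converted back to a duration on the reading side.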
0 votes · 0 answers · 103 views
Read partitioned parquet files over ftp works with fastparquet, fails with pyarrow
I have partitioned .parquet files hosted on an FTP server with the following structure:
├───train_set=TS04
│ part.0.parquet
│ part.1.parquet
│ part.2.parquet
│
├───train_set=TS05
│ ...
0 votes · 1 answer · 167 views
Loading columnar-structured time-series data faster into NumPy arrays
Hi! Are there any ways to load large, (ideally) compressed, and columnar-structured data faster into NumPy arrays in Python? Considering common solutions such as Pandas, Apache Parquet/Feather and ...
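One commonly suggested option for this kind of question is memory-mapping: a `.npy` file loaded with `mmap_mode` gives array access without reading the whole file up front. A minimal NumPy sketch (file name and sizes are illustrative):

```python
import os
import tempfile
import numpy as np

# Write a sample columnar array to disk, then memory-map it back.
data = np.arange(1_000_000, dtype=np.float64).reshape(-1, 4)
path = os.path.join(tempfile.mkdtemp(), "columns.npy")
np.save(path, data)

# mmap_mode avoids reading the whole file; pages load on first access.
mapped = np.load(path, mmap_mode="r")
print(mapped[0].tolist())  # [0.0, 1.0, 2.0, 3.0]
```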
1 vote · 0 answers · 584 views
Unable to read Parquet file with PyArrow: Malformed levels
Assume that I am unable to change how the Parquet file is written, i.e. it is immutable and so we must find a way of reading it given the following complexities...
In:
import pandas as pd
pd....
0 votes · 1 answer · 51 views
How to Handle Growing _metadata File Size and Avoid Corruption in Amazon Redshift Spectrum Parquet Append
Context:
Our web application generates a lot of log files that arrive in an S3 bucket.
The files in the bucket contain JSON strings and have a .txt file format. We process these files in chunks of 200 ...
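The fixed-size batching described here (grouping incoming file keys into chunks of 200) can be sketched with a small stdlib helper; the key names and batch size are illustrative, not the asker's actual pipeline:

```python
def chunked(items, size):
    """Yield consecutive fixed-size batches (the last may be smaller)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical S3 keys; batch size 200 as in the question.
keys = [f"logs/part-{i:05d}.txt" for i in range(450)]
batch_sizes = [len(b) for b in chunked(keys, 200)]
print(batch_sizes)  # [200, 200, 50]
```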
3 votes · 1 answer · 8k views
Fastparquet Installation Error: Getting requirements to build wheel did not run successfully. exit code 1
Thanks in advance for any help provided. I have no programming or computer science experience so I apologize for what are likely to be a series of very dumb questions!
I recently received a book that ...
1 vote · 2 answers · 1k views
How to ignore non-existent columns in the pandas read_parquet function
I am trying to read Parquet files through pandas, where a few columns do not exist in some of the files.
I am wondering how to skip the column-existence check in the read_parquet function.
def column_data(self)...
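One common workaround (a sketch, not the asker's actual helper) is to read the file whole and then `reindex` to the wanted column list, which fills missing columns with NaN instead of raising:

```python
import pandas as pd

wanted = ["a", "b", "c"]          # columns the caller expects
df = pd.DataFrame({"a": [1, 2]})  # stand-in for pd.read_parquet(path)

# reindex keeps the columns that exist and adds the missing ones as NaN,
# instead of raising like read_parquet(path, columns=wanted) would.
df = df.reindex(columns=wanted)
print(list(df.columns))  # ['a', 'b', 'c']
```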
0 votes · 1 answer · 103 views
asynchronous processing of data but sequential file save in multiprocessing
I'm processing really large log file - e.g. 300 GB and I have a script which chunk reads the file and asynchronously process the data (need to read some key:values from it) in pool of processes and ...
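A common pattern for "process in parallel, save in input order" is an executor's `map`, which yields results in submission order regardless of which worker finishes first. A thread-based sketch with a stub worker (for the CPU-bound parsing in the question you would swap in `ProcessPoolExecutor`):

```python
from concurrent.futures import ThreadPoolExecutor

def parse_chunk(chunk):
    # Stand-in for the real key:value extraction.
    return chunk.upper()

chunks = ["alpha", "bravo", "charlie", "delta"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in submission order, so they can be
    # appended to the output file sequentially.
    results = list(pool.map(parse_chunk, chunks))

print(results)  # ['ALPHA', 'BRAVO', 'CHARLIE', 'DELTA']
```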
0 votes · 1 answer · 1k views
Error converting column to bytes using encoding UTF8
I got the error below when writing a dask dataframe to S3. I couldn't figure out why. Does anybody know how to fix it?
dd.from_pandas(pred, npartitions=npart).to_parquet(out_path)
The error is
error.. Error ...
1 vote · 2 answers · 2k views
Unable to write parquet with DATE as logical type for a column from pandas
I am trying to write a parquet file which contains one date column having logical type in parquet as DATE and physical type as INT32. I am writing the parquet file using pandas and using fastparquet ...
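A hedged sketch of one approach: pandas `datetime64` columns normally map to timestamps, but converting the column to plain `datetime.date` objects lets type-inferring writers emit the DATE logical type (INT32 physical) instead; the writer behavior is an assumption about pyarrow, not confirmed for fastparquet:

```python
import datetime
import pandas as pd

df = pd.DataFrame({"d": pd.to_datetime(["2021-03-01", "2021-03-02"])})

# .dt.date turns datetime64 values into plain datetime.date objects;
# type-inferring writers (e.g. pyarrow) then emit DATE/INT32 for the column.
df["d"] = df["d"].dt.date
print(type(df["d"].iloc[0]))  # <class 'datetime.date'>
```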
0 votes · 1 answer · 99 views
What is the best way to train binary classification with 1000 parquet files?
I'm training a binary classification model with a huge dataset in parquet format. However, the dataset is so large that I cannot fit all of the data into memory. Currently, I am doing as below, but I'm facing out-...
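The usual fix for a dataset larger than memory is to stream one file's worth of data at a time through training instead of concatenating everything. A minimal generator sketch, with a stub loader standing in for `pd.read_parquet` (paths and sizes are illustrative):

```python
def load_file(path):
    # Stand-in for pd.read_parquet(path); returns a small list of rows.
    return [f"{path}:row{i}" for i in range(3)]

def batches(paths):
    """Yield one file's rows at a time so only one file is in memory."""
    for path in paths:
        yield load_file(path)

paths = [f"data/part-{i}.parquet" for i in range(2)]
sizes = [len(rows) for rows in batches(paths)]
print(sizes)  # [3, 3]
```

Each yielded batch can be fed to a training step that supports incremental fitting, then discarded before the next file is read.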
0 votes · 1 answer · 180 views
Error installing tsflex on Mac: "Failed building wheel for fastparquet"
I've come across an issue while attempting to install the tsflex package on my Mac using pip3. After running pip3 install tsflex, I received the following error message:
Collecting tsflex
Using ...