0 votes
1 answer
294 views

Subprocess Error When Trying to Pip Install Fastparquet on Windows 10 & Python 3.13

I am trying to pip install Fastparquet and get the error below. I have searched but cannot find anything on this specific issue. I've tried running CMD as administrator but that does not help. I've ...
Robsmith
  • 473
1 vote
0 answers
131 views

Is there any cross-tool way to use Interval type in Parquet format?

Problem: One of the logical types defined in the Parquet file format specification is Interval, which represents time intervals (a.k.a. durations, time deltas, and so on). Here's part of what the documentation ...
mgab
  • 4,024
1 vote
1 answer
667 views

Python: OSError: [Errno 22] Invalid argument, when trying to use pandas.read_parquet

I have this simple code: import pandas as pd; file = pd.read_parquet('file.rot', engine='fastparquet'). file.rot is a table of data (float numbers) with 5 columns. When I run it, the error that appears is ...
EsOj
  • 13
1 vote
0 answers
2k views

NOT_SUPPORTED: Unsupported Trino column type (timestamp(3)) for Parquet column

I have a pandas dataframe that has a timedelta column: df['dep_time'] = pd.to_timedelta(df.loc[:, 'dep_time']). dataframe.dtypes shows this column as: dep_time timedelta64[ns]. Next I save this ...
Mohan
  • 4,829
0 votes
0 answers
103 views

Read partitioned parquet files over ftp works with fastparquet, fails with pyarrow

I have partitioned .parquet files hosted on an FTP server with the following structure:
├───train_set=TS04
│       part.0.parquet
│       part.1.parquet
│       part.2.parquet
│
├───train_set=TS05
│ ...
Arthur Attout
0 votes
1 answer
167 views

Loading columnar-structured time-series data faster into NumPy arrays

Hi! Are there any ways to load large, (ideally) compressed, and columnar-structured data faster into NumPy arrays in Python? Considering common solutions such as Pandas, Apache Parquet/Feather and ...
1 vote
0 answers
584 views

Unable to read Parquet file with PyArrow: Malformed levels

Assume that I am unable to change how the Parquet file is written, i.e. it is immutable and so we must find a way of reading it given the following complexities... In: import pandas as pd pd....
Tom Bomer
  • 113
0 votes
1 answer
51 views

How to Handle Growing _metadata File Size and Avoid Corruption in Amazon Redshift Spectrum Parquet Append

Context: Our web application generates a lot of log files that arrive in an S3 bucket. The files in the bucket contain JSON strings and have a .txt file format. We process these files in chunks of 200 ...
Aakash
  • 39
3 votes
1 answer
8k views

Fastparquet Installation Error: Getting requirements to build wheel did not run successfully. exit code 1

Thanks in advance for any help provided. I have no programming or computer science experience so I apologize for what are likely to be a series of very dumb questions! I recently received a book that ...
EricT
  • 31
1 vote
2 answers
1k views

How to ignore non-existent columns in the pandas read_parquet function

I am trying to read Parquet files through pandas, where a few columns do not exist in some files. I would like to know how to ignore the column existence check in the read_parquet function. def column_data(self)...
soft encoder
0 votes
1 answer
103 views

asynchronous processing of data but sequential file save in multiprocessing

I'm processing a really large log file (e.g. 300 GB), and I have a script that reads the file in chunks and asynchronously processes the data (it needs to read some key:values from it) in a pool of processes and ...
sarkafa
0 votes
1 answer
1k views

Error converting column to bytes using encoding UTF8

I got the error below when writing a Dask dataframe to S3 and couldn't figure out why. Does anybody know how to fix it? dd.from_pandas(pred, npartitions=npart).to_parquet(out_path) The error is error.. Error ...
Justin Shan
1 vote
2 answers
2k views

Unable to write parquet with DATE as logical type for a column from pandas

I am trying to write a Parquet file that contains one date column whose Parquet logical type is DATE and whose physical type is INT32. I am writing the Parquet file using pandas and using fastparquet ...
Behroz Sikander
0 votes
1 answer
99 views

What is the best way to train a binary classification model with 1000 Parquet files?

I'm training a binary classification model with a huge dataset in Parquet format. However, there is so much of it that I cannot fit all of the data into memory. Currently, I am doing it as shown below, but I'm facing out-...
Mason
  • 27
0 votes
1 answer
180 views

Error installing tsflex on Mac: "Failed building wheel for fastparquet"

I've come across an issue while attempting to install the tsflex package on my Mac using pip3. After running pip3 install tsflex, I received the following error message: Collecting tsflex Using ...
Sira
  • 11
