149 questions
0 votes · 1 answer · 294 views
Subprocess Error When Trying to Pip Install Fastparquet on Windows 10 & Python 3.13
I am trying to pip install Fastparquet and get the error below. I have searched but cannot find anything on this specific issue. I've tried running CMD as administrator but that does not help. I've ...
1 vote · 0 answers · 131 views
Is there any cross-tool way to use Interval type in Parquet format?
Problem
One of the logical types defined in the Parquet file format specification is Interval, which represents time intervals (a.k.a. durations or time deltas). Here's part of what the documentation ...
1 vote · 1 answer · 667 views
Python: OSError: [Errno 22] Invalid argument, when trying to use pandas.read_parquet
I have this simple code
import pandas as pd
file = pd.read_parquet('file.rot', engine='fastparquet')
file.rot is a table of data (float numbers) with 5 columns.
When I run it, the error that appears is ...
1 vote · 0 answers · 2k views
NOT_SUPPORTED: Unsupported Trino column type (timestamp(3)) for Parquet column
I have a pandas dataframe that has a timedelta column.
df['dep_time'] = pd.to_timedelta(df.loc[:, 'dep_time'])
dataframe.dtypes shows this column as:
dep_time timedelta64[ns]
Next I save this ...
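A common workaround for query engines that reject interval/timedelta columns is to store the duration as a plain integer before writing the Parquet file. A minimal sketch, assuming a `dep_time` timedelta column like the one above (the sample values are illustrative):

```python
import pandas as pd

# Hypothetical frame mirroring the question's timedelta column.
df = pd.DataFrame({"dep_time": pd.to_timedelta(["01:30:00", "00:45:15"])})

# Store durations as plain integer seconds, a type that any Parquet
# reader (including Trino) can handle.
df["dep_time_s"] = df["dep_time"].dt.total_seconds().astype("int64")

print(df["dep_time_s"].tolist())  # [5400, 2715]
```

The integer column can then be written with `to_parquet` as usual and converted back to a duration on the reading side.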
0 votes · 0 answers · 103 views
Read partitioned parquet files over ftp works with fastparquet, fails with pyarrow
I have partitioned .parquet files hosted on an FTP server with the following structure:
├───train_set=TS04
│ part.0.parquet
│ part.1.parquet
│ part.2.parquet
│
├───train_set=TS05
│ ...
0 votes · 1 answer · 167 views
Loading columnar-structured time-series data faster into NumPy arrays
Hi! Are there any ways to load large, (ideally) compressed, and columnar-structured data faster into NumPy arrays in Python? Considering common solutions such as Pandas, Apache Parquet/Feather and ...
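One commonly suggested option for this kind of question is memory-mapping: a `.npy` file loaded with `mmap_mode` gives array access without reading the whole file up front. A minimal NumPy sketch (file name and sizes are illustrative):

```python
import os
import tempfile
import numpy as np

# Write a sample columnar array to disk, then memory-map it back.
data = np.arange(1_000_000, dtype=np.float64).reshape(-1, 4)
path = os.path.join(tempfile.mkdtemp(), "columns.npy")
np.save(path, data)

# mmap_mode avoids reading the whole file; pages load on first access.
mapped = np.load(path, mmap_mode="r")
print(mapped[0].tolist())  # [0.0, 1.0, 2.0, 3.0]
```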
1 vote · 0 answers · 584 views
Unable to read Parquet file with PyArrow: Malformed levels
Assume that I am unable to change how the Parquet file is written, i.e. it is immutable and so we must find a way of reading it given the following complexities...
In:
import pandas as pd
pd....
0 votes · 1 answer · 51 views
How to Handle Growing _metadata File Size and Avoid Corruption in Amazon Redshift Spectrum Parquet Append
Context:
Our web application generates a lot of log files that arrive in an S3 bucket.
The files in the bucket contain JSON strings and have a .txt file format. We process these files in chunks of 200 ...
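The fixed-size batching described here (grouping incoming file keys into chunks of 200) can be sketched with a small stdlib helper; the key names and batch size are illustrative, not the asker's actual pipeline:

```python
def chunked(items, size):
    """Yield consecutive fixed-size batches (the last may be smaller)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical S3 keys; batch size 200 as in the question.
keys = [f"logs/part-{i:05d}.txt" for i in range(450)]
batch_sizes = [len(b) for b in chunked(keys, 200)]
print(batch_sizes)  # [200, 200, 50]
```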
3 votes · 1 answer · 8k views
Fastparquet Installation Error: Getting requirements to build wheel did not run successfully. exit code 1
Thanks in advance for any help provided. I have no programming or computer science experience so I apologize for what are likely to be a series of very dumb questions!
I recently received a book that ...
1 vote · 2 answers · 1k views
How to ignore non-existent columns in the pandas read_parquet function
I am trying to read Parquet files through pandas, where a few columns do not exist in some of the files.
I am wondering how to skip the column-existence check in the read_parquet function.
def column_data(self)...
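One common workaround (a sketch, not the asker's actual helper) is to read the file whole and then `reindex` to the wanted column list, which fills missing columns with NaN instead of raising:

```python
import pandas as pd

wanted = ["a", "b", "c"]          # columns the caller expects
df = pd.DataFrame({"a": [1, 2]})  # stand-in for pd.read_parquet(path)

# reindex keeps the columns that exist and adds the missing ones as NaN,
# instead of raising like read_parquet(path, columns=wanted) would.
df = df.reindex(columns=wanted)
print(list(df.columns))  # ['a', 'b', 'c']
```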
0 votes · 1 answer · 103 views
asynchronous processing of data but sequential file save in multiprocessing
I'm processing really large log file - e.g. 300 GB and I have a script which chunk reads the file and asynchronously process the data (need to read some key:values from it) in pool of processes and ...
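A common pattern for "process in parallel, save in input order" is an executor's `map`, which yields results in submission order regardless of which worker finishes first. A thread-based sketch with a stub worker (for the CPU-bound parsing in the question you would swap in `ProcessPoolExecutor`):

```python
from concurrent.futures import ThreadPoolExecutor

def parse_chunk(chunk):
    # Stand-in for the real key:value extraction.
    return chunk.upper()

chunks = ["alpha", "bravo", "charlie", "delta"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in submission order, so they can be
    # appended to the output file sequentially.
    results = list(pool.map(parse_chunk, chunks))

print(results)  # ['ALPHA', 'BRAVO', 'CHARLIE', 'DELTA']
```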
0 votes · 1 answer · 1k views
Error converting column to bytes using encoding UTF8
I got the error below when writing a dask dataframe to S3. I couldn't figure out why. Does anybody know how to fix it?
dd.from_pandas(pred, npartitions=npart).to_parquet(out_path)
The error is
error.. Error ...
1 vote · 2 answers · 2k views
Unable to write parquet with DATE as logical type for a column from pandas
I am trying to write a parquet file which contains one date column having logical type in parquet as DATE and physical type as INT32. I am writing the parquet file using pandas and using fastparquet ...
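A hedged sketch of one approach: pandas `datetime64` columns normally map to timestamps, but converting the column to plain `datetime.date` objects lets type-inferring writers emit the DATE logical type (INT32 physical) instead; the writer behavior is an assumption about pyarrow, not confirmed for fastparquet:

```python
import datetime
import pandas as pd

df = pd.DataFrame({"d": pd.to_datetime(["2021-03-01", "2021-03-02"])})

# .dt.date turns datetime64 values into plain datetime.date objects;
# type-inferring writers (e.g. pyarrow) then emit DATE/INT32 for the column.
df["d"] = df["d"].dt.date
print(type(df["d"].iloc[0]))  # <class 'datetime.date'>
```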
0 votes · 1 answer · 99 views
What is the best way to train binary classification with 1000 parquet files?
I'm training a binary classification model with a huge dataset in parquet format. However, the dataset is so large that I cannot fit all of the data into memory. Currently, I am doing as below, but I'm facing out-...
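The usual fix for a dataset larger than memory is to stream one file's worth of data at a time through training instead of concatenating everything. A minimal generator sketch, with a stub loader standing in for `pd.read_parquet` (paths and sizes are illustrative):

```python
def load_file(path):
    # Stand-in for pd.read_parquet(path); returns a small list of rows.
    return [f"{path}:row{i}" for i in range(3)]

def batches(paths):
    """Yield one file's rows at a time so only one file is in memory."""
    for path in paths:
        yield load_file(path)

paths = [f"data/part-{i}.parquet" for i in range(2)]
sizes = [len(rows) for rows in batches(paths)]
print(sizes)  # [3, 3]
```

Each yielded batch can be fed to a training step that supports incremental fitting, then discarded before the next file is read.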
0 votes · 1 answer · 180 views
Error installing tsflex on Mac: "Failed building wheel for fastparquet"
I've come across an issue while attempting to install the tsflex package on my Mac using pip3. After running pip3 install tsflex, I received the following error message:
Collecting tsflex
Using ...