All Questions

0 votes · 1 answer · 298 views

Subprocess Error When Trying to Pip Install Fastparquet on Windows 10 & Python 3.13

I am trying to pip install fastparquet and get the error below. I have searched but cannot find anything on this specific issue. I've tried running CMD as administrator, but that does not help. I've ...
Robsmith (473)
1 vote · 1 answer · 667 views

Python: OSError: [Errno 22] Invalid argument when trying to use pandas.read_parquet

I have this simple code: import pandas as pd; file = pd.read_parquet('file.rot', engine='fastparquet'). file.rot is a table of data (float numbers) with 5 columns. When I run it, the error that appears is ...
EsOj (13)
0 votes · 1 answer · 167 views

Loading columnar-structured time-series data faster into NumPy arrays

Hi! Are there any ways to load large, (ideally) compressed, columnar-structured data faster into NumPy arrays in Python? Considering common solutions such as Pandas, Apache Parquet/Feather and ...
1 vote · 0 answers · 585 views

Unable to read Parquet file with PyArrow: Malformed levels

Assume that I am unable to change how the Parquet file is written, i.e. it is immutable, and so we must find a way of reading it given the following complexities... In: import pandas as pd pd....
Tom Bomer (113)
0 votes · 1 answer · 51 views

How to Handle Growing _metadata File Size and Avoid Corruption in Amazon Redshift Spectrum Parquet Append

Context: Our web application generates a lot of log files that arrive in an S3 bucket. The files in the bucket contain JSON strings and have a .txt extension. We process these files in chunks of 200 ...
Aakash (39)
1 vote · 2 answers · 1k views

How can I ignore non-existent columns in the pandas read_parquet function?

I am trying to read parquet files through pandas, where a few columns do not exist in some files. I would like to skip the column-existence check in the read parquet function. def column_data(self)...
soft encoder
0 votes · 1 answer · 103 views

Asynchronous processing of data but sequential file save in multiprocessing

I'm processing a really large log file, e.g. 300 GB, and I have a script which reads the file in chunks and asynchronously processes the data (I need to read some key:value pairs from it) in a pool of processes and ...
sarkafa
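The usual trick for "process in parallel, save in order" is `Pool.imap`, which yields results in submission order even when workers finish out of order. A minimal sketch, shown here with the thread-backed `multiprocessing.dummy` pool so it runs anywhere; the process-backed `multiprocessing.Pool` has the same API (the chunks and the `process` function are stand-ins):

```python
from multiprocessing.dummy import Pool  # thread-backed; same API as multiprocessing.Pool

def process(chunk):
    # Stand-in for parsing key:value pairs out of one chunk of the log.
    return chunk.upper()

chunks = ["chunk-%d" % i for i in range(5)]

with Pool(4) as pool:
    # imap yields results in submission order even though workers may
    # finish out of order, so the output file can be written sequentially.
    results = list(pool.imap(process, chunks))
print(results)
```

In the real script, each yielded result would be appended to the output file as it arrives instead of collected into a list.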
0 votes · 1 answer · 1k views

Error converting column to bytes using encoding UTF8

I got the error below when writing a dask dataframe to S3 and couldn't figure out why. Does anybody know how to fix it? dd.from_pandas(pred, npartitions=npart).to_parquet(out_path) The error is ...
Justin Shan
1 vote · 2 answers · 2k views

Unable to write parquet with DATE as logical type for a column from pandas

I am trying to write a parquet file which contains one date column whose logical type in parquet is DATE and physical type is INT32. I am writing the parquet file using pandas and using fastparquet ...
Behroz Sikander
0 votes · 1 answer · 99 views

What is the best way to train a binary classifier with 1000 parquet files?

I'm training a binary classification model on a huge dataset in parquet format. However, it is so large that I cannot fit all of the data into memory. Currently I am doing it as below, but I'm facing out-...
Mason (27)
0 votes · 1 answer · 180 views

Error installing tsflex on Mac: "Failed building wheel for fastparquet"

I've come across an issue while attempting to install the tsflex package on my Mac using pip3. After running pip3 install tsflex, I received the following error message: Collecting tsflex Using ...
Sira (11)
0 votes · 1 answer · 595 views

Parquet timestamp overflow with fastparquet/pyarrow

I have a parquet file I am reading from S3 using fastparquet/pandas. The parquet file has a column with the date 2022-10-06 00:00:00, and I see it is wrapping to 1970-01-20 06:30:14.400. Please see ...
Bill (363)
1 vote · 1 answer · 2k views

pyarrow timestamp datatype error on parquet file

I have this error when I read and count records in pandas using pyarrow. I do not want pyarrow to convert to timestamp[ns]; it can keep timestamp[us]. Is there an option to keep the timestamp as is? ...
Bill (363)
1 vote · 2 answers · 3k views

How to efficiently read .pq files in Python

I have a list of files with the .pq extension, whose names are stored in a list. My intention is to read these files, filter them with pandas, and then merge them into a single pandas data frame. ...
sergey_208
0 votes · 1 answer · 1k views

How can I query parquet files with the Polars Python API?

I have a .parquet file, and would like to use Python to quickly and efficiently query that file by a column. For example, I might have a column name in that .parquet file and want to get back the ...
SamTheProgrammer