I'm running some Python code on my local machine to read an Avro file. The file originally lived in a Google Cloud Storage (GCS) bucket, but I downloaded it locally so I could read it like so:

from avro.datafile import DataFileReader
from avro.io import DatumReader

with open('/path/to/file.avro', 'rb') as f:
    reader = DataFileReader(f, DatumReader())
    records = [record for record in reader]
    reader.close()

print(records[0])

However, what I'd like to do is read the file directly from GCS. I know I can write some code to download the file (e.g. like this: https://stackoverflow.com/a/48279267), but I'm wondering if there's a way to read the file directly from GCS without having to download it first.

  • Hi, were you able to solve this?
    – mehere
    Commented Dec 28, 2022 at 17:19
  • No, afraid not!
    – jamiet
    Commented Jan 1, 2023 at 19:41

1 Answer

You can try doing something like this:

from google.cloud import storage
from avro.datafile import DataFileReader
from avro.io import DatumReader
import io

bucket_name = 'your-bucket-name'
blob_name = 'path/to/file.avro'

client = storage.Client()

bucket = client.get_bucket(bucket_name)
blob = bucket.blob(blob_name)

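# Download the whole object into memory; nothing is written to local disk
file_bytes = blob.download_as_bytes()

# Wrap the bytes in a file-like object so DataFileReader can read from it
with io.BytesIO(file_bytes) as f:
    reader = DataFileReader(f, DatumReader())
    records = [record for record in reader]
    reader.close()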

print(records[0])
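
One caveat: download_as_bytes() pulls the whole object into memory before Avro sees any of it. If the file is large, a streaming variant might be worth trying. The sketch below is untested against a real bucket and rests on a few assumptions: that your google-cloud-storage version is recent enough to provide Blob.open(), that the file-like object it returns is good enough for DataFileReader, and that you want to address the file by a gs:// URI. The names split_gcs_uri and read_avro_from_gcs are made up for this example and are not part of either library.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/file.avro' into (bucket_name, blob_name).

    Hypothetical helper for this sketch, not part of google-cloud-storage.
    """
    parsed = urlparse(uri)
    blob_name = parsed.path.lstrip("/")
    if parsed.scheme != "gs" or not parsed.netloc or not blob_name:
        raise ValueError(f"Not a valid gs:// URI: {uri!r}")
    return parsed.netloc, blob_name


def read_avro_from_gcs(uri):
    """Return all records of an Avro file in GCS, reading it as a stream."""
    # Imported here so split_gcs_uri stays usable without these packages.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader
    from google.cloud import storage

    bucket_name, blob_name = split_gcs_uri(uri)
    blob = storage.Client().bucket(bucket_name).blob(blob_name)

    # Assumption: Blob.open("rb") returns a readable, seekable file-like
    # object that fetches the data in chunks as it is read.
    with blob.open("rb") as f:
        reader = DataFileReader(f, DatumReader())
        try:
            return list(reader)
        finally:
            reader.close()
```

Usage would be something like records = read_avro_from_gcs('gs://your-bucket-name/path/to/file.avro'). The decoded records still all end up in a list, so this only avoids holding the raw file bytes in memory as well.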

Hopefully this helps!

  • I don't know much about this topic; I asked a friend of mine and he sent me this script. Hopefully it works.
    Commented Jun 4, 2024 at 22:01