How do I write many vector datasets (layers or GeoDataFrames) to a GeoPackage?

I have 12 GeoDataFrames that I want to store in one GeoPackage as separate layers. I'm writing these layers to the same file from multiple processes at the same time.

The code I use is below. It produces this error:

pyogrio.errors.DataSourceError: A file system object called '...
lines_1900.gpkg' already exists.

import os

# write to GeoPackage
gpkg_name = f"lines_{year}.gpkg"
gpkg_file = os.path.join(out_folder, gpkg_name)
trip_lines.to_file(gpkg_file, layer=month, driver="GPKG")

Or is this a completely wrong approach?

  • How do you define month?
    – Bera
    Commented yesterday
  • I suggest you step through your code and check the exact values passed to each to_file call. Alternatively, comment out the to_file calls and replace them with print("%s %s" % (gpkg_file, month)) so the parameters are printed to the console without stepping through the code. But this is now turning into general coding and debugging, nothing specific to geopandas.
    – til_b
    Commented yesterday
  • I can't reproduce. Please provide runnable code with dummy data that causes the exception.
    – user2856
    Commented yesterday
  • I forgot to mention that each layer is written in a separate process. I have a multiprocessing script with 6 processes running at the same time, all accessing the GeoPackage. This is why I get the error. Thank you for looking into this.
    Commented yesterday
  • A file can only be written to by one process at a time... Is there a specific reason why you are writing from several processes?
    – Pieter
    Commented yesterday

2 Answers


I can run this code without errors, so I suggest your problem lies in a different part of your script.

This works with the Python installation shipped with QGIS:

import geopandas as gpd
from shapely.geometry import Point, Polygon

# First GeoDataFrame: Points
gdf_points = gpd.GeoDataFrame({
    'name': ['A', 'B'],
    'geometry': [Point(0, 0), Point(1, 1)]
}, crs="EPSG:4326")

# Second GeoDataFrame: Polygons
gdf_polygons = gpd.GeoDataFrame({
    'id': [1],
    'geometry': [Polygon([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])]
}, crs="EPSG:4326")

# Write both to the same GeoPackage file, each to a different layer
gdf_points.to_file("d:/tmp/example.gpkg", layer='points', driver="GPKG")
gdf_polygons.to_file("d:/tmp/example.gpkg", layer='polygons', driver="GPKG")
  • Ok, yes of course. This works for me as well. I forgot to mention that each layer is written in a separate process. I have a multiprocessing script with 6 processes running at the same time, all accessing the GeoPackage. This is why I get the error. Still looking for a solution.
    Commented yesterday

I can't reproduce your error.

Can you try this?

import geopandas as gpd
    
out_file = r"C:\Users\bera\Desktop\gistest\my_data.gpkg"
df = gpd.GeoDataFrame(data={"col1":[1]}, geometry=gpd.points_from_xy(x=[1], y=[1]), crs=3006)
    
# Write the GeoDataFrame as two different layers into the same GeoPackage
df.to_file(filename=out_file, layer="layer_1", driver="GPKG")
df.to_file(filename=out_file, layer="layer_2", driver="GPKG")

print(gpd.list_layers(out_file))
#       name geometry_type
# 0  layer_1         Point
# 1  layer_2         Point
  • Ok, yes of course. This works for me as well. I forgot to mention that each layer is written in a separate process. I have a multiprocessing script with 6 processes running at the same time, all accessing the GeoPackage. This is why I get the error. Still looking for a solution.
    Commented yesterday
  • For info: since geopandas 1.0 there is a geopandas.list_layers(path) function...
    – Pieter
    Commented 21 hours ago
