Long-time reader of Code Review, with my first question. I'm self-taught and suspect this code can be improved. The project is really important to me and my team, and I could really use the help.
I'm using this as part of a larger script, run from the terminal, that takes a GeoDataFrame, adds the GPS coordinates for each address as a Point per the specification, and returns a GeoDataFrame again. (I then export it as a GeoJSON file.) So far it runs through ~3,500 rows in about 2 hours, which is fine, but I don't think I'm following many coding best practices. I'd like to make it as robust as possible because we have some datasets to run through that have more than 15,000 rows. Does anyone have any feedback on this script?
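For a sense of scale, here is the rough arithmetic behind my runtime concern. It is only a sketch: it assumes the time per row stays constant at the rate I measured, and the function name `projected_hours` is just illustrative.

```python
def projected_hours(rows, measured_rows=3500, measured_hours=2.0):
    """Linearly scale a measured runtime to a different row count.

    Assumes the time per row stays constant, which ignores API quota
    limits and the extra waits added when requests fail.
    """
    hours_per_row = measured_hours / measured_rows
    return rows * hours_per_row


seconds_per_row = 2.0 * 3600 / 3500
print(f"{seconds_per_row:.2f} s per row")                   # 2.06 s per row
print(f"{projected_hours(15000):.1f} h for 15,000 rows")    # 8.6 h for 15,000 rows
```

So the larger datasets would take most of a working day at the current rate. The script itself follows.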
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import geopy as gpy
from geopy.geocoders import GoogleV3
from geopy.extra.rate_limiter import RateLimiter
from geopy.exc import GeocoderTimedOut
from geopy.location import Location
def addressParsing(gdf_obj, delayseconds):
    """
    Take a whole GeoDataFrame, geocode the standard address in each row,
    and return the updated GeoDataFrame with the coordinates as Point geometry.
    """
    # Placeholder Location used when geocoding fails or returns nothing
    site = Location("0", (0.0, 0.0, 0.0))

    def do_geocode(address):
        # Retry once on timeout. With swallow_exceptions=True the RateLimiter
        # retries and then returns return_value_on_exception instead of
        # raising, so this except branch is likely never reached.
        try:
            return geocode_with_delay(address)
        except GeocoderTimedOut:
            return geocode_with_delay(address)

    print(f"starting parser: {gdf_obj.shape}, estimated time: {round(delayseconds * gdf_obj.shape[0] / 60, 2)} min")

    # Initiate geocoder (g_api_key is defined elsewhere in the larger script)
    geolocator = GoogleV3(api_key=g_api_key)

    # Create a geopy rate limiter:
    geocode_with_delay = RateLimiter(
        geolocator.geocode,
        error_wait_seconds=delayseconds + 20,
        min_delay_seconds=delayseconds,
        swallow_exceptions=True,
        return_value_on_exception=site,
    )

    # Apply the geocoder with delay using the rate limiter:
    gdf_obj["temp"] = gdf_obj["Address"].apply(do_geocode)

    # Get (x, y) coordinates from the geopy Location on each row, dropping altitude.
    # geopy orders a point as (latitude, longitude), while shapely and GeoJSON
    # expect (x, y) = (longitude, latitude), so the two values are swapped here.
    gdf_obj["coords"] = gdf_obj["temp"].apply(
        lambda loc: (loc.longitude, loc.latitude) if loc else (site.longitude, site.latitude)
    )

    # Create shapely Point objects in the geometry column:
    gdf_obj["geometry"] = gdf_obj["coords"].apply(Point)

    # Drop intermediate columns
    gdf_obj = gdf_obj.drop(["temp", "coords"], axis=1)

    print("FINISHED - conversion successful - check shape")
    return gdf_obj
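
One thing I'd like reviewers to sanity-check is axis order. As far as I understand, geopy reports positions as (latitude, longitude, altitude), while shapely points and GeoJSON positions are (x, y), that is (longitude, latitude). Below is a minimal standard-library sketch of the ordering I believe the exported file needs. `to_geojson_point` is a hypothetical helper for illustration, not part of my script or of any library.

```python
import json


def to_geojson_point(latitude, longitude):
    """Build a GeoJSON Point geometry from a latitude/longitude pair.

    GeoJSON (RFC 7946) orders each position as [longitude, latitude],
    the reverse of the (latitude, longitude) order geopy uses.
    """
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Example values chosen for illustration: roughly central London.
geometry = to_geojson_point(latitude=51.5074, longitude=-0.1278)
print(json.dumps(geometry))  # {"type": "Point", "coordinates": [-0.1278, 51.5074]}
```

The range check is a cheap way to catch some swapped values, since any longitude beyond ±90 is not a valid latitude.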