I'm using pandas to read a 14k-line CSV with many empty cells - what I call "Emmenthaler data": lots of holes :-). A very short, simplified sample (urlbuilder.csv):
"one","two","three"
"","foo","bacon",""
"spam","bar",""
The cells contain values to match against a web API, like this:
http://base_URL&two="foo"&three="bacon"
http://base_URL&one="spam"&two="bar"
Leaving values empty in the URL (`...one=""&two="bar"`) would give wrong results, so I want to use just the non-empty fields of each row. To illustrate my current approach:
    import pandas as pd

    def buildurl(row, colnames):
        URL = 'http://someAPI/q?'
        values = row.one, row.two, row.three   # ..ugly, hard-codes the columns..
        for value, field in zip(values, colnames):
            if not pd.isnull(value):           # skip the empty cells
                URL = '{}{}={}&'.format(URL, field, value)
        return URL.rstrip('&')                 # drop the dangling trailing '&'

    df = pd.read_csv('urlbuilder.csv', encoding='utf-8-sig', engine='python')
    colnames = list(df.columns)

    for row in df.itertuples():
        url = buildurl(row, colnames)
        print(url)
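With the sample file above, this prints:

    http://someAPI/q?two=foo&three=bacon
    http://someAPI/q?one=spam&two=bar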
It works, and it's probably better than a cascade of `if not pd.isnull(...)` checks. But I still have this loud hunch that there are much more elegant ways of doing this; I just can't seem to find them, probably because I'm not googling for the right jargon.
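One direction that looked promising: let `Series.dropna()` strip the empty cells from each row and let `urllib.parse.urlencode` from the standard library build the query string. A sketch, assuming the API accepts ordinary percent-encoded `field=value` pairs (note it won't emit the literal quotes shown in the example URLs above):

    from urllib.parse import urlencode

    import pandas as pd

    df = pd.read_csv('urlbuilder.csv', encoding='utf-8-sig', engine='python')

    # dropna() removes the NaN cells from each row, so only the
    # populated fields end up in the query string
    for _, row in df.iterrows():
        print('http://someAPI/q?' + urlencode(row.dropna().to_dict()))

But I'm not sure whether this counts as idiomatic, or whether `iterrows()` is frowned upon here.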
Please comment.