How to better process csv file with pandas and further dealing with set

Question

I have the working code below, using pandas and Python, and I'm looking for any improvement or simplification that can be made.

Can we just wrap this up in a function definition?

$ cat getcbk_srvlist_1.py
#!/python/v3.6.1/bin/python3
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import pandas as pd
import os
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##################### END OF THE Display Settings ###################

################# PANDAS Extraction ###########
df_csv = pd.read_csv(input("Please input the CSV File Name: "), usecols=['Platform ID', 'Target system address']).dropna()
hostData = df_csv[df_csv['Platform ID'].str.startswith("CDS-Unix")]['Target system address']
hostData.to_csv('host_file1', header=None, index=None, sep=' ', mode='a')

with open('host_file1') as f1, open('host_file2') as f2:
    dataset1 = set(f1)
    dataset2 = set(f2)

for i, item in enumerate(sorted(dataset2 - dataset1)):
    print(str(item).strip())

os.unlink("host_file1")

The above code compares two files: host_file1, which is produced through pandas, and host_file2, which already exists.

Maarten Fabré · Accepted Answer · 2018-12-12 08:52:46Z

main guard

It is common to put the code you want to run behind an if __name__ == "__main__": guard, so that you can later import the reusable functions into a different module.
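A minimal sketch of the guard, using a hypothetical compare function and made-up host names rather than the real files:

```python
def compare(known, current):
    """Return the entries of `current` that are missing from `known`, sorted."""
    return sorted(set(current) - set(known))


def main():
    # Hypothetical sample data standing in for the two host files.
    print(compare(["host-a", "host-b"], ["host-a", "host-c"]))


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    main()
```

Another module can then import compare without triggering the prompt or any file access.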

naming

You use both snake_case and CamelCase. Try to stick to one naming convention. PEP 8 advises snake_case for variables and functions, and CamelCase for classes.

functions

Split the code into logical parts.

pandas settings

def settings_pandas():
    pd.set_option("display.height", None)
    pd.set_option("display.max_rows", None)
    pd.set_option("display.max_columns", None)
    pd.set_option("display.width", None)
    pd.set_option("expand_frame_repr", True)

filename input

The way you ask for the filename is very fragile. A more robust way would be to ask for the filename in a separate function, and then validate it.

from pathlib import Path
def ask_filename(validate=True):
    """
    Asks the user for a filename.
    If `validate` is True, it checks whether the file exists and it is a file
    """
    while True:
        file = Path(input("Please input the CSV File Name: (CTRL+C to abort)"))
        if validate:
            if not file.is_file():  # False for missing paths and for directories
                print("Filename is invalid")
                continue
        return file
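The validation has to reject a path that does not exist as well as one that is not a regular file. As a small sketch of that check (is_valid_file and the file names are hypothetical), a single Path.is_file() call covers both cases, because it returns False for a missing path:

```python
import tempfile
from pathlib import Path


def is_valid_file(path):
    """True only when `path` exists and is a regular file."""
    return Path(path).is_file()


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    existing = folder / "hosts.csv"  # hypothetical file name
    existing.write_text("Platform ID,Target system address\n")
    results = {
        "existing file": is_valid_file(existing),
        "directory": is_valid_file(folder),
        "missing": is_valid_file(folder / "nope.csv"),
    }

print(results)
```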

IO

def read_host_data(filename):
    """reads `filename`, filters the unix platforms, and returns the `Target system address`"""
    df = pd.read_csv(filename, usecols=["Platform ID", 'Target system address']).dropna()
    unix_platforms = df['Platform ID'].str.startswith("CDS-Unix")
    return df.loc[unix_platforms, "Target system address"]

There is no need to save the intermediate data to a file. You could use an io.StringIO. If you do need a temporary file, the tempfile module is an alternative.

But in this case, where you just need the set of the values of a pd.Series, you can simply do set(host_data), without the intermediate file.
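As a rough illustration, the sketch below feeds a few made-up rows through io.StringIO instead of a real file and builds the set straight from the filtered Series. The column names are the ones from the question; the row values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
csv_text = """Platform ID,Target system address
CDS-Unix-01,host-a
CDS-Windows-01,host-b
CDS-Unix-02,host-c
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Platform ID", "Target system address"],
).dropna()
unix_platforms = df["Platform ID"].str.startswith("CDS-Unix")

# A Series is iterable, so set() collects its values without any file in between.
hosts = set(df.loc[unix_platforms, "Target system address"])
print(sorted(hosts))
```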

putting it together:

if __name__ == "__main__":
    settings_pandas()  # needed?
    filename = ask_filename()
    host_data = set(read_host_data(filename))
    with open("host_file2") as hostfile2:
        # strip the trailing newlines so the lines compare equal to the pandas values
        host_data2 = {line.strip() for line in hostfile2}
    for item in sorted(host_data2 - host_data):
        print(item)

Since the i is not used, I dropped the enumerate. Since host_data2 is read directly from a file, its items are all strs already, so the conversion to str is dropped too.

Since I don't see any printing of pandas data, the settings_pandas() call can apparently be dropped.
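One detail to watch when comparing the two sets: values taken from a Series carry no trailing newline, while lines read from a file do. A possible way to wrap the comparison in a function, assuming a hypothetical missing_hosts helper and made-up host names, is to strip both sides before taking the difference:

```python
import io


def missing_hosts(known_hosts, lines):
    """Return the stripped, non-empty `lines` that are not in `known_hosts`, sorted."""
    current = {line.strip() for line in lines if line.strip()}
    known = {host.strip() for host in known_hosts}
    return sorted(current - known)


# Hypothetical stand-in for the contents of host_file2.
host_file2 = io.StringIO("host-a\nhost-c\nhost-d\n")
result = missing_hosts({"host-a", "host-b"}, host_file2)
print(result)
```

Any iterable of lines works for the second argument, so an open file object can be passed in place of the StringIO.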

@Maarten, thanks for the elaborate and explicit explanation. – Karn Kumar, commented Dec 12, 2018 at 15:15
