I have the working code below, written in Python with pandas, and I'm looking for any improvement or simplification that could be made.
Can we just wrap this up into a function?
$ cat getcbk_srvlist_1.py
#!/python/v3.6.1/bin/python3
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import pandas as pd
import os
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##################### END OF THE Display Settings ###################
################# PANDAS Extraction ###########
df_csv = pd.read_csv(input("Please input the CSV File Name: "), usecols=['Platform ID', 'Target system address']).dropna()
hostData = df_csv[df_csv['Platform ID'].str.startswith("CDS-Unix")]['Target system address']
hostData.to_csv('host_file1', header=None, index=None, sep=' ', mode='a')
with open('host_file1') as f1, open('host_file2') as f2:
    dataset1 = set(f1)
    dataset2 = set(f2)
for i, item in enumerate(sorted(dataset2 - dataset1)):
    print(str(item).strip())
os.unlink("host_file1")
The above code just compares two files: host_file1, which is produced by the pandas extraction, and host_file2, which already exists. It prints the entries that appear in host_file2 but not in host_file1.
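For illustration, here is a minimal sketch of how the comparison described above might be wrapped into functions. The names `unix_hosts` and `missing_hosts` and the `host_file` parameter are hypothetical, not part of the original script. The sketch also makes two assumptions that change the behaviour slightly: it keeps the extracted hosts in memory instead of writing the temporary host_file1, and it strips whitespace from each line before comparing, so a missing trailing newline in host_file2 does not cause a false mismatch.

```python
import pandas as pd


def unix_hosts(csv_path):
    """Return the set of target addresses whose Platform ID starts with CDS-Unix."""
    df = pd.read_csv(
        csv_path, usecols=['Platform ID', 'Target system address']
    ).dropna()
    mask = df['Platform ID'].str.startswith('CDS-Unix')
    # Convert to str and strip so values compare cleanly with stripped file lines.
    return set(df.loc[mask, 'Target system address'].astype(str).str.strip())


def missing_hosts(csv_path, host_file='host_file2'):
    """Return, sorted, the hosts listed in host_file that are absent from the CSV."""
    with open(host_file) as f:
        # Skip blank lines and ignore surrounding whitespace/newlines.
        known = {line.strip() for line in f if line.strip()}
    return sorted(known - unix_hosts(csv_path))
```

With this in place, the main part of the script could shrink to a loop such as `for host in missing_hosts(input("Please input the CSV File Name: ")): print(host)`, and the `os.unlink` call is no longer needed because no temporary file is created.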