I am scraping web with python and getting data to .csv file that looks like this. If I append to the file, I might have some repeated/duplicate data. To avoid that what can i use? I am not sure about pandas - If i should open the file in pandas and then drop duplicates. I tried other methods of my own, but was unable to come up with a solution. I was thinking of using pandas as the last option
Date,Time,Status,School,GPA,GRE,GMAT,Round,Location,Post-MBA Career,via,on,Details,Note
2021-05-18,13:59:00,Accepted from Waitlist,Yale SOM,3.8,No data provided,740,Round 2 ,NYC,Non Profit / Social Impact,phone,2021-05-18,GPA: 3.8 GMAT: 740 Round: Round 2 | NYC,Interviewed and was waitlisted in R2. Just received the call this afternoon. Good luck everyone!
2021-05-18,13:51:00,Accepted from Waitlist,Yale SOM,3.8,323,No data provided,Round 2 ,Austin,Marketing,phone,2021-05-18,GPA: 3.8 GRE: 323 Round: Round 2 | Austin,Keep your head up! It all works out how it is supposed to.
drop_duplicates
is probably your best option if you intend to later use pandas on the data. If you do not, and if the file can fit in memory, then using a set of lines should do the job.