I have a big file that I open and parse in Python like this:
import csv

fh_in = open('/xzy/abc', 'r')
parsed_in = csv.reader(fh_in, delimiter=',')
for element in parsed_in:
    print(element)
RESULT:
['ABC', 'chr9', '3468582', 'NAME1', 'UGA', 'GGU']
['DEF', 'chr9', '14855289', 'NAME19', 'UCG', 'GUC']
['TTC', 'chr9', '793946', 'NAME178', 'CAG', 'GUC']
['ABC', 'chr9', '3468582', 'NAME272', 'UGT', 'GCU']
I need to keep only unique entries, removing rows that share the same values in columns 1, 2 and 3. In the example above, the last row matches the first row on those three columns, so only the first of the two should be kept.
I have tried two methods, but both failed:
Method 1:

outlist = []
for element in parsed_in:
    if element[0:3] not in outlist[0:3]:
        outlist.append(element)
Method 2:

outlist = []
parsed_list = list(parsed_in)
for element in range(0, len(parsed_list)):
    if parsed_list[element] not in parsed_list[element+1:]:
        outlist.append(parsed_list[element])
Both of these give back all the entries, not just the entries that are unique on the basis of the first three columns.
Please suggest a way to do this.
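
I was also wondering whether storing the first three columns of each row as a tuple in a set and skipping rows whose key has already been seen would do the job. Here is a rough, untested sketch of what I mean (the names seen, key and row are just placeholders I made up):

import csv

seen = set()      # (col1, col2, col3) combinations already encountered
outlist = []      # rows to keep; the first occurrence of each key wins

with open('/xzy/abc', 'r') as fh_in:
    for row in csv.reader(fh_in, delimiter=','):
        key = tuple(row[0:3])    # a tuple is hashable, so it can go in a set
        if key not in seen:
            seen.add(key)
            outlist.append(row)

for row in outlist:
    print(row)

Would that be a reasonable approach, or is there a better/more idiomatic way?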
AK