Total noob here and I have, what I think is, a simple enough problem to solve that has taken me out completely.
I have a tab delimited data set:
NS500418:110:H2VY7BGXX:4:21601:20699:7042 chrV 8256382 True CATCTAAATTTTGTTAGGATG chrV 8256540 True GAATAATAGAAGAGGTACAGA CATCTAAATTTTGTTAGGATGTTCTTCCTCGCCTTTTCTTTCTTAATTTAAGACGTCAAAAAGCAGCATATGACAGGGATTCTGGTATTCCAATGAGATCATTTTACCAATGACGAAAAAATACGTGAGGTGTTGCAAAATGACACAAAA GAATAATAGAAGAGGTACAGAAAACGTTTGTGACGTGAAAAATGCTAAAAGCTCAAGCAATGGGTGGTCTTCTAGAACTCTGAAGAAACTGTGTTTTGTTTTCATGATCTCGGGATGCTTCAAAAACTGAAATGGGTGTCAAAGCAGGCC CATCTAAATTTTGTTAGGATGTTCTTCCTCGCCTTTT GAATAATAGAAGAGGTACAGAAAACGTTTGTGACGTGA chrV 8256416 chrV 8256566 M03109:43:000000000-ACGWU:1:1102:11826:4015 chrIII 7513608 False TCGTTTTTTGTTCTCTAACAC chrX 15229802 False TTTTAAGTACTACCTAAGAACC TTCGCATGGATGTTTGATCCGAGAATTGGAGCTATTCTTATGCCAGTTAGTTTTTTTTCGTTTTTTGTTCTCTAACAC ATTTTGTGAAGCAATTTGGCCTTTTTTTAGTTGATCTAATTATGCGTAAACACAATTTTTAAGTACTACCTAAGAACC GTGTTAGAGAACAAAAAACGAAAAAAAACTAACTGGCATAAGAATAGCTCCAATTCTCGGATCGATCTAATTATGCGT GGTTCTTAGGTAGTACTTAAAAATTGTGTTTACGCATAATTAGATCGATCCGAGAATTGGAGCTATTCTTATGCCAGT chrIII 7513540 chrX 15229776 NS500418:110:H2VY7BGXX:4:11407:17860:12911 chrX 4775576 True GGATAGTTTTAATTTTCTTGG chrX 16142498 True GAGTACTGCCGCGCGATCGAT GGATAGTTTTAATTTTCTTGGATATTTTTAAATTCCGCTTAAAAACAACATTGTTAAGTCCGTTTTCACAGTTTGGAACTTTCTGTAAAATTGAGACTGGGAAAACTTAATGAAATAAAAGAATAGGTGCTCTTTACAAATTAAAAACAA GAGTACTGCCGCGCGATCAATGATCTCCTTTTTGTTGGAGAAAAGATTGGAGATGACGTCTAGCGCAAGCTTTTGGCTTTCCGATTCAAGTTCTTGATCTGATAGTCTGGGAGCCTTGATTGGAGCAGCTGGGACTTTTGCAGGTTGGGA GGATAGTTTTAATTTTCTTGGATATTTTTAAATTCCG GAGTACTGCCGCGCGATCGATCTTAGAAATTAGTTAAA chrX 4775610 chrX 16142526 NS500418:110:H2VY7BGXX:4:13612:12507:3869 chrX 11052325 False GGTCCAGCAAAACGCAGTAAAC chrI 14497739 True GTGGTGGAGGAGGAACGAATG TACTTAACCTTTGCTCCGCGGCAAAACATGATCATTTGTTCAAATAGACAATTTCGTTTTTTCTTTGACGATCAGAGTCAATGAAGTTATCTAAGGCAATCACAAAACATTTTTGAAAAGCAGCAACAGGTCCAGCAAAACGCAGTAAAC GTGGTGGAGGAGGAACGAATGGTTGTGGTCCGGCGAGTGGGGCCACTTGTGGCACAAAAGCTTGATGTCGGAGCAGATTTGGGGCGATCCCGTCTCGATGCTCGCCCACTCGGCAAAGGCGTTGATTCGGCTGGAACAACAAGCGTCTTC GTTTACTGCGTTTTGCTGGACCTGTTGCAGCTTTTCAA GTGGTGGAGGAGGAACGAATGGTTGTGGTCCGGCGAGT chrX 11052290 chrI 14497765 NS500418:110:H2VY7BGXX:3:11604:7974:16095 chrX 7483102 False CTAGTTCAATGAGGTATGTCAT chrX 5875247 False AAAAAACTGATGGTCTTATAT CTTGGCTCAAATAAAACTGAAATCGAAAATAAAGTTTTGCATGTAAATACATTTTCAGAGTGCCTACGACTATTACCATCGAGATCGACGCGAATATAGTGTACCCTGCTTTCCTCGTTCTCGCCAACCTAGTTCAATGAGGTATGTCAT TCACAGCCACCGGATATTCTGAGATGCTTCTTTTTTTGTTGTTGTCGTTAGATGTACAGTGCCATTCCGCATATCATTGATGTTAGGATCATCTAGCATCTACCAGAATTTTTCCTTTCTCTGAATTCTAAAAAACTGATGGTCTTATAT ATGACATACCTCATTGAACTAGGTTGGCGAGAACGAGG ATATAAGACCATCAGTTTTTTAGAATTCAGAGAAAGGA chrX 7483067 chrX 5875222 NS500418:110:H2VY7BGXX:1:12207:12144:18475 chrI 11267978 True TTTTTAGGCAGTATTCTGTGAA chrI 7633132 True GTTTTTAAGGTTTTCATCGAT TTTTTAGGCAGTATTCTGTGAACTTTCCTGCATAGTTTCCACTATGATCACCATTTTTCTAGCTCTCCTGGTTCTCACTACAAGTCCTGGACAAGTCGAGGTAAGGCTGTTTAGCCTAACCGGCCCAATGGGCCCTGCTAGGCCTCACAG GTTTTTAAGGTTTTCATCGATTTTAATTAAATTTTTATTCCAGGATGCACCAGGAAGTGAATTCAATATGCAACAGATGACATCAATGCACGACGATTCGACAACATTCACGAATCCAGTGTATGAATTAGAAGATGTTGATATGTCATC TTTTTAGGCAGTATTCTGTGAACTTTCCTGCATAGTTT GTTTTTAAGGTTTTCATCGATTTTAATTAAATTTTTAT chrI 11268013 chrI 7633159 NS500418:152:H25C7AFXX:3:11408:4830:8603 chrIV 2481023 False TGAATCATATCAGGGCAGCTG chrIV 2542156 False CGTTGCTTGCAGTGTTCCCTT GAATTTAAATTTCCTAGTGAAAAATGACAAAAAATTATGTTTTTGTAAAAAATATCTCGAAAAAATGTTTTTTTTTTCTTTTTTTCACCTAAAATTTTTTTGTTTCAGAATTTTGTGGGTGTTGATCTATGAATCATATCAGGTCAGCTG TGAAAAAAAAAATTTGCCAAAAAAGATCAAAGAGGCGCCGCCGACAGAGAAGTGCACATGAATTATATTCAGCTGGAAATTGGAAACTGAGAGAAATCTGAATAAAACATAATTTTTTTCTCTTATTTCCGTTGCTTGCAGTGTTCCCTT CAGCTGCCCTGATATGATTCATAGAGATCAAAGAGGCGCCGCCGACAGAGAAGTGCACATGAATTATATTCAGCTGGAAATTGGAAACTGAGAGAAATCTGAATACAACATAATTTTTTTCTCTTATTTCCGTTGCTTGCAGTGTTCCCTT chrIV 2480995 chrIV 2542026
which I pass through:
gc GSM2041038_n2_adults_dpn.TSV |
sls -Pattern '(chrIV.*chrIV.*chrIV.*chrIV)' |
Export-Csv OnlyChrIV.tsv -Delimiter "`t"
And get (what I assume is) a tab delimited file with headers and these results:
#TYPE Selected.System.Management.Automation.PSCustomObject "IgnoreCase" "LineNumber" "Line" "Filename" "Path" "Pattern" "Context" "Matches" "True" "32" "NS500418:152:H25C7AFXX:3:11408:4830:8603 chrIV 2481023 False TGAATCATATCAGGGCAGCTG chrIV 2542156 False CGTTGCTTGCAGTGTTCCCTT GAATTTAAATTTCCTAGTGAAAAATGACAAAAAATTATGTTTTTGTAAAAAATATCTCGAAAAAATGTTTTTTTTTTCTTTTTTTCACCTAAAATTTTTTTGTTTCAGAATTTTGTGGGTGTTGATCTATGAATCATATCAGGTCAGCTG TGAAAAAAAAAATTTGCCAAAAAAGATCAAAGAGGCGCCGCCGACAGAGAAGTGCACATGAATTATATTCAGCTGGAAATTGGAAACTGAGAGAAATCTGAATAAAACATAATTTTTTTCTCTTATTTCCGTTGCTTGCAGTGTTCCCTT CAGCTGCCCTGATATGATTCATAGAGATCAAAGAGGCGCCGCCGACAGAGAAGTGCACATGAATTATATTCAGCTGGAAATTGGAAACTGAGAGAAATCTGAATACAACATAATTTTTTTCTCTTATTTCCGTTGCTTGCAGTGTTCCCTT chrIV 2480995 chrIV 2542026" "InputStream" "InputStream" "(chrIV.*chrIV.*chrIV.*chrIV)" "" "System.Text.RegularExpressions.Match[]"
The data which I want is in the "Line" column. So I then pass this file through this:
Import-Csv OnlyChrIV.tsv -Delimiter "`t" |
select "line" |
Export-Csv OnlyChrIV_OnlyLine.tsv -Delimiter "`t"
And I will get this:
#TYPE Selected.System.Management.Automation.PSCustomObject "Line" "NS500418:152:H25C7AFXX:3:11408:4830:8603 chrIV 2481023 False TGAATCATATCAGGGCAGCTG chrIV 2542156 False CGTTGCTTGCAGTGTTCCCTT GAATTTAAATTTCCTAGTGAAAAATGACAAAAAATTATGTTTTTGTAAAAAATATCTCGAAAAAATGTTTTTTTTTTCTTTTTTTCACCTAAAATTTTTTTGTTTCAGAATTTTGTGGGTGTTGATCTATGAATCATATCAGGTCAGCTG TGAAAAAAAAAATTTGCCAAAAAAGATCAAAGAGGCGCCGCCGACAGAGAAGTGCACATGAATTATATTCAGCTGGAAATTGGAAACTGAGAGAAATCTGAATAAAACATAATTTTTTTCTCTTATTTCCGTTGCTTGCAGTGTTCCCTT CAGCTGCCCTGATATGATTCATAGAGATCAAAGAGGCGCCGCCGACAGAGAAGTGCACATGAATTATATTCAGCTGGAAATTGGAAACTGAGAGAAATCTGAATACAACATAATTTTTTTCTCTTATTTCCGTTGCTTGCAGTGTTCCCTT chrIV 2480995 chrIV 2542026"
My issue is that I now can't break the string back into its original columns because I need to add headers and process the data further from there.
I want (which is how the data was originally formatted):
"NS500418:152:H25C7AFXX:3:11408:4830:8603" "chrIV" "2481023" "False" "TGAATCATATCAGGGCAGCTG" "chrIV" "2542156"
Not:
"NS500418:152:H25C7AFXX:3:11408:4830:8603" "chrIV" "2481023" "False" "TGAATCATATCAGGGCAGCTG" "chrIV" "2542156"
I've tried split but this outputs a new line for every tab like the above example. I also don't know if the input and/or output are the methods I should be using here.
This also needs to be done for a series of lines. I used only one line as an example here for clarity.