I have a bunch of CSV files written from a Teensy ADC to an SD card, and I am trying to load them so I can do some basic stats over each row.

I have tried everything I can think of to fix this, but I cannot get my CSV to be read correctly: the column names won't line up with the data. Here's the code I'm using:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats


###     Manual input of csv file and a short name for plot title
filename = "data.csv"

###     Read in data to a data frame with the correct formatting. index_col=0 was not working for all data files tested
data = pd.read_csv(filename,skiprows=1,header=1,index_col=None)

print(data.head())      #   To check that the columns are correctly lined up

For some reason I cannot get the code to read the header correctly: it keeps reading the header as one column longer than the data, resulting in an entire column of NaNs. The same thing happens when I use index_col=0 or index_col="SampleNumber".

I've tried several iterations of the read_csv line (changing the header=,index_col=, etc) but haven't been able to correct this. The only solution I have is to manually go through and delete the first column of all my CSV files, but that does not seem efficient. Ideally I should have the "SampleNumber" column become the index column (since not all data.csv files have consistent numbering for the SampleNumber), but if that doesn't work it is fine to remove them altogether.

How do I get the SampleNumber column to be read in correctly? I suspect this is mostly an issue with how my csv files are being created but I couldn't figure out a way to upload one of them for someone else to try.

What is currently being output:

   SampleNumber    C0    C1    C2    C3    C4    C5    C6    C7    C8    C9   C10   C11   C12   C13   C14  C15
0          3472  3030  2813  2695  2649  2636  2634  2632  2635  2635  2626  2624  2625  2623  2633  2597  NaN
1          2582  2581  2576  2561  2538  2511  2498  2490  2487  2484  2481  2481  2475  2475  2469  2475  NaN
2          2472  2474  2472  2474  2474  2474  2478  2474  2476  2484  2485  2490  2484  2485  2478  2486  NaN
3          2485  2483  2488  2488  2485  2486  2485  2484  2485  2483  2485  2483  2485  2483  2490  2473  NaN
4          2475  2472  2474  2477  2479  2482  2482  2482  2483  2487  2483  2482  2484  2483  2477  2483  NaN

What I want the output to be:

              C0    C1    C2    C3    C4    C5    C6    C7    C8    C9   C10   C11   C12   C13   C14  C15
SampleNumber    
0             3472  3030  2813  2695  2649  2636  2634  2632  2635  2635  2626  2624  2625  2623  2633  2597  
1             2582  2581  2576  2561  2538  2511  2498  2490  2487  2484  2481  2481  2475  2475  2469  2475  
2             2472  2474  2472  2474  2474  2474  2478  2474  2476  2484  2485  2490  2484  2485  2478  2486  
3             2485  2483  2488  2488  2485  2486  2485  2484  2485  2483  2485  2483  2485  2483  2490  2473  
4             2475  2472  2474  2477  2479  2482  2482  2482  2483  2487  2483  2482  2484  2483  2477  2483

Raw CSV:

Start of new file:,,,,,,,,,,,,,,,,
MISCOUNT: 0,,,,,,,,,,,,,,,,
SampleNumber,C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15
0,3472,3030,2813,2695,2649,2636,2634,2632,2635,2635,2626,2624,2625,2623,2633,2597
1,2582,2581,2576,2561,2538,2511,2498,2490,2487,2484,2481,2481,2475,2475,2469,2475
2,2472,2474,2472,2474,2474,2474,2478,2474,2476,2484,2485,2490,2484,2485,2478,2486
3,2485,2483,2488,2488,2485,2486,2485,2484,2485,2483,2485,2483,2485,2483,2490,2473
4,2475,2472,2474,2477,2479,2482,2482,2482,2483,2487,2483,2482,2484,2483,2477,2483
5,2481,2482,2482,2465,2455,2450,2442,2443,2441,2448,2444,2465,2470,2467,2440,2467
  • 1. Create a new CSV file by copying the top 5 rows of your existing CSV file with a text editor such as Notepad.
    – Panda Kim
    Commented Apr 27 at 20:56
  • 2. Check whether the same issue occurs when importing the new CSV file.
    – Panda Kim
    Commented Apr 27 at 20:57
  • We need to see your input file as text, with some sample rows of data that reproduce the problem. Don't paste images, as that is highly discouraged on StackOverflow. Also, because you are showing a spreadsheet, we really don't know what the input file looks like: spreadsheets automatically parse input text files to create rows and columns, and that hides the original state of the input as a pure text file.
    – topsail
    Commented Apr 27 at 20:57
  • 3. If you see the same issue, copy the text from the new CSV file and paste it into the body of this post.
    – Panda Kim
    Commented Apr 27 at 20:59
  • It reads the data correctly if I copy it into a txt file and use delimiter='\t', or if I first open the file in VS Code. I need to send this data file and code to a professor, so I either need to fix the issue in code or tell him to make sure he opens the file correctly. I'm hoping for the former. I'll add a copied-and-pasted version of the CSV file to the main post too.
    – N Mastick
    Commented Apr 27 at 21:08

2 Answers

# code for example
import io
import pandas as pd
txt = '''
Start of new file:,,,,,,,,,,,,,,,,
MISCOUNT: 0,,,,,,,,,,,,,,,,
SampleNumber,C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15
0,3472,3030,2813,2695,2649,2636,2634,2632,2635,2635,2626,2624,2625,2623,2633,2597
1,2582,2581,2576,2561,2538,2511,2498,2490,2487,2484,2481,2481,2475,2475,2469,2475
2,2472,2474,2472,2474,2474,2474,2478,2474,2476,2484,2485,2490,2484,2485,2478,2486
'''

# answer
df = pd.read_csv(io.StringIO(txt), header=2)  # for a real file, pass its path instead of io.StringIO(txt)

df

   SampleNumber    C0    C1    C2  ...   C12   C13   C14   C15
0             0  3472  3030  2813  ...  2625  2623  2633  2597
1             1  2582  2581  2576  ...  2475  2475  2469  2475
2             2  2472  2474  2472  ...  2484  2485  2478  2486

[3 rows x 17 columns]
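
Since the question also asks for SampleNumber as the index and for basic stats over each row, here is a minimal sketch that builds on the header=2 call above. The narrowed three-channel sample and the name row_stats are made up for illustration, and the choice of statistics is only an assumption about what "basic stats" means here.

```python
import io
import pandas as pd

# Made-up, narrowed sample with the same layout as the question's file:
# two preamble lines, then the header row, then the data rows.
txt = """Start of new file:,,,
MISCOUNT: 0,,,
SampleNumber,C0,C1,C2
0,3472,3030,2813
1,2582,2581,2576
"""

# header=2 uses the third line as column names;
# index_col="SampleNumber" turns that column into the index.
df = pd.read_csv(io.StringIO(txt), header=2, index_col="SampleNumber")

# axis=1 computes each statistic across the channel columns of every row.
row_stats = pd.DataFrame({
    "mean": df.mean(axis=1),
    "std": df.std(axis=1),
    "min": df.min(axis=1),
    "max": df.max(axis=1),
})

print(row_stats)
```

With a real file, the path would replace io.StringIO(txt).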

Welcome to StackOverflow. 👋

I tried what you tried and actually got what you want (I used input.csv):

import pandas as pd

df = pd.read_csv("input.csv", skiprows=2, index_col=0) # index_col="SampleNumber" also works

print(df)
               C0     C1    C2    C3    C4    C5  ...   C10   C11   C12   C13   C14   C15
SampleNumber                                      ...
0             3472  3030  2813  2695  2649  2636  ...  2626  2624  2625  2623  2633  2597
1             2582  2581  2576  2561  2538  2511  ...  2481  2481  2475  2475  2469  2475
2             2472  2474  2472  2474  2474  2474  ...  2485  2490  2484  2485  2478  2486
3             2485  2483  2488  2488  2485  2486  ...  2485  2483  2485  2483  2490  2473
4             2475  2472  2474  2477  2479  2482  ...  2483  2482  2484  2483  2477  2483
5             2481  2482  2482  2465  2455  2450  ...  2444  2465  2470  2467  2440  2467

[6 rows x 16 columns]

I agree with your diagnosis of the problem, but I cannot figure out how that would happen. I counted the commas in the CSV you provided and the count was consistent on every line. I also ran the CSV through some other tools I use for CSV analysis; they didn't complain, and the outputs looked correct to my eyes. So... ???
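
A comma count like the one mentioned above could be scripted roughly as follows. This is a minimal sketch, not the tool actually used: delimiter_counts is a hypothetical helper, and the three-line sample is made up, with a trailing comma added on purpose to show what an inconsistency would look like.

```python
import io
from collections import Counter


def delimiter_counts(lines, delimiter=","):
    """Map each distinct per-line delimiter count to how many lines have it."""
    return Counter(line.rstrip("\r\n").count(delimiter) for line in lines)


# Made-up sample; the last row is deliberately malformed (trailing comma).
sample = io.StringIO(
    "SampleNumber,C0,C1\n"
    "0,3472,3030\n"
    "1,2582,2581,\n"
)

counts = delimiter_counts(sample)
print(counts)

# A consistent file yields a single key; more than one key means some
# lines have a different number of fields than others.
print("consistent" if len(counts) == 1 else "inconsistent")
```

For a real file, an open file handle can be passed in place of the StringIO object, ideally on the original file from the SD card rather than a copy re-saved by an editor or spreadsheet.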

I even tried different combinations of kwargs to read_csv() and still got what I believe you want:

...

s = repr(df) # string-ify the previous df I printed above

for header_arg in [
    {"skiprows": 2},
    {"header": 2},
]:
    for index_arg in [
        {"index_col": 0},
        {"index_col": "SampleNumber"},
    ]:
        kwargs = dict(header_arg)
        kwargs.update(index_arg)

        df = pd.read_csv("input.csv", **kwargs)

        assert repr(df) == s

        print(f"{kwargs}: good")
...

{'skiprows': 2, 'index_col': 0             }: good
{'skiprows': 2, 'index_col': 'SampleNumber'}: good
{'header':   2, 'index_col': 0             }: good
{'header':   2, 'index_col': 'SampleNumber'}: good
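
One situation that is known to produce this kind of shift, offered only as a guess because the pasted CSV does not show it: if every data row ends with an extra delimiter, pandas sees one more field per row than the header has, treats the first field as an implicit index, and leaves the last named column as NaN. The sketch below uses a made-up, narrowed sample (the trailing commas are an assumption) and shows that index_col=False turns off that inference.

```python
import io
import pandas as pd

# Hypothetical sample: same layout as the question's file, but every data
# row ends with an extra comma. This is an assumption about the real files,
# not something visible in the pasted CSV.
raw = (
    "Start of new file:,,,\n"
    "MISCOUNT: 0,,,\n"
    "SampleNumber,C0,C1,C2\n"
    "0,3472,3030,2813,\n"
    "1,2582,2581,2576,\n"
)

# Default behaviour: the extra field makes pandas use the first column as
# an implicit index, so every named column shifts left and C2 is all NaN.
shifted = pd.read_csv(io.StringIO(raw), skiprows=2)
print(shifted)

# index_col=False disables the implicit index, so names and data line up
# again; SampleNumber can then be set as the index explicitly.
fixed = pd.read_csv(io.StringIO(raw), skiprows=2, index_col=False)
fixed = fixed.set_index("SampleNumber")
print(fixed)
```

If this turns out to match the real files, the same keyword arguments should work with the file path in place of io.StringIO(raw).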
