$\begingroup$

I am new to data science. I am trying to write a program that uses regression, and all of my values are numerical except for the date and time (UTC), which are written in this format: HH:MM:SS MM/DD/YY. The date and time are a column in a CSV file, and I do not know how to alter that column. I have looked for ways to convert it to a numerical value, but every result I found puts the date before the time. Other than that, I am having a hard time finding people that changed more than a single date. If anyone could guide me on how to make the time and date usable as input to LinearRegression().fit() from sklearn.linear_model, I would greatly appreciate it.

P.S. Do I even have to convert it to a number? Can I keep it as the date and time or do I need to convert it?

EDIT:

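import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the CSV and drop the non-numeric columns that are not needed.
algaeData = pd.read_csv(r'my_file').drop(columns=['Type', 'Device Type', 'Device S/N', 'Mooring', 'MRPT & NOTES'])
# Parse the "HH:MM:SS MM/DD/YY" strings into datetimes.
algaeData['Date (UTC)'] = pd.to_datetime(algaeData['Date (UTC)'], format='%H:%M:%S %m/%d/%y')

# Predictors are every column except the target.
x = algaeData.drop(columns=['BGA (ug/L) (ug/L)'])
y = algaeData['BGA (ug/L) (ug/L)']
x, y = np.array(x), np.array(y)

model = LinearRegression().fit(x, y)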
$\endgroup$

1 Answer

$\begingroup$

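If you're using pandas, you can convert your column pretty easily using

df['col'] = pd.to_datetime(df['col'], format='%H:%M:%S %m/%d/%y')

That will read your dates as a datetime64[ns] column. Note that %y matches the two-digit year in MM/DD/YY, while %Y expects a four-digit year. LinearRegression needs numeric input, though, so fitting on the datetime column as-is can fail (as the float()/Timestamp error reported in the comments shows). Convert it to a number first, for example the seconds elapsed since the earliest timestamp.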
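To make the parse-then-convert step concrete, here is a minimal sketch. The sample values and the `Temp`, `BGA` and `elapsed_s` column names are made up for illustration, and measuring elapsed seconds from the first sample is just one reasonable choice of numeric encoding, not the only one.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample data in the question's "HH:MM:SS MM/DD/YY" layout.
df = pd.DataFrame({
    "Date (UTC)": [
        "00:02:37 01/01/18",
        "00:12:37 01/01/18",
        "00:22:37 01/01/18",
        "00:32:37 01/01/18",
    ],
    "Temp": [10.0, 10.4, 11.1, 10.9],  # assumed extra numeric predictor
    "BGA": [1.0, 2.1, 2.9, 4.2],       # assumed target column
})

# %y matches a two-digit year; %Y would expect four digits.
stamps = pd.to_datetime(df["Date (UTC)"], format="%H:%M:%S %m/%d/%y")

# Seconds elapsed since the earliest sample: a plain float column.
df["elapsed_s"] = (stamps - stamps.min()).dt.total_seconds()

# Fit on numeric columns only; the original datetime column is left out.
X = df[["elapsed_s", "Temp"]].to_numpy()
y = df["BGA"].to_numpy()
model = LinearRegression().fit(X, y)
```

Measuring from the first sample rather than from the Unix epoch keeps the values small, which makes the fitted coefficient easier to read (change in the target per second of elapsed time).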

That said, I don't understand what you mean when you say:

"Other than that, I am having a hard time finding people that changed more than a single date."

$\endgroup$
  • $\begingroup$ I believe what you suggested works, but it returned "time data '00:02:37 01/01/18' does not match format '%H:%M:%S %m/%d/%Y'". Any idea why? $\endgroup$ Commented Nov 30, 2021 at 13:16
  • $\begingroup$ %Y is the year with century as a decimal number (e.g. 2018). Replace it with %y, the zero-padded year without century (e.g. 18). For a full format guide check this link docs.python.org/3/library/… $\endgroup$ Commented Nov 30, 2021 at 14:26
  • $\begingroup$ Thanks. I’ll work on that later and update you on it. $\endgroup$ Commented Nov 30, 2021 at 14:50
  • $\begingroup$ I did what you suggested and it seemed to have removed the previous error. However, I now get the error: "float() argument must be a string or a number, not 'Timestamp'". I have updated the question to show my code, so maybe you could see something wrong with it. All of the necessary libraries have been imported already. $\endgroup$ Commented Nov 30, 2021 at 22:28
  • $\begingroup$ I looked into it and surrounded the pd.to_datetime() function with a pd.to_numeric() function. That cleared that issue. However, I now have the issue that it cannot convert ' ' to string because there is nothing there in the data value. $\endgroup$ Commented Nov 30, 2021 at 22:34
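On the blank-value problem raised in the last comment, one possible approach (a sketch, not necessarily what the asker ended up doing) is to let pandas turn unparseable cells into NaT and then drop them before converting to numbers. The sample series below is made up for illustration.

```python
import pandas as pd

# Hypothetical raw column: one blank cell among valid "HH:MM:SS MM/DD/YY" strings.
raw = pd.Series(
    ["00:02:37 01/01/18", " ", "00:12:37 01/01/18"],
    name="Date (UTC)",
)

# errors="coerce" turns any value that does not match the format into NaT
# instead of raising an exception.
stamps = pd.to_datetime(raw, format="%H:%M:%S %m/%d/%y", errors="coerce")

# Drop the rows that could not be parsed, then convert to elapsed seconds.
valid = stamps.dropna()
elapsed_s = (valid - valid.min()).dt.total_seconds()
```

In a full DataFrame, the same idea would be to call `dropna(subset=[...])` on the parsed date column so that the matching rows of the other predictors and the target are removed as well.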
