I have an Excel file with information extracted from a platform called SAP. The dates normally come out as dd/mm/yyyy, but sometimes they are extracted as dd.mm.yyyy. When I load that column into a DataFrame with Python's pandas library, the formats get mixed up. This is the code I've been trying:
import pandas as pd
import re
from datetime import datetime
# Convert xlsx to csv
excel_data = pd.read_excel("Reportes/Crudos/Reporte SAP.xlsx", header=1)
excel_data['Asiento contable (Fecha de contabilización)'].to_csv("Reportes/Crudos/1.csv", index=False)
excel_data['Asiento contable (Fecha de contabilización)'].to_excel("Reportes/Crudos/1.xlsx", index=False)
# Print only the unique values, without repetition
print(excel_data['Asiento contable (Fecha de contabilización)'].unique())
The print gives me this:
[datetime.datetime(2025, 8, 1, 0, 0) datetime.datetime(2025, 9, 1, 0, 0)
'13/01/2025' '17/01/2025' datetime.datetime(2025, 3, 1, 0, 0)
'18/01/2025' '20/01/2025' datetime.datetime(2025, 10, 1, 0, 0)
'14/01/2025' datetime.datetime(2025, 2, 1, 0, 0)
datetime.datetime(2025, 6, 1, 0, 0) datetime.datetime(2025, 7, 1, 0, 0)
datetime.datetime(2025, 11, 1, 0, 0) datetime.datetime(2025, 4, 1, 0, 0)
'15/01/2025' '16/01/2025' datetime.datetime(2025, 12, 1, 0, 0)
datetime.datetime(2025, 5, 1, 0, 0) '19/01/2025'
datetime.datetime(2025, 1, 30, 0, 0) datetime.datetime(2025, 1, 31, 0, 0)
datetime.datetime(2025, 1, 28, 0, 0) datetime.datetime(2025, 1, 23, 0, 0)
datetime.datetime(2025, 1, 27, 0, 0) datetime.datetime(2025, 1, 29, 0, 0)
datetime.datetime(2025, 1, 22, 0, 0) datetime.datetime(2025, 1, 24, 0, 0)
datetime.datetime(2025, 1, 25, 0, 0) datetime.datetime(2025, 1, 21, 0, 0)
datetime.datetime(2025, 1, 26, 0, 0)]
Going in the generated csv and xlsx it gives me this:
csv:
1. 01/08/2025 00:00
2. 01/09/2025 00:00
3. 13/01/2025
4. 17/01/2025
5. 01/03/2025 00:00
6. 18/01/2025
7. 20/01/2025
8. 01/10/2025 00:00
9. 14/01/2025
10. 01/02/2025 00:00
11. 01/06/2025 00:00
12. 01/07/2025 00:00
13. 01/11/2025 00:00
14. 01/04/2025 00:00
15. 15/01/2025
16. 16/01/2025
17. 01/12/2025 00:00
18. 01/05/2025 00:00
19. 19/01/2025
20. 30/01/2025 00:00
21. 31/01/2025 00:00
22. 28/01/2025 00:00
23. 23/01/2025 00:00
24. 27/01/2025 00:00
25. 29/01/2025 00:00
26. 22/01/2025 00:00
27. 24/01/2025 00:00
28. 25/01/2025 00:00
29. 21/01/2025 00:00
30. 26/01/2025 00:00
xlsx:
1. 2025-08-01 00:00:00
2. 2025-09-01 00:00:00
3. 13/01/2025
4. 17/01/2025
5. 2025-03-01 00:00:00
6. 18/01/2025
7. 20/01/2025
8. 2025-10-01 00:00:00
9. 14/01/2025
10. 2025-02-01 00:00:00
11. 2025-06-01 00:00:00
12. 2025-07-01 00:00:00
13. 2025-11-01 00:00:00
14. 2025-04-01 00:00:00
15. 15/01/2025
16. 16/01/2025
17. 2025-12-01 00:00:00
18. 2025-05-01 00:00:00
19. 19/01/2025
20. 2025-01-30 00:00:00
21. 2025-01-31 00:00:00
22. 2025-01-28 00:00:00
23. 2025-01-23 00:00:00
24. 2025-01-27 00:00:00
25. 2025-01-29 00:00:00
26. 2025-01-22 00:00:00
27. 2025-01-24 00:00:00
28. 2025-01-25 00:00:00
29. 2025-01-21 00:00:00
30. 2025-01-26 00:00:00
Which means we have 3 types of format:
- dd/mm/yyyy hh:mm (csv) | dd-mm-yyyy hh:mm (excel)
- dd/mm/yyyy
- mm/dd/yyyy hh:mm (csv) | mm-dd-yyyy hh:mm (excel)
Looking closer, the dates from January 1 to 12 come out with this format: cell: mm/dd/yyyy hh:mm, formula bar: mm/dd/yyyy hh:mm:ss a. m.
The dates from the 13th to the 20th come with this format: cell: dd/mm/yyyy, formula bar: dd/mm/yyyy
And the dates from the 21st to the 31st come with this format: cell: dd/mm/yyyy hh:mm, formula bar: dd/mm/yyyy hh:mm:ss a. m.
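The three groups above can also be checked from pandas itself by looking at the Python type of each cell. This is only a minimal sketch: sample is a hypothetical stand-in for the real column, and classify is an illustrative helper, not part of pandas.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample mirroring the three kinds of cells in the printout above.
sample = pd.Series(
    [datetime(2025, 8, 1), "13/01/2025", datetime(2025, 1, 30)],
    dtype=object,
)


def classify(value):
    """Label a cell by how it arrived from the workbook."""
    if isinstance(value, str):
        return "text dd/mm/yyyy"
    if isinstance(value, datetime) and value.day <= 12:
        # Ambiguous: day and month may have been swapped on import.
        return "datetime, possibly swapped"
    return "datetime, day > 12"


labels = sample.map(classify)
print(labels.value_counts())
```

Running the same map on the real column would show how many rows fall into each group.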
I've tried this:
df["Asiento contable (Fecha de contabilización)"] = pd.to_datetime(df["Asiento contable (Fecha de contabilización)"], dayfirst=True)
But it doesn't recognize the dates correctly, and I end up with no data between the 1st and the 12th of January.
I want to know if there is a way to unify these 3 formats into just one: dd/mm/yyyy
pd.read_excel, like pd.to_datetime, assumes a format of MM/DD/YYYY. This is because the conversion to a datetime object always fails if the first 2 characters, converted to a number, are above 12. You could try the option dayfirst=True of pd.to_datetime.
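If dayfirst=True alone does not help because some cells already arrive as datetime objects, a per-cell cleanup may work. This is only a sketch under a strong assumption: every datetime cell whose day is 12 or lower had its day and month swapped on import, which matches the printout in the question (all real dates are in January) but will not hold for every export. normalize_sap_date and the sample Series raw are hypothetical names used for illustration.

```python
from datetime import datetime

import pandas as pd


def normalize_sap_date(value):
    """Turn one cell of the mixed column into a pandas Timestamp."""
    if pd.isna(value):
        return pd.NaT
    if isinstance(value, datetime):
        # ASSUMPTION: a day of 12 or lower means day and month were swapped.
        if value.day <= 12:
            return pd.Timestamp(year=value.year, month=value.day, day=value.month)
        return pd.Timestamp(value)
    if isinstance(value, str):
        # Text cells: accept both dd/mm/yyyy and dd.mm.yyyy.
        cleaned = value.strip().replace(".", "/")
        return pd.to_datetime(cleaned, format="%d/%m/%Y")
    return pd.NaT


# Hypothetical sample with the kinds of values shown in the question.
raw = pd.Series(
    [datetime(2025, 8, 1), "13/01/2025", datetime(2025, 1, 30), "14.01.2025"],
    dtype=object,
)

fixed = pd.to_datetime(raw.map(normalize_sap_date))
formatted = fixed.dt.strftime("%d/%m/%Y")
print(formatted.tolist())
# ['08/01/2025', '13/01/2025', '30/01/2025', '14/01/2025']
```

The swap is a heuristic: a date that was imported correctly and happens to have a day of 12 or lower would be altered too, so it is worth checking the result against a known range (for example, that every date falls in the expected month) before writing the file back.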