I have two data series, model predictions (HumFc) and observations (HumOb), and I can make line plots of both. I would like to add a linear regression fit of the two series and show metrics as annotations on the plot. I need help configuring the fit formula: currently it is not accepted (it gives an error). The MAE (-18.1903) and R² (-6.4282) values I get also look incorrect.
I have not been able to solve this from similar posts; any help is appreciated.
Data (Hum_h5d_06.csv):
Date,HumFc,HumOb
20260201, 74.5, 78.2
20260202, 71.4, 93.9
20260203, 60.1, 80.2
20260204, 67.9, 91.4
20260205, 71.4, 89.5
20260206, 62.9, 97.7
20260207, 64.5, 89.7
20260208, 76.1, 88.2
20260209, 75.8, 83.7
20260210, 73.8, 90.2
20260211, 65.4, 89.9
20260212, 50.4, 80.7
20260213, 60.8, 75.6
20260214, 65.0, 93.9
20260215, 64.3, 85.3
20260216, 69.1, 86.2
20260217, 74.0, 95.0
20260218, 81.8, 87.7
20260219, 71.6, 89.9
20260220, 65.9, 86.5
20260221, 52.9, 90.9
20260222, 86.2, 87.8
20260223, 75.4, 68.9
20260224, 80.2, 87.6
20260225, 70.4, 90.6
20260226, 70.4, 87.2
20260227, 74.8, 86.1
20260228, 65.2, 84.9
20260301, 65.6, 94.5
20260302, 71.9, 88.1
20260303, 68.4, 92.0
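To reproduce the problem without the file, a minimal sketch can read a few of the sample rows from an inline string. The `csv_text` variable and the use of `io.StringIO` are illustrative choices, not part of the original script. It also shows that one `pd.to_datetime` call with `format='%Y%m%d'` is enough to get real dates; the "Feb 01" look is a display setting for the axis, not a second conversion.

```python
import io

import pandas as pd

# First rows of the sample data, inlined so the snippet runs without the CSV file.
# In practice, pass the real file path to read_csv instead of io.StringIO(csv_text).
csv_text = """Date,HumFc,HumOb
20260201, 74.5, 78.2
20260202, 71.4, 93.9
20260203, 60.1, 80.2
20260204, 67.9, 91.4
"""

# skipinitialspace=True skips the space that follows each comma in the data.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Convert the yyyymmdd integers to datetimes once; the "%b %d" label style is
# applied later by matplotlib's DateFormatter, not by another to_datetime call.
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

print(df.dtypes)
print(df.head())
```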
Code:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv('/Hum_h5d_06.csv', skipinitialspace=True)
print(df.columns)
# Parse the yyyymmdd values into datetimes; the 'Feb 01'-style labels come from the DateFormatter set on the x-axis
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')
#1. Create the regression model
model = LinearRegression()
#2. Fit HumFc (forecast) as a linear function of HumOb (observations)
X = np.array(df['HumOb']).reshape((-1, 1))
Y = df['HumFc']
model.fit(X, Y)
#3. Calculate metrics; sklearn metric functions take (y_true, y_pred), i.e. observations first
RMSE = np.sqrt(mean_squared_error(df['HumOb'], df['HumFc']))
print(f"RMSE: {RMSE:.4f}")
MAE = mean_absolute_error(df['HumOb'], df['HumFc'])
print(f"MAE: {MAE:.4f}")
RSQ = r2_score(df['HumOb'], df['HumFc'])
print(f"RSQ: {RSQ:.4f}")
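A sketch of how the metrics could be computed, using the 31 dated rows of the sample data as NumPy arrays (`hum_fc` and `hum_ob` are illustrative names). It illustrates two points. First, `mean_absolute_error` cannot return a negative number; for these rows the mean of HumFc - HumOb is -18.1903, the value reported above as MAE, which suggests that number is the mean error (bias) rather than the MAE. Second, `r2_score(y_true, y_pred)` compares the raw forecast against the variance of `y_true`, so it is negative whenever the forecast does worse than simply predicting the mean, and swapping the arguments changes the result. The R² of the regression line itself is `model.score(X, y)`, which for an ordinary least-squares fit with an intercept lies between 0 and 1 and equals the squared Pearson correlation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# HumFc and HumOb for 2026-02-01 .. 2026-03-03 (31 days), copied from the sample data.
hum_fc = np.array([
    74.5, 71.4, 60.1, 67.9, 71.4, 62.9, 64.5, 76.1, 75.8, 73.8, 65.4,
    50.4, 60.8, 65.0, 64.3, 69.1, 74.0, 81.8, 71.6, 65.9, 52.9, 86.2,
    75.4, 80.2, 70.4, 70.4, 74.8, 65.2, 65.6, 71.9, 68.4,
])
hum_ob = np.array([
    78.2, 93.9, 80.2, 91.4, 89.5, 97.7, 89.7, 88.2, 83.7, 90.2, 89.9,
    80.7, 75.6, 93.9, 85.3, 86.2, 95.0, 87.7, 89.9, 86.5, 90.9, 87.8,
    68.9, 87.6, 90.6, 87.2, 86.1, 84.9, 94.5, 88.1, 92.0,
])

# sklearn metrics take (y_true, y_pred): observations first, forecast second.
rmse = np.sqrt(mean_squared_error(hum_ob, hum_fc))
mae = mean_absolute_error(hum_ob, hum_fc)   # mean of |error|, never negative
bias = np.mean(hum_fc - hum_ob)             # mean error; negative means the forecast is too low
r2_forecast = r2_score(hum_ob, hum_fc)      # raw forecast vs. variance of the observations; can be < 0

# R² of the regression line HumFc ~ HumOb, as fitted in the question's code.
X = hum_ob.reshape(-1, 1)
model = LinearRegression().fit(X, hum_fc)
r2_fit = model.score(X, hum_fc)             # same as r2_score(hum_fc, model.predict(X))

print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")
print(f"Bias (mean HumFc - HumOb): {bias:.4f}")
print(f"R² of raw forecast: {r2_forecast:.4f}")
print(f"R² of linear fit: {r2_fit:.4f}")
```

Which R² belongs on the plot depends on the question being asked: the raw-forecast value measures forecast skill, while the fit value measures how well a straight line relates the two series.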
#4. Create the plot and get the axes object
fig, ax = plt.subplots()
#5. Plot both series against date (each series once)
ax.plot(df['Date'], df['HumOb'], color='b', label='HumOb')
ax.plot(df['Date'], df['HumFc'], color='r', label='HumFc')
#6. Format the x-axis dates as e.g. "Feb 01"
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
# Optional: rotate the date labels for better readability
fig.autofmt_xdate()
ax.set_title('5-Day Humidity Forecast and AWS Recorded Humidity')
ax.set_xlabel('Date_of_Forecast')
ax.set_ylabel('Humidity')
ax.legend(loc="lower right", bbox_to_anchor=(1.01, 0.04), fontsize=10)
ax.grid(True)
#7. Annotation text
metrics_text = (
    f"RMSE: {RMSE:.3f}\n"
    f"MAE: {MAE:.3f}\n"
    f"RSQ: {RSQ:.3f}"
)
#8. Add the metrics as an annotation on the plot
plt.annotate(
    metrics_text,
    xy=(0.025, 0.065),  # position relative to the axes: (0, 0) is bottom left, (1, 1) is top right
    xycoords='axes fraction',
    fontsize=8,
    bbox=dict(boxstyle="round,pad=0.5", fc="white", alpha=0.5),
)
plt.show()
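Assuming "a linear regression fit of the two data series" means regressing HumFc on HumOb, as the `LinearRegression` model in the code does, the fitted line is easiest to read on a scatter plot of one series against the other rather than on the date axis. A minimal sketch under that assumption: the fit formula is just an ordinary Python string built from `model.coef_` and `model.intercept_`, passed to `annotate` together with the metrics. Names such as `formula`, `sign` and `x_line` are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# HumFc and HumOb for 2026-02-01 .. 2026-03-03 (31 days), copied from the sample data.
hum_fc = np.array([
    74.5, 71.4, 60.1, 67.9, 71.4, 62.9, 64.5, 76.1, 75.8, 73.8, 65.4,
    50.4, 60.8, 65.0, 64.3, 69.1, 74.0, 81.8, 71.6, 65.9, 52.9, 86.2,
    75.4, 80.2, 70.4, 70.4, 74.8, 65.2, 65.6, 71.9, 68.4,
])
hum_ob = np.array([
    78.2, 93.9, 80.2, 91.4, 89.5, 97.7, 89.7, 88.2, 83.7, 90.2, 89.9,
    80.7, 75.6, 93.9, 85.3, 86.2, 95.0, 87.7, 89.9, 86.5, 90.9, 87.8,
    68.9, 87.6, 90.6, 87.2, 86.1, 84.9, 94.5, 88.1, 92.0,
])

# Fit HumFc as a linear function of HumOb, as the question's model does.
X = hum_ob.reshape(-1, 1)
model = LinearRegression().fit(X, hum_fc)
slope = model.coef_[0]
intercept = model.intercept_
r2_fit = model.score(X, hum_fc)

# The fit formula is a formatted string; show the intercept's sign explicitly.
sign = "+" if intercept >= 0 else "-"
formula = f"HumFc = {slope:.3f} * HumOb {sign} {abs(intercept):.3f}"
metrics_text = (
    f"{formula}\n"
    f"R² (fit): {r2_fit:.3f}\n"
    f"MAE: {mean_absolute_error(hum_ob, hum_fc):.3f}"
)

fig, ax = plt.subplots()
ax.scatter(hum_ob, hum_fc, color="b", label="HumFc vs HumOb")

# Draw the fitted line across the observed range.
x_line = np.linspace(hum_ob.min(), hum_ob.max(), 100)
ax.plot(x_line, model.predict(x_line.reshape(-1, 1)), color="r", label="Linear fit")

ax.set_xlabel("HumOb (observed humidity)")
ax.set_ylabel("HumFc (forecast humidity)")
ax.set_title("Forecast vs observed humidity")
ax.grid(True)
ax.legend(loc="lower right", fontsize=10)

# Upper-left corner; va="top" makes the text box hang down from that point.
ax.annotate(
    metrics_text,
    xy=(0.025, 0.975),
    xycoords="axes fraction",
    va="top",
    fontsize=8,
    bbox=dict(boxstyle="round,pad=0.5", fc="white", alpha=0.5),
)
plt.show()
```

To keep the time-series figure as well, the same `formula` string could simply be added as another line of `metrics_text` in the existing script.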
