I'm trying to plot actual vs predicted values using matplotlib.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
path = '/content/drive/MyDrive/ML_DATASETS/energy.csv'
data = pd.read_csv(path)
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['time_num'] = range(len(data))
X = data[['time_num', 'temp']]
y = data['load']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
plt.figure(figsize=(15, 5))
plt.scatter(data['timestamp'].iloc[X_test], y_test, s=5, label="Actual")
plt.scatter(data['timestamp'].iloc[X_test], y_pred, s=5, color='red', label="Predicted")
plt.xlabel("Datetime")
plt.ylabel("Load")
plt.title("Energy Load:Actual vs Predicted")
plt.legend()
plt.tight_layout()
plt.show()
Error:
IndexError Traceback (most recent call last)
/tmp/ipykernel_8426/3212058888.py in <cell line: 0>()
1 plt.figure(figsize=(15, 5))
----> 2 plt.scatter(data['timestamp'].iloc[X_test], y_test, s=5, alpha=0.5, label="Actual")
3 plt.scatter(data['timestamp'].iloc[X_test], y_pred, s=5, alpha=0.5, color='red', label="Predicted")
4 plt.xlabel("Datetime")
5 plt.ylabel("Load (MW)")
1 frames
/usr/local/lib/python3.12/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1721 key = slice(None)
1722 elif isinstance(key, ABCDataFrame):
-> 1723 raise IndexError(
1724 "DataFrame indexer is not allowed for .iloc\n"
1725 "Consider using .loc for automatic alignment."
IndexError: DataFrame indexer is not allowed for .iloc
Consider using .loc for automatic alignment.
<Figure size 1500x500 with 0 Axes>
I suspect the issue is with X_test indexing, but I'm not sure how to fix it.
I tried converting X_test to a list and using .loc instead of .iloc, but it didn't work.
Any help would be appreciated!

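Why index with `X_test`? `iloc` means "integer location": it expects integer row positions, not labels or other values, and `X_test` is a whole DataFrame of feature values, which is exactly what the error message complains about. That is also why swapping in `.loc` alone did not help. `X_test` still carries the row labels of `data` in `X_test.index`, so you can look the timestamps up with `data.loc[X_test.index, 'timestamp']`. Or maybe you need only `X_test['timestamp']`, which would be the simplest, but then you have to keep `'timestamp'` in `X = data[['time_num', 'temp', 'timestamp']]` and drop that column again before calling `fit` and `predict`, since it is not one of the model's features.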
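
Here is a minimal sketch of the index-based lookup. Since `energy.csv` is not available here, it builds a small synthetic frame with the same column names (`timestamp`, `temp`, `load`). The values, the daily frequency and `random_state=0` are assumptions made only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for energy.csv (assumed columns: timestamp, temp, load)
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="D"),
    "temp": rng.normal(20, 5, n),
})
data["load"] = 50 + 2 * data["temp"] + rng.normal(0, 1, n)
data["time_num"] = range(len(data))

X = data[["time_num", "temp"]]
y = data["load"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# train_test_split keeps the original row labels, so X_test.index
# identifies which rows of `data` ended up in the test set.
test_times = data.loc[X_test.index, "timestamp"]
```

`test_times` is in the same row order as `y_test` and `y_pred`, so it can replace `data['timestamp'].iloc[X_test]` as the x argument in both `plt.scatter` calls.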