I'm trying to plot actual vs predicted values using matplotlib.

Here is my code:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

path = '/content/drive/MyDrive/ML_DATASETS/energy.csv'
data = pd.read_csv(path)

data['timestamp'] = pd.to_datetime(data['timestamp'])

data['time_num'] = range(len(data))

X = data[['time_num', 'temp']]
y = data['load']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

plt.figure(figsize=(15, 5))
plt.scatter(data['timestamp'].iloc[X_test], y_test, s=5,label="Actual")
plt.scatter(data['timestamp'].iloc[X_test], y_pred, s=5, color='red', label="Predicted")
plt.xlabel("Datetime")
plt.ylabel("Load")
plt.title("Energy Load:Actual vs Predicted")
plt.legend()
plt.tight_layout()
plt.show()
Error:
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_8426/3212058888.py in <cell line: 0>()
      1 plt.figure(figsize=(15, 5))
----> 2 plt.scatter(data['timestamp'].iloc[X_test], y_test, s=5, alpha=0.5, label="Actual")
      3 plt.scatter(data['timestamp'].iloc[X_test], y_pred, s=5, alpha=0.5, color='red', label="Predicted")
      4 plt.xlabel("Datetime")
      5 plt.ylabel("Load (MW)")

1 frames
/usr/local/lib/python3.12/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1721             key = slice(None)
   1722         elif isinstance(key, ABCDataFrame):
-> 1723             raise IndexError(
   1724                 "DataFrame indexer is not allowed for .iloc\n"
   1725                 "Consider using .loc for automatic alignment."

IndexError: DataFrame indexer is not allowed for .iloc
Consider using .loc for automatic alignment.
<Figure size 1500x500 with 0 Axes>

I suspect the issue is with X_test indexing, but I'm not sure how to fix it.

I tried converting X_test to a list and using .loc instead of .iloc, but it didn't work.

Any help would be appreciated!

  • What do you have in X_test? iloc means integer-location indexing: it needs integer row positions, not names or other values. Commented Apr 2 at 18:14
  • Did you try inverting the order, data.iloc[X_test]['timestamp']? Or maybe you only need X_test['timestamp']. That would be the simplest, but it requires keeping 'timestamp' in X: X = data[['time_num', 'temp', 'timestamp']] Commented Apr 2 at 18:17

1 Answer

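You probably need X_test.index, which holds the row labels of the test rows (X_test itself is a DataFrame, and .iloc does not accept a DataFrame as an indexer):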

data['timestamp'].iloc[X_test.index]

However, it is safer to use .loc, because the index does not always run from 0 to len(data)-1. It may hold other labels, e.g. 10 to len(data)+9 (after data.index = range(10, len(data) + 10)), and then .iloc treats those labels as positions and either picks the wrong rows or raises an out-of-bounds IndexError.

data['timestamp'].loc[X_test.index]
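
To make the difference concrete, here is a minimal sketch. The toy frame and the `test_labels` list are hypothetical stand-ins for `data` and `X_test.index`; the index deliberately starts at 10 so that labels and positions disagree.

```python
import pandas as pd

# Hypothetical toy frame whose index labels run 10..14 instead of 0..4.
data = pd.DataFrame(
    {"timestamp": pd.date_range("2026-04-01", periods=5, freq="D")},
    index=range(10, 15),
)

# Stand-in for X_test.index: labels of the rows that ended up in the test set.
test_labels = [12, 14]

# .loc matches index labels, so it returns the intended rows.
by_label = data["timestamp"].loc[test_labels]
print(by_label)

# .iloc treats 12 and 14 as positions, which a 5-row Series does not have.
iloc_failed = False
try:
    data["timestamp"].iloc[test_labels]
except IndexError as exc:
    iloc_failed = True
    print("iloc failed:", exc)
```

With a default 0..n-1 index both lookups return the same rows, which is why the `.iloc` version appears to work until the index changes.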

Minimal working code, with the example data embedded directly in the code so that everyone can simply copy and run it:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# path = '/content/drive/MyDrive/ML_DATASETS/energy.csv'
# data = pd.read_csv(path)

import io

text = """timestamp,temp,load
2026.04.01,12,1
2026.04.02,9,0
2026.04.03,15,1
2026.04.04,4,0
2026.04.05,10,1
2026.04.06,10,1
2026.04.07,15,1
2026.04.08,7,0
2026.04.09,5,0
2026.04.10,11,1
"""
data = pd.read_csv(io.StringIO(text))


data["timestamp"] = pd.to_datetime(data["timestamp"])

data["time_num"] = range(10, len(data) + 10)
# data.index = range(10, len(data) + 10)  # uncomment to see why `iloc` is the wrong idea
print(data)

X = data[["time_num", "temp"]]
y = data["load"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(X_test)

plt.figure(figsize=(15, 5))
plt.scatter(data["timestamp"].loc[X_test.index], y_test, s=5, label="Actual")
plt.scatter(data["timestamp"].loc[X_test.index], y_pred, s=5, color="red", label="Predicted")
plt.xlabel("Datetime")
plt.ylabel("Load")
plt.title("Energy Load:Actual vs Predicted")
plt.legend()
plt.tight_layout()
plt.show()

Result (for 10 values and test_size=0.4):

[image: scatter plot produced by the code above]
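
As an optional variation, not part of the code above: `model.predict` returns a plain NumPy array in the same row order as `X_test`, so one way to keep everything aligned is to wrap it in a Series that carries `X_test.index` and collect the columns in a single DataFrame. The sketch below reuses the example data; the names `results` and `predicted`, the fixed `random_state`, and the explicit date format are assumptions made for illustration.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

text = """timestamp,temp,load
2026.04.01,12,1
2026.04.02,9,0
2026.04.03,15,1
2026.04.04,4,0
2026.04.05,10,1
2026.04.06,10,1
2026.04.07,15,1
2026.04.08,7,0
2026.04.09,5,0
2026.04.10,11,1
"""
data = pd.read_csv(io.StringIO(text))
# Explicit format is an assumption so parsing does not depend on inference.
data["timestamp"] = pd.to_datetime(data["timestamp"], format="%Y.%m.%d")
data["time_num"] = range(len(data))

X = data[["time_num", "temp"]]
y = data["load"]

# random_state is fixed only to make this sketch reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Give the predictions the same labels as X_test so pandas aligns by label.
predicted = pd.Series(model.predict(X_test), index=X_test.index)

results = pd.DataFrame(
    {
        "timestamp": data.loc[X_test.index, "timestamp"],
        "actual": y_test,
        "predicted": predicted,
    }
).sort_values("timestamp")
print(results)
```

`results["timestamp"]`, `results["actual"]` and `results["predicted"]` can then be passed to `plt.scatter` in place of the separate lookups.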
