I am trying to plot my data as a scatter plot, but I can't get the three cluster plots into one image. How do I solve this?

nu_cluster = 3
kmeans  = KMeans(n_clusters=nu_cluster,random_state=0)
data_df["cluster"] = kmeans.fit_predict(X_std)
print("after Kmeans predict")# visualization
plt.figure(figsize=(8, 6))
for i in range(nu_cluster):
    cluster_data = data_df[data_df["cluster"] == i]#return a boolean and then passed to data_df
    plt.scatter(cluster_data["charges"], cluster_data["age"],c=[plt.cm.viridis(i / (nu_cluster - 1))] ,label=f"Cluster {i + 1}")
    plt.xlabel("Charges")
    plt.ylabel("Age")
    plt.title("Cluster of age against charges", fontsize=16, fontweight="bold")
    plt.legend(loc="lower right")
    plt.show()

I tried using the figure() function provided by matplotlib, e.g. plt.figure(figsize=(8,7)), but with no success.

  • Also, in which cases are scatter plots used? I am trying to understand their value for visualization compared to bar or line graphs, which are easy to understand and analyze.
    – alson
    Commented Mar 7, 2024 at 10:49

3 Answers

Move the plt.show() call outside of the loop. Once you call plt.show(), the current figure is "discarded", and the subsequent plot commands create a new figure automatically. That is why each cluster ends up in its own image.

Better yet, use the object-based/explicit interface of Matplotlib:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

nu_cluster = 3
kmeans = KMeans(n_clusters=nu_cluster, random_state=0)
data_df["cluster"] = kmeans.fit_predict(X_std)
print("after Kmeans predict")

# visualization: one Figure and one Axes shared by every cluster
fig, ax = plt.subplots(figsize=(8, 6))
for i in range(nu_cluster):
    # the boolean mask selects the rows that belong to cluster i
    cluster_data = data_df[data_df["cluster"] == i]
    ax.scatter(cluster_data["charges"], cluster_data["age"],
               c=[plt.cm.viridis(i / (nu_cluster - 1))], label=f"Cluster {i + 1}")
# labels, title and legend only need to be set once, after the loop
ax.set_xlabel("Charges")
ax.set_ylabel("Age")
ax.set_title("Cluster of age against charges", fontsize=16, fontweight="bold")
ax.legend(loc="lower right")
plt.show()  # in a plain script, plt.show() is more reliable than fig.show()
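To try the explicit interface without the original data, here is a minimal, self-contained sketch. The arrays charges, age and cluster are synthetic stand-ins for the question's DataFrame columns (an assumption, not the asker's data), and the Agg backend line is only there so the snippet also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
nu_cluster = 3

# Synthetic stand-ins for the "charges", "age" and "cluster" columns (made-up data)
charges = rng.uniform(1_000, 50_000, size=90)
age = rng.uniform(18, 65, size=90)
cluster = np.repeat(np.arange(nu_cluster), 30)

fig, ax = plt.subplots(figsize=(8, 6))
for i in range(nu_cluster):
    mask = cluster == i  # boolean mask for the points in cluster i
    ax.scatter(charges[mask], age[mask],
               color=plt.cm.viridis(i / (nu_cluster - 1)),
               label=f"Cluster {i + 1}")

# Set once, after all clusters have been drawn on the same Axes
ax.set_xlabel("Charges")
ax.set_ylabel("Age")
ax.set_title("Cluster of age against charges", fontsize=16, fontweight="bold")
ax.legend(loc="lower right")
# plt.show()  # call once here when running interactively
```

Every scatter call adds one collection to the same Axes, so all three clusters share a single image.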
  • Thank you for the help. I solved the issue by removing plt.show() from the for loop.
    – alson
    Commented Mar 7, 2024 at 10:44
  • @alson If your problem is resolved, please also mark the question as answered. Commented Mar 7, 2024 at 10:47

You want to use pyplot's subplots() function to create multiple Axes (individual plots) in the same figure. subplots() returns two objects: the Figure and, when more than one subplot is requested, a NumPy array of Axes.

Basically:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(3)
axs[0].plot([1,2,3])
axs[1].plot([1,2,3], [1,2,3])
axs[2].plot([1,2,3], [2,4,6])

plt.show()
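Applied to the clustering question, the same idea could put each cluster in its own panel of one figure. This is only a sketch: the data is synthetic (made up to stand in for the charges, age and cluster columns), and the side-by-side layout with shared axes is my choice, not something the question asked for.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
nu_cluster = 3

# Synthetic stand-ins for the question's columns (made-up data)
charges = rng.uniform(1_000, 50_000, size=90)
age = rng.uniform(18, 65, size=90)
cluster = np.repeat(np.arange(nu_cluster), 30)

# One row, one column per cluster; axs is a 1-D NumPy array of Axes
fig, axs = plt.subplots(1, nu_cluster, figsize=(12, 4), sharex=True, sharey=True)
for i, ax in enumerate(axs):
    mask = cluster == i
    ax.scatter(charges[mask], age[mask], color=plt.cm.viridis(i / (nu_cluster - 1)))
    ax.set_title(f"Cluster {i + 1}")
    ax.set_xlabel("Charges")
axs[0].set_ylabel("Age")  # the y axis is shared, so one label is enough
fig.tight_layout()
# plt.show()  # call once here when running interactively
```

Separate panels make each cluster easier to inspect on its own, while a single shared Axes makes the clusters easier to compare.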

  • I have never used subplots. I am new to data science and currently in year 2.2 of my studies, but I will look into it. Thanks.
    – alson
    Commented Mar 7, 2024 at 10:46

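On every iteration of your loop you call plt.show(), which displays the current figure and then starts a fresh one, so the earlier clusters are not carried over. Think of it this way: your plot isn't 'saved' anywhere between iterations.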

All you need in the loop is:

plt.figure(figsize=(8, 6))
for i in range(nu_cluster):
    cluster_data = data_df[data_df["cluster"] == i]
    plt.scatter(cluster_data["charges"], cluster_data["age"], c=[plt.cm.viridis(i / (nu_cluster - 1))], label=f"Cluster {i + 1}")

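Move the rest of the code (the axis labels, title, legend and plt.show()) out of the loop and run it once after the loop finishes. It's not efficient to keep naming your graph again and again, so try to keep only the code that changes per iteration inside the loop.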
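If the loop exists only to colour the points by cluster, one possible alternative (not what this answer proposes, just a sketch) is a single scatter call that takes the cluster labels through c= and builds the legend with the collection's legend_elements() method. The data here is synthetic and stands in for the question's columns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
nu_cluster = 3

# Synthetic stand-ins for the question's columns (made-up data)
charges = rng.uniform(1_000, 50_000, size=90)
age = rng.uniform(18, 65, size=90)
cluster = np.repeat(np.arange(nu_cluster), 30)

fig, ax = plt.subplots(figsize=(8, 6))
# One call draws every point; the integer labels are mapped to colours via the colormap
points = ax.scatter(charges, age, c=cluster, cmap="viridis")

# legend_elements() returns one handle per distinct label value
handles, _ = points.legend_elements()
ax.legend(handles, [f"Cluster {i + 1}" for i in range(nu_cluster)], loc="lower right")
ax.set_xlabel("Charges")
ax.set_ylabel("Age")
ax.set_title("Cluster of age against charges", fontsize=16, fontweight="bold")
# plt.show()  # call once here when running interactively
```

With no loop there is nothing to place inside or outside it, at the cost of slightly less control over per-cluster styling.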

To answer your comment: scatter plots are good for visualising the relationship between two continuous variables. You can overlay a fitted line if there appears to be a trend, but if there is no linear or polynomial relationship between your variables, the line is redundant. Consider how little a line plot would add to a cluster graph.
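As a rough illustration of adding a line when there is a trend, the sketch below fits a straight line with np.polyfit and draws it over the scatter. The data is invented with a deliberately linear relationship (the slope of 250 and the noise level are arbitrary assumptions), so the line is meaningful here. On real clustered data it may not be.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Made-up data with a linear trend plus noise (not the asker's dataset)
age = rng.uniform(18, 65, size=100)
charges = 250 * age + rng.normal(0, 1_000, size=100)

# Least-squares straight line: charges ≈ slope * age + intercept
slope, intercept = np.polyfit(age, charges, deg=1)

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(age, charges, alpha=0.6, label="Observations")
xs = np.linspace(age.min(), age.max(), 50)
ax.plot(xs, slope * xs + intercept, color="black", label="Linear fit")
ax.set_xlabel("Age")
ax.set_ylabel("Charges")
ax.legend(loc="upper left")
# plt.show()  # call once here when running interactively
```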
