
I tried to run:

import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.layers import TextVectorization
from sklearn.model_selection import train_test_split
from tensorflow import keras 

from nltk.tokenize.treebank import TreebankWordTokenizer, TreebankWordDetokenizer
from sklearn.feature_extraction.text import CountVectorizer

dataf=pd.read_csv('D:/datafile.csv')
data=pd.read_csv("D:/dataset1c2f4b7/dataset/train.csv",encoding='latin-1')
l=[]
for a in dataf['text']:
    l.append(a)
m=[]
for a in dataf['target']:
    m.append(a)

X_train, X_test, y_train, y_test = train_test_split(l, m, test_size=0.2, random_state=42)

vectorizer = CountVectorizer()
vectorizer.fit(X_train)
X_train = vectorizer.transform(X_train)
X_test = vectorizer.transform(X_test)
X_train=np.array(X_train)
X_test=np.array(X_test)
y_train=np.array(y_train)
y_test=np.array(y_test)
print(X_train)
model = keras.models.Sequential() 
model.add(keras.layers.Embedding(10000, 128)) 
model.add(keras.layers.SimpleRNN(64, return_sequences=True)) 
model.add(keras.layers.SimpleRNN(64)) 
model.add(keras.layers.Dense(128, activation="relu")) 
model.add(keras.layers.Dropout(0.4)) 
model.add(keras.layers.Dense(1, activation="sigmoid")) 
model.summary() 




model.compile("rmsprop", 
              "binary_crossentropy", 
              metrics=["accuracy"])
model.fit(X_train, y_train,epochs=5,verbose=False,validation_data=(X_test, y_test),batch_size=10)
model.save('gfgModel.h5')  
tf.saved_model.save(model, 'one_step 05')

This shows

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type csr_matrix)

I am trying to create a text classifier.

I was just expecting the model to be trained as everything is in array form.

  • Are you using Numpy 2.1.1?
    – user4136999
    Commented Oct 5, 2024 at 17:27
  • Where is the error? What's the problem array? What is creating scipy sparse csr_matrices? Is any of this your code or is it all borrowed from some tutorial?
    – hpaulj
    Commented Oct 5, 2024 at 18:28
  • It is my code and not borrowed from tutorials. Commented Oct 5, 2024 at 19:27
  • Then you know the nature of the arrays such as X_train. Remember, you can only make a tensor from a numeric dtype array.
    – hpaulj
    Commented Oct 5, 2024 at 20:07

1 Answer


I probably should wait till you reply to my questions, but here are a couple of observations.

This loop, while it works:

l=[]
for a in dataf['text']:
    l.append(a)

is slower than necessary.

dataf['text'] is a pandas Series. That might be usable directly. Otherwise try dataf['text'].to_list() or even dataf['text'].to_numpy(). Check the docs to verify the method names.
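For example, a minimal sketch of the same thing (assuming dataf really does have 'text' and 'target' columns, as in the question):

# replaces the two append loops above
l = dataf['text'].to_list()      # list of strings
m = dataf['target'].to_list()    # list of labels

dataf['text'].to_numpy() would give a numpy array instead, if that is what the rest of the pipeline expects.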

According to the docs, CountVectorizer() produces a sparse csr_matrix. If so, np.array(X_train) will give you an object-dtype array that still contains csr_matrix objects, not a dense numeric array. That isn't the correct way to convert a sparse matrix into a dense array.

anArray = sparse_matrix.toarray()

is the correct way

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.toarray.html#scipy.sparse.csr_matrix.toarray
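Applied to the question's code, the densifying step would look something like this (a sketch only; .toarray() materialises the full dense matrix, which can get very large for a big vocabulary):

X_train = vectorizer.transform(X_train).toarray()   # 2-D numeric ndarray
X_test = vectorizer.transform(X_test).toarray()
y_train = np.array(y_train)
y_test = np.array(y_test)

If memory becomes a problem, limiting the vocabulary with CountVectorizer(max_features=...) keeps the dense array at a manageable size.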

My guess is that the error occurs deep in the keras model calls, at the point where it tries to convert the input arrays into tensors. That should be evident if you look at the full context of the error message. When we ask for the full error (with traceback), we want YOU to look at it as well and try to deduce what is happening. Tensor conversion normally requires a numeric-dtype array. If my guess above is right, your X_train etc. are object-dtype arrays.

Check the shape and dtype of arrays in all cases like this.
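For instance, a couple of prints like these right before model.fit should make the problem obvious (an object dtype or an unexpected shape points straight at the conversion issue):

print(type(X_train), X_train.shape, X_train.dtype)
print(type(y_train), y_train.shape, y_train.dtype)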

I see you have a print(X_train). Didn't you see anything unusual in that print? That's another thing: I like to see the results of prints like this.
