$\begingroup$

I am trying to classify sequences of 10 numbers with Keras and TensorFlow. A standard neural network doesn't seem to be an option. Here is my data:

X_train.shape

(8313, 10)

X_train

array([[13, 17,  6, ..., 14, 14, 13],
       [13, 13, 13, ...,  3, 14, 14],
       [17, 14, 14, ..., 17, 13, 17],
       ...,
       [ 6, 14, 13, ..., 13, 14, 14],
       [ 0,  5,  9, ..., 12,  5,  7],
       [13, 17, 14, ..., 13, 13, 13]])

As you can see, X has about 8k rows and 10 columns. The numbers in the array have no numeric meaning, so 14 is not 2 times 7. They are categorical and represent the sequence of steps a user took when interacting with the system: the first column is step 1, the second column is step 2, and so on.

The order is important. I want the model to distinguish between these sequences.

y is binary: 0 or 1.

Which model is appropriate for this scenario?

$\endgroup$

1 Answer

$\begingroup$
  • As the numbers don't represent real values (e.g. you cannot say that 1+2=3 for the features), you need to encode them as one-hot vectors (e.g. the number 3 is encoded as [0,0,0,1,0,...,0]).
  • Your observations then become sequences of 10 vectors. Each vector has the same dimensionality: the number of distinct categories, which is the maximum value in your data plus one when the codes start at 0. You can treat them as non-sequential data, i.e. concatenate the vectors and feed them directly to any standard classifier (logistic regression, a fully connected network with logistic regression at the output, SVM, kNN, etc.).
  • You can also try models specialized for sequential data: LSTM or GRU.
  • I would start with simpler methods (logistic regression) and consider LSTM or GRU only if they don't work well.
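
The encoding step above can be sketched with NumPy alone. This is only an illustration under stated assumptions: `one_hot_sequences` is a hypothetical helper, `X_demo` is a tiny stand-in for `X_train`, and 18 categories is a guess based on the largest value (17) visible in the printed excerpt; use the real number of categories in your data.

```python
import numpy as np

def one_hot_sequences(X, n_categories=None):
    """One-hot encode an integer array of shape (n_samples, n_steps)."""
    X = np.asarray(X, dtype=int)
    if n_categories is None:
        # Assumes category ids start at 0, so the vector length is max + 1.
        n_categories = int(X.max()) + 1
    # Indexing the identity matrix with X yields
    # an array of shape (n_samples, n_steps, n_categories).
    return np.eye(n_categories, dtype=np.float32)[X]

# Tiny stand-in for X_train (the real array has shape (8313, 10)).
X_demo = np.array([[13, 17, 6],
                   [0, 5, 9]])

# Assumption: 18 categories (ids 0..17), guessed from the excerpt.
X_seq = one_hot_sequences(X_demo, n_categories=18)

# Sequence form (samples, steps, categories): input for LSTM/GRU layers.
print(X_seq.shape)

# Flattened form (samples, steps * categories): input for logistic
# regression, SVM, kNN, or a fully connected network.
X_flat = X_seq.reshape(len(X_seq), -1)
print(X_flat.shape)
```

Because each step keeps its own block of columns in `X_flat`, a non-sequential classifier can still tell which category occurred at which step. The 3-D `X_seq` is the shape that recurrent layers expect.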
$\endgroup$