I am following this tutorial for language detection using machine learning. In the dataset I am using, however, there are multiple variables as features. In place of X = data["Text"], I tried X = df["message", "fingers", "tail"] (message, fingers, and tail are the three feature variables I am using), but it throws a KeyError:
Traceback (most recent call last):
File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas\\_libs\\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\\_libs\\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('message', 'fingers', 'tail')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\usr\Downloads\thecode.py", line 13, in <module>
X = df["message", "fingers", "tail"]
~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: ('message', 'fingers', 'tail')
How should I implement code so as to use all features without throwing errors?
Pass a list of column names (note the double brackets) to select several columns at once:
X = df[["message", "fingers", "tail"]]
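A minimal sketch of why the single-bracket form fails while the double-bracket form works. The data frame below is made up for illustration (the values and the "label" column are assumptions, not the asker's real data); only the column names message, fingers, and tail come from the question.

```python
import pandas as pd

# Hypothetical stand-in data; the real df comes from the asker's dataset.
df = pd.DataFrame({
    "message": ["hello", "bonjour", "hola"],
    "fingers": [5, 4, 5],
    "tail": [0, 1, 0],
    "label": ["en", "fr", "es"],  # assumed target column name
})

# Single brackets with several names are treated as ONE key (a tuple),
# which is why pandas reports KeyError: ('message', 'fingers', 'tail').
try:
    df["message", "fingers", "tail"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# A list inside the brackets selects several columns and returns a DataFrame.
X = df[["message", "fingers", "tail"]]
y = df["label"]

print(raised_key_error)  # True
print(X.shape)           # (3, 3)
```

The outer brackets are the indexing operator and the inner brackets are an ordinary Python list, so any list of column names works in the same way.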
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
It says it found input variables with inconsistent numbers of samples: [3,500]. Could you help with that too?

X and y are not the same length. From the error message, one of your variables has 3 rows and the other has 500.

X, the entire data frame, and y, a vector of labels, don't have the same length. For example, it looks like your data frame X only has 3 rows and you have 500 labels in y. I'd check your code to make sure you haven't accidentally sliced your data frame with something like df = df.head() or df = df.dropna() before defining X (and that your data frame is complete).
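One way to rule out a length mismatch, sketched under assumptions: do any row filtering first, then take both X and y from the same data frame and compare their lengths before splitting. The data, the "label" column name, and the random_state value here are hypothetical; train_test_split and test_size = 0.20 are taken from the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, one of which has a missing value.
df = pd.DataFrame({
    "message": [f"text {i}" for i in range(10)],
    "fingers": [5, 4, 5, None, 5, 4, 5, 5, 4, 5],
    "tail": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "label": ["a", "b"] * 5,  # assumed target column name
})

# Filter rows first, then take X and y from the SAME frame,
# so both always have the same number of rows.
df = df.dropna()
X = df[["message", "fingers", "tail"]]
y = df["label"]

# Cheap sanity check before splitting.
print(X.shape, y.shape)
assert len(X) == len(y), f"X has {len(X)} rows but y has {len(y)}"

# random_state is an illustrative addition to make the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)
```

If the printed shapes differ in their first entry, y was most likely built from a different (or earlier) version of the data frame than X.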