I am following this tutorial for language detection using machine learning. My dataset, however, has several feature columns rather than one. In place of X = data["Text"], I tried X = df["message", "fingers", "tail"] (message, fingers, and tail are the three feature columns I am using), but it throws a KeyError:

Traceback (most recent call last):
  File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\\_libs\\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\\_libs\\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('message', 'fingers', 'tail')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\usr\Downloads\thecode.py", line 13, in <module>
    X = df["message", "fingers", "tail"]
        ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: ('message', 'fingers', 'tail')

How should I write the code so that it uses all three features without raising an error?

  • Hi Harry, are you trying to return all three columns from your data frame? In that case you need to pass your keys to the data frame as a list: X = df[["message", "fingers", "tail"]]
    – A10
    Commented Sep 23, 2024 at 22:30
  • @A10: Thanks a lot, that fixes the issue, but then I'm running into a ValueError due to the line x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.20). It says it found input variables with inconsistent numbers of samples: [3,500]. Could you help with that too?
    – harry
    Commented Sep 23, 2024 at 22:39
  • That error typically comes up when X and y are not the same length. From the error message, one of your variables has 3 rows and the other has 500.
    – A10
    Commented Sep 23, 2024 at 22:42
  • @A10: But each of the feature variables is a column with 500 entries, so why is that a problem?
    – harry
    Commented Sep 23, 2024 at 22:59
  • Sorry, I probably shouldn't have used the word variable. What that error message is telling you is that X, the entire data frame, and y, a vector of labels, don't have the same length. For example, it looks like your data frame X only has 3 rows and you have 500 labels in y. I'd check your code to make sure you haven't accidentally shortened your data frame with something like df = df.head() or df = df.dropna() before defining X (and that your data frame is complete).
    – A10
    Commented Sep 23, 2024 at 23:06
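
To make the points from the comments concrete, here is a minimal sketch. The small data frame and its label column are made up for illustration (only the three feature names come from the question). It shows that a comma-separated key is treated as a single tuple key, that a list of names selects several columns, and how to check that X and y have the same number of rows before calling train_test_split.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: three feature columns plus a label.
df = pd.DataFrame({
    "message": ["hello", "bonjour", "hola", "ciao"],
    "fingers": [5, 4, 5, 3],
    "tail": [0, 1, 0, 1],
    "label": ["en", "fr", "es", "it"],  # assumed name for the target column
})

# df["message", "fingers", "tail"] looks up ONE column keyed by a tuple,
# which does not exist here, so pandas raises KeyError.
try:
    df["message", "fingers", "tail"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list of column names selects several columns and returns a DataFrame.
X = df[["message", "fingers", "tail"]]
y = df["label"]

print(tuple_lookup_failed)
print(X.shape)            # (rows, columns)
print(len(X) == len(y))   # should be True before splitting
```

If len(X) and len(y) differ at this point, something between loading the data and the split has changed one of them.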

1 Answer

The issue can be solved by replacing that line with X = np.asarray(df[["message", "fingers", "tail"]]) (with numpy imported as np).
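
One possible reason the conversion helps, assuming the tutorial code loops over X before vectorizing it (the question does not show that part): iterating over a DataFrame yields its column labels, so a loop would see only 3 items, whereas iterating over the 2-D array yields one item per row. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "message": ["hello", "bonjour", "hola", "ciao"],
    "fingers": [5, 4, 5, 3],
    "tail": [0, 1, 0, 1],
})

X_frame = df[["message", "fingers", "tail"]]
X_array = np.asarray(X_frame)

# Iterating over a DataFrame walks its column labels, not its rows.
from_frame = [item for item in X_frame]

# Iterating over the 2-D array walks its rows.
from_array = [row for row in X_array]

print(from_frame)        # ['message', 'fingers', 'tail']
print(len(from_array))   # 4, one entry per row
print(X_array.shape)     # (4, 3)
```

With 500 rows, the first loop would still produce 3 items and the second 500, which would match the [3, 500] in the error message.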
