All Questions

0 votes
1 answer
45 views

How to preprocess date in Isolation Forest sklearn [closed]

I am using sklearn's IsolationForest model to detect anomalies on a time-series dataset. One of the features is a date in the format MM-YYYY; the other features are numeric values. What is the best ...
Mar's user avatar
  • 19
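
One possible way to make an MM-YYYY value usable by a numeric-only model is sketched below. It is not taken from the question or its answer: the helper name `encode_month_year` and the choice of features (a running month count plus a sine/cosine pair for the month of the year) are assumptions for illustration, and whether they suit a given dataset depends on the data.

```python
import math
from datetime import datetime


def encode_month_year(value: str) -> dict:
    """Turn an 'MM-YYYY' string into numeric features (hypothetical helper)."""
    parsed = datetime.strptime(value, "%m-%Y")
    # Running month count, so later dates get larger values (captures trend).
    month_index = parsed.year * 12 + (parsed.month - 1)
    # Cyclic encoding, so December and January end up close to each other.
    angle = 2 * math.pi * (parsed.month - 1) / 12
    return {
        "month_index": month_index,
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
    }


# Made-up sample values in the MM-YYYY format described in the question.
features = [encode_month_year(d) for d in ["01-2023", "07-2023", "01-2024"]]
```

The resulting numeric columns could then be passed to the model alongside the other numeric features.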
-2 votes
1 answer
80 views

Dummy Variable as Boolean rather than Integer [closed]

I'm working on a machine learning project in Python. Using pandas pd.get_dummies I'm trying to create dummy variables for a categorical column in my data but the variables are being converted to ...
Victor Olusegun's user avatar
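
A minimal sketch of one way to get 0/1 integer dummy columns, assuming that is the goal here (the excerpt is truncated). The DataFrame and its column names `fuel` and `price` are made up for illustration; `dtype` is an existing parameter of `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical sample data; the column names are not from the question.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"], "price": [10, 12, 11]})

# Ask get_dummies for integer columns instead of the default dtype.
dummies_int = pd.get_dummies(df, columns=["fuel"], dtype=int)

# Alternative: convert already-created dummy columns afterwards.
dummies = pd.get_dummies(df, columns=["fuel"])
dummy_cols = [c for c in dummies.columns if c.startswith("fuel_")]
dummies[dummy_cols] = dummies[dummy_cols].astype(int)
```

Both variants should leave non-dummy columns such as `price` untouched.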
-1 votes
1 answer
79 views

How can I achieve accurate imputation of missing values in a dataset?

I'm working with a dataset containing details about used cars, and I've encountered several missing values in the Fuel_Type column. The possible values include 'Gasoline', 'E85 Flex Fuel', 'Hybrid', '...
user27500319's user avatar
-1 votes
1 answer
56 views

KeyError when using array as feature in language detection

I am following this tutorial for language detection using machine learning. In the dataset I am using, however, there are multiple variables as features. I tried, in the place of X = data["Text&...
harry's user avatar
  • 111
0 votes
1 answer
56 views

Separate an ingredients feature into separate columns marked with "0" or "1"

I'm looking at some food waste data where I have a fair bit of data, including the ingredients that were in the food. I'm trying to do some ML on the data, and I'm having some trouble getting it ...
Patrick Kenney's user avatar
1 vote
1 answer
50 views

Applying log transformation to a column

I have encoded the Gender column with OneHotEncoder. I want to apply a log transformation to only the Female[0] column, but it is being applied to all the columns. Why? My code: import pandas as p from sklearn....
Aaftab Tai's user avatar
0 votes
1 answer
76 views

How to make Isolation Forest detect anomaly at the peak of the difference, instead of the first value seen

I am using Isolation Forest to identify anomalies in a very large data frame. The data is noisy, so I have conducted many filtering operations to smooth out the noise so that the true anomalies ...
Zach Tynes's user avatar
0 votes
1 answer
73 views

KNNImputer drops columns despite numeric datatypes and right shape

I am using KNNImputer to impute np.nan values in several pd.DataFrame objects. I checked that all the datatypes in each of the dataframes are numeric. However, KNNImputer drops some columns in some ...
Ivan's user avatar
  • 125
1 vote
1 answer
1k views

Write data directly to blob storage from an Azure Machine Learning Studio notebook

I'm working on some interactive development in an Azure Machine Learning notebook and I'd like to save some data directly from a pandas DataFrame to a csv file in my default connected blob storage ...
Matt_Haythornthwaite's user avatar
0 votes
0 answers
63 views

Low Validation and Test Accuracy with Random Forest on ECG Data

I'm working on a project involving ECG data classification using a Random Forest model. Unfortunately, my model's performance is significantly lower than expected, and I'm struggling to understand why....
MEJRI Rawaa's user avatar
2 votes
3 answers
96 views

Pandas takes all columns of a dataframe even when some columns are specified

I am trying to train a KMeans model using Scikit-Learn. I have been stuck on this issue for 2 days. Pandas is selecting all columns of a dataframe even though I specified 2 columns. Here is the dataframe in ...
Shree_ML's user avatar
0 votes
1 answer
640 views

SageMaker Processing Job permission denied to save csv file under /opt/ml/processing/<folder>

I am working on a project involving Step Functions with SageMaker. I have an existing Step Function that I need to integrate SageMaker into, and I tried adding steps such as processing, model training,...
Gwenda Thomas's user avatar
0 votes
0 answers
59 views

I got ValueError: np.nan is an invalid document, expected byte or unicode string

import pandas as pd from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.metrics.pairwise import cosine_similarity # Read the first Excel file with Business codes and descriptions ...
mobinhb's user avatar
0 votes
0 answers
54 views

How to convert Xarray Dataset to a Darts Time Series

I have an Xarray dataset object with lat/lon/time coordinates. This is a map of climate data. I want to convert this to a Darts TimeSeries object in order to train models on it. There is a function to ...
David Flasterstein's user avatar
1 vote
0 answers
83 views

How to save and load TensorFlow Decision forest model for incremental learning?

I am developing a TensorFlow Decision Forests regression model for incremental learning. I have built and saved the model, but when I retrain it with new data I get an error like &...
Swasthik Shivananda's user avatar
