All Questions
Tagged with machine-learning pandas
2,102 questions
0
votes
1
answer
45
views
How to preprocess date in Isolation Forest sklearn [closed]
I am using sklearn's IsolationForest model to detect anomalies on a time-series dataset. One of the features is a date in the format MM-YYYY; the other features are numeric values.
What is the best ...
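One possible way to make an MM-YYYY column usable by a numeric model is sketched below. It is not taken from the question or its answer; the column names, the sample values, and the choice of a month index plus a cyclical encoding are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "date": ["01-2020", "02-2020", "12-2021"],
    "value": [10.0, 12.5, 9.1],
})

# Parse MM-YYYY strings into timestamps (the day defaults to the 1st).
parsed = pd.to_datetime(df["date"], format="%m-%Y")

# Numeric features: a running month index plus a cyclical month encoding.
df["month_index"] = parsed.dt.year * 12 + parsed.dt.month
df["month_sin"] = np.sin(2 * np.pi * parsed.dt.month / 12)
df["month_cos"] = np.cos(2 * np.pi * parsed.dt.month / 12)

# Drop the raw string column so only numeric features remain.
features = df.drop(columns=["date"])
```

The resulting `features` frame is fully numeric, so it could be passed to `IsolationForest.fit`; whether these particular features suit the data is a modelling decision.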
-2
votes
1
answer
80
views
Dummy Variable as Boolean rather than Integer [closed]
I'm working on a machine learning project in Python. Using pandas pd.get_dummies I'm trying to create dummy variables for a categorical column in my data but the variables are being converted to ...
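As a hedged illustration of the behaviour the title describes: since pandas 2.0, `pd.get_dummies` returns boolean columns by default, and its `dtype` parameter can request integers instead. The DataFrame and column name below are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column; not the asker's data.
df = pd.DataFrame({"color": ["red", "green", "red"]})

# dtype=int asks for 0/1 integer columns instead of True/False.
dummies = pd.get_dummies(df, columns=["color"], dtype=int)
```

An alternative with the same effect is to call `.astype(int)` on the boolean result.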
-1
votes
1
answer
79
views
How can I achieve accurate imputation of missing values in a dataset?
I'm working with a dataset containing details about used cars, and I've encountered several missing values in the Fuel_Type column. The possible values include 'Gasoline', 'E85 Flex Fuel', 'Hybrid', '...
-1
votes
1
answer
56
views
KeyError when using array as feature in language detection
I am following this tutorial for language detection using machine learning. In the dataset I am using, however, there are multiple variables as features. I tried, in the place of X = data["Text"...
0
votes
1
answer
56
views
Separate an ingredients feature into separate columns marked with "0" or "1"
I'm looking at some food waste data where I have a fair bit of data, including the Ingredients for what was in the food. I'm trying to do some ML on the data, and I'm having some trouble getting it ...
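A minimal sketch of one way to turn a delimited ingredients string into 0/1 indicator columns, assuming the ingredients are stored as comma-separated text. The `dish` column, the sample values, and the `", "` separator are assumptions; the real data may be shaped differently.

```python
import pandas as pd

# Hypothetical data: one row per dish, ingredients as a delimited string.
df = pd.DataFrame({
    "dish": ["soup", "salad"],
    "Ingredients": ["carrot, onion", "onion, lettuce"],
})

# One indicator column per distinct ingredient, filled with 0 or 1.
indicators = df["Ingredients"].str.get_dummies(sep=", ")

# Replace the original text column with the indicator columns.
result = pd.concat([df.drop(columns=["Ingredients"]), indicators], axis=1)
```

If the ingredients were stored as Python lists instead of strings, `sklearn.preprocessing.MultiLabelBinarizer` would be the usual alternative.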
1
vote
1
answer
50
views
Applying log transformation to a column
I have encoded the Gender column with OneHotEncoder. I want to apply a log transformation to only the Female[0] column, but it is applied to all the columns. Why?
My code:
import pandas as p
from sklearn....
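The asker's code is cut off here, so the cause cannot be read from the excerpt. As a generic sketch only, transforming a single column usually means selecting it by name and assigning the result back to that column. The frame below is hypothetical, and `np.log1p` is used instead of `np.log` because a one-hot column contains zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical one-hot output placed in a DataFrame with named columns.
encoded = pd.DataFrame({"Female": [0.0, 1.0, 1.0], "Male": [1.0, 0.0, 0.0]})

# Transform only the selected column; all other columns are left untouched.
encoded["Female"] = np.log1p(encoded["Female"])
```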
0
votes
1
answer
76
views
How to make Isolation Forest detect anomaly at the peak of the difference, instead of the first value seen
I am using Isolation Forest to identify anomalies in a very large data frame. The data is noisy, so I have conducted many filtering operations to smooth out the noise so that the true anomalies ...
0
votes
1
answer
73
views
KNNImputer drops columns despite numeric datatypes and correct shape
I am using KNNImputer to impute np.nan values in several pd.DataFrame objects. I checked that all the datatypes in each of the dataframes are numeric. However, KNNImputer drops some columns in some ...
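One common cause worth ruling out (an assumption, since the excerpt is truncated): by default `KNNImputer` drops features that are entirely missing at fit time, and scikit-learn 1.2+ exposes `keep_empty_features=True` to retain them. A pandas-only check for such columns, on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" contains only missing values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, 5.0, np.nan],
})

# Columns with no observed value at all are candidates for being dropped.
all_nan_cols = df.columns[df.isna().all()].tolist()
```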
1
vote
1
answer
1k
views
Write data directly to blob storage from an Azure Machine Learning Studio notebook
I'm working on some interactive development in an Azure Machine Learning notebook and I'd like to save some data directly from a pandas DataFrame to a csv file in my default connected blob storage ...
0
votes
0
answers
63
views
Low Validation and Test Accuracy with Random Forest on ECG Data
I'm working on a project involving ECG data classification using a Random Forest model. Unfortunately, my model's performance is significantly lower than expected, and I'm struggling to understand why....
2
votes
3
answers
96
views
Pandas takes all columns of a dataframe even when some columns are specified
I am trying to train a KMeans model using Scikit-Learn.
I have been stuck on this issue for 2 days.
Pandas is selecting all columns of a dataframe even though I specified 2 columns.
Here is the dataframe in ...
0
votes
1
answer
640
views
SageMaker Processing Job permission denied to save csv file under /opt/ml/processing/<folder>
I am working on a project involving Step Functions with SageMaker. I have an existing Step Function that I need to integrate SageMaker into, and I tried adding steps such as processing, model training,...
0
votes
0
answers
59
views
I got ValueError: np.nan is an invalid document, expected byte or unicode string
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Read the first Excel file with Business codes and descriptions
...
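This error is raised by scikit-learn's text vectorizers when one of the documents is NaN rather than a string. A hedged sketch of cleaning a text column before vectorizing follows; the column name and sample values are hypothetical, and the rest of the asker's code is not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical description column with one missing entry.
df = pd.DataFrame({
    "description": ["retail bakery", np.nan, "software consulting"],
})

# Replace missing descriptions with empty strings and force a string dtype,
# so every document handed to the vectorizer is a valid str.
docs = df["description"].fillna("").astype(str)
```

The cleaned `docs` series could then be passed to `TfidfVectorizer().fit_transform(docs)`. Dropping the rows with `dropna` instead is equally reasonable when empty documents are not wanted.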
0
votes
0
answers
54
views
How to convert Xarray Dataset to a Darts Time Series
I have an Xarray dataset object with lat/lon/time coordinates. This is a map of climate data. I want to convert this to a Darts TimeSeries object in order to train models on it. There is a function to ...
1
vote
0
answers
83
views
How to save and load a TensorFlow Decision Forests model for incremental learning?
I am developing a TensorFlow Decision Forests regression model for incremental learning. I have built and saved the model, but when I retrain it with new data I get an error like ...