
All Questions

6 votes
4 answers
8k views

Why is KNN so much faster with cosine distance than Euclidean distance?

I am fitting a k-nearest neighbors classifier using scikit-learn and noticed that the fitting is faster, often by an order of magnitude or more, when using the cosine similarity between two vectors ...
Simon Segert
1 vote
1 answer
3k views

Can you use the isolation forest algorithm on large sample sizes?

I've been using the scikit-learn sklearn.ensemble.IsolationForest implementation of the isolation forest to detect anomalies in my datasets, which range from hundreds of rows to millions of rows worth of ...
ddx
  • 520
0 votes
0 answers
351 views

Implement PCA in Python and k-nearest neighbors

I want to implement dimensionality reduction with Neighborhood Components Analysis for classification using k-nearest neighbors https://scikit-learn.org/stable/auto_examples/neighbors/...
devss
  • 145
0 votes
0 answers
66 views

How to Fit New Classes to SKLearn Algorithm

I need a scikit-learn classifier that can be trained periodically by fitting new data to the already trained algorithm while retaining the classes it has previously learned. I have ...
Nathan Lewis
-1 votes
3 answers
2k views

Clustering with unknown number of clusters

I need to find logins that belong to the same person. The task should be solved in a Python environment. I have a dataset with user actions. From these actions I created N features: - ...
Shokan
  • 13
1 vote
2 answers
5k views

Accuracy score for a KNN model (IRIS data)

What might be some key factors for increasing or stabilizing the accuracy score (so that it does not vary significantly) of this basic KNN model on IRIS data? Attempt: from sklearn import neighbors, datasets, ...
Emma
  • 27.8k
-2 votes
1 answer
517 views

How can I solve the ValueError that says "could not convert string to float: 'D'"?

I'm trying to get the output using the following expression but am unable to fetch the details. Can anyone please help? # Separate into feature set and target variable #FTR = Full Time Result (H=...
0 votes
0 answers
2k views

Accuracy for Random Forest Algorithm is 0.0

I'm doing a machine learning project using a Jupyter notebook. I'm using Random Forest with GridSearchCV, and the execution works fine, but I got Accuracy = 0.0. When I tried Decision Tree the Accuracy ...
Maryem Samet
-4 votes
1 answer
80 views

Machine learning and actual predictions [closed]

I have a question about machine learning regarding predictions. Typically I would have a dataset with x's and y's that I would train my algorithm on. But what if I just have a dataset with input ...
blargh
  • 39
-3 votes
1 answer
1k views

Cluster analysis algorithm for identifying line clusters on a map

I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates that looks like this: Before committing them to my database, I'd like to automatically identify all point clusters ( ...
Ruan
  • 922
2 votes
2 answers
228 views

“help” decision tree by tying 2 features together

Assuming I have in my dataset 2 (or more) features that are for sure linked (for example: feature B indicates the amount of relevance of feature A), is there a way I could design a decision tree that ...
Binyamin Even
9 votes
3 answers
10k views

Compare multiple algorithms with sklearn pipeline

I'm trying to set up a scikit-learn pipeline to simplify my work. The problem I'm facing is that I don't know which algorithm (random forest, naive Bayes, decision tree, etc.) fits best, so I need to ...
vivi11130704
0 votes
1 answer
242 views

Algorithm to match the class distribution of two datasets

I have MC (Monte Carlo/simulation) and data, each having events in two classes, 0 and 1. I am trying to write an algorithm such that I can match the number of events in class 0 and 1 of MC to data, i.e. I ...
kg__
  • 79
1 vote
2 answers
5k views

Isolation Forest in sklearn for a 1D array or list, and how to tune hyperparameters

Is there a way to implement sklearn isolation forest for a 1D array or list? All the examples I came across are for data with two or more dimensions. I have currently developed a model with three features ...
Dheepan Manoharan
0 votes
1 answer
1k views

How to do classification on a binary dataset using scikit-learn?

I have the following binary dataset: [ [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], [...
sshussain270
  • 1,875
