Newest 'machine-learning+algorithm+scikit-learn' Questions

6 votes

4 answers

8k views

Why is KNN so much faster with cosine distance than Euclidean distance?

I am fitting a k-nearest neighbors classifier using scikit-learn and noticed that the fitting is faster, often by an order of magnitude or more, when using the cosine similarity between two vectors ...

Simon Segert

431

asked May 23, 2021 at 14:32

1 vote

1 answer

3k views

Can you use the isolation forest algorithm on large sample sizes?

I've been using the scikit-learn sklearn.ensemble.IsolationForest implementation of the isolation forest to detect anomalies in my datasets, which range from hundreds of rows to millions of rows worth of ...

ddx

520

asked Jun 16, 2020 at 22:29

0 votes

0 answers

351 views

Implement PCA in Python and k-nearest neighbors

I want to implement dimensionality reduction with Neighborhood Components Analysis for classification using k-nearest neighbors: https://scikit-learn.org/stable/auto_examples/neighbors/...

devss

145

asked Jan 8, 2020 at 18:30

0 votes

0 answers

66 views

How to Fit New Classes to SKLearn Algorithm

I need an SKLearn classifier that can be trained periodically by fitting new data to the already-trained model while retaining the classes it has previously learned. I have ...

Nathan Lewis

21

asked Sep 12, 2019 at 15:24

-1 votes

3 answers

2k views

Clustering with unknown number of clusters

I need to find logins that belong to the same person. The task should be solved in a Python environment. I have a dataset with user actions. From these actions I created N features: - ...

Shokan

13

asked Jul 17, 2019 at 13:48

1 vote

2 answers

5k views

Accuracy score for a KNN model (IRIS data)

What are some key factors for increasing or stabilizing the accuracy score (so that it does not vary significantly) of this basic KNN model on the Iris data? Attempt: from sklearn import neighbors, datasets, ...

Emma

27.8k

asked Jul 5, 2019 at 1:08

-2 votes

1 answer

517 views

How can I solve the ValueError "could not convert string to float: 'D'"?

I'm trying to get the output using the following expression but am unable to fetch the details. Can anyone please help? # Separate into feature set and target variable #FTR = Full Time Result (H=...

user9032932

asked Apr 9, 2019 at 23:48

0 votes

0 answers

2k views

Accuracy for Random Forest Algorithm is 0.0

I'm doing a machine learning project in a Jupyter notebook. I'm using Random Forest with GridSearchCV; the execution works fine, but I get Accuracy = 0.0. When I tried Decision Tree the Accuracy ...

Maryem Samet

143

asked Mar 30, 2019 at 3:40

-4 votes

1 answer

80 views

Machine learning and actual predictions [closed]

I have a question about machine learning predictions. Typically I would have a dataset with x's and y's that I would train my algorithm on. But what if I just have a dataset with input ...

blargh

39

asked Jan 18, 2019 at 17:53

-3 votes

1 answer

1k views

Cluster analysis algorithm for identifying line clusters on a map

I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates (the original question shows them in an image). Before committing them to my database, I'd like to automatically identify all point clusters ( ...

Ruan

922

asked Nov 10, 2018 at 17:56

2 votes

2 answers

228 views

“Help” a decision tree by tying two features together

Assuming my dataset has two (or more) features that are definitely linked (for example, feature B indicates the relevance of feature A), is there a way I could design a decision tree that ...

Binyamin Even

3,392

asked Oct 31, 2018 at 8:33

9 votes

3 answers

10k views

Compare multiple algorithms with sklearn pipeline

I'm trying to set up a scikit-learn pipeline to simplify my work. The problem I'm facing is that I don't know which algorithm (random forest, naive Bayes, decision tree, etc.) fits best, so I need to ...

vivi11130704

451

asked Aug 5, 2018 at 14:46

0 votes

1 answer

242 views

Algorithm to match the class distribution of two datasets

I have MC (Monte Carlo simulation) and data samples, each with events in two classes, 0 and 1. I am trying to write an algorithm that matches the number of events in classes 0 and 1 of the MC to the data, i.e. I ...

kg__

79

asked Jul 6, 2018 at 21:55

1 vote

2 answers

5k views

Sklearn Isolation Forest for a 1D array or list, and how to tune hyperparameters

Is there a way to apply sklearn's isolation forest to a 1D array or list? All the examples I came across are for data with two or more dimensions. I have currently developed a model with three features ...

Dheepan Manoharan

45

asked Jun 20, 2018 at 21:32

0 votes

1 answer

1k views

How to do classification on a binary dataset using scikit-learn?

I have the following binary dataset: [ [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], [...

sshussain270

1,875

asked Jan 18, 2018 at 0:42
