All Questions
37 questions
6
votes
4
answers
8k
views
Why is KNN so much faster with cosine distance than Euclidean distance?
I am fitting a k-nearest neighbors classifier using scikit-learn and noticed that the fitting is faster, often by an order of magnitude or more, when using the cosine similarity between two vectors ...
1
vote
1
answer
3k
views
Can you use the isolation forest algorithm on large sample sizes?
I've been using the scikit-learn sklearn.ensemble.IsolationForest implementation of the isolation forest to detect anomalies in my datasets, which range from hundreds of rows to millions of rows worth of ...
0
votes
0
answers
351
views
Implement PCA in Python and k-nearest neighbors
I want to implement dimensionality reduction with Neighborhood Components Analysis for classification using k-nearest neighbors
https://scikit-learn.org/stable/auto_examples/neighbors/...
0
votes
0
answers
66
views
How to Fit New Classes to SKLearn Algorithm
I am in need of a SKLearn Classifier which can be trained periodically by fitting new data to the already trained algorithm while retaining the classes it has learned to fit to previously.
I have ...
-1
votes
3
answers
2k
views
Clustering with unknown number of clusters
I need to find logins that belong to the same person. The task should be solved in a Python environment.
I have a dataset with user actions. From these actions I created N features:
- ...
1
vote
2
answers
5k
views
Accuracy score for a KNN model (IRIS data)
What might be some key factors for increasing or stabilizing the accuracy score (so that it does not vary significantly) of this basic KNN model on IRIS data?
Attempt
from sklearn import neighbors, datasets, ...
-2
votes
1
answer
517
views
How can I solve the "ValueError" that says could not convert string to float: 'D'?
I'm trying to get the output using the following expression but am unable to fetch the details. Can anyone please help?
# Separate into feature set and target variable
#FTR = Full Time Result (H=...
0
votes
0
answers
2k
views
Accuracy for Random Forest Algorithm is 0.0
I'm doing a machine learning project using Jupyter notebook. I'm using Random Forest with GridSearchCV, the execution is working fine, but I got Accuracy = 0.0
When I tried Decision Tree the Accuracy ...
-4
votes
1
answer
80
views
Machine learning and actual predictions [closed]
I have a question about machine learning regarding predictions.
So typically I would have a dataset with x's and y's that I would train my algorithm on. But what if I just have a dataset with input ...
-3
votes
1
answer
1k
views
Cluster analysis algorithm for identifying line clusters on a map
I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates that looks like this:
Before committing them to my database, I'd like to automatically identify all point clusters ( ...
2
votes
2
answers
228
views
“help” decision tree by tying 2 features together
Assuming I have in my dataset 2 (or more) features that are for sure linked (for example: feature B indicates the amount of relevance of feature A), is there a way I could design a decision tree that ...
9
votes
3
answers
10k
views
Compare multiple algorithms with sklearn pipeline
I'm trying to set up a scikit-learn pipeline to simplify my work. The problem I'm facing is that I don't know which algorithm (random forest, naive bayes, decision tree etc.) fits best so I need to ...
0
votes
1
answer
242
views
Algorithm match class distribution of two datasets
I have MC (Monte Carlo/simulation) and data, each having events in two classes, 0 and 1. I am trying to write an algorithm such that I can match the number of events in class 0 and 1 of MC to data, i.e. I ...
1
vote
2
answers
5k
views
Isolation Forest Sklearn for 1D array or list and how to tune hyper parameters
Is there a way to use the sklearn isolation forest on a 1D array or list? All the examples I came across are for data with two or more dimensions.
I have now developed a model with three features ...
0
votes
1
answer
1k
views
How to do classification in binary data set using scikit-learn?
I have the following binary dataset:
[
[1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
[...