Questions tagged [efficiency]
Efficiency, in algorithmic processing, is usually associated with resource usage. The metrics used to evaluate the efficiency of a process commonly account for execution time, memory, disk or storage requirements, network usage, and power consumption.
43 questions
4 votes, 0 answers, 36 views
Time-efficient parallelization of masks for pre-processing a dataset
I have a large dataset (~10M points) in Python and I want to filter it using a large number of different custom masks, as part of calculations to create a new but related dataset. Because the dataset ...
0 votes, 1 answer, 27 views
feature engineering mechanism
Why do we need to rescale a feature that has a large range? I know we do it to speed up gradient descent, but how does rescaling work, and why doesn't it break the model?
And does rescaling ...
3 votes, 3 answers, 3k views
What's the fastest clustering package in Python?
I'd like to perform clustering analysis on a dataset with 1,300 columns and 500,000 rows.
I've seen that clustering algorithms are available in SciKit-Learn. But I'm worried that the algorithms will ...
0 votes, 1 answer, 82 views
Efficient ways of clustering for big data
I have a customer segmentation task with 120k users and a record of their purchases, which is more than 3 million records of data. The approach I want to take is to use clustering algorithms like kmeans ...
1 vote, 1 answer, 1k views
Why does SciKit-Learn's OneHotEncoder take so long on a Large Dataset?
I'm using an older version of SciKit-Learn, version 1.0.2, to try and OneHotEncode some data. My dataset is fairly large, 184 columns going to 311 after the ...
0 votes, 1 answer, 539 views
parallel work on KNN in python
I have a question related to parallel processing in Python.
How can I use 1, 2, 3, ... processors with the k-nearest neighbor algorithm for K = 1, 2, 3, ... to measure the change in time spent, speedup, and efficiency?
...
0 votes, 0 answers, 47 views
find speedup for different number of processes
I am new to data science.
I need to write code that measures speedup as a function of the number of processes when using k-nearest neighbors with k = 1, 2, 3, 4, 5, 6, 7.
This step should come after downloading ...
8 votes, 3 answers, 1k views
Levenshtein distance vs simple for loop
I have recently begun studying different data science principles, and have had a particular interest as of late in fuzzy matching. For preface, I'd like to include smarter fuzzy searching in a ...
1 vote, 0 answers, 38 views
More efficient way to create frequency column based on different groupings
I have code below that calculates a frequency for each column element (relative to its own column) and adds all five frequencies together in a column. The code works but is very slow and the ...
0 votes, 0 answers, 33 views
Inbetween CNN and MLP: neural network architecture for "close to convolutional" problem?
I am looking to approximate an (expensive to calculate precisely) forward problem using a NN. Input and output are vectors of identical length. Although not linear, the output somewhat resembles a ...
1 vote, 0 answers, 60 views
Efficient method of performing within matrix similarity
I want to compute a similarity comparison for each entry in a dataset to every other entry that is labeled as class 1 (excluding the current entry if it has a label of 1). So, consider a matrix of ...
1 vote, 1 answer, 1k views
Set value for column based on two other columns in pandas dataframe
I have a dataframe that has contracts with different order dates, and I need to create a new column that assigns a number to each contract if it has more than one order date. For example, my sample ...
2 votes, 0 answers, 341 views
What is the difference in computational cost at inference time between object detection and semantic segmentation?
I am aware that YOLO (v1-5) is a real-time object detection model with moderately good overall prediction performance. I know that UNet and variants are efficient semantic segmentation models that are ...
1 vote, 0 answers, 50 views
Can I say that a trained neural network model with less parameters requires less resources during real world inference?
Let us imagine that we have two trained neural network models with different architectures (e.g., type of layers). The first model (a) uses 1D convolutional layers with fully-connected layers and has ...
1 vote, 2 answers, 54 views
Deep learning on cloud
I am trying to implement some deep learning models with a large amount of data, around 10 gigabytes. However, my laptop and the free version of Colab both crash when they try to load it. Do you think it is worth buying ...