Newest 'preprocessing' Questions - Data Science Stack Exchange

0 votes

0 answers

10 views

StyleGAN preprocessing

I have a dataset in which each picture contains many objects. How should I preprocess it for training a GAN (StyleGAN) so that the model distinguishes the objects in the picture? Now the result is not ...

arash

1

asked Nov 24 at 14:03

0 votes

1 answer

44 views

Correlated Features In Classification Problem

I'm working on a binary classification problem to identify struggling students in university. I have some features that are correlated such as high_school_grade_1 that represents 75% of ...

Youness Belhaj

1

asked Oct 26 at 22:39

1 vote

0 answers

40 views

Splitting the ISIC 2018 Skin Lesion Segmentation Dataset

In the official paper "Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)", they provided a dataset including over ...

AAA_11

41

asked Oct 16 at 11:25

4 votes

0 answers

29 views

Time-efficient parallelization of masks for pre-processing a dataset

I have a large dataset (~10M points) in Python and I want to filter it using a large number of different custom masks, as part of calculations to create a new but related dataset. Because the dataset ...

quail

41

asked Oct 6 at 18:36

7 votes

1 answer

135 views

Data Drift & Model Comparison in Production MLOps: Handling Scale Changes with AutoML

Background I'm implementing a production MLOps pipeline for part classification using Databricks AutoML. My pipeline automatically retrains models when new data arrives and compares performance with ...

Spearitch502

83

asked Aug 14 at 14:21

7 votes

1 answer

100 views

Effects of resizing training images during preprocessing for a CNN classification model

I'm trying to train a CNN model to identify phytoplankton species from a training set. During preprocessing, the images are resized to 224x224, which seems to be stretching or compressing the object ...

Charlottefaf

115

asked Jul 17 at 18:35
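
One common alternative to the stretching described in the question above is to scale the image so its longer side matches the target and pad the remainder ("letterboxing"). The sketch below only computes the geometry; the function name `letterbox_geometry` and the default target of 224 are illustrative assumptions, not something taken from the question, and the actual resize and pad calls depend on the image library in use.

```python
def letterbox_geometry(width, height, target=224):
    """Compute an aspect-preserving resize plus padding to a square target.

    Hypothetical helper for illustration. Returns
    (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Scale so that the longer side becomes exactly `target` pixels.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    # Split the leftover space as evenly as possible between both sides.
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


# A 448x224 image is halved to 224x112, then padded top and bottom.
geometry = letterbox_geometry(448, 224)
```

Whether padding helps or hurts classification accuracy is an empirical question for the dataset at hand; this only shows how the distortion can be avoided.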

0 votes

0 answers

31 views

Discrete Feature Imputation: How to Choose an Appropriate Data Distribution Model?

I am working on a dataset containing features that are discrete frequency counts. I understand that knowing the underlying data distribution is important for selecting an appropriate imputation method....

Emre

1

asked Jul 11 at 10:22

1 vote

1 answer

68 views

Is it valid to filter features using t-tests before train/test split in high-dimensional biological data

I'm working with high-dimensional biological data (∼41,000 features × 3,979 samples from RNA-seq for 2 conditions). Here’s a simplified version of my preprocessing and filtering pipeline before ...

Adi Gershon

11

asked May 17 at 8:27

2 votes

0 answers

28 views

Question about preprocessing two time-series datasets from different measurement devices

I have a question regarding the preprocessing step in a project I'm working on. I have two different measurement devices that both collect time-series data. My goal is to analyze the similarity ...

TTC

21

asked Mar 31 at 2:27

7 votes

1 answer

97 views

Difference between transform('min') vs min() in pandas

I am currently working on a dataset that has two columns: customerID and date. I want to find the minimum date for each customerID. Initially, I used the following code: ...

Guna

897

asked Mar 27 at 11:58
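
The distinction asked about in the question above can be sketched on a small example. The column names `customerID` and `date` follow the excerpt; the data values and the `first_date` column are made up for illustration, and the asker's actual code is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical data: column names follow the excerpt, values are invented.
df = pd.DataFrame({
    "customerID": ["a", "a", "b", "b", "b"],
    "date": pd.to_datetime([
        "2024-03-05", "2024-01-10",
        "2024-02-01", "2024-02-20", "2024-01-15",
    ]),
})

# min() aggregates: the result has one entry per customerID
# and is indexed by customerID.
per_customer = df.groupby("customerID")["date"].min()

# transform("min") broadcasts: the result has the same length and index
# as df, so it can be assigned back as a new column.
df["first_date"] = df.groupby("customerID")["date"].transform("min")
```

In short, `min()` is the right choice when a per-customer summary table is wanted, while `transform("min")` is convenient when each original row needs its group's minimum alongside it.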

0 votes

0 answers

83 views

Opinions on the practice of removing stop words before VADER

I know there is already a question on this topic, but it doesn’t fully address my concerns. I am currently writing my master's thesis and will use VADER for sentiment analysis (the vader package by ...

MaryJ.

1

asked Feb 19 at 15:00

1 vote

0 answers

29 views

How can I efficiently process and load a large Protobuf dataset for machine learning model training?

I am training a model on multiple cache miss examples from various trace simulations. For every trace I have thousands of miss examples stored and I have many traces. I'm storing the examples in ...

Saffy

11

asked Feb 5 at 14:54

0 votes

0 answers

25 views

Is Negation Handling Necessary in Topic Modeling?

I'm a fourth-semester Informatics Engineering student. Currently, I'm working on a topic modelling project using a Twitter dataset for a college assignment. I've encountered a difficulty where, in one ...

Edwin Jaya

1

asked Jan 17 at 2:40

0 votes

0 answers

31 views

String to number in case of having millions of unique values

I am currently working on preprocessing a big dataset for ML purposes. I am struggling with encoding strings as numbers. I have a dataset of multiple blockchain transactions and I have addresses of ...

Asic

21

asked Dec 17, 2024 at 16:03

0 votes

0 answers

35 views

Efficient way to remove very similar pictures from a set of 8752

I have 8752 pictures that were extracted from a roughly hour-long CCTV video with a Python screenshot script. My supervisor told me to clean the data by removing the roughly similar pictures. At first I ...

RedSean

1

asked Nov 27, 2024 at 14:40
