Questions tagged [preprocessing]

Data preprocessing is a data mining technique that transforms raw data into a more understandable or more useful format.

0 votes
0 answers
10 views

I have a dataset in which each picture contains many objects. What preprocessing should I do before training a GAN (StyleGAN) so that the model can distinguish the objects in the picture? Right now the result is not ...
arash's user avatar
  • 1
0 votes
1 answer
44 views

I'm working on a binary classification problem to identify struggling students at a university. I have some features that are correlated, such as high_school_grade_1 that represents 75% of ...
Youness Belhaj's user avatar
1 vote
0 answers
40 views

In the official paper "Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)", they provided a dataset including over ...
AAA_11's user avatar
  • 41
4 votes
0 answers
29 views

I have a large dataset (~10M points) in Python and I want to filter it using a large number of different custom masks, as part of calculations to create a new but related dataset. Because the dataset ...
quail's user avatar
  • 41
7 votes
1 answer
135 views

Background: I'm implementing a production MLOps pipeline for part classification using Databricks AutoML. My pipeline automatically retrains models when new data arrives and compares performance with ...
Spearitch502's user avatar
7 votes
1 answer
100 views

I'm trying to train a CNN model to identify phytoplankton species from a training set. During preprocessing, the images are resized to 224x224, which seems to be stretching or compressing the object ...
Charlottefaf's user avatar
0 votes
0 answers
31 views

I am working on a dataset containing features that are discrete frequency counts. I understand that knowing the underlying data distribution is important for selecting an appropriate imputation method. ...
Emre's user avatar
  • 1
1 vote
1 answer
68 views

I'm working with high-dimensional biological data (∼41,000 features × 3,979 samples from RNA-seq for 2 conditions). Here’s a simplified version of my preprocessing and filtering pipeline before ...
Adi Gershon's user avatar
2 votes
0 answers
28 views

I have a question regarding the preprocessing step in a project I'm working on. I have two different measurement devices that both collect time-series data. My goal is to analyze the similarity ...
TTC's user avatar
  • 21
7 votes
1 answer
97 views

I am currently working on a dataset that has two columns: customerID and date. I want to find the minimum date for each customerID. Initially, I used the following code: ...
Guna's user avatar
  • 897
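
The per-customer minimum described in the excerpt above can be sketched in plain Python. This is only an illustration: the asker's actual code is elided in the excerpt, and the record layout (`(customerID, date)` pairs), the sample values, and the function name `min_date_per_customer` are all assumptions made here.

```python
from datetime import date


def min_date_per_customer(records):
    """Return {customerID: earliest date} for an iterable of (customerID, date) pairs.

    Hypothetical helper for illustration; not the asker's code.
    """
    earliest = {}
    for customer_id, d in records:
        # Keep the date only if it is the first one seen for this customer
        # or strictly earlier than the one stored so far.
        if customer_id not in earliest or d < earliest[customer_id]:
            earliest[customer_id] = d
    return earliest


# Made-up sample data, two customers with two dates each.
records = [
    ("C1", date(2024, 3, 5)),
    ("C2", date(2024, 1, 9)),
    ("C1", date(2024, 2, 1)),
    ("C2", date(2024, 4, 2)),
]
result = min_date_per_customer(records)
```

If the data lives in a pandas DataFrame with those two columns, the usual equivalent is `df.groupby("customerID")["date"].min()`, assuming the `date` column has already been parsed to a datetime type.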
0 votes
0 answers
83 views

I know there is already a question on this topic, but it doesn’t fully address my concerns. I am currently writing my master's thesis and will use VADER for sentiment analysis (the vader package by ...
MaryJ.'s user avatar
  • 1
1 vote
0 answers
29 views

I am training a model on multiple cache miss examples from various trace simulations. For every trace I have thousands of miss examples stored and I have many traces. I'm storing the examples in ...
Saffy's user avatar
  • 11
0 votes
0 answers
25 views

I'm a fourth-semester Informatics Engineering student. Currently, I'm working on a topic modelling project using a Twitter dataset for a college assignment. I've encountered a difficulty where, in one ...
Edwin Jaya's user avatar
0 votes
0 answers
31 views

I am currently preprocessing a large dataset for ML purposes. I am struggling with encoding strings as numbers. I have a dataset of multiple blockchain transactions and I have addresses of ...
Asic's user avatar
  • 21
0 votes
0 answers
35 views

I have 8752 pictures that were extracted from a roughly hour-long CCTV video by a Python script taking screenshots. My supervisor told me to clean the data by removing the roughly similar ones. At first I ...
RedSean's user avatar
