Questions tagged [dataset]
A dataset is a collection of data, often in tabular or matrix form. This tag is NOT intended for data requests ("where can I find a dataset about ..."); for those, see OpenData.
1,514 questions
0 votes, 1 answer, 93 views
Request for Assistance with Acquiring a Diverse Cry Dataset for Research Project
I am working on a research project aimed at classifying babies' cries based on their needs. However, I have encountered difficulties in obtaining a suitable cry dataset.
The only dataset I was able to ...
0 votes, 1 answer, 692 views
How is normalization affected by outliers, and how can I avoid it?
I have a dataset that boils down to three columns: (1) supplier name, (2) number of transactions with the supplier, and (3) total value of those transactions.
I'm trying to find the best way to rank all suppliers ...
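To see the effect, compare two scalings on made-up transaction counts (illustrative numbers, not from the question). Min-max scaling maps the largest value to 1, so a single extreme supplier squeezes everyone else toward 0. Robust scaling, which centers on the median and divides by the interquartile range (IQR), keeps the typical suppliers spread out because the median and IQR barely move when one value is extreme. A minimal standard-library sketch:

```python
import statistics

def min_max_scale(values):
    """Map values linearly onto [0, 1]; the min and max set the scale."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    The median and IQR depend on the middle of the data, so one extreme
    value changes them far less than it changes the min or the max.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    median = statistics.median(values)
    return [(v - median) / (q3 - q1) for v in values]

# Illustrative transaction counts: seven typical suppliers and one outlier.
transactions = [10, 11, 12, 13, 14, 15, 16, 500]

print([round(x, 3) for x in min_max_scale(transactions)])
print([round(x, 3) for x in robust_scale(transactions)])
```

Under min-max scaling the seven typical suppliers all fall within about 0.012 of each other; under robust scaling they span about 1.3 units, while the outlier still receives a large score, which may be what you want when ranking.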
1 vote, 1 answer, 99 views
Project structure: many projects share the same large dataset
I have a bunch of projects for my job that are largely unrelated, except that they use the same data, which is pretty big on disk in CSV format. I want these to exist separately from each other and I ...
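One common pattern (an assumption, not something the question specifies) is to keep the CSV files in a single shared directory and have every project resolve file paths through one small helper that reads that location from an environment variable, so no project hard-codes the path or keeps its own copy. The variable name `SHARED_DATA_DIR` and the helper `data_path` are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical variable name; set it once per machine, e.g. in a shell profile.
DATA_ENV_VAR = "SHARED_DATA_DIR"

def data_path(filename, env_var=DATA_ENV_VAR):
    """Return the full path of a file inside the shared data directory."""
    root = os.environ.get(env_var)
    if root is None:
        raise RuntimeError(
            f"Set {env_var} to the directory that holds the shared CSV files"
        )
    return Path(root) / filename
```

Each project then calls something like `data_path("transactions.csv")` (a hypothetical file name) and hands the result to its own CSV reader, so moving the data only requires changing one variable.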
0 votes, 1 answer, 386 views
How to write a custom de-identification algorithm in Python?
I have tried a simple de-identification algorithm to anonymize my data, but the code doesn't work. I want to anonymize the data by slightly changing the values. The data ...
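One simple way to "slightly change the values" is multiplicative noise: each numeric value is scaled by a random factor within a small band (±5% here, an arbitrary choice). The function `perturb` and its parameters are hypothetical. Noise on its own is not a formal privacy guarantee: identifying columns such as names still need to be removed or replaced, and stronger guarantees come from techniques such as k-anonymity or differential privacy.

```python
import random

def perturb(values, pct=0.05, seed=None):
    """Multiply each value by a random factor in [1 - pct, 1 + pct].

    Passing a fixed seed makes the output reproducible.
    """
    rng = random.Random(seed)
    return [v * (1 + rng.uniform(-pct, pct)) for v in values]

salaries = [42000, 51000, 38500, 60250]  # illustrative values
print([round(s) for s in perturb(salaries, pct=0.05, seed=42)])
```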
2 votes, 1 answer, 338 views
How to generate fake data based on conditions and weights
I'm trying to generate fake data for a coffee shop. I have two features: age and menu. The menu includes various types of drinks, such as coffee [latte, espresso, mocca, etc], tea [milktea, lemontea], milk [freshmilk,...
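A common approach is to sample the age first, then sample a drink category with `random.choices` using weights that depend on the age group, and finally pick a drink within that category. The age groups, the weights, and the subset of drinks below are hypothetical placeholders to replace with your own assumptions.

```python
import random

# Hypothetical drinks per category (a subset of the menu in the question).
DRINKS = {
    "coffee": ["latte", "espresso"],
    "tea": ["milktea", "lemontea"],
    "milk": ["freshmilk"],
}

# Hypothetical category weights conditioned on age group.
WEIGHTS_BY_GROUP = {
    "under_25": {"coffee": 0.3, "tea": 0.5, "milk": 0.2},
    "25_and_over": {"coffee": 0.6, "tea": 0.3, "milk": 0.1},
}

def age_group(age):
    return "under_25" if age < 25 else "25_and_over"

def fake_orders(n, seed=None):
    """Generate n (age, category, drink) rows with age-dependent drink choices."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(16, 65)
        weights = WEIGHTS_BY_GROUP[age_group(age)]
        # Weighted draw of one category, conditioned on the age group.
        category = rng.choices(list(weights), weights=list(weights.values()))[0]
        drink = rng.choice(DRINKS[category])
        rows.append((age, category, drink))
    return rows

for row in fake_orders(5, seed=0):
    print(row)
```

Splitting the draw into category and then drink keeps the conditional weights small and readable; per-drink weights could be added the same way.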
1 vote, 1 answer, 58 views
Cannot figure out how to make THIS particular dataset long-form (theory, not even code)
I am a Tableau developer, but I also know Python and statistics; in short, I think you all will be best able to solve my problem.
There is a universal filter on Facility. This means that any dataset/sheet ...
0 votes, 1 answer, 109 views
A guide to learning data analysis
I'm new to data analysis, and I need to do a data analysis project in R using clustering methods for a course. I have no idea how to start or how to choose my dataset. I'm looking for some resources. Is ...
0 votes, 1 answer, 167 views
What causes a data transformation pipeline error?
I'm building a data transformation pipeline for a dataset, and I am getting an error:
all the input array dimensions except for concatenation axis must match exactly, but along dimension 0, the array at ...
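The message states NumPy's concatenation rule: every dimension except the one you join along must match. In a pipeline this often means two branches produced different numbers of rows, for example because one step dropped rows (such as rows with missing values) and another did not, before their outputs were stacked side by side. That cause is a guess, since the question is truncated. The pure-Python helper `concat_shape` below is hypothetical; it applies the same rule to shapes, which can help locate the step that changed the row count.

```python
def concat_shape(shapes, axis):
    """Return the shape that concatenating arrays with `shapes` along `axis` would give.

    Raises ValueError when any dimension other than `axis` differs,
    which is the condition behind the NumPy error message.
    """
    first = shapes[0]
    for idx, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            raise ValueError(
                f"array at index {idx} has {len(shape)} dimensions, expected {len(first)}"
            )
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(
                    f"along dimension {dim}, the array at index 0 has size {a} "
                    f"and the array at index {idx} has size {b}"
                )
    result = list(first)
    result[axis] = sum(shape[axis] for shape in shapes)
    return tuple(result)

# Joining feature blocks column-wise (axis=1) needs equal row counts (dimension 0).
print(concat_shape([(100, 3), (100, 5)], axis=1))
try:
    concat_shape([(100, 3), (97, 5)], axis=1)  # one branch lost 3 rows
except ValueError as err:
    print(err)
```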
2 votes, 1 answer, 376 views
Training a neural network with TWO possible correct outputs for one input
I have a black-box system that has two correct outputs for a single input sample.
Now I want to train a neural network to generate at least one of the correct outputs for that input sample.
What ...
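One technique sometimes used in this setting (offered as a suggestion, not taken from the question) is a minimum-over-targets loss: compute the loss against each correct output and keep only the smallest. Averaging the loss over both targets instead rewards predicting their midpoint, which may be neither correct output. A plain-Python sketch with mean squared error; in a deep learning framework the same `min` would be applied to tensors, so the gradient comes only from the closest target.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mean_over_targets(pred, targets):
    """Average the error over all correct outputs (pulls toward their mean)."""
    return sum(mse(pred, t) for t in targets) / len(targets)

def min_over_targets(pred, targets):
    """Score only against the closest correct output."""
    return min(mse(pred, t) for t in targets)

targets = [[0.0], [10.0]]  # two correct outputs for one input
for pred in ([0.0], [5.0], [10.0]):
    print(pred, mean_over_targets(pred, targets), min_over_targets(pred, targets))
```

Under the averaged loss, the midpoint prediction 5.0 scores best (25.0 versus 50.0) even though it matches neither target; under the minimum loss, each correct output scores 0.0.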
1 vote, 1 answer, 187 views
Dataset of extremely low-dimensional images for PCA
I am looking for a public dataset of images that differ from each other only slightly, so that after applying PCA they can be reconstructed with a small error from ...
5 votes, 3 answers, 127 views
Should I drop rows that are duplicated in the features but not in the target?
I'm in a debate with someone about a problem where there are duplicates over the features (i.e., $X_1 = X_2$ but $Y_1 \ne Y_2$).
My point of view is that we should keep those data points, as they can be ...
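Before deciding, it can help to measure how many feature vectors actually carry conflicting labels. A sketch, assuming each row is a `(features, label)` pair with the features stored as a hashable tuple; `conflicting_groups` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def conflicting_groups(rows):
    """Map each feature tuple that appears with more than one label to its label counts."""
    labels = defaultdict(Counter)
    for features, label in rows:
        labels[features][label] += 1
    return {x: dict(c) for x, c in labels.items() if len(c) > 1}

rows = [
    ((1.0, 2.0), "A"),
    ((1.0, 2.0), "B"),
    ((1.0, 2.0), "A"),
    ((3.0, 4.0), "B"),
]
print(conflicting_groups(rows))
```

Here the feature vector `(1.0, 2.0)` is labeled A two times out of three. Keeping all three rows preserves that frequency for a model that estimates class probabilities; dropping every copy discards it.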
0 votes, 1 answer, 1k views
What's the best approach to dealing with missing data in a dataset?
I have a dataset that contains missing values in some columns. I would like to know what is the best approach to deal with this missing data. Should I remove rows with missing data or fill in missing ...
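The two options in the question can be compared on a toy table, with `None` marking a missing value. Median imputation is one common choice among several (mean, mode, or model-based imputation), and the helper names are hypothetical. In practice, compute the medians on the training split only and reuse them for validation and test data, so no information leaks from held-out rows.

```python
import statistics

def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if None not in row]

def impute_median(rows):
    """Replace each missing value with the median of the observed values in its column."""
    n_cols = len(rows[0])
    medians = [
        statistics.median(row[c] for row in rows if row[c] is not None)
        for c in range(n_cols)
    ]
    return [
        tuple(medians[c] if row[c] is None else row[c] for c in range(n_cols))
        for row in rows
    ]

data = [(1, 10), (None, 20), (3, None)]
print(drop_incomplete(data))  # only the first row survives
print(impute_median(data))
```

Dropping keeps one row out of three here, while imputation keeps all of them at the cost of invented values; which trade-off is acceptable depends on how much data is missing and why.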
0 votes, 1 answer, 259 views
Different range of target values in a neural network
I am working on neural network regression code. The dataset includes 14 features with values between -1 and 1, while the target variable ranges from 0.000759 to 1100.
The target ...
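A common remedy for a target spanning several orders of magnitude is to train on the logarithm of the target and exponentiate the network's predictions afterwards. This assumes every target is strictly positive, which holds for the stated range; `math.log1p`/`math.expm1` is the usual variant when zeros can occur. The helper names are hypothetical.

```python
import math

def to_log_target(y):
    """Compress a positive target that spans several orders of magnitude."""
    return [math.log(v) for v in y]

def from_log_target(z):
    """Invert the transform to get predictions back in the original units."""
    return [math.exp(v) for v in z]

y = [0.000759, 0.5, 12.0, 1100.0]  # endpoints taken from the question
z = to_log_target(y)
print([round(v, 2) for v in z])
```

The endpoints map to roughly -7.18 and 7.00, so the network sees a target range of about 14 instead of about 1100, and a squared error in log space penalizes relative rather than absolute mistakes.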
2 votes, 2 answers, 173 views
How to correctly train on an extended dataset
I have trained my classifier on pictures that each contain a mixture of several classes, e.g., A-F. The classifier is able to (nearly) correctly segment those classes in the images.
Now I have more ...
1 vote, 1 answer, 767 views
Sampling methods for Text datasets (NLP)
I am working on two text datasets: one has 68k text samples and the other has 100k. I have encoded both datasets into BERT embeddings.
...