Newest 'dataset' Questions - Data Science Stack Exchange

0 votes

0 answers

35 views

Guide me with my major project titled Satellite-Based Agricultural Vulnerability Monitoring

I am working on a major project titled Utilizing Satellite Data and Deep Learning to Monitor Agricultural Vulnerabilities to Climate Change. My goal is to develop a system to monitor agricultural ...

Shivani Toorpu

1

asked Dec 13, 2025 at 15:02

8 votes

2 answers

113 views

When should we avoid balancing an imbalanced dataset?

I am working on a network security-related project, in which I have to build a deep learning model to detect a specific attack. It's about detecting whether a network system of an organisation is a ...

lony235

83

asked Nov 15, 2025 at 16:22

0 votes

0 answers

24 views

How to extract my fingerprint from my laptop's finger sensor

I have a bunch of fingerprints as a dataset (my college gave them to me). Now I want to use these fingerprints as a dataset and train a model to understand the different things. That is beside the point....

Sayan

1

asked Nov 6, 2025 at 17:23

2 votes

1 answer

51 views

What could be a dataset in which the presence of an outlier or a null value dramatically affects the performance of the decision tree?

I am tasked with giving an example of a dataset in which the presence of an outlier or a null value dramatically affects the performance of a decision tree. I've searched and searched the web and I ...

Arunabh

121

asked Oct 31, 2025 at 23:05

3 votes

2 answers

91 views

Imbalanced classes and ML setup

I’m working on a MarTech use case (predicting customer conversions to a certain product). I’m not used to working in this domain, so I’m seeking some critical questions on my setup. Context: ...

Henri

133

asked Oct 10, 2025 at 7:49

4 votes

0 answers

36 views

Time-efficient parallelization of masks for pre-processing a dataset

I have a large dataset (~10M points) in Python and I want to filter it using a large number of different custom masks, as part of calculations to create a new but related dataset. Because the dataset ...

quail

41

asked Oct 6, 2025 at 18:36

3 votes

0 answers

106 views

How can I constrain a fitting parameter to be the same across multiple datasets?

I am doing nonlinear fits on multiple datasets with several fitting parameters. Each dataset is fit with the same equation and same fitting parameters. Specifically, I am using the curve fitting ...

jaystrop

31

asked Sep 19, 2025 at 16:33

1 vote

0 answers

60 views

Why are there date discrepancies in 2024 North Carolina absentee ballot data?

I've been working with North Carolina's mail-in/absentee ballot data for the 2024 general election. There are 327 rows with ballot request dates prior to 2024, including a few marked in years much ...

Chris Lindgren

119

asked Sep 1, 2025 at 16:12

2 votes

1 answer

69 views

Large, historical, international news corpus for NLP; open access and Python workflow?

I need a large, historical, international news/articles dataset for an NLP project. Ideal features: • the earlier the better–present; multilingual; public/academic access. • Full text preferred; URLs +...

Joe94

121

asked Aug 29, 2025 at 20:32

6 votes

2 answers

84 views

How to handle irrelevant categorical variables in aggregated data?

I’m working with ad server data where I can’t get user-level data — only aggregated reports. The data is aggregated on multiple categorical dimensions (e.g., day × product × medium × source × campaign ...

David

73

asked Aug 9, 2025 at 13:49

2 votes

0 answers

59 views

What is the best approach for future-proofing research data against new parameters?

For my research I regularly perform parameter searches. Suppose I have a set of hyperparameters $\textbf{X}=\{X_0, X_1, \dots X_n \}$ and some function $f(\textbf{X}) = \textbf{Y}$ where $\textbf{Y}=\{...

squareroottwo

21

asked Aug 5, 2025 at 14:26

2 votes

0 answers

32 views

How to improve fine-tuning for task dependency extraction?

I'm trying to fine-tune a LLaMA 3.1 Instruct model to adapt it to a specific industrial domain. The goal is to have the model extract direct dependencies between tasks from a list of operational steps ...

lili

371

asked Jul 31, 2025 at 14:08

4 votes

1 answer

101 views

Sample size distribution for a dataset

This is a more general question regarding the nature of a dataset for any statistical method used afterwards. Let's say you have a nice, clean dataset that contains values for predicting the maximum ...

ChairmanMeow

183

asked Jul 29, 2025 at 14:21

5 votes

1 answer

107 views

How can I measure whether my dataset is good for training?

I wanted to train a model on this dataset. The inputs dataset is here: https://drive.google.com/file/d/1bbMa7auwYjYxyCB72UMBNv5kaojqV7WH/view?usp=sharing and the outputs dataset is here: https://drive....

Naivahash80

345

asked Jul 27, 2025 at 14:57

0 votes

1 answer

48 views

Matching BDD100K semantic segmentations to the original image

BDD100K is a dataset for autonomous driving. I downloaded the images + labels, and also the semantic segmentations, but I am facing an issue: The image names don't match between the original images ...

kutschkem

101

asked Jul 14, 2025 at 7:38
