Newest 'research' Questions - Data Science Stack Exchange

2 votes · 2 answers · 129 views

Best Practice for Group-Based Splitting (Train / Val / Test)

As an intro, group-based splitting means splitting data into Train / Test (Val) by some attribute such as patient_id, item_id or similar, to ensure that the same person ...

Michael D (reputation 209) · asked Jun 10, 2025 at 10:43
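
The idea in the question above, assigning every row that shares a group key (e.g. patient_id) to exactly one of Train / Val / Test so no group leaks across splits, can be sketched in plain Python. This is a minimal illustration, not the asker's method: the function name `group_split`, the 70/15/15 fractions, and the toy `rows` data are hypothetical. Fractions are applied to the number of groups rather than rows, so row counts per split are only approximately proportional. In practice, scikit-learn's `GroupShuffleSplit` and `GroupKFold` enforce the same constraint.

```python
import random
from collections import defaultdict


def group_split(rows, group_key, fractions=(0.7, 0.15, 0.15), seed=0):
    """Hypothetical helper: split rows into train/val/test by whole groups.

    `rows` is a list of dicts and `group_key` names the grouping column
    (e.g. "patient_id"). Every group lands in exactly one split.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Sort the unique IDs first so the result does not depend on row order,
    # then shuffle with a fixed seed for a reproducible split.
    groups = sorted({row[group_key] for row in rows})
    random.Random(seed).shuffle(groups)

    n = len(groups)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    assignment = {}
    for i, g in enumerate(groups):
        if i < n_train:
            assignment[g] = "train"
        elif i < n_train + n_val:
            assignment[g] = "val"
        else:
            assignment[g] = "test"

    splits = defaultdict(list)
    for row in rows:
        splits[assignment[row[group_key]]].append(row)
    return splits["train"], splits["val"], splits["test"]


# Toy data (hypothetical): 20 patients with 3 visits each.
rows = [{"patient_id": p, "visit": v} for p in range(20) for v in range(3)]
train, val, test = group_split(rows, "patient_id")

train_ids, val_ids, test_ids = (
    {r["patient_id"] for r in s} for s in (train, val, test)
)
# No patient_id appears in more than one split.
assert train_ids.isdisjoint(val_ids)
assert train_ids.isdisjoint(test_ids)
assert val_ids.isdisjoint(test_ids)
```

Splitting at the group level rather than the row level is what prevents a model from being evaluated on a person (or item) it has already seen during training.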

2 votes · 1 answer · 68 views

Where can I find open/free Galton's Ox estimate/Wisdom of Crowd dataset and similar?

I am playing around with some thoughts on Wisdom of Crowd phenomena and wanted to do some analysis in R/Excel. Francis Galton pioneered this concept and I was hoping to use his dataset but I can't ...

Ethos (reputation 21) · asked May 10, 2025 at 16:47

2 votes · 0 answers · 96 views

Why can monotonic feature transformation influence the performance of hyperparameter-tuned tree-based models (e.g., random forest)?

I recently observed something unexpected: Although monotonic feature transformation does not affect the performance of decision tree-based models with default hyperparameters, it actually does affect ...

user4924539 (reputation 21) · asked May 2, 2025 at 21:00
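
The premise of the question, that a monotonic transformation leaves an untuned tree unaffected, follows from the fact that threshold splits depend only on the ordering of feature values. The sketch below assumes a single regression split (a "stump") chosen by total squared error on one feature; the function `best_stump_partition` and the toy data are hypothetical, not taken from the question or from any library. It only shows that the training-set partition is unchanged under a strictly increasing transform (here `math.log` and a square root); it does not address why tuned models behave differently.

```python
import math


def best_stump_partition(x, y):
    """Hypothetical helper: return the left-child index set of the best split.

    Candidate thresholds sit between consecutive distinct sorted values of x;
    the split with the lowest total squared error of the two children wins.
    Only the ordering of x is used, never its magnitudes.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # stable sort by x
    best_sse, best_left = math.inf, None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        sse = 0.0
        for part in (left, right):
            mean = sum(y[i] for i in part) / len(part)
            sse += sum((y[i] - mean) ** 2 for i in part)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left


# Toy data (hypothetical), all x > 0 so that log is defined.
x = [0.5, 3.0, 1.2, 8.0, 2.2, 15.0, 0.9, 6.5]
y = [1.0, 2.1, 1.1, 4.0, 1.9, 4.2, 0.8, 3.9]

p_raw = best_stump_partition(x, y)
p_log = best_stump_partition([math.log(v) for v in x], y)
assert p_raw == p_log  # same rows go left under the raw and log-scaled feature
```

Because the threshold itself lives in the transformed scale, an implementation that stores a numeric cut point (e.g. a midpoint between adjacent training values) can still route an unseen point lying between those two values differently, even though the training partition is identical.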

0 votes · 0 answers · 33 views

I would like to build an open source Traffic Signs Dataset solely for research purposes

I've been interested lately in doing research about different neural networks and how to contribute to Autonomous Vehicles. I used a couple of images to train a model and the results were different ...

Amy (reputation 1) · asked Mar 2, 2025 at 17:06

1 vote · 0 answers · 33 views

Research in Machine Learning in the era of transformers

I'm a master's student in Machine Learning. I'm interested in pursuing research in the field, but I'm concerned about recent advancements like ChatGPT, CLIP, and DINO that require massive compute ...

brahim benelghali (reputation 11) · asked May 2, 2024 at 15:30

0 votes · 0 answers · 52 views

Can one have a good understanding of a method without having direct experience with it?

This question is in line with these previous questions on other sites: "Is it possible to conduct scientific research without actually getting close to the sample/specimen?" on Biology SE, "Is it possible ...

Ooker (reputation 133) · asked Mar 18, 2024 at 17:20

0 votes · 0 answers · 36 views

How to use two independent datasets in machine learning PhD research work?

In order to develop an academic performance prediction model for a local Higher Ed Institution, I have collected the OULAD open dataset and the local Institution's dataset which I structured into the ...

AnilPHD (reputation 1) · asked Jan 29, 2024 at 16:58

1 vote · 0 answers · 54 views

ML paper reproducibility

How can I reproduce results in an ML paper if I don't have the same resources to train the models as in the paper? (In my case I only have a laptop-grade NVIDIA GPU, and in most of the papers I ...

okm02 (reputation 11) · asked Oct 25, 2023 at 9:33

15 votes · 3 answers · 24k views

Why does everyone use BERT in research instead of LLaMA or GPT or PaLM, etc.?

It could be that I'm misunderstanding the problem space and the iterations of LLaMA, GPT, and PaLM are all based on BERT like many language models are, but every time I see a new paper in improving ...

Ethan (reputation 253) · asked Aug 3, 2023 at 1:11

1 vote · 1 answer · 56 views

Which statistical technique should I use for a within-person repeated measures study?

I have collected all my data for a study and need to run my analysis, but I have come unstuck (I know I should have planned better beforehand). I'm looking to see whether personality traits (five trait ...

Jay (reputation 11) · asked May 24, 2023 at 13:09

1 vote · 0 answers · 45 views

Influence functions on neural networks: Help with understanding of result and derivation

I'm working through a paper titled "Understanding Black-box Predictions via Influence Functions" where they introduce the notion of influence functions from robust statistics to approximate ...

rasgaard (reputation 11) · asked May 18, 2023 at 12:20

0 votes · 2 answers · 150 views

Where can I find applied data science research papers?

I'm trying to find conferences that publish applied data science papers. I'm only interested in top-ranked conferences, and I notice that many of them are quite theoretical, e.g. IJAI, ...

Student (reputation 441) · asked May 17, 2023 at 5:11

0 votes · 2 answers · 112 views

The ideal function in R for fitting n LASSO regressions on n data sets

As part of a statistical learning research paper I am collaborating on, I am running/fitting two hundred sixty thousand different LASSO Regressions on the same number of different randomly generated ...

Marlen (reputation 167) · asked Jan 31, 2023 at 5:32

4 votes · 1 answer · 82 views

Resources for Promotion/Demotion Strategies for ML Item Recommendation Systems?

We are looking to design a system where specific items or categories of items can be boosted/promoted up or relegated/demoted down the recommendation order. What are the common strategies or standards ...

JPTheEngineer (reputation 41) · asked Jan 20, 2023 at 20:28

0 votes · 1 answer · 55 views

Which specific AWS service to use for running Benchmark Regressions on datasets far too large to run locally on my laptop [closed]

I am in the middle of a research project with a collaborator in which he has proposed a novel statistical learning processor for optimal variable selection, and I am running the 3 Benchmark Variable ...

Marlen (reputation 167) · asked Jan 4, 2023 at 14:29
