Questions tagged [reference-request]
"References" is our generic tag for questions seeking information about books, papers, presentations, videos of lectures, on-line tutorials, etc., regarding any subject matter that is on-topic for Data Science.
81 questions
6 votes, 1 answer, 103 views
Looking for a concise guide for quick revision of statistics
I am looking for a handbook for quick revision of statistics that is straightforward, offering concise explanations, key assumptions, formulas, and possibly one exercise for practice.
If you have any ...
11 votes, 1 answer, 883 views
Neural network to find errors in training data
My data set consists of a categorical output variable with 4 different values and roughly 100 boolean (True/False) input variables. The data set has ...
0 votes, 2 answers, 76 views
Basics of plotting
I need some help clearing up the basics of plotting in various plotting packages, specifically matplotlib, seaborn and plotly. Certain fundamental principles are always the same across all packages. Can ...
0 votes, 0 answers, 60 views
Andrew Ng ML course using MATLAB?
Nowadays Python is mostly used for machine learning, and I think it is also used in Andrew Ng's new ML courses:
https://www.quora.com/Why-was-MATLAB-not-used-in-the-Andrew-Ng-course-of-deep-learning
...
0 votes, 1 answer, 102 views
Fixing class imbalance vs Over-detecting in test data
In my experience, binary classifiers tend to do better in terms of F1 score when the class imbalance is at least reduced. However, this leads to over-predicting in the test data.
(Thought) Example: If ...
1 vote, 0 answers, 165 views
Best practices for creating datasets for the purpose of fine-tuning LLMs
I am working on a problem for which no datasets exist. I have obtained several examples from this domain, and so far have been using them in Large Language Model (LLM) prompts (few-shot learning) but I ...
1 vote, 1 answer, 4k views
Why is 0.7, in general, the default value of temperature for LLMs?
I have recently read through a lot of documentation and articles about Large Language Models (LLMs), and I have come to the conclusion that 0.7 is, most of the time, the default value for the ...
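For readers unfamiliar with the parameter, the following is a minimal sketch of how temperature is commonly applied when sampling: the logits are divided by the temperature before the softmax. The function name and the example logits are hypothetical, and the default of 0.7 is only taken from the question's premise, not from any particular library.

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw logits into probabilities after dividing them by a temperature.

    Values below 1.0 sharpen the distribution; values above 1.0 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
example_logits = [2.0, 1.0, 0.1]
sharper = softmax_with_temperature(example_logits, temperature=0.7)
flatter = softmax_with_temperature(example_logits, temperature=1.5)
```

With these inputs, the most likely token receives a larger share of the probability at 0.7 than at 1.5.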
0 votes, 1 answer, 78 views
Suggestions to learn the Machine Learning models in greater depth?
I've been learning machine learning for the past few weeks from books and online courses. The books I've been reading, and am still reading, are "Hands-On Machine Learning with Scikit-Learn ...
0 votes, 0 answers, 58 views
Does clustering belong to the domain of data mining or to the domain of machine learning?
Question 1. Does clustering belong to the domain of data mining or to the domain of machine learning? Or to both domains?
Question 2. Depending on the answer to Question 1, could you please suggest a ...
1 vote, 0 answers, 33 views
Is "textbook backpropagation" still relevant?
The above backpropagation algorithm is taken from Shalev-Shwartz and Ben-David's textbook, Understanding Machine Learning. This algorithm is described in the same way as the one in Mostafa's textbook, ...
0 votes, 1 answer, 96 views
How to use the API documentation of a dataset
Some websites provide a link that allows a dataset to be downloaded (in Excel format). But some others additionally provide API documentation, like this site. Can you please ...
1 vote, 1 answer, 178 views
Is it OK to normalize the dependent variable using MinMaxScaler?
I'm trying to make a sales prediction using X = item_amount and y = item_price_total. I'm not sure whether it's okay to normalize the dependent variable using MinMaxScaler.
With the ...
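As an illustration of what scaling the dependent variable involves, here is a minimal sketch in plain Python (not scikit-learn's implementation). It assumes the scaling bounds are learned from the training targets only, and that anything predicted on the scaled target is mapped back to the original units afterwards. The helper names and the sample values are hypothetical.

```python
def fit_min_max(values):
    """Learn the minimum and maximum of the training targets."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant target")
    return lo, hi


def scale(values, lo, hi):
    """Map values from [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical item_price_total values used as the training target.
y_train = [120.0, 80.0, 200.0, 150.0]
lo, hi = fit_min_max(y_train)
y_scaled = scale(y_train, lo, hi)

# Stand-in for model output on the scaled target; converted back before reporting.
y_restored = unscale(y_scaled, lo, hi)
```

The round trip recovers the original values, which is the property to check before reporting errors or predictions in the original units.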
4 votes, 1 answer, 82 views
Resources for Promotion/Demotion Strategies for ML Item Recommendation Systems?
We are looking to design a system where specific items or categories of items can be boosted/promoted up or relegated/demoted down the recommendation order.
What are the common strategies or standards ...
0 votes, 3 answers, 2k views
How to remove outliers properly?
I was wondering what is the best practice for removing outliers from data. Plotting a boxplot for each feature (column of the dataset) and removing data that fall outside the whiskers seems like a ...
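The boxplot approach described in the question can be sketched as follows, assuming the conventional whisker positions of 1.5 times the interquartile range beyond the first and third quartiles. The function name and sample data are hypothetical, and this shows only the mechanics of the rule, not a recommendation that it is the right choice for a given data set.

```python
import statistics


def remove_outside_whiskers(values, k=1.5):
    """Keep only the values that lie within k * IQR of the quartiles."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical single feature column with one extreme value.
column = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
filtered = remove_outside_whiskers(column)
```

For a data set with several features, the same filter would be applied per column, which is where the question of which rows to drop becomes a judgment call.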
1 vote, 1 answer, 75 views
What are the possible applications of a Data Scientist in the design phase of the Aerospace or Railway Engineering industry? [closed]
I have been trying to understand this for a long time, but this information proves to be incredibly elusive online.
What are possible jobs that a pure Data Scientist, without much background knowledge,...