Questions tagged [dataset]

Requests for datasets are off-topic on this site. Use this tag for questions concerning creating, processing, or maintaining datasets.

187 votes
15 answers
57k views

In a recent article of Amstat News, the authors (Mark van der Laan and Sherri Rose) stated that "We know that for large enough sample sizes, every study—including ones in which the null hypothesis of ...
Carlos Accioly
103 votes
25 answers
43k views

I've been working on a new method for analyzing and parsing datasets to identify and isolate subgroups of a population without foreknowledge of any subgroup's characteristics. While the method works ...
96 votes
6 answers
9k views

In my job role I often work with other people's datasets; non-experts bring me clinical data and I help them summarise it and perform statistical tests. The problem I am having is that the datasets I ...
Chris Beeley
  • 5,921
95 votes
2 answers
193k views

I have seen the min-max normalization formula but that normalizes values between 0 and 1. How would I normalize my data between -1 and 1? I have both negative and positive values in my data matrix.
covfefe
  • 1,299
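
A minimal sketch of one common answer to the question above: min-max scaling can be generalised to any target interval with x' = a + (b - a) * (x - min) / (max - min), which gives [-1, 1] for a = -1, b = 1. The function name `scale_to_range` and its parameters are illustrative assumptions, not something taken from the original thread.

```python
def scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max.

    Hypothetical helper for illustration only; plain Python, no external libraries.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        raise ValueError("cannot rescale constant data")
    span = new_max - new_min
    return [new_min + span * (v - lo) / (hi - lo) for v in values]


# Data with both negative and positive values, as in the question.
data = [-5, 0, 5]
scaled = scale_to_range(data)
print(scaled)
```

Note that this maps the observed minimum and maximum to -1 and 1. It does not keep zero at zero unless the data happen to be symmetric around zero; dividing by the largest absolute value is an alternative if preserving the sign matters.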
72 votes
8 answers
123k views

This question is motivated by my question on meta-analysis. But I imagine that it would also be useful in teaching contexts where you want to create a dataset that exactly mirrors an existing ...
Jeromy Anglim
53 votes
3 answers
21k views

EDIT: The Web Technologies and Services CRAN task view contains a much more comprehensive list of data sources and APIs available in R. You can submit a pull request on GitHub if you wish to add a ...
44 votes
9 answers
40k views

When teaching an introductory-level class, the teachers I know tend to invent some numbers and a story in order to exemplify the method they are teaching. What I would prefer is to tell a real story ...
43 votes
8 answers
1k views

My workplace has employees from a very wide range of disciplines, so we generate data in lots of different forms. Consequently, each team has developed its own system for storing data. Some use ...
41 votes
2 answers
7k views

"Big data" is everywhere in the media. Everybody says that "big data" is the big thing for 2012, e.g. KDNuggets poll on hot topics for 2012. However, I have deep concerns here. With big data, ...
Has QUIT--Anony-Mousse
38 votes
5 answers
21k views

What freely available data sets are there for classification with more than 1000 features (or sample points, if the data contain curves)? There is already a community wiki about free data sets: Locating ...
36 votes
5 answers
3k views

Let's say I am studying how daffodils respond to various soil conditions. I have collected data on the pH of the soil versus the mature height of the daffodil. I'm expecting a linear relationship, ...
SlowMagic
  • 625
36 votes
3 answers
4k views

I've just come across Anscombe's quartet (four datasets that have almost indistinguishable descriptive statistics but look very different when plotted) and I am curious if there are other more or less ...
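
To make the parenthetical above concrete, here is a short sketch that computes the summary statistics of the four Anscombe (1973) datasets using only the standard library. The helper names `pearson` and `summarize` are illustrative assumptions; the data values are the published quartet.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the descriptive statistics that are nearly identical across the quartet."""
    return {
        "mean_x": round(mean(xs), 2),
        "var_x": round(variance(xs), 2),   # sample variance
        "mean_y": round(mean(ys), 2),
        "var_y": round(variance(ys), 2),
        "r": round(pearson(xs, ys), 3),
    }


for name, (xs, ys) in ANSCOMBE.items():
    print(name, summarize(xs, ys))
```

The printed summaries agree to about two decimal places, even though scatter plots of the four datasets look nothing alike.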
36 votes
3 answers
22k views

Is there a visualization model that is good for showing the intersection overlap of many sets? I am thinking something like Venn diagrams but that somehow might lend itself better to a larger number ...
Kyle Brandt
36 votes
2 answers
2k views

I'll pose this question by means of an example. Suppose I have a data set, such as the Boston housing price data set, in which I have continuous and categorical variables. Here, we have a "quality"...
Marcel
  • 1,430
31 votes
5 answers
89k views

Can someone summarize for me, with examples if possible, in what situations increasing the training data improves the overall system? When do we detect that adding more training data could possibly over-...
madCode
  • 796
