added 2 characters in body; edited tags; edited title
Meta Andrew T.

Choosing the best resource for crawling data from Stack Exchange sites

For the purpose of an academic research project, I would like to obtain detailed data on questions, answers, tags, users, etc. That is, I seek historical data, as detailed as possible. I have seen that there are three resources, as listed in this answer: the API, the data dump, and the Stack Exchange Data Explorer (SEDE).

From what I understand, the API is more suitable for obtaining live data. Having looked at the two other alternatives, the dump and the SEDE, it is not clear to me which one is more suitable. With the dump, one just downloads zipped XML files, whereas with the SEDE one can send customized queries.

Does the dump include everything that can be obtained through the SEDE? Or does the SEDE provide richer data in some sense? Can someone explain the differences between these two and advise on which one is more suitable given my purpose?
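To make the dump option concrete, here is a minimal sketch of how I imagine processing one of its XML files without loading it all into memory. It rests on assumptions I have not verified against the current dump: that a file such as Posts.xml holds one `<row>` element per post with the fields stored as attributes, and that `PostTypeId` is 1 for questions and 2 for answers. The sample data and the function name `count_post_types` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for a dump file such as Posts.xml.
# Assumed layout (not verified here): one <row> per post, fields as attributes,
# PostTypeId "1" = question, "2" = answer.
SAMPLE_POSTS_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
  <row Id="3" PostTypeId="1" Score="0" />
</posts>
"""


def count_post_types(source):
    """Stream <row> elements from a file path or file object and count them by PostTypeId."""
    counts = Counter()
    # iterparse yields each element once it is fully parsed, so a large
    # file never has to be held in memory as a whole tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            counts[elem.get("PostTypeId")] += 1
            elem.clear()  # release the row's attributes once it has been counted
    return counts


post_type_counts = count_post_types(io.BytesIO(SAMPLE_POSTS_XML))
print(post_type_counts)
```

For a real dump, the same function could presumably be given the path to an extracted XML file instead of the in-memory sample.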

splinter