12
$\begingroup$

I am preparing a paper where some results involve computational verification of a conjecture. Of course, I am not proving the conjecture in full, but I verify it for some large values of the involved invariants using an algorithm.

I looked at several papers dealing with conjectures where computations were used to obtain some supportive results. I noticed different approaches, as in the following papers: PaperA and PaperB.

From what I have seen, the general process is clear to me. It usually involves:

  1. Designing an algorithm to perform the computations;
  2. Reporting the numerical results in the paper;
  3. Sharing the code on an external website (not in the paper, as also suggested here) so that referees and readers can reproduce the results.
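
As a minimal sketch of steps 1 and 2, assuming Goldbach's conjecture as a purely illustrative stand-in (it is not the conjecture of the paper), a verification script could compute the result up to a bound and return a small record that can be reported and re-run by a referee. The function names `primes_up_to`, `goldbach_counterexamples`, and `run_verification` are hypothetical.

```python
import json
import platform


def primes_up_to(n):
    """Return a bytearray where entry k is 1 if k is prime (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out all multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sieve


def goldbach_counterexamples(bound):
    """List even n in [4, bound] that are NOT a sum of two primes."""
    is_prime = primes_up_to(bound)
    failures = []
    for n in range(4, bound + 1, 2):
        if not any(is_prime[p] and is_prime[n - p] for p in range(2, n // 2 + 1)):
            failures.append(n)
    return failures


def run_verification(bound):
    """Bundle the result with the parameters needed to reproduce it."""
    return {
        "conjecture": "Goldbach (illustrative stand-in)",
        "bound": bound,
        "counterexamples": goldbach_counterexamples(bound),
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    # Print a machine-readable record that can accompany the paper.
    print(json.dumps(run_verification(10_000), indent=2))
```

Recording the bound and the interpreter version next to the result makes it easier for a reader to check that a re-run matches what the paper reports.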

My doubts and questions.

  1. Sharing the code. The code of PaperB was shared on a personal website here. I do not have a personal website. Should I open a GitHub repository to share the code and instructions, as done here? Or is there another recommended or simpler approach? I also read this question on MathOverflow, where most answers suggest using GitHub, but since that discussion is quite old, I am wondering whether there are more up-to-date best practices for sharing research code.

  2. Declaring data usage. I noticed that in PaperA the authors stated "No data was used for the research described in the article." in the Data availability statement, whereas in PaperB they explicitly declared that data were used and are available. I find this a bit confusing. Is there a standard practice here, or are both options acceptable depending on the journal?

$\endgroup$
9
  • 3
    $\begingroup$ as an alternative to GitHub, you can share the code and data on Zenodo; easy, free, reliable, it's my preferred option; most journals will insist that you make code and data available, the "no data was used" option does not apply to the case you describe. $\endgroup$ Commented Aug 25 at 10:20
  • $\begingroup$ @CarloBeenakker Nice, I appreciate your suggestion $\endgroup$ Commented Aug 25 at 10:29
  • 3
    $\begingroup$ Another alternative to Github: If you post preprints to the arXiv, you can also upload ancillary files with your submission. This is what I've done for sharing code related to papers. $\endgroup$ Commented Aug 25 at 11:28
  • 1
    $\begingroup$ storing code/data on arXiv has this drawback, which Zenodo/GitHub do not have: Ancillary files are stored with a particular version of an article and thus cannot be changed independently from the article. $\endgroup$ Commented Aug 25 at 12:32
  • 1
    $\begingroup$ You asked two questions, only one got answered. This site prefers one question per post. Just noticed the answer is updated to answer both so not to worry. $\endgroup$ Commented Aug 25 at 17:43

1 Answer

10
$\begingroup$

Q1: Sharing the code.

Rather than continuing the comment string, let me try and summarise why I find Zenodo preferable over GitHub as a repository for code and data, to accompany a research article.

Both Zenodo and GitHub have version control, so you can update code and data without needing to change the manuscript file (which arXiv does not allow). The key difference is that Zenodo provides a DOI for each version, while GitHub uses URLs. Journals typically prefer a citable DOI, for discoverability and permanence. You can use GitHub Releases to freeze a version, but this by itself still doesn't create a DOI or a permanent archive.

GitHub is convenient while you are developing the code, and it allows you to incorporate contributions from others. You can connect GitHub to Zenodo, so a convenient workflow would be to use GitHub for development and then make a Zenodo snapshot with a stable DOI$^\ast$ to cite with the article.

$^\ast$ Zenodo provides both a general DOI and a version-specific DOI: The general DOI points to all versions of the upload and is the one you would include in the publication.
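
To illustrate the GitHub-to-Zenodo workflow, here is a minimal sketch that writes a `.zenodo.json` metadata file into the repository root, which Zenodo can read when it archives a GitHub release. The field names follow my understanding of Zenodo's deposit metadata and should be checked against the current Zenodo documentation; the helper names and all example values are hypothetical.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, description, creators, license_id="MIT", keywords=()):
    """Assemble release metadata (field names assumed from Zenodo's deposit schema)."""
    return {
        "title": title,
        "description": description,
        "upload_type": "software",
        # Zenodo expects creator names in "Family name, Given name" form.
        "creators": [{"name": name} for name in creators],
        "license": license_id,
        "keywords": list(keywords),
    }


def write_zenodo_metadata(metadata, repo_root="."):
    """Write the metadata as .zenodo.json in the repository root and return its path."""
    path = Path(repo_root) / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path
```

One would call `write_zenodo_metadata(build_zenodo_metadata(...))` once, commit the file, and then create a GitHub release; the DOI itself is assigned by Zenodo, not by this script.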

Q2: Declaring data usage.

For completeness, regarding this second question: the declaration "No data was used for the research described in the article." means a reader can check the results in the paper simply by reading the text and following the math, without the need to write and run code. In the social or physical sciences it also means that no observational data was used to draw the conclusions in the paper; I guess that applies less to a math article.

$\endgroup$
5
  • $\begingroup$ Very interesting! I've just checked the Zenodo website, and it seems that they also allow uploading posters and other materials, so it might be useful not only for code. I have just one question: since both Zenodo and GitHub allow uploading code, if I link GitHub to Zenodo, can I upload the code only to GitHub without touching Zenodo, meaning that Zenodo will be automatically updated as well? In any case, I have to try them to get more familiar, I'm really new to them $\endgroup$ Commented Aug 25 at 13:23
  • 1
    $\begingroup$ yes, if you connect a GitHub repo to Zenodo, every new GitHub release automatically gets archived in Zenodo with a version-specific DOI, in addition to the "general" DOI that you would use for the publication. $\endgroup$ Commented Aug 25 at 13:24
  • $\begingroup$ From your second point, it seems that Paper A used a more suitable way of declaring the data, for instance. I'm not saying that Paper B used an incorrect approach, but rather that the method in Paper A appears to be the more appropriate one. Is that correct? $\endgroup$ Commented Aug 25 at 15:12
  • $\begingroup$ Papers A and B both provide source code, paper A in a GitHub repository, paper B in a personal web site; the former would be more advisable; I do note that the data availability statement of paper A goes against the journal policy, which states that "results of observations or experimentation that validate research findings" should be deposited in a public repository. $\endgroup$ Commented Aug 25 at 15:30
  • 1
    $\begingroup$ Ah no no, I was referring only to the Data availability statement. In PaperA it is written "No data was used for the research described in the article.", whereas in PaperB it is "The source code to produce the datasets analyzed during the current study is available in the second author’s repository". Since "No data was used for the research described in the article" means a user can check the results in the paper without needing to write and run code, PaperB used the more appropriate statement (sorry, I switched the two papers in the previous comment). Let's say it is just a detail. $\endgroup$ Commented Aug 25 at 15:49
