
More than a year ago, during the initial discussions about the new per-site data dump system, I asked whether I could get a full copy of the data dumps. I followed up during the public release and was told, on Jul 12th, that this was possible.

For reference, this is the conversation that led to me making the request.

A screenshot of a section of the comments linked above, where Philippe, the VP of Community, says “If you request the backup for that purpose, I'll see that we get it to you.” in response to “the simplest way would be to have someone request a full copy, for the explicit goal of having a backup of the entire database”

I lodged a ticket on August 8th, 2024 (134427). I've been waiting a rather long time, but so far the only response I've had is "we're checking".

My intention was to help keep a trusted archive copy of these data dumps, using my own resources. I'd agreed not to use it to train LLMs, and basically went through the process Stack Exchange had suggested. I gave very early notice that I was going to make the request and, practically, if they'd said no, I'd probably have started thinking about alternative ways to do this.

I also reminded the staff involved multiple times over the past year, though at one point I gave up on getting an answer and asked them to let me know when it was sorted. We're a full year of dumps on with no real answer. In practice, our best option ended up being public third-party archives online, which, ironically, was what the company was trying to prevent.

Now, if I didn't want to go through the process the company suggested - in theory, works for downloading, or I can go after the fact and download it from the Internet Archive. However, as with many things, I was hoping the company would give a straight answer and not make promises it didn't keep.

As such, I'd like to ask: was a process for obtaining a full/complete data dump ever planned and, a year on, are there any plans to actually make these available for legitimate uses?

I believe I've been incredibly patient over the issue, considering one of my main goals was to show the company would accede to reasonable requests. So, are there any plans to actually make the full data dump available with reasonable cause, or did the company mislead us? If the process I followed was incorrect, can someone advise me what the company expects of someone requesting a full copy of the data dump for legitimate reasons, and whether I've missed any required criteria?
