-
I mean, but like, your partners, whom you're still profiting from by selling them our data, are still non-compliant with CC-BY-SA... They're training on our data the same way anyone who trains on the data dump would be. I'd hazard a guess this is really just an effort to protect the revenue streams derived from selling our data. – user400654, Aug 21, 2025 at 19:32
-
"misuse, such as non-compliant use of CC-BY-SA content by large companies" - What exactly do you consider to be "non-compliant use of CC-BY-SA"? I assume you are talking about training LLMs, but what exactly would make it compliant? (I think there were blog posts when you announced the partnerships with Google et al., but they weren't very clear on that.) Anything else? Would it be compliant if someone created an LLM and published it under CC-BY-SA together with a list of names and links of all SO/SE users? Does it make a difference to you whether the people creating LLMs are "large companies"? – dan1st, Aug 21, 2025 at 19:37
-
First, thank you for the additional information, and I recognize this decision probably wasn't yours (or Berthold's) personally. When you say that "[...] they were specifically crafted to be easily spotted by [...] our knowledgeable users", does that mean that if future data dumps contain watermarks, they would be similarly visible? It would be rather time-consuming if, whenever the community wants to use the data dump, they had to go searching for a different one each time. – cocomac, Aug 21, 2025 at 19:37
-
"while ensuring that those engaging with the data in good faith could immediately recognize the watermark" How does engaging in good faith make this watermark more noticeable? How is it less noticeable if I had ill intent? Do you really think a company would just download the data and chuck it right into their model without looking at it or understanding it? Do you think the people who would do that would also monitor Meta for discussions about the data dump? How many days of "protection" did you get by hiding the fact that you watermarked the dumps? This is so wrongheaded. – ColleenV, Aug 21, 2025 at 19:40
-
I think many of us are really confused by the use of "safety & security" in this context. It certainly sounds like it is being used as an excuse to maintain a shroud of secrecy. Am I correct in interpreting that this is NOT being used in the technical sense of "cyber security," but rather in business terms, referring to "protecting the line of business"? – anon, Aug 21, 2025 at 20:14
-
I've been having a really hard time understanding the rather blatant contradiction between "they were specifically crafted to be easily spotted" and the insistence that notifying the Community in advance was some sort of safety/security issue. Surely, if you anticipated the "watermark" would be found, then you also anticipated that this post would be created when it was found, and that the sentiment from the Community would be negative. Now that the watermark has been found and publicized (which, by the company's own admission, devalues it), will it be removed or changed? – anon, Aug 21, 2025 at 20:33
-
My response doesn't come anywhere near fitting into a comment, so here it is. – Thomas Owens, Aug 21, 2025 at 21:00
-
1) What other changes have you introduced into the data dump to "protect the integrity of the data"? 2) Why should we trust your answer to question #1? – JonathanZ, Aug 21, 2025 at 21:59
-
"while ensuring that those engaging with the data in good faith could immediately recognize the watermark" ─ How do you expect the bogus data to be recognised by those acting in good faith, and not by those acting in bad faith? It will be recognised based on the user's competence, not their moral standing. Why should there be any correlation? Bad people can use data competently. – kaya3, Aug 22, 2025 at 11:03
-
"Instead, these measures were implemented with the overarching goal of protecting the integrity of the data ..." ─ You intended to protect the integrity of the data by (checks notes) ruining the integrity of the data? ─ "... and reinforcing its intended use within the community." ─ How does the silent addition of bogus data communicate anything about intended use? It just harms all users of the data. – kaya3, Aug 22, 2025 at 11:06
-
Re: "This very communication" – There was no communication, and that's part of the issue. I strongly object to calling these sanitized, after-the-fact, "welp, you found it" posts "communication" at all, and am flabbergasted that the company has the gall to say "you should actually be thankful we posted this at all, we're being transparent". You hid a development that does not serve its stated goal and actively harms legitimate users. No, gratitude is not a justified response from the community here. – zcoop98, Aug 22, 2025 at 17:13
-
At best, in the most gracious possible reading, the company is struggling once again to balance pursuing its legitimate interests with communicating and being authentically transparent, and once again, it's failing miserably. I don't understand how we continue to end up here, and I don't think it's fair or reasonable to fall back on the "well, actually, we mean well" defense over and over. At some point, faith will erode enough that the folks who truly care will stop caring as much, will stop fighting as much, will stop being your sounding board, and your end product will be worse as a result. – zcoop98, Aug 22, 2025 at 17:15
-
All of this is just... tired; old hat. It would frankly be less exhausting if the company stopped explicitly calling its connection with the community one of its strengths... because that's why this is all so dang infuriating, time and time and time again. We're so far past "Rebuilding trust in us as responsible stewards of the network remains our top priority". Choices like these simply do not align with that statement. – zcoop98, Aug 22, 2025 at 17:22
-
"These posts were never hidden or intended to deceive the community". This statement is baffling. You silently introduced falsified data into the data dumps, without telling anyone, and claim that it wasn't "hidden"? – Steve Bennett, Sep 3, 2025 at 12:40