- (26) I'm downvoting because this is stupid and accomplishes nothing other than diluting the value of the data dumps. – Thomas Owens, Aug 19, 2025 at 17:00
- (29) And the whole "we were waiting for you all to find it" is also ridiculous. I know that people spent many hours of time and computational resources questioning the validity of the data dumps. I'd be more than happy to elaborate on all the reasons why this is useless and stupid, but it should be obvious to the decision makers. – Thomas Owens, Aug 19, 2025 at 17:02
- (30) So, the data dumps have been a bit of a sore subject with the community for a while now. Choosing to play a hide-and-seek game with the community here is like "Honey, I know we've been talking about being more honest with money lately, but I knew you'd catch that casino withdrawal on our shared savings account quickly, it was intentional and not meant to be sneaky! If I was trying to hide it from you babe I'd have used cash." – Bryan Krause, Aug 19, 2025 at 17:10
- (12) If you intentionally made this in a way that it is discovered quickly, why was the watermarking not announced in some way? Can you (some representative from the company, not some community member) please elaborate on why you want to watermark it/what exactly you want to achieve with "discouraging reuse" and why you (the company) thought (or are still thinking?) that was a good idea? – dan1st, Aug 19, 2025 at 17:11
- (21) @Berthold Minimal? I know that people - myself included - spent hours understanding what happened, talking about it, writing about it. But now, it's also broken. And the discussions of the "changing economic landscape" neglect the fact that now that the watermark has been broken, I could make fixed versions available if I wanted to. All that was accomplished was further alienating the community and contributors who do all the work (for free, often), to give the content. – Thomas Owens, Aug 19, 2025 at 17:35
- (12) @Berthold - And yet you knew that it would be noticed by the community, which, in turn, meant it would be brought up here on Meta, with the same end result of it being known about, with the added result of confusing and frustrating community members. – Mithical, Aug 19, 2025 at 17:46
- (18) "we didn't announce the watermark because it would reduce the effectiveness of the watermarking in the first place, with regard to the reusers it is meant to impact" and "We were waiting for you all to find it" don't make any sense together. What did you think we were going to do with it when we found it other than post a discussion of it on Meta? This is the lamest excuse for wasting a whole bunch of people's time and goodwill that I have ever heard. – ColleenV, Aug 19, 2025 at 17:53
- (18) And the worst part of this is that y'all are asking for community help to clean up goo.gl links at the same time you are wasting volunteers' valuable time on this snipe hunt. It's just rude to be so flippant about this. If you had discussed this with the community, we would have told you how useless a control it was and maybe helped you find a better one. – ColleenV, Aug 19, 2025 at 18:15
- (19) "We didn't announce it because it would reduce effectiveness" is in complete contrast to "We knew the community would find it right away." If you foresaw that we would find it, then you surely foresaw that when we did, we would ask questions about it. In which case, I would have expected this response much sooner. That this response took a week to get out sure feels like you didn't anticipate us finding it, and have gotten caught with your hand in the cookie jar, again. – anon, Aug 19, 2025 at 18:26
- (13) @Berthold Perhaps I'm missing it, but when you refer to "security & safety," you're referring to securing the Company's ability to monetize the Data Dump, even though the permissive CC BY-SA license makes that difficult? If that's not what the Company is trying to secure & keep safe, perhaps you could expand on that aspect. – anon, Aug 19, 2025 at 18:31
- (15) Now that you have officially disclosed this watermark, it has lost much of its effectiveness. Please tell us what other ways the data dump has been modified for similar goals. – JonathanZ, Aug 19, 2025 at 19:26
- (13) @Berthold Please pass along my thanks to the folks who thought that including these extra rows in the data dump was a good idea. Please specifically pass it along with my name attached. I'm curious if they could estimate how long I spent trying to validate what was obviously bad data, downloading multiple dumps, comparing this quarter's to last to see if there were OTHER things broken. Then trying to troubleshoot what seemed to be an accident, until I was nearly done writing up the Q here. Then waiting a week for a response. – anon, Aug 19, 2025 at 23:07
- (16) Also, for the record, this is actually insulting to the community: "this was not meant to be sneaky! We were waiting for you all to find it." ... (1) because I don't believe it for a SECOND, and if you think we will, it's insulting to our intelligence... and (2) because if I do take it at face value, then you have zero consideration for the users of the Creative Commons Data Dump. Injecting intentionally bad data into a data product & intentionally keeping it a secret is what black hat hackers do. – anon, Aug 19, 2025 at 23:14
- (16) Genuine question: if you were waiting for us, why did it take a whole week from status-tag escalation to say "you found me!"? – starball (Mod), Aug 20, 2025 at 5:08
- (24) "The posts are there to discourage reuse" - which... the entire point of the Creative Commons licence for posts here is to encourage reuse. While I get the current business model involves selling data to LLM companies, it's worth keeping in mind that people contribute to this site precisely so others can reuse their hard-won knowledge. – Journeyman Geek, Aug 20, 2025 at 5:34