Timeline for Can I get Open Graph Protocol data without behaving as a web scraper?

Current License: CC BY-SA 4.0

Post Revisions

19 events

when	what		by	license	comment
S Sep 18, 2025 at 18:23	history	suggested	Ali Khakbaz	CC BY-SA 4.0	fixed grammar
Sep 18, 2025 at 12:11	review	Suggested edits
S Sep 18, 2025 at 18:23
Sep 11, 2025 at 17:43	comment	added	JimmyJames		I know this doesn't help you, but I can't see why they didn't use headers for this. Maybe I'm missing something, but it looks like another example of Meta creating a new standard for something that's already solved natively by HTTP.
Sep 10, 2025 at 19:12	history	edited	Lamron	CC BY-SA 4.0	"deny" detail
Sep 9, 2025 at 15:46	answer	added	Doc Brown		timeline score: 2
Sep 9, 2025 at 12:23	comment	added	Lamron		To find trending words, my program analyzes Bluesky posts; it isn't related to OGP.
Sep 9, 2025 at 11:08	comment	added	Basilevs		Lamron, how do you find trending words? Are you scanning some sites? I could be wrong here.
Sep 9, 2025 at 10:54	comment	added	Doc Brown		Correct me if I'm wrong, but AFAIK the purpose of a robots.txt is usually to stop search crawlers from frequently scanning an entire web site, not to stop anybody from seeing the content of a site (or its headlines) at all. If someone adds OGP data to their site, they want the headlines to be presented on social media / newsfeeds, and the content of robots.txt should usually be in line with that goal (otherwise it is misdesigned, which should not be your concern).
Sep 9, 2025 at 10:29	comment	added	freakish		@Basilevs yeah, yeah. I'm pretty sure companies around the world are ethical with regards to our data as well. Sorry, I don't give a f**k.
Sep 9, 2025 at 10:10	comment	added	Basilevs		@freakish an ethical bot respects robots.txt and presents an accurate agent name.
Sep 9, 2025 at 8:30	comment	added	freakish		If you truly want to download only the meta tags, which typically reside inside the `<head></head>` element, then you can always just download (and parse) the page chunk by chunk until you see the `</head>` tag. Choose an XML parser that works chunk by chunk; there are plenty of them. Doable, but a pain in the a**. Plus, closing the connection before the response completes might look suspicious.
Sep 9, 2025 at 8:25	comment	added	freakish		"If website denies bots, it becomes impossible to get OGP data." I don't understand this statement. You literally just make a request to the web server and parse the result. There's no way for the server to prevent that (well, unless you make something like millions of requests in a short time). They cannot deny you, just like they cannot deny a human user. There is no difference, as long as you behave. As for the first question: why is downloading the entire page a problem? HTML doesn't weigh that much compared to, say, images or videos.
Sep 9, 2025 at 6:31	review	Close votes
Sep 14, 2025 at 3:00
Sep 9, 2025 at 3:05	history	edited	Lamron	CC BY-SA 4.0	Add actual cases
Sep 8, 2025 at 22:20	history	edited	Arseni Mourzenko	CC BY-SA 4.0	added 126 characters in body; edited tags; edited title
Sep 8, 2025 at 22:17	comment	added	Arseni Mourzenko		Good question. I took the liberty of making a few changes to make the question clearer and reduce the risk of it being downvoted and closed. Check that your intention was preserved. You may also want to add an example of your particular case, i.e. why exactly you want to extract OGP data in the first place; answers may vary depending on that.
Sep 8, 2025 at 22:15	history	edited	Arseni Mourzenko	CC BY-SA 4.0	added 126 characters in body; edited tags; edited title
S Sep 8, 2025 at 21:39	review	First questions
Sep 9, 2025 at 1:40
S Sep 8, 2025 at 21:39	history	asked	Lamron	CC BY-SA 4.0
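
The chunk-by-chunk approach freakish describes in the Sep 9, 2025 at 8:30 comment could be sketched roughly as follows. This is only a minimal illustration, not anything posted in the thread: it uses Python's standard `html.parser` module (an HTML parser rather than the XML parser the comment mentions), and the names `OpenGraphHeadParser` and `extract_og_from_chunks` are hypothetical. It assumes the caller supplies already-decoded text chunks, so the network part is left out.

```python
from html.parser import HTMLParser


class OpenGraphHeadParser(HTMLParser):
    """Collects <meta property="og:..."> tags and records when <head> ends.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.og = {}            # e.g. {"og:title": "..."}
        self.head_done = False  # set once </head> or <body> is seen

    def handle_starttag(self, tag, attrs):
        if self.head_done:
            return
        if tag == "body":
            # Some pages omit </head>; the start of <body> also ends the head.
            self.head_done = True
            return
        if tag == "meta":
            attr = dict(attrs)
            prop = attr.get("property") or ""
            content = attr.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value seen for each property.
                self.og.setdefault(prop, content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_done = True


def extract_og_from_chunks(chunks):
    """Feed text chunks to the parser and stop as soon as the head is over.

    Returns the collected og:* properties and the number of chunks consumed.
    """
    parser = OpenGraphHeadParser()
    consumed = 0
    for chunk in chunks:
        parser.feed(chunk)  # incomplete tags are buffered until the next feed
        consumed += 1
        if parser.head_done:
            break
    return parser.og, consumed
```

In a real client the chunks would come from repeated partial reads of the HTTP response, decoded incrementally, with the connection closed once the function returns. As the comment notes, closing early may look unusual to the server.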

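
Basilevs's point in the Sep 9, 2025 at 10:10 comment, that an ethical bot respects robots.txt and presents an accurate agent name, could be checked along these lines before any page is requested. This is a sketch under assumptions: the helper `may_fetch`, the agent name `ExampleOGPFetcher/1.0`, and the example.com URLs are hypothetical, and the robots.txt text is passed in as a string rather than downloaded.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url.

    Hypothetical helper: the caller is assumed to have downloaded robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules and agent name, for illustration only.
EXAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
EXAMPLE_USER_AGENT = "ExampleOGPFetcher/1.0"

allowed = may_fetch(EXAMPLE_ROBOTS_TXT, EXAMPLE_USER_AGENT,
                    "https://example.com/news/1")
blocked = may_fetch(EXAMPLE_ROBOTS_TXT, EXAMPLE_USER_AGENT,
                    "https://example.com/private/x")
```

The same agent string would then be sent in the `User-Agent` request header, so that the name checked against robots.txt matches the name the server actually sees.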