19 events
when what by license comment
S Sep 18, 2025 at 18:23 history suggested Ali Khakbaz CC BY-SA 4.0
fixed grammar
Sep 18, 2025 at 12:11 review Suggested edits
S Sep 18, 2025 at 18:23
Sep 11, 2025 at 17:43 comment added JimmyJames I know this doesn't help you, but I can't see why they didn't use headers for this. Maybe I'm missing something, but it looks like another example of Meta creating a new standard for something that's already solved by HTTP natively.
Sep 10, 2025 at 19:12 history edited Lamron CC BY-SA 4.0
"deny" detail
Sep 9, 2025 at 15:46 answer added Doc Brown timeline score: 2
Sep 9, 2025 at 12:23 comment added Lamron To find trending words, my program analyzes Bluesky posts, and that part isn't related to OGP.
Sep 9, 2025 at 11:08 comment added Basilevs Lamron, how do you find trending words? Are you scanning some sites? I could be wrong here.
Sep 9, 2025 at 10:54 comment added Doc Brown Correct me if I'm wrong, but AFAIK the purpose of a robots.txt is usually to stop search crawlers from frequently scanning an entire web site, not to stop anybody from seeing the content of a site (or its headlines) at all. If someone adds OGP data to their site, they want the headlines to be presented on social media / newsfeeds, and the content of robots.txt should usually be in line with that goal (otherwise it is misdesigned, which should not be your concern).
Sep 9, 2025 at 10:29 comment added freakish @Basilevs yeah, yeah. I'm pretty sure companies around the world are ethical with regards to our data as well. Sorry, I don't give a f**k.
Sep 9, 2025 at 10:10 comment added Basilevs @freakish an ethical bot respects robots.txt and presents an accurate agent name.
Sep 9, 2025 at 8:30 comment added freakish If you truly want to download only meta tags, which typically reside inside the <head></head> tag, then you can always just download (and parse) the page chunk by chunk, until you see the </head> tag. Choose an XML parser that works chunk by chunk; there are plenty of them. Doable, but a pain in the a**. Plus, closing an incomplete connection might look suspicious.
Sep 9, 2025 at 8:25 comment added freakish "If website denies bots, it becomes impossible to get OGP data." I don't understand this statement. You literally just make a request to the web server and parse the result. There's no way for the server to prevent that (well, unless you make something like millions of requests in a short time). They cannot deny you, just like they cannot deny a human user. There is no difference, as long as you behave. As for the first question: why is downloading the entire page a problem? HTML doesn't weigh that much compared to, say, images or videos.
Sep 9, 2025 at 6:31 review Close votes
Sep 14, 2025 at 3:00
Sep 9, 2025 at 3:05 history edited Lamron CC BY-SA 4.0
Add actual cases
Sep 8, 2025 at 22:20 history edited Arseni Mourzenko CC BY-SA 4.0
added 126 characters in body; edited tags; edited title
Sep 8, 2025 at 22:17 comment added Arseni Mourzenko Good question. I took the liberty of making a few changes, in order to make the question clearer and reduce the risk of it being downvoted and closed. Check that your intention was preserved. You may also want to add the example of your particular case, i.e. why exactly you want to extract OGP in the first place; answers may vary depending on that.
Sep 8, 2025 at 22:15 history edited Arseni Mourzenko CC BY-SA 4.0
added 126 characters in body; edited tags; edited title
S Sep 8, 2025 at 21:39 review First questions
Sep 9, 2025 at 1:40
S Sep 8, 2025 at 21:39 history asked Lamron CC BY-SA 4.0
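
The chunk-by-chunk approach from freakish's comment (feed the page to an incremental parser and stop once the `</head>` tag is seen) could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's standard `html.parser` rather than an XML parser, the names `OgpHeadParser` and `read_ogp` are made up for this example, and the chunks are assumed to be already-decoded text coming from whatever HTTP client streams the response.

```python
from html.parser import HTMLParser


class OgpHeadParser(HTMLParser):
    """Hypothetical helper: collects og:* meta tags until the head ends."""

    def __init__(self):
        super().__init__()
        self.ogp = {}             # e.g. {"og:title": "..."}
        self.head_closed = False  # set once </head> or <body> is seen

    def handle_starttag(self, tag, attrs):
        if self.head_closed:
            return  # ignore anything after the head, even in the same chunk
        if tag == "meta":
            attr = dict(attrs)
            prop = attr.get("property") or ""
            content = attr.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value seen for each property.
                self.ogp.setdefault(prop, content)
        elif tag == "body":
            # Some pages omit </head>; a <body> tag also ends the head.
            self.head_closed = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_closed = True


def read_ogp(chunks):
    """Feed text chunks to the parser and stop as soon as the head is closed.

    Returns the collected OGP properties and the number of chunks consumed.
    """
    parser = OgpHeadParser()
    consumed = 0
    for chunk in chunks:
        consumed += 1
        parser.feed(chunk)  # a tag split across chunks is buffered by the parser
        if parser.head_closed:
            break  # the caller can now close the connection
    return parser.ogp, consumed
```

In real use, `chunks` would be an iterator over a streamed HTTP response, decoded incrementally so that multi-byte characters split across chunk boundaries are handled. Whether stopping early saves a meaningful amount of traffic depends on the page, as the comments above point out.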