
I recall a mention by Jeff and Joel, perhaps on the podcast, of forwarding requests from web-crawlers to a dedicated environment.

The idea was to serve the same content as the real site, but to have more control over the format for SEO optimization as well as to keep the crawlers from overwhelming the "live" site.

I'd like to know whether there are any additional descriptions of this approach, and whether it was just a passing thought or an idea that was actually implemented.

  • To receive the bounty I'd like to know not only whether or not SO does this, but also some basic pros/cons and considerations of the approach Commented May 4, 2011 at 22:05
  • You would be better served for the general question (pros/cons and considerations) by going to webmasters.stackexchange.com Commented May 4, 2011 at 22:54

1 Answer


Search engines are routed through a separate back-end on our HAProxy server - but they point to the same servers and see the same content that anyone else would see when browsing (well, except that most of them don't execute JavaScript - but that's not something we rely on anyway). We do this simply to give ourselves more fine-grained control over the resources that web crawlers consume, and to be able to tweak that without affecting real people's experience.
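For anyone curious what that kind of routing looks like, here is a minimal HAProxy sketch of the idea: match known crawler User-Agent strings with an ACL and send them to a second backend that points at the same servers but with its own limits. The backend names, addresses, and UA tokens below are illustrative assumptions, not SO's actual configuration:

```haproxy
frontend www
    bind :80
    # ACL fires when the User-Agent contains a well-known bot token
    # (case-insensitive substring match; token list is illustrative)
    acl is_crawler hdr_sub(User-Agent) -i googlebot bingbot slurp
    use_backend crawlers if is_crawler
    default_backend web

backend web
    server web1 10.0.0.10:80

backend crawlers
    # Same server as the "web" backend - crawlers see identical
    # content - but with a low maxconn so crawler load can be
    # throttled and tuned without touching interactive traffic
    server web1 10.0.0.10:80 maxconn 8
```

Because both backends point at the same servers, crawlers see exactly what users see; the split only exists so that queueing and connection limits can be adjusted independently.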

  • 1
    What do you use to detect the spider-ness of a spider? User agent sniffing? IP block ownership? Magic unicorn dust? Commented May 4, 2011 at 23:19
  • We just base it off user agent, not perfect but reliably catches the big guys. Commented May 4, 2011 at 23:27
  • @Zypher So search engines, like have a standard for their bots or somethin'? Commented Sep 11, 2012 at 14:51
  • @YatharthROCK no, there's no standard, but the big guys A) have good UA strings, and B) those UA strings are well known. Commented Sep 11, 2012 at 16:04
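The user-agent sniffing the comments describe can be sketched in a few lines. The token list below is a hypothetical example of the "well known" UA strings mentioned above; as the answer concedes, this is not spoof-proof, it just reliably catches the major search engines:

```python
import re

# Illustrative token list only - a real deployment would maintain
# its own list of the crawler UA substrings it cares about.
CRAWLER_PATTERN = re.compile(r"googlebot|bingbot|slurp|duckduckbot", re.I)

def is_crawler(user_agent: str) -> bool:
    """Crude User-Agent sniffing: trivially spoofable, but the big
    search engines send consistent, well-known UA strings."""
    return bool(CRAWLER_PATTERN.search(user_agent or ""))
```

Anything that sends a plain browser UA (or spoofs one) falls through to the default path, which is why this approach only controls resource usage rather than gating content.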
