
Visiting this deleted MSO post yields this (in a private window, I used ?noredirect=1). I can reproduce this in Chromium 135.0.7049.52 and Firefox 137.0.

[Screenshot: 404 page with a broken 404 image]

Sure enough, this appears in the dev console (whitespace removed):

GET https://meta.stackoverflow.com/questions/433526/braces-being-shown-above-question-title?noredirect=1 [HTTP/2 404 184ms]
Security Error: Content at https://meta.stackoverflow.com/questions/433526/braces-being-shown-above-question-title?noredirect=1 may not load or link to file:///Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg.
Security Error: Content at https://meta.stackoverflow.com/questions/433526/braces-being-shown-above-question-title?noredirect=1 may not load or link to file:///Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg. 

This is the offending element:

<img class="wmx100" src="file://cdn.sstatic.net/Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg" alt="Page not found">

Can the 404 image be restored?


Title taken from this post by nicael licensed under the CC BY-SA 4.0 license.

  • Bug. I doubt that anyone has the requisite image in their local /Content subdirectory. (file: discards the server part of the URL.) Most browsers wouldn't load it even if anyone did.
    – Nanigashi
    Commented Apr 8 at 5:02
  • Interestingly, if you replace file: with https:, the image still doesn't exist.
    – Nanigashi
    Commented Apr 8 at 5:07
  • You would need to put the domain in, @Nanigashi ; https://stackoverflow.com/Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg
    – Thom A
    Commented Apr 8 at 8:50
  • I've fixed the symptom for now, but there's a surprisingly deep multilayered bug involved here. I'll post an answer once I've unraveled and fixed all that, but it might be a day or two before I get to that.
    – balpha StaffMod
    Commented Apr 8 at 13:24
  • I was like "wait that sounds familiar" and then like "oh I see"
    – nicael
    Commented Apr 10 at 12:59
  • @balpha: I'm looking forward to the explanation once y'all get it fixed.
    – V2Blast
    Commented Apr 13 at 19:54
  • @V2Blast There you go 😀
    – balpha StaffMod
    Commented Apr 14 at 11:18

1 Answer


As I mentioned in a comment, I fixed the symptom quickly, but the root cause of this was surprisingly interesting.

Buckle up, I'm putting on my "back in my day" hat.

Historically, when we've deployed the Q&A application, we've been doing this in two steps. The first step only deploys to https://meta.stackexchange.com/ (MSE) and https://meta.stackoverflow.com/ (MSO). The second step then deploys to the rest of the network.

We called those steps "deploying Meta" and "deploying others".

This allowed us to use MSE and MSO as simple canaries if we wanted to check that things behaved as expected in production. In most cases, we simply deployed both, and now that we're in GCP, this distinction doesn't exist anymore.

The way that this two-tier deployment worked was quite simple: Our data center had eleven primary web servers, numbered NY-WEB01 through NY-WEB11. Each of those web servers ran the identical Q&A application, and each was able to serve any site (MSE, MSO, or otherwise). But the load balancer was configured to send MSE and MSO traffic to NY-WEB10 and NY-WEB11, and all other traffic to the remaining servers.

That way, the only difference between deploying Meta and deploying others was which servers received the application in the respective step.

Now, as you may or may not know, we usually serve our static assets (JavaScript, style sheets, images etc.) from a separate domain called sstatic.net. Back in the stone age, this was a best practice; these days probably not so much, and it's more of a historical artifact now.

At some point, we started using a dedicated CDN for those static files, under the domain cdn.sstatic.net. So, a request to sstatic.net would go to our data center, while a request to cdn.sstatic.net would go to a CDN edge node close to you, and the CDN would probably already have that file cached. Only if it didn't would it request the file, once, from sstatic.net. (These days with everything behind Cloudflare, the separate domains are yet another historical artifact.)

So far so good. But a lot of the static assets are shared between all the sites. And if we wanted to deploy "Meta" and "others" independently, that independence should include those static files. For that reason, on MSO and MSE we did not use sstatic.net, but instead served the files from the site's local /Content folder.

You can still see that happening: If you inspect this very page, you will see this:

<script src="https://meta.stackoverflow.com/Content/Js/stub.en.js?v=31c1a92afca8"></script>

but if you do the same thing on https://stackoverflow.com/, you'll see this instead:

<script src="https://cdn.sstatic.net/Js/stub.en.js?v=31c1a92afca8"></script>

And this finally brings us to the "page not found" image that this bug report was about. This image can be configured per-site. A lot of sites have a pretty boring default, but some of them have a quirky dedicated image that fits the site.

On Server Fault for example, the image is configured as https://sstatic.net/Sites/serverfault/img/spaghetti-networking.jpg (on the static assets domain). But here on MSO, it's configured as /Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg (served locally as a relative URL).

Bored yet? We're getting close to the bug!

We have code (a method aptly named CDNify()) that changes the domain to cdn.sstatic.net if the configured value is such an absolute URL, but leaves relative URLs alone.

How does that code check whether it's an absolute or a relative URL? Us being Stack Overflow, you might think that we used a weird regular expression. But no, we actually did it the proper way, by using the framework-provided functionality in System.Uri.

Specifically in this case, we used

Uri.TryCreate(path, UriKind.Absolute, out var uri)

which returns true if it's an absolute URI, and false if it's not.

And this worked perfectly – until we moved to the cloud. What's the difference? In our data center, the application ran on Windows Server. In GCP, it runs in a Linux container. And it turns out that System.Uri behaves differently between the two operating systems. We aren't the first ones running into this.

Unlike on Windows, on Linux /foo/bar is a perfectly valid absolute file path. And because this framework functionality handles URIs, not just URLs (yes, this is the once-in-a-lifetime situation where the difference actually matters!), Microsoft decided that on Linux, Uri.TryCreate should return true for /Content/... and create a file URI.

And that is why the 404 image had a file:// address, as you noticed.
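
To see the difference for yourself, a small console program along these lines should do. It is a minimal sketch, not the Q&A application's code: the class name UriKindDemo and the method Describe are made up for illustration, and only the Uri.TryCreate call is taken from the text above.

```csharp
using System;

public static class UriKindDemo
{
    // Reports how System.Uri classifies the given string on the current OS.
    public static string Describe(string value)
    {
        // Same call as quoted above: "is this an absolute URI?"
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri))
        {
            return $"absolute, scheme={uri.Scheme}, isFile={uri.IsFile}";
        }
        return "not absolute";
    }

    public static void Main()
    {
        // The relative path configured for MSO (result depends on the OS).
        Console.WriteLine(Describe("/Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg"));

        // The absolute URL configured for Server Fault (same result everywhere).
        Console.WriteLine(Describe("https://sstatic.net/Sites/serverfault/img/spaghetti-networking.jpg"));
    }
}
```

On Windows, the first line should read "not absolute". On Linux, going by the behavior described above, it should report an absolute URI with the file scheme. The second line should be the same on both.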

My quick fix was to simply change the configuration value to an absolute URL. But I also wanted to make sure that we don't run into this issue anywhere else, so I created a helper that behaves identically across operating systems, and an automated check that prevents you from using the framework functionality directly. That's why it took me a few days before writing this answer.
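
For illustration, an OS-independent check could look roughly like the following. This is a sketch under assumptions, not the actual helper or the real CDNify(): the names StaticAssetUrls, IsAbsoluteWebUrl and Cdnify are hypothetical, and treating only http and https URIs as "absolute" is one possible policy. It works because a rooted path like /Content/... may parse as a file URI on Linux, but then fails the scheme check, so the answer is the same on every OS.

```csharp
using System;

public static class StaticAssetUrls
{
    // Host names taken from the text; the constants themselves are hypothetical.
    private const string OriginHost = "sstatic.net";
    private const string CdnHost = "cdn.sstatic.net";

    // True only for absolute http(s) URLs. File URIs (which a rooted path
    // can turn into on Linux) are rejected by the scheme check.
    public static bool IsAbsoluteWebUrl(string value)
    {
        return Uri.TryCreate(value, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Rewrites absolute sstatic.net URLs to the CDN host and returns
    // everything else (including relative URLs) unchanged.
    public static string Cdnify(string value)
    {
        if (!IsAbsoluteWebUrl(value))
        {
            return value;
        }

        var uri = new Uri(value);
        if (!string.Equals(uri.Host, OriginHost, StringComparison.OrdinalIgnoreCase))
        {
            return value;
        }

        return new UriBuilder(uri) { Host = CdnHost }.Uri.AbsoluteUri;
    }

    public static void Main()
    {
        // Relative URL: left alone on every OS.
        Console.WriteLine(Cdnify("/Content/Sites/stackoverflowmeta/img/keyboard-waffles.jpg"));

        // Absolute sstatic.net URL: host swapped for the CDN host.
        Console.WriteLine(Cdnify("https://sstatic.net/Sites/serverfault/img/spaghetti-networking.jpg"));
    }
}
```

The scheme check is the important part; the host rewrite is only there to show where such a check would sit.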
