Internet Explorer and cache problems

As we producers know, Internet Explorer has loads of unwanted features. One of the worst for generated content is cache handling.

On my famous and sad project Pyhä-Luosto I ran into exactly this kind of problem. The front page has a weather report, which in real life updates once an hour. I made a script that checks from the client side whether the report was written correctly and fetches the previous one if not.

I started getting complaints about the weather showing old data and first thought it was due to this error checking. OK, so the error checking came off and we displayed only the most recent data (which is still at least an hour old).

The complaints continued, and only then did I understand that the problem might be browser-specific. Internet Explorer - surprise - wasn't loading the freshly generated data.

Browsing around the net I found many different solutions. Most of them suggested sending HTTP headers with the expiry date set to -1. Some said to simply set the expiry time in the past. Others suggested Pragma headers set to no-cache, or Cache-Control headers telling the browser not to store, not to cache and to force revalidation.

This is the example I was using; similar examples were given all around.

header('Pragma: no-cache');
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', FALSE);
header('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT');
header('Expires: -1');

None of this worked.

Microsoft took a different stand in its article "Pragma: No-cache" Tag May Not Prevent Page from Being Cached:

A page that Internet Explorer is browsing is not cached until half of the 64 KB buffer is filled. Usually, metatags are inserted in the header section of an HTML document, which appears at the beginning of the document. When the HTML code is parsed, it is read from top to bottom. When the metatag is read, Internet Explorer looks for the existence of the page in cache at that exact moment. If it is there, it is removed.

They also offered a resolution: place another head tag at the end of the document, after the body element, and put the Pragma: no-cache meta tag in it. Now how nice is that: their solution is to advise going against the HTML specification. It says a lot about the company policy.

Fortunately it is also possible to get around the specification issue with IE conditional comments. No need to ruin the validity of anything.
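A minimal sketch of that idea, written in PHP since the page is generated anyway. The helper name ie_only() is hypothetical, and the meta tag is my assumption of what the Microsoft article asks for; the point is only that markup wrapped in a conditional comment is parsed by IE while everything else treats it as an ordinary comment.

```php
<?php
// Wrap markup in an IE conditional comment. Internet Explorer parses the
// content; other browsers and validators see a plain HTML comment.
// ie_only() is a hypothetical helper name, not part of MidCOM or PHP.
function ie_only(string $markup): string
{
    return "<!--[if IE]>\n" . $markup . "\n<![endif]-->";
}

// Assumption: a second head section with a Pragma: no-cache meta tag,
// meant to be printed after the closing body tag.
$trailing_head = ie_only('<head><meta http-equiv="Pragma" content="no-cache"></head>');

echo $trailing_head, "\n";
```

The output would go right after the closing body tag of the generated page.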

Still, even following their own documentation I kept getting the old pages. Checking the HTTP headers sent to the browser didn't reveal the issue either: I was getting exactly the headers I was trying to send, but IE just ignored them.

Today I started digging through the HTTP headers again, looking for something I might have missed. The answer was - hopefully - on the first row. The first thing the server sends back in response to a request for a URI is the status code. This is where the possible 404 errors and such are defined.
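To make that first row concrete, here is a small sketch that splits a status line into its parts. The function name parse_status_line() is my own, and the sample line is made up for illustration; it is not output from the actual site.

```php
<?php
// Split an HTTP status line such as "HTTP/1.1 404 Not Found" into
// protocol version, numeric status code and reason phrase.
// Returns null if the line does not look like a status line.
// parse_status_line() is a hypothetical helper, not a built-in.
function parse_status_line(string $line): ?array
{
    if (!preg_match('#^HTTP/(\S+)\s+(\d{3})(?:\s+(.*))?$#', trim($line), $m)) {
        return null;
    }
    return [
        'version' => $m[1],
        'code'    => (int) $m[2],
        'reason'  => $m[3] ?? '',   // the reason phrase may be missing
    ];
}

// Illustrative input only.
$status = parse_status_line('HTTP/1.1 404 Not Found');
echo $status['code'], ' ', $status['reason'], "\n";
```

For a live check, the first element of the array returned by PHP's get_headers() should be this same status line, so it could be fed straight into the function.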

MidCOM kept sending 304 Not Modified. After overriding this with 201 Created I got my best good friend Internet Explorer to load the pages without whining. At least after quick testing this seems to be the solution:

header('HTTP/1.x 201 Created'); 

Now I need to run more tests and let others test as well.

Update

MidCOM doesn't like this while in AIS; it probably conflicts with sessions somehow. Don't send the header above if you are using AIS. I excluded AIS this way:

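<?php
// Skip the override inside AIS (/midcom-admin/), where MidCOM doesn't like it.
// strpos() replaces ereg(), which was removed in PHP 7.
if (strpos($_MIDGARD['uri'], '/midcom-admin/') === false)
{
    header('HTTP/1.x 201 Created');
}
?>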

Second Update

Internet Explorer still acts weirdly. It loads the page itself without problems, but when returning to the page with the Back button, IE loads the page from cache. Only the Lord knows - if even he does - why it works this way.
