ETag and Last-Modified Headers

ETags and Last-Modified headers are two ways that feed publishers can save bandwidth, but they only work if clients use them. Universal Feed Parser supports both, but you must use them properly.

The basic concept is that a feed publisher may provide a special HTTP header, called an ETag, when it publishes a feed. You should send this ETag back to the server on subsequent requests. If the feed has not changed since the last time you requested it, the server will return HTTP status code 304 (Not Modified) and no feed data.
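At the HTTP level, the ETag travels back to the server in an If-None-Match request header. The following is a minimal sketch of that request using only the standard library, not a description of how Universal Feed Parser builds its requests internally. The helper name build_conditional_request and the example.org URL are illustrative assumptions; no network request is made.

```python
import urllib.request


def build_conditional_request(url, etag):
    """Return a GET request that asks for the feed only if its ETag changed.

    Hypothetical helper for illustration; it builds the request but does
    not send it.
    """
    request = urllib.request.Request(url)
    # If-None-Match carries the ETag exactly as the server sent it,
    # including the surrounding double quotes.
    request.add_header("If-None-Match", etag)
    return request


# The URL is a placeholder; the ETag is the one from the example session.
request = build_conditional_request(
    "https://example.org/feed.xml", '"6c132-941-ad7e3080"'
)
# urllib stores header names capitalized as "If-none-match".
print(request.get_header("If-none-match"))  # prints "6c132-941-ad7e3080"
```

When you pass etag to feedparser.parse, you do not need to build this header yourself.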

Using ETags to reduce bandwidth

>>> import feedparser
>>> d = feedparser.parse('https://feedparser.readthedocs.io/en/latest/examples/atom10.xml')
>>> d.etag
'"6c132-941-ad7e3080"'
>>> d2 = feedparser.parse('https://feedparser.readthedocs.io/en/latest/examples/atom10.xml', etag=d.etag)
>>> d2.status
304
>>> d2.feed
{}
>>> d2.entries
[]
>>> d2.debug_message
'The feed has not changed since you last checked, so the server sent no data.  This is a feature, not a bug!'

A related mechanism accomplishes the same thing in a slightly different way. In this case, the server publishes the last-modified date of the feed in the Last-Modified HTTP header. You can send this date back to the server on subsequent requests, and if the feed has not changed since then, the server will return HTTP status code 304 and no feed data.

Using Last-Modified headers to reduce bandwidth

>>> import feedparser
>>> d = feedparser.parse('https://feedparser.readthedocs.io/en/latest/examples/atom10.xml')
>>> d.modified
'Fri, 11 Jun 2004 23:00:34 GMT'
>>> d.modified_parsed
(2004, 6, 11, 23, 0, 34, 4, 163, 0)
>>> d2 = feedparser.parse('https://feedparser.readthedocs.io/en/latest/examples/atom10.xml', modified=d.modified)
>>> d2.status
304
>>> d2.feed
{}
>>> d2.entries
[]
>>> d2.debug_message
'The feed has not changed since you last checked, so the server sent no data.  This is a feature, not a bug!'

Clients should support both ETag and Last-Modified headers, as some servers support one but not the other.
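One way to support both headers is to remember whichever validators the server sent and pass them back on the next request. The sketch below is an illustration under stated assumptions, not part of Universal Feed Parser: poll_feed and the validators dictionary are hypothetical names, and the parse argument is expected to be feedparser.parse (it is passed in so the logic can be exercised without network access).

```python
def poll_feed(url, validators, parse):
    """Fetch a feed, sending any saved ETag and Last-Modified values.

    validators is a dict that may hold "etag" and/or "modified" from an
    earlier fetch. Returns (result, validators_to_save).
    Hypothetical helper; parse is expected to be feedparser.parse.
    """
    result = parse(
        url,
        etag=validators.get("etag"),
        modified=validators.get("modified"),
    )
    if result.get("status") == 304:
        # Not modified: the server sent no feed data, so keep the
        # validators we already have for the next request.
        return result, validators
    # First fetch or changed feed: keep only the validators this
    # response actually included (a server may send one, both, or none).
    fresh = {
        key: result[key]
        for key in ("etag", "modified")
        if result.get(key)
    }
    return result, fresh


# Typical use (requires the feedparser package and network access):
#   import feedparser
#   result, validators = poll_feed(FEED_URL, {}, feedparser.parse)
#   ...later...
#   result, validators = poll_feed(FEED_URL, validators, feedparser.parse)
```

Because the validators are plain strings, the dictionary can be stored between runs in whatever form your application already uses for its state.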

Important

If you do not support ETag and Last-Modified headers, you will repeatedly download feeds that have not changed. This wastes your bandwidth and the publisher’s bandwidth, and the publisher may ban you from accessing their server.

Note

You can control the behaviour of HTTP caches between your application and the origin server by using the request_headers parameter. For example, you may want to send Cache-Control: max-age=60 to make the caches revalidate against the origin server unless their cached copy is less than a minute old. Use this sparingly, since forcing caches to revalidate sends more requests to the origin server.
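As a rough sketch of the note above, the header can be passed as a dictionary. This assumes a feedparser release whose parse function accepts the request_headers keyword; parse_with_max_age is a hypothetical helper, and the parse argument is expected to be feedparser.parse.

```python
def parse_with_max_age(url, max_age_seconds, parse):
    """Parse a feed, asking intermediate caches for a reasonably fresh copy.

    Hypothetical helper; parse is expected to be feedparser.parse and is
    assumed to accept the request_headers keyword argument.
    """
    # Caches should revalidate with the origin server unless their copy
    # is younger than max_age_seconds.
    headers = {"Cache-Control": "max-age=%d" % max_age_seconds}
    return parse(url, request_headers=headers)


# Typical use (requires the feedparser package and network access):
#   import feedparser
#   d = parse_with_max_age(FEED_URL, 60, feedparser.parse)
```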