Bozo Detection

Universal Feed Parser can parse feeds whether they are well-formed XML or not. However, since some applications may wish to reject or warn users about non-well-formed feeds, Universal Feed Parser sets the bozo bit when it detects that a feed is not well-formed. Thanks to Tim Bray for suggesting this terminology.

Detecting a non-well-formed feed

>>> d = feedparser.parse('')
>>> d.bozo
>>> d = feedparser.parse('')
>>> d.bozo
>>> d.bozo_exception
<xml.sax._exceptions.SAXParseException instance at 0x00BAAA08>
>>> exc = d.bozo_exception
>>> exc.getMessage()
"expected '>'\\n"
>>> exc.getLineNumber()

An XML document can be non-well-formed for many reasons besides the one in this example (incomplete end tags). See Character Encoding Detection for some other ways to trip the bozo bit.
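As a rough illustration of how an application might warn users about a non-well-formed feed, the sketch below turns the bozo bit into a message. It relies only on the `bozo` and `bozo_exception` attributes and the `getMessage()` and `getLineNumber()` methods shown in the example above. The names `bozo_warning` and `fake_result` are hypothetical. `fake_result` is a stand-in for `feedparser.parse()` that uses the standard-library SAX parser so the sketch runs without a network connection; it is an assumption made for illustration, not how Universal Feed Parser works internally.

```python
import xml.sax
from types import SimpleNamespace


def bozo_warning(result):
    """Return a warning string for a non-well-formed feed, or None.

    `result` is any object exposing the `bozo` and `bozo_exception`
    attributes, such as the value returned by feedparser.parse().
    """
    if not getattr(result, "bozo", 0):
        return None
    exc = getattr(result, "bozo_exception", None)
    # The bozo exception may not be a SAX parse error, so only ask for
    # a line number when the exception offers one.
    if hasattr(exc, "getLineNumber"):
        return "feed is not well-formed (line %s): %s" % (
            exc.getLineNumber(), exc.getMessage().strip())
    return "feed is not well-formed: %s" % (exc,)


def fake_result(xml_bytes):
    """Hypothetical stand-in for feedparser.parse(), for illustration only.

    Runs the standard-library SAX parser over `xml_bytes` and records
    any parse error in the same two attributes.
    """
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return SimpleNamespace(bozo=1, bozo_exception=exc)
    return SimpleNamespace(bozo=0)


if __name__ == "__main__":
    # The second document is missing the </channel> end tag.
    for doc in (b"<rss><channel></channel></rss>", b"<rss><channel></rss>"):
        print(bozo_warning(fake_result(doc)))
```

With a real feed, the same helper could be called as `bozo_warning(feedparser.parse(url))`. Whether to reject the feed or merely warn about it is left to the application.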