Microformats

An emerging trend in feed syndication is the inclusion of microformats. Beyond the semantics defined by individual feed formats, publishers can add further semantics using rel and class attributes in embedded HTML content.

Note

To parse microformats, Universal Feed Parser relies on a third-party library called Beautiful Soup, which is distributed separately. If Beautiful Soup is not installed, Universal Feed Parser will silently skip microformats parsing.

The following elements are parsed for microformats:

rel=enclosure

The rel=enclosure microformat provides a way for embedded HTML content to specify that a certain link should be treated as an enclosure. Universal Feed Parser looks for links within embedded markup that meet any of the following conditions:

  • rel attribute contains enclosure (note: rel attributes can contain a list of space-separated values)
  • type attribute starts with audio/
  • type attribute starts with video/
  • type attribute starts with application/ but does not end with xml
  • href attribute ends with one of the following file extensions: .7z, .avi, .bin, .bz2, .deb, .dmg, .exe, .gz, .hqx, .img, .iso, .jar, .m4a, .m4v, .mp2, .mp3, .mp4, .msi, .ogg, .ogm, .rar, .rpm, .sit, .sitx, .tar, .tbz2, .tgz, .wma, .wmv, .z, .zip

When Universal Feed Parser finds a link that satisfies any of these conditions, it adds it to entries[i].enclosures.
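The conditions above can be sketched as a single predicate. This is an illustration only, written for Python 3, and not Universal Feed Parser's own code: the helper name looks_like_enclosure and the constant ENCLOSURE_EXTENSIONS are hypothetical, and the case-insensitive matching is an assumption rather than documented behavior.

```python
# Hypothetical helper; not part of Universal Feed Parser's API.
ENCLOSURE_EXTENSIONS = (
    ".7z", ".avi", ".bin", ".bz2", ".deb", ".dmg", ".exe", ".gz", ".hqx",
    ".img", ".iso", ".jar", ".m4a", ".m4v", ".mp2", ".mp3", ".mp4", ".msi",
    ".ogg", ".ogm", ".rar", ".rpm", ".sit", ".sitx", ".tar", ".tbz2", ".tgz",
    ".wma", ".wmv", ".z", ".zip",
)


def looks_like_enclosure(rel="", type_="", href=""):
    """Return True if a link's attributes match any condition listed above."""
    # Assumption: attribute values are compared case-insensitively.
    rel_values = rel.lower().split()  # rel is a space-separated list
    type_ = type_.lower()
    if "enclosure" in rel_values:
        return True
    if type_.startswith(("audio/", "video/")):
        return True
    if type_.startswith("application/") and not type_.endswith("xml"):
        return True
    # str.endswith accepts a tuple of candidate suffixes.
    return href.lower().endswith(ENCLOSURE_EXTENSIONS)
```

For instance, looks_like_enclosure(type_="application/pdf") is true, while looks_like_enclosure(type_="application/atom+xml") is false because the type ends with xml.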

Parsing embedded enclosures

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/rel-enclosure.xml')
>>> d.entries[0].enclosures
[{u'href': u'http://example.com/movie.mp4', 'title': u'awesome movie'}]

rel=tag

The rel=tag microformat allows you to define tags within embedded HTML content. Universal Feed Parser looks for these attribute values in embedded markup and maps them to entries[i].tags.

Parsing embedded tags

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/rel-tag.xml')
>>> d.entries[0].tags
[{'term': u'tech', 'scheme': u'http://del.icio.us/tag/', 'label': u'Technology'}]
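Under the rel=tag convention, the tag is the last path segment of the link's href, and the rest of the URL identifies the tag space. The following Python 3 sketch shows how a term and scheme like those in the output above could be derived from such a link. The helper name split_tag_href is hypothetical, the sample URL is an assumed href consistent with that output (the feed's markup is not shown here), and this is not necessarily how Universal Feed Parser computes the values.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_tag_href(href):
    """Split a rel=tag link into (scheme, term). Hypothetical helper."""
    parts = urlsplit(href)
    # Ignore a trailing slash so ".../tag/tech/" and ".../tag/tech" agree.
    path = parts.path.rstrip("/")
    head, _, last = path.rpartition("/")
    # Rebuild everything up to and including the final slash as the scheme.
    scheme = urlunsplit((parts.scheme, parts.netloc, head + "/", "", ""))
    return scheme, unquote(last)


# Assumed href, chosen to match the term and scheme shown above.
scheme, term = split_tag_href("http://del.icio.us/tag/tech")
```

The label (u'Technology' above) would come from the link text rather than from the URL.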

XFN

The XFN microformat allows you to define human relationships between URIs. For example, you could link from your weblog to your spouse’s weblog with the rel="spouse" relation. It is intended primarily for “blogrolls” or other static lists of links, but the relations can occur anywhere in HTML content. If found, Universal Feed Parser will return the XFN information in entries[i].xfn.

Universal Feed Parser supports all of the relationships listed in the XFN 1.1 profile, as well as the following variations:

  • coworker in addition to co-worker
  • coresident in addition to co-resident
  • relative in addition to kin
  • brother and sister in addition to sibling
  • husband and wife in addition to spouse

Parsing XFN relationships

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/xfn.xml')
>>> person = d.entries[0].xfn[0]
>>> person.name
u'John Doe'
>>> person.href
u'http://example.com/johndoe'
>>> person.relationships
[u'coworker', u'friend']
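Because rel attributes hold a space-separated list that may mix XFN values with unrelated ones, an application doing its own XFN handling needs to filter them. A minimal Python 3 sketch follows, assuming the XFN 1.1 value set plus the variations listed earlier. The names XFN_11_VALUES, XFN_VARIATIONS, and xfn_relationships are hypothetical and not part of Universal Feed Parser.

```python
# Relationship values defined by the XFN 1.1 profile.
XFN_11_VALUES = {
    "friend", "acquaintance", "contact",      # friendship
    "met",                                    # physical
    "co-worker", "colleague",                 # professional
    "co-resident", "neighbor",                # geographical
    "child", "parent", "sibling", "spouse", "kin",  # family
    "muse", "crush", "date", "sweetheart",    # romantic
    "me",                                     # identity
}

# Variations listed in this section.
XFN_VARIATIONS = {
    "coworker", "coresident", "relative",
    "brother", "sister", "husband", "wife",
}


def xfn_relationships(rel):
    """Return the recognized XFN values in a rel attribute, in order."""
    known = XFN_11_VALUES | XFN_VARIATIONS
    return [value for value in rel.lower().split() if value in known]
```

With this sketch, xfn_relationships("coworker friend nofollow") keeps the first two values and drops nofollow.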

hCard

The hCard microformat allows you to embed address book information within HTML content. If Universal Feed Parser finds an hCard within supported elements, it converts it into an RFC 2426-compliant vCard and returns it in entries[i].vcard.

Converting embedded hCard markup into a vCard

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/hcard.xml')
>>> print d.entries[0].vcard
BEGIN:vCard
VERSION:3.0
FN:Frank Dawson
N:Dawson;Frank
ADR;TYPE=work,postal,parcel:;;6544 Battleford Drive;Raleigh;NC;27613-3502;U
 .S.A.
TEL;TYPE=WORK,VOICE,MSG:+1-919-676-9515
TEL;TYPE=WORK,FAX:+1-919-676-9564
EMAIL;TYPE=internet,pref:Frank_Dawson at Lotus.com
EMAIL;TYPE=internet:fdawson at earthlink.net
ORG:Lotus Development Corporation
URL:http://home.earthlink.net/~fdawson
END:vCard
BEGIN:vCard
VERSION:3.0
FN:Tim Howes
N:Howes;Tim
ADR;TYPE=work:;;501 E. Middlefield Rd.;Mountain View;CA;94043;U.S.A.
TEL;TYPE=WORK,VOICE,MSG:+1-415-937-3419
TEL;TYPE=WORK,FAX:+1-415-528-4164
EMAIL;TYPE=internet:howes at netscape.com
ORG:Netscape Communications Corp.
END:vCard

Note

There is a growing number of microformats, and Universal Feed Parser does not parse all of them. However, both the rel and class attributes survive HTML sanitizing, so applications built on Universal Feed Parser that wish to parse additional microformat content are free to do so.
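As an illustration of parsing an additional microformat in application code, the Python 3 sketch below collects the href of every link whose rel attribute contains a given value, using only the standard library's html.parser. The class RelLinkCollector and function find_rel_links are hypothetical names, and rel="license" is used purely as an example of a microformat not covered in this section.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect href values of <a> elements whose rel contains a given value."""

    def __init__(self, rel_value):
        super().__init__()
        self.rel_value = rel_value
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Attribute values may be None when written without a value.
        rel_values = (attributes.get("rel") or "").lower().split()
        if self.rel_value in rel_values and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_rel_links(html, rel_value):
    """Return hrefs of links in html whose rel list includes rel_value."""
    collector = RelLinkCollector(rel_value)
    collector.feed(html)
    collector.close()
    return collector.hrefs
```

An application could pass the sanitized HTML of an entry, for example the value of one of its content elements, to find_rel_links(html, "license").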