/[svn.andrew.net.au]/scripts/movies.py

Annotation of /scripts/movies.py

Revision 50
Thu Dec 29 06:17:19 2011 UTC (10 years, 9 months ago) by apollock
File MIME type: text/x-python
File size: 1172 byte(s)
Updated for latest format of IMDb page.

Stopped using BeautifulSoup because it was choking on the HTML.

#!/usr/bin/python

import copy
import urllib2
import time
import datetime
import lxml.html
import xml.sax.saxutils

def main():
    imdb = urllib2.urlopen("http://www.imdb.com/movies-in-theaters/")
    doc = lxml.html.fromstring("".join(imdb.readlines()))
    for element in doc.iter(tag=lxml.etree.Element):
        if element.tag.endswith("div"):
            if element.attrib.get("id", "") == "main":
                break
    new_releases = copy.deepcopy(element)
    movies = xml.sax.saxutils.escape(lxml.html.tostring(new_releases))
    print """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

<link href="http://home.andrew.net.au/~apollock/movies.xml" rel="self"/>

<title>This week's movies from IMDb</title>
<updated>%(updated)sZ</updated>
<author>
<name>Andrew Pollock</name>
</author>
<id>http://www.andrew.net.au/</id>

<entry>
<id>http://home.andrew.net.au/movies/%(timestamp)s</id>

<updated>%(updated)sZ</updated>
<title>This week's movies</title>

<content type="html">
%(movies)s
</content>
</entry>
</feed>
""" % { 'updated': datetime.datetime.utcnow().isoformat()[0:19], 'movies': movies, 'timestamp': int(time.time()) }

if __name__ == "__main__":
    main()
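
The script's last step escapes the extracted HTML and substitutes it into an Atom entry, together with a second-resolution UTC timestamp and an epoch-based id. A minimal sketch of that step in isolation is given below. It is an illustration and not part of the revision. It is written for Python 3 (the script itself is Python 2), and the names ENTRY_TEMPLATE and build_entry, as well as the now parameter, are hypothetical and exist only so that the step can be exercised without fetching the IMDb page.

```python
import datetime
import xml.sax.saxutils

# Hypothetical template: only the <entry> part of the feed printed by the script.
ENTRY_TEMPLATE = """<entry>
<id>http://home.andrew.net.au/movies/%(timestamp)s</id>
<updated>%(updated)sZ</updated>
<title>This week's movies</title>
<content type="html">
%(movies)s
</content>
</entry>"""


def build_entry(html_fragment, now=None):
    """Return an Atom <entry> string carrying html_fragment as escaped HTML.

    `now` is an assumption added for testability; it must be a
    timezone-aware datetime in UTC. When omitted, the current time is used.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return ENTRY_TEMPLATE % {
        # Second resolution, like isoformat()[0:19] in the script.
        "updated": now.strftime("%Y-%m-%dT%H:%M:%S"),
        # escape() rewrites &, < and > so the markup survives as text
        # inside <content type="html">.
        "movies": xml.sax.saxutils.escape(html_fragment),
        # Seconds since the epoch, like int(time.time()) in the script.
        "timestamp": int(now.timestamp()),
    }


if __name__ == "__main__":
    print(build_entry('<div id="main"><a href="/title?a=1&b=2">Film</a></div>'))
```

Passing a fixed now makes the output reproducible, which is convenient when checking that the escaped markup and the timestamp land where a feed reader expects them.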

Properties

Name            Value
svn:executable  *
