Col
Small data hack: bin day calendar 
2016-12-28 (Wed) 15:26
I'm very lazy. Rather than keeping track of Cambridge bin collection days manually, especially around holidays, I wrote a thing to convert the council's collection schedule into an iCalendar file so that I could import it into Google Calendar. Here it is in case it's useful to anyone else:
#! /usr/bin/python3

from argparse import ArgumentParser
from datetime import datetime
import os.path
import re

from bs4 import BeautifulSoup
import dateutil.parser
from icalendar import (
    Calendar,
    Event,
    )
import requests


parser = ArgumentParser(
    description="Generate iCalendar file for Cambridge bin collection days.")
parser.add_argument(
    "id",
    help=(
        "Unique identifier for this calendar, normally your host name.  Make "
        "sure that this does not collide with any other calendars."))
parser.add_argument("address", help="Your street address in Cambridge.")
parser.add_argument("postcode", help="Your postcode.")
args = parser.parse_args()

now = datetime.now()
req = requests.get(
    "http://bins.cambridge.gov.uk/bins.php",
    params={"address": args.address, "postcode": args.postcode})
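# Name the parser explicitly so that the result does not depend on which
# parsers happen to be installed.
soup = BeautifulSoup(req.text, "html.parser")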
cal = Calendar()
cal.add("prodid", "-//riva.pelham.vpn.ucam.org//bin-days//EN")
cal.add("version", "2.0")
cal.add("calscale", "GREGORIAN")
cal.add("x-wr-calname", "Bin days")
cal.add("x-wr-timezone", "Europe/London")
for div in soup.find_all("div", style=re.compile(r"^text-align:center")):
    desc = div.contents[0]
    when = dateutil.parser.parse(div.b.get_text(" ").rstrip("*"))
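    # Compare dates rather than datetimes so that a collection due today is
    # not pushed into next year.
    while when.date() < now.date():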
        when = when.replace(year=when.year + 1)
    event = Event()
    event.add("uid", "bin-days/{:%Y%m%d}@{}".format(when, args.id))
    event.add("dtstart", when.date())
    event.add("summary", desc.capitalize())
    event.add("transp", "TRANSPARENT")
    cal.add_component(event)
with open(os.path.expanduser("~/public_html/bin-days.ics"), "wb") as out:
    out.write(cal.to_ical())
On Debian, this requires the python3-bs4, python3-dateutil, python3-icalendar, and python3-requests packages. You'll probably want to change the output path to somewhere that your calendar software can see (so if it's a web service such as Google Calendar then it needs to be something that corresponds to an accessible URL). The web-scraping is pretty gross, but it's the best I can do given the council's published data. Ideally this would itself be a web service that could generate calendars on demand for a given address and postcode, but like I say I'm lazy.
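The year-rollover step in the script is the part most likely to go wrong around New Year, so here is a minimal standalone sketch of it. It assumes the page gives dates such as "5 January" with no year, which is inferred from the script rather than confirmed. The helper name `next_occurrence` is hypothetical, and it uses `datetime.strptime` from the standard library instead of dateutil so that it can be tried without extra packages.

```python
from datetime import date, datetime


def next_occurrence(text, today):
    """Return the first date matching a year-less string such as
    "5 January" that falls on or after `today`.

    The "day month" input format is an assumption about the council page.
    """
    # The script strips a trailing "*" from the date text, so do the same.
    text = text.rstrip("*").strip()
    # Try this year first, then next year, and take the first date that
    # is not in the past.
    for year in (today.year, today.year + 1):
        candidate = datetime.strptime(
            "{} {}".format(text, year), "%d %B %Y").date()
        if candidate >= today:
            return candidate
    raise ValueError("no upcoming date for {!r}".format(text))


if __name__ == "__main__":
    print(next_occurrence("5 January", date(2016, 12, 28)))
```

Working with `date` objects rather than `datetime` objects means a collection due on the day the script runs stays in the current year. Month names are parsed with `%B`, so this sketch expects an English locale.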

This entry was originally posted at http://cjwatson.dreamwidth.org/21454.html. Please comment there using OpenID.
Comments 
2016-12-28 (Wed) 19:06 (UTC) - Alternative method
I'm even more lazy, this was only 7 lines :-)

It's meant to be run each evening from cron. That way schedule changes should still be caught, but it doesn't have iCal integration:

#!/bin/bash
TOMORROW=`date --date=tomorrow '+%-d %B'`
BODY=`w3m -no-cookie -dump 'http://bins.cambridge.gov.uk/bins.php?uprn=10002568779' | grep -B2 "^${TOMORROW}" -`
if [ -n "$BODY" ]
then
echo "Details at http://bins.cambridge.gov.uk/bins.php?uprn=10002568779" | mail -a "From: Bin Reminder <nobody@jump.org.uk>" -s "Bin day: $BODY" emaildestination@example.internal
fi
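For comparison, the date matching that the shell script does with `date` and `grep -B2` can be sketched in Python as below. This is only an illustration under assumptions: the function names `tomorrow_label` and `matching_context` are hypothetical, the page text is assumed to have already been dumped to a plain string (as `w3m -dump` does), and fetching the page and sending mail are left out.

```python
from datetime import date, timedelta


def tomorrow_label(today):
    """Format tomorrow's date like `date --date=tomorrow '+%-d %B'`,
    for example "29 December"."""
    tomorrow = today + timedelta(days=1)
    # Use the integer day to avoid zero padding; "%-d" is not portable
    # in Python's strftime.
    return "{} {:%B}".format(tomorrow.day, tomorrow)


def matching_context(page_text, label, before=2):
    """Return each line starting with `label` plus up to `before`
    preceding lines, roughly what `grep -B2 "^label"` prints."""
    lines = page_text.splitlines()
    out = []
    for i, line in enumerate(lines):
        if line.startswith(label):
            out.extend(lines[max(0, i - before):i + 1])
    return "\n".join(out)


if __name__ == "__main__":
    label = tomorrow_label(date.today())
    print(matching_context("Black bin\n\n" + label + "\n", label))
```

An empty result corresponds to the `[ -n "$BODY" ]` check failing, meaning there is no collection tomorrow and no reminder to send.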