Tracking OERs

The ability to track open educational resources (OERs) when they are distributed, for example via RSS or as IMS Common Cartridges, was a live topic at the recent CETIS conference OER technical roundtable session, and is part of OLnet research question 5. Jenny Gray and I are taking part in a JISC CETIS online seminar on Thursday, 19 November, where I hope we can touch on the ideas below.

The problem: there is little evidence for how, how frequently and where open educational resources are being used and reused. Such evidence would be useful for many reasons, including research into topics like quality and usefulness, and when applying for funding and resources.

Acknowledgements: Will Woods was supportive of the initial idea. Jenny Gray suggested using a Creative Commons license icon as the image - thank you!

Requirements

Suggestions for requirements:

  1. License - a license that requires that any software code/markup/binary data that is in an OER is preserved during use, reuse and remixing. (For example, Creative Commons Share-Alike + additional terms.)
  2. Useful - it must be possible to gather meaningful data.
  3. Granularity - ideally, it should be possible to disaggregate an OER "course" or unit, and track individual resources, for example, individual images, or sections of text.
  4. Location agnostic - it should be possible to track OER use behind a firewall. (It probably won't be possible to track use on a person's local computer - privacy + technical constraints.)
  5. Compact - the size of the tracking code/markup should not be disproportionate to the size of the (disaggregated) OER.
  6. Device/system agnostic.
  7. Caching - prevent caching of the image, scripts or other assets, which would impede the collection of data. Careful cross-browser testing will be required to catch bugs.
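
For requirement 7, a minimal sketch of the response headers a tracking script might send so that browsers and proxies request the image again on every view. The function names are hypothetical, the header set is a common defensive combination rather than a tested recipe, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: the header lines that ask browsers and proxies
// not to store or reuse the tracking image.
function no_cache_headers(): array {
  return array(
    'Cache-Control: no-store, no-cache, must-revalidate, max-age=0',
    'Pragma: no-cache',                        // For HTTP/1.0 clients and proxies.
    'Expires: Thu, 01 Jan 1970 00:00:00 GMT',  // A date in the past.
  );
}

// Send the headers. Must be called before any image bytes are output.
function no_cache(): void {
  foreach (no_cache_headers() as $line) {
    header($line);
  }
}
```

Keeping the header list in its own function makes it easy to check in a test without a web server. Whether every browser and intermediate proxy honours these headers is exactly what the cross-browser testing would need to establish.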

Suggested solutions

  1. JavaScript, similar to Google Analytics, possibly based on Piwik. Advantage: potential to collect a lot of data, e.g. the URL referring to the OER, capabilities of the client device. Disadvantages: not compact; not all devices/organisations have JavaScript enabled; JS may be filtered/removed because of a perceived security threat. [ Nick, I thought about this after we talked - we strip out our own JS from download packages like IMS CC because many VLE import / CC players don't support JS -Jenny Gray 11/18/09 5:55 PM ]
  2. A server-side script that generates an image - a transparent 1x1 GIF.
  3. A server-side script that generates an image - a Creative Commons License icon. Advantages: compact; pervasive/(almost) device agnostic; low security risk/less likely to be filtered; image serves a secondary purpose. Disadvantage: only limited data can be collected.

Example markup

An example of suggestion 3, distributed via an RSS feed (Unit A180 RSS feed, OpenLearn). The example server-side script at http://olnet.org/track will receive HTTP headers including the "Referer" and "User-Agent". From the host name in the referer, the IP address and approximate geographical location of the server that hosts the embedding page can be deduced, for example via online IP-location services.

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns... version="2.0">
<channel>
<title>RSS Feed for the unit Aberdulais Falls: a case study in Welsh heritage</title>
<link>http://openlearn.open.ac.uk/course/view.php?name=A180_2</link>
<copyright>http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</copyright>
...
<item>
<title>Introduction</title>
<link>http://openlearn.open.ac.uk/mod/resource/view.php?id=360826</link>
...
<description><![CDATA[<div id="content">
   <h2>Introduction</h2>
   <p class="paradefault">This case study, which is taken from the Open University course <i>Heritage, whose heritage? (A180),
...
  <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"><img
  alt="Creative Commons License" style="border-width:0"
  src="http://olnet.org/track/openlearn/unit/A180_2/resource/360826/i.creativecommons.org/l/by-nc-sa/2.0/uk/80x15.png"
  /></a>
  </div>
]]></description>
...
</item>
<item>
<title>1.1 Background</title>
<link>http://openlearn.open.ac.uk/mod/resource/view.php?id=361730</link>
...
<description><![CDATA[<div id="content">
   <h2>1 Aberdulais Falls</h2>
...
  <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"><img
  alt="Creative Commons License" style="border-width:0"
  src="http://olnet.org/track/openlearn/unit/A180_2/resource/361730/i.creativecommons.org/l/by-nc-sa/2.0/uk/80x15.png"
  /></a>
  </div>
]]></description>
</item>
...
</channel>
</rss>
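
The tracking image URLs in the feed above follow a regular pattern, so the license link could be generated per resource. A rough sketch, assuming PHP 7 or later. The function names and the default license, version, jurisdiction and size values are my own illustration, taken from the example URLs rather than from any existing OpenLearn or OLnet code.

```php
<?php
// Hypothetical helper: build the tracking image URL for one resource,
// following the path pattern used in the example feed.
function tracking_image_url(string $repository, string $unit, string $resource,
    string $license = 'by-nc-sa', string $version = '2.0',
    string $jurisdiction = 'uk', string $size = '80x15'): string {
  return sprintf(
    'http://olnet.org/track/%s/unit/%s/resource/%s/i.creativecommons.org/l/%s/%s/%s/%s.png',
    rawurlencode($repository), rawurlencode($unit), rawurlencode($resource),
    rawurlencode($license), rawurlencode($version),
    rawurlencode($jurisdiction), rawurlencode($size)
  );
}

// Hypothetical helper: wrap the image in the license link used in the feed.
function license_markup(string $image_url, string $license_url): string {
  return sprintf(
    '<a rel="license" href="%s"><img alt="Creative Commons License" style="border-width:0" src="%s" /></a>',
    htmlspecialchars($license_url, ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($image_url, ENT_QUOTES, 'UTF-8')
  );
}

// Usage: the first item in the example feed.
$src = tracking_image_url('openlearn', 'A180_2', '360826');
echo license_markup($src, 'http://creativecommons.org/licenses/by-nc-sa/2.0/uk/'), "\n";
```

Because the unit and resource identifiers are part of the image path, each disaggregated item carries its own identity with it, which is what the granularity requirement asks for.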

Appendix 1: sample request

A sample HTTP image request, showing "Referer" and "User-Agent" HTTP headers that are routinely added by your browser:

Request Headers, for the image http://upload.wikimedia.org/wikipedia/commons/0/07/Waterhouse_miranda_th..., contained within the page http://olnet.org/node/143

Host:       upload.wikimedia.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2
Accept:     image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-gb,en;q=0.7,zh-cn;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset:  ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:    http://olnet.org/node/143
If-Modified-Since: Sun, 08 Mar 2009 18:16:57 GMT
If-None-Match:   "24eca-49b40b99"
Cache-Control:   max-age=0
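
As an illustration of what the tracking script could do with the "Referer" header above, here is a small sketch that extracts the host name of the embedding page. referer_host() is a hypothetical name, and the nullable type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the lower-cased host name from a Referer
// header value, or NULL when the header is absent or not a usable URL.
function referer_host(?string $referer): ?string {
  if ($referer === null || $referer === '') {
    return null;
  }
  $host = parse_url($referer, PHP_URL_HOST);  // NULL or FALSE when there is no host.
  return (is_string($host) && $host !== '') ? strtolower($host) : null;
}

// Usage: the Referer value from the sample request.
var_dump(referer_host('http://olnet.org/node/143'));
```

The host name could then be resolved to an IP address (for example with PHP's gethostbyname()) and passed to an IP-location service. Note that the header is optional and can be stripped or altered by browsers and proxies, so the script has to cope with it being missing.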

Appendix 2: pseudo-code

Some PHP/pseudo-code to log tracking data and generate the image. Obviously there needs to be some (separate) code to generate the <img> elements for RSS feeds and other distribution channels.

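<?php
# Pseudo-code: host_to_geo(), db_insert_tracking(), no_cache() and
# render_image() are placeholders that are not defined here.
#
# For the example image URL in the markup above, $segments holds the path
# parts counted from "track" (index 0), $filename is "80x15" and $ext is "png".
function track(array $segments, string $filename, string $ext) {
  # What to log about this request.
  $data = array(
    'repository'=> $segments[1],  # "openlearn"
    'unit'      => $segments[3],  # "A180_2"
    'resource'  => $segments[5],  # "360826"
    'referer'   => isset($_SERVER['HTTP_REFERER'])   ? $_SERVER['HTTP_REFERER']   : NULL,
    'user_agent'=> isset($_SERVER['HTTP_USER_AGENT'])? $_SERVER['HTTP_USER_AGENT']: NULL,
    #...
    'timestamp' => time(),
  );
  # Which license icon to return.
  $image = array(
    'host'   => $segments[6],  # "i.creativecommons.org"
    'license'=> isset($segments[8]) ? $segments[8] : NULL,  # "by-nc-sa"
    #...
    'size'   => $filename,  # "80x15"
    'type'   => $ext,  #And/or "Accept" header.
  );
  $data = host_to_geo($data);  # Add the approximate location of the referring host.
  db_insert_tracking($data);
  no_cache();                  # Send no-cache headers before any output.
  render_image($image);
  #header('Content-Type: image/png');
  #...
}
?>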
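
The pseudo-code takes its $segments, $filename and $ext arguments as given. One way they might be derived from the request path is sketched below. split_tracking_path() is a hypothetical name, and the sketch assumes PHP 7 or later and that the path always ends in an image file name.

```php
<?php
// Hypothetical helper: split a request path into the three values that
// track() expects: the directory segments, the file name and the extension.
function split_tracking_path(string $path): array {
  $segments = explode('/', trim($path, '/'));
  $file     = array_pop($segments);  // Last part, e.g. "80x15.png".
  $info     = pathinfo($file);
  return array($segments, $info['filename'], $info['extension'] ?? '');
}

// Usage: the image path from the first item in the example feed.
list($segments, $filename, $ext) = split_tracking_path(
  '/track/openlearn/unit/A180_2/resource/360826/i.creativecommons.org/l/by-nc-sa/2.0/uk/80x15.png'
);
print_r(array($segments[1], $segments[3], $segments[5], $filename, $ext));
```

With this split, index 0 is "track", so $segments[1] is "openlearn", $segments[3] is "A180_2", $segments[5] is "360826", $segments[6] is "i.creativecommons.org" and $segments[8] is "by-nc-sa", which are the indices track() reads.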

References

(From a Google Doc, N.D.Freear, OLnet/The Open University, 17 November 2009.)

Comments

Web bug

So, the idea of an embedded image presented above is called a Web bug, tracking bug or tracking pixel.

It's also worth pointing out that Google Analytics and Piwik are actually wrappers around a "Web bug" that:

  1. Add extra data (screen size, plug-ins installed etc.) that is only available through JavaScript.
  2. Probably reduce latency problems.

Piwik looks very interesting. More notes later, from the CETIS online seminar that happened today.

Nick