This blog can't be viewed on LiveJournal. Instead see http://www.apparently.me.uk/.

Moved to TypePad

3rd Feb 2009

Apparently.me.uk is now hosted on TypePad rather than LiveJournal. All of the old content remains over here in LiveJournal land, but those who are watching apparentlymart on LJ should switch over to apparentlymeuk to see new entries. Sorry for the inconvenience!

Moving the Goalposts

28th Jan 2009

In the few weeks since I published the first drafts of AtomActivity, ActivitySchema and friends several things have come about:

  • FriendFeed is collapsing multiple photo-related activities together into a single entry in its activity feeds.
  • FriendFeed is using MediaRSS-in-Atom to publish Flickr photos in its activity feeds.

The former opposes a decision made in the name of simplicity at the last activity streams meetup to go with one activity per entry. However, since the spec is being led by implementations rather than the other way around, I'm planning to reverse this decision and go for a slightly less constrained model where an activity entry can have multiple objects, as long as the verb, actor, context and other activity properties are the same for each. Doing any further coalescing (similar activities by multiple actors, for example) is the job of the UI layer of the aggregator and should not be reflected in feeds. The next iteration of the spec will contain a section with requirements specifically targeting activity re-publishers and aggregators describing what their feeds should look like.
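To make the "multiple objects per activity" rule a bit more concrete, here's a minimal Python sketch of the coalescing I have in mind. The `Activity` type, its field names and the sample values are all made up for illustration; nothing here comes from the spec draft itself.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional


class Activity(NamedTuple):
    # Hypothetical in-memory shape of an activity; not taken from the spec.
    actor: str
    verb: str
    context: Optional[str]
    objects: tuple


def coalesce(activities):
    """Merge activities that share actor, verb and context into one entry.

    Activities by different actors are deliberately left separate: that
    kind of grouping is the aggregator UI's job, not the feed's.
    """
    grouped = OrderedDict()
    for act in activities:
        key = (act.actor, act.verb, act.context)
        grouped.setdefault(key, []).extend(act.objects)
    return [
        Activity(actor, verb, context, tuple(objs))
        for (actor, verb, context), objs in grouped.items()
    ]


stream = [
    Activity("martin", "post", "flickr", ("photo-1",)),
    Activity("martin", "post", "flickr", ("photo-2",)),
    Activity("alice", "post", "flickr", ("photo-3",)),
]
merged = coalesce(stream)
```

Here the two photo posts by the same actor end up as one activity with two objects, while the post by the other actor stays as its own entry.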

The latter further erodes the argument that we can do AtomActivity because MediaRSS-in-Atom is not yet widely deployed. However, it's still not a slam-dunk for MediaRSS-in-Atom because the way FriendFeed is publishing these elements (as direct children of the activity's atom:entry element) puts them outside the purview of AtomActivity, which expressly ignores everything except the publication date and the activity-specific elements in an activity entry, under the assumption that the content there is intended for non-activity feed readers... therefore (and this will be written explicitly in the next draft of the spec) activity feed publishers can put anything they like in there that will make non-activity feed readers behave in the desired way, and activity-aware readers should ignore it and use the activity information.

Folks who were keen on AtomMedia as an alternative to MediaRSS-in-Atom should take note, though, that the likelihood of success of the former is getting weaker with each system that implements MediaRSS-in-Atom; I don't personally have time to work on AtomMedia at the same time as AtomActivity, so I'd love it if someone else would take over as author of the media spec.

Activity Streams and Comment Aggregation

15th Jan 2009

One pain point that exists for activity streams right now is the dispersal of responses over various networks. When I post a blog entry like this one, folks get the opportunity to comment on my blog itself (via TypePad Connect), or they can comment on the copy of my entry that gets sucked into LiveJournal from my Atom feed, or they can comment on the activity that shows up in FriendFeed or Plaxo. If I had it set up, they'd be able to comment on Facebook too.

It would be useful if all of these comments were aggregated together so that the entire thread of conversation could be viewed whatever context you end up reading my entry in. This is, unfortunately, not an easy problem to solve on the decentralized social web. Bloggers are accustomed to being "master of their domain", so in an ideal world they want their blog to be the master source of comments. However, it's clear that FriendFeed users want to leave their comments directly from the FriendFeed UI, not follow the link through to the original entry and comment there.

One idea is to provide some sort of endpoint where comments can be submitted by remote systems, but it's difficult to see how that would work with authenticated comments and with comment forms that have features such as CAPTCHAs. It would also be tricky to get right with the "paste in a chunk of HTML" comment systems such as Disqus and TypePad Connect. Every blog has its own variation of allowed comment markup, too, not to mention odd-ball cases like YouTube's video comments. Coming at it from the other side, it's unlikely that the other systems will be willing to relinquish control of "their" comments; for many sites, the discussions they host are a big part of their value.

Another approach is a more passive model where comments simply get added to an activity stream somewhere and it somehow gets consumed by all other sites displaying comments, but then discovery of these various streams and figuring out how to deal with abuses becomes the problem.

I don't know the solution right now, but I do feel that this is an important problem to work on as we move towards a more decentralized social web; as people start to use more and more different activity aggregators it will become increasingly difficult to stay engaged with the conversations that are going on.

Activity Streams Next Steps

11th Jan 2009

Last Thursday Six Apart hosted a very productive meet-up for the Activity Streams community -- which turned out to be far bigger than I imagined -- where we had some good discussions about where we are and where we're going. I think overall the feedback on the current spec drafts was positive, though there was a definite desire to grow the schema to support the activities exposed by more social sites. MySpace joining the fray has also made purely social activities such as friendship relationships, which we were previously deferring for a later draft, suddenly more important.

I like to work to defined goals, so here are my high-level goals for the next iteration:

  • Write a spec for the representation of people as Atom entries, to enable them to be used as activity objects. This will probably be based on the XML serialization of the PortableContacts schema, though there will be some adjustments to address the redundancy that exists between some existing Atom elements and the PortableContacts fields.
  • Expand the schema to include verbs and object types necessary to support a large proportion of the publishers currently supported by FriendFeed and Plaxo. These are easier to specify because FriendFeed and Plaxo already process these in a particular way so there are examples to draw from.
  • Start to spec out some schema additions for the purely social activities exposed by MySpace. This will be harder, because we don't really have any good examples of what this might look like in Atom, but I hope to work with Monica Keller from MySpace to figure out what makes sense for them and hopefully extrapolate that to Facebook and other similar systems. MySpace also exposes activities raised by OpenSocial, so we'll need to address how AtomActivity and OpenSocial work together at some point, but I'm hoping we can defer that at least to the next iteration.

Since I'm working on this largely in my spare time I can't really give a timeframe for the above, but I'd certainly like to do them sooner rather than later. MySpace in particular seems to be ready to launch, so that's pushing things forward faster than they might otherwise have moved. I'm getting some good feedback from a number of folks online too, and I've made a list of the outstanding issues that've been raised which I intend to post online shortly.

It was exciting to see so many folks at the meet-up enthusiastic about solving this problem. Hopefully with a couple more iterations we'll get to a place where folks can start to feel more comfortable implementing this stuff, as things solidify.

2008 on OpenStreetMap

2nd Jan 2009

ItoWorld.com has produced a really neat visualization of 2008's OpenStreetMap edits. It starts zoomed in on Ipswich, which is not far from my former home of Colchester, and I was quite pleased to see the little flash of Colchester as it zooms out... that's (partially) me! You wouldn't spot it if you didn't know exactly what to look for, but whatever.

More interesting on the global scale is just how much editing went on last year, not only in the UK but all over the world. It's great to see OpenStreetMap taking off, though I'm still sceptical whether it can really ever have any use beyond getting CC-licenced maps to go on a website's "How To Find Us" page, since you can't rely on it for any place you don't know. (Having said that, though, there are of course parts of the world where maps are not so readily available, and I do hope OpenStreetMap can go some way to addressing that problem.)

(FWIW, Colchester is still incomplete: there are a few folks still working on it, but there are still large chunks of it not mapped. I believe neighbouring town Wivenhoe (which is where I actually lived) has all of the important streetmap-level details, though.)

Using RSS for Activity Streams: Analysis

16th Dec 2008

I've been thinking some more and talking a bit with folks about whether Activity Streams should be in RSS or Atom. I did get some feedback saying that both should be supported, but I'm not sure I really want to create two different ways to publish/consume activity data. Here are some advantages of each...

First the advantages of switching to RSS:

  • We don't have to invent a new way to represent media objects.
  • Almost all sites publish RSS, in some cases exclusively. (So in order to publish an activity stream, they'd need to build out an extra feed endpoint.)
  • Sites that don't currently publish Atom would need to add an additional autodiscovery link, which may confuse aggregators and complicates the UI for feed subscription in browsers.

But here are the advantages of staying with Atom:

  • Its core elements are in an XML namespace, which makes it easier/nicer to include inside weird containers like XMPP stanzas.
  • We can use atom:source to deal (to a certain extent) with activity aggregation feeds such as the Atom feeds that FriendFeed publishes. No such concept exists in RSS.
  • We don't have to deal with the complexities and ambiguities of Media RSS. (In other words, we can decide on something sensible without being constrained by existing practice.)
  • The Atom schema and data model are much better defined than those of RSS. (Though lots of software just treats Atom as a funny serialization of RSS, so this benefit doesn't really manifest in practice.)

Here's where my head is at right now: the concept of "object feeds" in the AtomActivity spec could in theory be adapted to map onto RSS without many changes. Therefore we could include a section in AtomActivity for how to construct the "implied activity" for an RSS item much like we currently describe how to construct the same for a non-action Atom entry. The concept of "activity entries" is more complicated to adapt to RSS due to its re-use of Atom elements, but given that there are currently only a few implementations that contain something resembling "activity entries", hopefully we can get them to converge on Atom for this.

What this means in practice is that sites publishing feeds of objects can take their existing feeds, whether Atom or RSS, and add the activity:verb and activity:object-type annotations, and be done. Sites publishing feeds of activities (FriendFeed, Plaxo, Movable Type Action Streams, Wordpress Activity Streams, ...) would need to use Atom, because there would be no representation defined for this in RSS. Consumer libraries would need to support both RSS and Atom, but there would be a well-defined mapping for how to turn both sorts of object entries into Atom-based activity entries.
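As a rough sketch of the "add the annotations and be done" idea, here's how a publisher might bolt `activity:verb` and `activity:object-type` onto an existing Atom or RSS 2.0 feed using Python's standard library. The namespace URI below is a placeholder I made up for the example (the real one isn't settled here), and the `post`/`photo` values are illustrative rather than taken from the schema.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Placeholder only: an assumed URI, not the real AtomActivity namespace.
ACTIVITY_NS = "http://example.org/ns/activity"

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("activity", ACTIVITY_NS)


def annotate(feed_xml, verb, object_type):
    """Add verb and object-type annotations to every entry or item."""
    root = ET.fromstring(feed_xml)
    # Atom entries are namespaced; RSS 2.0 items are not.
    entries = root.findall(f".//{{{ATOM_NS}}}entry") + root.findall(".//item")
    for entry in entries:
        ET.SubElement(entry, f"{{{ACTIVITY_NS}}}verb").text = verb
        ET.SubElement(entry, f"{{{ACTIVITY_NS}}}object-type").text = object_type
    return ET.tostring(root, encoding="unicode")


atom_in = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>Sunset</title></entry></feed>"
)
rss_in = (
    '<rss version="2.0"><channel>'
    "<item><title>Sunset</title></item></channel></rss>"
)
atom_out = annotate(atom_in, "post", "photo")
rss_out = annotate(rss_in, "post", "photo")
```

The same function handles both formats because the annotations are namespaced extension elements; everything else in the feed is left alone.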

This would make Atom the primary format but there would be some limited (but well-defined) support for RSS. Does that seem reasonable to folks?

Feed Publishing Research

15th Dec 2008

In my previous entries I alluded to research into the popularity of different approaches for publishing feeds, particularly those containing media objects such as photos, videos and audio. I've now written up a short summary of my findings.

The three things that spring right out here are:

  • RSS is published by just about everyone.
  • You usually find Atom in the traditional blogging space, but it isn't even in the game when it comes to media publishing.
  • The only thing you can actually reliably get out of MediaRSS is a thumbnail image.

I'm continuing to mull over whether to rewrite activity streams in terms of RSS or to hope for increased adoption of Atom. My leaning right now is to the former.

Another interesting fact not reflected in my results document is that none of the RSS feeds I examined used any RSS features that are not available in Atom when augmented with my AtomMedia draft, and AtomMedia allows only one way to publish each case rather than the myriad combinations of media:group, media:content and other element nesting that are allowed by Media RSS and used by feeds in the wild. It's too bad that if I move to RSS/MediaRSS for activity streams I'll have no need for AtomMedia; I'd be delighted if someone else would pick it up and finish it off, though.

ActivityRSS instead of AtomActivity?

14th Dec 2008

If you've been following my adventures this weekend you'll know that I started off wondering why RSS is still so prevalent when we have Atom. However, not long after that I started doing research in preparation for specifying the media object types in AtomActivity and discovered one big reason why RSS is still widely used: folks publish media objects like podcasts using MediaRSS, but there is no standard for media objects in Atom.

So faced with the need to mark up activities involving photos and videos in Atom, what is a boy to do? Last night I took a whack at adapting a subset of MediaRSS to Atom, with the hope that AtomActivity could refer to that. However, today I played around some more with various software that consumes feeds with embedded media, and found that there does seem to be a subset of MediaRSS that does actually work in software today, and that made me take a step back and reassess my goals.

My design goal with AtomActivity was always to describe some minimal extra markup that would allow existing feeds to be consumed by activity aggregators. Asking providers to add a bunch of additional junk to their Atom feeds when they already have fully functional MediaRSS feeds doesn't really jibe well with that design goal.

The research that motivated me to ask why sites still publish RSS does seem to indicate that RSS is far more widely deployed, both by publishers and by aggregators, than Atom is. Aside from a few Six Apart products, no major service that I looked at publishes Atom only. Most publish both Atom and RSS at the same time with only basic content in the Atom feed. Many others publish only RSS, or they publish both but only have autodiscovery for RSS. I'm less certain about the consumer side, but given that only a tiny handful of publishers actually publish Media information in Atom at all I'm guessing that today systems like FriendFeed and Plaxo Pulse are using the RSS feeds when they're pulling from sites like YouTube and Flickr. If the goal is to make only minimal changes to existing practice, it does look like we're barking up the wrong tree by building on Atom.

The question now is whether to persevere with AtomActivity or to repurpose it as an RSS extension instead. Using RSS has the benefit that MediaRSS is already widely used in RSS to mark up media content, so we can do a reasonable job at consuming these feeds as they exist today. It does mean that we lose out on some Atom features such as atom:source and the Atom Threading Extensions, but neither of these is widely used today so that's no major loss.

If we did go this route I'd still want to write up a proper spec for a subset of MediaRSS that serves the same use-cases as my AtomMedia draft, since the current "specification" for MediaRSS is too big and not really detailed enough. However, at least this approach means that there is existing implementation practice to base such a subset on, so I'd be describing what works today rather than what might work in a few years if anyone actually bothers to implement it when their RSS feeds already work anyway.

As ever, I'm eager to hear what the rest of the world thinks. It's lonely here inside my head...

Money Where Mouth Is

14th Dec 2008

Sam Ruby quite fairly called me out for hating on folks that publish RSS while doing it myself. The reason is quite unexciting, though: my blog is, for historical reasons, hosted by LiveJournal. LiveJournal provides Atom and RSS feeds for all blogs it hosts.

However, I'm already doing a bunch of munging of LiveJournal's output to do things like using TypePad Connect for comments, so it didn't take long to munge out the RSS stuff. While I was at it I finally got shot of all of the script and CSS cruft that LiveJournal adds to every page to support ads, contextual popups, navigation strips and all sorts of other things that I don't have on my blog anyway.

The long-term plan is to move from LiveJournal to something else — either MovableType or TypePad most likely — but I'm putting that off until I can figure out a way to keep all of my old content appearing at the same URLs with the same comments attached.

Atom Media Extensions

14th Dec 2008

In my last entry I noted that there doesn't seem to be any standard practice for publishing media in Atom. A handful of publishers do the best they can with the stock Atom spec and make a single link with rel="enclosure", while Google (Picasa, YouTube) is the only publisher I could find that actually uses the MediaRSS elements in Atom. Most sites just don't bother: if you want that information, you need to go fetch the RSS feed.
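For reference, here's roughly what consuming the stock Atom approach looks like: a small Python sketch that pulls the `rel="enclosure"` links out of an entry. The sample entry and its URLs are invented for the example; the `href`, `type` and `length` attributes are the ones Atom defines on `atom:link`.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample entry; the URLs are placeholders.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Sunset</title>
  <link rel="alternate" type="text/html" href="http://example.com/photos/1"/>
  <link rel="enclosure" type="image/jpeg" length="48213"
        href="http://example.com/photos/1.jpg"/>
</entry>"""


def enclosures(entry_xml):
    """Return the enclosure links of a single Atom entry as dicts."""
    entry = ET.fromstring(entry_xml)
    found = []
    for link in entry.findall(f"{ATOM}link"):
        if link.get("rel") == "enclosure":
            found.append({
                "href": link.get("href"),
                "type": link.get("type"),
                "length": link.get("length"),
            })
    return found


links = enclosures(ENTRY)
```

Note what's missing: there's nowhere to say how big the image is or that another link is a thumbnail of the same photo, which is the gap an extension would need to fill.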

Since only Google's using it right now anyway, rather than import wholesale the whole of MediaRSS into Atom — MediaRSS is a pretty big, complex beast with lots of stuff that's arguably unimportant for most use-cases — I decided to design an Atom extension that's based on some of the features of MediaRSS but bashed into a more Atom-like shape and without the elements for which Atom already provides equivalents.

I now have a first draft of "AtomMedia". Here are the main differences between AtomMedia and MediaRSS:

  • AtomMedia has the narrower scope of being aimed at the aggregation and activity stream use-cases. Much of MediaRSS's complexity is so that it can be used by the indexer for Yahoo! Video Search, but that's not my goal here.
  • MediaRSS uses extension elements exclusively, while AtomMedia extends the atom:link element. In particular, it extends standard Atom's link rel="enclosure" for compatibility with existing implementations.
  • AtomMedia excludes the MediaRSS features that are not directly useful for the aggregation and activity stream use-cases. In particular, I did not include content ratings, regional exclusions, "credits", timed text and media hashes. Many of these feel like things that are more general than this use-case, anyway.
  • AtomMedia excludes some bits that Atom already has equivalents or near-equivalents of: category, title, keywords.
  • Due to the tighter scope, I was able to include tighter requirements for specific use-cases that will hopefully mean that there will be less variation between publishers.
  • AtomMedia reduces the media metadata considerably: it has only width/height for visual things and duration for time-based things. Some of the other attributes (fileSize, lang, type, ...) have equivalents in generic Atom and are thus omitted.
  • AtomMedia assumes that each entry describes exactly one media object that might have multiple representations. MediaRSS looks like it's trying to allow entries with multiple objects associated with them, but it doesn't define well exactly how that works in practice and I've seen no feeds actually make use of this feature.

If I can get some traction on this I'd like to use it as the representation format for the photo and video object types in the AtomActivity schema specification. The main important thing I'm missing right now is a namespace URI. How does one register URLs under http://purl.org/syndication/, as seems to be the done thing for Atom extensions in development?

The sorry state of media in Atom and RSS

13th Dec 2008

Part of the AtomActivity work is defining a single standard way to publish the metadata about the core object types in an Atom entry. For the object type "weblog entry", our work is basically done: that's what Atom was built for. Things get interesting when you consider representing photos and videos in Atom.

Photos and videos have lots of interesting properties above what weblog entries have, the most important of which are the URLs for different-sized representations in different formats. Many moons ago, while Atom was in its infancy, Yahoo! invented Media RSS for this purpose. Media RSS is an extension to RSS that adds a multitude of interesting new elements, including content and thumbnail that together handle locating the different-sized representations of a media object.

MediaRSS seems to have been adopted in the RSS feeds exposed by most of the popular media hosting sites, including Flickr, YouTube and Picasa Web Albums. There's also a handful of "media aggregators" — mostly focused on audio — that support MediaRSS. However, as seems to be the case for many things RSS, the specification gives loads of options and no clear guidance on what to actually do and consequently everyone implements it slightly differently.

What of Atom? The situation in Atom land is considerably more grim. Atom itself has nothing more than a workalike of RSS's original enclosure element, and while a few publishers are making use of it (Flickr, for example) this isn't enough to provide the various representations you generally want to publish for a media resource. It seems that around the time MediaRSS was being developed there was a thread about developing something similar for Atom on the Atom IETF mailing list, though as far as I can tell the outcome was "wait until MediaRSS is finished and use it as a basis". MediaRSS was of course eventually finished, but I guess by that time the Atom working group at IETF had published its two RFCs and wound up.

My research suggests that today most publishers just omit media information (beyond the basic "enclosure" link) completely in their Atom feeds, while publishing it via MediaRSS in their RSS feeds. Google's YouTube and Picasa Web Albums are the only example I could find where MediaRSS elements are published both in RSS and Atom feeds, though in both cases they do it differently than everyone else (everything's wrapped in a single media:group element, rather than included as direct children of item) and FeedValidator says that Picasa's feeds are in fact invalid because they only include one media:content element in the group, though of course the MediaRSS spec itself says little about this.
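To illustrate the "everyone does it slightly differently" problem, here's a hedged Python sketch of a consumer that has to look for `media:content` and `media:thumbnail` both as direct children of the item and inside a `media:group` wrapper. The sample items and URLs are invented; the namespace URI is the usual Media RSS one. It doesn't try to handle thumbnails nested inside `media:content`, which the spec also permits.

```python
import xml.etree.ElementTree as ET

MEDIA = "{http://search.yahoo.com/mrss/}"

# Invented samples: the same photo published in the two styles.
FLAT = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <media:content url="http://example.com/p.jpg" type="image/jpeg"/>
  <media:thumbnail url="http://example.com/p_t.jpg" width="75" height="75"/>
</item>"""

GROUPED = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <media:group>
    <media:content url="http://example.com/p.jpg" type="image/jpeg"/>
    <media:thumbnail url="http://example.com/p_t.jpg" width="75" height="75"/>
  </media:group>
</item>"""


def media_representations(item):
    """Collect content and thumbnail attributes from either layout."""
    containers = [item] + item.findall(f"{MEDIA}group")
    contents, thumbnails = [], []
    for container in containers:
        contents += [dict(el.attrib) for el in container.findall(f"{MEDIA}content")]
        thumbnails += [dict(el.attrib) for el in container.findall(f"{MEDIA}thumbnail")]
    return contents, thumbnails


flat_result = media_representations(ET.fromstring(FLAT))
grouped_result = media_representations(ET.fromstring(GROUPED))
```

Both layouts come out the same after normalization, but only because the consumer went looking in two places.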

MediaRSS also, on a more subjective level, feels like a bit of a foreign citizen in Atom. Many of its elements overlap with elements already defined in Atom, and of course it doesn't use Atom's link element because that concept does not exist in RSS.

So the question now is how to specify media element handling in AtomActivity. MediaRSS has far more functionality than is required for the Activity Stream use-cases. As I see it, the options here are:

  • Write AtomActivity to specify that, for photos and videos, you are to "retrieve the media object metadata as defined by MediaRSS" and leave it at that. However, I feel that MediaRSS is too big and underspecified, with too many possible variations.
  • Define a subset of MediaRSS that only includes the minimum necessary for the activity streams use-cases, and is far more rigid about how things are to be published.
  • Use MediaRSS as the basis for a separate specification that has a narrower scope and feels more at home in Atom.

If MediaRSS in Atom were already widely and consistently deployed I wouldn't hesitate to go for the second of these options, but since everyone except Google would have to add to their Atom feeds anyway, and since existing Atom parser implementations are unlikely to have support for MediaRSS right now anyway, I'm leaning towards the last of these, defining an extension that builds upon (and is backwards-compatible with) how "enclosures" are already represented in Atom. The MediaRSS folks have already done the hard work of figuring out the featureset, so the work would largely be just mapping MediaRSS concepts onto Atom structures.

I'm fully expecting to hear loads of cries of "don't reinvent the wheel!", which is fair enough, but my review of current practice suggests that Atom enclosures are currently far more widely deployed than MediaRSS-in-Atom, so defining something that extends Atom's enclosure mechanism seems like a better way to go than switching to something entirely different. I'm going to take a whack at an "Atom Media Extensions" and see how it turns out.

Why do sites still publish RSS?

13th Dec 2008

In my travels all over the web looking for examples to use as the basis for AtomActivity it was interesting to note the number of sites that are still publishing both Atom and RSS feeds in parallel.

Given that Atom was designed to be the "cure to all ills" of RSS, it seems like you ought to be able to publish anything you can publish in RSS as an Atom feed, even just as a mechanical transformation. Perhaps it's just the ease of doing it that's the motivation? "Neither are hard to generate, so let's just do both."

Where this becomes troublesome is the definition of extensions. AtomActivity, by virtue of being an Atom extension, can't be used directly in RSS. While it's true that you can plug the namespaced XML elements that are specified into an RSS item element, there are still several incongruities: first and most obvious is that activity:object is defined to contain the same elements you find in the atom:entry element (more or less), but also some of the object types we'll be adding in will also have a description of how to extract their properties (such as a photo's image URL) from an Atom entry, and that description won't work without modification on an RSS item.

So what's an extension author to do? Do I need to write a parallel "RSSActivity" spec that's fundamentally the same but uses RSS elements in place of Atom ones? Do we need to define for every object type a mapping for both Atom and RSS?

Another place this problem manifests is libraries that act as an abstraction layer over RSS and Atom. It's true that for the basic case of publishing feeds of weblog entries the interface to both of those is basically the same, but Atom is in fact a superset of RSS (functionality-wise) and so any such libraries are necessarily restricted to supporting only what RSS can do. The use-case for these libraries is "Here's the URL for a feed. Parse it at all costs. I don't care what format it's in". That sounds useful on the surface, but are there really any significant sites left that publish RSS but don't also publish Atom? Can't we just leave RSS to die and use Atom-specific libraries?
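Here's a minimal sketch of what such an abstraction layer boils down to, assuming only Atom and RSS 2.0 as inputs (RSS 1.0/RDF isn't handled). The function names and the choice of fields are mine, not any particular library's.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def _from_atom(entry):
    # Atom treats a link with no rel attribute as rel="alternate".
    link = None
    for el in entry.findall(f"{ATOM}link"):
        if el.get("rel", "alternate") == "alternate":
            link = el.get("href")
            break
    return {
        "id": entry.findtext(f"{ATOM}id"),
        "title": entry.findtext(f"{ATOM}title"),
        "link": link,
    }


def _from_rss(item):
    return {
        "id": item.findtext("guid"),
        "title": item.findtext("title"),
        "link": item.findtext("link"),
    }


def parse_feed(xml_text):
    """Parse Atom or RSS 2.0 into one lowest-common-denominator shape."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{ATOM}feed":
        return [_from_atom(e) for e in root.findall(f"{ATOM}entry")]
    if root.tag == "rss":
        return [_from_rss(i) for i in root.findall("channel/item")]
    raise ValueError(f"unrecognised feed root element: {root.tag}")


ATOM_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://example.com/1</id>
    <title>Hello</title>
    <link rel="alternate" href="http://example.com/1"/>
  </entry>
</feed>"""

RSS_FEED = """<rss version="2.0"><channel>
  <item>
    <guid>http://example.com/1</guid>
    <title>Hello</title>
    <link>http://example.com/1</link>
  </item>
</channel></rss>"""

same = parse_feed(ATOM_FEED) == parse_feed(RSS_FEED)
```

The common shape only has room for what both formats can express, so anything Atom-specific (atom:source, multiple typed links, and so on) has nowhere to go.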

Browsers suffer in this department also. On just about every site I visited, when I clicked the "Feed" icon in my browser I got a pop-up menu with two options: "Feed (Atom)" and "Feed (RSS)". Do we really want to be forcing users to make the choice between two options that, as far as their browser is concerned, behave in exactly the same way? Firefox and Opera -- and, I assume, every other major browser -- support Atom, so can't we just remove the RSS autodiscovery links, even if the underlying feeds remain? Consign RSS to the bucket of "we maintain this for backwards compatibility" rather than "this is functionality we actively promote". In an ideal world, browsers would ignore the RSS feed if an Atom feed is present, but since the browser can't reliably determine that the RSS and Atom versions really are the same content, it's left to the page author to make that decision.
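For the curious, the "ideal world" behaviour is easy enough to sketch in Python. This reads the autodiscovery links out of a page and picks the Atom one when both are present. It simply assumes the two feeds carry the same content, which is exactly the thing a real browser can't verify; the class and function names are my own.

```python
from html.parser import HTMLParser

# Preference order: Atom first, RSS as the fallback.
FEED_TYPES = ("application/atom+xml", "application/rss+xml")


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements that advertise feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append((mime, attr["href"]))


def preferred_feed(html):
    """Return the href of the Atom feed if there is one, else the RSS feed."""
    finder = FeedLinkFinder()
    finder.feed(html)
    for wanted in FEED_TYPES:
        for mime, href in finder.feeds:
            if mime == wanted:
                return href
    return None


PAGE = (
    "<html><head>"
    '<link rel="alternate" type="application/rss+xml" href="/rss.xml">'
    '<link rel="alternate" type="application/atom+xml" href="/atom.xml">'
    "</head><body></body></html>"
)
choice = preferred_feed(PAGE)
```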

If you publish both RSS and Atom feeds on your site I'd love to hear why. If you're publishing exclusively RSS I'd love to hear why as well.

Draft Activity Streams Specification

9th Dec 2008

I took a first whack at an Atom extension for describing activity streams. The format described therein is the format expected (and, a few funky bugs notwithstanding, generated) by my experimental activity streams library for Perl.

There's definitely a lot of editorial cleanup to do, but I'm not doing that right now since I'm anticipating that this'll be changing quite a lot once some more folks throw their two cents in. Already there's discussion on the Activity Streams Google Group about using atom:category for the verb and object type annotations in preference to custom elements; I intend to do some testing to see whether existing feed processing software acts well or acts badly with the various alternative serializations — including the one in my draft — since it's important that these feeds do something sensible when they turn up in traditional feed readers and other feed processing software, else there will be reluctance to add this stuff.

There's also discussion about alternatives to my proposal of nesting an atom:entry-like structure inside the activity to describe the activity "object". A valid concern is that applications that consume and re-publish feeds are likely to drop unknown extension elements on the floor, so it would be good to find a way to behave well in this case.

I encourage folks who are interested in contributing to this specification, whether in the form of spec feedback or in the form of experimental implementations and testing, to join the discussion on the Google Group. I think Chris Messina will also be doing a talk about this topic at the Learning About the Open Stack for the Social Web event on Dec 19th.

The .tel domain

3rd Dec 2008

Recently launched (although with registration restricted to trademark holders right now) is the .tel top-level domain. The marketing on their website sells it as a domain where you can publish your contact information without needing to make a website. Implementation-wise this means that they run the DNS service for you and point it at their webapp which publishes the information you supplied.

One thing they don't make a big deal of is what's going on in the DNS for these domains:

NAPTR	100 101 "u" "E2U+voice:tel+x-mobile" "!^.*$!tel:+16468889999!" .
NAPTR	100 102 "u" "E2U+voice:tel+x-work" "!^.*$!tel:+12125551234!" .
NAPTR	100 103 "u" "E2U+email:mailto" "!^.*$!mailto:emma@aol.com!" .
NAPTR	100 104 "u" "E2U+x-im:skype" "!^.*$!skype:emma123!" .
NAPTR	100 105 "u" "E2U+web:http+x-lbl:Myspace_page" "!^.*$!http://www.myspace.com/emma!" .

Yes, someone has actually made use of the NAPTR record type for something. I'm not sure if any clients can make use of the above right now, but at least someone publishing it is one step in the right direction. I find NAPTR interesting because -- with this application at least -- it's using domain names to identify people. OpenID gets a lot of flak for using URLs to identify people, so I'm doubtful that identifying oneself as "nickname.tel" would catch on either, especially since you need to pay for the privilege.
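In case the record format looks opaque: after the record type, the fields are order, preference, flags, service, regexp and replacement, and the regexp field is a delimited "pattern, substitution, flags" triple. Here's a rough Python sketch of a client reading one of the records above. It uses Python's `re` as a stand-in for the POSIX extended regular expressions NAPTR actually specifies, ignores the regexp flags, and won't cope with a delimiter escaped inside the pattern; the `emma.tel` input is my guess at what a client would feed in.

```python
import re
import shlex
from typing import NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str


def parse_naptr(line):
    """Split one presentation-format NAPTR line into its fields."""
    rtype, order, pref, flags, service, regexp, replacement = shlex.split(line)
    if rtype != "NAPTR":
        raise ValueError(f"not a NAPTR record: {rtype}")
    return Naptr(int(order), int(pref), flags, service, regexp, replacement)


def resolve(record, key):
    """Apply the record's substitution expression to key.

    Returns the resulting URI, or None if the pattern does not match.
    """
    delim = record.regexp[0]
    pattern, template, _flags = record.regexp[1:].split(delim)
    match = re.search(pattern, key)
    return match.expand(template) if match else None


line = 'NAPTR\t100 101 "u" "E2U+voice:tel+x-mobile" "!^.*$!tel:+16468889999!" .'
record = parse_naptr(line)
uri = resolve(record, "emma.tel")  # hypothetical lookup key
```

Since every pattern in the records above is `^.*$`, the input doesn't really matter: each record just hands back a fixed URI, labelled by its service field.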

They support access control on the user pages so that you can make your information available only to your friends. The catch is that your friends must sign up for an account at your .tel-provided site to do this, which also seems unlikely. This seems like somewhere OpenID could be useful... (lack of actual OpenID users aside) you could just pre-seed the approved list with your friends' URLs and then they can just log in when they need it without waiting to be approved. (I have to assume the NAPTR records go away when your contact information isn't public, which also kinda reduces the usefulness of it.)

This service is of course a lot like what Chi.mp does. Chi.mp puts a much more personal, social-networking-kinda spin on it, while .tel seems more aimed at businesses, but the idea is at least similar.

Too Many Mailing Lists

30th Nov 2008

The number of mailing lists that are being used for discussing "the open web" is ridiculous. Most of them are so low-traffic that they are rarely looked at and of no use to anyone. I'm sure everyone's missing at least a few of these that they should be watching. Let me try to enumerate... (though I'm sure I'm missing some as well, so please feel free to enlighten me.)

  • OpenID General - the only mailing list at openid.net that really gets any traffic
  • OpenID Specs - occasionally has a splutter of activity whenever someone proposes something, but then discussion seems to quickly redirect to some obscure, private mailing list and it all goes quiet again
  • OpenID User Experience - does what it says on the tin
  • OpenID PAPE - mailing list for the OpenID PAPE Working Group
  • Open Web Foundation Discussion - mostly about the business of getting this foundation up and running, it seems
  • idib - not really sure what this is. Identity In Browser, I guess? Someone should really fill out the information about this group a bit better
  • Social Network Portability - started to discuss what the name suggests. Now mostly dead and just attracts the occasional spam message.
  • OAuth - the main discussion group about OAuth
  • DataPortability-public - The public mailing list for the DataPortability folks
  • Social Graph API - Group specifically about Google's Social Graph API.
  • OpenID Test Framework - Started as a project to make a test suite and test harnesses for OpenID. Pretty-much stalled.
  • DiSo Project - Stands for "Distributed Social" and is about implementing social networking features without a centralized social networking site. (I guess?)
  • axschema - Discussion about the core schema for OpenID Attribute Exchange
  • EAUT - Discussion about a particular approach for mapping email addresses to HTTP URLs
  • PortableContacts - Discussion about a specification for shifting lists of contact information between sites
  • Activity Streams - Discussion about how to generalize activity streaming so that consumers don't need to hard-code support for particular services
  • Step2 - Apparently this started as a place to discuss the OpenID/OAuth hybrid protocol, but somewhere along the line also started being about using email addresses with OpenID
  • OpenSocial - a bunch of different mailing lists about various aspects of OpenSocial
  • Microformats - discussion about Microformats
  • XRDS-Simple - (aka "Yadis 2.0")
  • OASIS XRI TC - Where XRI and XRD(S) live. (Joining this one will cost you $300 in OASIS membership.)
  • OAuth Extensions - "A forum to discuss and develop extensions to the OAuth protocol to be published seperately or added to future versions of OAuth."
  • OpenDD - Reinventing RDF, it seems. I'm sure there's more to it than that, but that's all I've been able to understand from their Google Group.
  • User-centric Identity Interop - "Forum for user-centric interop planning, reporting, etc. for OSIS and other user-centric identity interop efforts that are allied with Identity Commons"
  • SIOC - "Semantically Interlinked Online Communities": an approach to interconnect online communities using the technologies developed by the Semantic Web community
  • IDtbd - I have no idea what this one's about
  • Metatada Discovery Coordination - (presumably this is meant to be "metadata") Aims to help coordinate the separate ongoing efforts regarding URI metadata discovery by providing a place to share requirements, use cases, and solutions.

The most frustrating thing about this plethora of mailing lists is that many of them are deliberately obscured, whether by making their archives available to members only, or by configuring them not to show up in searches, or by posting little or no information about what the group is for, or in some cases just not announcing that they exist in any useful location.

Extensible JSON

30th Nov 2008

JSON has a number of advantages over XML, the main one being that it maps nicely onto the data structures developers are used to. However, it struggles a little with something that you could argue is an advantage of XML: decentralized extensibility. Creating ad-hoc extensions simply by adding new keys to someone else's schema works fine as long as everyone's playing together, but can we define a mechanism that is similar in capability to XML, where anyone can invent extensions without inadvertently colliding with someone else?

I've previously proposed the following, and I'm sure I'm not the only one:

{
    "name": "Martin Atkins",
    "{http://some.namespace.example.com/}membershipNumber": 1243523,
    "{http://some.namespace.example.com/}hoursOfEntry": [ 1, 12 ]
}

This works, but it's really verbose and not incredibly readable. Today I have an alternative proposal:

{
    "name": "Martin Atkins",
    "$ext": {
        "http://some.namespace.example.com/": {
            "membershipNumber": 1243523,
            "hoursOfEntry": [ 1, 12 ]
        }
    }
}

The first thing I did here was invent a de-facto key name under which extensions can live. Ideally no JSON-based schema would ever define a key with the name $ext, knowing that it's used for extensions. The second thing is to separate each namespace out into its own object, so the namespace URIs don't get repeated over and over and so that, in theory, that innermost object could be an instance of another schema which you can pass into some other library that understands that schema without it needing to know that it's being used as an extension rather than a top-level object.
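To make the two layouts concrete, here is a small Python sketch, assuming the "$ext" key and the example namespace URI from above. The helper names get_extension and fold_prefixed_keys are hypothetical and not part of any library. The first reads one namespace's extension object; the second converts the older prefixed-key form into the nested form.

```python
import json

EXT_KEY = "$ext"  # the de-facto extension key proposed above


def get_extension(obj, namespace, ext_key=EXT_KEY):
    """Return the extension object for one namespace URI, or {} if absent."""
    return obj.get(ext_key, {}).get(namespace, {})


def fold_prefixed_keys(flat, ext_key=EXT_KEY):
    """Convert "{namespace}local" keys into the nested $ext structure."""
    result = {}
    for key, value in flat.items():
        if key.startswith("{") and "}" in key:
            namespace, local = key[1:].split("}", 1)
            result.setdefault(ext_key, {}).setdefault(namespace, {})[local] = value
        else:
            result[key] = value  # ordinary schema-defined key
    return result


doc = json.loads("""
{
    "name": "Martin Atkins",
    "$ext": {
        "http://some.namespace.example.com/": {
            "membershipNumber": 1243523,
            "hoursOfEntry": [ 1, 12 ]
        }
    }
}
""")

# The innermost object can be handed to code that knows only that namespace.
membership = get_extension(doc, "http://some.namespace.example.com/")
print(membership["membershipNumber"])
```

A consumer that doesn't recognise a namespace just never asks for it, so unknown extensions are ignored without any special handling.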

If the "magic key" $ext doesn't sit right, this could also be defined on a per-schema basis. A particular schema could say "Extension fields are allowed under the key extensions" and still use the above structure within the extensions field. Some schemas might use a different key name, or might forbid extensions altogether.

I'm sure I'm not the first person to propose a structure like this, and I know there is a certain amount of resistance to trying to formalize JSON schemas in the same way as XML schemas are usually defined, but I think moving forward we do need to find a way to re-use schemas across multiple applications rather than everyone rolling their own.

People Search

27th Nov 2008

My project for today: People Search.

It's a mashup of Google AJAX Search API and Google Social Graph API that finds pages that represent people and displays the people rather than the pages.

Only known to work in Firefox. Doesn't quite work in Opera. Probably won't work in Safari. Definitely won't work in Internet Explorer.

Warning: URLs can contain at signs!

17th Nov 2008

This should not be surprising to anyone, but it has apparently caught out both me and Ma.gnolia: URLs can contain at signs!

Ma.gnolia has support for one of the fledgeling attempts at a protocol for email addresses as OpenID identifiers. A few weeks ago I posted about my own experimental implementation of a different approach to the same problem. Both of us made the mistake of identifying an email address by simply looking for an at sign anywhere in the entered URL.

This is, of course, not good enough. Flickr's OpenID identifiers that are already in the wild have at signs in them. There's nothing constraining anyone else from using an at sign, either. So what is a boy to do? Time for a more restrictive regex, I guess. /^[^:/]+@[^:/]+/ ought to do the trick, I think. There is of course the big elephant in the room that all of these are breaking backward-compatibility with existing implementations that turn mart@example.com into http://mart@example.com/.
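As a quick sanity check, here is the naive test next to the stricter regex in Python. The identifiers below are illustrative only (the Flickr-style URL is made up to show the shape, not a real account), and looks_like_email is a hypothetical helper name. Note that the regex as written has no end anchor, so it only checks how the identifier begins.

```python
import re

# Same pattern as above: no ":" or "/" may appear before or just after the "@".
EMAIL_LIKE = re.compile(r"^[^:/]+@[^:/]+")


def naive_check(identifier):
    """The mistaken approach: any at sign anywhere counts as an email address."""
    return "@" in identifier


def looks_like_email(identifier):
    """The stricter approach using the regex from the text."""
    return EMAIL_LIKE.match(identifier.strip()) is not None


for candidate in [
    "mart@example.com",
    "http://mart@example.com/",
    "http://www.flickr.com/photos/12345678@N00/",  # illustrative URL with an at sign
]:
    print(candidate, naive_check(candidate), looks_like_email(candidate))
```

The naive check accepts all three, while the regex accepts only the first.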

I've had on my to-do list for a while now some research to see what existing implementations do when presented with URLs like that. I'm sure it's suboptimal whatever it is, but we need to consider how existing implementations will behave if we change the rules now. In an ideal world, we'd find that current implementations all behave basically the same and we can document that as opt-in fallback behavior when "proper" email address support is not available at a particular RP.

The representative hCard for a Page

15th Nov 2008

In my previous entry I mentioned that I couldn't find a way to go from an XFN-discovered URL to an hCard describing the corresponding person. It turns out that David's response was correct: there is a way to do this already. The catch is that rather than linking from the page to the hCard, it instead links from the hCard to the page. The fact that I already had half a solution in my mind when I was searching for existing practice prevented me from finding this one. Mea Culpa, I guess.

This is, however, a good example of what I consider a failure in the design of some Microformats. For me, the big advantage of Microformats over other data publishing mechanisms is that I just need to add a few adornments to data I'm already publishing, so I can add Microformats support quickly with no visual or structural impact on my page. This approach for marking "representative hCards" fails to deliver on this promise: my page doesn't have a link to itself. Why would it? You're already there!

This does draw my attention to something I hadn't noticed before: the hCard on my site doesn't contain my URL, so if you export it using existing tools you won't get the URL field populated. I'm loath to put a self-referential link on my page, since that'd be confusing. It feels like hCard parsers should be able to infer that my URL is the current page URL having determined that this is the representative hCard... but of course, as currently specified, they can't determine whether it's the representative hCard unless I publish that self-referential link.

I've posted the proposal from my previous entry on the Microformats mailing list to see what the Microformats community thinks of it. I think it complements nicely the approach they're already recommending, allowing some additional possibilities that it can't support alone. It also doesn't invent anything new: the link element and rel="me" are being used to mean what their respective specifications say they should mean, and the hCard documentation already says that if a fragment is present in the URL the parser must look only within the identified element.

When hCard meets XFN

15th Nov 2008

hCard is a microformat for encoding the contact information for a person, company, organisation or place. XFN is a microformat that uses URLs to represent people and links between those URLs to represent relationships.

If you've got a URL representing a person, how do you publish the contact information for that person? An obvious answer is to include an hCard in the page returned at that URL. However, as far as I can tell there's no way presently to mark up the fact that a particular hCard on a page at a particular URL is the hCard of the person the URL represents, which I find to be an irritating disconnect.

Since I was unable to find any prior art for this, I'll make a straw-man proposal. On my main website I've had for some time my basic contact information marked up with hCard. To support discovery of my hCard, I added id="contactinfo" to the element that holds the vcard class and then added the following to <head>:

<link rel="me" href="#contactinfo">

My intent here is to say that the element with the id "contactinfo", which in this case is an hCard, represents the same person as the page as a whole. This technique could be used for any other person-related microformat too, such as perhaps an hAtom feed of a person's activity stream (though rel="alternate" might make more sense in that case).
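On the consuming side, a parser following this straw-man proposal might work roughly like the Python sketch below. It uses only the standard library's html.parser and assumes the markup described above: a link element with rel="me" pointing at a fragment, and an element with that id carrying the vcard class. The class and function names are invented for illustration, and a real hCard parser would go on to extract the properties inside the matched element.

```python
from html.parser import HTMLParser


class RepresentativeHCardFinder(HTMLParser):
    """Collect rel="me" fragment links and the ids of hCard root elements."""

    def __init__(self):
        super().__init__()
        self.me_fragments = set()  # ids referenced by <link rel="me" href="#...">
        self.vcard_ids = set()     # ids of elements with class="vcard"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href") or ""
        if tag == "link" and "me" in rels and href.startswith("#"):
            self.me_fragments.add(href[1:])
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes and attrs.get("id"):
            self.vcard_ids.add(attrs["id"])


def representative_hcard_ids(html):
    """Return ids of hCards that the page marks as representing its owner."""
    finder = RepresentativeHCardFinder()
    finder.feed(html)
    finder.close()
    return sorted(finder.me_fragments & finder.vcard_ids)


page = """
<html><head><link rel="me" href="#contactinfo"></head>
<body>
  <div id="contactinfo" class="vcard"><span class="fn">Martin Atkins</span></div>
  <div id="someone-else" class="vcard"><span class="fn">A Friend</span></div>
</body></html>
"""
print(representative_hcard_ids(page))
```

Only the hCard whose id is named by the rel="me" link is reported, so other hCards on the same page (friends, colleagues) are not mistaken for the page owner.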

This seems like a nice, straightforward way of filling this missing link. If there's an existing practice I missed then please let me know, or else I'd love to hear feedback on this approach.