SDShare - A Protocol for the Syndication of Resource Descriptions

Version 1.0 Final Draft
Date: 2012-07-10
Author: Graham Moore (gra@brightstardb.com, @gra_moore)
Author: Lars Marius Garshol (larsga@garshol.priv.no, @larsga)

Abstract

SDShare is a protocol for the syndication of resource descriptions. It defines how a RESTful service can publish a series of feeds that list snapshots and changes to collections of resources. This protocol also defines how a client should process those feeds and the linked resource descriptions so that a local store can be kept in sync.

References

HTTP
RFC 2616, HTTP 1.1. June 1999.
Atom
RFC 4287, The Atom Syndication Format. December 2005.
RDF
Resource Description Framework (RDF): Concepts and Abstract Syntax, W3C Recommendation, February 2004.
NTriples
RDF Test Cases, W3C Recommendation, 10 February 2004.
RDF/XML
RDF/XML Syntax Specification (Revised), W3C Recommendation, 10 February 2004.
Atom Paging
RFC 5005: Feed Paging and Archiving, IETF Standard, September 2007.
RFC 2119
RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, IETF, March 1997.

Introduction

This specification describes how one RDF system can maintain a local copy of master data contained in another RDF system. The master data will change over time. This protocol defines how the master data server can publish the changes to the resources it manages and how a client can consume them to ensure it is in sync.

SDShare also specifies how a client should interpret and process these feeds in order to consume the resource descriptions. A client that wishes to maintain a local resource collection in sync with one held on the server first fetches the most recent snapshot for the required collection. It then subscribes to the update feed for that collection. This feed lists the resources whose state has changed in the underlying resource collection. A client can retrieve the RDF for an updated resource and update its local collection.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Conceptual Model

A server is a node which exposes feeds so that other nodes can observe and retrieve the state of the data being exposed by the server.

A client is a node which subscribes to one or more feeds and implements the update semantics defined in this protocol. It is possible for the same node to be both a server and a client at the same time.

An SDShare server exposes a set of collections. A collection is a coherent data set, defined by the server in any way it sees fit. When exposing an RDF data store via SDShare, a natural choice is to let each graph be a collection, but any set of collections is allowed.

A snapshot is a complete representation of a collection as it existed at some point in time.

A resource description is a complete representation of a single resource that exists in a collection.

A fragments feed exposes a list of the resources that have been modified in the collection.

Protocol

Server Role

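An SDShare server SHOULD publish the following hierarchy of Atom feeds to expose one or more collection(s) of resource descriptions:

  • An overview feed, listing the collections exposed by the server.
  • For each collection, a collection feed linking to that collection's snapshots feed and fragments feed.
  • A snapshots feed, listing the snapshots of the collection.
  • A fragments feed, listing the resources that have changed in the collection.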

An SDShare server SHOULD also provide resource descriptions and snapshots when requested. Note that these MAY be provided from a different node.

Overview Feed

The overview feed lists all collections exposed by the server. It MUST contain exactly one entry per collection. Each entry MUST contain a link element with a link relation of http://www.sdshare.org/2012/core/collectionfeed, type set to application/atom+xml, and an href whose value links to the collection feed for that collection.

To conform with the Atom specification the link must also be duplicated with link relation alternate.

Example Request URL:
  http://example.org/collections

Example Response:
   <feed xmlns="http://www.w3.org/2005/Atom">
   <title>Collection managed by example.org</title>
   <link href="http://example.org/collections"/>
   <updated>2008-12-13T18:30:02Z</updated>
   <author>
     <name>SDShare Server</name>
   </author>
   <id>http://example.org/collections</id>
   <!-- collection entry -->
   <entry>
     <title>A Resource Collection</title>

     <!-- a link to the collection feed -->
     <link rel="http://www.sdshare.org/2012/core/collectionfeed" type="application/atom+xml"
           href="http://example.org/collections/collection-one"/> 
     <link rel="alternate" type="application/atom+xml"
           href="http://example.org/collections/collection-one"/> 
     <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2008-12-13T18:30:02Z</updated>
     <summary>A set of RDF resources describing a taxonomy.</summary>
   </entry>
   <!-- an entry follows for each collection being exposed.
   ...
   -->
 </feed>

Collection Feed

A collection feed lists exactly two entries, one entry linking to a snapshots feed and one to a fragments feed.

The Atom feed MUST contain exactly two entries, one that links to the snapshots feed for the collection (the link relation is http://www.sdshare.org/2012/core/snapshotsfeed), and another that links to the fragments feed of the collection (the link relation is http://www.sdshare.org/2012/core/fragmentsfeed). To conform with the Atom specification both links must be duplicated with link relation alternate.

Example Request URL:
  http://example.org/collections/collection-one

Example Response:
  <feed xmlns="http://www.w3.org/2005/Atom">
  <title>A Resource Collection</title>
  <updated>2008-09-26T11:13:40-01:00</updated>
  <id>http://example.org/collections/collection-one</id>
  <author>
    <name>SDShare Server</name>
  </author>

  <entry>
    <title>Collection Updates</title>
    <id>http://example.org/collections/collection-one/fragments</id>
    <updated>2008-09-11T17:58:39-01:00</updated>
    <author>
      <name>SDShare Server</name>
    </author>
    <link href="http://example.org/collections/collection-one/fragments"
             rel="alternate" 
             type="application/atom+xml"/>
    <link href="http://example.org/collections/collection-one/fragments" 
             rel="http://www.sdshare.org/2012/core/fragmentsfeed" 
             type="application/atom+xml"/>
  </entry>
  <entry>
    <title>Collection Snapshots</title>
    <id>http://example.org/collections/collection-one/snapshots</id>
    <updated>2008-09-11T17:58:39-01:00</updated>
    <author>
      <name>SDShare Server</name>
    </author>
    <link href="http://example.org/collections/collection-one/snapshots"
             rel="alternate"
             type="application/atom+xml"/>
    <link href="http://example.org/collections/collection-one/snapshots"
             rel="http://www.sdshare.org/2012/core/snapshotsfeed" 
             type="application/atom+xml"/>
  </entry>
</feed>
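
The feed hierarchy above can be walked mechanically. The following Python sketch is illustrative only and is not part of the protocol. It assumes a caller-supplied fetch function that returns the body of a feed as a string, and the names links_with_rel and discover are hypothetical. It follows the collectionfeed links in the overview feed and collects the snapshotsfeed and fragmentsfeed links of each collection feed.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"
CORE = "http://www.sdshare.org/2012/core/"


def links_with_rel(element: ET.Element, rel: str) -> List[str]:
    """Return the href of every direct atom:link child carrying the given rel."""
    return [link.get("href") for link in element.findall(ATOM + "link")
            if link.get("rel") == rel]


def discover(overview_url: str,
             fetch: Callable[[str], str]) -> Dict[str, Dict[str, str]]:
    """Map each collection feed URL to its snapshots and fragments feed URLs.

    `fetch` is an assumed helper: it takes a URL and returns the feed XML.
    """
    collections: Dict[str, Dict[str, str]] = {}
    overview = ET.fromstring(fetch(overview_url))
    for entry in overview.findall(ATOM + "entry"):
        for collection_url in links_with_rel(entry, CORE + "collectionfeed"):
            feed = ET.fromstring(fetch(collection_url))
            found: Dict[str, str] = {}
            for sub_entry in feed.findall(ATOM + "entry"):
                for name in ("snapshotsfeed", "fragmentsfeed"):
                    for href in links_with_rel(sub_entry, CORE + name):
                        found[name] = href
            collections[collection_url] = found
    return collections
```

Matching on the SDShare link relations rather than on the alternate links keeps the sketch independent of how a server orders the two entries.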

Snapshots Feed

The snapshots feed lists all the snapshots of a collection that are exposed by a server. Each entry in the feed represents a single snapshot. The entry MUST contain at least one link element with a link relation of http://www.sdshare.org/2012/core/snapshot and a type attribute specifying the format of the snapshot. It must also have an href attribute that links to the data for the snapshot. The entry's updated element MUST reflect the time when the snapshot was taken. To conform with the Atom specification the link MUST also be duplicated with link relation alternate. Every link referring to a snapshot MUST have a type attribute.

In cases where the snapshot is very large the server MAY expose an additional link to a chunked snapshot feed, with link relation http://www.sdshare.org/2012/core/chunked-snapshot and type application/atom+xml. The chunked snapshot feed contains a list of Atom entries, each of which links to part of the data.

Example Request URL:
http://example.org/collections/collection-one/snapshots

Example Response:
 <feed xmlns="http://www.w3.org/2005/Atom">
   <title>The Snapshots of the Resource Collection</title>
   <subtitle>A list of all snapshots of this collection</subtitle>
   <author>
     <name>SDShare Server</name>
   </author>
   <updated>2008-07-17T12:15:07.020071Z</updated>
   <id>http://example.org/collections/collection-one/snapshots</id>

   <entry>
     <title>Snapshot 2008-07-17</title>
     <updated>2008-07-17T14:04:42.205299Z</updated>
     <!-- a link to the snapshot RDF -->
     <link rel="alternate" 
              type="application/rdf+xml"
              href="http://example.org/collections/collection-one/snapshots/0001"/>
     <link rel="http://www.sdshare.org/2012/core/snapshot"
              type="application/rdf+xml"
              href="http://example.org/collections/collection-one/snapshots/0001"/>

     <!-- optional link to the chunked snapshots feed -->
     <link rel="http://www.sdshare.org/2012/core/chunked-snapshot"
              type="application/atom+xml"
              href="http://example.org/collections/collection-one/snapshots/0001/chunks"/>

     <id>urn:uuid:60a76c80-d300-11d9-b93C-0003939e0af6</id>
   </entry>
   <!-- an entry follows for each RDF snapshot being exposed.
   ...
   -->
 </feed>    

Chunked Snapshot Feed

The chunked snapshot feed contains links to all the chunks that comprise the snapshot. It is up to the server to decide the amount of data in each chunk.

Each entry in the feed represents a subset of the snapshot. The entry MUST contain at least one link element with a link relation of http://www.sdshare.org/2012/core/snapshot-component and a type attribute specifying the format of the snapshot. It must also have an href attribute that links to the data. To conform with the Atom specification the link MUST be duplicated with link relation alternate. Every link referring to a snapshot chunk MUST have a type attribute.

Example Request URL:
http://example.org/collections/collection-one/snapshots/0001/chunks

Example Response:
 <feed xmlns="http://www.w3.org/2005/Atom">
   <title>The Chunked Snapshots of the Resource Collection</title>
   <subtitle>A list of all chunks comprising the snapshots of this collection</subtitle>
   <author>
     <name>SDShare Server</name>
   </author>
   <updated>2008-07-17T12:15:07.020071Z</updated>
   <id>http://example.org/collections/collection-one/snapshots/0001/chunks</id>

   <entry>
     <title>Snapshot Chunks 2008-07-17</title>
     <updated>2008-07-17T14:04:42.205299Z</updated>
     <!-- a link to the snapshot component RDF -->
     <link rel="alternate" 
           type="application/rdf+xml"
           href="http://example.org/collections/collection-one/snapshots/0001/part-1"/>
     <link rel="http://www.sdshare.org/2012/core/snapshot-component"
           type="application/rdf+xml"
           href="http://example.org/collections/collection-one/snapshots/0001/part-1"/>

     <id>urn:uuid:60a76c80-d300-11d9-b93C-0003939e0af7</id>
   </entry>
   <!-- an entry follows for each snapshot chunk.
   ...
   -->
 </feed>    

Fragments Feed

The fragments feed is an Atom feed listing resources that have changed in a given time period. The order of the entries is undefined.

The Atom content contains an entry for each updated, deleted, or created resource. Each entry contains one or more links to the fragment, and the updated element contains the time at which the resource was updated. Links to the fragment must use the http://www.sdshare.org/2012/core/fragment link relation, and can be in different formats identified with the type attribute, allowing clients to choose their preferred format. To conform with the Atom specification the link MUST be duplicated with link relation alternate. Every link referring to a fragment MUST have a type attribute.

This protocol introduces one new Atom extension element called resource in the namespace http://www.sdshare.org/2012/core/. The resource element indicates to a client which resource is being updated from all those present in the fragment. This element MUST occur exactly once as a child element of each entry.

The fragments feed service MUST accept an optional request parameter called since. This parameter is used to specify that the client only wishes to see entries for fragments produced, modified or deleted after the time given in the parameter. The effect of this parameter is to remove from the response body entry elements with timestamps in the updated element older than the given time. The value of the parameter is a datetime value matching the format specified in section 3.3 of the Atom specification.

Note that links to the service in the collection feed will not include this parameter. Clients must therefore add it to the URL themselves should they wish to use it.

As the number of changes made to the collection grows, the number of entries in the fragments feed can become very large. To avoid returning excessively long responses, the server MAY page the fragments feed, using the conventions defined in RFC 5005. If the feed is paged, each page except the last MUST contain one link element with the next relation type. Other link types MAY also be provided.

Example Request URL (with since):
  http://example.org/collections/collection-one/fragments?since=2011-03-21T14:49:23Z 

Example response body:
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.sdshare.org/2012/core/">
  <title>Fragments feed from the collection</title>
  <author>
    <name>SDShare Server</name>
  </author>
  <updated>2008-07-17T15:47:17.062211Z</updated>
  <id>urn:uuid:28C5DBD8-652A-4617-8C4A-C0FFC49B4475</id>

  <entry>
    <!-- Best practice: a resource's RDFS Label or the resource URI -->
    <title>Government Spending</title>
    <!-- the published date and time of the fragment -->
    <updated>2008-07-17T15:55:21.971145Z</updated>

    <!-- the id value is some unique value -->
    <id>urn:uuid:69CD5264-DB78-49c1-A7E4-04EECFA0AA85</id>
    <link rel="http://www.sdshare.org/2012/core/fragment"
          type="application/rdf+xml"
          href="http://example.org/collections/collection-one/fragments?id=2321"/>
    <link rel="alternate" type="application/rdf+xml"
          href="http://example.org/collections/collection-one/fragments?id=2321"/>
         
    <!-- the resource indicates which resource this representation is for -->
    <sdshare:resource>http://example.org/concepts/governmentspending</sdshare:resource>
  </entry>
  <!-- an entry follows for each changed resource being exposed
  ...
  -->
</feed>
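
A client-side reading of the fragments feed might look like the following Python sketch. It is a minimal illustration under stated assumptions, not a normative part of the protocol: the function names with_since and read_fragment_entries are hypothetical, and the feed body is assumed to be available as a string. The first function adds the since parameter to a fragments feed URL, and the second extracts the resource, the updated time, and the fragment links (keyed by type) from each entry.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.sdshare.org/2012/core/}"
FRAGMENT_REL = "http://www.sdshare.org/2012/core/fragment"


def with_since(fragments_url: str, since: Optional[str]) -> str:
    """Return the fragments feed URL with the since parameter set (or removed)."""
    parts = urlsplit(fragments_url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "since"]
    if since is not None:
        query.append(("since", since))
    # Colons are left readable; a "+" in a timezone offset is still escaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


def read_fragment_entries(feed_xml: str) -> List[Dict[str, object]]:
    """Extract resource, updated time and typed fragment links per entry."""
    entries = []
    for entry in ET.fromstring(feed_xml).findall(ATOM + "entry"):
        links = {link.get("type"): link.get("href")
                 for link in entry.findall(ATOM + "link")
                 if link.get("rel") == FRAGMENT_REL}
        entries.append({
            "resource": entry.findtext(SDSHARE + "resource"),
            "updated": entry.findtext(ATOM + "updated"),
            "links": links,
        })
    return entries
```

Keying the links by their type attribute lets the client pick its preferred format with a single lookup.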

For RDF data, the set of statements S making up the fragment for a resource R in a collection corresponding to a graph G is produced by the following algorithm:

  • Include in S all statements in G in which R is the subject.
  • For all statements in S where the object is a blank node B, find all statements with B as the subject, and add those statements to S. Repeat recursively with all B's statements.
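
The two steps above can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the graph is modelled as a set of (subject, predicate, object) string triples, blank nodes are assumed to be written in the N-Triples style with a leading "_:", and the name fragment_for is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Assumed convention: blank node labels start with '_:' as in N-Triples."""
    return term.startswith("_:")


def fragment_for(resource: str, graph: Set[Triple]) -> Set[Triple]:
    """Compute the fragment S for resource R in graph G."""
    # Step 1: every statement in G with R as the subject.
    fragment = {triple for triple in graph if triple[0] == resource}
    # Step 2: follow blank node objects, recursively, adding their statements.
    pending = [obj for (_, _, obj) in fragment if is_blank(obj)]
    visited = set()
    while pending:
        node = pending.pop()
        if node in visited:
            continue  # guards against cycles between blank nodes
        visited.add(node)
        for triple in graph:
            if triple[0] == node:
                fragment.add(triple)
                if is_blank(triple[2]):
                    pending.append(triple[2])
    return fragment
```

Statements about blank nodes that are not reachable from R are left out, as are statements in which R appears only as the object.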

Client Role

This section describes the most common operations performed by SDShare clients, and some minimal requirements for client behaviour. This specification does not restrict what clients may do with the data they retrieve.

The client generally updates a local collection with data from a remote collection using the SDShare feeds. A local collection is typically considered to be an RDF graph but can be some other local data store such as a relational database or archiving system. The client often starts off with no local data, and would then typically begin by retrieving a snapshot and loading that into the local collection. Thereafter, the client waits for resources to appear on the fragments feed. When an entry appears on the fragments feed, the local representation of that resource is deleted and the new representation offered by the server is loaded in its place.

While processing, the client MUST keep track of the time of the last update it has seen, called t. Initially, t is not set.

Snapshot Processing

To process a snapshot, the client should first delete the local copy of the collection (if it has one). The snapshot can then be downloaded from the server, and written into the local collection.

If the server exposes a chunked snapshot feed, the client should use it. For each entry in the feed the client should use the appropriate link to retrieve the snapshot chunk and load it into the local collection.

Once a snapshot has been processed successfully, t MUST be set to the time in the updated element of the snapshot entry that was processed.
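
As an illustration, the Python sketch below selects the most recent entry from a snapshots feed and reports the value to which t would be set after that snapshot has been loaded. It is not part of the protocol. The names parse_atom_date and latest_snapshot are hypothetical, and the date parsing assumes timestamps whose fractional seconds, if present, have three or six digits.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, Optional

ATOM = "{http://www.w3.org/2005/Atom}"
SNAPSHOT_REL = "http://www.sdshare.org/2012/core/snapshot"
CHUNKED_REL = "http://www.sdshare.org/2012/core/chunked-snapshot"


def parse_atom_date(text: str) -> datetime:
    """Parse an Atom date construct; 'Z' is rewritten for older Pythons."""
    text = text.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def latest_snapshot(snapshots_feed_xml: str) -> Optional[Dict[str, object]]:
    """Describe the newest snapshot entry, or return None for an empty feed."""
    entries = ET.fromstring(snapshots_feed_xml).findall(ATOM + "entry")
    if not entries:
        return None
    newest = max(entries,
                 key=lambda e: parse_atom_date(e.findtext(ATOM + "updated")))
    links = newest.findall(ATOM + "link")
    return {
        # Value for t once this snapshot has been processed successfully.
        "t": newest.findtext(ATOM + "updated"),
        # Full snapshot links keyed by their type attribute.
        "snapshot": {link.get("type"): link.get("href")
                     for link in links if link.get("rel") == SNAPSHOT_REL},
        # Chunked snapshot feed URL, if the server offers one.
        "chunked": next((link.get("href") for link in links
                         if link.get("rel") == CHUNKED_REL), None),
    }
```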

The chunked snapshot feed and the complete snapshot data MAY be produced asynchronously by the server. If this is the case, the server may return an HTTP 202 response code, in which case the client MUST wait, and retry the same request until it receives an HTTP 200 response.
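
The wait-and-retry behaviour can be sketched in Python as below. This is one possible reading, given as an assumption-laden illustration: the fetch callable (returning a status code and a body), the delay between attempts, and the upper bound on attempts are all choices made for the sketch and are not prescribed by this specification.

```python
import time
from typing import Callable, Tuple


def fetch_when_ready(url: str,
                     fetch: Callable[[str], Tuple[int, bytes]],
                     delay_seconds: float = 5.0,
                     max_attempts: int = 10,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Repeat the same request while the server answers 202; return the 200 body.

    `fetch` is an assumed helper that performs the HTTP GET and returns
    (status code, response body). The attempt limit is a safety bound
    added by this sketch.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")
        if attempt + 1 < max_attempts:
            sleep(delay_seconds)  # wait before retrying the same request
    raise TimeoutError(f"{url} was still not ready after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the function testable without real delays.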

Fragment Processing

A client wishing to update its local collection as new changes occur on the server should process the fragments feed for the appropriate collection. For each fragment, the client should delete its local copy of the fragment and then replace it with the new fragment from the server. Section 5.2.3 describes a simple update algorithm for RDF data stores.

In the case where the fragments feed is paged, as described in RFC 5005, the client should follow the next links to find the rest of the fragments.

Once all fragments have been successfully processed, the client MUST set t to the highest (that is, latest) value observed in the updated element of any fragment entry. If t has a value when fragment processing begins, the client MUST pass it in the since parameter described in the Fragments Feed section.
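
A fragment processing pass, including paging and the bookkeeping for t, might be sketched in Python as follows. The sketch is illustrative only. It assumes that the starting URL already carries the since parameter when t is set, that next links are absolute URLs, and that fetch_feed and handle_entry are helpers supplied by the caller; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Callable, Optional

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_date(text: str) -> datetime:
    """Parse an Atom date construct; 'Z' is rewritten for older Pythons."""
    text = text.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def process_fragments_feed(first_page_url: str,
                           t: Optional[str],
                           fetch_feed: Callable[[str], str],
                           handle_entry: Callable[[ET.Element], None]) -> Optional[str]:
    """Process every page of the feed and return the new value of t.

    If handle_entry raises, the exception propagates and the caller keeps
    its old t, so t only advances after a fully successful pass.
    """
    latest = t
    url: Optional[str] = first_page_url
    visited = set()
    while url and url not in visited:  # the visited set guards against link loops
        visited.add(url)
        root = ET.fromstring(fetch_feed(url))
        for entry in root.findall(ATOM + "entry"):
            handle_entry(entry)
            updated = entry.findtext(ATOM + "updated")
            if latest is None or parse_atom_date(updated) > parse_atom_date(latest):
                latest = updated
        # Only feed-level links are inspected for the RFC 5005 next relation.
        url = next((link.get("href") for link in root.findall(ATOM + "link")
                    if link.get("rel") == "next"), None)
    return latest
```

Timestamps are compared as parsed dates rather than as strings, because two Atom dates with different timezone offsets do not sort correctly as text.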

A Fragment Update Algorithm for RDF Stores

Each Atom entry contains a resource element, which identifies which resource is to be updated. We call this resource R.

The Atom entry provides a set of links with link relation http://www.sdshare.org/2012/core/fragment referring to different representations of the resource. It is up to the client to choose which link to follow, based on the type attribute.

The link is used to retrieve a set of RDF statements that are the new representation of the resource R. We call this set of statements S. These statements are to be written to a graph G, where G is determined by the client.

  • Follow the fragment generation algorithm in section 5.1.5 for R in G, and call the resulting set of statements S'.
  • Delete all statements in S' from G.
  • Insert all statements in S into G.
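
The three steps above can be sketched in Python as follows. This is an illustration rather than a definitive implementation: the graph G is modelled as an in-memory set of (subject, predicate, object) triples, the name apply_fragment is hypothetical, and the fragment generation algorithm is passed in as a function so that the sketch does not prescribe how it is implemented.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def apply_fragment(graph: Set[Triple],
                   resource: str,
                   new_statements: Iterable[Triple],
                   generate_fragment: Callable[[str, Set[Triple]], Set[Triple]]) -> None:
    """Replace the local description of R in G with the statements S, in place."""
    # Step 1: compute S', the current local fragment for R in G.
    old_statements = generate_fragment(resource, graph)
    # Step 2: delete all statements in S' from G.
    graph.difference_update(old_statements)
    # Step 3: insert all statements in S into G.
    graph.update(new_statements)


def subject_only(resource: str, graph: Set[Triple]) -> Set[Triple]:
    """Simplified generator, adequate only for graphs without blank nodes."""
    return {triple for triple in graph if triple[0] == resource}


local = {("http://ex/a", "http://ex/name", '"Old"'),
         ("http://ex/b", "http://ex/name", '"B"')}
apply_fragment(local, "http://ex/a",
               {("http://ex/a", "http://ex/name", '"New"')}, subject_only)
```

Under this sketch, an empty S simply removes the local description of R, which is one way a deleted resource could be reflected in the local store.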

Recommended Resource Representation Formats

Any formats may be used for resource representations, but the following are recommended: