Tuesday, November 18, 2008

RESTful Query URLs (Cont.)

As a follow up on my previous RESTful Query URLs post, I'm going to be looking at implementing RESTful querying to the JSON repository I developed. The repository stores JSON documents and is searchable using a 'query-by-example' approach, e.g. you provide a document with the key/value combinations you're interested in and the repository returns all documents that match.

Since a query involves a template document, the simplest approach would be to provide a /search URI and let the client POST their query to that. My main hangup with this approach is the lack of linkability, e.g. I can't email/im/twitter a URI to the results of a query.

Doug talks about this in his recent REST via URI's and Body Representation blog post. In it, he suggests an approach where the client would POST the query/payload and recieve a 201 and a GETable URI with the results. This has some interesting implications (as Doug points out). How long does the /resource/request/[id] stay around? Presumably it could stick around indefinitely or until whatever is being queried changes. Do two clients POSTing the same body payload get the same results id. If you're going to support this, then you'll either have to query to see if the request has already been assigned an id or you're going to have to assign the ids based on the contents of the request, perhaps a SHA hash of the body payload. In either case, you're going to have to store the original request along with the id you've assigned to it.

I think this is why URI-encoding appeals to me: I don't have to keep any extra state around because I can re-create the request from the URI. This falls down when you need more expressive request capabilities than URI-encoding allows. I can also see an advantage in Doug's approach if the majority of your interactions are going to be workflow-based rather than single shot queries.

For the interface to the JSON repository, I chose to represent queries via the URI. I even went so far as to avoid using the query string, opting rather to put everything into the URI structure. The choice to do this was mainly one of exploration. I wanted to see if it offered any advantages (readability, cachability, simplicity) over using the query string.

The URIs take the form of: /collection[/term/value(s)]+ where term is either a direct property/key in the desired JSON document or a derived property (such as 'fulltext' which looks at the full text index of the document). The value section can either be a single value or a comma separated list of values. Some annotated examples include:
  • /and2 - return all documents in the and2 collection
  • /and2/1 - return the document with id = 1 in the and2 collection (special case)
  • /and2/type/image.SplitCore - return all documents with a type property of 'image.SplitCore'
  • /and2/fulltext/calcite,calcareous,carbonate - return any documents that contain 'calcite' OR 'calcareous' OR 'carbonate'
  • /and2/depth/100,200 - return any documents between the depth 100 and the depth 200. This changes the semantics of the comma operator as it no longer means the OR as it did with the fulltext term. If you pass in only one depth, it only returns documents at exactly that depth. If you pass in more than two depths, then the additional depths are ignored.
Multiple terms can be chained together:
  • /and2/type/image.SplitCore/depth/100,200 - return any documents of type 'image.SplitCore' AND between depth 100 and 200.
  • /and2/fulltext/calcite/fulltext/carbonate - return any documents containing 'calcite' and containing 'carbonate'
And I've added some special query operators:
  • /and2/type/!psicat.* - return any documents not of any psicat type, e.g. this would exclude documents with type properties of 'psicat.Interval', 'psicat.Unit', and 'psicat.Occurrence'.
What I like about this approach is that I can build a URI to just about any subset of documents in the collection (though it does require some prior knowledge of the structure). There are a few warts though. For one, it requires that URI components occur in pairs, so you can't peel back the URI like an onion: e.g. /and2/type/image.SplitCore is valid but /and2/type doesn't make sense. There is also an issue of canonicality, e.g. /and2/type/image.SplitCore/depth/100,200 will always return the same results as /and2/depth/100,200/type/image.SplitCore but they appear as separate resources to the caching layer. There's also aesthetics. I don't yet know how I feel about commas in the URL; they look weird to me. The scheme also doesn't pluralize the terms when multiple values are sent, e.g. /and2/types/image.SplitCore,image.WholeCore.

I'd love to hear any feedback on what you think of this approach.

No comments: