Friday, October 31, 2008

RESTful Query URLs

The last couple of days I've been working on writing a RESTful JSON document database. While a number of these already exist (CouchDB, FeatherDB, DovetailDB, Persevere, JSONStore, etc.), I decided to write my own because I wanted a bit more control over the URL scheme used by the REST interface, and I needed the ability to tweak the search functionality to achieve decent performance on some common but complicated queries. All in all it was an interesting diversion. The actual server clocked in at about 1000 SLOC, with much of that boilerplate because I wrote it in Java/JDBC rather than Groovy/GroovySQL.

The most interesting problem came in designing the query scheme for the REST interface. There seem to be a couple of different ways to implement it, with no real consensus as to which is the "right" way. As with most things, I suspect it depends on how you've implemented other pieces of the architecture and even on personal preference. Below I describe three approaches I considered. The nice thing with REST is there's nothing stopping you from implementing all of these approaches in your interface.

NB: I'm no REST expert, so the information below reflects my own observations rather than any best practices. I'd love for anyone who knows better to chime in on the discussion.

POST query parameters/document
In this approach, you provide a search endpoint, say something unoriginal like '/search', and queries are POSTed to that URI. The query is either a set of form-encoded key-value pairs or a search document using a schema shared between the client and server.

This approach seems closer to RPC than REST to me, but may be the best approach if your search functionality requires a more complex exchange of information than simple key-value pairs allow. The obvious downside to this approach is that there is no way to bookmark a query or email/IM a query to someone else. This approach also can't take advantage of the caching built into the HTTP spec.
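To make the two request bodies concrete, here is a minimal sketch that builds (but does not send) both kinds of POST. The field names, the `filter`/`limit` layout of the search document, and the plain request objects are all assumptions for illustration, not the schema the server actually used.

```javascript
// Hypothetical query: the field names are made up for illustration.
const query = { author: "Reed", title: "REST" };

// Option 1: form-encoded key-value pairs in the POST body.
const formRequest = {
  method: "POST",
  path: "/search",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: new URLSearchParams(query).toString(),
};

// Option 2: a search document. The nested layout (filter, limit) is an
// assumed schema that client and server would have to agree on.
const documentRequest = {
  method: "POST",
  path: "/search",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ filter: query, limit: 10 }),
};

console.log(formRequest.body);
console.log(documentRequest.body);
```

Either way the query lives in the request body rather than the URI, which is why it cannot be bookmarked or shared as a link.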

GET query string
Similar to above, you expose a URI endpoint, possibly something like /search, and queries are sent to that endpoint with the parameters encoded in the query string of the URL, e.g. http://www.google.com/search?q=REST+query+string

This approach improves on the bookmarkability of searches, since all of the parameters are in the URL. However, the use of the query string may interfere with caching as described in Section 13.9 of the HTTP spec. Overall, I think there is nothing inherently un-RESTful about this approach, especially if you provide more resource-oriented URIs than /search, e.g. /documents?author=Reed. In my head, I interpret the latter as "give me all of the document resources but filter on the author Reed." Removing the query string will still give you a resource (or a collection of resources in this case).
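The "collection plus filter" reading of /documents?author=Reed can be sketched as follows. The in-memory `documents` array and the `queryDocuments` helper are hypothetical stand-ins for the real document store, and exact string matching is an assumed filter rule.

```javascript
// Hypothetical in-memory collection standing in for the document store.
const documents = [
  { id: 1, author: "Reed", title: "RESTful Query URLs" },
  { id: 2, author: "Reed", title: "Core Gallery" },
  { id: 3, author: "Smith", title: "Something Else" },
];

// Return the documents matching every key-value pair in the query string.
// With no query string, the whole collection is returned.
function queryDocuments(requestUri) {
  // The base URL is only needed so that a relative URI can be parsed.
  const url = new URL(requestUri, "http://localhost");
  const filters = [...url.searchParams.entries()];
  return documents.filter((doc) =>
    filters.every(([key, value]) => key in doc && String(doc[key]) === value)
  );
}

const byReed = queryDocuments("/documents?author=Reed");
const everything = queryDocuments("/documents");
console.log(byReed.length, everything.length);
```

Dropping the query string simply removes the filter, so /documents still names a resource: the unfiltered collection.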

Where this approach falls down is when you start trying to represent hierarchical or taxonomic queries with the query string, e.g. http://lifeforms.org?k=kingdom&p=phylum&c=class&o=order&f=family&g=genus&s=species as described on the RestWiki.

Encoding query parameters into the URI structure
In this approach the query parameters are encoded directly into the URI structure, e.g. /documents/authors/Reed, rather than using the query string. Another example is described at Stack Overflow.

This approach solves both the bookmarkability and the caching issues of the previous approaches, but can introduce some ambiguity, especially if your resources aren't strictly hierarchical in nature. The biggest stumbling block for me was this: looking at the URI /documents/authors/Reed, it's not immediately clear what will be returned. For example, if I sent you the URI /documents you might infer that you would get a list or the contents of some documents. From the URI /documents?author=Reed, you might infer that the resource(s) returned would be documents authored by Reed. So what might you expect to get from the URI /documents/authors/Reed? Information about the author Reed or all documents authored by Reed?
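One way to see the ambiguity is to write down a parsing rule and notice that it is only a convention. The sketch below assumes the path is a collection name followed by field/value pairs, which commits to the "documents authored by Reed" reading; `parsePathQuery` and `toQueryStringForm` are hypothetical helpers, and a server could just as reasonably treat /documents/authors/Reed as a resource describing the author.

```javascript
// Split a path such as /documents/authors/Reed into a collection name
// followed by (field, value) pairs. This pairing rule is an assumption.
function parsePathQuery(path) {
  const segments = path
    .split("/")
    .filter((segment) => segment.length > 0)
    .map(decodeURIComponent);
  const [collection, ...rest] = segments;
  if (collection === undefined || rest.length % 2 !== 0) {
    throw new Error("expected /collection followed by field/value pairs: " + path);
  }
  const filters = {};
  for (let i = 0; i < rest.length; i += 2) {
    filters[rest[i]] = rest[i + 1];
  }
  return { collection, filters };
}

// Rewrite the path form as the equivalent query string form.
function toQueryStringForm(path) {
  const { collection, filters } = parsePathQuery(path);
  const query = new URLSearchParams(filters).toString();
  return query ? `/${collection}?${query}` : `/${collection}`;
}

console.log(toQueryStringForm("/documents/authors/Reed"));
```

Nothing in the URI itself tells a client that this pairing rule applies, which is the stumbling block described above.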

How important is this? I guess it's really up to you. A machine likely infers about as much from /documents/authors/Reed as it does from /documents?author=Reed.

Thursday, October 09, 2008

Core Gallery

It seems like every couple of months I end up with a project that involves a fair amount of Javascript. Back in March it was working with Simile Timeline to visualize depth-based data. This time around, I wanted to create a lightweight way to visualize our drill core imagery. We already have full-featured visualization tools that scientists use, so I was looking to create something simple that would engage non-geologists.

The result is the Core Gallery. It shows an animated whole core image next to a split core image. Since the images are too large to display on the screen, there's a slider that lets you see different parts of the core. The page also displays some additional information about the core.

I'm really happy with how it turned out. The page is 100% HTML, Javascript, and CSS. No Flash and no Java. For the Javascript, I'm using JQuery for no reason other than I wanted to see how it stacked up to other JS libraries I've used. It was perfect for this project and a treat to work with. Below I'm going to sketch out how various parts of the page are built.

Core Slider
The core slider is the most complicated part of the page. It uses the JQuery UI/Slider component. I used this screencast to help me acquaint myself with the slider. To achieve the highlighted core effect as the slider handle moves, I used two thumbnails of the core. One thumbnail is regular and one is washed out. I set the washed-out thumbnail as a CSS background image on the slider element. I set the regular thumbnail as a CSS background image on the slider handle. The handle has a fixed size based on the height of the thumbnail vs. the height of the real core images, so only part of the thumbnail is shown. From the slider's slide() callback, I simply update the CSS background-position property on the handle to ensure that the handle's image is showing the same portion of the core as the underlying slider. I use this same technique to move the rotating whole core and split core images, taking the difference in image height between the thumbnail and the other core images into account.
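The arithmetic behind the background-position update can be sketched as a small pure function. This is an illustration under assumptions, not the page's actual code: the slider position is taken as a ratio from 0.0 (top of the core) to 1.0 (bottom), and the pixel heights in the example are made up.

```javascript
// Map a slider position (0.0 = top of core, 1.0 = bottom) to a CSS
// background-position value for an image of the given height that is
// shown through a window (handle or viewport) of the given height.
function backgroundPositionFor(ratio, imageHeight, windowHeight) {
  const clamped = Math.min(1, Math.max(0, ratio));
  // The image can scroll at most by the part that does not fit.
  const maxOffset = Math.max(0, imageHeight - windowHeight);
  const offset = Math.round(clamped * maxOffset);
  // A negative y offset shifts the background up, revealing lower parts.
  return `0px ${-offset}px`;
}

// Hypothetical sizes: a 400px thumbnail seen through a 40px handle, and
// an 8000px core image seen through an 800px viewport.
const handlePosition = backgroundPositionFor(0.5, 400, 40);
const corePosition = backgroundPositionFor(0.5, 8000, 800);
console.log(handlePosition, corePosition);
```

Because the same ratio drives both calls, the handle thumbnail and the full-size images stay in step even though their heights differ. In the page, each string would then be applied to the matching element, for example with JQuery's css() method.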

Animated Whole Core Image
The slider was the most complicated, but the animated whole core image was the most challenging. I wanted to show the image animated in faux 3D. I initially started with a Java applet using JOGL. The applet worked on my Mac but not on Windows or Linux, so I abandoned it. I then got the idea to employ the CSS Sprites technique. So I used a tool to render the 3D whole core image 90 times, each rotated by 4 degrees, and montaged them together. Once I had this, it was simply a matter of setting up a Javascript Timer interval to fire every 50ms and move the image right by a fixed amount each time. This simulates animation fairly effectively. I keep track of the current rotation and vertical offset in global variables so the core keeps rotating when you move the slider.
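The sprite stepping can be sketched as below. The 90 frames, 4 degree spacing, and 50ms interval come from the description above; the frame width, the left-to-right montage layout, and the sign of the horizontal shift are assumptions, as are the function names.

```javascript
const FRAME_COUNT = 90;       // from the post: 90 renders of the core
const DEGREES_PER_FRAME = 4;  // from the post: each render is 4 degrees apart
const FRAME_INTERVAL_MS = 50; // from the post: the timer fires every 50ms
const FRAME_WIDTH = 120;      // hypothetical width of one frame in pixels

// Wrap any frame counter into the range 0..89 and report the rotation
// it represents plus the horizontal shift that brings it into view.
function frameOffset(frame) {
  const index = ((frame % FRAME_COUNT) + FRAME_COUNT) % FRAME_COUNT;
  return { degrees: index * DEGREES_PER_FRAME, x: -index * FRAME_WIDTH };
}

// Combine the current rotation frame with the slider's vertical offset
// so the core keeps rotating while it is scrolled.
function spritePosition(frame, verticalOffset) {
  const { x } = frameOffset(frame);
  return `${x}px ${-verticalOffset}px`;
}

// In a browser, a timer would advance the frame and apply the result to
// the element's background-position, along these lines:
//   let frame = 0;
//   setInterval(() => { frame += 1; /* apply spritePosition(frame, y) */ },
//               FRAME_INTERVAL_MS);
console.log(spritePosition(1, 300));
```

At 50ms per frame, a full turn through all 90 frames takes 4.5 seconds. Keeping the frame counter and the vertical offset as separate inputs mirrors the two pieces of state the post describes tracking in global variables.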

Split Core Image
I use the same technique as on the slider handle to make the image track the slider's position.

Core Links
In the text description, it is possible to link to different parts of the core. This is a somewhat neat trick. To accomplish it, I wrap portions of the description text in span tags. Each span tag has an id attribute in the form of a ratio between 0.0 and 1.0. Using JQuery, I find these special span tags and add an onClick handler that updates the slider position based on the span's id attribute. So if the span had an id of 0.8, clicking on it would move the slider to the 80% position of the core. 0.0 takes you to the top and 1.0 takes you to the bottom.
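The id-to-position mapping can be sketched as a pure function. The slider range of 0 to 100 and the `sliderValueForLink` name are assumptions, and the sketch returns null for ids that are not ratios so that ordinary spans are left alone.

```javascript
// Convert a span id such as "0.8" into a slider value within
// [sliderMin, sliderMax]. Returns null when the id is not a ratio
// between 0.0 and 1.0, so unrelated spans can be skipped.
function sliderValueForLink(spanId, sliderMin, sliderMax) {
  if (spanId.trim() === "") return null;
  const ratio = Number(spanId);
  if (!Number.isFinite(ratio) || ratio < 0 || ratio > 1) return null;
  return Math.round(sliderMin + ratio * (sliderMax - sliderMin));
}

// Hypothetical slider range of 0..100.
console.log(sliderValueForLink("0.8", 0, 100));
console.log(sliderValueForLink("intro", 0, 100));
```

In the page, a click handler attached through JQuery would pass the clicked span's id to a function like this and then move the slider to the returned value.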

Conclusion
Overall the Core Gallery turned out surprisingly well for being 100% browser-based. It took much less work than I originally envisioned thanks to JQuery. I'd definitely consider JQuery for future projects.