Tuesday, November 18, 2008

RESTful Query URLs (Cont.)

As a follow-up to my previous RESTful Query URLs post, I'm going to look at implementing RESTful querying for the JSON repository I developed. The repository stores JSON documents and is searchable using a 'query-by-example' approach, i.e. you provide a document with the key/value combinations you're interested in and the repository returns all documents that match.

Since a query involves a template document, the simplest approach would be to provide a /search URI and let the client POST their query to that. My main hangup with this approach is the lack of linkability, e.g. I can't email, IM, or tweet a URI to the results of a query.

Doug talks about this in his recent REST via URI's and Body Representation blog post. In it, he suggests an approach where the client would POST the query/payload and receive a 201 and a GETable URI with the results. This has some interesting implications (as Doug points out). How long does the /resource/request/[id] stay around? Presumably it could stick around indefinitely or until whatever is being queried changes. Do two clients POSTing the same body payload get the same results id? If you're going to support this, then you'll either have to query to see if the request has already been assigned an id or you're going to have to assign the ids based on the contents of the request, perhaps a SHA hash of the body payload. In either case, you're going to have to store the original request along with the id you've assigned to it.
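To make the hash-based option concrete, here's a minimal sketch of assigning ids from the contents of the request. The class name QueryIdStore, the in-memory map, and the choice of SHA-256 are all my assumptions for illustration; neither Doug's post nor the repository prescribes them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store that derives a results id from the POSTed query body. */
final class QueryIdStore {
    // id -> original request body, kept so the query can be re-run on a later GET
    private final Map<String, String> requests = new ConcurrentHashMap<>();

    /** Returns the id for this body, storing the body the first time it is seen. */
    String register(String body) {
        String id = sha256Hex(body);
        requests.putIfAbsent(id, body);
        return id;
    }

    /** Returns the stored body for an id, or null if the id is unknown. */
    String lookup(String id) {
        return requests.get(id);
    }

    /** Hex-encoded SHA-256 digest of the UTF-8 bytes of the body. */
    static String sha256Hex(String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Two clients only get the same id here if their bodies are byte-for-byte identical. Two JSON documents that differ in key order or whitespace would hash differently, so a real implementation would probably want to normalize the body first.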

I think this is why URI-encoding appeals to me: I don't have to keep any extra state around because I can re-create the request from the URI. This falls down when you need more expressive request capabilities than URI-encoding allows. I can also see an advantage in Doug's approach if the majority of your interactions are going to be workflow-based rather than single shot queries.

For the interface to the JSON repository, I chose to represent queries via the URI. I even went so far as to avoid using the query string, opting instead to put everything into the URI structure. The choice to do this was mainly one of exploration. I wanted to see if it offered any advantages (readability, cacheability, simplicity) over using the query string.

The URIs take the form of: /collection[/term/value(s)]+ where term is either a direct property/key in the desired JSON document or a derived property (such as 'fulltext' which looks at the full text index of the document). The value section can either be a single value or a comma separated list of values. Some annotated examples include:
  • /and2 - return all documents in the and2 collection
  • /and2/1 - return the document with id = 1 in the and2 collection (special case)
  • /and2/type/image.SplitCore - return all documents with a type property of 'image.SplitCore'
  • /and2/fulltext/calcite,calcareous,carbonate - return any documents that contain 'calcite' OR 'calcareous' OR 'carbonate'
  • /and2/depth/100,200 - return any documents between depth 100 and depth 200. This changes the semantics of the comma operator, as it no longer means OR as it did with the fulltext term. If you pass in only one depth, it only returns documents at exactly that depth. If you pass in more than two depths, the additional depths are ignored.
Multiple terms can be chained together:
  • /and2/type/image.SplitCore/depth/100,200 - return any documents of type 'image.SplitCore' AND between depth 100 and 200.
  • /and2/fulltext/calcite/fulltext/carbonate - return any documents containing 'calcite' and containing 'carbonate'
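As a rough illustration of how URIs of this shape could be picked apart, here's a small sketch. QueryPath and Term are hypothetical names, not the repository's actual classes, and the sketch assumes the path segments have already been URL-decoded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for paths of the form /collection[/term/value(s)]. */
final class QueryPath {
    /** One /term/value(s) pair; values holds the comma-separated entries in order. */
    static final class Term {
        final String name;
        final List<String> values;

        Term(String name, List<String> values) {
            this.name = name;
            this.values = values;
        }
    }

    final String collection;
    final String documentId; // non-null only for the /collection/id special case
    final List<Term> terms;

    private QueryPath(String collection, String documentId, List<Term> terms) {
        this.collection = collection;
        this.documentId = documentId;
        this.terms = terms;
    }

    static QueryPath parse(String path) {
        // Drop leading and trailing slashes before splitting into segments.
        String trimmed = path.replaceAll("^/+|/+$", "");
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("missing collection: " + path);
        }
        String[] parts = trimmed.split("/");
        if (parts.length == 2) {
            // Special case: /collection/id addresses a single document.
            return new QueryPath(parts[0], parts[1], new ArrayList<Term>());
        }
        if ((parts.length - 1) % 2 != 0) {
            throw new IllegalArgumentException("term without a value: " + path);
        }
        List<Term> terms = new ArrayList<>();
        for (int i = 1; i < parts.length; i += 2) {
            terms.add(new Term(parts[i], Arrays.asList(parts[i + 1].split(","))));
        }
        return new QueryPath(parts[0], null, terms);
    }
}
```

With this sketch, QueryPath.parse("/and2/type/image.SplitCore/depth/100,200") yields two terms to be ANDed together. How the values inside a term are interpreted (OR for fulltext, a range for depth) is left to whatever evaluates the terms.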
And I've added some special query operators:
  • /and2/type/!psicat.* - return any documents not of any psicat type, e.g. this would exclude documents with type properties of 'psicat.Interval', 'psicat.Unit', and 'psicat.Occurrence'.
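One way the '!' and '*' operators could be evaluated against a property value is sketched below. ValueMatcher is a hypothetical name, and I'm assuming that a leading '!' negates the whole expression and that '*' matches any run of characters; the actual repository may treat them differently.

```java
import java.util.regex.Pattern;

/** Hypothetical matcher for a single value expression such as "!psicat.*". */
final class ValueMatcher {
    private final boolean negated;
    private final Pattern pattern;

    ValueMatcher(String expression) {
        // Assumption: a leading '!' negates the rest of the expression.
        negated = expression.startsWith("!");
        String glob = negated ? expression.substring(1) : expression;

        // Quote each literal piece so '.' stays literal, and let '*' match anything.
        String[] pieces = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(pieces[i]));
        }
        pattern = Pattern.compile(regex.toString());
    }

    /** True if the property value satisfies the expression. */
    boolean matches(String actual) {
        return pattern.matcher(actual).matches() != negated;
    }
}
```

Here new ValueMatcher("!psicat.*") rejects 'psicat.Interval' and accepts 'image.SplitCore'. What should happen for documents that have no type property at all is a design decision this sketch doesn't address.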
What I like about this approach is that I can build a URI to just about any subset of documents in the collection (though it does require some prior knowledge of the structure). There are a few warts though. For one, it requires that URI components occur in pairs, so you can't peel back the URI like an onion: e.g. /and2/type/image.SplitCore is valid but /and2/type doesn't make sense. There is also an issue of canonicality, e.g. /and2/type/image.SplitCore/depth/100,200 will always return the same results as /and2/depth/100,200/type/image.SplitCore but they appear as separate resources to the caching layer. There's also aesthetics. I don't yet know how I feel about commas in the URL; they look weird to me. The scheme also doesn't pluralize the terms when multiple values are sent, e.g. you write /and2/type/image.SplitCore,image.WholeCore rather than /and2/types/image.SplitCore,image.WholeCore.
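One possible way around the canonicality wart is to normalize the order of the term/value pairs before the URI reaches the cache. The sketch below is only an illustration under that assumption; CanonicalPath is a hypothetical helper and sorting is just one of several orderings that would work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that puts term/value pairs into a fixed order. */
final class CanonicalPath {
    static String canonicalize(String path) {
        String trimmed = path.replaceAll("^/+|/+$", "");
        String[] parts = trimmed.split("/");
        // Leave /collection, /collection/id and malformed paths untouched.
        if (parts.length < 3 || parts.length % 2 == 0) {
            return "/" + trimmed;
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 1; i < parts.length; i += 2) {
            // Keep each value list as written: its order matters for ranges like depth.
            pairs.add(parts[i] + "/" + parts[i + 1]);
        }
        // Pairs are ANDed together, so reordering them does not change the result.
        Collections.sort(pairs);
        return "/" + parts[0] + "/" + String.join("/", pairs);
    }
}
```

Both orderings of the type and depth pairs come out as /and2/depth/100,200/type/image.SplitCore. A server could redirect non-canonical URIs to that form so the caching layer only ever sees one resource.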

I'd love to hear any feedback on what you think of this approach.

Encounters at the End of the World

I just got my copy of Encounters at the End of the World. If you're interested in Antarctica and what it's like to live and work there, then I highly recommend this movie. It was actually filmed while I was down there the first time, but I was far too busy to put in a cameo. :) It will probably come off as a little out there, but it's a good representation of the people and life down there.

Monday, November 17, 2008

ImageMagick DSL 2

This is a quick post that shows another way to work with the ImageMagick DSL (and other Java DSLs). It comes from a trick I saw in a DSL talk given by Neal Ford. Basically you can use initializer blocks to construct objects in a bit less verbose way:

new ImageMagick() {{
    option("-rotate", "90");
    option("-resize", width + "x");
}}.run(in, out);

This still requires that external variables, such as width, be declared final. So to wrap up, here are three different ways to invoke the DSL:

Separate Statements:

ImageMagick convert = new ImageMagick();
convert.option("-rotate", "90");
convert.option("-resize", width + "x");
convert.run(in, out);

Method Chaining:

new ImageMagick().option("-rotate", "90").option("-resize", width + "x").run(in, out);

Initializer Block:

new ImageMagick() {{
    option("-rotate", "90");
    option("-resize", width + "x");
}}.run(in, out);
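For anyone curious why all three styles work against the same class, here's a tiny self-contained sketch. CommandSketch is a hypothetical stand-in, not the actual ImageMagick class; it only collects arguments so the three styles can be compared.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the ImageMagick class; it only collects arguments. */
class CommandSketch {
    private final List<String> args = new ArrayList<>();

    /** Returns this so that calls can be chained. */
    CommandSketch option(String flag, String value) {
        args.add(flag);
        args.add(value);
        return this;
    }

    List<String> args() {
        return args;
    }

    public static void main(String[] ignored) {
        // Must be final (or, since Java 8, effectively final) to be used
        // inside the anonymous class below.
        final int width = 250;

        // Separate statements
        CommandSketch a = new CommandSketch();
        a.option("-rotate", "90");
        a.option("-resize", width + "x");

        // Method chaining works because option() returns this.
        CommandSketch b = new CommandSketch().option("-rotate", "90").option("-resize", width + "x");

        // Initializer block: the outer braces declare an anonymous subclass,
        // the inner braces are an instance initializer that runs on construction.
        CommandSketch c = new CommandSketch() {{
            option("-rotate", "90");
            option("-resize", width + "x");
        }};

        System.out.println(a.args().equals(b.args()) && b.args().equals(c.args()));
    }
}
```

One trade-off worth knowing: every use of the initializer block style creates a new anonymous subclass, so the class can't be final and you end up with an extra class file per call site.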

Friday, November 07, 2008

ImageMagick DSL

I've been fighting with JAI/Java2D over the last day or two to manipulate (resize, crop, composite) some large images. I have working code that produces decent quality images, but I really have to crank up the heap space to avoid OutOfMemoryErrors. If I try to process more than one or two images concurrently, OutOfMemoryErrors are inevitable. Since this code is going to be called from a servlet, I'm expecting to handle multiple concurrent requests.

This is not a new problem and people have been tackling it in various ways. Since I was working in a server environment and have control over what applications are installed, I decided to use ImageMagick for the image manipulation. ImageMagick is great; I've used it quite often in various shell scripts.

There are basically two ways to work with ImageMagick from Java. You can use JMagick, a JNI layer over the ImageMagick interface, or you can use Runtime.exec() to call the ImageMagick command line application. I opted for the latter as it seemed simpler when I pushed the code from my Mac to my Linux server.

Since finding and invoking ImageMagick's convert command can be somewhat problematic, I decided to write a simple fluent API in Java to hide the details. The result allows you to invoke convert using method chaining:

ImageMagick convert = new ImageMagick(Arrays.asList("/opt/local/bin"));
convert.in(new File("in.jpeg"))
    .option("-fuzz", "15%")
    .out(new File("out.jpeg")).run();
convert.option("-resize", "250x").run(new File("in.jpeg"), new File("out.jpeg"));

You create a new ImageMagick object. As a convenience, you can pass in a list of additional paths to check for the convert command in the event that it isn't on the default path. If the convert command can't be found, the constructor throws an IllegalArgumentException.
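The lookup described above might be sketched roughly as follows. ExecutableFinder and find are hypothetical names, and this is not the code from the actual class; it assumes a Unix-like system where the command has no file extension.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that looks for a command on PATH plus some extra directories. */
final class ExecutableFinder {
    static File find(String name, List<String> extraPaths) {
        List<String> dirs = new ArrayList<>();
        String path = System.getenv("PATH");
        if (path != null) {
            // PATH entries are separated by ':' on Unix-like systems and ';' on Windows.
            dirs.addAll(Arrays.asList(path.split(Pattern.quote(File.pathSeparator))));
        }
        // The caller's additional directories are checked after the default path.
        dirs.addAll(extraPaths);

        for (String dir : dirs) {
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("could not find '" + name + "' in " + dirs);
    }
}
```

A call such as ExecutableFinder.find("convert", Arrays.asList("/opt/local/bin")) would return the first matching executable, or throw IllegalArgumentException in the same way the constructor does when nothing is found.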

Once you have an ImageMagick object, you can execute convert by chaining various method calls, ending in a run(). run() returns true if the command succeeds, false otherwise.
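Under the hood, a run() like this boils down to assembling an argument list and launching a process. Here's a minimal sketch of that idea. ConvertRunner and its methods are hypothetical, and I've swapped in ProcessBuilder where the post mentions Runtime.exec(), since it makes handling the child's output a little easier.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical runner that launches a convert-style command line. */
final class ConvertRunner {
    /** Builds: executable, input file, options..., output file. */
    static List<String> buildCommand(String executable, List<String> options, File in, File out) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.add(in.getPath());
        command.addAll(options);
        command.add(out.getPath());
        return command;
    }

    /** Returns true if the process could be started and exited with status 0. */
    static boolean run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    .inheritIO() // pass the child's output through so it cannot block on a full pipe
                    .start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            // The command could not be started at all.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems with values like "15%" or file names that contain spaces.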

In less than 200 lines of Java code, I had a much nicer way to interact with ImageMagick. A fun experiment would be to take the code and implement an even nicer DSL in Groovy. methodMissing() would allow fluent method chaining on steroids:

convert.fuzz("15%").trim().run(in, out)

As Guillaume Laforge tweeted, using metaprogramming, categories, and syntactic sugar like named parameters you could end up with a full blown DSL that looks like this:

convert fuzz:15.pct, trim: true, in: file, out:file