
How to: Grab Thumbnail Images

Posted
Oct 8th, 2010

Tags
Workspace

Author
Amanda Hickman

When Krista Kjellman Schmidt was putting together the chart illustrating key deletions in a set of documents, she needed a quick way to grab thumbnail images of particular pages. She was lucky: I just happened to be at ProPublica’s offices and just happened to overhear a few snippets of conversation. “You might be able to use Firebug,” someone suggested. “And then reduce the images?” another followed. I’m not sure what prompted me to look up (possibly the fact that I’m a busybody?), but I did, and when I realized what she was trying to do, I had a far better suggestion. It went something like this:

If you know the URL of your document, say:

http://www.documentcloud.org/documents/10412-nsf-plan.html

you can access a great deal of additional data in a JSON file by changing the URL ever so slightly to something more like:

http://www.documentcloud.org/documents/10412-nsf-plan.json

See the difference? I just put “json” where “html” used to be. Nothing fancy. Raw, the JSON probably comes out looking a lot like gibberish, but if you look closely you’ll see a string like "image":"http://www.documentcloud.org/documents/10412/pages/nsf-plan-p{page}-{size}.gif" tucked in among the other notations.
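If you’d rather not squint at raw JSON, here is a rough Python sketch of the same step. It assumes only what is described above: the JSON lives at the same address with a different extension, and somewhere inside it is an "image" key holding the URL pattern. Because this post doesn’t say exactly where that key sits, the hypothetical helper find_image_template simply searches the whole structure for it. The function names are mine, not part of DocumentCloud.

```python
import json
from urllib.request import urlopen


def json_url(document_url):
    """Turn a document's .html address into its .json address."""
    if not document_url.endswith(".html"):
        raise ValueError("expected a URL ending in .html")
    return document_url[: -len(".html")] + ".json"


def find_image_template(data):
    """Return the first string stored under an "image" key, searching nested data."""
    if isinstance(data, dict):
        value = data.get("image")
        if isinstance(value, str):
            return value
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_image_template(child)
        if found is not None:
            return found
    return None


def fetch_image_template(document_url):
    """Download the JSON file and pull out the image URL pattern (needs network access)."""
    with urlopen(json_url(document_url)) as response:
        return find_image_template(json.load(response))
```

Calling fetch_image_template with the document URL above should hand back the pattern with its {page} and {size} placeholders intact, provided the JSON still looks the way it is described here.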

{page} and {size} are just placeholders for the page of the document and the size you’d like. At this point it helps to know that DocumentCloud stores images in three sizes: “thumbnail,” “normal” and “large.” So to find a thumbnail of page three, I’d use a URL like:

http://www.documentcloud.org/documents/10412/pages/nsf-plan-p3-thumbnail.gif
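Filling in the placeholders by hand is easy enough, but a few lines of Python can do it too. This is only a sketch: the helper name page_image_url is made up for illustration, and the list of sizes is just the three named above.

```python
# The three image sizes mentioned in this post.
SIZES = ("thumbnail", "normal", "large")


def page_image_url(template, page, size="thumbnail"):
    """Fill the {page} and {size} placeholders in an image URL pattern."""
    if size not in SIZES:
        raise ValueError("size must be one of: " + ", ".join(SIZES))
    return template.replace("{page}", str(page)).replace("{size}", size)


template = "http://www.documentcloud.org/documents/10412/pages/nsf-plan-p{page}-{size}.gif"
print(page_image_url(template, 3))
# prints http://www.documentcloud.org/documents/10412/pages/nsf-plan-p3-thumbnail.gif
```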

You can paste that into your browser’s location bar if you just need a single image, or use a tool like cURL or wget to grab a handful of images at once.
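If cURL and wget aren’t your thing, the same batch download can be sketched in Python with nothing but the standard library. The helper names thumbnail_urls and download_all are my own invention, and the page numbers you pass in are up to you; nothing here checks how many pages the document actually has.

```python
import os
from urllib.request import urlretrieve

TEMPLATE = "http://www.documentcloud.org/documents/10412/pages/nsf-plan-p{page}-{size}.gif"


def thumbnail_urls(template, pages, size="thumbnail"):
    """Build one image URL per requested page number."""
    return [
        template.replace("{page}", str(page)).replace("{size}", size)
        for page in pages
    ]


def download_all(urls, directory="."):
    """Save each URL into the directory under its own file name (needs network access)."""
    saved = []
    for url in urls:
        # Use the last path segment of the URL as the local file name.
        filename = os.path.join(directory, url.rsplit("/", 1)[-1])
        urlretrieve(url, filename)
        saved.append(filename)
    return saved
```

Something like download_all(thumbnail_urls(TEMPLATE, [3, 7, 12])) would then fetch the thumbnails for those three pages into the current directory, assuming those pages exist.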

Hint: if you’re very crafty, you can work out the pattern to these URLs and skip the JSON altogether!
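For the impatient, here is one guess at that pattern, sketched in Python. It is inferred from the single example in this post (a numeric ID, a hyphen, then the document’s short name), so treat it as an assumption rather than a documented rule; the function name is hypothetical.

```python
import re

# Matches addresses shaped like .../documents/<number>-<name>.html
DOCUMENT_URL = re.compile(r"^(?P<base>.*/documents)/(?P<id>\d+)-(?P<slug>.+)\.html$")


def image_template_from_document_url(document_url):
    """Guess the image URL pattern straight from the document URL."""
    match = DOCUMENT_URL.match(document_url)
    if match is None:
        raise ValueError("URL does not look like .../documents/<id>-<name>.html")
    # Doubled braces keep {page} and {size} as literal placeholders.
    return "{base}/{id}/pages/{slug}-p{{page}}-{{size}}.gif".format(**match.groupdict())


print(image_template_from_document_url(
    "http://www.documentcloud.org/documents/10412-nsf-plan.html"
))
# prints http://www.documentcloud.org/documents/10412/pages/nsf-plan-p{page}-{size}.gif
```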
