DocumentCloud

Archive for the ‘Code’ Category

Uploading Documents Gets a Little Easier

without comments

You’ve always been able to script batch uploads using our API, but for users without coding skills, uploads were one at a time. Today we’re rolling out an improved document uploading dialog that will let you upload as many documents as you want, all in one fell swoop.

You’ll still use the “New Documents” button, but now that button takes you straight to a file selection dialog.
Use the control (on MS Windows) or command (on Macs) key to select additional documents, just like you would in your file browser.

File selection screenshot

We’ll start you off by suggesting a title, based on each file’s name, but you can edit that name and add additional information, including the source of each document and a description. As with the old upload dialog, you can decide right when you upload your document whether or not you’re ready to share it with the world yet. As ever, you can edit all of these fields again later.

If your documents share a common source or description, use the “Apply to All Files” link to copy your metadata to each document in this batch. Note: this new upload interface requires Flash. If that’s an issue for you, let us know ASAP and we’ll whip up an alternate interface that doesn’t require any plugins. Promise.
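For anyone scripting a batch rather than using the dialog, the title suggestion and “Apply to All Files” steps described above might look roughly like the sketch below. This is only an illustration: the `UploadItem` fields, the function names, and the title-cleaning rules are assumptions made for the example, not DocumentCloud's actual implementation.

```python
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class UploadItem:
    """One document queued for upload (hypothetical field names)."""
    path: str
    title: str
    source: str = ""
    description: str = ""
    public: bool = False


def suggest_title(path: str) -> str:
    """Guess a readable title from a file name: drop the extension,
    turn underscores and hyphens into spaces, capitalize each word."""
    stem = Path(path).stem
    words = stem.replace("_", " ").replace("-", " ").split()
    return " ".join(word.capitalize() for word in words)


def queue_files(paths):
    """Build one UploadItem per selected file, each with a suggested title."""
    return [UploadItem(path=p, title=suggest_title(p)) for p in paths]


def apply_to_all(items, source=None, description=None):
    """Copy shared metadata onto every item, leaving titles untouched."""
    changes = {}
    if source is not None:
        changes["source"] = source
    if description is not None:
        changes["description"] = description
    return [replace(item, **changes) for item in items]
```

Calling `apply_to_all(queue_files(paths), source="County Clerk")` would give every queued file the same source while keeping its own suggested title, which can still be edited per document afterwards.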

As the files upload from your computer to DocumentCloud, you’ll see the progress of each transfer.

Better Processing, Too
This week’s release is more than just a new upload dialog. We’ve made some big changes to Docsplit and the RightAWS gem, and we’re hoping this means the dreaded “import failed” error will be a thing of the past. If looking under the hood is your thing, both Docsplit and our fork of RightAWS are on GitHub for your viewing (and reusing) pleasure.

Don’t be a Stranger
If you have gigabytes’ worth of documents to upload, get in touch before you start uploading so we can add more horsepower to handle your job. Otherwise, happy uploading! And don’t forget to tell us about what you’re publishing with DocumentCloud.

Written by Amanda Hickman

August 18th, 2010 at 9:22 am

Posted in Code,Workspace

Introducing Page Notes

without comments

The Document Viewer has always supported the ability to create “page notes” — annotations that sit between two pages and provide commentary about a specific page as a whole or an introduction to a new section of a document. This morning, we released an update to DocumentCloud that provides a way for you to create page notes from within the viewer.

To try it out, open a document you’ve uploaded and click on one of the “Add a Note” links in the sidebar. Hover your crosshair over the margin in between pages, and you’ll see a dotted line appear, with a note tab on the left:

If you click, you’ll create a page note in between the two pages. Add a title and some text and click the “Save” button. The note’s title will appear in the navigation on the right. Of course, page notes are viewable and editable from the workspace, just like any other note.

If you’re logged in, you can take a look at the sample document shown here.

Written by Jeremy Ashkenas

July 27th, 2010 at 12:13 pm

Posted in Code

HTTPS Support (and Other Updates)

without comments

Monday morning we rolled out SSL support on DocumentCloud.org: visit https://www.documentcloud.org to view, browse and edit documents in your workspace over an encrypted connection. When you use HTTPS, all traffic between your computer and DocumentCloud is encrypted before it’s sent over the internet. If you’re working on a public wireless connection, are on an unsecured network or are dealing with highly sensitive documents, we recommend using HTTPS.
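If you keep DocumentCloud links in scripts or bookmarks, a small helper can switch them to the encrypted address. The sketch below is a minimal illustration using only Python's standard library; the function name `force_https` is made up for this example.

```python
from urllib.parse import urlsplit


def force_https(url: str) -> str:
    """Return the same URL with its scheme switched from http to https."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return url
    if parts.scheme != "http":
        # Anything else (no scheme, ftp, mailto...) is left for the caller to fix.
        raise ValueError(f"expected an http or https URL, got: {url!r}")
    return parts._replace(scheme="https").geturl()
```

The path and query string are carried over unchanged; only the scheme differs.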

You can tell if you’re on a secure connection by looking at your browser. When visiting a secure website, all browsers display a lock icon somewhere on the window. Here’s what the lock looks like in Google Chrome:

More Search Parameters

We’ve also added new ways to filter your DocumentCloud searches. You can now use “access” to filter your documents by their access level, and “projectid” to designate a specific project when you’re using our search API. (Access to searches and the API is limited to registered users during the beta.)

Use "access" to search documents by access level.

To view only your private documents in a particular project, you can add “access: private” to your search terms. Searching by “access: public” will show you only public documents, while “access: organization” will show you those documents shared within your organization.
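As a rough sketch, a script could assemble these search terms like so. The “access: …” form and the three access levels come from the description above; the exact “projectid: …” spelling with a colon is an assumption by analogy, and `build_search_query` is a hypothetical helper, not part of any DocumentCloud library.

```python
ACCESS_LEVELS = {"public", "private", "organization"}


def build_search_query(text="", access=None, project_id=None):
    """Join free-text terms with optional access and project filters."""
    parts = []
    if text.strip():
        parts.append(text.strip())
    if access is not None:
        if access not in ACCESS_LEVELS:
            raise ValueError(f"unknown access level: {access!r}")
        parts.append(f"access: {access}")
    if project_id is not None:
        # Assumed syntax: mirrors the "access: value" form shown above.
        parts.append(f"projectid: {project_id}")
    return " ".join(parts)
```

For example, `build_search_query("madoff", access="private")` produces the string `madoff access: private`, ready to paste into the search box.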

Already using the search API? We’ve added search terms that let you limit public results to a single project. Drop a line to support AT documentcloud DOT org if you’d like to take advantage of this one.

Still waiting for an important feature? Let us know!

These improvements are only available to users who have an account on DocumentCloud. If you’re a reporter who works with primary source documents, and you’re not using DocumentCloud yet, contact us to find out how to start.

Written by Jeremy Ashkenas

July 20th, 2010 at 10:15 am

Posted in Code

Guides and How-to’s

without comments

For some time now, instructions about how to use the DocumentCloud workspace have been available through our wiki. This morning, we released an update that pulls the help pages right into the workspace for easy access, and hopefully makes it faster to get your questions answered.

We’ve included pages on searching, account management, collaboration, privacy, uploading documents, troubleshooting failed uploads, and editing and publishing documents on your web site. The next time you log in to DocumentCloud, take a peek at the new “Help” tab, and let us know if there’s anything you think we should add to the guides.

Written by Jeremy Ashkenas

July 12th, 2010 at 5:15 pm

Posted in Code

Bidding IE6 Adieu

without comments

Last week, we rolled out an update to DocumentCloud’s document viewer that included a wide range of improvements that you might never even notice. Page layouts and scrolling look very different under the hood: pages load and scroll much faster now, annotations work better, and readers can resize a document viewer without setting off a barrage of little hiccups. We replaced much of the viewer’s JavaScript with CSS, which we hope will form a much more stable foundation for DocumentCloud development going forward. In the process, however, we stopped supporting Internet Explorer 6.

IE6 has long been the bane of web developers: developing web applications that work as well in IE6 as in other browsers is substantially more difficult than bypassing the ten-year-old browser.

IE6 users will still be able to download the original PDF of any document, and they will see a landing page that encourages them to upgrade their browser or install Chrome Frame.

The New York Times, with whom we continue to collaborate closely on development of the viewer component of DocumentCloud, has long planned to phase out support for IE6. They don’t test new tools against the browser and will soon update to the same version of their document viewer that DocumentCloud is running on. The Times isn’t alone: YouTube began phasing out support for IE6 in March and other Google products are expected to follow suit. We’re certainly open to feedback on our implementation.

Meantime, take a look at some of the great things reporters are doing with DocumentCloud.

Written by Amanda Hickman

July 9th, 2010 at 6:18 pm

Posted in Code

Collaboration

without comments

Since we launched DocumentCloud’s beta, one of the most common requests has been: “How can I share documents with reporters from other organizations?”

Now you can share a project with any other DocumentCloud user — in any newsroom.

How does it work?
Let’s say I have a project with documents relating to the Madoff Ponzi scheme, and I want to share them with Scott. To open the project for editing, I click on its edit icon.

Inside of the project, I click on the “Add a collaborator to this project” link, and I type in Scott’s email address — the one that he uses to log in to DocumentCloud.

After I click the “Add” button, Scott appears as a collaborator on this project.

The next time Scott logs in to DocumentCloud, “The Madoff Files” will show up as one of the projects in his sidebar. He can now view, edit and annotate all of the documents inside of it. He can add documents of his own to the project and I’ll be able to see and edit those as well.

Project collaborators can do anything with the documents in a project that you can do: they can edit public notes, change settings like the document’s title or source, and add “related article links.” Collaborators can also add people to the project or remove them. You can only collaborate with fellow DocumentCloud users, though: if you’re collaborating with a newsroom that isn’t yet part of DocumentCloud, send them our way and we’ll get them set up.

We’d love it if you would give it a spin and let us know what you think: write to support@documentcloud.org, or suggest improvements in our support forum, where fellow users can weigh in as well.

Written by Jeremy Ashkenas

June 28th, 2010 at 10:44 am

Posted in Code,Workspace

Announcing Docsplit: Break Documents into Images, Pages, and Plain Text

without comments

We’ve been spending a lot of time in the DocumentCloud Lab researching the best way to break apart documents into their component parts, to make it easier to index them for searching and to display them on the web. The latest open-source piece of DocumentCloud is a tool to help you extract images, thumbnails, plain text, and individual pages from any kind of document. It wraps up the PDFBox, GraphicsMagick, and JODConverter libraries, providing you with a command-line utility and a Ruby API for breaking apart documents.
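To give a feel for the command-line side from a script, here is a hedged sketch that only builds the argument list for a Docsplit call. The `images`, `text`, and `pages` subcommands and the `--output` option reflect Docsplit's documented usage as best understood here; check `docsplit --help` on your installation before relying on them. The helper name `docsplit_command` is invented for this example.

```python
ACTIONS = {"images", "text", "pages"}


def docsplit_command(action, documents, output_dir=None):
    """Build the argv list for one docsplit invocation (does not run it)."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    documents = list(documents)
    if not documents:
        raise ValueError("at least one document path is required")
    argv = ["docsplit", action]
    if output_dir:
        # Assumed flag: directory where the extracted files are written.
        argv += ["--output", output_dir]
    return argv + documents
```

With Docsplit installed, the resulting list can be passed to `subprocess.run(argv, check=True)`; keeping the builder separate makes it easy to inspect or log the command first.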

Docsplit is our fourth open-source project, but is perhaps the most immediately useful in the newsroom. We’ve been talking to the Guardian and the New York Times about techniques for pulling images and text out of documents, and Docsplit synthesizes some of the best practices into a single package with a simple interface. We’re hoping it comes in handy the next time you need to analyze a pile of documents.

Written by Jeremy Ashkenas

December 7th, 2009 at 10:17 am

Posted in Code

Announcing Jammit: DocumentCloud’s Asset Packager

without comments

The DocumentCloud prototype includes a “Journalist Workspace,” a tool for searching, organizing, and visualizing the relationships among documents. We’re building the workspace as a modern web application, which means that there are a lot of static assets behind the scenes (JavaScript, templates, CSS, and images). The problem arises: how do you keep all of these assets organized while still delivering them as efficiently as possible to a web browser?

Our answer, Jammit, is a Rails gem that takes care of merging and compressing all of a website’s static assets. It runs JavaScript and CSS through the excellent YUI Compressor, zips them up for speedy downloads, and can embed small images right into the stylesheets. Using it in the DocumentCloud prototype has cut the time that it takes to load the workspace in half.
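The three ideas in that paragraph (merging files, gzipping the result, and embedding small images in stylesheets) can be illustrated in a few lines. The sketch below is a Python stand-in for the general technique, not Jammit's Ruby code: it skips minification entirely (Jammit hands that to YUI Compressor), and all function names are made up for the example.

```python
import base64
import gzip
import mimetypes


def merge_assets(sources):
    """Concatenate asset sources into one package, one per line block."""
    return "\n".join(source.strip() for source in sources) + "\n"


def gzip_asset(text):
    """Compress a merged package so it can be served pre-zipped."""
    return gzip.compress(text.encode("utf-8"))


def data_uri(filename, data):
    """Encode image bytes as a data: URI suitable for a CSS url(...)."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Embedding an image this way trades a slightly larger stylesheet for one fewer HTTP request, which is why it tends to pay off only for small images.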

The project page contains a complete overview of Jammit, including installation instructions, documentation, and examples. We hope you can use it to help speed up your Rails applications.

Written by Jeremy Ashkenas

November 16th, 2009 at 10:35 am

Posted in Code

Underscore.js: Our Second Open-Source Release

with one comment

We released the first open-source component of DocumentCloud a little over a month ago. Since then CloudCrowd has picked up a lot of steam, with hundreds of developers watching it on GitHub, and many patches and features being contributed by the community. Among other uses, it’s running gene sequence analysis on strains of influenza virus — something we certainly never expected to see. Since anything worth doing is worth doing twice, this morning I’m pleased to announce the release of the second open-source component of DocumentCloud: Underscore.js.

Underscore is a JavaScript library that provides a lot of the functional programming support that users of Prototype.js or Ruby expect, but does so by introducing a single object, the underscore: “_”. It’s a partial adaptation of many of the utility methods from the Prototype.js project, in order to use them without touching the prototypes of any of the core JavaScript objects. This is important because it means you can use Underscore right alongside jQuery without having to worry about conflicting variables, redundant functionality, or differences in expected coding style. For JavaScript 1.6-compliant browsers, it delegates to the native implementations of the functional methods, so that you can enjoy them at full speed where available.

This release has a much smaller scope than the previous one, but we think that it’s a helpful bit of code for any team that takes JavaScript seriously, especially in conjunction with jQuery. The production version of the library weighs in at only 4kb when gzipped, a relatively fat-free download that you can add to your page without worrying too much about load time. We’re using it to develop our “journalist workspace”, the area in which researchers can search and organize documents, and visualize the relationships between them. We hope you find it useful.

Written by Jeremy Ashkenas

October 28th, 2009 at 9:51 am

Posted in Code

Two Dozen Media Outlets and Others Join Us as Beta Testers

without comments

We have some more news: About two dozen news and other organizations have signed on as beta-testers. They’ll be contributing documents to DocumentCloud, and giving us feedback as we work out the kinks. It’s a wide-ranging list:

  • ACLU National Security Project
  • Arizona Republic
  • The Atlantic
  • Center for Democracy and Technology / OpenCRS
  • Centre for Investigative Journalism, City University London
  • Center for Investigative Reporting / California Watch
  • Center for Public Integrity
  • Chicago Tribune
  • Dallas Morning News
  • The Investigative Reporting Workshop at American University
  • The New Yorker
  • NewsHour
  • MinnPost
  • MSNBC
  • Mother Jones
  • Public.Resource.Org
  • St. Petersburg Times
  • Sunlight Foundation
  • Voice of San Diego
  • Washington Post
  • WNYC

These organizations will be joining our original set of contributors — The New York Times, ProPublica, Talking Points Memo, The National Security Archive, and Gotham Gazette — all of whom will of course be working with us during the testing too.

Earlier this morning we also announced that we’re working with Thomson Reuters’ OpenCalais service to extract and make available information from the documents contributed to DocumentCloud.

E-mail us if you’d like to participate in the testing. We’re interested in any organization, including non-profits and academic institutions, that has obtained documents during its research.

If you’re new here, the goal of DocumentCloud is to super-charge investigations by making documents, and the information in them, easier to find and share. Readers will be able to search documents on DocumentCloud and then will be pointed to the documents themselves on contributing organizations’ Web sites. (Here’s a FAQ with more details.)

Finally, you can keep following our progress on this blog, or follow us on Twitter or via RSS. And we’re releasing our code each step of the way.

Written by Scott

September 24th, 2009 at 8:00 am

Posted in Code