Latest Updates: Our Blog

Category: Documents

When Documents Are Challenged

Posted
Apr 5th, 2012

Tags
Documents, IdeaLab

Author
Mark Horvit

Last week, DocumentCloud received a complaint seeking the removal of a collection of emails posted by journalists with the Australian Financial Review. The emails involved a company called NDS, which hired a law firm to try to have the documents pulled from public view. This kind of thing is rare, but it happens. This case in particular has a couple of wrinkles that make it unusual, and it presents a good opportunity to remind all of our members that DocumentCloud has policies and options in place that allow you to keep any document processed through our service available to the public for as long as you wish.

I’ll detail those below. But first, a little context.

DocumentCloud was created as a 501(c)(3) nonprofit organization and remains one as part of Investigative Reporters and Editors (IRE). The service is offered free of charge, and all expenses and staffing are covered through grants and IRE’s normal operations. We provide a suite of tools that allow you to analyze and publish documents, and we don’t control what you post. We also don’t have a large budget to fight legal challenges to items you post.

There have been only a handful of cases in which DocumentCloud has received legal challenges to material posted on the site, where we now host more than 4 million pages.

Typically, those challenges have involved allegations of copyright violation. In every case, we have contacted the news organization that posted the material and asked how it would like to handle the complaint. Our terms of service describe how we handle these cases, using a process based on the Digital Millennium Copyright Act (DMCA). DocumentCloud is a neutral party hosting content on behalf of users and is protected by the DMCA’s safe-harbor provisions. If we receive a formal complaint, we contact the organization that uploaded the material. If it asserts its right to publish, the documents remain public, and the matter is resolved between the complainant and the posting organization.

We also offer an alternative for organizations that would prefer to host their own documents and still use DocumentCloud’s viewer. A number of news organizations have chosen this option, for a variety of reasons. We make document data and our viewer code available for download to journalists directly through our workspace. Downloading a viewer gives a news organization an HTML file that is functionally indistinguishable from the viewers we host.

It’s also worth noting that all of our software is available free and open source to any journalist or software developer who wishes to use or improve upon it (and members of both groups have done so).

The case that came up last week involving the Australian Financial Review raised some new questions. NDS, the company behind the complaint over the posted emails, made a variety of allegations but didn’t cite the DMCA. AFR opted to take down the documents rather than provide us with a letter asserting its right to publish and offering indemnity to DocumentCloud. AFR said it did so because it believes any action is more appropriately handled in Australia, and it did not wish to become involved in a U.S. dispute with NDS. Instead, AFR downloaded the viewer and plans to repost the documents using the DocumentCloud software.

Dealing with such challenges is an inevitable byproduct of hosting documents. If you have questions about our policies or suggestions on how we can improve our service, please get in touch; my email is mhorvit@ire.org.

Printing Document Annotations

Posted
Sep 26th, 2011

Tags
Documents

Author
Ted Han

We’ve been hard at work during our short hackathon in Columbia, Missouri, at DocumentCloud’s new home in the Investigative Reporters & Editors office. As a result, we’ve rolled out a new feature that lets readers and journalists print the annotations made on documents.

Journalists have been publishing documents through DocumentCloud for a while now, annotating them both for readers and for their own story-writing process. We think it’s just as important for DocumentCloud to make story writing quicker and easier as it is to help readers find primary source material.

So, when Marshall Allen of ProPublica told us that he would like to try using DocumentCloud to take his story notes, we did our best to help out. As a result, you can now select one or more documents in the workspace and choose “Print Notes” under the “Publish” menu.

This way, you can annotate your sources in DocumentCloud and have a single copy of all your research ready to hand to your copy editor, or to read when your flight attendant announces that all power switches should be in the off position.

And readers can find a “Print Notes” link in the sidebar footer of the document viewer too.

We hope this will help readers and journalists alike note and collect information in the format that best suits their workflows. Happy printing (and remember to recycle)!

FAQ: Should I Try Again?

Posted
Apr 1st, 2011

Tags
Documents, Workspace

Author
Amanda Hickman

Every once in a while, DocumentCloud gets hit with the kind of document stash that really slows us down. We can take a lot, but if one newsroom finally gets a 25,000-page FOIA response and another gets hold of 30,000 pages of documents for a breaking news story on the same afternoon, that’s a volume that will tax our servers.

We recently established a “fast lane” to ensure that smaller documents don’t have to wait in line behind behemoths, but that doesn’t help if you’ve got a few MB of documents about a local scandal: you’ll still have to shuffle into line with the big sets.

Embed a Set of Documents

Posted
Mar 30th, 2011

Tags
Documents, Workspace

Author
documentcloud

Sets of documents are nothing new to DocumentCloud.

The Las Vegas Sun published hundreds of pages of legislation, emails, court filings and medical records alongside its award-winning package on hospital care in Las Vegas. The Sun’s Marshall Allen assembled each document collection by hand to produce that page. Plenty of other newsrooms have used our API to do likewise, but even with the API, it isn’t trivial to assemble and publish a set of documents.

Some document sets are living creatures that continue to grow: Chad Skelton at The Vancouver Sun has been adding documents retrieved from the local ferry authority’s website to a growing cache of public records on DocumentCloud. The only way to ensure his readers find new documents as they roll in is to point the public straight to DocumentCloud to find the ferry authority FOIAs. It should be easier to embed that growing set of public documents right on The Vancouver Sun’s own site.

Samuel Clay has been working very hard to make that possible for every DocumentCloud newsroom.

Improved Document Collaboration

Posted
Mar 9th, 2011

Tags
Documents, Workspace

Author
Amanda Hickman

From inviting a law professor to help Arizona readers understand recent legislation to asking some top-notch designers to review New York’s new ballot, DocumentCloud users have already found some great ways to bring in experts from outside the newsroom, and we thought it was time to make that much easier.

We spent some time at ONA last year brainstorming with the good folks from the Public Insight Network, who really helped us distill this into a workable feature. We’re looking forward to seeing PIN newsrooms do some great reporting aided by this new feature.

A Million Pages

Posted
Feb 28th, 2011

Tags
Documents

Author
Jeremy Ashkenas

This morning, not quite one year since we opened our beta to newsrooms at NICAR 2010, the millionth page of primary source material was uploaded to DocumentCloud. Reaching this milestone so soon is a tribute to our users and the amazing document-driven investigative reporting you have published over the past year.

Most of the thousands of documents in our catalog have arrived in small batches: five documents here, 20 there, most often accompanying a breaking story. Take a look for yourself: browse through recently published documents by searching for “filter: published” or read up on other searches you can run.

Now is a good moment to highlight some notable recent stories:

Last week, the Center for Public Integrity launched a series of articles on hidden hazards at oil refineries in the United States. Readers of “Regulatory Flaws, Repeated Violations Put Oil Refinery Workers at Risk” can review a dozen citations and court filings that the Center’s journalists used in their reporting.

On Sunday, The New York Times published the first installment of an investigation into lax regulation of natural-gas drilling across the U.S., accompanied by a large cache of E.P.A. and industry documents.

The Seattle Times reported last week on evidence of financial abuse in Seattle public schools, based on documents released by state auditors. The documents detail over-billing, intimidation, and ethics violations that add up to $1.8 million in potentially fraudulent expenses.

Thanks for a great first year, and here’s hoping that the next year brings millions more pages, and more great document-driven reporting.

Going Public

Posted
Jan 26th, 2011

Tags
Documents, Workspace

Author
Amanda Hickman

With close to 200 newsrooms contributing documents and thousands of documents in our catalog, we decided it was time to open DocumentCloud to public searches.

Wondering who is still covering the Deepwater Horizon oil spill? Try a search for “deepwater horizon” organization: transocean to see documents that reference both the rig by name and the drilling contractor, Transocean. Then click on the “Entities” tab to see more data provided by OpenCalais’ entity extraction.

Did you miss the Memphis Commercial Appeal’s coverage of Ernest Withers? Catch up with a search for group: commercial-appeal withers to find every document uploaded by reporters in the Commercial Appeal newsroom that mentions Withers by name. Curious to see the annotations journalists have been making on the documents they’re sharing? Try a search for filter: annotated, and you’ll skip any documents that were published without annotations.
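To make the shape of these searches concrete, here is a minimal Python sketch that assembles query strings like the examples above: free text, with multi-word phrases in quotes, mixed with field: value pairs such as organization:, group: and filter:. The build_query helper is hypothetical and not part of DocumentCloud; it only formats the text you would type into the search box, and the rule of quoting multi-word phrases is assumed from the “deepwater horizon” example rather than taken from DocumentCloud’s documentation.

```python
from typing import Sequence, Tuple, Union

# A token is either free text ("withers") or a (field, value) pair
# such as ("organization", "transocean").
Token = Union[str, Tuple[str, str]]


def build_query(tokens: Sequence[Token]) -> str:
    """Join tokens into a search string shaped like the examples above.

    Hypothetical helper: it only formats text and never contacts DocumentCloud.
    """
    parts = []
    for token in tokens:
        if isinstance(token, tuple):
            field, value = token
            # Field searches are written as "field: value".
            parts.append(f"{field}: {value}")
        elif " " in token:
            # Assumption: multi-word phrases are quoted so they match as a unit.
            parts.append(f'"{token}"')
        else:
            parts.append(token)
    return " ".join(parts)


print(build_query(["deepwater horizon", ("organization", "transocean")]))
# prints: "deepwater horizon" organization: transocean
print(build_query([("group", "commercial-appeal"), "withers"]))
# prints: group: commercial-appeal withers
print(build_query([("filter", "annotated")]))
# prints: filter: annotated
```

Keeping field searches as pairs rather than raw strings keeps the field: value spacing consistent when a newsroom scripts many searches at once.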

There’s plenty more you can do with DocumentCloud’s search syntax. Check out our primer and try a few searches.

We’d love to know what you think, and what you’ve found.

PS. Finding bugs rather than documents? We want to know about those, too.

Embedding Documents on Your Site (UPDATED)

Posted
Jun 4th, 2010

Tags
Documents

Author
Jeremy Ashkenas

Over the past few months, you might have noticed a handful of news organizations using embedded documents to complement their reporting.

This morning, we’re opening up the ability to embed documents to all of the newsrooms participating in DocumentCloud. When you log into your workspace, you’ll notice a new menu: “Publish”.

From here, you can grab an embed code (a short snippet of HTML) that can be dropped onto a web page to create a document viewer. You may be familiar with such snippets from embedding YouTube videos: this works in a similar fashion. For guidelines on setting up a template and other help, check out our documentation.
If you still have questions about the process, we’re listening at support@documentcloud.org.

Note: we know you’re eager to host documents yourself, and you can do that now, but we recommend that you stick with embedded documents so that you can take advantage of bug fixes and other improvements to the viewer. We don’t know yet whether we will offer embedding as a long-term service. Keep in mind, as well, that this is still a beta. As described in our terms, our capacity to commit to uninterrupted service is limited, as is our liability if service is interrupted in some way.

For news organizations that want to host documents on their own servers, we’re now offering that as an alternative too. Click on “Download Document Viewer” to get a zipped-up folder with all the code, text, and images bundled together as a web page. Drop the folder into any web server (no special software required), and voilà, it’s online.

Search of the document’s text is provided by DocumentCloud as a service, but everything else in the package is completely static: just HTML, images, JavaScript and CSS. If you choose this alternative, there is a caveat: if you edit your annotations or make any other changes to the document, you’ll have to download it again.

Here at DocumentCloud, we’re looking forward to seeing the great reporting you do with embedded documents — don’t forget to use the workspace to add a “Related Article” link.

Documents Rolling In

Posted
Apr 12th, 2010

Tags
Documents

Author
Amanda Hickman

Reblogged from the PBS IdeaLab.

Eagle-eyed followers of the DocumentCloud Twitter feed have already picked up on the fact that we began adding users to our beta last month.

We made a strategic decision to peg our beta to NICAR’s March 2010 computer-assisted reporting conference, where we knew we’d be able to gather a sizable group of just the sort of investigative reporters we hope to support with DocumentCloud, and get them excited about using our tools to do more with their documents. Nothing beats hands-on support when you’re using a new tool. Plus, we identified dozens of quick fixes we could make after watching over journalists’ shoulders as they explored DocumentCloud.

In the month since NICAR, we’ve added more than 150 users who’ve uploaded a cumulative 54,000 pages of text and made close to 300 documents available in DocumentCloud. Our repository is already home to police reports from New Orleans, a confirmation hearing transcript that adds context to coverage of Justice Stevens’ retirement, and disaster preparedness plans from Haiti. There’s even a collection of emails that document how some hedge funds not only saw the mortgage crash coming, but wagered on the collapse and won big. (The hedge fund that these reporters investigated argues it never had the hands-on role ascribed to it; that’s in DocumentCloud, too.) Eventually, anyone will be able to connect with those documents right through our website.

Want to be part of the beta? Get in touch and tell us a bit about the documents you’re working with.

We’re still adding beta testers and actively listening to the users we’ve got as we prioritize and refine our to-do lists, but we think we’re off to a great start.