Latest Updates: Our Blog

Category: IdeaLab

When Documents Are Challenged

Posted
Apr 5th, 2012

Tags
Documents, IdeaLab

Author
Mark Horvit

Last week, DocumentCloud received a complaint seeking the removal of a collection of emails posted by journalists with the Australian Financial Review. The emails involved a company called NDS, which hired a law firm to try to have the documents pulled from public view. This kind of thing is rare, but it happens. This case in particular has a couple of wrinkles that make it unusual, and it presents a good opportunity to remind all of our members that DocumentCloud has policies and options in place that allow you to keep all documents processed through our service available to the public for as long as you desire.

I’ll detail those below. But first, a little context.

DocumentCloud was created as a 501(c)(3) nonprofit organization and remains so as part of Investigative Reporters and Editors (IRE). The service is offered free of charge and all expenses and manpower are covered through grants and IRE’s normal operations. We provide a suite of tools that allow you to analyze and publish documents, and we don’t control what you post. And we don’t have a large budget to fight legal challenges to items you post.

There have been only a handful of cases in which DocumentCloud has received legal challenges to material posted on the site, where we now host more than 4 million pages.

Typically those challenges have involved allegations of copyright violation. In every case, we have contacted the posting news organization and asked them how they would like to handle the complaint. Our terms of service detail how we handle those cases, using a process based on the Digital Millennium Copyright Act (DMCA). DocumentCloud is a neutral party hosting content on behalf of users and is protected by the DMCA’s safe-harbor provisions. If we receive a formal complaint, we contact the organization that uploaded the material. If they assert their right to publish, the documents remain public and the matter is resolved between the complainant and the posting organization.

We also offer an alternative for organizations that would prefer to host their own documents and still use DocumentCloud’s viewer. A number of news organizations have chosen this option, for a variety of reasons. We make document data and our viewer code available for download to journalists directly through our workspace. Downloading a viewer will provide a news organization with an HTML file that is functionally indistinguishable from the viewers we host.

It’s also worth noting that all of our software is available free and open source to any journalist or software developer who wishes to use or improve upon it (and members of both groups have done so).

The case that came up last week involving the Australian Financial Review presented some new issues. The company filing the complaint over the posted emails alleged a variety of issues, but didn’t cite the DMCA. AFR opted to take down the documents rather than provide us with a letter asserting their right to publish and offering indemnity for DocumentCloud. AFR said it did so because it believes that action is more appropriate in Australia and it did not wish to become involved in a U.S. dispute with NDS. They opted to download the viewer, and AFR plans to repost the documents using the DocumentCloud software.

Dealing with such challenges is an inevitable byproduct of hosting documents. If you have questions about our policies or suggestions on how we can improve our service, please get in touch; my email is mhorvit@ire.org.

Welcome Aboard, Ted Han

Posted
Sep 21st, 2011

Tags
IdeaLab,People

Author
Amanda Hickman

Back in August, we announced that we’d be welcoming a new lead developer, but he’s been on the job two weeks already and we managed to forget to say anything like “Welcome aboard!”

Well, better late than never.

Much Ado About Obama’s Birth Certificate on DocumentCloud

Posted
May 11th, 2011

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

As we watched traffic stats skyrocket last month as newsroom after newsroom uploaded President Obama’s birth certificate to DocumentCloud and then embedded it, my reaction was hardly one of joy.

Why on Earth is a birth certificate more interesting than, say, the pages and pages of receipts documenting some outrageous meals (15 steaks, two orders of fish and a lamb chop for five people) submitted by National Grid to the Long Island Power Authority after their Hurricane Earl cleanup?

I like to think these are the documents we built DocumentCloud for — that we’re here to give a leg up to reporters scrutinizing spurious spending reports (reporting that prompted a formal state investigation) or documenting patent dishonesty and the unusual lengths one California town went to in order to conceal extraordinary salaries paid to city officials.

Vote of Confidence

Forgive me if I was underwhelmed by all the attention that the birth certificate got. My esteemed colleagues, however, helped me see the bright side of the flurry. For one thing, it was fast. Within minutes, 10 different newsrooms had uploaded the birth certificate and embedded it.

That says a lot: It says that when they have something they know their readers want to see, reporters turn to DocumentCloud. That’s a huge vote of confidence in us. Plus, we didn’t falter under the weight of the tenfold increase in traffic — that’s solid architecture for you. We built DocumentCloud with the hope that we could improve the way newsrooms share source documents with their readers, and at that, we’re thrilled to be succeeding.

Increasingly, DocumentCloud is a resource for breaking news. When the news broke that Osama bin Laden had been killed in a town called Abbottabad, a search for “Abbottabad” turned up some pretty rich stuff, most notably that a former Gitmo detainee led U.S. authorities to the Pakistani town back in 2008.

New Feature Roundup

Meanwhile, we’re still listening to our users and looking for more ways to make DocumentCloud easier to use and to help reporters give their readers the documents behind the story.

We’re looking forward to seeing what our users do with our new tool that lets you embed a single annotation, and we’re excited to watch the great uses newsrooms have put document sets to.

From embedding documents accumulated over two decades spent covering an Oregon commune where things went horribly awry to sharing the documents detailing the Federal Reserve’s support for ailing financial institutions, or the background material from coverage of a profoundly embarrassed local philanthropist, reporters seem to be getting the hang of embedding document sets.

So we have a question for the reporters who have been using DocumentCloud already: What would have made this even easier for you?

Discuss Much Ado About Obama’s Birth Certificate on DocumentCloud on PBS’s IdeaLab.

How DocumentCloud Helped Award-Winning Investigations

Posted
Apr 14th, 2011

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Investigative Reporters and Editors (IRE) announced their medal winners this week, and we were impressed to see that both winners wove DocumentCloud into their winning reporting. Since 1979, IRE has honored outstanding investigative work with their annual awards. This year they honored a Los Angeles Times series on outrageous salaries in one of California’s poorest towns and a collaboration between the International Consortium of Investigative Journalists and the BBC for a report on the global asbestos trade.

Breach of Faith

Los Angeles Times was awarded an IRE Medal for Breach of Faith. An investigation of financial impropriety in a small town revealed what turned out to be a dramatic case of corruption and mismanagement in the quiet city of Bell, California. Los Angeles Times reporters uncovered exorbitant city salaries (including compensation packages topping the million dollar mark) and errors in financial reporting that were more than just mistakes.

They used DocumentCloud to post the falsified salary information that city administrators had provided to concerned citizens years earlier, and the subsequent indictment of the administrators who provided those false records.

Dangers in the Dust

International Consortium of Investigative Journalists and the BBC share an IRE Medal for Dangers in the Dust: Inside the Global Asbestos Trade. Throughout their year-long reporting project, they added all manner of document source material to a growing archive of documents.

Great work, and congratulations to both teams.

More Prize-Winning Work

We’ve seen other prestigious journalism awards go to DocumentCloud users, too. Last month, Alex Richards and Marshall Allen (who has moved from the Las Vegas Sun to ProPublica since) were honored with the Goldsmith Prize for Investigative Reporting, given each year by Harvard’s Shorenstein Center to honor investigative reporting that “promotes more effective and ethical conduct of government, the making of public policy, or the practice of politics” for their in-depth report on hospital care in Las Vegas.

Alongside each of the five stories that made up that award-winning coverage, Richards and Allen used DocumentCloud to share their source documents with readers. It’s great reporting and exactly the kind of work we imagined we could help support when we set about building DocumentCloud.

At least one finalist for the Goldsmith Prize also put DocumentCloud to excellent use. ProPublica, NPR’s “Planet Money” and Chicago Public Radio’s “This American Life” collaborated on Betting Against the American Dream, an alarming exposé of Wall Street’s role in exacerbating its own meltdown. ProPublica used DocumentCloud to detail correspondence with their uncooperative subjects.

Discuss How DocumentCloud Helped Award-Winning Investigations on PBS’s IdeaLab.

DocumentCloud Enables Public Searches, Embeddable Sets

Posted
Mar 31st, 2011

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

We quietly opened DocumentCloud’s catalog to public searches in January, and we’ve been working since to do more with the great documents that reporters have added to our catalog.

When Vancouver Sun investigative reporter Chad Skelton asked if there was a way to automate display of the growing cache of documents he was retrieving from the city’s ferry authority, the best answer we could offer was to point his readers to a search for the DocumentCloud project he was stashing them in. Our goal from the outset has been to help news organizations make their own substantive reporting more engaging online, not to drive traffic to DocumentCloud.org. Moreover, Chad was far from the only reporter asking us to make it easier to embed whole document sets. Homicide Watch even built a JavaScript widget to embed their sets. So the latest DocumentCloud feature, built out by our own Samuel Clay, is embeddable document sets.

Any DocumentCloud user can embed pretty much any set of documents on their site. It works whether or not the user’s own newsroom originally published the documents. This means that the Vancouver Sun can embed their ferry documents, and that any user can embed a set of documents matching a search for Scientology. Documents initially published by the New Yorker will open on newyorker.com while documents that were published by ProPublica will open there. Someone could also embed the complete set of public documents that match a search for former Illinois governor Rod Blagojevich.

More Tools

We’ve added plenty more tools to help newsrooms get the most out of DocumentCloud, too. A dozen different “How do I …” questions led us to dramatically increase the options available when users publish documents. Plus, a brainstorming session with American Public Media’s Andrew Haeg in the halls of this year’s Online News Association conference led to a tool newsrooms can use to share documents with reviewers outside of the newsroom.

Our users continue to help us make the most of the tools we’ve built, too. It’s been a few weeks since the unstoppable Chicago Tribune news apps team released dcupload, but the Python script, written against our API, makes it a whole lot easier for DocumentCloud users to upload a great heap of documents in one fell swoop.
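For readers curious what a bulk upload script looks like in outline, here is a minimal sketch in Python. It is not dcupload's code and it does not call the DocumentCloud API: the function names `find_documents` and `bulk_upload`, and the `upload_one` callable that stands in for the actual API request, are all hypothetical names chosen for illustration. The sketch only shows the general shape of the job, which is to walk a folder, pick out the documents, derive a title for each and hand them off one at a time.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def find_documents(folder: str, suffixes: Iterable[str] = (".pdf",)) -> List[Path]:
    """Return files under `folder` with a matching extension, sorted by path."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def bulk_upload(folder: str, upload_one: Callable[[Path, str], None]) -> List[str]:
    """Call `upload_one(path, title)` for every document found in `folder`.

    `upload_one` is a hypothetical stand-in for whatever function actually
    sends a file to the service; this sketch never touches the network.
    Returns the list of titles that were handed off, in upload order.
    """
    titles = []
    for path in find_documents(folder):
        # Derive a readable title from the file name (an assumption, not a rule).
        title = path.stem.replace("_", " ").replace("-", " ")
        upload_one(path, title)
        titles.append(title)
    return titles
```

Passing the upload step in as a function keeps the folder-walking logic separate from any particular API, so a newsroom could try it against a dry-run function that only prints file names before pointing it at a real uploader.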

Discuss DocumentCloud Enables Public Searches, Embeddable Sets on PBS’s IdeaLab.

DocumentCloud Passes Major Milestone: 1 Million Pages Uploaded

Posted
Mar 1st, 2011

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

DocumentCloud’s Jeremy Ashkenas collaborated on this post.

It has been less than a year since DocumentCloud began adding users to our beta. Late Monday morning, a user uploaded our millionth page of primary source documents.

The thousands of documents in our catalog have arrived in small batches: five pages here, twenty there. The vast majority of the 65,000 documents that those million pages comprise remain private, but we’re fast closing in on 10,000 public documents in our catalog.

Broad Appeal

Journalists are using DocumentCloud to publish all sorts of documents, including these:

Remaking History

Documents in our catalog reach back into the past, as well. In 1970 Ruben Salazar was killed by police while covering an anti-war protest in east Los Angeles. A story rife with controversy, questions, and suspicions, his death became a rallying point in the Mexican American civil rights movement. Forty years later — after refusing a public records request for documents that might shed some light on the circumstances of his death — the Los Angeles County Sheriff’s Department agreed to turn the files over to the Office of Independent Review.

While Los Angeles Times reporters waited for the report, they assembled their own folio of early clippings on Ruben Salazar. Readers can review FBI files obtained by the Times in 1999 and LAPD records on the department’s repeated clashes with the journalist as well as a draft of the report prepared by the Office of Independent Review.

Join the Cloud

You can browse recently published documents by searching for “filter: published” or read up on other searches you might want to run. Here’s hoping that the next year brings millions more pages, and more great document-driven reporting.

Discuss DocumentCloud Passes Major Milestone: 1 Million Pages Uploaded on PBS’s IdeaLab.

Which Metrics Matter for Measuring User Engagement?

Posted
Jan 18th, 2011

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Gail Robinson’s recent post on traffic in a post-loyal era got me thinking about measures of web traffic and, more broadly, how to measure the impact of non-profit journalism.

I certainly don’t disagree with Gotham Gazette’s decision to pass on providing Yahoo with free content. There’s no good reason that Yahoo can’t create a lively community without wholly reprinting Gotham Gazette’s excellent original reporting free of charge.

There are probably good reasons that it would complicate Gotham Gazette’s work to license stories to a commercial outlet like Yahoo Local, too: As a non-profit, the local policy publication regularly livens up stories by illustrating them with images licensed only for non-commercial use, or by independently licensing photos that aren’t available under a Creative Commons license at all. Sorting out the images that can be re-licensed to a commercial entity like Yahoo isn’t a trivial project, especially not for a small local publication.

It doesn’t look like Gotham Gazette is alone in declining Yahoo’s advances — Yahoo Local’s New York City page was recently dominated by pleas for piety from someone in Georgia:

yahoolocal_crop.png

And I definitely appreciate the impulse to own your traffic. One of the reasons DocumentCloud is thriving right now is that we’ve been very careful to ensure news organizations aren’t handing traffic off to us. They own their traffic. They can keep track of their readership numbers, evaluate efforts to increase site visits, and slap as many ads and extra navigation elements on embedded documents as they want. Even so, they want more: Users and prospective users alike regularly ask for better metrics on the documents they’re publishing.

Meaningful Metrics

Oakland Local, a project as commendable for its willingness to share insights as for its local coverage and community, has been quite open about the stats they look at as meaningful: Page views, unique visitors, average time on site and returning traffic. Returning visitors made up half their traffic when they spoke with Michele McLelland last spring. They also keep an eye on where their readers are coming from — they’re interested in how much of their audience is reading from Oakland.

When I was at Gotham Gazette, in addition to those basic web analytics, I kept a close watch on our comments — their vibrancy struck me as a good measure of participation.

So what do you measure?

So I’m curious: Do you look for measures of your impact beyond the kind of numbers you show to advertisers? Share your thoughts in the comments below.

Discuss Which Metrics Matter for Measuring User Engagement? on PBS’s IdeaLab.

Altering Docs? Now There’s a Tool for That in DocumentCloud

Posted
Dec 9th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

When we embarked on the DocumentCloud project, tools for altering documents were the furthest thing from our minds. After all, a responsible journalist doesn’t tweak source documents!

But one of the first papers to embed material using DocumentCloud needed to do just that. The Chicago Tribune accompanied their coverage of a troubled foster home with a collection of letters and court orders. Though the documents offered an excellent illustration of the state child services agency’s lax oversight and slipped follow-ups, they were predictably full of personal information about children in the foster care system, individual agency staff names and other personal and identifying details about private individuals that the Tribune opted to omit from their reporting. That decision, however, left the news apps team replacing the whole stack of letters multiple times before the package was finally ready to post.

A tool, right inside of DocumentCloud, for replacing, removing and re-ordering the pages of a document would have helped them a lot.

When the “PBS NewsHour” shared a century old hand-written Mark Twain essay, our OCR tools were not nearly up to the task of reading his handwriting. NewsHour transcribed the 10-page essay by hand and we worked with them to replace the text stored in DocumentCloud and displayed on the embedded letters.

By the time that Memphis’ Commercial Appeal wanted to run a lengthy series of handwritten letters from Abdulhakim Mujahid Muhammad, a young Memphis-born man who opened fire on a military recruiting center in Little Rock last summer, we at DocumentCloud were busy supporting nearly 200 newsrooms — offering to hide the text tab was the best we could do.

What NewsHour and Commercial Appeal really needed was a tool, right inside of DocumentCloud, with which to edit the text of each document.

And so, we’ve assembled what we think is a sweet suite of tools to let you re-order pages, insert new ones, delete old ones and edit the text that will appear in your embedded document. Check out our user guide to see how it all works. We welcome your bugs, feedback, rants, raves and, as ever, your documents.

Discuss Altering Docs? Now There’s a Tool for That in DocumentCloud on PBS’s IdeaLab.

Last Minute News Challenge Tips: Tell a Story, Be Realistic, and More

Posted
Nov 24th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Planning to spend the long weekend finalizing your Knight News Challenge application? It’s too late for my favorite bit of advice (“don’t wait until the last minute!”), but as someone who’s been involved with three different winning projects, I like to fancy that I’ve got some insight into what makes a good project.

A half dozen prospective applicants have sat down with me to workshop their News Challenge ideas, and I think I’ve helped them think through their projects to get them to a more viable place. The application process isn’t hard, but you do need to give some sincere thought to your project or you’re just wasting your time. Here’s the advice I keep giving people: Continue reading »

DocumentCloud Users Make Ballot Design An Election Issue

Posted
Oct 27th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

When we make lists of the kinds of source documents users can upload to DocumentCloud, they can get pretty long. DocumentCloud is court filings, hearing transcripts, testimony, legislation, lab reports, memos, meeting minutes, correspondence. I can say with absolute confidence that in all of our planning, “ballots” never once came up as the sort of document a news organization might want to annotate for readers. Our relentlessly creative users have shown us otherwise.

This summer, the Memphis Commercial Appeal rounded out its guide to August’s primary elections with a sample ballot. Their digital content editor told us that many readers who’d missed the sample ballot in the print edition turned to the version online as primary day approached. Earlier this month, they added the general election ballot to that guide.

New York Ballots

WNYC, New York City’s NPR affiliate, also published a few ballots this summer. In an effort to comply with a 2002 federal law that mandates significant updates to voting systems in each state, New York City introduced paper ballots for the 2010 primary election, replacing the city’s famously arcane voting machines. One look at the new design and everyone was up in arms, proclaiming its absurdity, but WNYC actually invited a group of ballot design experts to review the city’s new ballots. Their findings: the ballot was confusing.

Design for Democracy works to increase civic participation, in part through a ballot design project that aims to make voting easier and more accurate. WNYC used Design for Democracy’s feedback to annotate a sample ballot on their blog, offering readers vital voting advice.

When the city released sample ballots for November’s general election, a local think tank pointed out that the instructions erroneously advise voters to mark the oval above their candidate’s name. In fact, the relevant ovals appear below candidates’ names. WNYC highlighted the issue by embedding a sample ballot on their blog. Apparently the “oval above” language was mandated by state law. Don’t believe me? See for yourself: WNYC posted the legislation, with the relevant passage highlighted.

From now on, my laundry list of things DocumentCloud catalogs will most definitely include ballots.

Discuss DocumentCloud Users Make Ballot Design An Election Issue on PBS’s IdeaLab.

The Best User Feedback Comes From Watching and Listening

Posted
Oct 12th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

ProPublica used DocumentCloud to develop an excellent story they published Friday. I’d planned to write it up, but Krista Kjellman Schmidt, the news applications editor who worked on the story, put it much better than I ever could have. Here’s the opening of her post:

On Oct. 8, we published an investigation examining how a judicial opinion in a pivotal lawsuit brought by a Guantanamo detainee vanished, only to be replaced weeks later by an entirely different opinion. At the center of our reporting are two documents representing separate versions of that same opinion: the original opinion written by Judge Henry H. Kennedy, and a second opinion quietly put in the original’s place more than a month later.

Read the rest of the post to enjoy the tale of these two important documents. What Schmidt doesn’t mention, however, is that I just happened to have been camped out in ProPublica’s offices last week while she was putting the story together. House guests and a nearby demolition project conspired to drive me from my home office, so I had an unusual vantage point on their story.

At one point, while I was savoring an excellent apple cake (I should decamp to Exchange Place more often!), I overheard the news applications team comparing notes on how best to prepare thumbnail images for a chart that was to accompany the investigation, and I realized that we had failed to alert our users to some of the bells and whistles under the hood of DocumentCloud.

For instance, we’ve got those thumbnail images ready already. No need to break out Firebug and manually resize graphics!

Which brings me to my real point: The only way we know what our users need from us is by watching them try to use DocumentCloud and listening to them describe the use cases at the outer edges of what we’d expected.

Discuss The Best User Feedback Comes From Watching and Listening on PBS’s IdeaLab.

DocumentCloud Helps Newspapers Bring Transparency to Government

Posted
Sep 7th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Since we last updated readers on DocumentCloud’s progress, we’ve made it much easier to upload a lot of documents at once, and introduced a related documents search that uses data about names and places provided by OpenCalais to find documents that are probably related to the one you’re looking at. We’ve also added a bit more context to the data we help reporters comb through. Most of this work is happening inside the gates of the DocumentCloud workspace, but it is resulting in some lively reporting. For example…

Using Documents to Tell the Story

This summer, as the federal 5th Circuit Court of Appeals prepared to hear arguments in a challenge to the University of Texas’s affirmative action policy, Texas Tribune complemented its coverage of the case with nearly 200 pages of annotated court documents, including the original district court ruling, the university’s appellate brief, as well as that of the plaintiffs in the case.

The Las Vegas Sun incorporated quite a trove of documents into its series on hospital care in Las Vegas. Readers were invited to browse everything from Department of Health and Human Services reports to individual records, right along with the Sun’s reporters. When they say that hospital-acquired infections cost the country $30 billion per year or account for close to 100,000 deaths, they back each number up with original documents.

The Columbia Missourian annotated the city budget and took a local blogger to task for exaggerating Columbia, Missouri’s cash reserves.

When Texas Governor Rick Perry challenged reporters to find anyone who can out-work him, Texas Tribune posted the governor’s May 2010 schedule alongside that of Florida’s Gov. Crist, New York’s Gov. Paterson and California’s Gov. Schwarzenegger and invited readers to help them skim over a hundred pages of briefings, receptions and photo ops for stories deserving of a closer look.

The Washington Post supplemented its reporting on the cozy relationship between the oil industry and the federal agency assigned to regulate them with an annotated report on the prospects for “Moving beyond Conflict” between regulator and regulated. Their document cache also included reports outlining just how cozy things had gotten by 2008. As Emily Keller pointed out in Free Government Info, a transparency project, documents like these give more transparency to journalism itself.

New Features in the Testing Lab

We’re also hard at work fine tuning the document viewer, transforming it into something that users could reasonably plug into a template with a narrower content column. Thus far folks have been stuck with a full page viewer. We haven’t fully rolled it out yet, but we’ve worked with a couple of our beta testers to implement it already.

Iowa State has a new men’s basketball coach, and the Des Moines Register included all 14 pages of his contract in their coverage of the finer points contained in it. Among the unusual clauses? Hoiberg can walk away if the university decides to increase academic standards for student athletes beyond the NCAA’s minimum.

Meanwhile, at the Santa Fe Reporter, Alexa Schirtzinger opted not to publish tables of information right inside her story on elder abuse in New Mexico, but she did use her staff blog to share the data that she had such a hard time tracking down. An annotation highlights the numbers that showed her that New Mexico fields more abuse complaints per nursing home bed than any other state.

DocumentCloud watchers will notice that the Register posted the contract right on the same page as Randy Peterson’s writeup instead of displaying the document in a full page. We’ll be making tweaks like this a lot easier for all of our users. In the meantime, if you’re skilled at the art of reverse engineering JavaScript, you can view the source of the Register’s story (or the Reporter’s) to see just how they toggled the sidebar or zoom on those documents.

Discuss DocumentCloud Helps Newspapers Bring Transparency to Government on PBS’s IdeaLab.

DocumentCloud Helps Arizona Paper with Annotated Immigration Law

Posted
Aug 3rd, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

We opened the DocumentCloud floodgates less than six months ago and we’re still working hard to make DocumentCloud a better tool. We’re rolling out improvements at a healthy clip, including SSL support, better documentation, and support for cross-newsroom collaboration. We continue to listen to feedback from our really incredible crop of beta testers (who now number close to 500!).

There are nearly 100 newsrooms participating in the DocumentCloud beta and requests are still pouring in. We’ve been doing a fair amount of outreach and more is in the works, but it turns out that our users are our best advocates: After John Addams in Great Falls, Montana, blogged about his experiences with DocumentCloud we were deluged with requests from Montana news organizations large and small.

Uses in Arizona, Chicago, Memphis

The really great stories about how reporters are using DocumentCloud continue to surprise all of us.

Not long after Arizona’s governor signed that state’s now infamous immigration law, the Arizona Republic published the bill in full, complete with annotations by a local law professor. Republic reporters told us that traffic to the annotated legislation outpaced the paper’s popular entertainment guide in its first weekend, and continues to draw traffic as the bill stays in the news.

Meanwhile, in Chicago, reporters at the Tribune have been uploading each document and transcript entered into evidence in former governor Rod Blagojevich’s corruption trial — the documents are just part of their extensive coverage of the trial.

In Memphis, the Commercial Appeal published a sample ballot alongside their voter guide.

These are just a few of the great uses reporters have put DocumentCloud to — there are many more great stories already out there and plenty of new ones on the way.

Discuss DocumentCloud Helps Arizona Paper with Annotated Immigration Law on PBS’s IdeaLab.

Gathering Examples of Collaboration in Investigative Reporting

Posted
Jun 23rd, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

I recently attended the Investigative Reporters and Editors 2010 conference and ended up talking with Astrid Gynnild, a postdoctoral research associate in the Department of Information Science and Media Studies at the University of Bergen in Norway. She’s researching collaborations in investigative journalism.

I showed her some of my favorites: The Los Angeles Times, ProPublica, ABC News and Washington Post working together on Disposable Army was one of them. Frontline, ProPublica and the Times-Picayune’s coverage of police shootings that were never investigated after Hurricane Katrina is also great reporting, but she’s looking for more. Are you collaborating internationally? On an enterprise reporting team? Do you work closely with one or two other reporters in your own newsroom? She’s curious to hear about it. I am, too. What makes collaboration work when you’re an investigative reporter?

Share your thoughts and experiences in comments. Or, if you don’t want to comment here, you can write to her directly at astrid.gynnild@infomedia.uib.no.

Discuss Gathering Examples of Collaboration in Investigative Reporting on PBS’s IdeaLab.

In Need of a DocumentCloud for Video, Data

Posted
Apr 30th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Brainstorming the next brilliant News Challenge project? I’ve got two for you, and you’ve got until fall to noodle over them.

As the program director for DocumentCloud I spend a lot of time talking to journalists, writers and researchers about what DocumentCloud is and, often, what it isn’t. DocumentCloud is great for documents. It is a repository of primary source texts and a great set of semantic analysis tools for text.

Whether you want to use our annotation tools for reporting jujitsu, as ProPublica did when the subject of an extensive report offered only “no comment” on nearly every question put to them, or to put broken news back together, as the Chicago Tribune did with the former Illinois governor’s attempts to subpoena President Obama, DocumentCloud is great. For documents.

What About Video, Data?

I’m consistently surprised, though, by folks who want to know how DocumentCloud handles video. Or spreadsheets. It doesn’t. I suppose you might want to annotate a spreadsheet. But rows and columns? Functionally, our software has no idea what those are. Same with pictures and sounds.

You’ll need a different name for your project, but I can tell you now that there are people looking for something like DocumentCloud for data and for video.

Data sets are everywhere. Reporters have them, advocates have them. Transportation Alternatives spends a lot of time FOIA-ing data like the location and severity of pedestrian and cyclist injuries on New York City streets. I happen to know them unusually well (my better half has worked there for almost a decade) but they’re not unique. I’ve had similar conversations with reporters all over the country, reporters with interesting data sets and nowhere to put them.

Video confuses me a little more, but it might just not be my thing. Nonetheless, almost every time I give a presentation, whether it is to a newsroom or a conference, someone asks, “What about video?” My answer is pretty simple: no, no video. No plans to support video. Video and audio material is not text, and figuring out a great way to handle audio clips or videos is just not part of our project. But I get asked about it often enough that I can tell you for sure: There’s something there.

Discuss In Need of a DocumentCloud for Video, Data on PBS’s IdeaLab.

Documents Pouring in as DocumentCloud Goes Beta

Posted
Apr 11th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Eagle-eyed followers of the DocumentCloud Twitter feed have already picked up on the fact that we began adding users to our beta last month.

We made a strategic decision to peg our beta to NICAR’s March 2010 computer-assisted reporting conference, where we knew we’d be able to gather a sizable group of just the sort of investigative reporters we hope to support with DocumentCloud, and get them excited about using our tools to do more with their documents. Nothing beats hands-on support when you’re using a new tool. Plus, we identified dozens of quick fixes we could make after watching over journalists’ shoulders as they explored DocumentCloud.

Close to 300 documents

In the month since NICAR, we’ve added more than 150 users who’ve uploaded a cumulative 54,000 pages of text, and made close to 300 documents available in DocumentCloud. Our repository is already home to police reports from New Orleans, a confirmation hearing transcript that adds context to coverage of Justice Stevens’ resignation, and disaster preparedness plans from Haiti. There’s even a collection of emails that document how some hedge funds not only saw the mortgage crash coming, but wagered on the collapse and won big. (The hedge fund that these reporters investigated argues it never had the hands-on role ascribed to it; that’s in DocumentCloud, too.) Eventually, anyone will be able to connect with those documents right through our website.

Want to be part of the beta? Get in touch and tell us a bit about the documents you’re working with.

We’re still adding beta testers and actively listening to the users we’ve got as we prioritize and refine our to do lists, but we think we’re off to a great start.

Discuss Documents Pouring in as DocumentCloud Goes Beta on PBS’s IdeaLab.

So You Want to Try Crowdsourcing?

Posted
Feb 26th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

It is no secret that I’m always on the hunt for great crowdsourcing projects. We’re still learning a lot about what “the crowd” can tackle and what it can’t, but turning to your readers (listeners, community, neighbors) is a great way to foster civic participation because it gives people a stake in the news.

What I really want to know, though, is what makes crowdsourcing sing? Sunlight’s Transparency Corps project to slice Kentucky legislative voting records has been sitting less than half complete for months now, while the Brooklyn Museum’s “posse” is madly tagging, flagging and organizing digital photos of the museum’s permanent collection. The only reward offered by the Brooklyn Museum, besides a crash course in art history, is a series of short movies of… well, I won’t spoil the surprise. Go tag some art and see for yourself.

Field Guide for Making Crowdsourcing Work

WNYC Labs has been working on an incredibly useful field guide to crowdsourcing. For the harried beginner, they offer 10 simple tips for getting started. Set the tone and be an editor, they advise, which is to say: Crowdsourcing doesn’t have to mean that you turn over the microphone and walk away. Looking for advice on adapting your editorial process or hoping for some case studies? Done, and done. There is no secret sauce to crowdsourcing. No holy grail. But this field guide is the most comprehensive guide that I’ve seen.

I’d been planning to write something as well about the outcome of ProPublica’s Super Bowl Blitz. They asked readers to ask their representatives if they planned to be at the Super Bowl, and then request that they bring their cameras to the game. The only result worth reporting on so far seems to be that a handful of planned congressional Super Bowl parties were scrubbed or rebranded. Their reporting is worth a read if you don’t think of the NFL as a major campaign donor (though it doesn’t look like the crowd played a big role in reporting the story). I haven’t asked ProPublica how they measured the impact the project had on reader engagement. If a group of readers who don’t otherwise spend much time thinking about campaign finance suddenly decide they’re heavily invested in campaign finance, maybe there’s something to it. Hooking substantive news coverage to big sporting events is never an easy assignment.

Discuss So You Want to Try Crowdsourcing? on PBS’s IdeaLab.

New Tools for Mapping News

Posted
Feb 17th, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Want to illustrate a story by displaying data on a map? Don’t have a team of whiz kids at your fingertips? One good option has long been IBM’s Many Eyes. Their maps, however, stop at the state level. Not especially helpful if you cover local politics!

I haven’t struggled too much with maps lately, but a tweet from Sunlight Labs’ Clay Johnson caught my eye this morning, nonetheless. I think that what got my attention was that Clay directed his nudge to Nathan Yau, a bottomless well of great data visualization insights and tutorials if ever there was one. He recently offered up a thorough tutorial on making thematic maps of US counties using free tools.

So what was Clay on about?

ClearMaps, which Sunlight released this week, is “an ActionScript framework for interactive cartographic visualization” and it is open source software. You’ll probably still need a reporter with some programming chops to get the most out of it. It isn’t nearly as plug and play (or upload and display) as Many Eyes, but it should still make it much easier for news organizations to tell stories better. What do you think? Is this what you were looking for?

Discuss New Tools for Mapping News on PBS’s IdeaLab.

How Could News Organizations Manage Documents Better?

Posted
Jan 21st, 2010

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

How are you handling primary source material on your website?

OaklandLocal is summarizing a new report on a shootout in March that left five people dead. They use Scribd to embed reports directly on their site, but can’t provide annotations.

California Watch is looking at what campaign season generosity bought for agribusiness in the Sacramento-San Joaquin River Delta. They put together a great Flash widget that highlights noteworthy portions of the documents they reviewed, but they had to sit down with a highlighter, circle relevant passages, and then scan each document for the site.

ProPublica, the Los Angeles Times, ABC News and the Washington Post are collaborating to report on civilian contractors injured in Iraq who are now struggling to get badly needed health care. But if you want to read the Congressional report that found that covering private contractors has proved quite profitable for insurance companies, you’ll have to download the whole report.

If the reporters behind these stories had DocumentCloud at their fingertips, we could have saved them a decent amount of work, made the documents they wanted to share more accessible, and invited deeper reader examination. And that is just the beginning. We don’t know yet what that examination might yield. Stay tuned!

As we work, we really do want to know: What are you doing with documents now? Can you point out a recent story that you’d like to look at more closely? Please share your thoughts and feedback in the comments.

Discuss How Could News Organizations Manage Documents Better? on PBS’s IdeaLab.

Big Apps Are Here

Posted
Dec 18th, 2009

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

I’ve already voiced my own suspicion that New York City’s Big Apps competition is a deft end run around an actual open data bill in New York City. Nonetheless, some 85 applications built on the city’s currently public data sets are available now to explore and vote on through early January. They include a handful of legislator lookup tools and an unexpected number of park spot finders. There’s also a graffiti finder designed for the curious dual purpose of helping steer both Wildstyle fans and the city’s Anti-Graffiti Unit paint trucks straight to new throw-ups.

Other gems that I’ve been watching include ProPublica’s great and very offline crowdsourcing efforts as part of their coverage of police shootings in New Orleans in the days following Hurricane Katrina.

Discuss Big Apps Are Here on PBS’s IdeaLab.

DocumentCloud Releases More Code, Continues to Attract Developer Interest

Posted
Dec 10th, 2009

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

A public beta of DocumentCloud, one that journalists can kick the tires on and upload documents to, won’t be ready for a few more months, but work is continuing apace in our corner of the cloud.

We’ve released a handful of projects that make up some of the components of our big picture, and it is great to see how well received our work has been by the Ruby and JavaScript communities. Last week we hit a little milestone: more than 1,000 developers are watching DocumentCloud projects on GitHub, which is pretty cool. The advantage for us is that many of these developers are actually trying out our software releases and helping us make them stronger.

Gregg Pollack included a great review of CloudCrowd in a recent episode of his show, Scaling Rails. CloudCrowd will still be Greek to the truly non-technical readers out there, but if you have enough of a handle on software development to wish you understood “scaling” better, his review just might help.

Our latest release, Docsplit, is a command-line utility and Ruby library for splitting documents into distinct components such as raw text (which you need for searches), page thumbnails, and document metadata (details like the document’s author or the number of pages it contains).

Splitting documents apart is a pretty key functionality for DocumentCloud: everything else DocumentCloud does depends on the presence of one or another of these pieces. Docsplit got a lot of attention when we released it on Monday — and we’re all looking forward to seeing what other folks do with it.

Discuss DocumentCloud Releases More Code, Continues to Attract Developer Interest on PBS’s IdeaLab.

Staffing Up DocumentCloud

Posted
Nov 11th, 2009

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

A few months ago (three, to be precise), I quietly announced that I’d be leaving Gotham Gazette for parts unknown. I wasn’t making that up about “parts unknown,” but my announcement did get a few conversations started. The most interesting one turned out to be with Eric, Aron and Scott, who persuaded me to join DocumentCloud as their program director.

I’m pretty thrilled to be joining them: I care a lot about software freedom, improving access to information, and making great software accessible to small organizations. DocumentCloud gives me a great opportunity to approach access to information from a different angle, and to have a hand in developing undeniably excellent tools that will be (some already are) accessible to large and small news organizations alike.

I just started this week, but you’ll be hearing more from me as we proceed, about both our challenges and successes. The first challenge was realizing that it was time to bring someone on board to work with our document partners and help Jeremy Ashkenas, our lead developer, find beta testers to help keep him moving forward. I like to think we handled that one well, and I’m looking forward to more challenges to come.

Discuss Staffing Up DocumentCloud on PBS’s IdeaLab.

Introducing Switch, A News Game About New York City’s Energy Gap

Posted
Sep 30th, 2009

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

Our latest (and last, for now) news game, Switch, is live. It is no Energyville but we think it is pretty awesome. Not only is it live, the source code and installation instructions are already available.

With gadgets guzzling ever more energy, New York City faces a looming energy gap. New Yorkers will have to cut back on our electricity use or start generating a lot more power. Our game lets people explore the options that are on the table, along with a few that aren’t. Should the city ban air conditioning? Harness the tides? Go nuclear? Warning: the game is addictive.

Switch is a concentration-style game that deals each player 18 pairs of cards, each representing an opportunity for the city to conserve or produce electricity. As players match pairs, they’re asked to decide whether each policy initiative is a good fit for New York City. At the end (or whenever the player grows bored!) players “flip the switch” to see how the measures they’ve accepted would add up against the city’s predicted 2030 energy needs.
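
For the code-curious, the mechanics described above can be sketched in a few lines of Python. This is a rough illustration and not the game’s actual source: the `Measure` class, the function names, the placeholder measure names and every megawatt figure below are assumptions made up for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """One way to conserve or produce electricity (hypothetical structure)."""
    name: str
    megawatts: float  # made-up figure, not real policy data


# 18 placeholder measures with invented values, one per pair of cards.
MEASURES = [Measure(f"Measure {n}", 100.0 * n) for n in range(1, 19)]


def deal(measures, rng):
    """Return a shuffled deck holding two cards for every measure."""
    deck = [m for m in measures for _ in range(2)]
    rng.shuffle(deck)
    return deck


def is_match(deck, i, j):
    """True when two different positions in the deck show the same measure."""
    return i != j and deck[i] == deck[j]


def flip_the_switch(accepted, gap_megawatts):
    """Add up the accepted measures and report whether they close the gap."""
    total = sum(m.megawatts for m in accepted)
    return total, total >= gap_megawatts


deck = deal(MEASURES, random.Random(2030))
print(len(deck), "cards dealt")
print(flip_the_switch(MEASURES[:2], gap_megawatts=250.0))
```

A player’s accept or reject choices simply decide which measures end up in the `accepted` list before the final tally.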

We worked with Will James of Tekimaki, whom we met through his very cool subway map project at onNYTurf which, in addition to being both early and awesome, is the only online NYC map I know of that is available in Estonian.

We’ve learned a lot about gaming and news games over the last two years, and a lot about building them on the cheap. More on that after you’ve all played Switch!

Discuss Introducing Switch, A News Game About New York City’s Energy Gap on PBS’s IdeaLab.

Improving Access to Information is One Way to Make Reporting Cheaper

Posted
Sep 9th, 2009

Tags
IdeaLab

Author
Amanda Hickman

Cross posted from PBS Idealab.

When he’s not toasting escapism, our tireless editor Mark Glaser has been asking why reporting costs so much. I can’t tell you much about investigative reporting (a $400,000 product of which started the conversation), except to say that six-figure salaries do add up. But I can tell you that when it comes to local reporting, improved access to information could make a big dent in the expense of getting a story written.

If you want to take a look at distribution of discretionary funds by the New York City Council, you have to start with a 400-page PDF full of tables of information. And then you need someone on hand who knows how to pull tables from a PDF into a workable spreadsheet. That, or you need a pencil sharpener and a calculator. And while highlighters and pencil sharpeners are not blowing holes in anyone’s reporting budget, the hours required to process this information certainly are. The situation is absurd: this information started out in a database and there’s no reason that anyone — whether they’re a reporter, civic gadfly or deli manager — should have to jump through hoops to put it back into a database.

Of course, those hoops are just for information the city already makes public. If you want to know where pedestrians are being hit by cars, or how parking placards are distributed in a city where curbside space is valuable and abuse of parking privileges is well documented, you’d better know who has that data and have someone on hand who can write an airtight FOIL request. Want to know about the distribution of lead poisoning cases in the city? For that you’ll need lawyers.

FOILs take time, which means money. Lawyers, too, tend to want money for their time. One way to make information cheaper is to step up the data requirements in local transparency laws. New York City is considering legislation that would amend existing public records laws to require that information be made available and that it “be presented and structured in a format that permits automated processing.” That is to say, raw data. Just publish it — don’t make us ask.
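
To make “a format that permits automated processing” concrete, here is a minimal sketch in Python of what a reporter could do in seconds if the discretionary funding tables arrived as a CSV file instead of a PDF. The column names and every row in `SAMPLE_CSV` are invented placeholders, not real city data, and `totals_by_member` is a hypothetical helper written for this example.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows; a real file would come from the city.
SAMPLE_CSV = """member,organization,amount
Member A,Example Senior Center,5000
Member B,Example Youth League,3500
Member A,Example Food Pantry,2500
"""


def totals_by_member(csv_text):
    """Sum the amount column for each council member in the CSV text."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["member"]] += int(row["amount"])
    return dict(totals)


print(totals_by_member(SAMPLE_CSV))
# → {'Member A': 7500, 'Member B': 3500}
```

No highlighter, pencil sharpener or calculator required, which is the whole point of asking for raw data.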

With the law itself lingering in committee, the mayor’s office announced a competition, NYC Big Apps, for applications that will use city data. Perhaps the idea is to deflect attention from the bill, which the mayor is no fan of. The contest, which offers a prize that includes dinner with the mayor, is not really a substitute for making data available.

Steve Romalewski, a pioneer of web-based GIS and community mapping projects, is also skeptical of the contest. He notes that it offers no explicit guarantee that any datasets will be fully available for the long haul, and that no one has offered any explanation of why just 80 data sets are included.

Romalewski also rattles off a good list of datasets that are currently only available on a per-request basis — which means, among other things, that you need to know they are there. His list includes the types and locations of small businesses, green spaces, recreational spaces and housing violations, as well as interim multiple dwellings (aka lofts) throughout the city. He also points out that land use data currently must be licensed from the city at a rate of $1,500 per year if you want all five boroughs: not a trivial expense to small projects like Gotham Gazette.

Romalewski argues that we shouldn’t have to ask for data, and that most of what city agencies aggregate belongs in the public domain. I’m with him there, and curious as I am to see what comes out of NYC Big Apps, I’m not convinced that the contest is going to help put city data in the public domain in New York City.

I don’t know whether or not the legislation currently sitting in committee is the answer we need, but I do know that New York City is not alone in needing far better access to the data that civil servants use and aggregate in the course of their work. I also don’t think that simply providing us with the raw data is enough — but at least it’s the bare minimum we need to fill the role of government watchdog.

By the way, if you want that list of under-publicized city data, skip to the comments in Romalewski’s post.

Discuss Improving Access to Information is One Way to Make Reporting Cheaper on PBS’s IdeaLab.