If you’re coming to IRE’s annual data journalism conference March 5-8 in Atlanta, be sure to stop by and say hello to the DocumentCloud team!
We’ve made a few ways for you to learn more about the platform, tell us your ideas, and hear about what’s next for DocumentCloud:
— Saturday at 3:20 p.m., join us for a hands-on class, “Reporting and Presentation with DocumentCloud.” Get to know the suite of tools that DocumentCloud offers to help you better organize, analyze and present public documents.
— Sunday at 11:20 a.m. in the Demo Room, we’ll offer “Advanced DocumentCloud: Examples and Suggestions.” Take a deeper dive into DocumentCloud and its API and bring your ideas for features you’d like to see in the platform. Plus, see some of the best uses of DocumentCloud in the last year!
— Throughout the conference, you can find the DocumentCloud team on hand at its booth outside the conference rooms. We’ll be set up to give you a demo, answer questions about accounts and hear how you use the platform.
During the conference, reach us by email or Twitter. Find Ted Han via email@example.com and @knowtheory; Anthony DeBarros via firstname.lastname@example.org and @anthonydb; and Lauren Grandestaff via email@example.com and @lgrandestaff.
We look forward to seeing you!
If you’d dropped into the DocumentCloud workspace in Columbia, Mo., at the start of January, you’d have found at least two things: a team actively avoiding the single-digit temps outside the office and a whiteboard that we frequently filled with ideas, photographed for posterity, erased and filled up again.
We ended up ignoring the temps. The ideas generated enough heat to last us until summer – and beyond!
So, what’s ahead for 2015? We’ve spent the last year researching and reflecting on what you want, both as a buildup to the recent $1.4 million Knight Foundation grant and to make sure you’re as happy as possible with the service. We want to be sure the platform is fast, reliable, and enables you to do your best work.
There’s a legacy to continue. DocumentCloud was founded by and for journalists to support in-depth reporting around public documents. Today, more than 900 news organizations worldwide use the platform. Whether it’s publishing the documents related to doubts about the guilt of a Texas man executed for murder or the grand jury testimony regarding the death of Michael Brown in Ferguson, Mo., journalists use DocumentCloud to give their readers a first-hand view of the primary source documents they gather.
We plan to build from that success. The DocumentCloud team’s expanding, and we’re locking in a roadmap to grow and improve the platform. We’ve recently hired a director of product development, and we’ve just posted a job description for a front-end developer. With a bigger team and lots of focus, we believe that by the end of the year, you’ll see substantial improvements and expanded offerings that will maintain DocumentCloud’s place as an essential reporting and presentation tool.
So, here are some things we’re going to do to help you:
– Improved processing. If you use DocumentCloud, you upload documents to the platform. You want them processed quickly, without errors, so you can get right to publishing or annotating. Over the last year, we’ve made substantial improvements to our processing cluster and sped up imports of popular documents uploaded by multiple users. And we’re glad that you’re noticing the results. We have more work planned – look for a blog post soon detailing the changes.
– Go mobile. We know (and so do you) that more of your readers view more of your content on their phones. So, we’re planning mobile-specific changes to the viewer to improve scrolling, zooming and the experience in general.
– More storytelling tools. We’re exploring ideas for expanding the display options for the viewer, such as presentation templates, social media sharing, notes displays and more. Many of you have asked for oEmbed support, and we’re looking closely at that!
– Telling DocumentCloud’s story. We’ll bring you more blog posts like this, keeping you up to date on our progress and listening for your ideas. We’ll also tell you more about how to make better use of all the site has to offer — such as deeper search options — and highlight great examples of your storytelling.
– Expanded reach and premium offerings. We want to make sure DocumentCloud is going to be available to journalists for years to come, and one way to do that is for the platform to begin generating revenue. This goal is part of the Knight grant, and so we’ll be exploring options – premium features, opening the tool to additional types of users on a fee basis, donations, and other ideas.
Beyond those, we have many more ideas, among them better feedback on document processing, the ability to rotate pages, better organization of the site and workspace, and batch processing options.
That’s a lot to chew on in one year, but with our team expanding – and, we hope, with continued contributions from the open source community – we’re pretty excited about the prospects.
As always, let us know your thoughts! You can reach us on UserVoice, Twitter, or email.
You can live where you’d like and work flexible hours. We’re a nimble, tightly knit team that works remotely. We scrum daily and stay connected via Slack and video chats. Our code is open source, so your commits to our GitHub repositories will be seen by the growing community that depends on our platform.
You’ll join DocumentCloud at a significant time. We build a civic platform that more than 1,000 news organizations worldwide use for the public good, and we value transparency, accountability and the preservation of a free press. Our tools have been used to investigate and publish stories from the grand jury decision in Ferguson, Missouri, to the Guardian’s NSA spying leaks. We collaborate with organizations like the Washington Post, The Associated Press and Mozilla’s OpenNews fellows to build better ways to present the news, and you’ll have the chance to be part of the community exploring this intersection of news, data and technology.
You’ll be at the center of several of our immediate goals: we plan on improving the experience of reading documents on mobile devices; developing templates for displaying documents; and refreshing our website and user workspace. And we’re interested in your creative input as we navigate DocumentCloud’s path forward.
Here’s what we’re looking for:
In this front-end role, you’ll focus on DocumentCloud’s Backbone/Rails components, which let users upload, organize, research and embed documents – making it easy for users to highlight and publish the newsy parts of documents. You’ll develop for desktop and mobile web across browsers, creating a smart, easy workflow for journalists investigating and publishing documents, as well as for their readers in the public.
We’d like you to have some or most of these skills and qualities:
— Strong ability to collaborate and communicate with a distributed team.
— Independent problem-solver who values learning, keeps current on new trends, but knows how to pick the right set of tools for a problem.
— Able to write clean, well-documented code; you know your way around Git, and your GitHub account shows activity.
— Familiar with Ruby/Rails and SQL/databases or willing to dive in and gain enough knowledge to contribute when needed.
To apply, please contact us at firstname.lastname@example.org
DocumentCloud is a technology service created to help journalists and improve transparency in journalism. Our platform helps journalists find and highlight interesting information in the primary source documents used in their reporting.
Our document research & publishing platform is used by the likes of the New York Times, ProPublica, the Guardian, NPR, La Nacion, Al Jazeera, Homicidewatch and many more. Journalists have used our tools to support their innovative, award-winning and world-changing reporting (you can see examples on our featured reporting page).
With the help of a $1.4 million grant from the Knight Foundation, we’re expanding our team so that DocumentCloud can grow as an organization and sustain itself as an effective and efficient tool for our users.
We’re searching for a product manager who has experience working with software development teams. You need to be comfortable with human-centric design processes (even in an informal way). As manager you’ll work with our development team to plan, prioritize and keep us focused on making good products. Together with our team, you’ll be responsible for developing products & features in line with our mission and our goal of generating enough revenue to sustain DocumentCloud.
We care deeply about contributing to the civic sphere, whether that’s better tools for reporters or open source software and patterns. In fact, the DocumentCloud platform itself was written as open source components, many of which, including Backbone.js, Underscore.js, VisualSearch and Docsplit, have gone on to be adopted in other industries.
So, we hope you’ll join us to make great products with an eye to making the world greater too. You’ll be able to join us from wherever you live, as DocumentCloud operates as a distributed organization. Officially we’re based out of the Columbia, Missouri offices of our parent organization, Investigative Reporters & Editors, but we primarily function through Slack, IRC and Hangouts/Skype.
Email us at email@example.com by October 31st!
By Tom Meagher, data editor at The Marshall Project,
Emily Yount, interaction designer at The Washington Post,
Matt DeLong, national digital projects editor at The Washington Post
and Ted Han, lead developer at IRE/DocumentCloud
On Aug. 3, The Marshall Project, a new nonprofit journalism organization focused on criminal justice issues, published an investigation in partnership with The Washington Post that revealed new evidence raising doubts about a high-profile Texas execution.
Tom: Our reporter, Maurice Possley, began working on this story months before most of the rest of our newsroom at the Marshall Project was even hired. By the time we were able to start helping, the story was mostly reported, so we dove into the documents to bring ourselves up to speed.
The case against Cameron Todd Willingham — who was executed in Texas for the murder of his three daughters — had been written about extensively over the last 22 years, but a lot of new information was uncovered, and it was all in the documents. We knew we wanted to be able to explore and highlight the correspondence that cast this case in an entirely new light. DocumentCloud was clearly the answer.
In the course of his reporting, Possley, who has covered this case for more than a decade, was given access to copies of dozens of primary source documents that tell the backstory of Cameron Todd Willingham and the informer who helped convict him. In filing its grievance with the State Bar of Texas against the former prosecutor in the case, the Innocence Project had acquired these documents and assembled them into a series of appendices. They gave us eight PDF files that added up to nearly 400 pages. We used DocumentCloud to stitch them all back together into one large file.
We then combed through the appendices and dozens of other records of court testimony and correspondence. As we saw the various typefaces and handwriting styles that made up the key passages, we knew we wanted to use DocumentCloud notes to present excerpts directly in the story.
Matt: I started working on the story in earnest a couple of weeks before it published. We were very excited about having so many primary-source documents to enrich the narrative. The Post has been using DocumentCloud for years, but we’ve long been frustrated by one of its biggest limitations: it isn’t mobile-friendly. This isn’t really DocumentCloud’s fault; these scanned documents are a set size, so when you scale them down, at some point words will become too small to read.
We had seen how the New York Times addressed the problem, by putting up the text and linking to the original document in DocumentCloud. That’s totally logical and fine if the words are all that you care about, but in this case we have official letters and handwritten notes between the characters in the story. The pages themselves are interesting, and many readers will want to see them with their own eyes.
We decided at the outset that however we ended up displaying the documents we included in the story, they had to be responsive. But this meant we’d have to come up with our own hack. Emily and I had both been thinking about this problem individually for a while, and we had some time to work on it, so we decided to try to figure out a solution that we could use in this project.
Emily: At the time of publication, DocumentCloud’s note embed code already resized and repositioned the note based on the width of the DC-note-container div, so I knew we only needed to solve for when the note is wider than the note embed and the right side of the note is cut off (see image below).
The note’s original coordinates, width and height allow us to determine how wide the note is in relation to the document image. With that, we can resize the document just enough that the note, rather than the full document image, fills 100% of the embed’s width. This helps with readability by making the text as large as it can be. At times, depending on the width of the note and the size of the text, there will still be readability issues, so cropping the annotations carefully and testing to make sure they are readable is really important.
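To make the arithmetic concrete, here is a minimal sketch of that resizing step in Python. It is not the code that shipped in documentcloud-notes; the `NoteRegion` class, the `fit_note_to_embed` function and the returned field names are all hypothetical, and the sketch assumes the note is only shrunk when it is wider than the embed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteRegion:
    """Note rectangle in pixels of the full-size page image (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def fit_note_to_embed(note: NoteRegion, image_width: float, embed_width: float) -> dict:
    """Scale the page image so the note is never wider than the embed."""
    if note.width <= 0 or image_width <= 0 or embed_width <= 0:
        raise ValueError("note, image and embed widths must all be positive")

    # Assumption: leave the note alone when it already fits, and shrink
    # only when its right edge would otherwise be cut off.
    scale = min(1.0, embed_width / note.width)

    return {
        "scale": scale,
        # New width of the whole page image after resizing.
        "image_width": image_width * scale,
        # Negative offsets slide the image so the note starts at the
        # top-left corner of the embed.
        "offset_left": -note.left * scale,
        "offset_top": -note.top * scale,
        "note_width": note.width * scale,
        "note_height": note.height * scale,
    }


# A 600px-wide note on a 1000px-wide page, shown in a 300px-wide embed.
layout = fit_note_to_embed(NoteRegion(100, 200, 700, 500), image_width=1000, embed_width=300)
print(layout["scale"])  # 0.5
```

Because the scale is capped at 1.0, a note that already fits is left at its original size, which matches the idea of solving only for the case where the note is cut off.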
Here’s an example from the Willingham story of a responsive note on an iPhone 5:
Ted: We were thrilled when Ben Chartoff (OpenNews fellow at the Washington Post) reached out to put Emily in touch with us.
We believe deeply in DocumentCloud as an open source project as well as the service to which journalists post documents relevant to the public interest. Emily and Matt’s motivation to extend the behavior which DocumentCloud already provides and to share their code back is exactly the kind of effort we love to see and encourage.
Technology in the world of news is a means toward the end of better reporting. Especially in competitive industries like ours, an open source ethos around the tools we all share is an avenue for us to work together to improve the state of all reporting. Anyone who solves an issue for their own needs can help to solve that issue for everyone.
In that spirit we were excited to incorporate Emily’s code into our own. To do so, we spun our note code off into its own repository to make it easier for anyone to contribute (you can find the code on GitHub as documentcloud-notes). Then with the Washington Post’s & Marshall Project’s stories as a basis we began incorporating the changes. Ultimately, we ended up rewriting much of Emily’s code in the process, but what she had written served as the design criteria to anchor the code we wrote.
Our responsive notes code is live on DocumentCloud now, and journalists needn’t take any additional steps to use it. Any embedded note from DocumentCloud will now behave responsively.
We have a lot of projects going on at DocumentCloud and to serve those goals we’re looking for others to join us! For those who may be unfamiliar with our project, we’ve included the full details below.
DocumentCloud is a web-based platform allowing journalists to upload, analyze, annotate, and publish primary source documents. We want to give journalists the tools to show their audience their source material, not just tell them about it. In addition to the newsrooms worldwide who use DocumentCloud, our open source software projects, such as Backbone.js, Underscore.js, Docsplit, and Jammit, are relied upon by companies such as LinkedIn, Walmart, Foursquare and more. DocumentCloud is run by Investigative Reporters & Editors.
What DocumentCloud is building
- DocumentCloud is growing fast, and we’re looking to accelerate that pace by expanding our tools into other languages beyond English. In the next year we’ll adapt our platform to accommodate multi-language OCR, search indexing, and entity extraction tools.
- DocumentCloud always looks for new ways to present documents and engage readers. We are extending DocumentCloud’s document viewer and annotation tools so that readers can make their own comments and notes on documents.
DocumentCloud is looking for someone with a combination of the following skills
Things we like and hope you like too!
Literate programming; Extracting libraries from app code; Polyglot programming; Web standards; Journalists; Natural Language Processing
Investigative Reporters & Editors is based in Columbia, Missouri, on the University of Missouri’s campus. DocumentCloud is comfortable operating with a distributed team.
You can email us at firstname.lastname@example.org
“The crux of it is that it makes it safe to drop a document into WordPress and be certain that it won’t be broken,” said Amico, of the plugin he authored as an application developer for NPR’s StateImpact project.
Users can post their document using a DocumentCloud button on the post’s toolbar in Visual mode, or a shortcode in HTML mode.
The plugin also allows users to configure the width and height of the document viewer in an administrative panel in the Settings menu. The “Full-width” option is designed to make the document viewer as wide as the post content.
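For readers who haven’t used shortcodes, the sketch below shows roughly what one might look like when built as plain text. This is an illustration only: the `documentcloud` tag and the `url`, `width` and `height` attribute names are assumptions that should be checked against the plugin’s own README, and the `documentcloud_shortcode` helper and the example URL are hypothetical.

```python
from typing import Optional


def documentcloud_shortcode(url: str,
                            width: Optional[int] = None,
                            height: Optional[int] = None) -> str:
    """Build a shortcode string (tag and attribute names are assumed)."""
    # Quotes or brackets inside the URL would break the shortcode syntax.
    if any(ch in url for ch in '"[]'):
        raise ValueError("url must not contain quotes or square brackets")

    attrs = [f'url="{url}"']
    # Only emit a size attribute when the caller overrides the site default.
    for name, value in (("width", width), ("height", height)):
        if value is not None:
            if value <= 0:
                raise ValueError(f"{name} must be a positive number of pixels")
            attrs.append(f'{name}="{value}"')

    return "[documentcloud " + " ".join(attrs) + "]"


# Placeholder URL; a real one would point at a published document.
print(documentcloud_shortcode("https://example.org/documents/sample.html", width=600, height=800))
```

Leaving out `width` and `height` produces a shortcode with only the URL, which in this sketch stands in for falling back on the sizes set in the Settings panel.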
“We wanted to give reporters the ability to make a post that is basically just the document,” Amico said.
The plugin is available on GitHub and the Project Argo site. StateImpact is a spinoff of Project Argo; both are run by NPR. Amico said he wanted to help bloggers use DocumentCloud because he used it as a reporter for PBS NewsHour. “At some point, my plan is to put it into WordPress’s plugin directory,” Amico said.
Samantha Sunne volunteers with DocumentCloud at its hub in Columbia, Missouri. She studies investigative and multimedia reporting at the University of Missouri.
Jeremy Ashkenas, engineer emeritus of DocumentCloud, opened the conference with the first State of the Backbone keynote. We’ve recorded it for those who weren’t able to join us at BackboneConf.