Latest Updates: Our Blog

A summer day’s worth of DocumentCloud updates

Posted
Aug 10th, 2015

Tags
Workspace

Author
Anthony DeBarros

Hello and happy summer! We’ve been busy here at Team DocumentCloud, using the weeks since meeting many of you at IRE 2015 and SRCCON to focus on building a stronger platform and getting in position to ensure the long-term sustainability of DocumentCloud.

There’s lots in motion, and so here’s a quick update on the highlights:

Milestone: 2 million docs

Thank you for keeping us busy! In July, the total number of files uploaded to DocumentCloud passed 2 million, and our platform now holds more than 27 million pages of the documents you’ve gathered. The numbers keep growing as more news organizations join us – more than 1,400 worldwide right now – and as more people use our API for bulk uploads. Keep those documents coming (and we always appreciate a tip at support@documentcloud.org if you’re planning a big drop)!

A mobile-optimized viewer embed

Whenever we chat with our users, the most-requested feature for DocumentCloud is a better experience for viewing our embeds on phones. Well, we’ve heard you and have been busy developing documentcloud-pages, a new responsive embed type that displays a page with minimal chrome but also allows navigation through the entire document. We’re aiming to launch an early version in late August or September; if you’d like to contribute code or issues, please visit the project repository. In fact, we’re excited that the folks at La Nacion are already using the new embed in their Doc2Media project.

Get tips and updates in your mailbox!

We’re here to help, and soon we’ll launch two newsletters filled with info on how to get more out of your DocumentCloud account. News & Tips will highlight new features (and ones you might not know about) plus tips for publishing, collaborating and working with documents. App Developers will offer information for developers working with our API or building news apps based on our open source components. Both newsletters also will highlight great uses of DocumentCloud from around the world. You can sign up now.

An OpenCalais update

Since the launch of DocumentCloud, we’ve used Thomson Reuters’ OpenCalais API for our entity extraction. There’s a new version of the API, and we’re migrating to it this month. There won’t be any immediate difference in how we display entities, but we’re looking at whether the new API may offer us some new features. Stay tuned.

Welcome, Clay Selby, to the team!

There’s a new face at Team DocumentCloud’s daily scrum: Clay Selby of Austin, Texas, joined us in July as a part-time developer (thereby doubling our Texas staff). Clay’s the founder of email marketing startup SocialRest.com and brings a good dose of entrepreneurial experience along with his coding chops. Initially, Clay’s been working on moving us to the new OpenCalais API, but he’ll also be focusing on a lot of our back-end processing improvements.

Out and about

We’ve seen many of you in the last few months at various places on the map, from IRE 2015 in Philadelphia – where we held a hands-on class and talked to hundreds who stopped at our table – to SRCCON in Minneapolis. We had the good fortune to show off DocumentCloud to students at Medill/IRE’s National Security Journalism Data/Watchdog Workshop in Washington, D.C., and we checked in with a couple of our user newsrooms as well. If we’re in your area and you’d like to get together, let us know!

A sustainable DocumentCloud

Last but not least, our current Knight Foundation grant directs us to find ways to make DocumentCloud financially sustainable. Since its launch in 2010, thanks to Knight, our service has been offered to journalists for free. In the spirit of improving journalism and making reporting more transparent, we’re intent on maintaining a level of free access to DocumentCloud for journalists while developing a pricing model around features of the platform. In addition, thanks to a new account signup page, we’re hearing from many people outside of journalism who’d like to use DocumentCloud, and we’re exploring that option. In the weeks ahead, we’ll be reaching out to many of you to discuss our plans.

Thanks for reading! As always, there are several ways to get in touch with us or follow our progress.

A new language for our Workspace: Danish

Posted
May 18th, 2015

Tags
Workspace

Author
Anthony DeBarros

Good news – or, if you speak Danish, gode nyheder! Starting today, Danish speakers can set the DocumentCloud workspace to default to their native language, thanks to translation help from Nils Mulvad, editor at Kaas & Mulvad and associate professor at The Danish School of Media and Journalism.

The addition of Danish increases the number of workspace translations to five. Along with English, we also support Spanish (thanks to work by Fernando Diaz), and Russian and Ukrainian (both thanks to Roman Kolgushev).

Widening our language support remains an ongoing mission at DocumentCloud, part of our commitment to making our platform accessible to journalists around the world. As we wrote in March when we added OCR support for three additional languages, DocumentCloud language support falls into three categories: text search, entity extraction and workspace translation. We also have work under way to support additional languages in the document viewer.

Thanks to recent work by our development team, we’ve made it easier for collaborators to translate our workspace into more languages – and we’re looking for help! If you’re interested in helping bring DocumentCloud’s workspace to your language, please email us at info@documentcloud.org.

Easier publishing with WordPress and oEmbed

Posted
May 5th, 2015

Tags
Documents

Author
Ted Han

Today, we’re making it easier for you to embed documents and notes with an updated WordPress plugin and an oEmbed service to power it.

The WordPress plugin – which builds upon an earlier version developed by Chris Amico for NPR’s StateImpact project – adds the ability to embed notes as well as documents using shortcodes. And we’ve updated our embed wizard so it will generate the WordPress shortcodes for you like this:

[documentcloud url="http://www.documentcloud.org/documents/1699074-sb0101-05-enrs/annotations/210824.js" ]

On a published post, that shortcode renders as an embedded note.

The plugin is powered by our new oEmbed API. It's our next step in helping you integrate DocumentCloud embed codes into your content management system so embedding documents and notes is as simple as pasting in a URL.

You can install our plugin right now by visiting its WordPress page. Developers interested in adding simple embedding with our oEmbed API can find its documentation in our help pages.

Read on for more details:

WordPress

Since 2011, WordPress users have been able to embed documents on their blogs thanks to a plugin created by Chris Amico for NPR’s StateImpact project.

Chris’ plugin let users embed documents with shortcodes by translating the shortcodes into HTML embed codes.
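To illustrate what translating a shortcode involves, here is a minimal Ruby sketch that pulls the attributes out of a documentcloud shortcode like the one above. The real plugin is a WordPress (PHP) plugin and its parsing code is not shown in this post; the helper name and patterns below are hypothetical.

```ruby
# Hypothetical sketch: extract attributes from a [documentcloud ...] shortcode.
# This is not the WordPress plugin's code; it only illustrates the idea.
SHORTCODE_PATTERN = /\[documentcloud(?:\s+([^\]]*))?\]/
ATTRIBUTE_PATTERN = /([\w-]+)="([^"]*)"/

# Returns a Hash of attribute name => value, or nil if no shortcode is found.
def parse_documentcloud_shortcode(text)
  match = SHORTCODE_PATTERN.match(text)
  return nil unless match

  # match[1] is nil for a bare [documentcloud], so to_s yields an empty Hash
  match[1].to_s.scan(ATTRIBUTE_PATTERN).to_h
end

shortcode = '[documentcloud url="http://www.documentcloud.org/documents/1699074-sb0101-05-enrs/annotations/210824.js" ]'
attributes = parse_documentcloud_shortcode(shortcode)
puts attributes["url"]
```

A real translation step would then turn the url value into embed markup; in the new plugin, that markup comes from the oEmbed service.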

With Chris’ help, and input from Adam Schweigert of the Project Largo team, we’ve released a new version of the plugin that adds:

  1. Note Embeds: You can now embed individual notes as easily as documents. Just pass a note URL into the shortcode with the url attribute.
  2. Raw URL Support: You can now paste the URL for a document or note onto its own line, and the plugin will translate that into an embed.
  3. oEmbed Support: Embed codes are now fetched from our oEmbed service rather than generated internally, so they’ll always be up to date.
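Raw URL support depends on recognizing a line that contains nothing but a DocumentCloud URL. Below is a hedged Ruby sketch of that check. The note pattern is modeled on the note URL in the shortcode example above; the ".html" document pattern is an assumption about DocumentCloud's URL format, not something this post specifies, and the function name is hypothetical.

```ruby
# Assumed URL shapes (illustrative only; check the plugin for the real rules).
NOTE_URL = %r{\Ahttps?://www\.documentcloud\.org/documents/(\d+)-([\w-]+)/annotations/(\d+)\.js\z}
DOCUMENT_URL = %r{\Ahttps?://www\.documentcloud\.org/documents/(\d+)-([\w-]+)\.html\z}

# Returns :note, :document, or nil. Only a line holding a bare URL qualifies,
# so a URL mentioned inside a sentence is left alone.
def classify_embed_line(line)
  url = line.strip
  return :note if NOTE_URL.match?(url)
  return :document if DOCUMENT_URL.match?(url)

  nil
end
```

Anchoring the patterns with \A and \z is what keeps ordinary links in running text from being turned into embeds.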

Full installation and usage instructions are available on the plugin page.

Note for users of the existing plugin: Because we’re releasing a whole new plugin, prior installations of Navis DocumentCloud won’t automatically update. You’ll have to deactivate/delete Navis DocumentCloud from your site and install the new plugin. Sorry! This should be a one-time process, and future updates will be delivered through the normal WordPress update mechanism.

oEmbed

We’ve added an oEmbed API to make it easier for developers to get embed codes for documents and notes. oEmbed has become a standard for easily embedding Web content, and it has long been one of our users’ top feature requests. Now, instead of having to reverse engineer the format of our embed codes, developers can just send a request to the DocumentCloud API to ask for a correctly formatted embed code.
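As a hedged sketch of the developer side, the Ruby below builds an oEmbed request and pulls the embed markup out of the response. The url and maxwidth parameters and the html response field come from the general oEmbed specification; the endpoint path is an assumption, so check our API help pages for the real one.

```ruby
require "uri"
require "json"

# ASSUMED endpoint path; confirm against DocumentCloud's API documentation.
OEMBED_ENDPOINT = "https://www.documentcloud.org/api/oembed.json"

# Builds an oEmbed request URL for a document or note URL.
def oembed_request_url(resource_url, maxwidth: nil)
  params = { "url" => resource_url }
  params["maxwidth"] = maxwidth.to_s if maxwidth
  "#{OEMBED_ENDPOINT}?#{URI.encode_www_form(params)}"
end

# Extracts embed markup from an oEmbed JSON body ("rich" responses carry html).
def embed_html_from(response_body)
  data = JSON.parse(response_body)
  raise ArgumentError, "oEmbed response has no html field" unless data.key?("html")

  data["html"]
end
```

In practice you would fetch the request URL (for example with Net::HTTP.get(URI(url))) and pass the body to embed_html_from; separating the two keeps the parsing testable without a network call.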

CMSes that support oEmbed can turn DocumentCloud URLs for documents and notes directly into embeds. Because users paste a plain URL rather than HTML, they no longer have to worry about their CMS eating or mangling an HTML/JavaScript embed code, and developers can use existing oEmbed tools to support DocumentCloud’s current embeds as well as future types of embeds we have in the works.

If you’d like your CMS to support embedding documents as easily as DocumentCloud’s WordPress plugin, you or your developers can read more about our oEmbed service in our API help pages.

Thanks for reading! Follow us on Twitter for more updates.

DocumentCloud adds five to group of advisers

Posted
Apr 16th, 2015

Tags
People

Author
Anthony DeBarros

DocumentCloud is pleased today to welcome five media and technology professionals to its group of advisers. The expanded group, which includes two of the platform’s founders, will help guide DocumentCloud as it develops new offerings and plans for sustainability.

DocumentCloud, which is a service of Investigative Reporters and Editors, serves thousands of journalists worldwide with tools for organizing, researching, annotating and publishing documents gathered while reporting. The expanded advisory group is one of several efforts under way as part of a 2014 Knight Foundation grant that is enabling DocumentCloud to add staff, improve the platform’s efficiency, and implement new features.

The advisers will help the team weigh questions related to technology, market opportunities, product development and revenue models.

“We’re excited to have a strong group of experts who are willing to share their expertise with us,” said Mark Horvit, executive director of IRE. “Their guidance will play a key role in helping us chart the future of DocumentCloud.”

The DocumentCloud advisers include:

Penelope (Penny) Muse Abernathy, the Knight Chair in Journalism and Digital Media Economics at the University of North Carolina and a journalism professional with more than 30 years of experience as a reporter, editor and media executive. @businessofnews

Matthew de Ganon, Senior Vice President of Product Management & Commerce at Softcard, the mobile wallet joint venture of AT&T, T-Mobile and Verizon. @deganon

Eric Gundersen, CEO of Mapbox, a leading provider of custom online mapping solutions. @ericg

Jacqueline Kazil, an Innovation Specialist working on cross-agency platforms for the federal government. @JackieKazil

Scott Klein, an assistant managing editor at ProPublica and a co-founder of DocumentCloud. @kleinmatic

T. Christian Miller, a member of the Investigative Reporters and Editors board of directors, is a senior reporter at ProPublica, which he joined in 2008. @txiatianmiller

Aron Pilhofer, Executive Editor of Digital at The Guardian and a co-founder of DocumentCloud. @pilhofer

Biographies of the advisers are available on our staff page.

DocumentCloud welcomes Justin Reese

Posted
Mar 24th, 2015

Tags
People

Author
Anthony DeBarros

We’re happy to announce that Justin Reese is joining the DocumentCloud development team. Hailing (and working remotely) from Tyler, Texas, Justin will focus on building the next generation of our platform’s front-end components, from the document workspace to embeds to the overall site experience.

Justin comes to DocumentCloud after spending years translating complicated business requirements into simple, usable web apps for companies such as Essilor Labs and Bon-Ton. We’re excited to have Justin as a collaborator. His work shows a thoughtful consideration of users and attention to detail, and as a contributor to projects such as Hack Tyler he shares DocumentCloud’s eagerness to create software that serves the public good. Justin’s artistry extends beyond software: he also makes short films and tolerable Neapolitan-style pizza (which we expect to taste ASAP).

Justin’s hire is funded by our current Knight Foundation grant, which is intended to expand and improve the platform. Our goal is to make DocumentCloud the best document reporting, research and publishing platform for journalists and others who work with public documents, and to ensure the long-term sustainability of the platform. In addition, we’re eager to continue DocumentCloud’s legacy of making elements of the platform available as open-source components, such as our recent release of PDFShaver. Justin will play a key role in helping us make that happen.

Please welcome him to our team. You can reach Justin at justin@documentcloud.org or follow him on Twitter.

Hungarian, Norwegian, Swedish OCR support added

Posted
Mar 12th, 2015

Tags
Documents, Workspace

Author
Anthony DeBarros

Starting today, DocumentCloud users can choose three additional languages to OCR uploaded documents: Hungarian, Norwegian and Swedish. We’ve added the three based on support requests and feedback we heard during last week’s NICAR conference in Atlanta.

The addition brings the number of languages available for OCR to 17. To see them all, click “Manage account” beneath your user name and click the “New Documents” dropdown under “Language Defaults.”
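DocumentCloud runs OCR with the open-source Tesseract engine, so choosing an OCR language ultimately means choosing a Tesseract language pack. A minimal Ruby sketch of that mapping for the three new languages follows. The codes are Tesseract's standard language-pack names; DocumentCloud's internal mapping is not shown in this post, so treat the table and helper as illustrative.

```ruby
# Tesseract language-pack codes for the languages added in this post.
# (Illustrative table; DocumentCloud's real mapping is not shown here.)
TESSERACT_CODES = {
  "hungarian" => "hun",
  "norwegian" => "nor",
  "swedish"   => "swe"
}.freeze

# Builds the argv for: tesseract IMAGE OUTPUT_BASE -l LANG
def tesseract_command(image_path, output_base, language)
  code = TESSERACT_CODES.fetch(language.downcase) do
    raise ArgumentError, "no OCR language pack mapped for #{language}"
  end
  ["tesseract", image_path, output_base, "-l", code]
end
```

Running system(*tesseract_command("page_1.png", "page_1", "Swedish")) would write the recognized text to page_1.txt, provided Tesseract and its Swedish pack are installed. Passing an argument array rather than a single string avoids shell-quoting problems with file names.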

We believe that journalists around the world should have access to tools that enable better reporting, and growing our language support is critical to that. DocumentCloud’s language support falls into three independent categories (so partial support of your language may be possible):

Text search: DocumentCloud fully supports Unicode, allowing users to search documents in a variety of character sets. For scanned documents that require OCR, DocumentCloud uses the battle-tested open-source Tesseract engine, which also powers Google Books and is maintained by Google. The open-source community has contributed language packages for many widely used languages, which allows DocumentCloud to enable them on our platform. So, if DocumentCloud does not yet support your language, please reach out and let us know about your interest!

Entity extraction: DocumentCloud supports identifying people, places, organizations and other entities through OpenCalais. As of now, OpenCalais only supports English, French and Spanish. We are evaluating other tools that would allow us to bring entity extraction to other languages.

Interface translation: Making our tools accessible means more than processing non-English documents. Our users have already collaborated with us to make DocumentCloud’s Workspace user interface available in four languages: English, Spanish, Russian and Ukrainian. If you are interested in bringing DocumentCloud to your language, please email us at info@documentcloud.org.
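Because the categories are independent, a language can be partly supported. The Ruby sketch below models two of them using only the language lists given in this post. OCR is omitted because the post names just three of its 17 OCR languages; the struct and helper names are hypothetical.

```ruby
# Support lists as described in this post (March 2015).
ENTITY_EXTRACTION_LANGUAGES = %w[english french spanish].freeze
INTERFACE_LANGUAGES = %w[english spanish russian ukrainian].freeze

LanguageSupport = Struct.new(:entity_extraction, :interface_translation) do
  # Partial support: at least one category available and at least one not.
  def partial?
    to_a.uniq.size > 1
  end
end

def support_for(language)
  key = language.downcase
  LanguageSupport.new(
    ENTITY_EXTRACTION_LANGUAGES.include?(key),
    INTERFACE_LANGUAGES.include?(key)
  )
end
```

For example, support_for("Russian") reports interface translation but no entity extraction, while French is the reverse.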

Shearing PDFs with PDFShaver at DocumentCloud

Posted
Mar 7th, 2015

Tags
Documents

Author
Ted Han

As of this week, documents uploaded to DocumentCloud will process much faster thanks to a new tool we’ve written called PDFShaver that wraps Google Chrome’s PDFium library.

How much faster? From our preliminary statistics, a lot.

Under the covers, DocumentCloud uses our Docsplit open source library to disassemble documents. Prior to PDFShaver, Docsplit relied upon GraphicsMagick and Ghostscript (GM+GS) to render PDFs and save pages as images.

GraphicsMagick and Ghostscript have served DocumentCloud well, but we’ve had trouble processing some poorly constructed documents that journalists receive from sources — governments, companies and non-profits, for example. Our search for a replacement led us to PDFium, and we found that not only did it solve a number of our issues but it also provided substantial gains in speed.

Testing PDFShaver and GraphicsMagick on 50 documents picked at random from DocumentCloud’s public collection shows that PDFShaver can render documents an order of magnitude faster (here’s our raw data). These data are a preliminary sample, but we’re excited about what they show about the speed gains we can make in our processing pipeline. We’ll continue to track PDFShaver and DocumentCloud’s performance as we make improvements, so look forward to more updates!
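One way to summarize a paired comparison like this is to compute a speedup per document and take the median. The post does not say how its numbers were summarized, so the Ruby below is an illustrative sketch, and any timings you feed it should be your own.

```ruby
# Per-document speedup: baseline time divided by new time for the same file.
def speedups(baseline_seconds, new_seconds)
  unless baseline_seconds.size == new_seconds.size && !baseline_seconds.empty?
    raise ArgumentError, "need the same non-empty set of documents timed both ways"
  end

  baseline_seconds.zip(new_seconds).map { |before, after| before.to_f / after }
end

# The median resists outliers, such as one malformed PDF that takes minutes.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# "An order of magnitude" read here as a median speedup of at least 10x.
def order_of_magnitude_faster?(baseline_seconds, new_seconds)
  median(speedups(baseline_seconds, new_seconds)) >= 10
end
```

Pairing timings per document, rather than comparing total times, keeps one very large file from dominating the result.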

Rendering PDF pages with PDFShaver & PDFium

PDFShaver works by connecting PDFium to Ruby with a C/C++ extension inside a Ruby gem.  PDFium itself is an open source library and the software that powers Google Chrome’s PDF viewer. And aside from taking advantage of the speed and capabilities Chrome’s tools provide, we’re happy to be able to make open source PDF processing easier to access through a programming language such as Ruby.

For example, once the gem is loaded (require "pdfshaver"), picking the landscape-oriented pages out of a document and rendering them is as easy as these three lines:

document = PDFShaver::Document.new("./path/to/document.pdf")
landscape_pages = document.pages.select { |page| page.aspect > 1 }
landscape_pages.each { |page| page.render("page_#{page.number}.gif") }
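The selection above hinges on PDFShaver's page.aspect. Assuming aspect means width divided by height (the post does not define it), the same logic can be sketched without PDFShaver, using a hypothetical Page struct and standard US Letter dimensions in points:

```ruby
# Hypothetical stand-in for a PDF page (dimensions in points). We ASSUME
# aspect = width / height, so landscape pages have an aspect above 1.
Page = Struct.new(:number, :width, :height) do
  def aspect
    width.to_f / height
  end
end

pages = [
  Page.new(1, 612, 792), # US Letter, portrait
  Page.new(2, 792, 612), # US Letter, landscape
  Page.new(3, 612, 612)  # square: aspect == 1, so not landscape
]

landscape_numbers = pages.select { |page| page.aspect > 1 }.map(&:number)
puts landscape_numbers.inspect
```

Using a strict "> 1" comparison means square pages count as neither landscape nor portrait.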

We plan to keep improving PDFShaver and make more of PDFium’s features accessible to give Rubyists, data scientists and journalists a boost for overcoming the impediments that PDFs present.

If you’re interested in installing and using PDFShaver, you can read how on our GitHub repository. And if you’d like to help journalists and others free information from PDFs, your contributions are welcome!

Job posting: DocumentCloud seeks data engineer

Posted
Mar 7th, 2015

Tags
Jobs

Author
Anthony DeBarros

We’re looking for a data engineer to join the growing team at DocumentCloud! If you’d enjoy a chance to help develop the next generation of our service — an open-source civic platform that more than 1,000 news organizations use to analyze, annotate and publish documents for the public good — we’d love to hear from you.

This is a full-time, two-year position with full University of Missouri benefits funded by a grant from the Knight Foundation. We’re a nimble, tightly knit team that works remotely — we stay connected via Slack and video chats — so you can live where you’d like and work flexible hours.

You’ll work on DocumentCloud’s processing pipeline, which makes document collections searchable and analyzable for journalists, with a focus on improving its extraction and analysis capabilities. The pipeline consists of several open source tools wrapped up in our Ruby-based infrastructure (a Rails-driven API and our CloudCrowd parallel processing toolkit). You’ll also play a key role in developing our production API, especially in deciding what information we extract from documents for users and how best to extract it.

Our ideal candidate would have the following skills and qualities:

— Independent problem-solver who values learning, keeps current on trends, and knows how to pick the right set of tools for a problem.
— Able to write clean, well-documented code; you know your way around Git, and your Github account shows activity.
— Strong ability to collaborate and communicate with a distributed team.
— Ruby and Rails.
— Experience with Unix-based systems.
— Some knowledge of data science, linguistics, information extraction or search. SOLR experience is a bonus.
— An interest in language and data processing.
— Knowledge of SQL (Postgres preferred).

You’ll join DocumentCloud at a significant time. We’re enjoying widespread use of our platform, and our tools have been used to investigate and publish stories from the grand jury decision in Ferguson, Missouri, to the Guardian’s NSA spying leaks. We collaborate with organizations such as the Washington Post, The Associated Press and Mozilla’s OpenNews fellows to build better ways to present the news, and you’ll have the chance to be part of the community exploring this intersection of news, data and technology.

To apply, please contact us at jobs@documentcloud.org.

Catch DocumentCloud at NICAR 2015 in Atlanta

Posted
Feb 26th, 2015

Tags
Documents, People

Author
Anthony DeBarros

If you’re coming to IRE’s annual data journalism conference March 5-8 in Atlanta, be sure to stop by and say hello to the DocumentCloud team!

We’ve lined up a few ways for you to learn more about the platform, tell us your ideas, and hear about what’s next for DocumentCloud:

— Saturday at 3:20 p.m., join us for a hands-on class, “Reporting and Presentation with DocumentCloud.” Get to know the suite of tools that DocumentCloud offers to help you better organize, analyze and present public documents.

— Sunday at 11:20 a.m. in the Demo Room, we’ll offer “Advanced DocumentCloud: Examples and Suggestions.” Take a deeper dive into DocumentCloud and its API and bring your ideas for features you’d like to see in the platform. Plus, see some of the best uses of DocumentCloud in the last year!

— Throughout the conference, you can find the DocumentCloud team on hand at its booth outside the conference rooms. We’ll be set up to give you a demo, answer questions about accounts and hear how you use the platform.

During the conference, reach us by email or Twitter. Find Ted Han via ted@documentcloud.org and @knowtheory; Anthony DeBarros via anthony@documentcloud.org and @anthonydb; and Lauren Grandestaff via lauren@ire.org and @lgrandestaff.

We look forward to seeing you!

Ahead for 2015: A Faster, More Productive DocumentCloud

Posted
Jan 20th, 2015

Tags
Documents, Workspace

Author
Anthony DeBarros

If you’d dropped into the DocumentCloud workspace in Columbia, Mo., at the start of January, you’d have found at least two things: a team actively avoiding the single-digit temps outside the office and a whiteboard that we frequently filled with ideas, photographed for posterity, erased and filled up again.

We ended up ignoring the temps. The ideas generated enough heat to last us until summer – and beyond!

So, what’s ahead for 2015? We’ve spent the last year researching and reflecting on what you want, both as a buildup to the recent $1.4 million Knight Foundation grant and to make sure you’re as happy as possible with the service. We want to be sure the platform is fast, reliable, and enables you to do your best work.

There’s a legacy to continue. DocumentCloud was founded by and for journalists to support in-depth reporting around public documents. Today, more than 900 news organizations worldwide use the platform. Whether it’s publishing the documents related to doubts about the guilt of a Texas man executed for murder or the grand jury testimony regarding the death of Michael Brown in Ferguson, Mo., journalists use DocumentCloud to give their readers a first-hand view of the primary source documents they gather.

We plan to build from that success. The DocumentCloud team’s expanding, and we’re locking in a roadmap to grow and improve the platform. We’ve recently hired a director of product development, and we’ve just posted a job description for a front-end developer. With a bigger team and lots of focus, we believe that by the end of the year, you’ll see substantial improvements and expanded offerings that will maintain DocumentCloud’s place as an essential reporting and presentation tool.

So, here are some things we’re going to do to help you:

Improved processing. If you use DocumentCloud, you upload documents to the platform. You want them processed quickly, without errors, so you can get right to publishing or annotating. Over the last year, we’ve made substantial improvements to our processing cluster and sped up imports of popular documents uploaded by multiple users. And we’re glad that you’re noticing the results. We have more work planned – look for a blog post soon detailing the changes.

— Go mobile. We know (and so do you) that more of your readers view more of your content on their phones. So, we’re planning mobile-specific changes to the viewer to improve scrolling, zooming and the experience in general.

— More storytelling tools. We’re exploring ideas for expanding the display options for the viewer, such as presentation templates, social media sharing, notes displays and more. Many of you have asked for oEmbed support, and we’re looking closely at that!

— Telling DocumentCloud’s story. We’ll bring you more blog posts like this, keeping you up to date on our progress and listening for your ideas. We’ll also tell you more about how to make better use of all the site has to offer — such as deeper search options — and highlight great examples of your storytelling.

— Expanded reach and premium offerings. We want to make sure DocumentCloud is going to be available to journalists for years to come, and one way to do that is for the platform to begin generating revenue. This goal is part of the Knight grant, and so we’ll be exploring options – premium features, opening the tool to additional types of users on a fee basis, donations, and other ideas.

Beyond those, we have many more ideas, among them better feedback on document processing, the ability to rotate pages, better organization of the site and workspace, and batch processing options.

That’s a lot to chew on in one year, but with our team expanding – and, we hope, with continued contributions from the open source community – we’re pretty excited about the prospects.

As always, let us know your thoughts!  You can reach us on UserVoice, Twitter, or email.

Job Posting: Come work on DocumentCloud’s front end!

Posted
Jan 20th, 2015

Tags
Jobs

Author
Anthony DeBarros

DocumentCloud, the platform journalists use to analyze, annotate and publish documents, is growing! We have an immediate need for a JavaScript developer/architect who can help build the next evolution of our platform. This is a full-time, two-year position with full University of Missouri benefits funded by a grant from the Knight Foundation.

You can live where you’d like and work flexible hours. We’re a nimble, tightly knit team that works remotely. We scrum daily and stay connected via Slack and video chats. Our code is open source, so your commits to our Github will be seen by the growing community that depends on our platform.

You’ll join DocumentCloud at a significant time.  We build a civic platform that more than 1,000 news organizations worldwide use for the public good, and we value transparency, accountability and the preservation of a free press.  Our tools have been used to investigate and publish stories from the grand jury decision in Ferguson, Missouri, to the Guardian’s NSA spying leaks. We collaborate with organizations like the Washington Post, The Associated Press and Mozilla’s OpenNews fellows to build better ways to present the news, and you’ll have the chance to be part of the community exploring this intersection of news, data and technology.

You’ll be at the center of several of our immediate goals: we plan on improving the experience of reading documents on mobile devices; developing templates for displaying documents; and refreshing our website and user workspace.  And we’re interested in your creative input as we navigate DocumentCloud’s path forward.

Here’s what we’re looking for:

In this front-end role, you’ll focus on DocumentCloud’s Backbone/Rails components, which let users upload, organize, research and embed documents – making it easy for users to highlight and publish the newsy parts of documents. You’ll develop for desktop and mobile web across browsers, creating a smart, easy workflow for journalists investigating and publishing documents, as well as for the members of the public who read them.

We’d like you to have some or most of these skills and qualities:

— Strong ability to collaborate and communicate with a distributed team.

— Independent problem-solver who values learning, keeps current on new trends, but knows how to pick the right set of tools for a problem.

— Able to write clean, well-documented code; you know your way around Git, and your Github account shows activity.

— Familiar with Ruby/Rails and SQL/databases or willing to dive in and gain enough knowledge to contribute when needed.

To apply, please contact us at jobs@documentcloud.org.

Welcome Aboard, Anthony DeBarros

Posted
Jan 6th, 2015

Tags
People

Author
Ted Han

I’m excited to announce that Anthony DeBarros is joining DocumentCloud this week.  We’ve known Tony for a long time as a DocumentCloud user at Gannett Digital and USA TODAY, where he’s led technology projects and data-driven stories and interactives.  Tony will be joining me to lead and manage DocumentCloud’s product efforts as our platform and team grows.

Thanks to a grant from the Knight Foundation, we’re looking forward to expanding and improving our platform. We’d like to find new ways to help journalists with reporting, research and publishing on deadline. The grant also is designed to give us the footing we need to guarantee DocumentCloud’s long-term sustainability, so that journalists can continue to rely on the platform into the future, and Tony is going to be an important part of that process.

So, welcome Tony!  We’re happy to have you.

You can also reach him at anthony@documentcloud.org and on Twitter as @anthonydb, and you can learn more about him on the IRE website.

DocumentCloud searches for Product Manager

Posted
Oct 9th, 2014

Tags
Jobs

Author
Ted Han

DocumentCloud is a technology service created to help journalists and improve transparency in journalism.  Our platform helps journalists find and highlight interesting information in the primary source documents used in their reporting.

Our document research & publishing platform is used by the likes of the New York Times, ProPublica, the Guardian, NPR, La Nacion, Al Jazeera, Homicidewatch and many more. Journalists have used our tools to support their innovative, award-winning and world-changing reporting (you can see examples on our featured reporting page).

With the help of a $1.4 million grant from the Knight Foundation, we’re expanding our team so that DocumentCloud can grow as an organization and sustain itself as an effective and efficient tool for our users.

We’re searching for a product manager who has experience working with software development teams. You need to be comfortable with human-centric design processes (even in an informal way). As manager, you’ll work with our development team to plan, prioritize and keep us focused on making good products. Together with our team, you’ll be responsible for developing products and features in line with our mission and our goal of generating enough revenue to sustain DocumentCloud.

We care deeply about contributing to the civic sphere, whether that’s through better tools for reporters or through open source software and patterns. In fact, the DocumentCloud platform itself was written as a set of open source components, many of which, including Backbone.js, Underscore.js, VisualSearch and Docsplit, have gone on to be adopted in other industries.

So, we hope you’ll join us to make great products with an eye to making the world greater, too. You’ll be able to join us from wherever you live, as DocumentCloud operates as a distributed organization. Officially, we’re based out of the Columbia, Missouri, offices of our parent organization, Investigative Reporters & Editors, but we primarily function through Slack, IRC and Hangouts/Skype.

Email us at jobs@documentcloud.org by October 31st!

How we made DocumentCloud note embeds responsive

Posted
Sep 25th, 2014

Tags
Documents

Author
Ted Han

By Tom Meagher, data editor at The Marshall Project,
Emily Yount, interaction designer at The Washington Post,
Matt DeLong, national digital projects editor at The Washington Post
and Ted Han, lead developer at IRE/DocumentCloud

On Aug. 3, The Marshall Project, a new nonprofit journalism organization focused on criminal justice issues, published an investigation in partnership with The Washington Post that revealed new evidence raising doubts about a high-profile Texas execution.

Tom: Our reporter, Maurice Possley, began working on this story months before most of the rest of our newsroom at The Marshall Project was even hired. By the time we were able to start helping, the story was mostly reported, so we dove into the documents to bring ourselves up to speed.

The case against Cameron Todd Willingham — who was executed in Texas for the murder of his three daughters — had been written about extensively over the last 22 years, but a lot of new information was uncovered, and it was all in the documents. We knew we wanted to be able to explore and highlight the correspondence that cast this case in an entirely new light. DocumentCloud was clearly the answer.

In the course of his reporting, Possley, who has covered this case for more than a decade, was given access to copies of dozens of primary source documents that tell the backstory of Cameron Todd Willingham and the informer who helped convict him. In filing its grievance with the State Bar of Texas against the former prosecutor in the case, the Innocence Project had acquired these documents and assembled them into a series of appendices. They gave us eight PDF files that added up to nearly 400 pages. We used DocumentCloud to stitch them all back together into one large file.

We then combed through the appendices and dozens of other records of court testimony and correspondence. As we saw the various typefaces and handwriting styles that made up the key passages, we knew we wanted to use DocumentCloud notes to present excerpts directly in the story.

Matt: I started working on the story in earnest a couple of weeks before it published. We were very excited about having so many primary-source documents to enrich the narrative. The Post has been using DocumentCloud for years, but we’ve long been frustrated by one of its biggest limitations: it isn’t mobile-friendly. This isn’t really DocumentCloud’s fault; these scanned documents are a set size, so when you scale them down, at some point words will become too small to read.

We had seen how The New York Times addressed the problem, by putting up the text and linking to the original document in DocumentCloud. That’s totally logical and fine if the words are all that you care about, but in this case we had official letters and handwritten notes between the characters in the story. The pages themselves are interesting, and many readers will want to see them with their own eyes.

We decided at the outset that however we ended up displaying the documents we included in the story, they had to be responsive. But this meant we’d have to come up with our own hack. Emily and I had both been thinking about this problem individually for a while, and we had some time to work on it, so we decided to try to figure out a solution that we could use in this project.

Emily: At the time of publication, DocumentCloud’s note embed code already resized and repositioned the note based on the width of the DC-note-container div, so I knew we only needed to solve for when the note is wider than the note embed and the right side of the note is cut off (see image below).

[Image: a note embed whose right side is cut off because the note is wider than the embed]

To solve this problem, when the embed first loads, the code stores the coordinates, width and height of the note relative to an image of the page of the annotated document. Whenever the page loads, the browser window is resized or the orientation of your device changes, JavaScript media queries (matchMedia) detect whether the note is wider than the embed and, if so, resize and reposition the document image.

The original coordinates, width and height allow us to determine how wide the note is in relation to the document image and to resize the document just enough that the note, rather than the document image, fills 100% of the embed. This helps with readability by making the text as large as it can be. At times, depending on the width of the note and the size of the text, there will still be readability issues, so cropping the annotations carefully and testing to make sure they are readable is really important.
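The sizing rule described above comes down to a little arithmetic, sketched here as a pure TypeScript function. This is a minimal illustration, not the code in documentcloud-notes: the names `NoteBox`, `NoteLayout` and `layoutNote` are hypothetical, and the sketch assumes the note's stored coordinates are in pixels of the page image at its original rendered width.

```typescript
// Hypothetical shape: the note's box in page-image pixels, captured when the embed first loads.
interface NoteBox {
  x: number;      // left edge of the note on the page image
  y: number;      // top edge of the note on the page image
  width: number;
  height: number;
}

// Hypothetical result: how to draw the page image inside the note's clipping container.
interface NoteLayout {
  scale: number;          // factor applied to the page image (1 = unchanged)
  imageWidth: number;     // rendered width of the page image, in px
  offsetLeft: number;     // CSS `left` of the image, so the note's left edge sits at 0
  offsetTop: number;      // CSS `top` of the image, so the note's top edge sits at 0
  viewportHeight: number; // height the container needs to show the whole note
}

function layoutNote(note: NoteBox, imageWidth: number, containerWidth: number): NoteLayout {
  // Shrink only when the note is wider than the embed, so the note ends up
  // exactly as wide as the container; otherwise leave the image at full size.
  const scale = note.width > containerWidth ? containerWidth / note.width : 1;
  return {
    scale,
    imageWidth: imageWidth * scale,
    offsetLeft: -note.x * scale,
    offsetTop: -note.y * scale,
    viewportHeight: note.height * scale,
  };
}
```

For a 600px-wide note at (100, 200) in a 300px-wide embed, the scale is 0.5, so a 700px page image is drawn at 350px and shifted 50px left and 100px up. In a browser, one would call a function like this from a `resize` or `orientationchange` listener (or a `matchMedia` change handler) and apply the result as inline styles on the page image; keeping the math apart from the DOM makes it easy to test.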

Here’s an example from the Willingham story of a responsive note on an iPhone 5:

[Image: a responsive note from the Willingham story on an iPhone 5]

Ted: We were thrilled when Ben Chartoff (an OpenNews fellow at The Washington Post) reached out to put Emily in touch with us.

We believe deeply in DocumentCloud both as an open source project and as the service to which journalists post documents relevant to the public interest. Emily and Matt’s motivation to extend the behavior that DocumentCloud already provides and to share their code back is exactly the kind of effort we love to see and encourage.

Technology in the world of news is a means toward the end of better reporting. Especially in competitive industries like ours, an open source ethos around the tools we all share is an avenue for us to work together to improve the state of all reporting. Anyone who solves an issue for their own needs can help to solve that issue for everyone.

In that spirit, we were excited to incorporate Emily’s code into our own. To do so, we spun our note code off into its own repository to make it easier for anyone to contribute (you can find the code on GitHub as documentcloud-notes). Then, with The Washington Post’s and The Marshall Project’s stories as a basis, we began incorporating the changes. Ultimately, we ended up rewriting much of Emily’s code in the process, but what she had written served as the design criteria to anchor the code we wrote.

Our responsive notes code is live on DocumentCloud now, and journalists needn’t take any additional steps to use it. Any embedded note from DocumentCloud will now behave responsively.

Congrats to our users, documen…

Posted
Sep 28th, 2012

Tags
Twitter

Author
documentcloud

Congrats to our users: documents on DocumentCloud have been viewed 60 million times. /TH

Nice use of embedded annotatio…

Posted
Sep 28th, 2012

Tags
Twitter

Author
documentcloud

Nice use of embedded annotations to piece together a timeline of the death of a Guantánamo detainee: http://t.co/UZvZISIU by @ProPublica /ja

Trying to integrate the Docume…

Posted
Sep 20th, 2012

Tags
Twitter

Author
documentcloud

Trying to integrate the DocumentViewer with an app in your newsroom? Useful tip: Here’s the “public” API: https://t.co/axjlpe2g /ja

The DocumentCloud API now supp…

Posted
Sep 10th, 2012

Tags
Twitter

Author
documentcloud

The DocumentCloud API now supports CORS responses for public queries. Feel free to use it if JSONP isn’t your cup of tea… /ja

Interested in learning about D…

Posted
Sep 6th, 2012

Tags
Twitter

Author
documentcloud

Interested in learning about DocumentCloud’s story? The @knightfdn is doing a review of the 2009 #newschallenge http://t.co/nmylmn6U

Do you depend on DocumentCloud…

Posted
Sep 5th, 2012

Tags
Twitter

Author
documentcloud

Do you depend on DocumentCloud’s APIs? Sign up for our API announcements list: https://t.co/EDHWfQYn /th

You may have noticed we’re ext…

Posted
Aug 30th, 2012

Tags
Twitter

Author
documentcloud

You may have noticed we’re extra busy today; documents might take a little while to process. Thanks for bearing with us. /th

Interested in DocumentCloud fo…

Posted
Aug 30th, 2012

Tags
Twitter

Author
documentcloud

Interested in DocumentCloud for your language(s)? Make sure to let us know: https://t.co/SQ59mGLx /th

Thanks for bearing with us, we…

Posted
Aug 23rd, 2012

Tags
Twitter

Author
documentcloud

Thanks for bearing with us; service is back to normal, but we’re keeping an eye on things. /TH

Hey folks, apologies for the d…

Posted
Aug 23rd, 2012

Tags
Twitter

Author
documentcloud

Hey folks, apologies for the difficulties; we’re experiencing unusually heavy traffic today. We’ll update you when we can. /TH

DocumentCloud document embeds …

Posted
Aug 15th, 2012

Tags
Twitter

Author
documentcloud

DocumentCloud document embeds just got a little bit smarter about SSL. They’ll now serve via HTTPS on secure pages. /ja