Latest Updates: Our Blog

Category: People

A Node.js wrapper for the DocumentCloud API

Posted
Apr 7th, 2016

Tags
Code,People

Author
Anthony DeBarros

Thanks to Ryan Murphy of the Texas Tribune, there’s a new way to simplify using DocumentCloud’s API – a Node.js library aptly called node-documentcloud.

“Why should Ruby and Python get to have all the fun?” Murphy said, referring to the fact that coders for some time have been able to use python-documentcloud and the documentcloud RubyGem wrappers to work with the DocumentCloud API.

“The more I use Node.js, the more I like having the option to complete tasks in the language,” Murphy said. “DocumentCloud also has a relatively straightforward API structure, so it also seemed like a good opportunity to try building a client for the first time (something I’ve wanted to attempt for a while).”

DocumentCloud’s API is a powerful piece of the platform, a web service that lets you interact programmatically with resources such as documents, projects and entities. Various API methods let you upload files, create projects, update document data and embed assets via oEmbed, among other tasks.

You can interact with our API via the programming language of your choice, but if you’re a user of Python, Ruby, or now Node.js, the wrappers around the API contributed by the open-source community provide many shortcuts over coding all the interactions yourself.
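To give a sense of what "coding all the interactions yourself" involves, here is a minimal Python sketch that only builds a search request URL. The base URL, the `search.json` path and the `q`, `page` and `per_page` parameter names are assumptions made for illustration, and `build_search_url` is a hypothetical helper, not part of any of the wrappers; check the API help documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint path; verify against the API documentation.
API_BASE = "https://www.documentcloud.org/api"


def build_search_url(query, page=1, per_page=10):
    """Return a search URL for the given query (hypothetical helper)."""
    # Parameter names here are assumptions, not confirmed by this post.
    params = {"q": query, "page": page, "per_page": per_page}
    return f"{API_BASE}/search.json?{urlencode(params)}"


print(build_search_url("academic fraud", per_page=25))
```

A wrapper library hides this kind of URL construction, along with authentication, paging and response parsing, behind a few method calls.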

Murphy sees his library as a gateway to additional features and platforms around DocumentCloud.

“For example, one of the first spinoffs I’ve begun work on is a command line interface on top of node-documentcloud — something that would allow you to interface with the DocumentCloud client from your terminal,” Murphy said.

“Then, it could be as simple as something like documentcloud-cli upload <name_of_folder> to send a bunch of documents to the service. Or, documentcloud-cli download <document_id> to pull down a file. It’s still early going!”

Read all about the wrappers

You can learn more about node-documentcloud by visiting its documentation on GitHub or npm.

If you’re a Python or Ruby coder, take a look at:

python-documentcloud: From Ben Welsh of the Los Angeles Times’ data desk comes this full-featured API client for Python programmers. In addition to covering the basics, this library goes deep, providing details such as the location of annotations in a document. Documentation.

pneumatic: A Python bulk-upload library for DocumentCloud, written by Anthony DeBarros of the DocumentCloud team. Provides features including cataloging all the files uploaded and their URLs in a database. Documentation.

DocumentCloud: A RubyGem for interacting with the DocumentCloud API, created by Miles Zimmerman. Upload, search and retrieve data about documents. GitHub. RubyGems.

WRAL builds award-winning app with DocumentCloud API

Posted
Apr 7th, 2016

Tags
Code,Documents,People

Author
Anthony DeBarros

A big congratulations from the DocumentCloud team to Tyler Dukes, public records reporter at TV station WRAL in North Carolina. Dukes received a 2016 Sunshine Award from the North Carolina Open Government Coalition for work including a custom document-search application he built using the DocumentCloud API.

The API is a powerful piece of the DocumentCloud platform, a web service that lets you interact programmatically with resources such as documents, projects and entities. Various API methods let you upload files, create projects, update document data and embed assets via oEmbed, among other tasks.

When the University of North Carolina at Chapel Hill released hundreds of thousands of pages of documents gathered during an independent investigation into academic fraud involving faculty, staff and student athletes, Dukes turned to the search method of DocumentCloud’s API to build a web application that let users find and read documents by keyword or key people in the investigation.

“We wanted to build something to allow users to browse and search hundreds of thousands of pages of documents all in one place,” Dukes said. “DocumentCloud’s existing embeddable search was close, but because the documents were arbitrarily spread across hundreds of batches, I was concerned it would be too confusing for the average user.

“The API allowed us to very quickly prototype and roll out exactly what we wanted for this very specific circumstance,” he said. “We used the API to pull every page from documents stored in a single project and display them in an intuitive application that allows users to read page by page (or even random pages) or search everything at once. We’ve updated the application twice now, and we’re currently up to 680,000 pages and counting.”
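The approach Dukes describes (pull every document in a project, then let readers step through pages in order or jump to a random one) can be sketched roughly as follows. This is not WRAL's code: `fetch_page` stands in for whatever function actually calls the API, the `id` and `pages` fields are assumed names, and the sample data is made up so the sketch runs without network access.

```python
import random


def iter_documents(fetch_page, per_page=100):
    """Yield documents from successive result pages until one comes back empty."""
    page = 1
    while True:
        docs = fetch_page(page, per_page)
        if not docs:
            return
        yield from docs
        page += 1


def page_refs(documents):
    """Expand each document into (document id, page number) pairs."""
    return [(doc["id"], n) for doc in documents for n in range(1, doc["pages"] + 1)]


# Made-up stand-in for real API results; field names are assumptions.
FAKE_RESULTS = [
    {"id": "doc-a", "pages": 2},
    {"id": "doc-b", "pages": 3},
    {"id": "doc-c", "pages": 1},
]


def fake_fetch_page(page, per_page):
    """Return one slice of FAKE_RESULTS, mimicking a paginated response."""
    start = (page - 1) * per_page
    return FAKE_RESULTS[start:start + per_page]


refs = page_refs(iter_documents(fake_fetch_page, per_page=2))
print(len(refs))            # total number of pages across all documents
print(random.choice(refs))  # a random page for the reader to land on
```

Flattening documents into individual page references is what makes page-by-page reading and random-page browsing straightforward: both become simple operations on one list.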

Dukes’ project is one of several in recent months that have used our API or components to give readers custom search and viewing, including a Wall Street Journal application to let readers tag Hillary Clinton’s emails and La Nacion’s election crowdsourcing application VozData.

If you’re interested in using the DocumentCloud API, check out our help documentation and don’t be shy about getting in touch.

DocumentCloud adds five to group of advisers

Posted
Apr 16th, 2015

Tags
People

Author
Anthony DeBarros

DocumentCloud is pleased today to welcome five media and technology professionals to its group of advisers. The expanded group, which includes two of the platform’s founders, will help guide DocumentCloud as it develops new offerings and plans for sustainability.

DocumentCloud, which is a service of Investigative Reporters and Editors, serves thousands of journalists worldwide with tools for organizing, researching, annotating and publishing documents gathered while reporting. The expanded advisory group is one of several efforts under way as part of a 2014 Knight Foundation grant that is enabling DocumentCloud to add staff, improve the platform’s efficiency, and implement new features.

The advisers will help the team weigh questions related to technology, market opportunities, product development and revenue models.

“We’re excited to have a strong group of experts who are willing to share their expertise with us,” said Mark Horvit, executive director of IRE. “Their guidance will play a key role in helping us chart the future of DocumentCloud.”

The DocumentCloud advisers include:

Penelope (Penny) Muse Abernathy, the Knight Chair in Journalism and Digital Media Economics at the University of North Carolina and a journalism professional with more than 30 years of experience as a reporter, editor and media executive. @businessofnews

Matthew de Ganon, Senior Vice President of Product Management & Commerce at Softcard, the mobile wallet joint venture of AT&T, T-Mobile and Verizon. @deganon

Eric Gundersen, CEO of Mapbox, a leading provider of custom online mapping solutions. @ericg

Jacqueline Kazil, an Innovation Specialist working on cross-agency platforms for the federal government. @JackieKazil

Scott Klein, an assistant managing editor at ProPublica and a co-founder of DocumentCloud. @kleinmatic

T. Christian Miller, a member of the Investigative Reporters and Editors board of directors, is a senior reporter at ProPublica, which he joined in 2008. @txiatianmiller

Aron Pilhofer, Executive Editor of Digital at The Guardian and a co-founder of DocumentCloud. @pilhofer

Biographies of the advisers are available on our staff page.

 

DocumentCloud welcomes Justin Reese

Posted
Mar 24th, 2015

Tags
People

Author
Anthony DeBarros

We’re happy to announce that Justin Reese is joining the DocumentCloud development team. Hailing (and working remotely) from Tyler, Texas, Justin will focus on building the next generation of our platform’s front-end components, from the document workspace to embeds to the overall site experience.

Justin comes to DocumentCloud after spending years translating complicated business requirements into simple, usable web apps for companies such as Essilor Labs and Bon-Ton. We’re excited to have Justin as a collaborator. His work shows a thoughtful consideration of users and attention to detail, and as a contributor to projects such as Hack Tyler he shares DocumentCloud’s eagerness to create software that serves the public good. Justin’s artistry extends beyond software: he also makes short films and tolerable Neapolitan-style pizza (which we expect to taste ASAP).

The addition of Justin is part of our current funding from the Knight Foundation, a grant intended to expand and improve the platform. Our goal is to make DocumentCloud the best document reporting, research and publishing platform for journalists and those who work with public documents, and also to ensure the long-term sustainability of the platform. In addition, we’re eager to continue DocumentCloud’s legacy of making elements of the platform available as open-source components, such as our recent release of PDFShaver. Justin will play a key role in helping us make that happen.

Please welcome him to our team. You can reach Justin at justin@documentcloud.org or follow him on Twitter.

Catch DocumentCloud at NICAR 2015 in Atlanta

Posted
Feb 26th, 2015

Tags
Documents,People

Author
Anthony DeBarros

If you’re coming to IRE’s annual data journalism conference March 5-8 in Atlanta, be sure to stop by and say hello to the DocumentCloud team!

We’ve made a few ways for you to learn more about the platform, tell us your ideas, and hear about what’s next for DocumentCloud:

— Saturday at 3:20 p.m., join us for a hands-on class, “Reporting and Presentation with DocumentCloud.” Get to know the suite of tools that DocumentCloud offers to help you better organize, analyze and present public documents.

— Sunday at 11:20 a.m. in the Demo Room, we’ll offer “Advanced DocumentCloud: Examples and Suggestions.” Take a deeper dive into DocumentCloud and its API and bring your ideas for features you’d like to see in the platform. Plus, see some of the best uses of DocumentCloud in the last year!

— Throughout the conference, you can find the DocumentCloud team on hand at its booth outside the conference rooms. We’ll be set up to give you a demo, answer questions about accounts and hear how you use the platform.

During the conference, reach us by email or Twitter. Find Ted Han via ted@documentcloud.org and @knowtheory; Anthony DeBarros via anthony@documentcloud.org and @anthonydb; and Lauren Grandestaff via lauren@ire.org and @lgrandestaff.

We look forward to seeing you!

Welcome Aboard, Anthony DeBarros

Posted
Jan 6th, 2015

Tags
People

Author
Ted Han

I’m excited to announce that Anthony DeBarros is joining DocumentCloud this week. We’ve known Tony for a long time as a DocumentCloud user at Gannett Digital and USA TODAY, where he’s led technology projects and data-driven stories and interactives. Tony will be joining me to lead and manage DocumentCloud’s product efforts as our platform and team grow.

Thanks to a grant from the Knight Foundation, we’re looking forward to expanding and improving our platform. We’d like to find new ways to help journalists with reporting, research and publishing on deadline. The grant is also designed to give us the footing we need to guarantee DocumentCloud’s long-term sustainability, so that journalists can continue to rely on the platform into the future, and Tony is going to be an important part of that process.

So, welcome Tony! We’re happy to have you.

You can also reach him at anthony@documentcloud.org and on Twitter as @anthonydb, and you can learn more about him on the IRE website.

 

DocumentCloud Seeks Developer to Work on Open Source Software Platform

Posted
Jul 16th, 2012

Tags
People

Author
Ted Han

We have a lot of projects going on at DocumentCloud and to serve those goals we’re looking for others to join us! For those who may be unfamiliar with our project, we’ve included the full details below.

DocumentCloud is a web-based platform allowing journalists to upload, analyze, annotate, and publish primary source documents. We want to give journalists the tools to show their audience their source material, not just tell them about it. In addition to the newsrooms worldwide who use DocumentCloud, our open source software projects, such as Backbone.js, Underscore.js, Docsplit, and Jammit, are relied upon by companies such as LinkedIn, Walmart, Foursquare and more. DocumentCloud is run by Investigative Reporters & Editors.

What DocumentCloud is building

  • DocumentCloud is growing fast, and we’re looking to accelerate that pace by expanding our tools into other languages beyond English. In the next year we’ll adapt our platform to accommodate multi-language OCR, search indexing, and entity extraction tools.
  • DocumentCloud always looks for new ways to present documents and engage readers. We are extending DocumentCloud’s document viewer and annotation tools so that readers can make their own comments and notes on documents.

DocumentCloud is looking for someone with a combination of the following skills

Experience with Ruby and JavaScript; API-driven web applications; working on and fostering FOSS; user-centered products; experience with the JVM toolchain; Linux administration on Platform as a Service providers such as AWS.

Things we like and hope you like too!

Literate programming; Extracting libraries from app code; Polyglot programming; Web standards; Journalists; Natural Language Processing

Practical Details

Investigative Reporters & Editors is based in Columbia, Missouri, on the University of Missouri’s campus. DocumentCloud is comfortable operating with a distributed team.

You can email us at jobs@documentcloud.org

Welcome Aboard, Ted Han

Posted
Sep 21st, 2011

Tags
IdeaLab,People

Author
Amanda Hickman

Back in August, we announced that we’d be welcoming a new lead developer, but he’s been on the job two weeks already and we managed to forget to say anything like “Welcome aboard!”

Well, better late than never.

DocumentCloud Merges with IRE

Posted
Jun 9th, 2011

Tags
People

Author
Amanda Hickman

DocumentCloud is beyond delighted to announce that we’ve found a long-term home for our project. We’re merging our operation with Investigative Reporters and Editors, a nonprofit grassroots organization committed to fostering excellence in investigative journalism. This transition means that DocumentCloud will have a permanent place in a longstanding resource for investigative reporting. IRE has a long and established history of supporting investigative reporting, and we’ll be a proud part of their ongoing work to provide journalists with tools that support their reporting. It goes without saying that DocumentCloud is a natural fit for an organization that has been upholding high professional standards and instilling a passion for public service journalism for more than 35 years.

IRE will continue to honor all of the promises we have made to our users, and our staff will be working to ensure a smooth transition. The best way to get your questions answered will still be reaching out to support@documentcloud.org or contacting us through the workspace. We’re still welcoming new users; contact us to find out more about bringing your newsroom on board.

We’ve even got some great new tools in the works. More on that soon.

All of us are committed to the continuing success of DocumentCloud. Over the next few months, we’ll be handing off day-to-day responsibility for managing DocumentCloud to IRE’s staff based at the University of Missouri in Columbia, Mo. I’ll stay on as program director through the summer to facilitate a smooth transition. Developer Sam Clay is moving to San Francisco to join a startup there. Our lead developer, Jeremy Ashkenas, has moved to the New York Times’s Interactive News team, but will remain actively involved with DocumentCloud on the technical side. Our founders will be here to help DocumentCloud continue to thrive: Scott Klein, Aron Pilhofer and Eric Umansky will remain on the project as advisors and advocates.

We’re already interviewing strong candidates to take over as lead developer, but will be looking for more developers, too. More on that soon as well.

DocumentCloud was first envisioned by a team of editors at ProPublica and The New York Times, and was founded in 2009 through a grant from the John S. and James L. Knight Foundation to build an online catalog of primary source documents and a set of tools to help journalists get more out of source documents. We are all immensely grateful to Knight for their confidence in us. We think their investment paid off. Not only do newsrooms have a new resource that is already indispensable, but DocumentCloud helped demonstrate that 21st century newsrooms are ready to collaborate and share what were once privately held materials. The public is better informed because of it.

Since we launched in March of 2010, newsrooms and watchdog organizations have used DocumentCloud to analyze, annotate, and publish thousands of documents ranging from suspicious, if not outright spurious, expense reports filed by local authorities in Long Island, New York to hundreds of pages of correspondence released by the Financial Crisis Inquiry Commission, and much, much more. How much more? We encourage you to search our public catalog and see for yourself.

See Also:
IRE’s announcement: DocumentCloud joins IRE, and Knight’s: News Challenge Success Story Finds a Home

Welcome, Samuel Clay

Posted
Jul 9th, 2010

Tags
People

Author
Amanda Hickman

Our third hire! Developer Samuel Clay joins DocumentCloud today, bringing our full-time staff to a total of three.

Samuel joins us from Storybird, a collaborative storytelling startup that works with artists to give children access to high-quality narrative art that they can use to publish their own original stories. He’s also the mastermind behind NewsBlur, an open source feed reader that uses artificial intelligence to suggest stories you might want to read. Think of it as an RSS reader with intelligence.

Samuel lives in Brooklyn with his dog and guinea pigs, where he photographs historic districts for New York Field Guide. Find him at samuel@documentcloud.org or on Twitter.

He’ll be bringing his formidable JavaScript skills to DocumentCloud’s workspace, which should be getting more awesome twice as fast now.

Seeking Consultants (updated)

Posted
Nov 17th, 2009

Tags
People

Author
Amanda Hickman

Update: We have what we need for now, thanks.

Have you been watching DocumentCloud roll out code releases and wishing you could be part of it all? You can! We’re looking for a couple of consultants to help us build out DocumentCloud: we need a JavaScript consultant to work with us on an ongoing basis over the next few months and a Postgres expert to do some intense consulting with us.

We’re building a research tool for reporters, a semantic search engine and an index of primary source documents, with our grant from the Knight Foundation. DocumentCloud will be free and open source software.

We need a JavaScript developer to help build out a rich, web-based tool that journalists will use to search and organize documents, as well as visualize the relationships between documents. A strong foundation in HTML and CSS is required, bonus points for comfort in Ruby. If you think that doing full JavaScript MVC in the browser doesn’t sound like a crazy idea, then we want to hear from you.

We also need an expert-level PostgreSQL consultant to sit down with us and review and refine our architecture plans. We’re looking for someone with plenty of experience working with sharded Postgres installations, someone skilled at tuning Postgres for full text searches over very large datasets (potentially approaching hundreds of thousands of documents) and well versed in best practices for deploying Postgres on EC2.

If either of these sounds like you, send your resume, a rate quote and a short description of particularly relevant work to: jobs@documentcloud.org with “JavaScript Developer” or “Postgres Consultant” in the subject line.

Hint: the subject line matters more than you’d think. Our “jobs” inbox has a procmail filter and three folders: JavaScript, Postgres and Trash.

Our Second Hire

Posted
Nov 9th, 2009

Tags
People

Author
Amanda Hickman

Here at DocumentCloud we’ve finally hired ourselves a Program Director to keep Jeremy, our lead developer, company. Someone to manage our impressive and growing list of document partners and help them get the most out of DocumentCloud. Someone to develop some training materials and help our beta testers get started beta testing. For her first challenge, we asked her to write a blog post in the third person.

Amanda Hickman joins us from Gotham Gazette where, as the Director of Technology, she managed development of a series of games about public policy issues, built a pretty cool database of candidates for local office and shared an ONA award for General Excellence with her colleagues there. Prior to joining Gotham Gazette, she worked as a Circuit Rider, providing technology assistance and training to low-income grassroots groups in the U.S. working on anti-poverty issues and as a consultant to foundations looking for ways to support their grantees’ use of technology in organizing work. She taught an undergraduate course at NYU’s Gallatin School on using the Internet as an organizing tool. An active local organizer, she’s got her hands in a few community composting and gardening projects, too. If you ever tire of hearing about semantic analysis of primary source documents, try asking her about the dwarf crab apple trees at Greene Acres or what she does with 1300 lbs of compost every week.

She’ll be back here answering all your questions just as soon as she can manage.

Our First Hire

Posted
Sep 14th, 2009

Tags
People

Author
Scott

We’re excited to announce that Jeremy Ashkenas has joined the team as the lead developer for DocumentCloud. His previous job was at Zenbe Inc., a provider of online email and collaboration software. He’s the creator of the Ruby-Processing visualization toolkit, and a two-time winner of the Sunlight Foundation’s Apps for America competition. Jeremy graduated from Brown University with a degree in Literary Systems.

Over the past few weeks, he’s been working on the central processing system for a DocumentCloud prototype. We are planning to open source this tool shortly … so stay tuned.