Thanks to Ryan Murphy of the Texas Tribune, there’s a new way to simplify using DocumentCloud’s API – a Node.js library aptly called node-documentcloud.
“Why should Ruby and Python get to have all the fun?” Murphy said, referring to the fact that coders for some time have been able to use python-documentcloud and the documentcloud RubyGem wrappers to work with the DocumentCloud API.
“The more I use Node.js, the more I like having the option to complete tasks in the language,” Murphy said. “DocumentCloud also has a relatively straightforward API structure, so it also seemed like a good opportunity to try building a client for the first time (something I’ve wanted to attempt for a while).”
DocumentCloud’s API is a powerful piece of the platform, a web service that lets you interact programmatically with resources such as documents, projects and entities. Various API methods let you upload files, create projects, update document data and embed assets via oEmbed, among other tasks.
You can interact with our API via the programming language of your choice, but if you’re a user of Python, Ruby, or now Node.js, the wrappers around the API contributed by the open-source community provide many shortcuts over coding all the interactions yourself.
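To give a sense of what "coding the interactions yourself" involves, here is a minimal sketch in Python that only builds the URL for a search request, without sending it. The base URL, the `search.json` path and the `q`, `page` and `per_page` parameter names are assumptions based on the search method as documented around the time of this post; check the current help documentation before relying on them. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: base URL and endpoint path as documented at the time of writing.
API_BASE = "https://www.documentcloud.org/api"


def build_search_url(query: str, page: int = 1, per_page: int = 10) -> str:
    """Return the URL for one page of search results (no request is sent)."""
    # Assumption: the search method accepts q, page and per_page parameters.
    params = {"q": query, "page": page, "per_page": per_page}
    return f"{API_BASE}/search.json?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("budget report", page=2))
```

A wrapper library takes care of this kind of URL building, plus authentication and response parsing, so you can work with documents and projects as objects instead.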
Murphy sees his library as a gateway to additional features and platforms around DocumentCloud.
“For example, one of the first spinoffs I’ve begun work on is a command line interface on top of node-documentcloud — something that would allow you to interface with the DocumentCloud client from your terminal,” Murphy said.
“Then, it could be as simple as something like
documentcloud-cli upload <name_of_folder> to send a bunch of documents to the service. Or,
documentcloud-cli download <document_id> to pull down a file. It’s still early going!”
Read all about the wrappers
You can learn more about node-documentcloud by visiting its documentation on GitHub or npm.
If you’re a Python or Ruby coder, take a look at:
python-documentcloud: From Ben Welsh of the Los Angeles Times’ data desk comes this full-featured API client for Python programmers. In addition to covering the basics, this library goes deep with providing details such as the location of annotations in a document. Documentation.
pneumatic: A Python bulk-upload library for DocumentCloud, written by Anthony DeBarros of the DocumentCloud team. Provides features including cataloging all the files uploaded and their URLs in a database. Documentation.
DocumentCloud: A RubyGem for interacting with the DocumentCloud API, created by Miles Zimmerman. Upload, search, retrieve data about documents. GitHub. RubyGems.
A big congratulations from the DocumentCloud team to Tyler Dukes, public records reporter at TV station WRAL in North Carolina. Dukes received a 2016 Sunshine Award from the North Carolina Open Government Coalition for work including a custom document-search application he built using the DocumentCloud API.
The API is a powerful piece of the DocumentCloud platform, a web service that lets you interact programmatically with resources such as documents, projects and entities. Various API methods let you upload files, create projects, update document data and embed assets via oEmbed, among other tasks.
When the University of North Carolina at Chapel Hill released hundreds of thousands of pages of documents gathered during an independent investigation into academic fraud involving faculty, staff and student athletes, Dukes turned to the search method of DocumentCloud’s API to build a web application that let users find and read documents by keyword or key people in the investigation.
“We wanted to build something to allow users to browse and search hundreds of thousands of pages of documents all in one place,” Dukes said. “DocumentCloud’s existing embeddable search was close, but because the documents were arbitrarily spread across hundreds of batches, I was concerned it would be too confusing for the average user.
“The API allowed us to very quickly prototype and roll out exactly what we wanted for this very specific circumstance,” he said. “We used the API to pull every page from documents stored in a single project and display them in an intuitive application that allows users to read page by page (or even random pages) or search everything at once. We’ve updated the application twice now, and we’re currently up to 680,000 pages and counting.”
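The general pattern Dukes describes, walking through every document in a project and listing its pages, might look something like the Python sketch below. This is not WRAL's code. It assumes each search result carries an `id` and a `pages` count (field names are assumptions), and it takes a hypothetical `fetch_page` callable that returns one page of results, so the paging logic can be tried without touching the network.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def iter_project_pages(
    fetch_page: Callable[[int], List[Dict]],
) -> Iterator[Tuple[str, int]]:
    """Yield (document_id, page_number) for every page of every document.

    fetch_page(n) is a hypothetical helper that returns the n-th page of
    search results as a list of dicts, or an empty list when none remain.
    """
    result_page = 1
    while True:
        documents = fetch_page(result_page)
        if not documents:
            break  # no more results to read
        for doc in documents:
            # Assumption: "pages" holds the document's page count.
            for number in range(1, doc["pages"] + 1):
                yield doc["id"], number
        result_page += 1


if __name__ == "__main__":
    # Stand-in data in place of real API responses.
    fake_results = {
        1: [{"id": "doc-a", "pages": 2}],
        2: [{"id": "doc-b", "pages": 1}],
    }
    for doc_id, number in iter_project_pages(lambda n: fake_results.get(n, [])):
        print(doc_id, number)
```

In a real application, `fetch_page` would call the search method with a query limited to a single project, and the resulting list of pages could feed a page-by-page reader or a random-page feature.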
Dukes’ project is one of several in recent months that have used our API or components to give readers custom search and viewing, including a Wall Street Journal application to let readers tag Hillary Clinton’s emails and La Nacion’s election crowdsourcing application VozData.
If you’re interested in using the DocumentCloud API, check out our help documentation and don’t be shy about getting in touch.
DocumentCloud is pleased today to welcome five media and technology professionals to its group of advisers. The expanded group, which includes two of the platform’s founders, will help guide DocumentCloud as it develops new offerings and plans for sustainability.
DocumentCloud, which is a service of Investigative Reporters and Editors, serves thousands of journalists worldwide with tools for organizing, researching, annotating and publishing documents gathered while reporting. The expanded advisory group is one of several efforts under way as part of a 2014 Knight Foundation grant that is enabling DocumentCloud to add staff, improve the platform’s efficiency, and implement new features.
The advisers will help the team weigh questions related to technology, market opportunities, product development and revenue models.
“We’re excited to have a strong group of experts who are willing to share their expertise with us,” said Mark Horvit, executive director of IRE. “Their guidance will play a key role in helping us chart the future of DocumentCloud.”
The DocumentCloud advisers include:
Penelope (Penny) Muse Abernathy, the Knight Chair in Journalism and Digital Media Economics at the University of North Carolina and a journalism professional with more than 30 years of experience as a reporter, editor and media executive. @businessofnews
Matthew de Ganon, Senior Vice President of Product Management & Commerce at Softcard, the mobile wallet joint venture of AT&T, T-Mobile and Verizon. @deganon
Eric Gundersen, CEO of Mapbox, a leading provider of custom online mapping solutions. @ericg
Jacqueline Kazil, an Innovation Specialist working on cross-agency platforms for the federal government. @JackieKazil
Scott Klein, an assistant managing editor at ProPublica and a co-founder of DocumentCloud. @kleinmatic
T. Christian Miller, a member of the Investigative Reporters and Editors board of directors, is a senior reporter at ProPublica, which he joined in 2008. @txtianmiller
Aron Pilhofer, Executive Editor of Digital at The Guardian and a co-founder of DocumentCloud. @pilhofer
Biographies of the advisers are available on our staff page.
We’re happy to announce that Justin Reese is joining the DocumentCloud development team. Hailing (and working remotely) from Tyler, Texas, Justin will focus on building the next generation of our platform’s front-end components, from the document workspace to embeds to the overall site experience.
Justin comes to DocumentCloud after spending years translating complicated business requirements into simple, usable web apps for companies such as Essilor Labs and Bon-Ton. We’re excited to have Justin as a collaborator. His work shows a thoughtful consideration of users and attention to detail, and as a contributor to projects such as Hack Tyler he shares DocumentCloud’s eagerness to create software that serves the public good. Justin’s artistry extends beyond software: he also makes short films and tolerable Neapolitan-style pizza (which we expect to taste asap).
The addition of Justin is part of our current funding from the Knight Foundation, a grant intended to expand and improve the platform. Our goal is to make DocumentCloud the best document reporting, research and publishing platform for journalists and those who work with public documents and to also ensure the long-term sustainability of the platform. In addition, we’re eager to continue DocumentCloud’s legacy of making elements of the platform available as open-source components, such as our recent release of PDFShaver. Justin will play a key role in helping us make that happen.
Please welcome him to our team. You can reach Justin at email@example.com or follow him on Twitter.
If you’re coming to IRE’s annual data journalism conference March 5-8 in Atlanta, be sure to stop by and say hello to the DocumentCloud team!
We’ve set up a few ways for you to learn more about the platform, tell us your ideas, and hear about what’s next for DocumentCloud:
— Saturday at 3:20 p.m., join us for a hands-on class, “Reporting and Presentation with DocumentCloud.” Get to know the suite of tools that DocumentCloud offers to help you better organize, analyze and present public documents.
— Sunday at 11:20 a.m. in the Demo Room, we’ll offer “Advanced DocumentCloud: Examples and Suggestions.” Take a deeper dive into DocumentCloud and its API and bring your ideas for features you’d like to see in the platform. Plus, see some of the best uses of DocumentCloud in the last year!
— Throughout the conference, you can find the DocumentCloud team on hand at its booth outside the conference rooms. We’ll be set up to give you a demo, answer questions about accounts and hear how you use the platform.
During the conference, reach us by email or Twitter. Find Ted Han via firstname.lastname@example.org and @knowtheory; Anthony DeBarros via email@example.com and @anthonydb; and Lauren Grandestaff via firstname.lastname@example.org and @lgrandestaff.
We’ll look forward to seeing you!
We have a lot of projects going on at DocumentCloud, and to move them forward we’re looking for others to join us! For those who may be unfamiliar with our project, we’ve included the full details below.
DocumentCloud is a web-based platform allowing journalists to upload, analyze, annotate, and publish primary source documents. We want to give journalists the tools to show their audience their source material, not just tell them about it. In addition to the newsrooms worldwide who use DocumentCloud, our open source software projects, such as Backbone.js, Underscore.js, Docsplit, and Jammit, are relied upon by companies such as LinkedIn, Walmart, Foursquare and more. DocumentCloud is run by Investigative Reporters & Editors.
What DocumentCloud is building
- DocumentCloud is growing fast, and we’re looking to accelerate that pace by expanding our tools into other languages beyond English. In the next year we’ll adapt our platform to accommodate multi-language OCR, search indexing, and entity extraction tools.
- DocumentCloud always looks for new ways to present documents and engage readers. We are extending DocumentCloud’s document viewer and annotation tools so that readers can make their own comments and notes on documents.
DocumentCloud is looking for someone with a combination of the following skills
Things we like and hope you like too!
Literate programming; Extracting libraries from app code; Polyglot programming; Web standards; Journalists; Natural Language Processing
Investigative Reporters & Editors is based in Columbia, Missouri, on the University of Missouri’s campus. DocumentCloud is comfortable operating with a distributed team.
You can email us at email@example.com
Back in August, we announced that we’d be welcoming a new lead developer, but he’s been on the job two weeks already and we managed to forget to say anything like “Welcome aboard!”
Well, better late than never.
DocumentCloud is beyond delighted to announce that we’ve found a long-term home for our project. We’re merging our operation with Investigative Reporters and Editors, a nonprofit grassroots organization committed to fostering excellence in investigative journalism. This transition means that DocumentCloud will have a permanent place in a longstanding resource for investigative reporting. IRE has a long and established history of supporting investigative reporting, and we’ll be a proud part of their ongoing work to provide journalists with tools that support their reporting. It goes without saying that DocumentCloud is a natural fit for an organization that has been upholding high professional standards and instilling a passion for public service journalism for more than 35 years.
IRE will continue to honor all of the promises we have made to our users, and our staff will be working to ensure a smooth transition. The best way to get your questions answered will still be reaching out to firstname.lastname@example.org or contacting us through the workspace. We’re still welcoming new users — contact us to find out more about bringing your newsroom on board.
We’ve even got some great new tools in the works. More on that soon.
All of us are committed to the continuing success of DocumentCloud. Over the next few months, we’ll be handing off day-to-day responsibility for managing DocumentCloud to IRE’s staff based at the University of Missouri in Columbia, Mo. I’ll stay on as program director through the summer to facilitate a smooth transition. Developer Sam Clay is moving to San Francisco to join a startup there. Our lead developer, Jeremy Ashkenas, has moved to the New York Times’s Interactive News team, but will remain actively involved with DocumentCloud on the technical side. Our founders will be here to help DocumentCloud continue to thrive: Scott Klein, Aron Pilhofer and Eric Umansky will remain on the project as advisors and advocates.
We’re already interviewing strong candidates to take over as lead developer, but will be looking for more developers, too. More on that soon as well.
DocumentCloud was first envisioned by a team of editors at ProPublica and The New York Times, and was founded in 2009 through a grant from the John S. and James L. Knight Foundation to build an online catalog of primary source documents and a set of tools to help journalists get more out of source documents. We are all immensely grateful to Knight for their confidence in us. We think their investment paid off. Not only do newsrooms have a new resource that is already indispensable, but DocumentCloud helped demonstrate that 21st century newsrooms are ready to collaborate and share what were once privately held materials. The public is better informed because of it.
Since we launched in March of 2010, newsrooms and watchdog organizations have used DocumentCloud to analyze, annotate, and publish thousands of documents ranging from suspicious, if not outright spurious, expense reports filed by local authorities in Long Island, New York to hundreds of pages of correspondence released by the Financial Crisis Inquiry Commission, and much, much more. How much more? We encourage you to search our public catalog and see for yourself.
Our third hire! Developer Samuel Clay joins DocumentCloud today, bringing our full-time staff to a total of three.
Samuel joins us from Storybird, a collaborative storytelling startup which works with artists to give children access to high quality narrative art that they can use to publish their own original stories. He’s also the mastermind behind NewsBlur, an open source feed reader that uses artificial intelligence to suggest stories you might want to read. Think of it as an RSS reader with intelligence.
Samuel lives in Brooklyn with his dog and guinea pigs, where he photographs historic districts for New York Field Guide. Find him at email@example.com or on twitter.
update: we have what we need for now, thanks.
We’re building a research tool for reporters, a semantic search engine, an index of primary source documents with our grant from the Knight Foundation. DocumentCloud will be free and open source software.
We also need an expert-level PostgreSQL consultant to sit down with us and review and refine our architecture plans. We’re looking for someone with plenty of experience working with sharded Postgres installations, someone skilled at tuning Postgres for full-text searches over very large datasets (potentially approaching hundreds of thousands of documents) and well versed in best practices for deploying Postgres on EC2.
Here at Document Cloud we’ve finally hired ourselves a Program Director to keep Jeremy, our lead developer, company. Someone to manage our impressive and growing list of document partners and help them get the most out of Document Cloud. Someone to develop some training materials and help our beta testers get started beta testing. For her first challenge, we asked her to write a blog post in the third person.
Amanda Hickman joins us from Gotham Gazette where, as the Director of Technology, she managed development of a series of games about public policy issues, built a pretty cool database of candidates for local office and shared an ONA award for General Excellence with her colleagues there. Prior to joining Gotham Gazette, she worked as a Circuit Rider, providing technology assistance and training to low-income grassroots groups in the U.S. working on anti-poverty issues and as a consultant to foundations looking for ways to support their grantees’ use of technology in organizing work. She taught an undergraduate course at NYU’s Gallatin School on using the Internet as an organizing tool. An active local organizer, she’s got her hands in a few community composting and gardening projects, too. If you ever tire of hearing about semantic analysis of primary source documents, try asking her about the dwarf crab apple trees at Greene Acres or what she does with 1300 lbs of compost every week.
She’ll be back here answering all your questions just as soon as she can manage.
We’re excited to announce that Jeremy Ashkenas has joined the team as the lead developer for DocumentCloud. His previous job was at Zenbe Inc., a provider of online email and collaboration software. He’s the creator of the Ruby-Processing visualization toolkit, and a winner — twice — of the Sunlight Foundation’s Apps for America competition. Jeremy graduated from Brown University with a degree in Literary Systems.
Over the past few weeks, he’s been working on the central processing system for a DocumentCloud prototype. We are planning to open source this tool shortly … so stay tuned.