Latest Updates: Our Blog

When Documents Are Challenged:…

Posted
Apr 5th, 2012

Tags
Twitter

Author
documentcloud

When Documents Are Challenged: http://t.co/fKTTxFyy

When Documents Are Challenged

Posted
Apr 5th, 2012

Tags
Documents,IdeaLab

Author
Mark Horvit

Last week, DocumentCloud received a complaint seeking the removal of a collection of emails posted by journalists with the Australian Financial Review. The emails involved a company called NDS, which hired a law firm to try to have the documents pulled from public view. This kind of thing is rare, but it happens. This case in particular has a couple of wrinkles that make it unusual, and it is a good opportunity to remind all of our members that DocumentCloud has policies and options in place that let you keep every document processed through our service available to the public for as long as you wish.

I’ll detail those below. But first, a little context.

DocumentCloud was created …

Back up and serving fresh docu…

Posted
Mar 31st, 2012

Tags
Twitter

Author
documentcloud

Back up and serving fresh documents. /th

Hi folks, we’re in the midst o…

Posted
Mar 31st, 2012

Tags
Twitter

Author
documentcloud

Hi folks, we’re in the midst of an interruption. More news shortly /th

Thanks to a suggestion from Ri…

Posted
Mar 26th, 2012

Tags
Twitter

Author
documentcloud

Thanks to a suggestion from Richard Stallman, DocumentCloud embed codes now include “noscript” links to the original PDF and plain text. /ja

The site is back to normal, bu…

Posted
Mar 22nd, 2012

Tags
Twitter

Author
documentcloud

The site is back to normal, but we are experiencing heavy load at the moment. Thanks for bearing with us /th

We’re experiencing some troubl…

Posted
Mar 22nd, 2012

Tags
Twitter

Author
documentcloud

We’re experiencing some trouble at the moment. More news soon /th

Just launched a new “mini” lay…

Posted
Mar 14th, 2012

Tags
Twitter

Author
documentcloud

Just launched a new “mini” layout for the document viewer. Kicks in when < 500 pixels are available. Side-by-side: http://t.co/wPY7SZt3 /ja

Per requests, DocumentCloud no…

Posted
Feb 14th, 2012

Tags
Twitter

Author
documentcloud

Per requests, DocumentCloud now redacts in either black or red. /ja

Backbone.js 0.9.0 is out. Chan…

Posted
Jan 30th, 2012

Tags
Twitter

Author
documentcloud

Backbone.js 0.9.0 is out. Changes: http://t.co/FxtTyxpr Upgrading: http://t.co/nWGA3bCN We’re running it on DocumentCloud.org now… /ja

Good morning/afternoon! We’d l…

Posted
Dec 16th, 2011

Tags
Twitter

Author
documentcloud

Good morning/afternoon! We’d like to let you know that we’re planning a period of downtime tonight around midnight EST.

We’ve fixed some difficulties …

Posted
Dec 14th, 2011

Tags
Twitter

Author
documentcloud

We’ve fixed some difficulties with uploading non-PDF documents. Please give it a shot! /th

Document Uploading is back up …

Posted
Nov 23rd, 2011

Tags
Twitter

Author
documentcloud

Document uploading is back up and humming along as usual. Thanks for bearing with us /th

Hi all, doing some work on our…

Posted
Nov 23rd, 2011

Tags
Twitter

Author
documentcloud

Hi all, we’re doing some work on our document processing servers, so uploads may be slow for a few minutes. /th

VisualSearch.js v0.2.1 release…

Posted
Nov 15th, 2011

Tags
Twitter

Author
documentcloud

VisualSearch.js v0.2.1 released: http://t.co/Sq5oj06I. New feature: preserve the order of your facets.

DocumentCloud gets Entity Char…

Posted
Oct 27th, 2011

Tags
Twitter

Author
documentcloud

DocumentCloud gets Entity Charts: http://t.co/cIvNMkeQ Give them a try on your documents. /ja

New Feature: Entity Charts

Posted
Oct 27th, 2011

Tags
Workspace

Author
Jeremy Ashkenas

Whenever you upload a document to DocumentCloud we send the contents to OpenCalais, a service that discovers the entities (people, places, organizations, terms, etc.) that are present in plain text. OpenCalais can tell us that “Barack Obama” is the same person as “President Obama”, “Senator Obama”, “Mr. President” … and even “he” or “his” in clauses like “his policy proposals”.

Last month, we stopped indexing entities for faceting because DocumentCloud has reached the point where our search index can no longer support the strain of keeping track of the millions of unique entities stored in our database. We still hope to bring back some form of entity faceting (a feature you may remember as the “Entities” tab) using a different implementation in the future. But for the time being, we have added a new feature that lets you easily browse through all of the entities associated with a document.

The entities are displayed in a chart that shows how often each entity occurs on each page. Using this chart, you can see which companies and individuals tend to be mentioned together, or which parts of a long document concern a certain topic. Hover over any mention (the small gray boxes) to see the surrounding context, and click on it to jump directly to that mention within the document itself.
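The tally behind a chart like this can be sketched as a count of mentions per entity per page. This is a minimal illustration, not DocumentCloud's implementation: it assumes a hypothetical input format of `(entity, page)` tuples with 1-based page numbers, and the helper names `entity_page_counts` and `shared_pages` are made up for the example.

```python
from collections import defaultdict


def entity_page_counts(mentions, page_count):
    """Return {entity: [count on page 1, count on page 2, ...]}.

    Assumption: `mentions` is a hypothetical list of (entity, page) tuples
    with 1-based page numbers; `page_count` is the document's page count.
    """
    counts = defaultdict(lambda: [0] * page_count)
    for entity, page in mentions:
        if not 1 <= page <= page_count:
            raise ValueError(f"page {page} out of range for {entity!r}")
        counts[entity][page - 1] += 1
    return dict(counts)


def shared_pages(counts, a, b):
    """1-based pages on which both entities are mentioned at least once."""
    return [i + 1 for i, (x, y) in enumerate(zip(counts[a], counts[b])) if x and y]


# Toy, made-up mentions for a three-page document.
mentions = [
    ("Barack Obama", 1), ("Barack Obama", 1), ("Congress", 1),
    ("Barack Obama", 3), ("Congress", 3), ("Congress", 2),
]
counts = entity_page_counts(mentions, page_count=3)
print(counts["Barack Obama"])                             # [2, 0, 1]
print(shared_pages(counts, "Barack Obama", "Congress"))   # [1, 3]
```

Rows of per-page counts like these are what make co-occurrence visible: two entities whose nonzero pages line up tend to be discussed together.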

If you want to try out an example, here is a link to a recent document that ran with a disability fraud story in today’s New York Times. Right-click on the document and choose View Entities from the context menu, or select the document and choose View Entities from the Analyze menu.

We’re still polishing these charts, so let us know if you have any ideas for improving them, or ideas for other ways that we can make extracted entities more useful for your reporting.

.@amandabee make that: If you …

Posted
Oct 22nd, 2011

Tags
Twitter

Author
documentcloud

.@amandabee make that: If you are trying to make @documentcloud handle data you’ll want to watch out for @pandaproject! /th

You at SEJ in Miami? So’s @doc…

Posted
Oct 22nd, 2011

Tags
Twitter

Author
documentcloud

You at SEJ in Miami? So’s @documentcloud! If you can’t make our Saturday panel, @amandabee will be around for the afternoon. /abh

If your CMS (Ahem, WordPress) …

Posted
Oct 20th, 2011

Tags
Twitter

Author
documentcloud

If your CMS (Ahem, WordPress) gives you trouble pasting in document viewer embed codes, we’ve just added a “remove line breaks” link. /ja
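Collapsing an embed code onto a single line is plain string handling. The sketch below is only an illustration of the idea, not DocumentCloud's code: `remove_line_breaks` is a hypothetical name, and the sample snippet is a placeholder rather than a real embed code. It assumes the line breaks are safe to replace with single spaces, which holds for ordinary HTML tags and attributes but not for inline JavaScript that relies on newlines (for example `//` comments).

```python
def remove_line_breaks(embed_code: str) -> str:
    """Join a multi-line embed snippet into one line.

    Each line is stripped of surrounding indentation, blank lines are
    dropped, and the remaining lines are joined with single spaces.
    str.splitlines() handles \\n, \\r\\n and \\r line endings.
    """
    return " ".join(line.strip() for line in embed_code.splitlines() if line.strip())


# Placeholder snippet (not a real DocumentCloud embed code).
snippet = '<div id="viewer">\n  <script src="viewer.js"></script>\n</div>\n'
print(remove_line_breaks(snippet))
```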

RT @dansinker: Congrats to @pr…

Posted
Oct 4th, 2011

Tags
Twitter

Author
documentcloud

RT @dansinker: Congrats to @propubnerds for a successful launch of DocDiver. Looks *amazing* http://t.co/cQKnmoO6 /abh

DocumentCloud now supports com…

Posted
Oct 3rd, 2011

Tags
Twitter

Author
documentcloud

DocumentCloud now supports combined metadata searches: `citizen: Guatemala citizen: Mexico` would find documents for both countries. /ja
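One way to read a query like `citizen: Guatemala citizen: Mexico` is to group repeated facets by name and match a document that has any of the listed values. The sketch below assumes that interpretation (OR within a facet name, AND across different names) and a simple `name: value` syntax with single-word names and values. The parser, the matching rule, and the sample documents are all hypothetical, not DocumentCloud's search engine.

```python
import re
from collections import defaultdict

# Hypothetical facet syntax: "name: value" pairs separated by whitespace.
FACET = re.compile(r"(\w+):\s*(\S+)")


def parse_facets(query):
    """Group repeated facets, e.g. {'citizen': ['Guatemala', 'Mexico']}."""
    facets = defaultdict(list)
    for name, value in FACET.findall(query):
        facets[name].append(value)
    return dict(facets)


def matches(document, facets):
    """Assumed semantics: OR within one facet name, AND across names."""
    return all(
        any(value in document.get(name, ()) for value in values)
        for name, values in facets.items()
    )


# Made-up documents with per-document metadata lists.
docs = [
    {"title": "Visa report", "citizen": ["Guatemala"]},
    {"title": "Border memo", "citizen": ["Mexico", "Honduras"]},
    {"title": "Trade brief", "citizen": ["Canada"]},
]
facets = parse_facets("citizen: Guatemala citizen: Mexico")
print(facets)  # {'citizen': ['Guatemala', 'Mexico']}
print([d["title"] for d in docs if matches(d, facets)])  # ['Visa report', 'Border memo']
```

Under these assumptions, adding a second facet name (say `state: Texas`) would narrow the results, while repeating the same name widens them.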

Update on Searching and Entities

Posted
Sep 28th, 2011

Tags
Workspace

Author
Ted Han

Users who tried to search for just about anything on DocumentCloud this morning quickly noticed that something wasn’t quite right on our servers. The short story is this: the problem was caused by human error, and our servers are in the process of rebuilding the index that failed.

The longer story, for those of you who’ve been tracking updates about our search outage, is this: …

Our search index should now be…

Posted
Sep 28th, 2011

Tags
Twitter

Author
documentcloud

Our search index should now be fully recovered for all documents. If you have any trouble searching this evening, please let us know. /ja

Search within document viewers…

Posted
Sep 28th, 2011

Tags
Twitter

Author
documentcloud

Search within document viewers should now be restored for all documents. /ja