<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DocumentCloud</title>
	<atom:link href="http://blog.documentcloud.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.documentcloud.org</link>
	<description>We turn documents into data.</description>
	<lastBuildDate>Fri, 27 Aug 2010 20:16:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Excerpts for your Entities</title>
		<link>http://blog.documentcloud.org/blog/2010/08/entities-and-excerpts/</link>
		<comments>http://blog.documentcloud.org/blog/2010/08/entities-and-excerpts/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 20:16:02 +0000</pubDate>
		<dc:creator>Jeremy Ashkenas</dc:creator>
				<category><![CDATA[Workspace]]></category>

		<guid isPermaLink="false">http://blog.documentcloud.org/?p=470</guid>
		<description><![CDATA[Today we added excerpts to the timeline and the Entities tab. We are using OpenCalais to parse every document you upload and extract the names of people, organizations, terms and places in the text. We display these under the Entities tab. We also extract date information from each document that you upload and plot those on [...]]]></description>
			<content:encoded><![CDATA[<p>Today we added excerpts to the timeline and the Entities tab. We use <a href="http://www.opencalais.com">OpenCalais</a> to parse every document you upload and extract the names of the people, organizations, terms and places it mentions, which we display under the Entities tab. We also extract date information from each document you upload and plot those dates on a timeline, which you can open from the Analyze menu.</p>
<p>DocumentCloud&#8217;s entities show you, at a glance, the people mentioned most often in a given project, or the organizations named in each of the documents you&#8217;ve selected. Here&#8217;s how it works now:</p>
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-27-at-12.53.05-PM.png" alt="" title="" width="308" height="186" class="aligncenter size-full wp-image-471" style="border: 1px solid #ccc; margin: 20px auto;" /></p>
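<p>Behind those tallies is simple counting. The Python below is a minimal sketch, not DocumentCloud&#8217;s code: it assumes each document&#8217;s extracted mentions are already available as a list of (kind, name) pairs, and the function name <code>summarize_entities</code>, that data layout, and the sample names are all hypothetical.</p>

```python
from collections import Counter, defaultdict


def summarize_entities(documents, kind, top_n=5):
    """Rank entities of one kind by how often they are mentioned.

    documents: hypothetical mapping of document title -> list of
    (kind, name) mentions, e.g. ("person", "Jane Roe").
    Returns (name, mention_count, titles_that_mention_it) tuples,
    most-mentioned first.
    """
    counts = Counter()
    appears_in = defaultdict(set)
    for title, mentions in documents.items():
        for mention_kind, name in mentions:
            if mention_kind == kind:
                counts[name] += 1            # total mentions across the selection
                appears_in[name].add(title)  # which documents name this entity
    return [
        (name, n, sorted(appears_in[name]))
        for name, n in counts.most_common(top_n)
    ]


project = {
    "permit.pdf": [("person", "Jane Roe"), ("organization", "EPA"), ("person", "Jane Roe")],
    "memo.pdf": [("organization", "EPA"), ("person", "John Doe")],
}
print(summarize_entities(project, "person"))
# [('Jane Roe', 2, ['permit.pdf']), ('John Doe', 1, ['memo.pdf'])]
```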
<p>Click the &#8220;show pages&#8221; link next to any entity to reveal a thumbnail of every page, in every document, that contains that entity, alongside excerpts that highlight each mention in the text. Clicking a highlighted phrase takes you directly to that spot in the document itself. In the screenshot below, you can see how the Environmental Protection Agency was correctly identified by both its proper name and its acronym.</p>
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-27-at-12.52.52-PM.png" alt="" title="" width="604" height="556" class="aligncenter size-full wp-image-472" style="margin: 20px auto;" /></p>
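<p>Finding excerpts like these amounts to locating every mention of an entity and keeping a few words on either side. Here is a minimal Python sketch under stated assumptions: the list of names an entity goes by (such as its full name and its acronym) is already known, mentions are matched case-insensitively on word boundaries, and the highlighted mention is wrapped in square brackets. The helper names are hypothetical, and this is not DocumentCloud&#8217;s actual code.</p>

```python
import re


def _window_start(text, pos, n):
    # Offset where the n-th whitespace-delimited word before `pos` begins.
    starts = [w.start() for w in re.finditer(r"\S+", text[:pos])]
    return starts[-n] if len(starts) >= n else 0


def _window_end(text, pos, n):
    # Offset where the n-th whitespace-delimited word after `pos` ends.
    ends = [pos + w.end() for w in re.finditer(r"\S+", text[pos:])]
    return ends[n - 1] if len(ends) >= n else len(text)


def excerpts(text, aliases, context_words=5):
    """Return one excerpt per mention of any alias, with the mention in [brackets].

    aliases: every name the entity goes by (assumed to be known already).
    context_words must be at least 1.
    """
    # Longest alias first, so a full name wins over a shorter alias it contains.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")\b",
        re.IGNORECASE,
    )
    results = []
    for m in pattern.finditer(text):
        start = _window_start(text, m.start(), context_words)
        end = _window_end(text, m.end(), context_words)
        # Keep the original spacing and punctuation around the mention.
        results.append(text[start:m.start()] + "[" + m.group(0) + "]" + text[m.end():end])
    return results


sample = "The Environmental Protection Agency said Tuesday that the EPA would review the permit."
print(excerpts(sample, ["EPA", "Environmental Protection Agency"], context_words=2))
# ['The [Environmental Protection Agency] said Tuesday', 'that the [EPA] would review']
```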
<p>We&#8217;ve added excerpts to the timeline as well. When you open a timeline from the Analyze menu and hover over any date, you&#8217;ll see a few words of surrounding text along with the date as it appears in the document. That is useful for corroborating a single event across multiple sources, or for comparing different accounts of what should be a shared timeline. Click a date to jump straight to the point in the document where it appears.</p>
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-27-at-12.59.28-PM.png" alt="" title="" width="700" height="341" class="aligncenter size-full wp-image-474" style="margin: 20px auto;" /></p>
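<p>Pairing each date with a snippet of surrounding text can be sketched in a few lines of Python. This example is illustrative only: the post does not say how DocumentCloud extracts dates, and the pattern here is a hypothetical stand-in that recognizes just one format (&#8220;March 3, 2009&#8221;), keeping a fixed number of characters on each side as the excerpt.</p>

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Hypothetical: only matches dates written like "March 3, 2009".
DATE_RE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")


def timeline(text, context_chars=30):
    """Return (date, excerpt) pairs found in `text`, sorted by date."""
    events = []
    for m in DATE_RE.finditer(text):
        try:
            # %B expects English month names (the default C locale).
            date = datetime.strptime(m.group(0), "%B %d, %Y").date()
        except ValueError:
            continue  # skip impossible dates such as "February 30, 2010"
        # Keep a little surrounding text so readers see the date in context.
        start = max(0, m.start() - context_chars)
        excerpt = text[start:m.end() + context_chars].strip()
        events.append((date, excerpt))
    return sorted(events)
```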
<p>We hope excerpts come in handy for your DocumentCloud projects. If you think of a way we can make them even more useful, leave a comment or <a href="http://www.documentcloud.org/contact">let us know</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.documentcloud.org/blog/2010/08/entities-and-excerpts/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Uploading Documents Gets a Little Easier</title>
		<link>http://blog.documentcloud.org/blog/2010/08/upload-multiple-documents/</link>
		<comments>http://blog.documentcloud.org/blog/2010/08/upload-multiple-documents/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 13:22:26 +0000</pubDate>
		<dc:creator>Amanda Hickman</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Workspace]]></category>

		<guid isPermaLink="false">http://blog.documentcloud.org/?p=418</guid>
		<description><![CDATA[You&#8217;ve always been able to script batch uploads using our API, but users without coding skills had to upload documents one at a time. Today we&#8217;re rolling out an improved upload dialog that lets you upload as many documents as you want, all in one fell swoop. You&#8217;ll still use the &#8220;New Documents&#8221; button, [...]]]></description>
			<content:encoded><![CDATA[<p>You&#8217;ve always been able to script batch uploads using our API, but users without coding skills had to upload documents one at a time. Today we&#8217;re rolling out an improved upload dialog that lets you upload as many documents as you want, all in one fell swoop.</p>
<p>You&#8217;ll still use the &#8220;New Documents&#8221; button, but now that button takes you straight to a file selection dialog.<br />
Hold down the Control key (on Windows) or the Command key (on a Mac) to select additional documents, just as you would in your file browser.</p>
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-17-at-9.40.49-AM.png" alt="File selection screenshot" title="" width="717" height="480" class="aligncenter size-full wp-image-437" /></p>
<p>We&#8217;ll start you off with a suggested title based on each file&#8217;s name, but you can edit that title and add more information, including each document&#8217;s source and a description. As with the old upload dialog, you can decide at upload time whether you&#8217;re ready to share each document with the world. As ever, you can edit all of these fields again later.</p>
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-18-at-9.03.07-AM.png" alt="" title="" width="603" height="512" class="aligncenter size-full wp-image-461" /></p>
<p>If your documents share a common source or description, use the &#8220;Apply to All Files&#8221; link to copy your metadata to each document in this batch. Note: this new upload interface requires Flash. If that&#8217;s an issue for you, <a href="http://www.documentcloud.org/contact">let us know ASAP</a> and we&#8217;ll whip up an alternate interface that doesn&#8217;t require any plugins. Promise.</p>
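<p>The title suggestion and the &#8220;Apply to All Files&#8221; link can be modeled with plain dictionaries. This Python sketch is hypothetical throughout (the function names and the <code>file</code>, <code>title</code>, <code>source</code> and <code>description</code> keys are assumptions), and it is not the code behind the Flash uploader.</p>

```python
import os
import re


def suggest_title(path):
    # Drop the folder and extension, then turn runs of "_", "-" or spaces
    # into single spaces.
    base = os.path.splitext(os.path.basename(path))[0]
    return " ".join(w for w in re.split(r"[_\-\s]+", base) if w)


def build_batch(paths, **shared):
    """One metadata dict per file; `shared` plays the role of "Apply to All Files"."""
    batch = []
    for path in paths:
        doc = {"file": path, "title": suggest_title(path)}
        doc.update(shared)  # e.g. source=..., description=... copied to every file
        batch.append(doc)
    return batch
```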
<p>As the files upload from your computer to DocumentCloud, you&#8217;ll see the progress of each transfer.</p>
<p><b>Better Processing, Too</b><br />
This week&#8217;s release is more than just a new upload dialog. We&#8217;ve made some big changes to <a href="http://documentcloud.github.com/docsplit/">Docsplit</a> and the <a href="http://github.com/documentcloud/right_aws">RightAWS</a> gem, and we&#8217;re hoping this means the dreaded &#8220;import failed&#8221; error will be a thing of the past. If looking under the hood is your thing, both Docsplit and our fork of RightAWS are on <a href="http://www.documentcloud.org/opensource">GitHub</a> for your viewing (and reusing) pleasure.</p>
<p><b>Don&#8217;t be a Stranger</b><br />
If you have gigabytes&#8217; worth of documents to upload, get in touch before you start so we can add more horsepower to handle your job. Otherwise, happy uploading! And don&#8217;t forget to tell us what you&#8217;re <a href="http://www.documentcloud.org/featured">publishing with DocumentCloud</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.documentcloud.org/blog/2010/08/upload-multiple-documents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Related Documents</title>
		<link>http://blog.documentcloud.org/blog/2010/08/related-documents/</link>
		<comments>http://blog.documentcloud.org/blog/2010/08/related-documents/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 15:56:48 +0000</pubDate>
		<dc:creator>Samuel Clay</dc:creator>
				<category><![CDATA[Workspace]]></category>

		<guid isPermaLink="false">http://blog.documentcloud.org/?p=394</guid>
		<description><![CDATA[If you log in to DocumentCloud this morning, you&#8217;ll notice a new menu, entitled &#8220;Analyze&#8221;. We&#8217;ve gathered the various analytic tools under one roof here &#8212; to view the entities for selected documents, or display a timeline &#8212; and added a major new one: Related Documents. If you&#8217;re working on a story, and you just [...]]]></description>
			<content:encoded><![CDATA[<p>If you log in to DocumentCloud this morning, you&#8217;ll notice a new menu labeled &#8220;Analyze&#8221;. We&#8217;ve gathered the various analytic tools, such as viewing the entities for selected documents or displaying a timeline, under one roof here, and added a major new one: Related Documents. If you&#8217;re working on a story and have just uploaded a key document, you can use it as a jumping-off point to find other public documents about the same subject. Select a document, open the &#8220;Analyze&#8221; menu, and click &#8220;Find Related Documents&#8221;:</p>
<p><img class="aligncenter size-full wp-image-398" src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Finding-Related-Documents.png" alt="Finding Related Documents" width="698" height="171" /></p>
<p>Related documents can come from any document visible to your account, including documents that other organizations have made public.</p>
<p>Under the hood, we are using a technique known as <a href="http://en.wikipedia.org/wiki/Tf–idf">tf/idf</a>, which compares document similarity by looking at the &#8220;important&#8221; words across a set of documents. The importance of each word is evaluated by weighing how often it is used in a particular document against how often it appears across the collection of documents as a whole. In this manner, commonly used words carry little weight, and distinctive words gain importance. The search engine we use for DocumentCloud, <a href="http://lucene.apache.org/">Lucene</a>, has this type of search built in.</p>
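<p>DocumentCloud gets this from Lucene, so the Python below is not its implementation. It is a minimal sketch of the idea, assuming a simple lowercase tokenizer, one common weighting variant (term frequency multiplied by log(N / df), where N is the number of documents and df is how many of them contain the term), and cosine similarity to rank candidates. The function names <code>tfidf_vectors</code>, <code>cosine</code> and <code>related</code> are hypothetical.</p>

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric runs count as words.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """One {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(docs, index, top_n=10):
    """Indices of the documents most similar to docs[index], best first."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[index], v), i)
        for i, v in enumerate(vectors)
        if i != index
    ]
    scored.sort(reverse=True)
    # Drop documents that share no weighted terms at all.
    return [i for score, i in scored[:top_n] if score > 0]


docs = [
    "The EPA issued water pollution permits for the river",
    "River water pollution cleanup ordered by the EPA",
    "City council approves the budget vote",
]
print(related(docs, 0))  # [1]
```

<p>In this variant a word that appears in every document (here, &#8220;the&#8221;) gets a weight of log(1) = 0, so the most common words contribute nothing to the comparison, while words concentrated in a few documents dominate it.</p>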
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-02-at-11.34.50-AM.png" alt="" title="" width="662" height="593" class="aligncenter size-full wp-image-407" /></p>
<p>We&#8217;re still at work on improving this feature. At the moment, you&#8217;ll notice two things: there is a long tail of barely related documents after the first page of results, and shorter documents (1-3 pages) may find no related documents whatsoever. But for most documents with high-quality text, you&#8217;ll find that the top results are very relevant.</p>
<p>There&#8217;s one more thing that we released at the same time: a panel that allows you to edit all the information that describes your documents (title, source, description, access level) at a stroke. To use it, click on the pencil icon that now appears next to any document. Naturally, you can also select multiple documents and edit all of their attributes simultaneously.</p>
<p><img src="http://blog.documentcloud.org/wp-content/uploads/2010/08/Screen-shot-2010-08-02-at-11.48.33-AM.png" alt="" title="" width="734" height="113" class="aligncenter size-full wp-image-411" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.documentcloud.org/blog/2010/08/related-documents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
