<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Distributed Web Crawling with Tornado and Gearman</title>
	<atom:link href="http://www.iacquire.com/blog/distributed-web-crawling-with-tornado-and-gearman/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.iacquire.com/blog/distributed-web-crawling-with-tornado-and-gearman/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=distributed-web-crawling-with-tornado-and-gearman</link>
	<description>Search is Our CRAFT</description>
	<lastBuildDate>Thu, 23 May 2013 10:21:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: Jon Hines</title>
		<link>http://www.iacquire.com/blog/distributed-web-crawling-with-tornado-and-gearman/#comment-514</link>
		<dc:creator>Jon Hines</dc:creator>
		<pubDate>Tue, 15 May 2012 20:37:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.iacquire.com/?p=411#comment-514</guid>
		<description><![CDATA[Thanks for the response, Jeff!]]></description>
		<content:encoded><![CDATA[<p>Thanks for the response, Jeff!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Nappi</title>
		<link>http://www.iacquire.com/blog/distributed-web-crawling-with-tornado-and-gearman/#comment-513</link>
		<dc:creator>Jeff Nappi</dc:creator>
		<pubDate>Sun, 13 May 2012 19:08:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.iacquire.com/?p=411#comment-513</guid>
		<description><![CDATA[Hi Jon, sorry for the delayed reply. I plan on providing an easy-to-use Amazon Machine Image with a web-based interface to this in the future, but for now, yes, you only need a single Ubuntu instance to run this. One way to do this is to sign up for Amazon Web Services and start by launching this AMI: https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-a29943cb

Once you have everything set up by following the directions in the post, you would supply a list of URLs in a text file (one per line) and pass the name of the text file to the TweetHandler.py script with the --url_file= parameter.

Feel free to contact me via e-mail if you have more questions - jeff _a_ iacquire.com]]></description>
		<content:encoded><![CDATA[<p>Hi Jon, sorry for the delayed reply. I plan on providing an easy-to-use Amazon Machine Image with a web-based interface to this in the future, but for now, yes, you only need a single Ubuntu instance to run this. One way to do this is to sign up for Amazon Web Services and start by launching this AMI: https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-a29943cb</p>
<p>Once you have everything set up by following the directions in the post, you would supply a list of URLs in a text file (one per line) and pass the name of the text file to the TweetHandler.py script with the --url_file= parameter.</p>
<p>Feel free to contact me via e-mail if you have more questions &#8211; jeff _a_ iacquire.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Hines</title>
		<link>http://www.iacquire.com/blog/distributed-web-crawling-with-tornado-and-gearman/#comment-512</link>
		<dc:creator>Jon Hines</dc:creator>
		<pubDate>Thu, 10 May 2012 14:36:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.iacquire.com/?p=411#comment-512</guid>
		<description><![CDATA[So if I&#039;m using this on just 1 terminal, will the setup above suffice? And if I have a list of URLs, how exactly do I input it?]]></description>
		<content:encoded><![CDATA[<p>So if I&#8217;m using this on just 1 terminal, will the setup above suffice? And if I have a list of URLs, how exactly do I input it?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Nappi</title>
		<link>http://www.iacquire.com/blog/distributed-web-crawling-with-tornado-and-gearman/#comment-501</link>
		<dc:creator>Jeff Nappi</dc:creator>
		<pubDate>Wed, 02 May 2012 19:12:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.iacquire.com/?p=411#comment-501</guid>
		<description><![CDATA[Something quite important I didn&#039;t address in the post - the tests shown here are in fact only running on a single node. To run it in a distributed manner, one would just launch TweetScout workers on additional nodes with the --jobserver=master-server-ip:4730 parameter. I will be making a follow-up post about this in the coming weeks.]]></description>
		<content:encoded><![CDATA[<p>Something quite important I didn&#8217;t address in the post &#8211; the tests shown here are in fact only running on a single node. To run it in a distributed manner, one would just launch TweetScout workers on additional nodes with the --jobserver=master-server-ip:4730 parameter. I will be making a follow-up post about this in the coming weeks.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
