Nutch Study Notes, Part 4: Deploying the Search Service (Tomcat)

Once the crawl has finished, the result can be deployed to Tomcat to provide a search engine service. The steps are as follows:

1. Install the WAR file
   Copy the WAR file $nutch$/nutch-*.war into the directory $tomcat$/webapps/:
   cp $nutch$/nutch-*.war $tomcat$/webapps/nutch.war
   The search home page is then available at the URL http://127.0.0.1:8080/nutch

   If the file is saved as ROOT.war instead, the corresponding URL is http://127.0.0.1:8080
   cp $nutch$/nutch-*.war $tomcat$/webapps/ROOT.war

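2. Specify the search data directory
   The search service needs to be told where the data files are.
   Assuming the WAR file was saved as nutch.war, restart Tomcat so that it unpacks the
   archive into the directory $tomcat$/webapps/nutch/.
   Open the file $tomcat$/webapps/nutch/WEB-INF/classes/nutch-site.xml and add the
   searcher.dir property. For example, if the data files are stored in the
   /local/nutch/crawl directory, add: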

   <property>
      <name>searcher.dir</name>
      <value>/local/nutch/crawl</value>
   </property>

   This tells search.jsp where the data files are.

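3. Make Tomcat accept Chinese input
   To search with Chinese words as keywords, Tomcat must support Chinese input. To enable
   this, edit Tomcat's configuration file $tomcat$/conf/server.xml and add two attributes,
   URIEncoding and useBodyEncodingForURI, to the Connector on port 8080.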
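
   A minimal sketch of how that Connector element might look. It assumes UTF-8 as the
   encoding; connectionTimeout and redirectPort are shown with commonly used Tomcat
   values and are illustrative only, so the Connector in your own server.xml may carry
   different attributes.

   ```xml
   <!-- Sketch only: UTF-8 is an assumed encoding; connectionTimeout and
        redirectPort are illustrative values, not required settings. -->
   <Connector port="8080"
              connectionTimeout="20000"
              redirectPort="8443"
              URIEncoding="UTF-8"
              useBodyEncodingForURI="true" />
   ```

   Leave whatever attributes your Connector already has as they are, and only append
   the last two.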

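4. To search a large site such as a web portal, a few more settings need to be changed, because the default configuration is meant for searching an intranet.
Change db.max.outlinks.per.page, which defines the maximum number of links processed for a page; any links beyond this number are ignored. The default is 100, and raising it to 1000 is enough.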

<property>
<name>db.max.outlinks.per.page</name>
<value>1000</value>
<description>The maximum number of outlinks that we'll process for a page.
If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
will be processed for a page; otherwise, all outlinks will be processed.
</description>
</property>

Change urlfilter.order, which specifies the order in which URL filters are applied. I prefer regular expressions, so I set it to org.apache.nutch.urlfilter.regex.RegexURLFilter.

<property>
<name>urlfilter.order</name>
<value>org.apache.nutch.urlfilter.regex.RegexURLFilter</value>
<description>The order by which url filters are applied.
If empty, all available url filters (as dictated by properties
plugin-includes and plugin-excludes above) are loaded and applied in system
defined order. If not empty, only named filters are loaded and applied
in given order. For example, if this property has value:
org.apache.nutch.urlfilter.regex.RegexURLFilter org.apache.nutch.urlfilter.prefix.PrefixURLFilter
then RegexURLFilter is applied first, and PrefixURLFilter second.
Since all filters are AND'ed, filter ordering does not have impact
on end result, but it may have performance implication, depending
on relative expensiveness of filters.
</description>
</property>

5. Restart Tomcat again
Open the URL "http://127.0.0.1:8080/nutch" in a browser. That's it; now enjoy Nutch.

Posted on 2007-10-04 23:01 by 专心练剑. Category: Search Engines
