2021-05-12 14:32:11
Nutch的紀錄檔系統
一、Nutch紀錄檔實現方式
1、Nutch使用slf4j作為紀錄檔介面,使用log4j作為具體實現。關於二者的基礎,請參考
http://www.linuxidc.com/Linux/2015-03/114637.htm
2、在java類檔案中,通過以下方式輸出紀錄檔訊息:
(1)獲取Logger物件
public static final Logger LOG = LoggerFactory.getLogger(InjectorJob.class);
(2)使用Logger進行輸出
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
long start = System.currentTimeMillis();
LOG.info("InjectorJob: starting at " + sdf.format(start));
3、在log4j.properties中定義各個屬性
# Define some default values that can be overridden by system properties
Hadoop.log.dir=.
hadoop.log.file=hadoop.log
# RootLogger - DailyRollingFileAppender
log4j.rootLogger=INFO,DRFA
# Logging Threshold
log4j.threshold=ALL
#special logging requirements for some commandline tools
log4j.logger.org.apache.nutch.crawl.Crawl=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.InjectorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostInjectorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.GeneratorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.DbUpdaterJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostDbUpdateJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.fetcher.FetcherJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.IndexingJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.DeleteDuplicates=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.CleaningJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.WebTableReader=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostDbReader=INFO,cmdstdout
log4j.logger.org.apache.nutch.parse.ParserChecker=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.IndexingFiltersChecker=INFO,cmdstdout
log4j.logger.org.apache.nutch.plugin.PluginRepository=WARN
log4j.logger.org.apache.nutch.api.NutchServer=INFO,cmdstdout
log4j.logger.org.apache.nutch=INFO
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.gora=WARN
#
# Daily Rolling File Appender
#
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Rollver at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
# Debugging Pattern format: Date LogLevel LoggerName (FileName:MethodName:LineNo) LogMessage
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# stdout
# Add *stdout* to rootlogger above if you want to use this
#
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# plain layout used for commandline tools to output to console
#
log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
log4j.appender.cmdstdout.layout.ConversionPattern=%m%n
#
# Rolling File Appender
#
#log4j.appender.RFA=org.apache.log4j.RollingFileAppender
#log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Logfile size and and 30-day backups
#log4j.appender.RFA.MaxFileSize=1MB
#log4j.appender.RFA.MaxBackupIndex=30
#log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
二、Nutch紀錄檔分析
1、nutch紀錄檔輸出有2個appender: cmdstdout 與 DRFA。
前者將紀錄檔輸出至標準輸出中,後者將檔案輸出到每日一個的紀錄檔檔案中。
2、整個工程的預設紀錄檔設定為INFO, DRFA
而nutch自身的紀錄檔被重定義為INFO,cmdstdout
hadoop, gora, zookeeper等則重定義為WARN,DRFA, 預設紀錄檔為./hadoop.log
Nutch2.0完全分散式部署設定 http://www.linuxidc.com/Linux/2012-10/71977.htm
Nutch-2.0叢集設定 http://www.linuxidc.com/Linux/2012-10/71976.htm
相關文章