The Nutch Logging System

2020-06-16 18:05:13

I. How Nutch Implements Logging

1. Nutch uses slf4j as its logging interface (facade) and log4j as the concrete implementation. For background on the two, see

http://www.linuxidc.com/Linux/2015-03/114637.htm

2. In Java class files, log messages are written as follows:

(1) Obtain a Logger object

  public static final Logger LOG = LoggerFactory.getLogger(InjectorJob.class);

(2) Use the Logger to write output

    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    long start = System.currentTimeMillis();
    LOG.info("InjectorJob: starting at " + sdf.format(start));
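Note that `start` in the snippet above is a `long`, not a `Date`. The call still compiles because the value is boxed to a `Long` and handled by `Format.format(Object)`, which `DateFormat` interprets as milliseconds since the epoch. The following is a minimal standalone sketch of that behaviour; the class and method names (`TimestampFormatSketch`, `formatMillis`, `formatDate`) are invented for illustration and are not part of Nutch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative sketch only; these names do not exist in Nutch.
class TimestampFormatSketch {

    // Same call shape as the Nutch snippet: the long is boxed to a Long
    // and formatted through Format.format(Object).
    static String formatMillis(SimpleDateFormat sdf, long millis) {
        return sdf.format(millis);
    }

    // The more explicit equivalent: wrap the milliseconds in a Date first.
    static String formatDate(SimpleDateFormat sdf, long millis) {
        return sdf.format(new Date(millis));
    }

    public static void main(String[] args) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        long start = System.currentTimeMillis();
        System.out.println("InjectorJob: starting at " + formatMillis(sdf, start));
        // Both forms produce the same text for the same instant.
        System.out.println(formatMillis(sdf, start).equals(formatDate(sdf, start)));
    }
}
```

Either form works; wrapping the value in a `Date` simply makes the intent more obvious to a reader.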

3. The logging properties are defined in log4j.properties

# Define some default values that can be overridden by system properties
hadoop.log.dir=.
hadoop.log.file=hadoop.log

# RootLogger - DailyRollingFileAppender
log4j.rootLogger=INFO,DRFA

# Logging Threshold
log4j.threshold=ALL

#special logging requirements for some commandline tools
log4j.logger.org.apache.nutch.crawl.Crawl=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.InjectorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostInjectorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.GeneratorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.DbUpdaterJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostDbUpdateJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.fetcher.FetcherJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.IndexingJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.DeleteDuplicates=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.CleaningJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.WebTableReader=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostDbReader=INFO,cmdstdout
log4j.logger.org.apache.nutch.parse.ParserChecker=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.IndexingFiltersChecker=INFO,cmdstdout
log4j.logger.org.apache.nutch.plugin.PluginRepository=WARN
log4j.logger.org.apache.nutch.api.NutchServer=INFO,cmdstdout

log4j.logger.org.apache.nutch=INFO
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.gora=WARN

#
# Daily Rolling File Appender
#

log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Rollover at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
# Debugging Pattern format: Date LogLevel LoggerName (FileName:MethodName:LineNo) LogMessage
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n


#
# stdout
# Add *stdout* to rootlogger above if you want to use this
#

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

#
# plain layout used for commandline tools to output to console
#
log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
log4j.appender.cmdstdout.layout.ConversionPattern=%m%n

#
# Rolling File Appender
#

#log4j.appender.RFA=org.apache.log4j.RollingFileAppender
#log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Logfile size and 30-day backups
#log4j.appender.RFA.MaxFileSize=1MB
#log4j.appender.RFA.MaxBackupIndex=30

#log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
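The `${hadoop.log.dir}/${hadoop.log.file}` reference in the listing above relies on log4j 1.x variable substitution: a `${key}` is looked up first among the JVM system properties and only then in the properties file itself, which is why the file describes its values as defaults that system properties can override. The sketch below imitates that lookup order using only the JDK. It is a simplified illustration, not log4j's own code: the class name `SubstitutionSketch` and its `resolve` method are invented here, and unknown keys are simply replaced with an empty string.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified imitation of log4j 1.x ${...} substitution; not log4j code.
class SubstitutionSketch {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${key} with the system property if set,
    // otherwise with the value from the properties file.
    static String resolve(String value, Properties fileProps) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String replacement = System.getProperty(key);
            if (replacement == null) {
                // Assumption of this sketch: unknown keys become "".
                replacement = fileProps.getProperty(key, "");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("hadoop.log.dir", ".");
        p.setProperty("hadoop.log.file", "hadoop.log");

        // With no system property set, the file defaults apply.
        System.out.println(resolve("${hadoop.log.dir}/${hadoop.log.file}", p));

        // Equivalent to starting the JVM with -Dhadoop.log.dir=/var/log/nutch
        // (the path is an arbitrary example).
        System.setProperty("hadoop.log.dir", "/var/log/nutch");
        System.out.println(resolve("${hadoop.log.dir}/${hadoop.log.file}", p));
    }
}
```

Because the lookup is by exact key, the property names in the file must match the `${...}` references character for character, including case.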

II. Analysis of the Nutch Logging Configuration

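1. Nutch log output uses two appenders: cmdstdout and DRFA.

The former writes log messages to standard output using a bare message layout (%m%n); the latter writes them to a log file that is rolled over once per day.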

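2. The default (root) logger for the whole project is configured as INFO, DRFA.

Nutch's own package, org.apache.nutch, is set to INFO, and the command-line tool classes (InjectorJob, FetcherJob, and so on) are individually configured as INFO,cmdstdout. Because log4j loggers are additive by default, these classes write to the console and also inherit DRFA from the root logger.

hadoop, gora, and zookeeper are set to WARN; they define no appender of their own and therefore log only to the inherited DRFA. The default log file is ./hadoop.log.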
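The way a logger picks up its level and appenders from this configuration can be sketched with a small simulation. In log4j 1.x, a logger without its own level inherits the level of its nearest configured ancestor, and with additivity left at its default of true, a message is sent to the appenders of the logger and of all its ancestors up to the root. The code below is a JDK-only model of those two rules applied to a few entries from the properties file; it is not log4j's implementation. The class `LoggerHierarchySketch` and its methods are invented for illustration, and the logger names `org.apache.hadoop.mapred.JobClient` and `org.apache.nutch.util.NutchJob` are used only as examples of loggers with no entry of their own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of log4j 1.x level inheritance and appender additivity.
class LoggerHierarchySketch {

    static final String ROOT = "";
    static final Map<String, String> LEVELS = new HashMap<>();
    static final Map<String, List<String>> APPENDERS = new HashMap<>();

    static void configure(String logger, String level, String... appenders) {
        LEVELS.put(logger, level);
        APPENDERS.put(logger, Arrays.asList(appenders));
    }

    static {
        // A subset of the entries in log4j.properties.
        configure(ROOT, "INFO", "DRFA");
        configure("org.apache.nutch", "INFO");
        configure("org.apache.nutch.crawl.InjectorJob", "INFO", "cmdstdout");
        configure("org.apache.hadoop", "WARN");
    }

    // Parent of "a.b.c" is "a.b"; the parent of a top-level name is the root.
    static String parent(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? ROOT : name.substring(0, dot);
    }

    // The nearest logger on the path to the root that has a level wins.
    static String effectiveLevel(String name) {
        String n = name;
        while (true) {
            String level = LEVELS.get(n);
            if (level != null || n.equals(ROOT)) {
                return level;
            }
            n = parent(n);
        }
    }

    // With additivity on, appenders accumulate from the logger up to the root.
    static List<String> effectiveAppenders(String name) {
        List<String> result = new ArrayList<>();
        String n = name;
        while (true) {
            result.addAll(APPENDERS.getOrDefault(n, Collections.emptyList()));
            if (n.equals(ROOT)) {
                return result;
            }
            n = parent(n);
        }
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList(
                "org.apache.nutch.crawl.InjectorJob",
                "org.apache.nutch.util.NutchJob",
                "org.apache.hadoop.mapred.JobClient")) {
            System.out.println(name + " -> " + effectiveLevel(name)
                    + " " + effectiveAppenders(name));
        }
    }
}
```

In this model, InjectorJob resolves to both cmdstdout and DRFA, while the Hadoop logger resolves to WARN with DRFA only, which matches the description above.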
