####################################################################### # Apache::ParseLog # # Module to parse and process the Apache log files # # See the manpage for the usage # # Written by Akira Hangai # For comments, suggestions, opinions, etc., # email to: akira@discover-net.net # # Copyright 1998 by Akira Hangai. All rights reserved. # This program is free software; You can redistribute it and/or # modify it under the same terms as Perl itself. ####################################################################### package Apache::ParseLog; require 5.005; use Carp; use vars qw($VERSION); $VERSION = "1.02"; require Exporter; @ISA = qw(Exporter); @EXPORT = qw(countryByCode statusByCode sortHashByValue); @EXPORT_OK = qw(countryByCode statusByCode sortHashByValue); =head1 NAME Apache::ParseLog - Object-oriented Perl extension for parsing Apache log files =head1 SYNOPSIS use Apache::ParseLog; $base = new Apache::ParseLog(); $transferlog = $base->getTransferLog(); %dailytransferredbytes = $transferlog->bytebydate(); ... =head1 DESCRIPTION Apache::ParseLog provides an easy way to parse the Apache log files, using object-oriented constructs. The data obtained using this module are generic enough that it is flexible to use the data for your own applications, such as CGI, simple text-only report generater, feeding RDBMS, data for Perl/Tk-based GUI application, etc. =head1 FEATURES =over 4 =item 1 B Because all of the work (parsing logs, constructing regex, matching and assigning to variables, etc.) is done inside this module, you can easily create log reports (unless your logs need intense scrutiny). Read on this manpage as well as the L<"EXAMPLES"> section to see how easy it is to create log reports with this module. Also, this module does not require C compiler, and it can (should) run on any platforms supported by Perl. =item 2 B The Apache Web Server 1.3.x's new LogForamt/CustomLog feature (with mod_log_config) is supported. The log format specified with Apache's F directive in the F file will be parsed and the regular expressions will be created dynamically inside this module, so re-writing your existing code will be minimal when the log format is changed. =item 3 B Tranditionally, the hit count is calculated based on the number of B requested by visitors (the simplest is the the total number of lines of the log file calculated as the "total hit"). As such, the hit count obviously can be misleading in the sense of "how many visitors have actually visited my site?", especially if the pages of your site contain many images (because each image is counted as one hit). Apache::ParseLog provides the methods to obtain such traditional data, because those data also are very important for monitoring your web site's activities. However, this module also provides the methods to obtain the B, i.e., the actual number of "people" (well, IP or hostname) who visited your site, by date, time, and date and time. See the L<"LOG OBJECT METHODS"> for details about those methods. =item 4 B The new pre-compiled regex feature introduced by Perl 5.005 is used (if you have the version installed on your machine). For the pre-compiled regex and the new quote-like assignment operator (qr), see perlop(1) and perlre(1) manpages. =back =cut ####################################################################### # Local variables local($HOST, $LOGIN, $DATETIME, $REQUEST, $OSTATUS, $LSTATUS, $BYTE, $FILENAME, $ADDR, $PORT, $PROC, $SEC, $URL, $HOSTNAME, $REFERER, $UAGENT); my(%COUNTRY_BY_CODE) = map { chomp; my(@line) = split(/:/); $line[0] => $line[1] } ; my(%STATUS_BY_CODE) = ( 100 => "Continue", 101 => "Switching Protocols", 200 => "OK", 201 => "Created", 202 => "Accepted", 203 => "Non-Authoritative Information", 204 => "No Content", 205 => "Reset Content", 206 => "Partial Content", 300 => "Multiple Choices", 301 => "Moved Permanently", 302 => "Moved Temporarily", 303 => "See Other", 304 => "Not Modified", 305 => "Use Proxy", 400 => "Bad Request", 401 => "Unauthorized", 402 => "Payment Required", 403 => "Forbidden", 404 => "Not Found", 405 => "Method Not Allowed", 406 => "Not Acceptable", 407 => "Proxy Authentication Required", 408 => "Request Time-out", 409 => "Conflict", 410 => "Gone", 411 => "Length Required", 412 => "Precondition Failed", 413 => "Request Entity Too Large", 414 => "Request-URI Too Large", 415 => "Unsupported Media Type", 500 => "Internal Server Error", 501 => "Not Implemented", 502 => "Bad Gateway", 503 => "Service Unavailable", 504 => "Gateway Time-out", 505 => "HTTP Version not supported", ); my(%M2N) = ( 'Jan' => '01', 'Feb' => '02', 'Mar' => '03', 'Apr' => '04', 'May' => '05', 'Jun' => '06', 'Jul' => '07', 'Aug' => '08', 'Sep' => '09', 'Oct' => '10', 'Nov' => '11', 'Dec' => '12', ); ####################################################################### # Constructor ####################################################################### =pod =head1 CONSTRUCTOR To construct an Apache::ParseLog object,B method is available just like other modules. The C constructor returns an Apache::ParseLog base object with which to obtain basic server information as well as to construct B. =cut ####################################################################### # new([$path_to_httpd.conf]); returns ParseLog object =pod =head2 New Method B> With the B> method, an Apache::ParseLog object can be created in three different ways. =over 4 =item 1 C<$base = new Apache::ParseLog();> This first method creates an empty object, which means that the fields of the object are undefined (C); i.e., the object does not know what the server name is, where the log files are, etc. It is useful when you need to parse log files that are B created on the local Apache server (e.g., the log files FTP'd from elsewhere). You have to use the B> method (see below) to call any other methods. =item 2 C<$base = new Apache::ParseLog($httpd_conf);> This is the second way to create an object with necessary information extracted from the I<$httpd_conf>. I<$httpd_conf> is a scalar string containing the absolute path to the F file; e.g., $httpd_conf = "/usr/local/httpd/conf/httpd.conf"; This method tries to extract the information from I<$httpd_conf>, specified by the following Apache directives: F, F, F, F, F, F, F, and any user-defined F along with F. If any of the directives cannot be found or commented out in the I<$httpd_conf>, then the field(s) for that directive(s) will be empty (C), and corresponding methods that use the particular fields return an empty string when called, or error out (for B, refer to the section below). =item 3 C<$base = new Apache::ParseLog($httpd_conf, $virtual_host);> This method creates an object just like the second method, but for the VirtualHost specified by I<$virtual_host> B. The Apache directives and rules not specified within the and tags are parsed from the "regular" server section in the F file. Note that the I<$httpd_conf> B be specified in order to create an object for the I<$virtual_host>. =back =cut sub new { my($package) = shift; my($httpd_conf) = shift; my($virtual_host) = shift; my(%ARG); my($METHOD) = "Apache::ParseLog::new"; if (defined($httpd_conf)) { unless (-e $httpd_conf) { croak "$METHOD: $httpd_conf does not exist. Exiting..."; } else { %ARG = populate($httpd_conf, $virtual_host); } } return config($package, %ARG); } ####################################################################### # BASE OBJECT METHODS ####################################################################### =pod =head1 BASE OBJECT METHODS This section describes the methods available for the base object created by the B> construct described above. Unless the object is created with an empty argument, the Apache::ParseLog module parses the basic information configured in the F file (as passed as the first argument). The object uses the information to construct the B. The available methods are (return values are in parentheses): $base->config([%fields]); # (object) $base->version(); # (scalar) $base->serverroot(); # (scalar) $base->servername(); # (scalar) $base->httpport(); # (scalar) $base->serveradmin(); # (scalar) $base->transferlog(); # (scalar) $base->errorlog(); # (scalar) $base->agentlog(); # (scalar) $base->refererlog(); # (scalar) $base->customlog(); # (array) $base->customlogLocation($name); # (scalar) $base->customlogExists($name); # (scalar boolean, 1 or 0) $base->customlogFormat($name); # (scalar) $base->getTransferLog(); # (object) $base->getErrorLog(); # (object) $base->getRefererLog(); # (object) $base->getAgentLog(); # (object) $base->getCustomLog(); # (object) =over 4 =cut ####################################################################### # config($ParseLog, %arg); returns ParseLog object =pod =item * C $base = $base->config(field1 => value1, field2 => valud2, fieldN => valueN); This method configures the Apache::ParseLog object. Possible fields are: Field Name Value --------------------------------------------------------- serverroot => absolute path to the server root directory servername => name of the server, e.g., "www.mysite.com" httpport => httpd port, e.g., 80 serveradmin => the administrator, e.g., "admin@mysite.com" transferlog => absolute path to the transfer log errorlog => absolute path to the error log agentlog => absolute path to the agent log refererlog => absolute path to the referer log This method should be called after the empty object is created (C, see above). However, you can override the value(s) for any fields by calling this method even if the object is created with defined I<$httpd_conf> and I<$virtual_host>. (Convenient if you don't have any httpd server running on your machine but have to parse the log files transferred from elsewhere.) Any fields are optional, but at least one field should be specified (otherwise why use this method?). When this method is called from the empty object, and B all the fields are specified, the empty field still will be empty (thereby not being able to use some corresponding methods). When this method is called from the already configured object (with C), the fields specified in this C method will override the existing field values, and the rest of the fields inherit the pre-existing values. B This method B, so make sure to use the assignment operator to create the new object (see examples below). B You B (re)configure F values. It is to alleviate the possible broken log formats, which would render the parsed results unusable. Example 1 # Create an empty object first $base = new Apache::ParseLog(); # Configure the transfer and error fields only, for the files # transferred from your Web site hosting service $logs = "/home/webmaster/logs"; $base = $base->config(transferlog => "$logs/transfer_log", errorlog => "$logs/error_log"); Example 2 # Create an object with $httpd_conf $base = new Apache::ParseLog("/usr/local/httpd/conf/httpd.conf"); # Overrides some fields $logs = "/usr/local/httpd/logs"; $base = $base->config(transferlog => "$logs/old/trans_199807", errorlog => "$logs/old/error_199807", agentlog => "$logs/old/agent_199807", refererlog => "$logs/old/refer_199807"); =cut sub config { my($this, %arg) = @_; my($root, $servername, $port, $admin, $transfer, $error, $agent, $referer, %customlog); $serverroot = $arg{'serverroot'} || $this->{'serverroot'}; $servername = $arg{'servername'} || $this->{'servername'}; $httpport = $arg{'httpport'} || $this->{'httpport'}; $serveradmin = $arg{'serveradmin'} || $this->{'serveradmin'}; $transferlog = $arg{'transferlog'} || $this->{'transferlog'}; $errorlog = $arg{'errorlog'} || $this->{'errorlog'}; $agentlog = $arg{'agentlog'} || $this->{'agentlog'}; $refererlog = $arg{'refererlog'} || $this->{'refererlog'}; $customlog = $arg{'customlog'} || $this->{'customlog'}; return bless { 'serverroot' => $serverroot, 'servername' => $servername, 'httpport' => $httpport, 'serveradmin' => $serveradmin, 'transferlog' => $transferlog, 'errorlog' => $errorlog, 'agentlog' => $agentlog, 'refererlog' => $refererlog, 'customlog' => $customlog, } } ####################################################################### # populate($path_to_httpd.conf[, $virtualHost]); return %arg; private sub populate { my($conf) = shift; my($virtualhost) = shift; my($VIRTUAL) = ($virtualhost ? 1 : 0); my(%arg, $fh, $line, $serverroot, $servername, $httpport, $serveradmin, $transferlog, $errorlog, $agentlog, $refererlog, $customlog, @nickname, %location, %format); $fh = openFile($conf); while(defined($line = <$fh>)) { chomp($line); if ($line =~ /^ServerRoot\s+(.+)$/) { $serverroot = $1; } elsif ($line =~ /^Port\s+(.+)$/) { $httpport = $1; } elsif ($line =~ /^LogFormat\s+"(.+)"\s+(\w+)$/) { $format{$2} = $1; } unless ($VIRTUAL) { # check only when $VIRTUAL == 0 if ($line =~ /^ServerName\s+(.+)$/) { $servername = $1; } elsif ($line =~ /^ServerAdmin\s+(.+)$/) { $serveradmin = $1; } elsif ($line =~ /^TransferLog\s+(.+)$/) { $transferlog = $1; } elsif ($line =~ /^ErrorLog\s+(.+)$/) { $errorlog = $1; } elsif ($line =~ /^AgentLog\s+(.+)$/) { $agentlog = $1; } elsif ($line =~ /^RefererLog\s+(.+)$/) { $refererlog = $1; } elsif ($line =~ /^CustomLog\s+(.+)\s+(\w+)$/) { my($loc) = $1; push(@nickname, $2); if ($loc =~ m#\|#) { undef $loc } elsif ($loc !~ m#^/#) { $loc = "$serverroot/$loc" } $location{$2} = $loc; } } if (defined($virtualhost)) { if ($line =~ m#^#) { if ($1 =~ /$virtualhost/i) { # if matches, 0 $VIRTUAL = 0; } } elsif ($line =~ m#^#) { $VIRTUAL = 1; # if matches, 0 } } else { if ($line =~ m#^#) { $VIRTUAL = 0; } } } close($fh); $arg{'serverroot'} = $serverroot || undef; $arg{'servername'} = $servername || undef; $arg{'httpport'} = $httpport || undef; $arg{'serveradmin'} = $serveradmin || undef; if ($transferlog) { if ($transferlog !~ m#^/#) { $transferlog = "$serverroot/$transferlog" } elsif ($transferlog =~ m#\|#) { undef $transferlog } } $arg{'transferlog'} = $transferlog; if ($errorlog) { if ($errorlog !~ m#^/#) { $errorlog = "$serverroot/$errorlog" } elsif ($errorlog =~ m#\|#) { undef $errorlog } } $arg{'errorlog'} = $errorlog; if ($agentlog) { if ($agentlog !~ m#^/#) { $agentlog = "$serverroot/$agentlog" } elsif ($agentlog =~ m#\|#) { undef $agentlog } } $arg{'agentlog'} = $agentlog; if ($refererlog) { if ($refererlog !~ m#^/#) { $refererlog = "$serverroot/$refererlog" } elsif ($refererlog =~ m#\|#) { undef $refererlog } } $arg{'refererlog'} = $refererlog; $customlog = { 'nickname' => [ @nickname ], 'format' => { %format }, 'location' => { %location }, }; $arg{'customlog'} = $customlog; return %arg; } ####################################################################### # serverroot(); returns $serverroot =pod =item * C print $base->serverroot(), "\n"; Returns a scalar containing the root of the Web server as specified in the F file, or C if the object is not specified. =cut sub serverroot { my($this) = shift; return $this->{'serverroot'} || undef; } ####################################################################### # servername(); returns $servername =pod =item * C print $base->servername(), "\n"; Returns a scalar containing the name of the Web server, or C if server name is not specified. =cut sub servername { my($this) = shift; return $this->{'servername'} || undef; } ####################################################################### # httpport(); returns $port =pod =item * C print $base->httpport(), "\n"; Returns a scalar containing the port number used for the httpd, or C if not specified. (By default, httpd uses port 80.) =cut sub httpport { my($this) = shift; return $this->{'httpport'} || undef; } ####################################################################### # serveradmin(); returns $serveradmin =pod =item * C print $base->serveradmin(), "\n"; Returns a scalar containing the name of the server administrator, or C if not specified. =cut sub serveradmin { my($this) = shift; return $this->{'serveradmin'} || undef; } ####################################################################### # transferlog(); returns $transferlog =pod =item * C die "$!\n" unless -e $base->transferlog(); Returns a scalar containing the absolute path to the transfer log file, or C if not specified. =cut sub transferlog { my($this) = shift; return $this->{'transferlog'} || undef; } ####################################################################### # errorlog(); returns $errorlog =pod =item * C die "$!\n" unless -e $base->errorlog(); Returns a scalar containing the absolute path to the error log file, or C if not specified. =cut sub errorlog { my($this) = shift; return $this->{'errorlog'} || undef; } ####################################################################### # agentlog(); returns $agentlog =pod =item * C die "$!\n" unless -e $base->agentlog(); Returns a scalar containing the absolute path to the agent log file, or C if not specified. =cut sub agentlog { my($this) = shift; return $this->{'agentlog'} || undef; } ####################################################################### # refererlog(); returns $refererlog =pod =item * C die "$!\n" unless -e $base->refererlog(); Returns a scalar containing the absolute path to the referer log file, or C if not specified. =cut sub refererlog { my($this) = shift; return $this->{'refererlog'} || undef; } ####################################################################### # customlog(); returns @customlog =pod =item * C @customlog = $base->customlog(); Returns an array containing "nicknames" of the custom logs defined in the I<$httpd_conf>. =cut sub customlog { my($this) = shift; return @{ ${$this->{'customlog'}}{'nickname'} }; } ####################################################################### # customlogDefined($customlog_nickname); returns 1 or 0; private sub customlogDefined { my($this) = shift; my($nickname) = shift; my($switch) = 0; my(@logs) = customlog($this); foreach (@logs) { if ($_ eq $nickname) { $switch++; last; } } return $switch; } ####################################################################### # customlogLocation($customlog_nickname); returns path to the log =pod =item * C print $base->customlogLocation($name), "\n"; Returns a scalar containing the absolute path to the custom log I<$name>. If the custom log I<$name> does not exist, it will return C. This method should be used for debugging purposes only, since you can call C to parse the logs, making it unnecessary to manually open the custom log file in your own script. =cut sub customlogLocation { my($this) = shift; my($nickname) = shift; my($root) = serverroot($this); if (customlogDefined($this, $nickname)) { return ${ ${$this->{'customlog'}}{'location'} }{$nickname}; } else { return undef; } } ####################################################################### # customlogExists($customlog_nickname); returns 1 or 0 =pod =item * C if ($base->customlogExists($name)) { $customlog = $base->getCustomLog($name); } Returns C<1> if the custom log I<$name> (e.g., B, B) is defined in the I<$httpd_conf> file B the log file exists, or C<0> otherwise. You do B have to call this method usually because this is internally called by the C method. =cut sub customlogExists { my($this) = shift; my($nickname) = shift; my($switch) = 0; $switch++ if -e customlogLocation($this, $nickname); return $switch; } ####################################################################### # customlogFormat($customlog_nickname); returns the format =pod =item * C print $base->customlogFormat($name), "\n"; Returns a scalar containing the string of the "LogFormat" for the custom log I<$name>, as specified in I<$httpd_conf>. This method is meant to be used internally, as well as for debugging purpose. =cut sub customlogFormat { my($this) = shift; my($nickname) = shift; if (customlogExists($this, $nickname)) { return ${ ${$this->{'customlog'}}{'format'} }{$nickname}; } else { return undef; } } ####################################################################### # getTransferLog(); returns a blessed ref object for TransferLog =pod =item * C $transferlog = $base->getTransferLog(); Returns an object through which to access the information parsed from the F file. See the L<"LOG OBJECT METHODS"> below for methods to access the log information. =cut sub getTransferLog { my($this) = shift; my($METHOD) = "Apache::ParseLog::getTransferLog"; my($logfile) = $this->{'transferlog'}; croak "$METHOD: $logfile does not exist. Exiting " unless -e $logfile; my($FORMAT) = '(\\S+)\\s(\\S+)\\s(\\S+)\\s\\[(\\d{2}/\\w+/\\d{4}\\:\\d{2}\\:\\d{2}\\:\\d{2}\\s+.+?)\\]\\s\\"(\\w+\\s\\S+\\s\\w+\\/\\d+\\.\\d+)\\"\\s(\\d+)\\s(\\d+|-)'; my(@elements) = qw/HOST LOGIN USER DATETIME REQUEST LSTATUS BYTE/; return scanLog($this, $logfile, $FORMAT, @elements); } ####################################################################### # getRefererLog(); returns a blessed ref object for RefererLog =pod =item * C $refererlog = $base->getRefererLog(); Returns an object through which to access the information parsed from the F file. See the L<"LOG OBJECT METHODS"> below for methods to access the log information. =cut sub getRefererLog { my($this) = shift; my($METHOD) = "Apache::ParseLog::getRefererLog"; my($logfile) = $this->{'refererlog'}; croak "$METHOD: $logfile does not exist. Exiting " unless -e $logfile; my($FORMAT) = '(\\S+)\\s\\-\\>\\s(\\S+)'; my(@elements) = qw/REFERER URL/; return scanLog($this, $logfile, $FORMAT, @elements); } ####################################################################### # getAgentLog(); returns a blessed ref object for AgentLog =pod =item * C $agentlog = $base->getAgentLog(); This method returns an object through which to access the information parsed from the F file. See the L<"LOG OBJECT METHODS"> below for methods to access the log information. =cut sub getAgentLog { my($this) = shift; my($METHOD) = "Apache::ParseLog::getAgentLog"; my($logfile) = $this->{'agentlog'}; croak "$METHOD: $logfile does not exist. Exiting " unless -e $logfile; my($FORMAT) = '(.+)'; my(@elements) = qw/UAGENT/; return scanLog($this, $logfile, $FORMAT, @elements); } ####################################################################### # getErrorLog(); returns a blessed ref object for ErrorLog =pod =item * C $errorlog = $base->getErrorLog(); This method returns an object through which to access the information parsed from the F file. See the L<"LOG OBJECT METHODS"> below for methods to access the log information. =cut sub getErrorLog { my($this) = shift; my($METHOD) = "Apache::ParseLog::getErrorLog"; my($logfile) = $this->{'errorlog'}; croak "$METHOD: $logfile does not exist. Exiting " unless -e $logfile; my($FORMAT) = '\\[(?:\\S+)\\s+(\\S+)\\s+(\\d+)\\s+(\\S+)\\s+(\\d{4})\\]\\s+(\\[(\\w+)\\])?'; my($regex) = qr/$FORMAT/; my($line, %count, %allbydate, %allbytime, %allbydatetime, %allmessage, %errorbydate, %errorbytime, %errorbydatetime, %errormessage, %noticebydate, %noticebytime, %noticebydatetime, %noticemessage, %warnbydate, %warnbytime, %warnbydatetime, %warnmessage); my($fh) = openFile($logfile); while (defined($line = <$fh>)) { chomp($line); if ($line =~ m/^\[/) { $line =~ m#$regex#; my($m) = $M2N{$1}; my($d) = $2; my($time) = $3; my($y) = $4; my($keyword) = $6; $d = "0" . $d if $d =~ m/^\d$/; my($date) = "$m/$d/$y"; $time =~ s/:.+$//; my($datetime) = "$date-$time"; $line =~ s/^.+\]\s+//; if ($keyword eq "error") { $count{'error'}++; $errorbydate{$date}++; $errorbytime{$time}++; $errorbydatetime{$datetime}++; $errormessage{$line}++; } elsif ($keyword eq "notice") { $count{'notice'}++; $noticebydate{$date}++; $noticebytime{$time}++; $noticebydatetime{$datetime}++; $noticemessage{$line}++; } elsif ($keyword eq "warn") { $count{'warn'}++; $warnbydate{$date}++; $warnbytime{$time}++; $warnbydatetime{$datetime}++; $warnmessage{$line}++; } $allbydate{$date}++; $allbytime{$time}++; $allbydatetime{$datetime}++; $allmessage{$line}++ if $line; $count{'dated'}++; } else { $count{'nodate'}++; $allmessage{$line}++ if $line; } $count{'Total'}++; } close($fh); my(@methods) = qw/count allbydate allbytime allbydatetime allmessage errorbydate errorbytime errorbydatetime errormessage noticebydate noticebytime noticebydatetime noticemessage warnbydate warnbytime warnbydatetime warnmessage/; return bless { 'allbydate' => { %allbydate }, 'allbytime' => { %allbytime }, 'allbydatetime' => { %allbydatetime }, 'allmessage' => { %allmessage }, 'errorbydate' => { %errorbydate }, 'errorbytime' => { %errorbytime }, 'errorbydatetime' => { %errorbydatetime }, 'errormessage' => { %errormessage }, 'noticebydate' => { %noticebydate }, 'noticebytime' => { %noticebytime }, 'noticebydatetime' => { %noticebydatetime }, 'noticemessage' => { %noticemessage }, 'warnbydate' => { %warnbydate }, 'warnbytime' => { %warnbytime }, 'warnbydatetime' => { %warnbydatetime }, 'warnmessage' => { %warnmessage }, 'count' => { %count }, 'methods' => [ @methods ], }; } ####################################################################### # getCustomLog($nickname); returns $customlog object =pod =item * C $customlog = $base->getCustomLog($name); This method returns an object through which to access the information parsed from the F file I<$name>. See the L<"LOG OBJECT METHODS"> below for methods for methods to access the log information. =back =cut sub getCustomLog { my($this) = shift; my($nickname) = shift; my($METHOD) = "Apache::ParseLog::getCustomLog"; croak "$METHOD: $nickname does not exist, Exiting " unless customlogExists($this, $nickname); my($logfile) = customlogLocation($this, $nickname); croak "$METHOD: $logfile does not exist. Exiting " unless -e $logfile; # Variables preparation my($host_rx) = '(\\S+)'; # %h (host, visitor IP/hostname) my($log_rx) = '(\\S+)'; # %l (login, login name) my($user_rx) = '(\\S+)'; # %u (user, user name) my($time_rx) = '\\[(\\d{2}/\\w+/\\d{4}\\:\\d{2}\\:\\d{2}\\:\\d{2}\\s+.+?)\\]'; # %t (datetime, date/time) my($req_rx) = '(\\w+\\s\\S+\\s\\w+\\/\\d+\\.\\d+)'; # %r (request, method, file, proto) my($ost_rx) = '(\\d+)'; # %s (ostatus, original status) my($lst_rx) = '(\\d+)'; # %>s (lstatus, last status) my($byte_rx) = '(\\d+|-)'; # %b (byte, bytes) my($file_rx) = '(\\S+)'; # %f (filename, filename) my($addr_rx) = '(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})'; # %a (addr, IP) my($port_rx) = '(\\d+)'; # %p (port, port) my($proc_rx) = '(\\d+)'; # %p (proc, proc ID) my($sec_rx) = '(\\d+)'; # %T (sec, time in sec) my($url_rx) = '(\\S+)'; # %U (url, URL) my($hname_rx) = '(\\S+)'; # %v (hostname, hostname) my($referer_rx) = '(\\S+)'; # %{Referer}i (referer, referer) my($uagent_rx) = '(.+?)'; # %{User-agent}i (uagent, browser) my($space) = '\\s'; # white space my($FORMAT) = customlogFormat($this, $nickname); my($temp) = $FORMAT . " "; # add the last space for the $temp regex below my(@regex, @elements); # Create the pre-compiled regex sting dynamically here while ((length($temp)) > 0) { $temp =~ s/^(\\?\")?(.+?)(\\?\")?\s+//; my($match) = $2; if ($2 eq '%h') { push(@regex, $host_rx); push(@elements, 'HOST'); } elsif ($2 eq '%l') { push(@regex, $log_rx); push(@elements, 'LOGIN'); } elsif ($2 eq '%u') { push(@regex, $user_rx); push(@elements, 'USER'); } elsif ($2 eq '%t') { push(@regex, $time_rx); push(@elements, 'DATETIME'); } elsif ($2 eq '%r') { push(@regex, $req_rx); push(@elements, 'REQUEST'); } elsif ($2 eq '%s') { push(@regex, $ost_rx); push(@elements, 'OSTATUS'); } elsif ($2 eq '%>s') { push(@regex, $lst_rx); push(@elements, 'LSTATUS'); } elsif ($2 eq '%b') { push(@regex, $byte_rx); push(@elements, 'BYTE'); } elsif ($2 eq '%f') { push(@regex, $file_rx); push(@elements, 'FILENAME'); } elsif ($2 eq '%a') { push(@regex, $addr_rx); push(@elements, 'ADDR'); } elsif ($2 eq '%p') { push(@regex, $port_rx); push(@elements, 'PORT'); } elsif ($2 eq '%P') { push(@regex, $proc_rx); push(@elements, 'PROC'); } elsif ($2 eq '%T') { push(@regex, $sec_rx); push(@elements, 'SEC'); } elsif ($2 eq '%U') { push(@regex, $url_rx); push(@elements, 'URL'); } elsif ($2 eq '%v') { push(@regex, $hname_rx); push(@elements, 'HOSTNAME'); } elsif ($2 =~ /Referer/i) { push(@regex, $referer_rx); push(@elements, 'REFERER'); } elsif ($2 =~ /User-agent/i) { push(@regex, $uagent_rx); push(@elements, 'UAGENT'); } else { my($unknown) = $2; $unknown =~ s|(\W)|\\$1|g; push(@regex, $unknown); } $FORMAT =~ s/$match/$regex[$#regex]/; } $FORMAT =~ s/\s/$space/g; # Parse the log finally return scanLog($this, $logfile, $FORMAT, @elements); } ####################################################################### # scanLog($path_to_logfile, $regex_format, @elments); # returns a blessed object; private sub scanLog { my($this) = shift; # package my($logfile) = shift; # path to the log my($FORMAT) = shift; # regex my(@elements) = @_; # array containing what's in the regex # create an array containing the uc name of placeholders my($hostswitch) = 1; # off with host defined my($visitorswitch) = 0; # on with host defined my($visitordone) = 0; # on with visitor added to @methods my($dtswitch) = 0; # on with datetime defined my($dtbyteswitch) = 0; # on with datetime or byte defined my($fileswitch) = 0; # on with filename or request or url defined my(@METHODS) = map { if ((m"^HOST") && ($hostswitch)) { $hostswitch--; $visitorswitch++; if (($dtswitch) && (! $visitordone)) { ($_, "TOPDOMAIN", "SECDOMAIN", "VISITORBYDATE", "VISITORBYTTME", "VISITORBYDATETIME") } else { ($_, "TOPDOMAIN", "SECDOMAIN") } } elsif (((m"DATETIME") || (m"BYTE")) && (! $dtbyteswitch)) { $dtbyteswitch++; if (m"DATETIME") { $dtswitch++; if ($visitorswitch) { $visitorswitch = 0; $visitordone++; ("VISITORBYDATE", "VISITORBYTIME", "VISITORBYDATETIME", "HITBYDATE", "HITBYTIME", "HITBYDATETIME") } else { ("HITBYDATE", "HITBYTIME", "HITBYDATETIME") } } else { $_ } } elsif (((m"DATETIME") || (m"BYTE")) && ($dtbyteswitch)) { if (m"DATETIME") { $dtswitch++; if ($visitorswitch) { $visitorswitch = 0; $visitordone++; ("VISITORBYDATE", "VISITORBYTIME", "VISITORBYDATETIME", "HITBYDATE", "HITBYTIME", "HITBYDATETIME", "BYTEBYDATE", "BYTEBYTIME", "BYTEBYDATETIME") } else { ("HITBYDATE", "HITBYTIME", "HITBYDATETIME", "BYTEBYDATE", "BYTEBYTIME", "BYTEBYDATETIME") } } else { ($_, "BYTEBYDATE", "BYTEBYTIME", "BYTEBYDATETIME") } } elsif (m"REQUEST") { $fileswitch++; ("METHOD", "FILE", "QUERYSTRING", "PROTO") } elsif ((m"FILENAME") || (m"URL")) { $fileswitch++; $_ } elsif ((m"SEC") && ($fileswitch)) { $_ } elsif (m"UAGENT") { ($_, "UAVERSION", "BROWSER", "PLATFORM", "BROWSERBYOS") } elsif (m"REFERER") { ($_, "REFERERDETAIL") } else { $_ } } @elements; push(@METHODS, "HIT"); # the hit => { %hit } is always there @METHODS = map { lc } @METHODS; ### reports placeholders my(%host); # hosts (visitors) my(%topdomain); # top domains my(%secdomain); # secondary domains my(%login); # logins my(%user); # users my(%visitorbydate); # unique visitors (hosts) by date my(%visitorbytime); # unique visitors by time my(%visitorbydatetime); # unique visitors by date/time my(%hitbydate); # hits by date my(%hitbytime); # hits by time my(%hitbydatetime); # hits by date/time my(%method); # methods (get, post, etc.) my(%file); # files my(%querystring); # Query String my(%proto); # protos (HTTP/1.0, etc.) my(%ostatus); # original status (..) my(%lstatus); # last status (use with %STATUS_BY_CODE) my(%byte); # Bytes transferred (* containts one key "total") my(%bytebydate); # bytes by date my(%bytebytime); # bytes by time my(%bytebydatetime); # bytes by date/time my(%filename); # filenames (= files) my(%addr); # IPs (=~ hosts) my(%port); # ports my(%proc); # procs my(%sec); # seconds (time in sec) my(%url); # URLs (=~ files) my(%hostname); # hostnames (=~ hosts) my(%referer); # referer (site only) my(%refererdetail); # referer (detail) my(%uagent); # agents my(%uaversion); # uagent w/ versions (Mozilla/4.04, Slurp/2.0) my(%browser); # browsers w/ version my(%platform); # platforms only my(%browserbyos); # browsers w/ platforms my(%hit); # Total number of hits (lines) ### Routine if ((scalar(@elements) == 1) && ($elements[0] eq "UAGENT")) { $FORMAT =~ s#\?## } my($regex) = qr/$FORMAT/; my($line); my($fh) = openFile($logfile); while (defined($line = <$fh>)) { chomp($line); $line =~ m#$regex#; # Scan each match for ($i = 0; $i < scalar(@elements); $i++) { my($mi) = $i + 1; # index for match; $1, $2,... ${$elements[$i]} = ${$mi}; # assign the back ref } my($date, $time, $method, $file, $proto); ### create reports ### { # HOST RELATED BLOCK # HOST $host{$HOST}++ if $HOST; # HOSTNAME $hostname{$HOSTNAME}++ if $HOSTNAME; my($domain) = ($HOST ? $HOST : $HOSTNAME); # (TOP|SEC)DOMAIN if ($domain) { if ($domain !~ /^\d{1,3}(?:\.\d{1,3}){3}$/) { if ($domain =~ m/\.([A-Za-z0-9\-]+\.)(\w+)$/) { my($secdomain) = $1; my($topdomain) = $2; $topdomain{$topdomain}++; $secdomain = $secdomain . $topdomain; $secdomain{$secdomain}++; } else { $topdomain{$domain}++; $secdomain{$domain}++; } } else { $topdomain{'unknown'}++; $secdomain{'unknown'}++; } } } # LOGIN $login{$LOGIN}++ if $LOGIN; # USER $user{$USER}++ if $USER; # DATETIME if ($DATETIME) { $DATETIME =~ m#^(\d+)/(\w+)/(\d+)\:(\d{2})\:\d{2}\:\d{2}#; my($d) = ($1 =~ /^\d$/ ? '0' . $1 : $1); my($m) = $M2N{$2}; my($y) = $3; my($t) = $4; my($date) = "$m/$d/$y"; $hitbydate{$date}++; $hitbytime{$t}++; my($dt) = "$date-$t"; $hitbydatetime{$dt}++; if (($BYTE) && ($BYTE =~ m#^\d+$#)) { $bytebydate{$date} += $BYTE; $bytebytime{$t} += $BYTE; $bytebydatetime{$dt} += $BYTE; } my($visitor); if ($HOST) { $visitor = $HOST } elsif ($HOSTNAME) { $visitor = $HOSTNAME } elsif ($ADDR) { $visitor = $HOSTNAME } ${$date}{$visitor}++; ${$t}{$visitor}++; ${$dt}{$visitor}++; } # STATUS if ($OSTATUS) { my($key) = "$OSTATUS $STATUS_BY_CODE{$OSTATUS}"; $ostatus{$key}++; } if ($LSTATUS) { my($key) = "$LSTATUS $STATUS_BY_CODE{$LSTATUS}"; $lstatus{$key}++; } # FILENAME $filename{$FILENAME}++ if $FILENAME; # ADDR $addr{$ADDR}++ if $ADDR; # PORT $port{$PORT}++ if $PORT; # PROC $proc{$PROC}++ if $PROC; { # BEGIN FILE RELATED BLOCK my($FILE); # REQUEST if ($REQUEST) { $REQUEST =~ m#^(\w+)\s(\S+)\s(\S+)$#; my($method) = $1; my($file) = $2; my($proto) = $3; $method{$method}++; if ($file =~ m#\?(.+)$#) { $querystring{$1}++; # query string } $file =~ s#\?.+$##; # trim query_string $file =~ s#/\./#/#g; # same-dir duplicates $file =~ s#/\s+?/\.\./#/#g; # same-upper-dir duplicates $file{$file}++; $proto{$proto}++; $FILE = $file; } # URL if ($URL) { $url{$URL}++; $FILE = $URL; } # FILE $FILE = $FILENAME unless $FILE; # SEC SEC: if ($SEC) { last SEC unless $FILE; if (exists($sec{$FILE})) { $sec{$FILE} = "$SEC" if $SEC > $sec{$FILE}; } else { $sec{$FILE} = "$SEC"; } } # BYTE if (($BYTE) && ($BYTE =~ m#^\d+$#)) { $byte{'Total'} += $BYTE; last unless $FILE; $FILE =~ m#\.(\w+)$#; if ($1) { $byte{$1} += $BYTE; $hit{$1}++; } else { $byte{'OtherTypes'} += $BYTE; $hit{'OtherTypes'}++; } } # REFERER if ($REFERER) { my($refered); if ($URL) { $refered = $URL } elsif ($FILE) { $refered = $FILE } elsif ($FILENAME) { $refered = $FILENAME } my($ref) = (($refered) ? "$REFERER -> $refered" : $REFERER); $refererdetail{$ref}++; if ($REFERER =~ m#http://(\S+?)[/?]#) { $referer{$1}++ } elsif ($REFERER =~ m#^-$#) { $referer{'bookmark'}++ } else { $referer{'unknown'}++ } } } # END FILE RELATED BLOCK # UAGENT if ($UAGENT) { $uagent{$UAGENT}++; $UAGENT =~ m#^(\S+)\s*(.+)?$#; my($parser) = $1; my($rest) = $2; $uaversion{$parser}++ if $parser; my($browser); if (($UAGENT =~ m/^Mozilla/) && (($rest =~ m/(Webtv.+?)[;)]/) || ($rest =~ m/(AOL.+?)[;)]/) || ($rest =~ m/(MSN.+?)[;)]/) || ($rest =~ m/(MSIE.+?)[;)]/))) { $browser = $1; $browser{$1}++; } elsif (($UAGENT =~ m/Mozilla/) && ($rest =~ m/compatible\;\s+(.+?)[;)]/)) { $browser = $1; $browser{$1}++; } else { $browser = $parser; $browser{$browser}++; } my($plat); if ($rest =~ m/(Win.+?)[;)]/) { $plat = $1; $platform{$1}++; } elsif ($rest =~ m/(Mac.+?)[;)]/) { $plat = $1; $platform{$1}++; } elsif ($rest =~ m/X11\;\s+.+?\;\s+(.+?)[;)]/) { $plat = $1; $platform{$1}++; } else { $plat = $rest; $plat =~ s#(?:\(|\)|\;)##g; $platform{$plat}++; } my($bandp) = "$browser ($plat)"; $browserbyos{$bandp}++; } # hit $hit{'Total'}++; } close($fh); # Construct %visitorxxx hashes %visitorbydate = map { $_ => scalar(keys %{$_}), } keys %hitbydate; %visitorbytime = map { $_ => scalar(keys %{$_}), } keys %hitbytime; %visitorbydatetime = map { $_ => scalar(keys %{$_}), } keys %hitbydatetime; return bless { 'host' => { %host }, 'topdomain' => { %topdomain }, 'secdomain' => { %secdomain }, 'login' => { %login }, 'user' => { %user }, 'hitbydate' => { %hitbydate }, 'hitbytime' => { %hitbytime }, 'hitbydatetime' => { %hitbydatetime }, 'visitorbydate' => { %visitorbydate }, 'visitorbytime' => { %visitorbytime }, 'visitorbydatetime' => { %visitorbydatetime }, 'method' => { %method }, 'file' => { %file }, 'querystring' => { %querystring }, 'proto' => { %proto }, 'ostatus' => { %ostatus }, 'lstatus' => { %lstatus }, 'byte' => { %byte }, 'bytebydate' => { %bytebydate }, 'bytebytime' => { %bytebytime }, 'bytebydatetime' => { %bytebydatetime }, 'filename' => { %filename }, 'addr' => { %addr }, 'port' => { %port }, 'proc' => { %proc }, 'sec' => { %sec }, 'url' => { %url }, 'hostname' => { %hostname }, 'referer' => { %referer }, 'refererdetail' => { %refererdetail }, 'uagent' => { %uagent }, 'uaversion' => { %uaversion }, 'browser' => { %browser }, 'platform' => { %platform }, 'browserbyos' => { %browserbyos }, 'hit' => { %hit }, 'methods' => [ @METHODS ], }; } ####################################################################### # LOG OBJECT METHODS ####################################################################### =pod =head1 LOG OBJECT METHODS This section describes the methods available for the log object created by any of the following base object methods: B, B, B, B, and B. This section is devided into six subsections, each of which describes the available methods for a certain log object. Note that all the methods for F, F, and F can be used for the object created with C. =cut ####################################################################### # TRANSFERLOG METHODS =pod =head2 TransferLog/CustomLog Methods The following methods are available for the F object (created by C method), as well as the F object that logs appropriate arguments to the corresponding F. =over 4 =cut ###################################################################### # hit(); returns %hit =pod =item * C %hit = $logobject->hit(); Returns a hash containing at least a key 'Total' with the total hit count as its value, and the file extensions (i.e., html, jpg, gif, cgi, pl, etc.) as keys with the hit count for each key as values. =cut sub hit { my($this) = shift; return %{($this->{'hit'} || undef)}; } ####################################################################### # host(); returns %host =pod =item * C %host = $logobject->host(); Returns a hash containing host names (or IPs if names are unresolved) of the visitors as keys, and the hit count for each key as values. =cut sub host { my($this) = shift; my(%host) = %{($this->{'host'} || undef)}; return %host; } ####################################################################### # topdomain(); returns %topdomain =pod =item * C %topdomain = $logobject->topdomain(); Returns a hash containing topdomain names (com, net, etc.) of the visitors as keys, and the hit count for each key as values. Note that if the hostname is unresolved and remains as an IP address, the visitor will not be counted toward the (and the next C) returned value of this method. =cut sub topdomain { my($this) = shift; return %{($this->{'topdomain'} || undef)}; } ###################################################################### # secdomain(); returns %secdomain =pod =item * C %secdomain = $logobject->secdomain(); Returns a hash containing secondary domain names (xxx.com, yyy.net, etc.) as keys, and the hit count for each key as values. For the unresolved IPs, the same rule applies as the above C method. =cut sub secdomain { my($this) = shift; return %{($this->{'secdomain'} || undef)}; } ###################################################################### # login(); returns %login =pod =item * C %login = $logobject->login(); Returns a hash containing login names (authenticated user logins) of the visitors as keys, and the hit count for each key as values. Log entries for non-authenticated files have a character "-" as the login name. =cut sub login { my($this) = shift; return %{($this->{'login'} || undef)}; } ###################################################################### # user(); returns %user =pod =item * C %user = $logobject->user(); Returns a hash containing user names (for access-controlled directories, refer to the access.conf file of the Apache server) of the visitors as keys, and the hit count for each key as values. Non-access-controlled log entries have a character "-" as the user name. =cut sub user { my($this) = shift; return %{($this->{'user'} || undef)}; } ###################################################################### # hitbydate(); returns %hitbydate =pod =item * C %hitbydate = $logobject->hitbydate(); Returns a hash containing date (mm/dd/yyyy) when visitors visited the particular file (html, jpg, etc.) as keys, and the hit count for each key as values. =cut sub hitbydate { my($this) = shift; return %{($this->{'hitbydate'} || undef)}; } ###################################################################### # hitbytime(); returns %hitbytime =pod =item * C %hitbytime = $logobject->hitbytime(); Returns a hash containing time (00-23) each file was visited as keys, and the hit count for each key as values. =cut sub hitbytime { my($this) = shift; return %{($this->{'hitbytime'} || undef)}; } ###################################################################### # hitbydatetime(); returns %hitbydatetime =pod =item * C %hitbydatetime = $logobject->hitbydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys, and the hit count for each key as values. =cut sub hitbydatetime { my($this) = shift; return %{($this->{'hitbydatetime'} || undef)}; } ###################################################################### # visitorbydate(); returns %visitorbydate =pod =item * C %visitorbydate = $logobject->visitorbydate(); Returns a hash containing date (mm/dd/yyyy) as keys, and the unique visitor count for each key as values. =cut sub visitorbydate { my($this) = shift; return %{($this->{'visitorbydate'} || undef)}; } ###################################################################### # visitorbytime(); returns %visitorbytime =pod =item * C %visitorbytime = $logobject->visitorbytime(); Returns a hash containing time (00-23) as keys, and the unique visitor count for each key as values. =cut sub visitorbytime { my($this) = shift; return %{($this->{'visitorbytime'} || undef)}; } ###################################################################### # visitorbydatetime(); returns %visitorbydatetime =pod =item * C %visitorbydatetime = $logobject->visitorbydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys, and the unique visitor count for each key as values. =cut sub visitorbydatetime { my($this) = shift; return %{($this->{'visitorbydatetime'} || undef)}; } ###################################################################### # method(); returns %method =pod =item * C %method = $logobject->method(); Returns a hash containing HTTP method (GET, POST, PUT, etc.) as keys, and the hit count for each key as values. =cut sub method { my($this) = shift; return %{($this->{'method'} || undef)}; } ###################################################################### # file(); returns %file =pod =item * C %file = $logobject->file(); Returns a hash containing the file names relative to the F of the server as keys, and the hit count for each key as values. =cut sub file { my($this) = shift; return %{($this->{'file'} || undef)}; } ###################################################################### # querystring(); returns %querystring =pod =item * C %querystring = $logobject->querystring(); Returns a hash containing the query string as keys, and the hit count for each key as values. =cut sub querystring { my($this) = shift; return %{($this->{'querystring'} || undef)}; } ###################################################################### # proto(); returns %proto =pod =item * C %proto = $logobject->proto(); Returns a hash containing the protocols used (HTTP/1.0, HTTP/1.1, etc.) as keys, and the hit count for each key as values. =cut sub proto { my($this) = shift; return %{($this->{'proto'} || undef)}; } ###################################################################### # lstatus(); returns %lstatus =pod =item * C %lstatus = $logobject->lstatus(); Returns a hash containing HTTP codes and messages (e.g. "404 Not Found") for the last status (i.e., when the httpd finishes processing that request) as keys, and the hit count for each key as values. =cut sub lstatus { my($this) = shift; return %{($this->{'lstatus'} || undef)}; } ###################################################################### # byte(); returns %byte =pod =item * C %byte = $logobject->byte(); Returns a hash containing at least a key 'Total' with the total transferred bytes as its value, and the file extensions (i.e., html, jpg, gif, cgi, pl, etc.) as keys, and the transferred bytes for each key as values. =cut sub byte { my($this) = shift; return %{($this->{'byte'} || undef)}; } ###################################################################### # bytebydate(); returns %bytebydate =pod =item * C %bytebydate = $logobject->bytebydate(); Returns a hash containing date (mm/dd/yyyy) as keys, and the hit count for each key as values. =cut sub bytebydate { my($this) = shift; return %{($this->{'bytebydate'} || undef)}; } ###################################################################### # bytebytime(); returns %bytebytime =pod =item * C %bytebytime = $logobject->bytebytime(); Returns a hash containing time (00-23) as keys, and the hit count for each key as values. =cut sub bytebytime { my($this) = shift; return %{($this->{'bytebytime'} || undef)}; } ###################################################################### # bytebydatetime(); returns %bytebydatetime =pod =item * C %bytebydatetime = $logobject->bytebydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys, and the hit count for each key as values. =back =cut sub bytebydatetime { my($this) = shift; return %{($this->{'bytebydatetime'} || undef)}; } ####################################################################### # ERRORLOG METHODS =pod =head2 ErrorLog Methods Until the Apache version 1.2.x, each error log entry was just an error, meaning that there was no distinction between "real" errors (e.g., File Not Found, malfunctioning CGI, etc.) and non-significant errors (e.g., kill -1 the httpd processes, etc.). Starting from the version 1.3.x, the Apache httpd logs the "type" of each error log entry, namely "error", "notice" and "warn". If you use Apache 1.2.x, the C, C, and C should not be used, because those methods for that are for 1.3.x only will merely return an empty hash. The C methods will return desired results. The following methods are available for the F object (created by C method). =over 4 =cut ####################################################################### # total(); returns $total; =pod =item * C %errors = $errorlogobject->count(); Returns a hash containing count for each type of messages logged in the error log file. The keys and values are: 'Total' (total number of errors), 'error' (total number of errors of type "error"), 'notice' total number of errors of type "notice"), 'warn' (total number of errors of type "warn"), 'dated' (total number of error entries with date logged), and 'nodate' (total number of error entires with no date logged). So: print "Total Errors: ", $errors{'Total'}, "\n"; print "Total 1.3.x Errors: ", $errors{'error'}, "\n"; print "Total 1.3.x Notices: ", $errors{'notice'}, "\n"; print "Total 1.3.x Warns: ", $errors{'warn'}, "\n"; print "Total Errors with date: ", $errors{'dated'}, "\n"; print "Total Errors with no date: ", $errors{'nodate'}, "\n"; Note that with the F file generated by Apache version before 1.3.x, the value for 'error', 'notice', and 'warn' will be zero. =cut sub count { my($this) = shift; return %{($this->{'count'} || undef)}; } ####################################################################### # allbydate(); returns %allbydate; =pod =item * C %allbydate = $errorlogobject->allbydate(); Returns a hash containing date (mm/dd/yyyy) when the error was logged as keys, and the number of error occurrances as values. =cut sub allbydate { my($this) = shift; return %{($this->{'allbydate'} || undef)}; } ####################################################################### # allbytime(); returns %allbytime; =pod =item * C %allbytime = $errorlogobject->allbytime(); Returns a hash containing time (00-23) as keys and the number of error occurrances as values. =cut sub allbytime { my($this) = shift; return %{($this->{'allbytime'} || undef)}; } ####################################################################### # allbydatetime(); returns %allbydatetime; =pod =item * C %allbydatetime = $errorlogobject->allbydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys and the number of error occurrances as values. =cut sub allbydatetime { my($this) = shift; return %{($this->{'allbydatetime'} || undef)}; } ####################################################################### # allmessage(); returns %allmessage; =pod =item * C %allmessage = $errorlogobject->allmessage(); Returns a hash containing error messages as keys and the number of occurrances as values. =cut sub allmessage { my($this) = shift; return %{($this->{'allmessage'} || undef)}; } ####################################################################### # errorbydate(); returns %errorbydate; =pod =item * C %errorbydate = $errorlogobject->errorbydate(); Returns a hash containing date (mm/dd/yyyy) as keys and the number of error occurrances as values. For the Apache 1.3.x log only. =cut sub errorbydate { my($this) = shift; return %{($this->{'errorbydate'} || undef)}; } ####################################################################### # errorbytime(); returns %errorbytime; =pod =item * C %errorbytime = $errorlogobject->errorbytime(); Returns a hash containing time (00-23) as keys and the number of error occurrances as values. For the Apache 1.3.x log only. =cut sub errorbytime { my($this) = shift; return %{($this->{'errorbytime'} || undef)}; } ####################################################################### # errorbydatetime(); returns %errorbydatetime; =pod =item * C %errorbydatetime = $errorlogobject->errorbydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys and the number of error occurrances as values. For the Apache 1.3.x log only. =cut sub errorbydatetime { my($this) = shift; return %{($this->{'errorbydatetime'} || undef)}; } ####################################################################### # errormessage(); returns %errormessage; =pod =item * C %errormessage = $errorlogobject->errormessage(); Returns a hash containing error messages as keys and the number of occurrances as values. For the Apache 1.3.x log only. =cut sub errormessage { my($this) = shift; return %{($this->{'errormessage'} || undef)}; } ####################################################################### # noticebydate(); returns %noticebydate; =pod =item * C %noticebydate = $errorlogobject->noticebydate(); Returns a hash containing date (mm/dd/yyyy) as keys and the number of error occurrances as values. For the Apache 1.3.x log only. =cut sub noticebydate { my($this) = shift; return %{($this->{'noticebydate'} || undef)}; } ####################################################################### # noticebytime(); returns %noticebytime; =pod =item * C %noticebytime = $errorlogobject->noticebytime(); Returns a hash containing time (00-23) as keys and the number of error occurrances as values. For the Apache 1.3.x log only. =cut sub noticebytime { my($this) = shift; return %{($this->{'noticebytime'} || undef)}; } ####################################################################### # noticebydatetime(); returns %noticebydatetime; =pod =item * C %noticebydatetime = $errorlogobject->noticebydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys and the number of error occurrances as values. For the Apache 1.3.x log only. =cut sub noticebydatetime { my($this) = shift; return %{($this->{'noticebydatetime'} || undef)}; } ####################################################################### # noticemessage(); returns %noticemessage; =pod =item * C %noticemessage = $errorlogobject->noticemessage(); Returns a hash containing notice messages as keys and the number of occurrances as values. For the Apache 1.3.x log only. =cut sub noticemessage { my($this) = shift; return %{($this->{'noticemessage'} || undef)}; } ####################################################################### # warnbydate(); returns %warnbydate; =pod =item * C %warnbydate = $errorlogobject->warnbydate(); Returns a hash containing date (mm/dd/yyyy) as keys and the number of error occurrances as values. For the Apache 1.3.x only. =cut sub warnbydate { my($this) = shift; return %{($this->{'warnbydate'} || undef)}; } ####################################################################### # warnbytime(); returns %warnbytime; =pod =item * C %warnbytime = $errorlogobject->warnbytime(); Returns a hash containing time (00-23) as keys and the number of error occurrances as values. For the Apache 1.3.x only. =cut sub warnbytime { my($this) = shift; return %{($this->{'warnbytime'} || undef)}; } ####################################################################### # warnbydatetime(); returns %warnbydatetime; =pod =item * C %warnbydatetime = $errorlogobject->warnbydatetime(); Returns a hash containing date/time (mm/dd/yyyy-hh) as keys and the number of error occurrances as values. For the Apache 1.3.x only. =cut sub warnbydatetime { my($this) = shift; return %{($this->{'warnbydatetime'} || undef)}; } ####################################################################### # warnmessage(); returns %warnmessage; =pod =item * C %warnmessage = $errorlogobject->warnmessage(); Returns a hash containing warn messages as keys and the number of occurrances as values. For the Apache 1.3.x only. =back =cut sub warnmessage { my($this) = shift; return %{($this->{'warnmessage'} || undef)}; } ####################################################################### # REFERERLOG METHODS =pod =head2 RefererLog/CustomLog Methods The following methods are available for the F object (created by C method), as well as the F object that logs C<%{Referer}i> to the corresponding F. =over 4 =cut ###################################################################### # referer(); returns %referer =pod =item * C %referer = $logobject->referer(); Returns a hash containing the name of the web site the visitor comes from as keys, and the hit count for each key as values. Note that the returned data from this method contains B the site name of the referer, e.g. "www.altavista.digital.com", so if you want to obtain the full details of the referer as well as the referred files, use C method described below. =cut sub referer { my($this) = shift; return %{($this->{'referer'} || undef)}; } ####################################################################### # refererdetail(); returns %refererdetail =pod =item * C Returns a hash containing the full URL of the referer as keys, and the hit count for each key as values. The standard log format for the F is -> I>. With the F object, the object attempts to use the URL first, and if the URL is not logged, then the relative path, and then the absolute path to create the key for the returned data I<%referer>. If none of the URL, relative or absolute paths are logged, the object will use only the referer URL itself (without refererd files) as the key. =back =cut sub refererdetail { my($this) = shift; return %{($this->{'refererdetail'} || undef)}; } ####################################################################### # AGENTLOG METHODS =pod =head2 AgentLog/CustomLog Methods This subsection describes the methods available for the F object (created by C method), as well as the F object that logs C<%{User-agent}i> to the corresponding F. =over 4 =cut ###################################################################### # uagent(); returns %uagent =pod =item * C %uagent = $logobject->uagent(); Returns a hash containing the user agent (the "full name", as you see in the log file itself) as keys, and the hit count for each key as values. =cut sub uagent { my($this) = shift; return %{($this->{'uagent'} || undef)}; } ###################################################################### # uaversion(); returns %uaversion =pod =item * C %uaversion = $logobject->uaversion(); Returns a hash containing the most basic and simple information about the user agent (the first column in the agent log file, e.g. "C") as keys, and the hit count for each key as values. Useful to collect the information about the parser engine and its version, to determine which specs of HTML and/or JavaScript to deploy, for example. =cut sub uaversion { my($this) = shift; return %{($this->{'uaversion'} || undef)}; } ###################################################################### # browser(); returns %browser =pod =item * C %browser = $logobject->browser(); Returns a hash containing the actual browsers (as logged in the file) as keys, and the hit count for each key as values. For example, Netscape Navigator/Communicator will (still) be reported as "C>", Microsoft Internet Explorer as "C>", and so on. =cut sub browser { my($this) = shift; return %{($this->{'browser'} || undef)}; } ###################################################################### # platform(); returns %platform =pod =item * C %platform = $logobject->platform(); Returns a hash containing the names of OS (and possibly its version, hardware architecture, etc.) as keys, and the hit count for each key as values. For example, Solaris 2.6 on UltraSPARC will be reported as "C", =cut sub platform { my($this) = shift; return %{($this->{'platform'} || undef)}; } ###################################################################### # browserbyos(); returns %browserbyos =pod =item * C %browserbyos = $logobject->browserbyos(); Returns a hash containing the browser names with OS (in the form, I) as keys, and the hit count for each key as values. =back =cut sub browserbyos { my($this) = shift; return %{($this->{'browserbyos'} || undef)}; } ####################################################################### # CUSTOMLOG METHODS =pod =head2 CustomLog Methods This subsection describes the methods available only for the F object. See each method for what Apache directive is used for each returned result. =over 4 =cut ###################################################################### # addr(); returns %addr =pod =item * C %addr = $logobject->addr(); Returns a hash containing the IP addresses of the B (instead of the F) visited as keys, and the hit count for each key as values. (LogFormat C<%a>) =cut sub addr { my($this) = shift; return %{($this->{'addr'} || undef)}; } ###################################################################### # filename(); returns %filename =pod =item * C %filename = $logobject->filename(); Returns a hash containing the absolute paths to the files as keys, and the hit count for each key as values. (LogFormat C<%f>) =cut sub filename { my($this) = shift; return %{($this->{'filename'} || undef)}; } ###################################################################### # hostname(); returns %hostname =pod =item * C %hostname = $logobject->hostname(); Returns a hash containing the hostnames of the visitors as keys, and the hit count for each key as values. (LogFormat C<%v>) =cut sub hostname { my($this) = shift; return %{($this->{'hostname'} || undef)}; } ###################################################################### # ostatus(); returns %ostatus =pod =item * C %ostatus = $logobject->ostatus(); Returns a hash containing HTTP codes and messages (e.g. "404 Not Found") for the original status (i.e., when the httpd starts processing that request) as keys, and the hit count for each key as values. =cut sub ostatus { my($this) = shift; return %{($this->{'ostatus'} || undef)}; } ###################################################################### # port(); returns %port =pod =item * C %port = $logobject->port(); Returns a hash containing the port used for the transfer as keys, and the hit count for each key as values (there will probably be the only one key-value pair value for each server). (LogFormat C<%p>) =cut sub port { my($this) = shift; return %{($this->{'port'} || undef)}; } ###################################################################### # proc(); returns %proc =pod =item * C %proc = $logobject->proc(); Returns a hash containing the process ID of the server used for each file transfer as keys, and the hit count for each key as values. (LogFormat C<%P>) =cut sub proc { my($this) = shift; return %{($this->{'proc'} || undef)}; } ###################################################################### # sec(); returns %sec =pod =item * C %sec = $logobject->sec(); Returns a hash containing the file names (either relative paths, absolute paths, or the URL, depending on your log format) as keys, and the maximum seconds it takes to finish the process as values. Thus, note that the values are not accumulated results, but rather the highest number of seconds it took to process the file. (LogFormat C<%T>) =cut sub sec { my($this) = shift; return %{($this->{'sec'} || undef)}; } ###################################################################### # url(); returns %url =pod =item * C %url = $logobject->url(); Returns a hash containing the URLs (path relative to the F> as keys, and the hit count for each key as values. (LogFormat C<%U>) =back =cut sub url { my($this) = shift; return %{($this->{'url'} || undef)}; } ####################################################################### # SPECIAL METHOD =pod =head2 Special Method The special method described below, C, can be used with B of the B to extract the methods available for the calling object. =over 4 =cut ####################################################################### # getMethods(); returns @methods; =pod =item * C @object_methods = $logobject->getMethods(); Returns an array containing the names of the available methods for that log object. Each of the elements in the array is the name of one of the methods described in this section. By using this method, you can write a I simple Apache log parsing and reporting script, like so: #!/usr/local/bin/perl $|++; # flush buffer use Apache::ParseLog; # Construct the Apache::ParseLog object $base = new Apache::ParseLog("/usr/local/httpd/conf/httpd.conf"); # Get the CustomLog object for "my_custom_log" $customlog = $base->getCustomLog("my_custom_log"); # Get the available methods for the CustomLog object @methods = $customlog->getMethods(); # Iterate through the @methods foreach $method (@methods) { print "$method log report\n"; # Get the returned value for each method %{$method} = $customlog->$method(); # Iterate through the returned hash foreach (sort keys %{$method}) { print "$_: ${$method}{$_}\n"; } print "\n"; } exit; =back =cut sub getMethods { my($this) = shift; return @{($this->{'methods'} || undef)}; } ####################################################################### # MISCELLANEOUS ####################################################################### =pod =head1 MISCELLANEOUS This section describes some miscellaneous methods that might be useful. =cut ####################################################################### # version(); returns $VERSION =pod =over 4 =item * C Returns a scalar containing the version of the Apache::ParseLog module. =back =cut sub Version { $VERSION } ####################################################################### # EXPORTED METHODS =pod =head2 Exported Methods This subsection describes B methods provided by the Apache::ParseLog module. (For the information about exported methods, see Exporter(3).) Note that those exported modules can be used (called) just like local (main package) subroutines. =over 4 =cut ####################################################################### # countryByCode(); returns %COUNTRY_BY_CODE =pod =item * C %countryByCode = countryByCode(); Returns a hash containing a hashtable of country-code top-level domain names as keys and country names as values. Useful for creating a report on "hits" by countries. =cut sub countryByCode { return %COUNTRY_BY_CODE; } ####################################################################### # statusByCode(); returns %STATUS_BY_CODE =pod =item * C %statusByCode = statusByCode(); Returns a hash containing a hashtable of status code of the Apache HTTPD server, as defined by RFC2068, as keys and meanings as values. =cut sub statusByCode { return %STATUS_BY_CODE; } ####################################################################### # sortHashByValue(); returns @sorted =pod =item * C @sorted_keys = sortHashByValue(%hash); Returns an array containing keys of the I<%hash> B sorted by the values of the I<%hash>, by the descending order. Example # Get the custom log object $customlog = $log->getCustomLog("combined"); # Get the log report on "file" %file = $customlog->file(); # Sort the %file by hit counts, descending order @sorted_keys = sortHashByValue(%hash); foreach (@sorted_keys) { print "$_: $file{$_}\n"; # print : } =back =cut sub sortHashByValue { my(%hash) = @_; return sort { $hash{$b} <=> $hash{$a} } keys %hash; } ####################################################################### # PRIVATE METHODS ####################################################################### ####################################################################### # openFile($any_file); returns a filehandle sub openFile { my($file) = shift; my($METHOD) = "Apache::ParseLog::openFile"; local(*FH); open(FH, "<$file") or croak "$METHOD: Cannot open $file. Exiting "; return *FH; } ####################################################################### # ADDITIONAL DOCUMENTATION ####################################################################### =pod =head1 EXAMPLES The most basic, easiest way to create reports is presented as an example in the C section above, but the format of the output is pretty crude and less user-friendly. Shown below are some other examples to use Apache::ParseLog. =head2 Example 1: Basic Report The example code below checks the F and F generated by the Apache 1.2.x, and prints the reports to STDOUT. (To run this code, all you have to do is to change the I<$conf> value.) #!/usr/local/bin/perl $|++; use Apache::ParseLog; $conf = "/usr/local/httpd/conf/httpd.conf"; $base = new Apache::ParseLog($conf); print "TransferLog Report\n\n"; $transferlog = $base->getTransferLog(); %hit = $transferlog->hit(); %hitbydate = $transferlog->hitbydate(); print "Total Hit Counts: ", $hit{'Total'}, "\n"; foreach (sort keys %hitbydate) { print "$_:\t$hitbydate{$_}\n"; # : } $hitaverage = int($hit{'Total'} / scalar(keys %hitbydate)); print "Average Daily Hits: $hitaverage\n\n"; %byte = $transferlog->byte(); %bytebydate = $transferlog->bytebydate(); print "Total Bytes Transferred: ", $byte{'Total'}, "\n"; foreach (sort keys %bytebydate) { print "$_:\t$bytebydate{$_}\n"; # : } $byteaverage = int($byte{'Total'} / scalar(keys %bytebydate)); print "Average Daily Bytes Transferred: $byteaverage\n\n"; %visitorbydate = $transferlog->visitorbydate(); %host = $transferlog->host(); print "Total Unique Visitors: ", scalar(keys %host), "\n"; foreach (sort keys %visitorbydate) { print "$_:\t$visitorbydate{$_}\n"; # } $visitoraverage = int(scalar(keys %host) / scalar(keys %visitorbydate)); print "Average Daily Unique Visitors: $visitoraverage\n\n"; print "ErrorLog Report\n\n"; $errorlog = $base->getErrorLog(); %count = $errorlog->count(); %allbydate = $errorlog->allbydate(); print "Total Errors: ", $count{'Total'}, "\n"; foreach (sort keys %allbydate) { print "$_:\t$allbydate{$_}\n"; # : } $erroraverage = int($count{'Total'} / scalar(keys %allbydate)); print "Average Daily Errors: $erroraverage\n\n"; exit; =head2 Example 2: Referer Report The F (or F with referer logged) contains the referer for every single file requested. It means that everytime a page that contains 10 images is requested, 11 lines are added to the F, one line for the actual referer (where the visitor comes from), and the other 10 lines for the images with the I page containing the 10 images as the referer, which is probably a little too much more than what you want to know. The example code below checks the F that contains referer, (among other things), and reports the names of the referer sites that are not the local server itself. #!/usr/local/bin/perl $|++; use Apache::ParseLog; $conf = "/usr/local/httpd/conf/httpd.conf"; $base = new Apache::ParseLog($conf); $localserver = $base->servername(); $log = $base->getCustomLog("combined"); %referer = $log->referer(); @sortedkeys = sortHashByValue(%referer); print "External Referers Report\n"; foreach (@sortedkeys) { print "$_:\t$referer{$_}\n" unless m/$localserver/i or m/^\-/; } exit; =head2 Example 3: Access-Controlled User Report Let's suppose that you have a directory tree on your site that is access-controlled by F<.htaccess> or the like, and you want to check how frequently the section is used by the users. #!/usr/local/bin/perl $|++; use Apache::ParseLog; $conf = "/usr/local/httpd/conf/httpd.conf"; $base = new Apache::ParseLog($conf); $log = $base->getCustomLog("common"); %user = $log->user(); print "Users Report\n"; foreach (sort keys %user) { print "$_:\t$user{$_}\n" unless m/^-$/; } exit; =head1 SEE ALSO perl(1), perlop(1), perlre(1), Exporter(3) =head1 BUGS The reports on lesser-known browsers returned from the F methods are not always informative. The data returned from the C method for F may be irrelvant if the referred files are not accessed via HTTP (i.e., the referer does not start with "http://" string). If the base object is created with the I<$virtualhost> specified, unless the F and F are specified within the ... , those values specified in the global section of the I are not shared with the I<$virtualhost>. =head1 TO DO Increase the performance (speed). =head1 VERSION Apache::ParseLog 1.01 (10/01/1998). =head1 AUTHOR Apache::ParseLog was written and is maintained by Akira Hangai (akira@discover-net.net) For the bug reports, comments, suggestions, etc., please email me. =head1 COPYRIGHT Copyright 1998, Akira Hangai. All rights reserved. This program is free software; You can redistribute it and/or modify it under the same terms as Perl itself. =head1 DISCLAIMER This package is distributed in the hope that it will be useful for many web administrators/webmasters who are too busy to write their own programs to analyze the Apache log files. However, this package is so distributed WITHOUT ANY WARRANTY in that any use of the data generated by this package must be used at the user's own discretion, and the author shall not be held accountable for any results from the use of this package. =cut ####################################################################### # DATA ####################################################################### __DATA__ ad:Andorra ae:United Arab Emirates af:Afghanistan ag:Antigua and Barbuda ai:Anguilla al:Albania am:Armenia an:Netherlands Antilles ao:Angola aq:Antarctica ar:Argentina as:American Samoa at:Austria au:Australia aw:Aruba az:Azerbaijan ba:Bosnia and Herzegovina bb:Barbados bd:Bangladesh be:Belgium bf:Burkina Faso bg:Bulgaria bh:Bahrain bi:Burundi bj:Benin bm:Bermuda bn:Brunei Darussalam bo:Bolivia br:Brazil bs:Bahamas bt:Bhutan bv:Bouvet Island bw:Botswana by:Belarus bz:Belize ca:Canada cc:Cocos (Keeling) Islands cf:Central African Republic cg:Congo ch:Switzerland ci:Cote D'Ivoire (Ivory Coast) ck:Cook Islands cl:Chile cm:Cameroon cn:China co:Colombia cr:Costa Rica cs:Czechoslovakia (former) cu:Cuba cv:Cape Verde cx:Christmas Island cy:Cyprus cz:Czech Republic de:Germany dj:Djibouti dk:Denmark dm:Dominica do:Dominican Republic dz:Algeria ec:Ecuador ee:Estonia eg:Egypt eh:Western Sahara er:Eritrea es:Spain et:Ethiopia fi:Finland fj:Fiji fk:Falkland Islands (Malvinas) fm:Micronesia fo:Faroe Islands fr:France fx:France Metropolitan, ga:Gabon gb:Great Britain (UK) gd:Grenada ge:Georgia gf:French Guiana gh:Ghana gi:Gibraltar gl:Greenland gm:Gambia gn:Guinea gp:Guadeloupe gq:Equatorial Guinea gr:Greece gs:S. Georgia and S. Sandwich Isls. gt:Guatemala gu:Guam gw:Guinea-Bissau gy:Guyana hk:Hong Kong hm:Heard and McDonald Islands hn:Honduras hr:Croatia (Hrvatska) ht:Haiti hu:Hungary id:Indonesia ie:Ireland il:Israel in:India io:British Indian Ocean Territory iq:Iraq ir:Iran is:Iceland it:Italy jm:Jamaica jo:Jordan jp:Japan ke:Kenya kg:Kyrgyzstan kh:Cambodia ki:Kiribati km:Comoros kn:Saint Kitts and Nevis kp:Korea (North) kr:Korea (South) kw:Kuwait ky:Cayman Islands kz:Kazakhstan la:Laos lb:Lebanon lc:Saint Lucia li:Liechtenstein lk:Sri Lanka lr:Liberia ls:Lesotho lt:Lithuania lu:Luxembourg lv:Latvia ly:Libya ma:Morocco mc:Monaco md:Moldova mg:Madagascar mh:Marshall Islands mk:Macedonia ml:Mali mm:Myanmar mn:Mongolia mo:Macau mp:Northern Mariana Islands mq:Martinique mr:Mauritania ms:Montserrat mt:Malta mu:Mauritius mv:Maldives mw:Malawi mx:Mexico my:Malaysia mz:Mozambique na:Namibia nc:New Caledonia ne:Niger nf:Norfolk Island ng:Nigeria ni:Nicaragua nl:Netherlands no:Norway np:Nepal nr:Nauru nt:Neutral Zone nu:Niue nz:New Zealand (Aotearoa) om:Oman pa:Panama pe:Peru pf:French Polynesia pg:Papua New Guinea ph:Philippines pk:Pakistan pl:Poland pm:St. Pierre and Miquelon pn:Pitcairn pr:Puerto Rico pt:Portugal pw:Palau py:Paraguay qa:Qatar re:Reunion ro:Romania ru:Russian Federation rw:Rwanda sa:Saudi Arabia sb:Solomon Islands sc:Seychelles sd:Sudan se:Sweden sg:Singapore sh:St. Helena si:Slovenia sj:Svalbard and Jan Mayen Isls. sk:Slovak Republic sl:Sierra Leone sm:San Marino sn:Senegal so:Somalia sr:Suriname st:Sao Tome and Principe su:USSR (former) sv:El Salvador sy:Syria sz:Swaziland tc:Turks and Caicos Islands td:Chad tf:French Southern Territories tg:Togo th:Thailand tj:Tajikistan tk:Tokelau tm:Turkmenistan tn:Tunisia to:Tonga tp:East Timor tr:Turkey tt:Trinidad and Tobago tv:Tuvalu tw:Taiwan tz:Tanzania ua:Ukraine ug:Uganda uk:United Kingdom um:US Minor Outlying Islands us:United States uy:Uruguay uz:Uzbekistan va:Vatican City State (Holy See) vc:Saint Vincent and the Grenadines ve:Venezuela vg:Virgin Islands (British) vi:Virgin Islands (U.S.) vn:Viet Nam vu:Vanuatu wf:Wallis and Futuna Islands ws:Samoa ye:Yemen yt:Mayotte yu:Yugoslavia za:South Africa zm:Zambia zr:Zaire ZW:Zimbabwe com:US Commercial edu:US Educational gov:US Government int:International mil:US Military net:Network org:Non-Profit Organization arpa:Old style Arpanet nato:NATO Field