NAME
    Logfile - Perl extension for generating reports from logfiles

SYNOPSIS
      use Logfile::Cern;

      $l = new Logfile::Cern  File  => 'cache.log.gz', 
                              Group => [Domain,File,Hour];
      $l->report(Group => File,   Sort => Records);
      $l->report(Group => Domain, Sort => Bytes);
      $l->report(Group => Hour, List => [Bytes, Records]);

      use Logfile::Wftp;

      [...]

DESCRIPTION
    The Logfile extension will help you generate various reports from
    different server logfiles. In general there is no restriction as to what
    information you extract from the logfiles.

  Reading the files

    The package can be customized by subclassing `Logfile'.

    A subclass should provide a funtion `next' which reads the next record
    from the file handle `$self->{Fh}' and returns an object of type
    `Logfile::Record'. In addition a function `norm' may be specified to
    normalize the various record fields.

    Here is a shortened version of the `Logfile::Cern' class:

      package Logfile::Cern;
      @ISA = qw ( Logfile::Base ) ;

      sub next {
          my $self = shift;
          my $fh = $self->{Fh};

          *S = $fh;
          my ($line,$host,$user,$pass,$rest,$date,$req,$code,$bytes);

          ($host,$user,$pass,$rest) = split ' ', $line, 4;
          ($rest =~ s!\[([^\]]+)\]\s*!!) && ($date = $1);
          ($rest =~ s!\"([^\"]+)\"\s*!!) && ($req = (split ' ', $1)[1]);
          ($code, $bytes) = split ' ', $rest;
          Logfile::Record->new(Host  => $host,
                               Date  => $date,
                               File  => $req,
                               Bytes => $bytes);
      }

    As stated above, in general you are free to choose the fields you enter
    in the record. But:

    Date should be a valid date string. For conversion to the seconds elapsed
         since the start of epoch the modules GetDate and Date::DateParse
         are tried. If both cannot be `use'ed, a crude build-in module is
         used.

         The record constructor replaces Date by the date in `yymmdd' form
         to make it sortable. Also the field Hour is padded in.

    Host Setting Host will also set field Domain by the verbose name of the
         country given by the the domain suffix of the fully qualified
         domain name (hostname.domain). `foo.bar.PG' will be mapped to
         `Papua New'. Hostnames containing no dot will be assigned to the
         domain Local. IP numbers will be assiged to the domain Unresolved.
         Mapping of short to long domain names is done in the Net::Country
         extension which might be usefull in other contexts:

           use Net::Country;
           $germany = Net::Country::Name('de');

    Records
         is always set to 1 in the `Record' constructor. So this field gives
         the number of successful returns from the `next' function.

    Here is the shortened optional `norm' method:

      sub norm {
          my ($self, $key, $val) = @_;

          if ($key eq File) {
              $val =~ s/\?.*//;                             # remove query
              $val =~ s!%([\da-f][\da-f])!chr(hex($1))!eig; # decode escapes
          }
          $val;
      }

    The constructor reads in a logfile and builds one or more indices.

      $l = new Logfile::Cern  File => 'cache.log.gz', 
                              Group => [Host,Domain,File,Hour,Date];

    There is little space but some time overhead in generating additional
    indexes. If the File parameter is not given, STDIN is used. The Group
    parameter may be a field name or a reference to a list of field names.
    Only the field names given as constructor argument can be used for
    report generation.

  Report Generation

    The Index to use for a report must be given as the Group parameter.
    Output is sorted by the index field unless a Sort parameter is given.
    Also the output can be truncated by a Top argument or Limit.

    The report generator lists the fields Bytes and Records for a given
    index. The option List may be a single field name or a reference to an
    array fo field names. It specifies which field should be listed in
    addition to the Group field. List defaults to Records.

      $l->report(Group => Domain, List => [Bytes, Records])

    Output is sorted by the Group field unless overwritten by a Sort option.
    Default sorting order is increasing for Date and Hour fields and
    decreasing for all other Fields. The order can be reversed using the
    Reverse option.

    This code

      $l->report(Group => File, Sort => Records, Top => 10);

    prints:

      File                          Records 
      =====================================
      /htbin/SFgate               30 31.58% 
      /freeWAIS-sf/*              22 23.16% 
      /SFgate/SFgate               8  8.42% 
      /SFgate/SFgate-small         7  7.37% 
      /icons/*                     4  4.21% 
      /~goevert                    3  3.16% 
      /journals/SIGMOD             3  3.16% 
      /SFgate/ciw                  2  2.11% 
      /search                      1  1.05% 
      /reports/96/                 1  1.05% 

    Here are other examples. Also take a look at the t/* files.

      $l->report(Group => Domain, Sort => Bytes);

      Domain                  Records 
      ===============================
      Germany               12 12.63% 
      Unresolved             8  8.42% 
      Israel                34 35.79% 
      Denmark                4  4.21% 
      Canada                 3  3.16% 
      Network                6  6.32% 
      US Commercial         14 14.74% 
      US Educational         8  8.42% 
      Hong Kong              2  2.11% 
      Sweden                 2  2.11% 
      Non-Profit             1  1.05% 
      Local                  1  1.05% 
      
      $l->report(Group => Hour, List => [Bytes, Records]);

      Hour            Bytes          Records 
      ======================================
      07      245093 17.66%        34 35.79% 
      08      438280 31.59%        19 20.00% 
      09      156730 11.30%        11 11.58% 
      10      255451 18.41%        16 16.84% 
      11      274521 19.79%        10 10.53% 
      12       17396  1.25%         5  5.26% 

  Report options

    Group `=>' *field*
         Mandatory. *field* must be one of the fields passed to the
         constructor.

    List `=>' *field*
    List `=>' [*field*, *field*]
         List the subtotals for *field*s. Defaults to Records.

    Sort `=>' *field*.
         Sort output by *field*. By default, Date and Hour are sorted in
         increasing order, whereas all other fields are sorted in decreasing
         order.

    Reverse `=> 1'
         Reverse sorting order.

    Top `=>' *number*
         Print only the first *number* subtotals.

    Limit `=>' *number*
         Print only the subtotals with Sort field greater than *number*
         (less than number if sorted in increasing order).

    Currently reports are simply printed to STDOUT.

AUTHOR
    Ulrich Pfeifer <pfeifer@ls6.informatik.uni-dortmund.de>

SEE ALSO
    perl(1).