Tutorial

Configuration

DeepLog can be executed without config file. however, without config file, the functionalities of DeepLog is very limited, every log item can only be treated as a string. but with pre-defined loggers in config file, DeepLog can know more about logs, not just a string line by line, but also the log time, the log level, error etc. in general, config file is a yaml file named config.yaml under config root ~/.deep_log/.

example this is what config file look like:

variables:
  loghub_root: /tmp/loghub
root:
  parser:
    name: DefaultLogParser
    params:
      pattern: (?P<content>.*?)
  path: /
loggers:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    parser:
      name: DefaultLogParser
      params:
        pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
    handlers:
      - name: TypeLogHandler
        params:
          definitions:
            - field: time
              format: '%a %b %d %H:%M:%S %Y'
              type: datetime

components

  • loggers, logger definitions, the attributes is exactly the same as root. following attributes can be used in this component:

    modules

    description

    path

    where the logger used, always ‘/’ for root

    parser

    define how to parse the log item

    handlers

    define how to handle the parsed log items

    filters

    define how to filter parsed log items

    template

    define which template to refer

  • root, root is specific logger which define the default logger action, it will be matched if no any other loggers matched.

  • variables, define variables which can be used in logger definitions.

  • templates, define templates which can be reused in loggers. see templates for detail.

logger path hierarchy

logger hierarchy is very like file system structure with inheritance. take following as an example:

and there three loggers and root defined:
  • apache, with path /tmp/loghub/Apache/

  • proxifier, with path /tmp/loghub/Proxifier/

  • loghub with path /tmp/loghub/

when analyzing logs:
  • if target logs under folder /tmp/loghub/Apache/ , DeepLog will match apache logger definition, and will parse logs as apache log.

  • if target logs under folder /tmp/loghub/Proxifier/ , DeepLog will match proxifier logger definition, and will parse logs as apache log.

  • if target logs under folder /tmp/loghub/Spark/ , there is no specific loggers defined for Spark log, DeepLog will try to match its parent node, in this case, loghub logger will be used.

  • if target logs under folder /opt/, there is no any logger matching the path, the default root logger will be used.

Command line Options

  • -c, --config config root dir

  • -l, --filter log filter

  • -t, --meta-filter filter by meta object extracted from file meta information

  • -n, --file-name filter by file name

  • -m, --format print format

  • -s, --subscribe subscribe data change, processing unbouned change

  • -o, --order-by field to order by

  • -r, --reverse reverse order, only work with order-by

  • --limit limit query count

  • --window processing window size

  • --workers workers count run in parallel

  • --recent query by time to now, for example,

  • -y, --analyze dsl expression for analysis, integrate with pandas

  • --tags query by tags

  • --modules query by modules

  • --template logger template

  • --distinct remove duplicated records by specified fields separated by comma

  • --template_dir logger template dir

  • --name-only show only file name

  • --full display full

  • --include-history subscribe history or not, only work with subscribe mode

  • --pass-on-exception default value if met exception

  • -D, append definitions

  • --target log dirs to analyze

  • pattern default string pattern to match

Parser

parser is used to parse log line from string to structured data. in DeepLog, currently, there is only one parser named DefaultLogParser.

DefaultLogParser

DefaultLogParser use python regular expression named groups to parse log line as a object. with following attributes:

  • pattern, pattern is named groups regular expression match pattern.

examples:

the log line:

[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7

with parser config

parser:
  name: DefaultLogParser
  params:
    pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)

the parsed result will be

handlers

handler is used to transfer data which is parsed from parser.DeepLog provide several following handlers:

Note

handler can be defined more than one, and executed in sequence.

TypeLogHandler

the type of value in parsed object from parser is always string, TypeLogHandler is always used to convert the value to suitable type. and with following attributes:

  • definitions, define a serial of type definitions, one type definition has three sub fields:
    • field, the field name which will be transferred.

    • type, the type to transfer.

    • format, only used when type is datetime, which define the string time format used by strftime function.

examples:

with above parser, we have parsed result

{
    'time': 'Sun Dec 04 04:52:15 2005',
    'level': 'error',
    'message': 'mod_jk child workerEnv in error state 7'
}

with handler configuration:

handlers:
- name: TypeLogHandler
  params:
    definitions:
      - field: time
        format: '%a %b %d %H:%M:%S %Y'
        type: datetime
path: /

the above handler will transfer the field time in parsed result to datetime object with format %a %b %d %H:%M:%S %Y, the result will be {'time': Datetime.Datetime(2005, 12, 4, 4, 52, 15), ...}

TagLogHandler

TagLogHandler is used to tag log line with specified condition. with following attributes:

  • definitions, define a serial of tag condition definitions, one tag condition has two sub fields:
    • name, tag name.

    • condition, define the match condition if tag the name.

examples

the handler configuration is:

handlers:
  - name: TagLogHandler
    params:
      definitions:
        - name: error
        - condition: "'error' == level or 'error' in message"

the above handler will tag log line as error when level is ‘error’ or ‘error’ in message. with above parsed result, the handler output will {tags: Set(‘error’), …}, which can be

StripLogHandler

StripLogHandler is a simple handler, which is used to strip all the string values。there is one attribute:

  • fields, define the string fields to strip. if no fields provided, all the string fields will be stripped.

RegLogHandler

RegLogHandler is used to extract values from specific field, which work very likely what DefaultLogParser do. attributes:

examples

handlers:
  - name: TypeLogHandler
    params:
      definitions:
        - field: time
          format: '%m-%d-%Y %H:%M:%S.%f'
          type: datetime
  - name: RegLogHandler
    params:
      pattern: "\n(?P<exception>.*?Exception):(?P<exception_message>.*)"
      field: "_record"

the above example show using RegLogHandler to parse exception name and messages.

TransformLogHandler

TransformLogHandler use dsl expression to transform record object with new fields. which has attributes:

  • definitions, define a serial of type definitions, one type definition has three sub fields:
    • name, the field to be created

    • value, the value expression.

examples

handlers:
  - name: TransformLogHandler
    params:
      name: is_today
      value: "time.date() == datetime.datetime.today().date()"

the above show using TransformLogHandler to create new field to identify the log date is today or not.

Filters

filter is used to filter the log item in the log files.

DslFilter

DslFilter is a filter which accept a python expression as a filter condition. with attributes:

  • filter, a `dsl expression<dl_dsl>’_, which evaluate filter condition.

  • pass_on_exception, bool type, mark the condition as True or False if met condition

MetaFilters

metaFilters basically is used to filter by log file meta info not log file content.there two kind of meta filters:

NameFilter

NameFilter is used to filter file name based on Unix filename pattern matching syntax. which take two arguments:

  • patterns, define the file name match patterns, which split by comma ,.

  • exclude_patterns, define excluded file name match patterns, which split by comma ,.

examples

meta_filters:
  - name: NameFilter
    params:
      patterns: '*.log'
      exclude_patterns: '*audit.log'

the above means we analyze all the files with extetion name is .log but exclude audit log.

DslMetaFilter

DslMetaFilter is a more powerful filer than name filer, which can use python expression the filter file based file meta info. which can take one argument:

example

meta_filters:
  - name: NameFilter
    params:
      filter: _size > 0

the above means all empty files will be ignored

Template System

logs with the same type always have the same log format. to parse/handle/filter log with the same patterns, user can define those configurations as template.which can be shared by multiple loggers or command line. there are two ways to define templates:

templates in config

templates can be defined directly in config.yaml, see following snippet:

templates:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    parser:
      name: DefaultLogParser
      params:
        pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
    handlers:
      - name: TypeLogHandler
        params:
          definitions:
            - field: time
              format: '%a %b %d %H:%M:%S %Y'
              type: datetime
loggers:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    template: apache

we define template under templates section, and then can be referenced in loggers with template name.

Template Repo

besides templates config, templates can also be defined in template repo. we can define all templates under templates folder under config root. see following apache template for example, you can find full example here:

name: apache
path: '{loghub_root}/Apache'
modules:
  - apache
parser:
  name: DefaultLogParser
  params:
    pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
handlers:
  - name: TypeLogHandler
    params:
      definitions:
        - field: time
          format: '%a %b %d %H:%M:%S %Y'
          type: datetime

the above example define apache log template, which can be referenced in loggers.

Dsl Expressions

dsl expression in DeepLog in a python expression for different usage with different context, there are four usages in general:

  • filter, is used to filter log content, which can be --filter option value, or filter params in DslFilter definitions. Record Object and _module_object are included in context.

  • handler, is advanced usage in TransformLogHandler, both Record Object and _module_object are included in context.

  • meta filer, is only applied on meta filer, which can be --meta-filer option value or filter param in DslMetaFilter definitions. Meta Object and _module_object are included in context.

  • analyze, is dedicated for analysis function. which can be set in --analyze command line option. both Record Object and _module_object are included in context. besides, user can manipulate the df(DataFrame) property in this situation.

Meta Object

meta object

  • built-in meta properties

property

description

_name

filename

_writable

file is writable or not

_readable

file is readable or not

_executable

file is executable or not

_ctime

file creaction time

_mtime

file modified time

_actime

file access time

_size

file size

_basename

file base name

Record Object

built-in properties

  • Meta Object

  • _record, file line

  • df, log items data frame

Note

property df can only be invoked in analysis function.

user-defined items

  • parsed result by parser, for example, parsed time property.

  • generate by by TransformLogHandler

examples

following is the examples returned by DeepLog:

{
    '_name': '/tmp/apache_v2.log' # meta object property, filename
    '_size': 10000, # meta object property, file size
    'time': Datetime(2025, 12, 04, 4, 52, 5) # user parsed property, parsed by from string 'Sun Dec 04 04:52:05 2005'
}

Built-in Modules

there are kinds of python modules exposed which can be invoked in dsl Expressions:

modules

description

re

Regular expression operations

path

Common pathname manipulations

datetime

Basic date and time types