Tutorial¶
Configuration¶
DeepLog can run without a config file. However, its functionality is then very limited: every log item is treated as a plain string. With loggers pre-defined in a config file, DeepLog can understand more about each log line, such as the log time, the log level, and errors.
In general, the config file is a YAML file named config.yaml under the config root ~/.deep_log/.
For example, a config file looks like this:
variables:
  loghub_root: /tmp/loghub
root:
  parser:
    name: DefaultLogParser
    params:
      pattern: (?P<content>.*?)
  path: /
loggers:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    parser:
      name: DefaultLogParser
      params:
        pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
    handlers:
      - name: TypeLogHandler
        params:
          definitions:
            - field: time
              format: '%a %b %d %H:%M:%S %Y'
              type: datetime
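The logger path in this config refers to the loghub_root variable. A minimal sketch of how such a placeholder could be expanded, assuming plain `str.format`-style substitution (the `expand` helper is hypothetical and not DeepLog's API):

```python
# Values copied from the config above.
variables = {"loghub_root": "/tmp/loghub"}
logger_path = "{loghub_root}/Apache"


def expand(value, variables):
    """Replace {name} placeholders with the matching variable value."""
    # Assumption: substitution behaves like str.format on path strings.
    return value.format(**variables)


expanded = expand(logger_path, variables)
print(expanded)  # /tmp/loghub/Apache
```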
components¶
- loggers: logger definitions. The attributes are exactly the same as for root. The following attributes can be used in this component:

| attribute | description |
|---|---|
| path | where the logger is used; always '/' for root |
| parser | defines how to parse the log item |
| handlers | defines how to handle the parsed log items |
| filters | defines how to filter the parsed log items |
| template | defines which template to refer to |

- root: a special logger that defines the default logger behavior. It is matched when no other logger matches.
- variables: defines variables that can be used in logger definitions.
- templates: defines templates that can be reused in loggers. See templates for details.
logger path hierarchy¶
The logger hierarchy works much like a file system tree with inheritance. Take the following as an example.
- Three loggers and root are defined:
  - apache, with path /tmp/loghub/Apache/
  - proxifier, with path /tmp/loghub/Proxifier/
  - loghub, with path /tmp/loghub/
- When analyzing logs:
  - if the target logs are under /tmp/loghub/Apache/, DeepLog matches the apache logger definition and parses them as Apache logs.
  - if the target logs are under /tmp/loghub/Proxifier/, DeepLog matches the proxifier logger definition and parses them as Proxifier logs.
  - if the target logs are under /tmp/loghub/Spark/, no logger is defined specifically for Spark logs, so DeepLog tries to match the parent node. In this case, the loghub logger is used.
  - if the target logs are under /opt/, no logger matches the path, so the default root logger is used.
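The matching rule above amounts to picking the logger with the longest path that contains the target. A minimal sketch of that idea, assuming the three loggers from the example; the `LOGGERS` table and `match_logger` function are hypothetical and do not reflect DeepLog's internals:

```python
from pathlib import PurePosixPath

# Hypothetical logger table mirroring the example above.
LOGGERS = {
    "apache": "/tmp/loghub/Apache",
    "proxifier": "/tmp/loghub/Proxifier",
    "loghub": "/tmp/loghub",
}


def match_logger(target, loggers=LOGGERS):
    """Return the name of the deepest logger whose path contains target."""
    target_path = PurePosixPath(target)
    best_name, best_depth = "root", 0  # root is the fallback
    for name, path in loggers.items():
        logger_path = PurePosixPath(path)
        is_ancestor = logger_path == target_path or logger_path in target_path.parents
        depth = len(logger_path.parts)
        if is_ancestor and depth > best_depth:
            best_name, best_depth = name, depth
    return best_name


print(match_logger("/tmp/loghub/Apache/Apache_2k.log"))  # apache
print(match_logger("/tmp/loghub/Spark/Spark_2k.log"))    # loghub
print(match_logger("/opt/app.log"))                      # root
```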
Command line Options¶
| option | description |
|---|---|
| -c, --config | config root dir |
| -l, --filter | log filter |
| -t, --meta-filter | filter by meta object extracted from file meta information |
| -n, --file-name | filter by file name |
| -m, --format | print format |
| -s, --subscribe | subscribe to data changes, processing unbounded changes |
| -o, --order-by | field to order by |
| -r, --reverse | reverse order, only works with order-by |
| --limit | limit query count |
| --window | processing window size |
| --workers | number of workers run in parallel |
| --recent | query by time up to now |
| -y, --analyze | dsl expression for analysis, integrated with pandas |
| --tags | query by tags |
| --modules | query by modules |
| --template | logger template |
| --distinct | remove duplicated records by the specified fields, separated by comma |
| --template_dir | logger template dir |
| --name-only | show only file name |
| --full | display full |
| --include-history | subscribe to history or not, only works with subscribe mode |
| --pass-on-exception | default value if an exception is met |
| -D | append definitions |
| --target | log dirs to analyze |
| pattern | default string pattern to match |
Parser¶
A parser converts a log line from a string into structured data. DeepLog currently has only one parser, named DefaultLogParser.
DefaultLogParser¶
DefaultLogParser uses Python regular expressions with named groups to parse a log line into an object. It has the following attribute:
- pattern: a regular expression with named groups; each group name becomes a field of the parsed object.
Example:
the log line:
[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7
with the parser config
parser:
  name: DefaultLogParser
  params:
    pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
is parsed into
{
    'time': 'Sun Dec 04 04:52:15 2005',
    'level': 'error',
    'message': 'mod_jk child workerEnv in error state 7'
}
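The named-group parsing can be reproduced with Python's standard `re` module. This is a sketch of the technique, not DeepLog's own code; the `parse_line` helper is hypothetical, and anchoring with `match` is an assumption:

```python
import re

# Pattern and log line copied from the example above.
PATTERN = re.compile(r"\[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)")
LINE = "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7"


def parse_line(line, pattern=PATTERN):
    """Return a dict of named groups, or None when the line does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


record = parse_line(LINE)
print(record)
# {'time': 'Sun Dec 04 04:52:15 2005', 'level': 'error',
#  'message': 'mod_jk child workerEnv in error state 7'}
```

Every value is still a string at this point, which is why handlers such as TypeLogHandler exist.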
handlers¶
A handler transforms the data produced by the parser. DeepLog provides the following handlers.
Note
More than one handler can be defined; they are executed in sequence.
TypeLogHandler¶
The values in the object produced by the parser are always strings. TypeLogHandler converts a value to a more suitable type. It has the following attribute:
- definitions: a series of type definitions. Each type definition has three sub-fields:
  - field: the name of the field to convert.
  - type: the target type.
  - format: only used when type is datetime; the time format string, written with the format codes shared by Python's strftime and strptime.
Example:
with the parser above, we have the parsed result
{
'time': 'Sun Dec 04 04:52:15 2005',
'level': 'error',
'message': 'mod_jk child workerEnv in error state 7'
}
with the handler configuration:
handlers:
  - name: TypeLogHandler
    params:
      definitions:
        - field: time
          format: '%a %b %d %H:%M:%S %Y'
          type: datetime
path: /
The handler above converts the time field of the parsed result to a datetime object using the format %a %b %d %H:%M:%S %Y. The result is {'time': datetime.datetime(2005, 12, 4, 4, 52, 15), ...}
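The datetime conversion corresponds to `datetime.strptime` in the standard library. A minimal sketch, assuming the handler applies each definition to a copy of the record; `apply_types` is a hypothetical helper and only the datetime type shown in the source is handled:

```python
from datetime import datetime

record = {
    "time": "Sun Dec 04 04:52:15 2005",
    "level": "error",
    "message": "mod_jk child workerEnv in error state 7",
}
definitions = [
    {"field": "time", "type": "datetime", "format": "%a %b %d %H:%M:%S %Y"},
]


def apply_types(record, definitions):
    """Return a copy of record with the listed fields converted."""
    converted = dict(record)
    for definition in definitions:
        field = definition["field"]
        if field not in converted:
            continue  # nothing to convert for this record
        if definition["type"] == "datetime":
            converted[field] = datetime.strptime(converted[field], definition["format"])
    return converted


typed = apply_types(record, definitions)
print(repr(typed["time"]))  # datetime.datetime(2005, 12, 4, 4, 52, 15)
```

Note that %a and %b are locale-dependent; the example assumes English day and month names.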
TagLogHandler¶
TagLogHandler tags a log line when it meets a specified condition. It has the following attribute:
- definitions: a series of tag condition definitions. Each tag condition has two sub-fields:
  - name: the tag name.
  - condition: the match condition that decides whether the tag is applied.
Example
The handler configuration is:
handlers:
  - name: TagLogHandler
    params:
      definitions:
        - name: error
          condition: "'error' == level or 'error' in message"
The handler above tags a log line as error when level is 'error' or 'error' appears in message. With the parsed result above, the handler output is {'tags': {'error'}, ...}
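Since the condition is a Python expression over the record's fields, tagging can be sketched with `eval`. This is an illustration under the assumption that record fields are exposed as plain names; `apply_tags` is hypothetical, and `eval` should only be used on trusted configuration:

```python
record = {
    "time": "Sun Dec 04 04:52:15 2005",
    "level": "error",
    "message": "mod_jk child workerEnv in error state 7",
}
definitions = [
    {"name": "error", "condition": "'error' == level or 'error' in message"},
]


def apply_tags(record, definitions):
    """Return a copy of record with a 'tags' set of all matching tag names."""
    tagged = dict(record)
    tags = set(tagged.get("tags", ()))
    for definition in definitions:
        # Record fields act as local variables; builtins are disabled.
        if eval(definition["condition"], {"__builtins__": {}}, dict(record)):
            tags.add(definition["name"])
    tagged["tags"] = tags
    return tagged


tagged = apply_tags(record, definitions)
print(tagged["tags"])  # {'error'}
```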
StripLogHandler¶
StripLogHandler is a simple handler that strips string values. It has one attribute:
- fields: the string fields to strip. If no fields are provided, all string fields are stripped.
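A short sketch of the stripping behavior described above, assuming "strip" means removing leading and trailing whitespace as `str.strip` does; `strip_fields` is a hypothetical helper:

```python
def strip_fields(record, fields=None):
    """Strip whitespace from the given string fields, or from all of them."""
    stripped = dict(record)
    names = fields if fields else list(stripped)
    for name in names:
        value = stripped.get(name)
        if isinstance(value, str):  # non-string values are left untouched
            stripped[name] = value.strip()
    return stripped


record = {"level": " error ", "message": "  state 7\n", "count": 3}
print(strip_fields(record))
# {'level': 'error', 'message': 'state 7', 'count': 3}
print(strip_fields(record, fields=["level"])["message"] == "  state 7\n")  # True
```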
RegLogHandler¶
RegLogHandler extracts values from a specific field, and works much like DefaultLogParser. It has the following attributes:
- pattern: a regular expression with named groups.
- field: the field to extract the values from.
Example
handlers:
  - name: TypeLogHandler
    params:
      definitions:
        - field: time
          format: '%m-%d-%Y %H:%M:%S.%f'
          type: datetime
  - name: RegLogHandler
    params:
      pattern: "\n(?P<exception>.*?Exception):(?P<exception_message>.*)"
      field: "_record"
The example above uses RegLogHandler to extract the exception name and message.
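The extraction can be sketched with `re.search` over the raw `_record` field. The multi-line log item below is made up for illustration, and `extract` is a hypothetical helper, not DeepLog's implementation:

```python
import re

# Same pattern as in the handler configuration; "\n" matches a newline.
EXCEPTION_PATTERN = re.compile(
    r"\n(?P<exception>.*?Exception):(?P<exception_message>.*)"
)


def extract(record, field="_record", pattern=EXCEPTION_PATTERN):
    """Return a copy of record enriched with the pattern's named groups."""
    enriched = dict(record)
    match = pattern.search(str(record.get(field, "")))
    if match:
        enriched.update(match.groupdict())
    return enriched


# Hypothetical log item spanning two lines.
record = {
    "_record": "01-02-2020 10:00:00.000 ERROR request failed\n"
               "java.lang.IllegalStateException: connection closed"
}
extracted = extract(record)
print(extracted["exception"])                  # java.lang.IllegalStateException
print(extracted["exception_message"].strip())  # connection closed
```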
TransformLogHandler¶
TransformLogHandler uses a dsl expression to add new fields to the record object. It has the following attribute:
- definitions: a series of field definitions. Each definition has two sub-fields:
  - name: the field to be created.
  - value: the value expression.
Example
handlers:
  - name: TransformLogHandler
    params:
      name: is_today
      value: "time.date() == datetime.datetime.today().date()"
The example above uses TransformLogHandler to create a new field that tells whether the log date is today.
Filters¶
A filter selects which log items in the log files are kept.
MetaFilters¶
Meta filters filter by log file meta information rather than by log file content. There are two kinds of meta filters:
NameFilter¶
NameFilter filters files by name, using Unix filename pattern matching syntax. It takes two arguments:
- patterns: the file name patterns to match, separated by commas.
- exclude_patterns: the file name patterns to exclude, separated by commas.
Example
meta_filters:
  - name: NameFilter
    params:
      patterns: '*.log'
      exclude_patterns: '*audit.log'
The configuration above analyzes all files with the .log extension but excludes audit logs.
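Unix filename pattern matching is available in Python's `fnmatch` module. A minimal sketch of the include/exclude logic, assuming patterns are matched against the base name and excludes take precedence; `name_matches` is a hypothetical helper:

```python
import os
from fnmatch import fnmatchcase


def split_patterns(patterns):
    """Split a comma-separated pattern string into a clean list."""
    return [p.strip() for p in patterns.split(",") if p.strip()]


def name_matches(path, patterns="*", exclude_patterns=""):
    """Return True when the file name matches an include and no exclude."""
    name = os.path.basename(path)
    if any(fnmatchcase(name, p) for p in split_patterns(exclude_patterns)):
        return False
    return any(fnmatchcase(name, p) for p in split_patterns(patterns))


print(name_matches("/var/log/app.log", "*.log", "*audit.log"))    # True
print(name_matches("/var/log/audit.log", "*.log", "*audit.log"))  # False
print(name_matches("/var/log/app.txt", "*.log", "*audit.log"))    # False
```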
DslMetaFilter¶
DslMetaFilter is more powerful than the name filter: it uses a Python expression to filter files based on their meta information. It takes one argument:
- filter: a dsl expression.
Example
meta_filters:
  - name: DslMetaFilter
    params:
      filter: _size > 0
The configuration above ignores all empty files.
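The `_size > 0` filter can be sketched as an expression evaluated against a small meta dictionary. Only `_name` and `_size` are populated here, and `passes_meta_filter` is a hypothetical helper rather than DeepLog's implementation:

```python
import os
import tempfile


def passes_meta_filter(path, expression):
    """Evaluate a dsl expression against a minimal meta object for path."""
    meta = {"_name": path, "_size": os.path.getsize(path)}
    return bool(eval(expression, {"__builtins__": {}}, meta))


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.log")
    full = os.path.join(tmp, "app.log")
    open(empty, "w").close()  # zero-byte file
    with open(full, "w") as handle:
        handle.write("[error] something happened\n")
    results = {
        os.path.basename(path): passes_meta_filter(path, "_size > 0")
        for path in (empty, full)
    }

print(results)  # {'empty.log': False, 'app.log': True}
```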
Template System¶
Logs of the same type generally share the same format. To parse, handle, and filter such logs in the same way, users can define the configuration as a template, which can be shared by multiple loggers or used from the command line. There are two ways to define templates:
templates in config¶
Templates can be defined directly in config.yaml, as in the following snippet:
templates:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    parser:
      name: DefaultLogParser
      params:
        pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
    handlers:
      - name: TypeLogHandler
        params:
          definitions:
            - field: time
              format: '%a %b %d %H:%M:%S %Y'
              type: datetime
loggers:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    template: apache
The template is defined under the templates section and can then be referenced in loggers by its name.
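One way to picture the template reference is as a dictionary merge: the logger starts from the template's settings and adds its own keys. The sketch below assumes the logger's keys override the template's, which the source does not state; `resolve_logger` is a hypothetical helper:

```python
templates = [
    {
        "name": "apache",
        "parser": {
            "name": "DefaultLogParser",
            "params": {
                "pattern": r"\[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)"
            },
        },
    }
]
logger = {
    "name": "apache",
    "path": "/tmp/loghub/Apache",
    "modules": ["apache"],
    "template": "apache",
}


def resolve_logger(logger, templates):
    """Merge the referenced template into the logger definition."""
    template_name = logger.get("template")
    if template_name is None:
        return dict(logger)
    by_name = {template["name"]: template for template in templates}
    resolved = dict(by_name[template_name])  # start from the template
    # Assumption: keys set on the logger take precedence.
    resolved.update({k: v for k, v in logger.items() if k != "template"})
    return resolved


resolved = resolve_logger(logger, templates)
print(resolved["parser"]["name"])  # DefaultLogParser
print(resolved["path"])            # /tmp/loghub/Apache
```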
Template Repo¶
Besides the templates section of the config, templates can also be defined in a template repo: all templates can be placed in the templates folder under the config root.
See the following apache template for an example:
name: apache
path: '{loghub_root}/Apache'
modules:
  - apache
parser:
  name: DefaultLogParser
  params:
    pattern: \[(?P<time>.*?)\] \[(?P<level>.*?)\] (?P<message>.*)
handlers:
  - name: TypeLogHandler
    params:
      definitions:
        - field: time
          format: '%a %b %d %H:%M:%S %Y'
          type: datetime
The example above defines an apache log template, which can be referenced in loggers.
Dsl Expressions¶
A dsl expression in DeepLog is a Python expression whose available context depends on where it is used. There are four usages in general:
- filter: used to filter log content. It can be the value of the --filter option or the filter param in DslFilter definitions. The Record Object and _module_object are included in the context.
- handler: advanced usage in TransformLogHandler. Both the Record Object and _module_object are included in the context.
- meta filter: only applied to meta filters. It can be the value of the --meta-filter option or the filter param in DslMetaFilter definitions. The Meta Object and _module_object are included in the context.
- analyze: dedicated to the analysis function, and set with the --analyze command line option. Both the Record Object and _module_object are included in the context. In addition, the user can manipulate the df (DataFrame) property here.
Meta Object¶
Built-in meta properties:

| property | description |
|---|---|
| _name | filename |
| _writable | file is writable or not |
| _readable | file is readable or not |
| _executable | file is executable or not |
| _ctime | file creation time |
| _mtime | file modified time |
| _actime | file access time |
| _size | file size |
| _basename | file base name |
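Most of these properties map onto `os.stat` and `os.access` in the standard library. The sketch below shows one way such a meta object could be assembled; `build_meta` is hypothetical, and representing times as `datetime` objects is an assumption:

```python
import os
import tempfile
from datetime import datetime


def build_meta(path):
    """Collect file meta information into a dict keyed like the table above."""
    stat = os.stat(path)
    return {
        "_name": path,
        "_basename": os.path.basename(path),
        "_size": stat.st_size,
        "_readable": os.access(path, os.R_OK),
        "_writable": os.access(path, os.W_OK),
        "_executable": os.access(path, os.X_OK),
        # st_ctime is creation time on Windows, metadata-change time on Unix.
        "_ctime": datetime.fromtimestamp(stat.st_ctime),
        "_mtime": datetime.fromtimestamp(stat.st_mtime),
        "_actime": datetime.fromtimestamp(stat.st_atime),
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "apache_v2.log")
    with open(sample, "w") as handle:
        handle.write("hello")
    meta = build_meta(sample)

print(meta["_basename"], meta["_size"])  # apache_v2.log 5
```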
Record Object¶
built-in properties¶
- _record: the file line
- df: the log items data frame
Note
The df property can only be used in the analysis function.
user-defined items¶
- fields parsed by the parser, for example the parsed time property.
- fields generated by TransformLogHandler.
Example
The following is an example of a record returned by DeepLog:
{
    '_name': '/tmp/apache_v2.log',  # meta object property, filename
    '_size': 10000,  # meta object property, file size
    'time': datetime.datetime(2005, 12, 4, 4, 52, 5)  # user parsed property, parsed from the string 'Sun Dec 04 04:52:05 2005'
}