Quickstart¶
A minimal example¶
Suppose we want to analyze an Apache log like the following:
[Sun Dec 04 04:52:05 2005] [notice] jk2_init() Found child 6737 in scoreboard slot 8
[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 6
[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7
[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7
How to work like grep¶
Search for all error messages:
$ grep error apache_v2.log # using grep
$ dl error --target apache_v2.log # using DeepLog
Search for all error messages with state 6:
$ grep error apache_v2.log | grep 'state 6' # using grep
$ dl --filter="'error' in _record and 'state 6' in _record" --target apache_v2.log # using DeepLog
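The --filter value is a Python expression. As a rough illustration of its semantics (not DeepLog's actual implementation), the sketch below evaluates the same expression against a few of the sample lines, assuming `_record` holds the raw text of each line. The `matches` helper is hypothetical.

```python
# Illustrative only: mimics how a filter expression could be applied per line.
# Assumption: `_record` is bound to the raw log line.
SAMPLE_LOG = [
    "[Sun Dec 04 04:52:05 2005] [notice] jk2_init() Found child 6737 in scoreboard slot 8",
    "[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 6",
    "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7",
]

FILTER_EXPR = "'error' in _record and 'state 6' in _record"


def matches(expression, line):
    """Evaluate a filter expression with `_record` set to the line (hypothetical helper)."""
    # Builtins are emptied so the expression can only inspect `_record`.
    return bool(eval(expression, {"__builtins__": {}}, {"_record": line}))


selected = [line for line in SAMPLE_LOG if matches(FILTER_EXPR, line)]
for line in selected:
    print(line)
```

Only the line ending in "state 6" satisfies both conditions, which is the same result as the chained grep commands.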
Not Just grep - groupby function¶
How do we count log entries grouped by log level?
grep does not fully support this case, but DeepLog can do it with the following steps:
Set up a config file named `config.yaml`_ under ~/.deep_log, with content:
Then analyze with the groupby function.
By leveraging pandas functions, we can easily get the results.
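To make the idea of counting by log level concrete, here is a minimal sketch in plain Python. It is not DeepLog's parser or its pandas integration: the regular expression and the field names `time`, `level` and `message` are assumptions chosen to fit the sample lines, and `collections.Counter` stands in for a pandas group-by (with pandas, the equivalent would be roughly `DataFrame(records).groupby("level").size()`).

```python
import re
from collections import Counter

# Assumed layout of the sample lines: [timestamp] [level] message
LINE_PATTERN = re.compile(r"^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] (?P<message>.*)$")

SAMPLE_LOG = [
    "[Sun Dec 04 04:52:05 2005] [notice] jk2_init() Found child 6737 in scoreboard slot 8",
    "[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "[Sun Dec 04 04:52:12 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 6",
    "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7",
    "[Sun Dec 04 04:52:15 2005] [error] mod_jk child workerEnv in error state 7",
]


def parse(line):
    """Split one line into named fields, or return None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


records = [record for record in map(parse, SAMPLE_LOG) if record is not None]

# Group by the parsed level and count the entries in each group.
counts_by_level = Counter(record["level"] for record in records)
print(dict(counts_by_level))
```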
How about more than one type of logs?¶
The examples above show how to define ETL processing in the root logger. What if there is more than one type of log? Let's say we have another `proxifier log`_ with a different log format to analyze. How do we define its parser in the config file?
We can define more loggers under the loggers section in `config.yaml`_.
loggers:
  - name: apache
    path: '{loghub_root}/Apache'
    modules:
      - apache
    parser:
      ...
  - name: proxifier
    path: '{loghub_root}/Proxifier'
    modules:
      - proxifier
    ...
Then we can use a different logger to analyze each type of log.
How to process unbounded data¶
Logs keep growing over time. How can we monitor log changes?
DeepLog provides a --subscribe option for this. It subscribes to log changes and treats them as a data stream to process.
$ dl --subscribe --filter="'error' == level"
It prints the error messages from incoming log entries, like tail -f <filename> | grep error.
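The tail -f comparison can be sketched in Python as a generator that polls a file for appended lines. This is only an illustration of treating a growing log as a stream, not how DeepLog implements --subscribe. The names `follow` and `follow_errors`, and the parameters `from_end`, `poll_interval` and `max_idle_polls`, are all hypothetical.

```python
import time


def follow(path, from_end=True, poll_interval=0.5, max_idle_polls=None):
    """Yield lines appended to `path`, similar to `tail -f` (hypothetical helper).

    With max_idle_polls=None the generator never stops; otherwise it stops
    after that many consecutive polls that found no new data.
    """
    with open(path, "r") as handle:
        if from_end:
            handle.seek(0, 2)  # skip existing content, start at end of file
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            line = handle.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                time.sleep(poll_interval)


def follow_errors(path, **options):
    """Stream only the lines whose level marker is [error]."""
    for line in follow(path, **options):
        if "[error]" in line:
            yield line
```

Calling `for line in follow_errors("apache_v2.log"): print(line)` would block and print new error lines as they are written. A production version would also need to handle partially written lines and log rotation, which this sketch ignores.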
What can I do next?¶
As a log analysis system, DeepLog addresses two main problems:
How to find what I want¶
DeepLog provides rich functionality to help users find what they want:
--filter, DeepLog can use a Python DSL expression as a filter to select exactly the records users want.
--name-filter, DeepLog provides a name filter that filters by file name directly. Refer to NameFilter for the pattern definitions.
--meta-filter, DeepLog provides a more powerful file filter that filters log files by file metadata. Refer to DslMetaFilter for the pattern definitions.
How to analyze what I found¶
DeepLog also provides many functions to support data analysis:
--analyze, the most powerful part of DeepLog is its integration with `pandas`_. You can use pandas analysis functions in the analyze option.
--order-by, users can order parsed log items by specific columns.
--distinct, users can remove duplicate log items that have the same values in the specified columns.
--subscribe, in subscribe mode, users can process unbounded log data as a stream.
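The ordering and de-duplication options can be pictured with a small Python sketch over parsed records. It only illustrates the semantics described above and is not DeepLog's code. The `distinct` and `order_by` helpers and the `level` and `message` column names are assumptions.

```python
# Parsed records, assuming each log item has `level` and `message` columns.
RECORDS = [
    {"level": "notice", "message": "jk2_init() Found child 6737 in scoreboard slot 8"},
    {"level": "notice", "message": "workerEnv.init() ok /etc/httpd/conf/workers2.properties"},
    {"level": "notice", "message": "workerEnv.init() ok /etc/httpd/conf/workers2.properties"},
    {"level": "error", "message": "mod_jk child workerEnv in error state 6"},
    {"level": "error", "message": "mod_jk child workerEnv in error state 7"},
    {"level": "error", "message": "mod_jk child workerEnv in error state 7"},
]


def distinct(records, columns):
    """Keep the first record for each unique combination of `columns`."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[column] for column in columns)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def order_by(records, columns):
    """Return records sorted by the given columns, in that priority order."""
    return sorted(records, key=lambda record: tuple(record[column] for column in columns))


unique_records = distinct(RECORDS, ["level", "message"])
ordered_records = order_by(unique_records, ["level", "message"])
for record in ordered_records:
    print(record["level"], record["message"])
```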
One more thing¶
How can log processing be sped up when there are too many logs to handle?
DeepLog supports multiprocessing: the --workers option specifies the number of processes to run in parallel.
$ dl error --target /logs --workers=8
This launches 8 processes that analyze the logs in parallel.