PyLogsParser: how to write a normalizer

We saw in a previous article how to use the PyLogsParser library to analyze connection logs from an SSH server. That was a simple, basic use of the library. In this article, we will go further and see how to extend the parsing power of PyLogsParser by writing a new normalizer definition file.

For this, we will continue exploring authentication issues and focus on a useful application called Fail2ban.

Fail2ban, the unforgiving bouncer

According to the Fail2ban README, “fail2ban scans log files and bans IP addresses that make too many password failures. It updates firewall rules to reject the IP address.” Fail2ban is therefore very useful to stop brute-force attacks over SSH early on. Many similar projects exist, such as DenyHosts and BFD.

As useful as Fail2ban is, pylogsparser cannot parse its log file with the default normalizers shipped with the library. So let’s write a normalizer for Fail2ban !

What’s needed

From now on, most files mentioned in this article and related to pylogsparser’s normalizers can be found either here : /usr/share/normalizers/ if you have installed pylogsparser correctly, or here : [GIT_FOLDER]/pylogsparser/normalizers/ if you have downloaded the project, where GIT_FOLDER is where your git branch is located.

In order to write a normalizer, you will need:

  • a basic knowledge of XML, since it is the definition file format. If you know how to read a DTD file, the library includes a definition file with enough comments to let you figure out what to write in a normalizer definition file. You can also browse standard normalizers and use them as inspiration for your own custom normalizer ( the syslog normalizer is a good starting point ).
  • a minimal knowledge of python, at least how to write functions.
  • a minimal understanding of regular expressions in python.
  • some documentation about the log format you want to analyze, or at least
    enough log samples that you can use to figure out the log format.

You are also strongly advised to get familiar with pylogsparser’s README file, as it defines the best practices when it comes to tag naming.

Since the Fail2ban wiki does not seem to document the application’s log format, we will have to base our normalizer on samples. Luckily for us, the logs are rather straightforward, as we will see in the next section.

Preliminary analysis

Attached to this article is an excerpt from a Fail2ban log file ( /var/log/fail2ban.log ). It is easy to see how the log entries are composed :

  1. First there is a timestamp, up to the millisecond
  2. Then there is the name of the program, a dot and the program’s component logging the event
  3. Then there is the type of the logged event (INFO or WARNING)
  4. And finally the body of the log, the message describing the event logged.

Additionally, we can see that WARNING messages have a specific format:

  • the protected protocol, between brackets
  • the action taken, either “Ban” or “Unban”
  • the source IP of the connection prompting Fail2ban’s action.

Therefore, we can describe Fail2ban’s logs with these two patterns :

  • TIMESTAMP PROGRAM.COMPONENT: INFO BODY
  • TIMESTAMP PROGRAM.COMPONENT: WARNING [PROTOCOL] ACTION SOURCE_IP

Where the “variable” parts are of the following syntactical forms :

  • TIMESTAMP : a formatted timestamp
  • PROGRAM : will always be set to “fail2ban”
  • COMPONENT : a string of lowercase characters
  • BODY : a string of any characters
  • PROTOCOL : a string of lowercase alphanumerical characters
  • ACTION : “Ban” or “Unban”
  • SOURCE_IP : an IP address
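
Before moving to the XML descriptor, these two patterns can be checked against real lines with plain python. The following is only a hand-written sketch : the group names are ours, the IP expression is deliberately simple, and none of it is pylogsparser code.

```python
import re

# Hand-written regular expressions for the two line shapes described above.
# The group names are illustrative; they are not pylogsparser tag names yet.
TIMESTAMP = r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
PREFIX = TIMESTAMP + r" (?P<program>fail2ban)\.(?P<component>[a-z]+)\s*: "

INFO_LINE = re.compile(PREFIX + r"INFO\s+(?P<body>.*)")
WARNING_LINE = re.compile(
    PREFIX + r"WARNING\s+\[(?P<protocol>[a-z0-9]+)\] "
    r"(?P<action>Ban|Unban) (?P<source_ip>\d{1,3}(?:\.\d{1,3}){3})"
)

# The two sample lines used later in this article.
samples = [
    "2011-09-27 05:02:26,908 fail2ban.server : INFO   Changed logging target "
    "to /var/log/fail2ban.log for Fail2ban v0.8.4",
    "2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82",
]

parsed = []
for line in samples:
    # Try the INFO shape first, then the WARNING shape.
    match = INFO_LINE.match(line) or WARNING_LINE.match(line)
    parsed.append(match.groupdict() if match else None)

for fields in parsed:
    print(fields)
```

If both samples come back as dictionaries rather than None, the informal description above is precise enough to be turned into a normalizer.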

Our preliminary analysis is done : we have everything we need to start writing the normalizer descriptor.

Step 1 : the patterns

A pylogsparser normalizer consists, at its core, of a list of patterns describing log lines. For each pattern, the normalizer describes where the metadata is, how to extract it, and how to name it.

Let’s open the template normalizer (normalizer.template) and start writing down our patterns there :

<?xml version="1.0" encoding="UTF-8"?>
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- pylogparser - Logs parsers python library                  -->
<!-- Copyright (C) 2011 Wallix Inc.                             -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- This package is free software; you can redistribute        -->
<!-- it and/or modify it under the terms of the GNU Lesser      -->
<!-- General Public License as published by the Free Software   -->
<!-- Foundation; either version 2.1 of the License, or (at      -->
<!-- your option) any later version.                            -->
<!--                                                            -->
<!-- This package is distributed in the hope that it will be    -->
<!-- useful, but WITHOUT ANY WARRANTY; without even the implied -->
<!-- warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR    -->
<!-- PURPOSE.  See the GNU Lesser General Public License for    -->
<!-- more details.                                              -->
<!--                                                            -->
<!-- You should have received a copy of the GNU Lesser General  -->
<!-- Public License along with this package; if not, write      -->
<!-- to the Free Software Foundation, Inc., 59 Temple Place,    -->
<!-- Suite 330, Boston, MA  02111-1307  USA                     -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!DOCTYPE normalizer SYSTEM "normalizer.dtd">
<normalizer name="Fail2ban"
            version="0.99"
            unicode="yes"
            ignorecase="yes"
            matchtype="match"
            appliedTo="raw">
    <patterns>
        <pattern name="FAIL2BAN-INFO">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: INFO\s+BODY</text>
            <tags>
                <tag name="date" tagType="myCustomTagType">
                    <substitute>TIMESTAMP</substitute>
                </tag>
                <tag name="program" tagType="myCustomTagType">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="myCustomTagType">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="body" tagType="myCustomTagType">
                    <substitute>BODY</substitute>
                </tag>
            </tags>
        </pattern>
        <pattern name="FAIL2BAN-WARNING">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: WARNING\s+\[PROTOCOL\] ACTION SOURCE_IP</text>
            <tags>
                <tag name="date" tagType="myCustomTagType">
                    <substitute>TIMESTAMP</substitute>
                </tag>
                <tag name="program" tagType="myCustomTagType">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="myCustomTagType">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="protocol" tagType="myCustomTagType">
                    <substitute>PROTOCOL</substitute>
                </tag>
                <tag name="action" tagType="myCustomTagType">
                    <substitute>ACTION</substitute>
                </tag>
                <tag name="source_ip" tagType="myCustomTagType">
                    <substitute>SOURCE_IP</substitute>
                </tag>
            </tags>
        </pattern>
    </patterns>
</normalizer>

Let’s step back a bit and comment on what we have just done :

  • We have put the name of our normalizer in the “normalizer” node. It is good practice to give your normalizer description file the same name as the one you declare there.
  • The attributes “unicode” and “ignorecase” are pattern matching options that are rather self-explanatory.
  • The attribute “matchtype” can be set to “search” or “match” : if set to “match”, the defined patterns will have to match at the beginning of a log line for the log line to be processed. If set to “search”, the pattern can be looked for anywhere in the log line.
  • If you remember the previous article, the normalizer engine expects dictionaries as its input. The “appliedTo” attribute tells which key holds the text our patterns are applied to. As the fail2ban logs won’t be encapsulated in any log transport protocol such as syslog, we don’t expect them to be preprocessed before the normalization process. Therefore, the patterns must be applied to the “raw” key (which is the key we use to store the raw log line in the LogBox project).
  • The patterns’ texts are “semi regular expressions” : you can use python’s regular expression syntax as you write them, and you must escape special characters. Using substitution terms makes the patterns more descriptive and much easier to read and maintain than a full regular expression.
  • We have followed the project’s best practices on tag naming : the IP prompting the action is referred to as “source_ip”.
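
The difference between the two “matchtype” values mirrors python’s re.match and re.search. Here is a minimal sketch, assuming the log arrives as a dictionary whose “raw” key holds the line ; the shortened pattern is only there for illustration :

```python
import re

# Input as the normalizer engine would receive it: a dictionary, with the
# unprocessed line stored under the "raw" key (the appliedTo target).
log = {"raw": "2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82"}

# A shortened pattern that deliberately leaves out the timestamp.
pattern = re.compile(r"fail2ban\.\w+\s*: WARNING")

# "match" behaviour: the pattern must start at the first character.
anchored = pattern.match(log["raw"])

# "search" behaviour: the pattern may start anywhere in the line.
floating = pattern.search(log["raw"])

print(anchored)          # None, because the line starts with a timestamp
print(floating.group())  # the part of the line the pattern covers
```

Since our patterns describe the whole line, timestamp included, “match” is the natural choice for this normalizer.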

Now that the normalizer knows what Fail2ban’s log lines look like and what data to extract from them, we need to teach it how to extract the data, syntactically speaking.

Step 2 : what’s in a tag ?

The previous definition file defines the patterns, the metadata position in the patterns, and how to name them. In order to define the syntactic form of the metadata, we will now define tagTypes. It is also possible to use some very common types that are defined in the file common_tagTypes.xml . In this example we will use a bit of both : we will at least have to define the timestamp format.

Here we go :

<?xml version="1.0" encoding="UTF-8"?>
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- pylogparser - Logs parsers python library                  -->
<!-- Copyright (C) 2011 Wallix Inc.                             -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- This package is free software; you can redistribute        -->
<!-- it and/or modify it under the terms of the GNU Lesser      -->
<!-- General Public License as published by the Free Software   -->
<!-- Foundation; either version 2.1 of the License, or (at      -->
<!-- your option) any later version.                            -->
<!--                                                            -->
<!-- This package is distributed in the hope that it will be    -->
<!-- useful, but WITHOUT ANY WARRANTY; without even the implied -->
<!-- warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR    -->
<!-- PURPOSE.  See the GNU Lesser General Public License for    -->
<!-- more details.                                              -->
<!--                                                            -->
<!-- You should have received a copy of the GNU Lesser General  -->
<!-- Public License along with this package; if not, write      -->
<!-- to the Free Software Foundation, Inc., 59 Temple Place,    -->
<!-- Suite 330, Boston, MA  02111-1307  USA                     -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!DOCTYPE normalizer SYSTEM "normalizer.dtd">
<normalizer name="Fail2ban"
            version="0.99"
            unicode="yes"
            ignorecase="yes"
            matchtype="match"
            appliedTo="raw">
    <tagTypes>
        <tagType name="f2bTimeStamp" type="basestring">
            <regexp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}</regexp>
        </tagType>
        <tagType name="f2bProgram" type="basestring">
            <regexp>fail2ban</regexp>
        </tagType>
        <tagType name="SimpleWord" type="basestring">
            <regexp>\w+</regexp>
        </tagType>
        <tagType name="f2bAction" type="basestring">
            <regexp>(?:Ban)|(?:Unban)</regexp>
        </tagType>
    </tagTypes>
    <patterns>
        <pattern name="FAIL2BAN-INFO">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: INFO\s+BODY</text>
            <tags>
                <tag name="date" tagType="f2bTimeStamp">
                    <substitute>TIMESTAMP</substitute>
                </tag>
                <tag name="program" tagType="f2bProgram">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="SimpleWord">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="body" tagType="Anything">
                    <substitute>BODY</substitute>
                </tag>
            </tags>
        </pattern>
        <pattern name="FAIL2BAN-WARNING">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: WARNING\s+\[PROTOCOL\] ACTION SOURCE_IP</text>
            <tags>
                <tag name="date" tagType="f2bTimeStamp">
                    <substitute>TIMESTAMP</substitute>
                </tag>
                <tag name="program" tagType="f2bProgram">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="SimpleWord">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="protocol" tagType="SimpleWord">
                    <substitute>PROTOCOL</substitute>
                </tag>
                <tag name="action" tagType="f2bAction">
                    <substitute>ACTION</substitute>
                </tag>
                <tag name="source_ip" tagType="IP">
                    <substitute>SOURCE_IP</substitute>
                </tag>
            </tags>
        </pattern>
    </patterns>
</normalizer>

Once again, let’s comment on what we’ve just done :

  • “Anything” and “IP” are common tagTypes, the rest are defined in this file.
  • Though always present, the “type” attribute is not used at the moment in the normalization engine. It might be used later as a type validation, post normalization. Therefore leaving the default value (“basestring”) has no impact at all.
  • The more precise the regular expressions you use to define tagTypes, the better. It would be easy to simply use the “Anything” type for everything, but then the normalizer could lose precision and fire a lot of false positives.
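
To get a feeling for what the substitution terms stand for, here is a minimal sketch of how a pattern text and its tagTypes could be combined into one regular expression with named groups. This is not the library’s implementation : the expand function and the simplified “IP” expression are assumptions made for the example.

```python
import re

# tagTypes used by the FAIL2BAN-WARNING pattern. "IP" is a simplified
# stand-in for the common tagType, not the library's own definition.
TAG_TYPES = {
    "f2bTimeStamp": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}",
    "f2bProgram": r"fail2ban",
    "SimpleWord": r"\w+",
    "f2bAction": r"(?:Ban)|(?:Unban)",
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
}

PATTERN_TEXT = (r"TIMESTAMP PROGRAM\.COMPONENT\s*: "
                r"WARNING\s+\[PROTOCOL\] ACTION SOURCE_IP")

# Substitution term -> (tag name, tagType), as declared in the <tags> node.
TAGS = {
    "TIMESTAMP": ("date", "f2bTimeStamp"),
    "PROGRAM": ("program", "f2bProgram"),
    "COMPONENT": ("component", "SimpleWord"),
    "PROTOCOL": ("protocol", "SimpleWord"),
    "ACTION": ("action", "f2bAction"),
    "SOURCE_IP": ("source_ip", "IP"),
}


def expand(pattern_text, tags, tag_types):
    """Replace each substitution term with a named group for its tagType."""
    # Longest terms first, so a short term never shadows a longer one.
    terms = sorted(tags, key=len, reverse=True)
    splitter = re.compile("|".join(re.escape(term) for term in terms))

    def to_group(found):
        name, tag_type = tags[found.group(0)]
        return "(?P<%s>%s)" % (name, tag_types[tag_type])

    return splitter.sub(to_group, pattern_text)


full_regexp = re.compile(expand(PATTERN_TEXT, TAGS, TAG_TYPES), re.IGNORECASE)
line = "2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82"
extracted = full_regexp.match(line).groupdict()
print(extracted)
```

Swapping a precise tagType for a catch-all expression in TAG_TYPES is an easy way to observe how much looser the resulting expression becomes.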

Step 3: post processing

In every normalizer shipped with pylogsparser, metadata expressing a date is always converted from text to a python datetime object, so that dates can be compared and manipulated as dates rather than as plain strings. We will do the same with our timestamp, thanks to a callback function. Callback functions are pieces of python code that expect two arguments :

  • value, which is the text value on which to call the function,
  • log, which is the log and its current metadata in dictionary form.

To learn more about callback functions, see the dedicated paragraph in the project’s README.

Let’s write the callback :

<?xml version="1.0" encoding="UTF-8"?>
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- pylogparser - Logs parsers python library                  -->
<!-- Copyright (C) 2011 Wallix Inc.                             -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- This package is free software; you can redistribute        -->
<!-- it and/or modify it under the terms of the GNU Lesser      -->
<!-- General Public License as published by the Free Software   -->
<!-- Foundation; either version 2.1 of the License, or (at      -->
<!-- your option) any later version.                            -->
<!--                                                            -->
<!-- This package is distributed in the hope that it will be    -->
<!-- useful, but WITHOUT ANY WARRANTY; without even the implied -->
<!-- warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR    -->
<!-- PURPOSE.  See the GNU Lesser General Public License for    -->
<!-- more details.                                              -->
<!--                                                            -->
<!-- You should have received a copy of the GNU Lesser General  -->
<!-- Public License along with this package; if not, write      -->
<!-- to the Free Software Foundation, Inc., 59 Temple Place,    -->
<!-- Suite 330, Boston, MA  02111-1307  USA                     -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!DOCTYPE normalizer SYSTEM "normalizer.dtd">
<normalizer name="Fail2ban"
            version="0.99"
            unicode="yes"
            ignorecase="yes"
            matchtype="match"
            appliedTo="raw">
    <tagTypes>
        <tagType name="f2bTimeStamp" type="basestring">
            <regexp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}</regexp>
        </tagType>
        <tagType name="f2bProgram" type="basestring">
            <regexp>fail2ban</regexp>
        </tagType>
        <tagType name="SimpleWord" type="basestring">
            <regexp>\w+</regexp>
        </tagType>
        <tagType name="f2bAction" type="basestring">
            <regexp>(?:Ban)|(?:Unban)</regexp>
        </tagType>
    </tagTypes>
    <callbacks>
        <callback name="decodeF2bTimeStamp">
timestamp, milliseconds = value.split(',', 1)
newdate = datetime(int(timestamp[:4]),
                   int(timestamp[5:7]),
                   int(timestamp[8:10]),
                   int(timestamp[11:13]),
                   int(timestamp[14:16]),
                   int(timestamp[17:19]))
log["date"] = newdate.replace(microsecond = int(milliseconds) * 1000 )
        </callback>
    </callbacks>
    <patterns>
        <pattern name="FAIL2BAN-INFO">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: INFO\s+BODY</text>
            <tags>
                <tag name="__date" tagType="f2bTimeStamp">
                    <substitute>TIMESTAMP</substitute>
                    <callbacks>
                        <callback>decodeF2bTimeStamp</callback>
                    </callbacks>
                </tag>
                <tag name="program" tagType="f2bProgram">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="SimpleWord">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="body" tagType="Anything">
                    <substitute>BODY</substitute>
                </tag>
            </tags>
        </pattern>
        <pattern name="FAIL2BAN-WARNING">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: WARNING\s+\[PROTOCOL\] ACTION SOURCE_IP</text>
            <tags>
                <tag name="__date" tagType="f2bTimeStamp">
                    <substitute>TIMESTAMP</substitute>
                    <callbacks>
                        <callback>decodeF2bTimeStamp</callback>
                    </callbacks>
                </tag>
                <tag name="program" tagType="f2bProgram">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="SimpleWord">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="protocol" tagType="SimpleWord">
                    <substitute>PROTOCOL</substitute>
                </tag>
                <tag name="action" tagType="f2bAction">
                    <substitute>ACTION</substitute>
                </tag>
                <tag name="source_ip" tagType="IP">
                    <substitute>SOURCE_IP</substitute>
                </tag>
            </tags>
        </pattern>
    </patterns>
</normalizer>

There is one important thing to note here : we changed the name of the “date” tag to “__date”. Any tag whose name starts with __ is considered temporary. Typically, it holds a value that is passed to a callback function, which then assigns the final, processed value to another tag ; the initial value is discarded from the log dictionary afterwards.

Unfortunately we cannot use datetime’s strptime function to parse the timestamp with a format string : this limitation is due to the restricted environment in which callback functions are executed. Instead, we use the datetime constructor.
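
The callback body can be tried outside the normalizer as an ordinary function. In this sketch, the function name and the way temporary tags are dropped are our own stand-ins for what the engine does ; outside the restricted environment we can also check the result against strptime :

```python
from datetime import datetime


def decode_f2b_timestamp(value, log):
    """Same logic as the decodeF2bTimeStamp callback body."""
    timestamp, milliseconds = value.split(",", 1)
    newdate = datetime(int(timestamp[:4]),     # year
                       int(timestamp[5:7]),    # month
                       int(timestamp[8:10]),   # day
                       int(timestamp[11:13]),  # hours
                       int(timestamp[14:16]),  # minutes
                       int(timestamp[17:19]))  # seconds
    # The log gives milliseconds; datetime stores microseconds.
    log["date"] = newdate.replace(microsecond=int(milliseconds) * 1000)


# What the log dictionary could look like right after pattern matching.
log = {"__date": "2011-09-26 15:12:58,388", "program": "fail2ban"}
decode_f2b_timestamp(log["__date"], log)

# Stand-in for the engine's cleanup: drop every temporary "__" tag.
log = {key: val for key, val in log.items() if not key.startswith("__")}

# With an unrestricted interpreter, strptime yields the same object.
reference = datetime.strptime("2011-09-26 15:12:58,388", "%Y-%m-%d %H:%M:%S,%f")
print(log["date"], log["date"] == reference)
```

The slicing approach relies on the fixed width of every field, which the f2bTimeStamp regular expression guarantees.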

Step 4 : Validation

The normalizer definition file we wrote so far should be usable as it is. But we need to make sure that the file is valid, that it normalizes Fail2ban logs the way we expect, and that it doesn’t interfere with existing log normalizers.

You can of course test your description file manually, but the pylogsparser library comes bundled with some testing tools that can automate the process. We will see now how to use these tools.

But first we need to add some real log samples as examples to our definition file: these examples are used for documentation and validation purposes.

Let’s add some examples now :

<?xml version="1.0" encoding="UTF-8"?>
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- pylogparser - Logs parsers python library                  -->
<!-- Copyright (C) 2011 Wallix Inc.                             -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!--                                                            -->
<!-- This package is free software; you can redistribute        -->
<!-- it and/or modify it under the terms of the GNU Lesser      -->
<!-- General Public License as published by the Free Software   -->
<!-- Foundation; either version 2.1 of the License, or (at      -->
<!-- your option) any later version.                            -->
<!--                                                            -->
<!-- This package is distributed in the hope that it will be    -->
<!-- useful, but WITHOUT ANY WARRANTY; without even the implied -->
<!-- warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR    -->
<!-- PURPOSE.  See the GNU Lesser General Public License for    -->
<!-- more details.                                              -->
<!--                                                            -->
<!-- You should have received a copy of the GNU Lesser General  -->
<!-- Public License along with this package; if not, write      -->
<!-- to the Free Software Foundation, Inc., 59 Temple Place,    -->
<!-- Suite 330, Boston, MA  02111-1307  USA                     -->
<!--                                                            -->
<!--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-->
<!DOCTYPE normalizer SYSTEM "normalizer.dtd">
<normalizer name="Fail2ban"
            version="0.99"
            unicode="yes"
            ignorecase="yes"
            matchtype="match"
            appliedTo="raw">
    <tagTypes>
        <tagType name="f2bTimeStamp" type="basestring">
            <regexp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}</regexp>
        </tagType>
        <tagType name="f2bProgram" type="basestring">
            <regexp>fail2ban</regexp>
        </tagType>
        <tagType name="SimpleWord" type="basestring">
            <regexp>\w+</regexp>
        </tagType>
        <tagType name="f2bAction" type="basestring">
            <regexp>(?:Ban)|(?:Unban)</regexp>
        </tagType>
    </tagTypes>
    <callbacks>
        <callback name="decodeF2bTimeStamp">
timestamp, milliseconds = value.split(',', 1)
newdate = datetime(int(timestamp[:4]),
                   int(timestamp[5:7]),
                   int(timestamp[8:10]),
                   int(timestamp[11:13]),
                   int(timestamp[14:16]),
                   int(timestamp[17:19]))
log["date"] = newdate.replace(microsecond = int(milliseconds) * 1000 )
        </callback>
    </callbacks>
    <patterns>
        <pattern name="FAIL2BAN-INFO">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: INFO\s+BODY</text>
            <tags>
                <tag name="__date" tagType="f2bTimeStamp">
                    <substitute>TIMESTAMP</substitute>
                    <callbacks>
                        <callback>decodeF2bTimeStamp</callback>
                    </callbacks>
                </tag>
                <tag name="program" tagType="f2bProgram">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="SimpleWord">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="body" tagType="Anything">
                    <substitute>BODY</substitute>
                </tag>
            </tags>
            <examples>
                <example>
                     <text>2011-09-27 05:02:26,908 fail2ban.server : INFO   Changed logging target to /var/log/fail2ban.log for Fail2ban v0.8.4</text>
                     <expectedTags>
                          <expectedTag name="program">fail2ban</expectedTag>
                          <expectedTag name="component">server</expectedTag>
                          <expectedTag name="body">Changed logging target to /var/log/fail2ban.log for Fail2ban v0.8.4</expectedTag>
                     </expectedTags>
                </example>
            </examples>
        </pattern>
        <pattern name="FAIL2BAN-WARNING">
            <text>TIMESTAMP PROGRAM\.COMPONENT\s*: WARNING\s+\[PROTOCOL\] ACTION SOURCE_IP</text>
            <tags>
                <tag name="__date" tagType="f2bTimeStamp">
                    <substitute>TIMESTAMP</substitute>
                    <callbacks>
                        <callback>decodeF2bTimeStamp</callback>
                    </callbacks>
                </tag>
                <tag name="program" tagType="f2bProgram">
                    <substitute>PROGRAM</substitute>
                </tag>
                <tag name="component" tagType="SimpleWord">
                    <substitute>COMPONENT</substitute>
                </tag>
                <tag name="protocol" tagType="SimpleWord">
                    <substitute>PROTOCOL</substitute>
                </tag>
                <tag name="action" tagType="f2bAction">
                    <substitute>ACTION</substitute>
                </tag>
                <tag name="source_ip" tagType="IP">
                    <substitute>SOURCE_IP</substitute>
                </tag>
            </tags>
            <examples>
                <example>
                     <text>2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82</text>
                     <expectedTags>
                          <expectedTag name="program">fail2ban</expectedTag>
                          <expectedTag name="component">actions</expectedTag>
                          <expectedTag name="protocol">ssh</expectedTag>
                          <expectedTag name="action">Ban</expectedTag>
                          <expectedTag name="source_ip">213.65.93.82</expectedTag>
                     </expectedTags>
                </example>
            </examples>
        </pattern>
    </patterns>
</normalizer>

The expected tags give the automatic validator a way to verify that the normalization occurred without problem.
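
As a rough idea of what such a verification involves, the sketch below reads one example node with python’s standard xml.etree.ElementTree module and compares the expected tags with extracted values. The regular expression stands in for the normalizer, and the comparison logic is an assumption, not the validator’s actual code :

```python
import re
import xml.etree.ElementTree as ET

# One <example> node, copied from the FAIL2BAN-WARNING pattern.
EXAMPLE_XML = """
<example>
    <text>2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82</text>
    <expectedTags>
        <expectedTag name="program">fail2ban</expectedTag>
        <expectedTag name="component">actions</expectedTag>
        <expectedTag name="protocol">ssh</expectedTag>
        <expectedTag name="action">Ban</expectedTag>
        <expectedTag name="source_ip">213.65.93.82</expectedTag>
    </expectedTags>
</example>
"""

# Stand-in for the normalizer: a hand-written equivalent of the pattern.
WARNING_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<program>fail2ban)\.(?P<component>\w+)\s*: WARNING\s+"
    r"\[(?P<protocol>\w+)\] (?P<action>Ban|Unban) "
    r"(?P<source_ip>(?:\d{1,3}\.){3}\d{1,3})"
)

example = ET.fromstring(EXAMPLE_XML.strip())
raw_line = example.findtext("text")
expected = {tag.get("name"): tag.text for tag in example.iter("expectedTag")}

match = WARNING_LINE.match(raw_line)
produced = match.groupdict() if match else {}

# Keep only the tags whose extracted value differs from the expected one.
mismatches = {name: (value, produced.get(name))
              for name, value in expected.items()
              if produced.get(name) != value}
print(mismatches or "OK")
```

Note that the date is not listed among the expected tags of our examples, so only the textual tags are compared here.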

Now let’s add our normalizer to the testing suite. Provided you are working on a git branch of the project, open the file pylogsparser/tests/test_normalizer.py .

All you have to do here is to add the following method to the TestSample class:

    def test_normalize_samples_XXX_fail2ban(self):
        self.normalize_samples('Fail2ban.xml', 'Fail2ban', 0.99)

The nose test suite executes test methods in the alphabetical order of their names, so XXX should be replaced with the next available number in the code.

Save the file and, from the pylogsparser folder, execute :

$ NORMALIZERS_PATH=normalizers/ python tests/test_normalizer.py

If all went well (and it did !) the script should return “OK”.

To test more samples, you’ll have to modify the file pylogsparser/tests/test_log_samples.py . Once again this is rather straightforward : add a method called test_Fail2ban to the class Test, in which you can call the “aS” method with as many log samples as you want to test.

“aS” takes 2 arguments :

  1. the log line to test
  2. a dictionary with keys and associated values to validate.
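
A test_Fail2ban method could look like the sketch below. To keep it runnable on its own, the Test class and its “aS” helper are simplified stand-ins built on regular expressions ; in the real test file, only the test_Fail2ban method would be added and the existing “aS” would take care of the normalization :

```python
import re
import unittest

# Stand-in for the normalization step, so that the sketch runs on its own.
_PREFIX = (r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
           r"(?P<program>fail2ban)\.(?P<component>\w+)\s*: ")
_PATTERNS = [
    re.compile(_PREFIX + r"WARNING\s+\[(?P<protocol>\w+)\] "
               r"(?P<action>Ban|Unban) (?P<source_ip>(?:\d{1,3}\.){3}\d{1,3})"),
    re.compile(_PREFIX + r"INFO\s+(?P<body>.*)"),
]


def normalize(raw):
    """Return the tags of the first pattern matching the raw line."""
    for pattern in _PATTERNS:
        match = pattern.match(raw)
        if match:
            return match.groupdict()
    return {}


class Test(unittest.TestCase):

    def aS(self, log, subset):
        """Hypothetical stand-in: every expected key/value must be extracted."""
        data = normalize(log)
        for key, value in subset.items():
            self.assertEqual(data.get(key), value)

    def test_Fail2ban(self):
        self.aS("2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82",
                {"program": "fail2ban",
                 "component": "actions",
                 "protocol": "ssh",
                 "action": "Ban",
                 "source_ip": "213.65.93.82"})
        self.aS("2011-09-27 05:02:26,908 fail2ban.server : INFO   Changed logging "
                "target to /var/log/fail2ban.log for Fail2ban v0.8.4",
                {"program": "fail2ban",
                 "component": "server",
                 "body": "Changed logging target to /var/log/fail2ban.log "
                         "for Fail2ban v0.8.4"})
```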

Save the file and, from the pylogsparser folder, execute :

$ NORMALIZERS_PATH=normalizers/ python tests/test_log_samples.py

Once again, make sure everything is okay.

Step 5 : Documenting the format

We could stop here, but since we are nice we are going to offer this normalizer back to the community and publish it in the pylogsparser project. Since other users might not be familiar with the application we wrote this normalizer for, we will add some useful documentation in the description file.

The description should mention at least the application, its version if relevant, some documentation on the patterns and most importantly, if possible, a description of the meaning of the tags.

You can see the final result in the repository, with two languages supported.

Conclusion

In this article, we learned how to write a normalizer description file for pylogsparser. It is much easier than it seems, right ?

Just for fun, let’s have a look at the python regular expression that we would need to parse the two fail2ban patterns :

regexp = (?:(?P<tag0>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<tag2>fail2ban)\.(?P<tag3>\w+)\s*: INFO\s+(?P<tag1>.*))|(?:(?P<tag4>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<tag8>fail2ban)\.(?P<tag7>\w+)\s*: WARNING\s+\[(?P<tag5>\w+)\] (?P<tag9>(?:Ban)|(?:Unban)) (?P<tag6>(?<![.0-9])(?:\d{1,3}\.){3}\d{1,3}(?![.0-9])))
 
names = {'tag4': '__date',
         'tag5': 'protocol',
         'tag6': 'source_ip',
         'tag7': 'component',
         'tag0': '__date',
         'tag1': 'body',
         'tag2': 'program',
         'tag3': 'component',
         'tag8': 'program',
         'tag9': 'action'}

As you might know if you are familiar with named groups, a group’s name needs to be unique within a regular expression, which is why we use the “tagXXX” naming scheme : the two patterns share tags. The regular expression is clearly much harder to read, let alone maintain, than the patterns in pylogsparser’s normalizers. Factoring out the common prefix of the two patterns would not help much, and there are much more complex log patterns than fail2ban’s. So what do you prefer : working your way through regular expression hell, or maintaining easily understandable and descriptive patterns ?
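
To see the indirection at work, here is the same expression in a runnable form, applied to a sample line. The only liberties taken are splitting the expression over several string literals and writing the dots of the IP part as escaped dots ; the names dictionary then maps the generated group names back to tag names :

```python
import re

# The combined expression, split over several raw string literals.
regexp = re.compile(
    r"(?:(?P<tag0>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<tag2>fail2ban)\.(?P<tag3>\w+)\s*: INFO\s+(?P<tag1>.*))"
    r"|(?:(?P<tag4>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<tag8>fail2ban)\.(?P<tag7>\w+)\s*: WARNING\s+\[(?P<tag5>\w+)\] "
    r"(?P<tag9>(?:Ban)|(?:Unban)) "
    r"(?P<tag6>(?<![.0-9])(?:\d{1,3}\.){3}\d{1,3}(?![.0-9])))"
)

names = {'tag4': '__date',
         'tag5': 'protocol',
         'tag6': 'source_ip',
         'tag7': 'component',
         'tag0': '__date',
         'tag1': 'body',
         'tag2': 'program',
         'tag3': 'component',
         'tag8': 'program',
         'tag9': 'action'}

line = "2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82"
match = regexp.match(line)

# Groups of the alternative that did not match are None: skip them, then
# translate the remaining generated group names into real tag names.
tags = {names[group]: value
        for group, value in match.groupdict().items()
        if value is not None}
print(tags)
```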

There are of course many ways to write a normalizer for an application, and this normalizer is far from perfect : for example, the FAIL2BAN-WARNING pattern should define a body similar to the one defined in the FAIL2BAN-INFO pattern. Maybe one of you can fix that and contribute to the project ? Every normalizer is welcome !

Article contributed by Matthieu Huin, R&D engineer in Wallix LogBox development team.
