Pylogsparser : visualizing ssh attacks in video

In this article we will show another possible application for the pylogsparser library. We will also discover a simple way to draw and use world maps with Python. You should read the previous article in this series if you haven’t done so, since we will use what we did there as a starting point.

Here at Wallix, we have set up an SSH honeypot for testing and analysis purposes. It always amazes me how often this machine gets randomly attacked on this service, and how the brute-force attacks started mere minutes after the SSH server was up. In our previous article, we gained insight into the origins of attacks and the targeted accounts with a classic pie chart. This time, I would like a visual way to represent and understand a typical day of brute-force attempts. We could picture that as a world map where a country lights up when an attacker from this country tries to gain access to the honeypot. A world map will be drawn for every minute of the day, and the resulting pictures will then be assembled into a timelapse animation.

Before we get to work, here are the elements we need :

  • obviously, an SSH log file ! It should span a full day for accuracy.
  • the pylogsparser library along with the GeoIP library. Since pylogsparser 0.3, the geoIP conversion has been included in the library so the countries are already tagged when available, if the GeoIP library is installed !
  • the matplotlib library, and more specifically the Basemap optional extension that can be found here : http://matplotlib.sourceforge.net/basemap/doc/html/ Follow the installation instructions there, as unfortunately this extension is not always packaged for easy deployment.
  • the numpy library, which is optional as we will use it only with matplotlib’s color maps. It should be installed alongside matplotlib anyway.
  • a shapefile describing country borders (more on that later). For this article, I am using the one freely available at http://thematicmapping.org/downloads/world_borders.php. It is probably neither the most accurate nor the most up-to-date dataset, but it is more than enough for our project.
  • python libraries for manipulating shapefile datasets. In this article, we will use pyshapelib : http://ftp.intevation.de/users/bh/ but there are many other libraries available, as pyshapelib hasn’t been maintained in a little while. See this article’s comments for details : http://www.geophysique.be/2011/01/27/matplotlib-basemap-tutorial-07-shapefiles-unleached/ (incidentally, this article was the inspiration for this work)
  • ffmpeg, or anything that can make a timelapse animation out of still pictures.

Now that we’ve got everything, let’s get to work !

First, we will parse our log file and keep only the logs where the action tag is set to “fail”. We will then extract the time of day as hours and minutes (“1345” for example) and use it as a key in a dictionary. Each key is associated with another dictionary, whose keys are the countries the attacks at that time come from, and whose values are the number of attacks from that country during that minute. This will look a lot like what we’ve done in the previous article :

from logsparser.lognormalizer import LogNormalizer as LN

normalizer = LN('/usr/share/normalizers')

# dataset maps "HHMM" -> {country: number of failed attempts}
dataset = {}
with open('/var/log/auth.log', 'r') as auth_logs:
    for log in auth_logs:
        l = {'raw' : log.rstrip('\n') } # remove the ending \n
        normalizer.normalize(l)
        if l.get('action') == 'fail':
            key = '%02d%02d' % (l['date'].hour, l['date'].minute)
            # add the key if not already present
            counts = dataset.setdefault(key, {})
            # add the country if not already present. If geoIP failed, replace by Unknown
            country = l.get('source_country', 'Unknown')
            counts[country] = counts.get(country, 0) + 1
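
To make the shape of this dictionary concrete, here is a minimal sketch that builds the same nested structure from a few hand-written records instead of a real log file. The records list and the aggregate helper are made up for illustration; only the action, date and source_country tags come from the code above. It assumes nothing beyond the standard library, so it runs without pylogsparser installed.

```python
from datetime import datetime

# Hypothetical, already-normalized log entries (not real data).
records = [
    {'action': 'fail', 'date': datetime(2012, 1, 1, 13, 45, 2), 'source_country': 'CN'},
    {'action': 'fail', 'date': datetime(2012, 1, 1, 13, 45, 40), 'source_country': 'CN'},
    {'action': 'fail', 'date': datetime(2012, 1, 1, 13, 45, 41)},  # GeoIP lookup failed
    {'action': 'accept', 'date': datetime(2012, 1, 1, 13, 46, 0), 'source_country': 'FR'},
    {'action': 'fail', 'date': datetime(2012, 1, 1, 9, 5, 0), 'source_country': 'US'},
]

def aggregate(entries):
    """Count failed attempts per minute ("HHMM") and per country."""
    result = {}
    for entry in entries:
        if entry.get('action') != 'fail':
            continue  # only failed attempts are of interest
        key = entry['date'].strftime('%H%M')
        country = entry.get('source_country', 'Unknown')
        per_minute = result.setdefault(key, {})
        per_minute[country] = per_minute.get(country, 0) + 1
    return result

sample_dataset = aggregate(records)
```

With these records, the entry for “1345” holds two attempts from CN and one from Unknown, the entry for “0905” holds one attempt from US, and the accepted login is ignored.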

Next, we need to draw a world map. Basemap makes this very straightforward. We will go for a classic “Mercator” representation, but there are lots of other projections available.

The code below is self-explanatory for the most part. The initialization options for the Basemap object are the map projection, the coordinates of the lower left and upper right corners of the area to draw, the latitude where the projection is the most accurate (the latitude of true scale, which is specific to Mercator), and the drawing resolution.

 
from mpl_toolkits.basemap import Basemap
 
def makemap():
    m = Basemap(projection="merc",
                llcrnrlat=-70,
                urcrnrlat=78,
                llcrnrlon=-180,
                urcrnrlon=180,
                lat_ts=20,
                resolution='c')
    m.drawcoastlines(color="white")
    m.drawmapboundary(fill_color="black")
    m.drawcountries(linewidth = 0.3, color = "gray")
    return m

The Basemap object has an interesting property : when called as a function and passed lists of longitudes and latitudes as arguments, it converts them into x and y coordinates in the map projection, ready to be plotted on the current subplot. We will use this in the next code snippet.
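
To give an idea of what such a conversion involves, here is a rough sketch of a spherical Mercator projection in plain Python. It is not Basemap's code and is not guaranteed to reproduce its numbers: the Earth radius, the helper names and the choice of the lower left corner as origin are assumptions made for this illustration, with the corner and lat_ts values taken from makemap above.

```python
import math

EARTH_RADIUS_M = 6370997.0  # assumed spherical Earth radius, in metres

def mercator(lon_deg, lat_deg, lat_ts_deg=20.0):
    """Project one lon/lat pair (degrees) to planar metres on a sphere."""
    # Scaling by cos(lat_ts) makes distances true along that latitude.
    scale = EARTH_RADIUS_M * math.cos(math.radians(lat_ts_deg))
    x = scale * math.radians(lon_deg)
    y = scale * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def to_plot_coords(lons, lats, llcrnrlon=-180.0, llcrnrlat=-70.0):
    """Project lists of lon/lat and shift them so the lower left corner is (0, 0)."""
    x0, y0 = mercator(llcrnrlon, llcrnrlat)
    projected = [mercator(lon, lat) for lon, lat in zip(lons, lats)]
    xs = [x - x0 for x, _ in projected]
    ys = [y - y0 for _, y in projected]
    return xs, ys

# The lower left corner itself, then the point at longitude 0, latitude 0.
xs, ys = to_plot_coords([-180.0, 0.0], [-70.0, 0.0])
```

The point to take away is that the result is expressed in map units measured from the map's corner, not in degrees, which is why the country outlines must be converted before they can be drawn.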

Now all we need is a way to draw a specific country on our map. Unfortunately, while Basemap allows you to draw every boundary on the map, it cannot be used to draw a specific country. This is why we need a shapefile defining world borders; the shapefile contains lists of vertex coordinates, each list making a polygon covering a specific country. The accompanying description file in the dataset stores the ISO 3166-1 Alpha-2 country code for each polygon, which is the country code used by the GeoIP library, so we are in luck. Let’s code it !

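from shapelib import ShapeFile
import dbflib
# numpy and the color maps are used when filling the countries
import numpy as np
from matplotlib import cm
from matplotlib.collections import LineCollection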
 
class CountryDrawer:
    def __init__(self,
                 shpfile = "worldmap/TM_WORLD_BORDERS-0.3.shp",
                 dbffile = "worldmap/TM_WORLD_BORDERS-0.3.dbf"):
        shp = ShapeFile(shpfile)
        dbf = dbflib.open(dbffile)
        self.countries = {}
        for i in range(shp.info()[0]):
            # we already know where to find the info we need, otherwise some
            # introspection would have been needed.
            c = dbf.read_record(i)['ISO2']
            poly = shp.read_object(i)
            self.countries[c] = poly.vertices()
 
    def drawcountry(self,
                    ax,
                    base_map,
                    iso2,
                    color,
                    alpha = 1):
        if iso2 not in self.countries:
            raise ValueError("Where is that country ?")
        # each country is a list of parts, each part a list of (lon, lat) pairs
        parts = self.countries[iso2]
        shape = []
        for part in parts:
            longs, lats = zip(*part)
            # conversion to plot coordinates
            x,y = base_map(longs, lats)
            shape.append(list(zip(x,y)))
        lines = LineCollection(shape,antialiaseds=(1,))
        lines.set_facecolors(cm.hot(np.array([color,])))
        lines.set_edgecolors('white')
        lines.set_linewidth(0.5)
        lines.set_alpha(alpha)
        ax.add_collection(lines)
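
The drawcountry method relies on zip to split each list of (longitude, latitude) pairs into separate sequences and to pair them up again after conversion. The following sketch shows that round trip on a made-up square; the coordinates are purely illustrative and no shapefile is needed to run it.

```python
# A made-up square "country" with a single part; real shapes have many more points.
part = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0), (0.0, 0.0)]

# zip(*part) transposes the list of pairs into one tuple per coordinate.
longs, lats = zip(*part)

# Pairing them up again gives back the original points. list() is needed on
# Python 3, where zip returns an iterator rather than a list.
rebuilt = list(zip(longs, lats))
```

In the class above, the call to the Basemap object sits between these two steps, so the pairs that are put back together are plot coordinates rather than degrees.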

Let’s put everything together and generate a picture for each minute in a day (1440 pictures in total). The countries will be colored according to the proportion they represent of the ongoing attacks at a given timestamp. There is also a light fading effect: when no attack is recorded for a minute, the last picture is redrawn with its opacity multiplied by 0.7. Finally, there will be a timestamp in the lower left corner.

 
import numpy as np
import matplotlib.pyplot as plt
# color palette
from matplotlib import cm
 
# insert previous code snippets here ...
 
if __name__ == "__main__":
    cd = CountryDrawer()
    currentkey = "0000"
    alpha = 1
    i = 0
    for hour in range(0,24):
        for minute in range(60):
            key = str(hour).rjust(2,'0') + str(minute).rjust(2,'0')
            fig = plt.figure(figsize=(6.2,3.6))
            plt.subplots_adjust(left=0,right=1,top=1,bottom=0)
            ax = plt.subplot(111)
            m = makemap()
            if key in dataset:
                currentkey = key
                alpha = 1
            else:
                alpha *= 0.7
            if currentkey in dataset:
                data = dataset[currentkey]
                total_attacks = float(sum(data.values()))
                for c in data:
                    if c != 'Unknown':
                        cd.drawcountry(ax, m, c, 0.6*data[c]/total_attacks, alpha )
            plt.text(50,50,str(hour).rjust(2,'0') +":"+ str(minute).rjust(2,'0'), color = 'white', size=20)
            plt.savefig('rendering/plot%s.png' % str(i).rjust(5,'0'), dpi=200)
            # release the figure, otherwise memory usage grows with every frame
            plt.close(fig)
            i += 1
        print("Hour %i is done !" % hour)
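
The coloring and fading logic can be easier to follow once it is separated from the drawing calls. The sketch below reproduces only that logic as a generator; frame_styles and its parameters are hypothetical names introduced here, and the sample counts are made up. It assumes the same rules as the loop above: opacity is reset to 1 when a minute has attacks and multiplied by 0.7 otherwise, and each country gets 0.6 times its share of the attacks as a color value.

```python
def frame_styles(dataset, keys, fade=0.7, max_color=0.6):
    """Yield (key, alpha, {country: color value}) for each frame key, in order."""
    current = None
    alpha = 1.0
    for key in keys:
        if key in dataset:
            current, alpha = key, 1.0   # new attacks: back to full opacity
        else:
            alpha *= fade               # quiet minute: fade the last picture
        colors = {}
        if current is not None:
            data = dataset[current]
            total = float(sum(data.values()))
            for country, count in data.items():
                if country != 'Unknown':
                    colors[country] = max_color * count / total
        yield key, alpha, colors

# Made-up counts for illustration only.
sample = {'0001': {'CN': 3, 'US': 1}}
styles = list(frame_styles(sample, ['0000', '0001', '0002', '0003']))
```

In this sample, nothing is drawn at 00:00, the two countries appear at full opacity at 00:01, and the same picture fades over the next two minutes.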

Here’s a resulting sample:

[Image: sample frame of the rendered world map]

If you run this code yourself, be aware that you might need a lot of processing power and memory. It is probably best to split the rendering process in two.
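
One possible way to split the work, sketched here under the assumption that each half of the day is rendered in a separate run, is to generate the frame numbers and keys for a given range of hours. The frame_keys helper is a hypothetical name, not part of the script above.

```python
def frame_keys(start_hour, end_hour):
    """Yield (frame index, "HHMM" key) for every minute in [start_hour, end_hour)."""
    for hour in range(start_hour, end_hour):
        for minute in range(60):
            # the index is the minute of the day, so file names stay consistent
            yield hour * 60 + minute, '%02d%02d' % (hour, minute)

first_half = list(frame_keys(0, 12))    # frames 0 to 719, for a first run
second_half = list(frame_keys(12, 24))  # frames 720 to 1439, for a second run
```

Because the index is derived from the time rather than from a running counter, both runs write into the same numbering scheme. Note that this sketch does not carry the fading state over from the first run to the second, so the first frames after noon may differ slightly from a single-run rendering.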

Once we have all our pictures, let’s make the timelapse animation. At 15 frames per second, the animation will still be smooth and will last about 96 seconds (1440 frames divided by 15).

  ffmpeg -r 15 -b 3000 -i rendering/plot%05d.png sshd.mp4

And now let’s watch the world light up as our server gets tirelessly assaulted !

Article contributed by Matthieu Huin, R&D engineer in Wallix LogBox development team.

13 comments on “Pylogsparser : visualizing ssh attacks in video”

  1. Bjoern said:

    Interesting idea thanks for the blog post!

    In the current version though you lose the notion of who is attacking you in the long term. Why not try to have a fade time for each color in the plot, and if you get hit it gets brighter again..?

    Cheers,
    Bjoern

    • Matthieu Huin said:

      Hey Bjoern,

      This is a great idea ! Due to time constraints I had to keep this whole thing simple, it is mostly a proof of concept to show what can be done with the pylogsparser library.

      Therefore, your suggestion is left as an exercise to the reader. :)

      There are many other ways to experiment as well; surprise us !

  2. Kura said:

    Great article!

    I’ve been playing with this and made a few changes, namely I for loop over seconds as well as minutes but I noticed a huge problem with this script – memory usage.

    I’ve been parsing 31 days of auth logs for one of my servers (so I can try parsing 250+ servers together) and noticed memory usage was very high.

    A simple change can help this dramatically, after the plt.savefig() you close the fig

    plt.savefig('rendering/plot%s.png' % str(i).rjust(5,'0'), dpi=200)
    import pylab; pylab.close(fig)
    • Kura said:

      I also made some other modifications, like iterating over sorted dict keys, instead of multiple loops. I am still able to access the time by using the key and splitting it in to 3 sets of 2.

      https://gist.github.com/1955460

    • Matthieu Huin said:

      Great job ! I noticed the high memory usage as well, and had a feeling it was linked to the image generation, but didn’t give it much thought since I managed to get the images I needed in reasonable time in the end.

      Keep us informed if you ever publish the videos you generate for your servers, it would be interesting to see how the attacking patterns differ from what we observed on our honeypot.
