I recently ran into a fairly simple requirement, so I'm writing it up quickly after work today.

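The requirement: each log entry contains three fields (ip, url, time), and we need to find the malicious IPs that access more than max different URLs within x seconds.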

0x01 Approach

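The essence of the problem is finding the malicious IPs; the x seconds is only a filter condition. So the problem reduces to building a dict of the form {ip: [url1, url2]}, and taking len() of each url list tells us which IPs are malicious.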
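The reduction above can be sketched in a few lines. This is only an illustration, not the full script: the `records` list and the `max_count` value are made-up sample data, and a set is used instead of a list so that duplicate URLs collapse automatically.

```python
from collections import defaultdict

# Hypothetical sample records of (ip, url); a real run would parse these from the log.
records = [
    ("10.0.0.1", "/a"),
    ("10.0.0.1", "/b"),
    ("10.0.0.1", "/a"),   # duplicate url, counted once
    ("10.0.0.2", "/a"),
]
max_count = 1             # assumed threshold for this example

# Group the distinct urls by ip
urls_by_ip = defaultdict(set)
for ip, url in records:
    urls_by_ip[ip].add(url)

# An ip is flagged when it touched more than max_count distinct urls
malicious = sorted(ip for ip, urls in urls_by_ip.items() if len(urls) > max_count)
print(malicious)
```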

0x02 show me the code

Here I use the nginx log as an example and give a simple implementation. The default nginx log format looks like this:

67.218.129.173 - - [15/Jul/2019:16:41:53 +0000] "GET /atom.xml HTTP/1.1" 304 0 "-" "Tiny Tiny RSS/19.02 (http://tt-rss.org/)"
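As an alternative to chained `split()` calls, a line in this format could be parsed with a regular expression. The sketch below is an assumption-laden example rather than the script's own parser: `LOG_PATTERN` and `parse_line` are hypothetical names, the pattern only covers the fields needed here, and `%b` assumes English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

# Assumed pattern: ip, two skipped fields, [time], "METHOD url PROTOCOL"
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*"')


def parse_line(line):
    """Return (ip, url, timestamp) for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    ip, raw_time, url = match.groups()
    # %z keeps the +0000 offset, so the result does not depend on the host's time zone
    moment = datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S %z")
    return ip, url, moment.timestamp()


sample = ('67.218.129.173 - - [15/Jul/2019:16:41:53 +0000] '
          '"GET /atom.xml HTTP/1.1" 304 0 "-" "Tiny Tiny RSS/19.02 (http://tt-rss.org/)"')
print(parse_line(sample))
```

Returning None for lines that do not match lets the caller skip malformed entries instead of crashing on them.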

Python implementation

#!/usr/bin/env python
# coding=utf-8
# date: 20190715 night, about 1.5 hours
# author: thinkycx
# description:
#       try to get malicious ip from nginx log.
#       malicious ip: the number of different urls accessed is larger than MAX within x seconds
# usage:
#       python find_malicious_ip.py <path>/access.log
import time
import json
import sys


def parse_nginx_log(file_path):
    """
    :param file_path: nginx log file path
    :return: list of [url, ip, time]
    """
    result = list()
    try:
        with open(file_path, 'r') as f:

            for line in f:                                          # iterate over the file line by line
                try:
                    ttime = line.split("[")[1].split(" ")[0]        # 15/Jul/2019:02:47:40
                    ip = line.split(" ")[0]                         # 47.254.91.114
                    url = line.split("] \"")[1].split(" ")[1]       # /posts/2018-08-08-CVE-2017-8890-analysis.html
                except IndexError:                                  # skip lines that do not match the expected format
                    continue

                result.append([url, ip, ttime])
    except IOError as err:
        print("[*] Failed to open file.\n " + str(err))

    return result



def find_malicious_ip(input_list, within_time, max_count):
    """
    :param input_list: list of [url, ip, time]
    :param within_time: only requests from the last within_time seconds (counted back from now) are kept
    :param max_count:   the max number of different urls
    :return:   JSON string of { ip1: [[url1,time1], [url2,time2]], ip2: ...}
    """

    tmp_dict = dict()
    now_time = time.time()
    for i in input_list:
        url = i[0]
        ip = i[1]
        ttime = i[2]
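        # note: mktime() reads the struct as local time, while the log's +0000 offset was dropped during parsing
        timestamp = time.mktime(time.strptime(ttime, '%d/%b/%Y:%H:%M:%S'))  # 1563130060.0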

        if now_time - timestamp > within_time:                              # time check
            continue

        url_ttime_list = tmp_dict.setdefault(ip, [])                        # one list of [url, time] per ip
        if all(url != seen_url for seen_url, _ in url_ttime_list):          # keep only the first hit of each url
            url_ttime_list.append([url, ttime])

    result_dict = dict()                                                    # check the number of urls
    for ip in tmp_dict:
        if len(tmp_dict[ip]) > max_count:
            result_dict[ip] = tmp_dict[ip]

    return json.dumps(result_dict)


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("python find_malicious_ip.py <path>/access.log")
        sys.exit(1)                                                         # stop here, sys.argv[1] does not exist

    file_path = sys.argv[1]
    input_list = parse_nginx_log(file_path)

    within_time = 60*60*24*30
    max_count = 30
    result_dict = find_malicious_ip(input_list, within_time, max_count)
    print(result_dict)
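
The script above measures the x seconds back from the moment it runs. If the requirement is instead read as "any window of x seconds inside the log", a sliding window is one option. The sketch below is an alternative under that assumption, not the method used above; the name `find_burst_ips` and the `(ip, url, timestamp)` tuples with numeric timestamps are hypothetical.

```python
from collections import defaultdict, deque


def find_burst_ips(records, within_time, max_count):
    """Return the set of ips that hit more than max_count distinct urls
    inside some window of within_time seconds.

    records: iterable of (ip, url, timestamp) with numeric timestamps.
    """
    windows = defaultdict(deque)                          # ip -> (timestamp, url) pairs in the current window
    url_counts = defaultdict(lambda: defaultdict(int))    # ip -> url -> hits in the current window
    flagged = set()

    for ip, url, ts in sorted(records, key=lambda r: r[2]):
        window, counts = windows[ip], url_counts[ip]
        window.append((ts, url))
        counts[url] += 1
        # drop requests that fell out of the window ending at ts
        while ts - window[0][0] > within_time:
            _, old_url = window.popleft()
            counts[old_url] -= 1
            if counts[old_url] == 0:
                del counts[old_url]
        if len(counts) > max_count:
            flagged.add(ip)
    return flagged


# Hypothetical data: 10.0.0.1 requests three urls in 2 seconds,
# 10.0.0.2 spreads the same three urls over 200 seconds.
records = [
    ("10.0.0.1", "/a", 0), ("10.0.0.1", "/b", 1), ("10.0.0.1", "/c", 2),
    ("10.0.0.2", "/a", 0), ("10.0.0.2", "/b", 100), ("10.0.0.2", "/c", 200),
]
print(find_burst_ips(records, within_time=10, max_count=2))
```

Each record is appended and removed at most once per ip, so after the initial sort the pass over the log is linear in the number of records.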

0x03 result

At first I wrote a version that only collected the IP and URL; the result is shown in the figure:

Later I decided the TIME dimension was worth keeping. After adding TIME, the output, run through an online formatter, looks like this:

0x04 Summary

The ability to simplify a problem, the ability to think calmly... all of these matter.