GitHub Sensitive Information Leaks


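1. Background

The company has already had several sensitive information leaks, and every response was reactive: whenever a leak surfaced, someone went to GitHub and searched by hand, which did little good. So I looked at either building a monitor myself or using an existing open-source project. Reference links:

https://www.freebuf.com/articles/web/173479.html Build Your Own GitHub Code Leak Monitoring Tool

https://www.freebuf.com/sectool/188102.html  Build Your Own GitHub Code Leak Monitoring Tool: Improved Edition

Two mature products: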

https://fl4g.cn/2019/01/22/GSIL%E9%85%8D%E7%BD%AE%E4%BD%BF%E7%94%A8-GitHub%E6%95%8F%E6%84%9F%E4%BF%A1%E6%81%AF%E6%B3%84%E9%9C%B2%E7%9B%91%E6%8E%A7/   gsil

https://blog.csdn.net/u011728305/article/details/79970586 hawkeye

2. Step by Step

2.1 Environment

Python 3.8, Windows 7, PyCharm

2.2 Code

# -*- coding: utf-8 -*-
import configparser
import csv
import smtplib
from email import encoders
from email.header import Header
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import parseaddr, formataddr
from time import sleep

import requests
from lxml import html
from tqdm import tqdm

def login_github(gUser, gPass):
    """Log in through GitHub's web form; return the session, or None on failure."""
    login_url = 'https://github.com/login'
    session_url = 'https://github.com/session'
    try:
        s = requests.session()
        resp = s.get(login_url).text
        dom_tree = html.etree.HTML(resp)
        # xpath() returns a list; take the first matching token string.
        key = dom_tree.xpath('//input[@name="authenticity_token"]/@value')[0]
        user_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': key,
            'login': gUser,
            'password': gPass
        }
        # Do not print user_data: it contains the plaintext password.
        dl = s.post(session_url, data=user_data)
        if dl.status_code == 200:
            # To verify, fetch https://github.com/search?p=1&q=1111.com&type=Code with s.
            return s
        print('Login failed, HTTP status:', dl.status_code)
    except Exception as e:
        print('Login error:', e)
    return None

def hunter(gUser, gPass, keyword, payloads):
    """Search GitHub code for keyword; return one report entry per payload hit."""
    global sensitive_list
    global tUrls
    sensitive_list = []
    tUrls = []
    try:
        s = login_github(gUser, gPass)
        if s is None:
            print('Login failed; search aborted')
            return sensitive_list
        print('Login succeeded; searching for leaked information')
        # Open the report once so results from later pages are not overwritten.
        with open('leak.csv', 'w', encoding='utf-8', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(['URL', 'Username', 'Upload Time', 'Filename'])
            for page in tqdm(range(1, 2)):
                search_code = 'https://github.com/search?p=' + str(page) + '&q=' + keyword + '&type=Code'
                resp = s.get(search_code)
                sleep(1)
                dom_tree_code = html.etree.HTML(resp.text)
                # These selectors depend on the markup of GitHub's search page.
                Urls = dom_tree_code.xpath('//div[@class="f4 text-normal"]/a/@href')
                users = dom_tree_code.xpath('//a[@class="link-gray"]/text()')
                datetime = dom_tree_code.xpath('//relative-time/text()')
                filename = dom_tree_code.xpath('//div[@class="f4 text-normal"]/a/text()')
                # One CSV row per hit; zip() stops at the shortest list.
                for url, user, when, name in zip(Urls, users, datetime, filename):
                    tUrls.append('https://github.com' + url)
                    writer.writerow([tUrls[-1], user, when, name])
                for raw_url in Urls:
                    # Drop only the first '/blob' path segment to get the raw-content URL.
                    url = 'https://raw.githubusercontent.com' + raw_url.replace('/blob/', '/', 1)
                    code = requests.get(url)
                    if code.status_code == 200:
                        code = code.text
                        for payload in payloads:
                            if payload in code:
                                leak_url = 'Matched payload: ' + payload + '\r\n' + 'https://github.com' + raw_url + '\r\n\r\n\r\n' + 'Code: \r\n' + code + '\r\n\r\n'
                                sensitive_list.append(leak_url)
        # Return only after every page has been processed.
        return sensitive_list
    except Exception as e:
        print(e)
        return sensitive_list

def send_warning(host, username, password, sender, receivers, content):
    """Email the hits in content, with leak.csv attached, to every receiver."""
    def _format_addr(s):
        name, addr = parseaddr(s)
        # formataddr() takes a single (name, address) tuple.
        return formataddr((Header(name, 'utf-8').encode(), addr))
    msg = MIMEMultipart()
    msg['From'] = _format_addr('GitHub Security Monitor <%s>' % sender)
    msg['To'] = ','.join(receivers)
    Subject = 'GitHub sensitive information leak alert'
    msg['Subject'] = Header(Subject, 'utf-8').encode()
    msg.attach(MIMEText('Dear all \r\n\r\nPlease note: sensitive information may have been uploaded to GitHub. The repositories below may contain it.\r\n\r\n' + content + '\r\n\r\n', 'plain', 'utf-8'))
    with open('leak.csv', 'rb') as f:
        m = MIMEBase('text', 'csv', filename='leak.csv')
        m.add_header('Content-Disposition', 'attachment', filename='leak.csv')
        m.add_header('Content-ID', '<0>')
        m.add_header('X-Attachment-ID', '0')
        m.set_payload(f.read())
    encoders.encode_base64(m)
    msg.attach(m)
    server = None
    try:
        server = smtplib.SMTP(host, 25)
        server.login(username, password)
        server.sendmail(sender, receivers, msg.as_string())
        print('Email sent successfully!')
    except Exception as err:
        print(err)
    finally:
        # Only close the connection if it was actually opened.
        if server is not None:
            server.quit()

if __name__ == '__main__':
    # Read accounts, recipients, the search keyword and the payloads from info.ini.
    config = configparser.ConfigParser()
    config.read('info.ini')
    g_User = config['Github']['user']
    g_Pass = config['Github']['password']
    host = config['EMAIL']['host']
    m_User = config['EMAIL']['user']
    m_Pass = config['EMAIL']['password']
    m_sender = config['SENDER']['sender']
    receivers = []
    for k in config['RECEIVER']:
        receivers.append(config['RECEIVER'][k])
    keyword = config['KEYWORD']['keyword']
    payloads = []
    for key in config['PAYLOADS']:
        payloads.append(config['PAYLOADS'][key])
    sensitive_list = hunter(g_User, g_Pass, keyword, payloads)
    if sensitive_list:
        print('\033[1;31;0mWarning: sensitive information found!\r\n\033[0m')
        print('Sending alert email......')
        content = ''.join(sensitive_list)
        send_warning(host, m_User, m_Pass, m_sender, receivers, content)
    else:
        # Nothing is emailed in this branch; only leak.csv is written.
        print('Good news: no sensitive information found!\r\n')
        print('All checks complete; the report has been generated.\r\n')

2.3 Code Analysis

First, the modules:

from lxml import html
import requests
import configparser
import csv
import smtplib
from time import sleep
from tqdm import tqdm
from email import encoders
from email.utils import parseaddr, formataddr
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.header import Header
from email.mime.base import MIMEBase

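The one to watch is lxml; the other modules either ship with Python or install cleanly with pip.

To install lxml, download the wheel that matches your Python version from https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml . Installing it directly failed with a platform error for me. Renaming lxml-4.4.2-cp38-cp38-win32.whl to lxml-4.4.2-cp38-cp38m-win32.whl (an m added after the second cp38) let pip install it successfully.

The login_github() function logs in to GitHub. Note these lines: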

dl = s.post(session_url,data=user_data)
if dl.status_code == 200:

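After the POST request, check the status code. Without that check a failed login can easily go unnoticed, and the later requests are then made as an anonymous visitor.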
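A 200 status on its own may not prove that the login worked. The sketch below is one possible stricter check, assuming that a rejected login leaves the final URL on /login or /session; that assumption is not verified by the original script and should be confirmed against real responses. The helper name login_succeeded and the LOGIN_PATHS set are hypothetical.

```python
from urllib.parse import urlparse

# Paths assumed to mean "still on the sign-in form" (an assumption, not confirmed).
LOGIN_PATHS = {'/login', '/session'}


def login_succeeded(status_code, final_url):
    """Return True only if the POST returned 200 and ended away from the sign-in form."""
    if status_code != 200:
        return False
    path = urlparse(final_url).path.rstrip('/')
    return path not in LOGIN_PATHS
```

Inside login_github() this could be called as login_succeeded(dl.status_code, dl.url), since a requests response exposes the final URL after redirects as dl.url.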

key = dom_tree.xpath('//input[@name="authenticity_token"]/@value')

The token value is extracted with an XPath expression (not a CSS selector).
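To see what that expression selects, here is a small self-contained sketch. It swaps lxml for the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, so it runs without extra packages. SAMPLE_FORM is a made-up stand-in for the login page, and extract_token is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the login form; the real page is far larger.
SAMPLE_FORM = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="example-token-123" />
  <input type="text" name="login" />
</form>
"""


def extract_token(markup):
    """Return the value of the first authenticity_token input, or None if absent."""
    root = ET.fromstring(markup.strip())
    # Same idea as //input[@name="authenticity_token"]/@value in the script.
    node = root.find(".//input[@name='authenticity_token']")
    return None if node is None else node.get('value')


print(extract_token(SAMPLE_FORM))  # prints example-token-123
```

With lxml the attribute can be selected directly through /@value, which is why the script gets back a list of strings rather than elements.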

hunter(), send_warning(), and the ini file configuration
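The script reads info.ini but the file itself is not shown. The sketch below reconstructs a plausible layout from the section and key names used in the __main__ block; every value is a made-up placeholder, the key names under RECEIVER and PAYLOADS are arbitrary (the script reads all of them), and load_settings is a hypothetical helper that parses the text the same way the script does.

```python
import configparser

# Section and key names follow the script; all values are placeholders.
SAMPLE_INI = """
[Github]
user = monitor-account
password = change-me

[EMAIL]
host = smtp.example.com
user = alert@example.com
password = change-me

[SENDER]
sender = alert@example.com

[RECEIVER]
receiver1 = sec-team@example.com
receiver2 = ops@example.com

[KEYWORD]
keyword = example.com

[PAYLOADS]
p1 = password
p2 = secret_key
"""


def load_settings(text):
    """Parse ini text and collect the values the script needs."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return {
        'github_user': config['Github']['user'],
        'smtp_host': config['EMAIL']['host'],
        'sender': config['SENDER']['sender'],
        'receivers': list(config['RECEIVER'].values()),
        'keyword': config['KEYWORD']['keyword'],
        'payloads': list(config['PAYLOADS'].values()),
    }


settings = load_settings(SAMPLE_INI)
print(settings['receivers'])
print(settings['payloads'])
```

Section names are case-sensitive in configparser, so [Github] and [EMAIL] must be spelled exactly as the script looks them up.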

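Problems: 1. There is no deduplication, so the same search result can show up more than once. One fix is to compute a mid value from the user, filename, leaked code, and leak time, and discard any result whose mid has already been seen.

     2. When a leak is found, the code of the entire file is saved. That is a lot of content, and most of it is not needed. It is enough to locate the leak point, take the line it is on, and keep a few lines above and below it.

     3. The first alert email should report every leak point, but from the second email on the focus should be on newly added leak points. Combined with problem 1, a mid value that has not been seen before marks a new leak point.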
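Problems 1 and 3 can be sketched together. The version below assumes the mid is a SHA-256 digest of the four fields, which is one reasonable choice and not necessarily what the referenced tools use. The function names make_mid and split_new_hits and the dictionary keys are hypothetical.

```python
import hashlib


def make_mid(user, filename, code, upload_time):
    """Hash the four fields into one hex digest that identifies a hit."""
    # A NUL separator keeps ('ab', 'c') and ('a', 'bc') from colliding.
    joined = '\x00'.join([user, filename, code, upload_time])
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()


def split_new_hits(hits, seen_mids):
    """Return (hits with unseen mids, updated set of mids); duplicates are dropped."""
    known = set(seen_mids)
    new_hits = []
    for hit in hits:
        mid = make_mid(hit['user'], hit['filename'], hit['code'], hit['time'])
        if mid not in known:
            known.add(mid)
            new_hits.append(hit)
    return new_hits, known
```

If the returned set is saved between runs (for example in a text file), only the hits in new_hits need to go into the next alert email.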

3. Improvements
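As a starting point for problem 2 in section 2.3, the sketch below keeps only the lines around each match instead of the whole file. The function name context_snippets and the default of two lines on either side are assumptions chosen for illustration.

```python
def context_snippets(code, payload, before=2, after=2):
    """Return one numbered snippet per line containing payload, with nearby lines."""
    lines = code.splitlines()
    snippets = []
    for index, line in enumerate(lines):
        if payload in line:
            start = max(0, index - before)
            end = min(len(lines), index + after + 1)
            # Prefix each line with its 1-based line number in the original file.
            numbered = ['%d: %s' % (n + 1, lines[n]) for n in range(start, end)]
            snippets.append('\n'.join(numbered))
    return snippets
```

In hunter(), joining the result of context_snippets(code, payload) could replace the full file text in each report entry, which keeps the alert email short.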

4. gsil

5. hawkeye

 

posted @ 2020-01-19 11:06  强壮的脸皮