How to implement a P2P torrent search in Java (4): fetching torrents

Fetching torrents

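In the previous post we got to the point where we can collect infohashes from the DHT network. Now we only need to use each infohash to fetch its torrent, read the file names inside it, and map them to the infohash. At that point the searchable data is stored, and once it is imported into Elasticsearch (ES), the search part is done.
Fetching a torrent means talking to other peers, so we use the peer wire protocol and start by sending a handshake. The handshake is 68 bytes. The first byte must be 19, the length of the protocol string. Next comes the protocol string, always "BitTorrent protocol", exactly 19 bytes, followed by 8 reserved bytes. That makes 28 bytes. The last 40 bytes are the 20-byte infohash and a 20-byte peer id (the code below reuses our DHT node ID as the peer id), for 68 bytes in total. Because we will use the extension protocol afterwards, bit 0x10 of the sixth reserved byte (reserved[5]) must be set, which tells the peer we support BEP 10 extensions.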

@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
    // infohash as 20 raw bytes
    byte[] infoHash = DHTUtil.hexStr2Bytes(this.infoHash);
    byte[] sendBytes = new byte[68];
    // bytes 0-27: pstrlen (19) + "BitTorrent protocol" + 8 reserved bytes
    System.arraycopy(HANDSHAKE_BYTES, 0, sendBytes, 0, 28);
    // bytes 28-47: infohash
    System.arraycopy(infoHash, 0, sendBytes, 28, 20);
    // bytes 48-67: peer id (our DHT node ID is reused here)
    System.arraycopy(routingTable.getNodeId(), 0, sendBytes, 48, 20);
    ctx.channel().writeAndFlush(Unpooled.copiedBuffer(sendBytes));
}
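channelActive copies a 28-byte HANDSHAKE_BYTES prefix that is not shown in this post. The following is a self-contained sketch of building the whole 68-byte handshake, assuming we want to advertise BEP 10 extension support; the class and method names are hypothetical, not taken from the project.

```java
import java.nio.charset.StandardCharsets;

public class PeerHandshake {
    static final String PSTR = "BitTorrent protocol";

    // Builds the 68-byte peer wire handshake (hypothetical helper).
    static byte[] build(byte[] infoHash, byte[] peerId) {
        if (infoHash.length != 20 || peerId.length != 20) {
            throw new IllegalArgumentException("infohash and peer id must be 20 bytes");
        }
        byte[] out = new byte[68];
        out[0] = 19;                                              // pstrlen
        byte[] pstr = PSTR.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(pstr, 0, out, 1, 19);                    // "BitTorrent protocol"
        // reserved bytes are out[20..27]; reserved[5] & 0x10 signals BEP 10 support
        out[20 + 5] |= 0x10;
        System.arraycopy(infoHash, 0, out, 28, 20);               // infohash
        System.arraycopy(peerId, 0, out, 48, 20);                 // peer id
        return out;
    }

    // True if a handshake reply looks valid and advertises the extension protocol.
    static boolean supportsExtensions(byte[] reply) {
        return reply.length >= 68
                && reply[0] == 19
                && PSTR.equals(new String(reply, 1, 19, StandardCharsets.US_ASCII))
                && (reply[25] & 0x10) != 0;
    }
}
```

Checking supportsExtensions on the peer's reply before sending the extension handshake avoids wasting a request on peers that cannot answer it.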

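After this handshake we need to send a second one, because not every peer supports downloading the torrent. Torrent (metadata) download uses the BEP 9 extension (ut_metadata), which runs on top of the extension protocol defined in BEP 10.
This extension handshake sends a dictionary with the key m. The layout is: a 4-byte length prefix, then a 1-byte message ID that identifies the message (20 for extension messages), then one byte that is 0 for the handshake, followed by the actual bencoded dictionary containing m. The official description (from BEP 10) reads: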

This message is sent as any other bittorrent message, with a 4 byte length prefix and a single byte identifying the message (the single byte being 20 in this case). At the start of the payload of the message, is a single byte message identifier. This identifier can refer to different extension messages and only one ID is specified, 0. If the ID is 0, the message is a handshake message which is described below. The layout of a general extended message follows (including the message headers used by the bittorrent protocol):
uint32_t    length prefix. Specifies the number of bytes for the entire message. (Big endian)
uint8_t     bittorrent message ID, = 20
uint8_t     extended message ID. 0 = handshake, >0 = extended message as specified by the handshake.

The sending code:

public void sendHandshakeMsg(ChannelHandlerContext ctx) throws Exception{
    // payload: {"m": {"ut_metadata": 1}}, i.e. we ask the peer to use ID 1 for ut_metadata
    Map<String, Object> extendMessageMap = new LinkedHashMap<>();
    Map<String, Object> extendMessageMMap = new LinkedHashMap<>();
    extendMessageMMap.put("ut_metadata", 1);
    extendMessageMap.put("m", extendMessageMMap);
    byte[] tempExtendBytes = bencode.encode(extendMessageMap);
    // 4-byte length prefix + message ID + extended message ID + payload
    byte[] extendMessageBytes = new byte[tempExtendBytes.length + 6];
    extendMessageBytes[4] = 20; // BitTorrent message ID for extension messages
    extendMessageBytes[5] = 0;  // extended message ID 0 = extension handshake
    // the length prefix (big endian) counts the two ID bytes plus the payload
    byte[] lenBytes = DHTUtil.int2Bytes(tempExtendBytes.length + 2);
    System.arraycopy(lenBytes, 0, extendMessageBytes, 0, 4);
    System.arraycopy(tempExtendBytes, 0, extendMessageBytes, 6, tempExtendBytes.length);
    ctx.channel().writeAndFlush(Unpooled.copiedBuffer(extendMessageBytes));
}
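The bencode object and DHTUtil.int2Bytes are not shown in this post. A self-contained sketch of the same framing, with a minimal bencode encoder (all names here are hypothetical), could look like this; it uses ByteBuffer, which writes ints in big endian order by default, as BEP 10 requires for the length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExtendedMessage {
    // Minimal bencode encoder for Integer/Long, String, byte[], List and Map<String, ?>.
    static byte[] bencode(Object v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(v, out);
        return out.toByteArray();
    }

    @SuppressWarnings("unchecked")
    private static void encode(Object v, ByteArrayOutputStream out) {
        if (v instanceof Integer || v instanceof Long) {
            write(out, ("i" + v + "e").getBytes(StandardCharsets.US_ASCII));
        } else if (v instanceof String) {
            encode(((String) v).getBytes(StandardCharsets.UTF_8), out);
        } else if (v instanceof byte[]) {
            byte[] b = (byte[]) v;
            write(out, (b.length + ":").getBytes(StandardCharsets.US_ASCII));
            write(out, b);
        } else if (v instanceof List) {
            out.write('l');
            for (Object item : (List<Object>) v) encode(item, out);
            out.write('e');
        } else if (v instanceof Map) {
            out.write('d');
            // bencode requires sorted keys; TreeMap's String order matches for ASCII keys
            for (Map.Entry<String, Object> e : new TreeMap<>((Map<String, Object>) v).entrySet()) {
                encode(e.getKey(), out);
                encode(e.getValue(), out);
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("cannot bencode " + v);
        }
    }

    private static void write(ByteArrayOutputStream out, byte[] b) {
        out.write(b, 0, b.length);
    }

    // Wraps a payload as an extension message: length prefix, ID 20, extended ID, payload.
    static byte[] frame(int extendedId, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(6 + payload.length);
        buf.putInt(2 + payload.length);   // length counts the two ID bytes plus the payload
        buf.put((byte) 20);               // BitTorrent message ID for extension messages
        buf.put((byte) extendedId);       // 0 = extension handshake
        buf.put(payload);
        return buf.array();
    }
}
```

Encoding {"m": {"ut_metadata": 1}} gives d1:md11:ut_metadatai1eee (24 bytes), so the framed extension handshake is 30 bytes long with a length prefix of 26.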

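If the reply contains ut_metadata and metadata_size, the peer supports the metadata download protocol. The ut_metadata value in the reply is the extended message ID the peer wants us to use when we send ut_metadata messages to it, and metadata_size is the size of the torrent metadata in bytes. Each piece is at most 16 KiB (16384 bytes), so we download the metadata in blocks based on metadata_size. A ut_metadata message has two parameters. msg_type takes the values 0, 1 and 2: 0 is request (ask for a block), 1 is data (a block of data) and 2 is reject. piece is the index of the block being requested. All in all it looks fairly simple.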
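To make the block arithmetic concrete: with metadata_size = 40000, 40000 / 16384 is about 2.44, so 3 pieces are needed; the first two are 16384 bytes and the last is 40000 - 2 × 16384 = 7232 bytes. The sketch below (class and method names are made up for illustration) computes this and builds the bencoded request payload; its keys msg_type and piece are already in the sorted order bencode requires.

```java
public class MetadataPieces {
    static final int BLOCK = 16 * 1024; // 16 KiB per ut_metadata piece

    // Number of pieces for metadataSize bytes, rounded up (same as Math.ceil).
    static int pieceCount(int metadataSize) {
        return (metadataSize + BLOCK - 1) / BLOCK;
    }

    // Size of piece i: 16 KiB for every piece except possibly the last one.
    static int pieceLength(int metadataSize, int piece) {
        return Math.min(BLOCK, metadataSize - piece * BLOCK);
    }

    // Bencoded payload of a request: {"msg_type": 0, "piece": piece}.
    static String requestPayload(int piece) {
        return "d8:msg_typei0e5:piecei" + piece + "ee";
    }

    public static void main(String[] args) {
        int size = 40000;
        for (int i = 0; i < pieceCount(size); i++) {
            System.out.println(i + " " + pieceLength(size, i) + " " + requestPayload(i));
        }
    }
}
```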

@SneakyThrows
private void sendMetadataRequest(ChannelHandlerContext ctx, String s){
    // the ID the peer asked us to use for ut_metadata; it may have more than one digit
    int ut_metadata = parseBencodedInt(s, "ut_metadatai");
    // total size of the metadata in bytes
    int metadata_size = parseBencodedInt(s, "metadata_sizei");
    // number of 16 KiB blocks, rounded up
    int blockSize = (int) Math.ceil((double) metadata_size / (16 << 10));
    bs=blockSize;
    log.info("blocksize="+blockSize);
    // send one metadata request per block
    for (int i = 0; i < blockSize; i++) {
        Map<String, Object> metadataRequestMap = new LinkedHashMap<>();
        metadataRequestMap.put("msg_type", 0); // 0 = request
        metadataRequestMap.put("piece", i);
        byte[] metadataRequestMapBytes = bencode.encode(metadataRequestMap);
        byte[] metadataRequestBytes = new byte[metadataRequestMapBytes.length + 6];
        metadataRequestBytes[4] = 20;                 // extension message
        metadataRequestBytes[5] = (byte) ut_metadata; // the peer's ID for ut_metadata
        byte[] lenBytes = DHTUtil.int2Bytes(metadataRequestMapBytes.length + 2);
        System.arraycopy(lenBytes, 0, metadataRequestBytes, 0, 4);
        System.arraycopy(metadataRequestMapBytes, 0, metadataRequestBytes, 6, metadataRequestMapBytes.length);
        ctx.channel().writeAndFlush(Unpooled.copiedBuffer(metadataRequestBytes));
    }
}

// reads the integer that follows key (e.g. "metadata_sizei") up to its closing 'e'
private static int parseBencodedInt(String s, String key){
    int pos = s.indexOf(key);
    if (pos < 0) throw new IllegalArgumentException("missing " + key);
    int start = pos + key.length();
    return Integer.parseInt(s.substring(start, s.indexOf('e', start)));
}
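sendMetadataRequest finds the two values by searching the reply as a String. A more structural alternative, sketched here under the assumption that the reply payload (the bytes after the two ID bytes) is available as a byte[], is a minimal bencode reader that decodes the dictionary and returns ut_metadata and metadata_size, or null when the peer cannot serve metadata. BencodeReader and metadataInfo are hypothetical names, not part of the project.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BencodeReader {
    private final byte[] b;
    private int pos;

    BencodeReader(byte[] b) { this.b = b; }

    // Decodes one value: Long, byte[], List<Object> or Map<String, Object>.
    Object read() {
        byte c = b[pos];
        if (c == 'i') {                                    // i<number>e
            int end = find('e');
            long v = Long.parseLong(ascii(pos + 1, end));
            pos = end + 1;
            return v;
        }
        if (c == 'l' || c == 'd') {                        // l<items>e or d<key value ...>e
            pos++;
            List<Object> list = new ArrayList<>();
            Map<String, Object> map = new LinkedHashMap<>();
            while (b[pos] != 'e') {
                if (c == 'l') list.add(read());
                else map.put(new String((byte[]) read(), StandardCharsets.UTF_8), read());
            }
            pos++;
            return c == 'l' ? list : map;
        }
        int colon = find(':');                             // <length>:<bytes>
        int len = Integer.parseInt(ascii(pos, colon));
        pos = colon + 1 + len;
        return Arrays.copyOfRange(b, colon + 1, pos);
    }

    private int find(char ch) {
        int i = pos;
        while (b[i] != ch) i++;
        return i;
    }

    private String ascii(int from, int to) {
        return new String(b, from, to - from, StandardCharsets.US_ASCII);
    }

    // Returns {ut_metadata ID, metadata_size}, or null if either is missing.
    @SuppressWarnings("unchecked")
    static long[] metadataInfo(byte[] payload) {
        Map<String, Object> dict = (Map<String, Object>) new BencodeReader(payload).read();
        Object m = dict.get("m");
        Object size = dict.get("metadata_size");
        if (!(m instanceof Map) || !(size instanceof Long)) return null;
        Object id = ((Map<String, Object>) m).get("ut_metadata");
        if (!(id instanceof Long)) return null;
        return new long[] { (Long) id, (Long) size };
    }
}
```

Decoding the reply structurally also means the ut_metadata key is read from inside m, where BEP 10 places it, rather than from wherever the substring happens to occur.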

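After the requests are sent, decode the replies. Each data reply (msg_type 1) is a bencoded dictionary followed directly by the raw bytes of that block; concatenating the blocks in piece order yields the torrent's info dictionary, which contains the file names, file lengths and so on. Finally, save the parsed file names together with the infohash to the database and to ES.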
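A sketch of that decoding step, assuming each data reply's payload (after the two ID bytes) is already available as a byte[]; MetadataAssembler and its methods are hypothetical names. It skips the bencoded header dictionary to get the block bytes, joins the blocks in piece order and, as an extra check that the original code does not do, compares the SHA-1 of the result with the infohash, since the infohash is by definition the SHA-1 of the bencoded info dictionary. The verified bytes can then be decoded to read name, length and files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class MetadataAssembler {
    // Returns the index just past the bencoded value that starts at pos.
    static int skipBencode(byte[] b, int pos) {
        byte c = b[pos];
        if (c == 'i') {                          // i<number>e
            while (b[pos] != 'e') pos++;
            return pos + 1;
        }
        if (c == 'l' || c == 'd') {              // list/dict: values until the closing 'e'
            pos++;
            while (b[pos] != 'e') pos = skipBencode(b, pos);
            return pos + 1;
        }
        int colon = pos;                         // <length>:<bytes>
        while (b[colon] != ':') colon++;
        int len = Integer.parseInt(new String(b, pos, colon - pos, StandardCharsets.US_ASCII));
        return colon + 1 + len;
    }

    // Raw block bytes that follow the bencoded header dict of a data reply.
    static byte[] blockOf(byte[] payload) {
        int headerEnd = skipBencode(payload, 0);
        return Arrays.copyOfRange(payload, headerEnd, payload.length);
    }

    // Joins blocks in piece order and checks that SHA-1(metadata) equals the infohash.
    static byte[] assemble(byte[][] blocksInPieceOrder, byte[] infoHash) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] block : blocksInPieceOrder) out.write(block, 0, block.length);
        byte[] metadata = out.toByteArray();
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(metadata);
        if (!Arrays.equals(digest, infoHash)) {
            throw new IllegalStateException("metadata does not match infohash");
        }
        return metadata;
    }
}
```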
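That completes the whole approach and implementation of the torrent search. In practice, collecting infohashes from the DHT network and then downloading their torrents does not succeed very often. I ran it for several days and did not get much data: there were plenty of infohashes, but many peers do not support metadata download, which is the most annoying part. You can also scrape existing magnet-link sites over HTTP; collecting from several sources like this is the only way to get a decent amount of data.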
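I ran into a lot of problems along the way and read a lot of documentation and other material, so getting it working in the end feels like a real accomplishment. Of course I also referred to a Java torrent search implementation on GitHub, and a lot of the code is copied from it, hahaha.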
Finally, here is the source code again: https://github.com/mistletoe9527/dht-spider

posted @ 2019-04-23 14:32 mistletoe9527
