leveldb-Impl:VersionSet.java

VersionSet:leveldb/VersionSet.java at master · dain/leveldb · GitHub

public class VersionSet
        implements SeekingIterable<InternalKey, Slice>
{
    private static final int L0_COMPACTION_TRIGGER = 4;

    public static final int TARGET_FILE_SIZE = 2 * 1048576;

    // Maximum bytes of overlaps in grandparent (i.e., level+2) before we
    // stop building a single file in a level->level+1 compaction.
    public static final long MAX_GRAND_PARENT_OVERLAP_BYTES = 10 * TARGET_FILE_SIZE;

    private final AtomicLong nextFileNumber = new AtomicLong(2);
    private long manifestFileNumber = 1;
    private Version current;
    private long lastSequence;
    private long logNumber;
    private long prevLogNumber;

    private final Map<Version, Object> activeVersions = new MapMaker().weakKeys().makeMap();
    private final File databaseDir;
    private final TableCache tableCache;
    private final InternalKeyComparator internalKeyComparator;

    private LogWriter descriptorLog;
    private final Map<Integer, InternalKey> compactPointers = new TreeMap<>();

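The entire current state of leveldb is managed by VersionSet. Through the Map<Version, Object> activeVersions (a weak-key map), it tracks the latest Version (current) together with the other Versions that are still in use.

It maintains:
- for each level, the start key for that level's next compaction (compactPointers);
- the global log number (logNumber) and next file number (nextFileNumber);
- the current version (current);
- the current manifest file number (manifestFileNumber).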

public VersionSet(File databaseDir, TableCache tableCache, InternalKeyComparator internalKeyComparator)
            throws IOException
    {
        this.databaseDir = databaseDir;
        this.tableCache = tableCache;
        this.internalKeyComparator = internalKeyComparator;
        appendVersion(new Version(this));

        initializeIfNeeded();
    }

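tableCache caches opened sstables (their index data). The constructor does three things:
- it stores databaseDir, tableCache and internalKeyComparator;
- it installs an empty Version as the current version through appendVersion;
- it calls initializeIfNeeded() to set up a new database if necessary.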

private void initializeIfNeeded()
            throws IOException
    {
        File currentFile = new File(databaseDir, Filename.currentFileName());

        if (!currentFile.exists()) {
            VersionEdit edit = new VersionEdit();
            edit.setComparatorName(internalKeyComparator.name());
            edit.setLogNumber(prevLogNumber);
            edit.setNextFileNumber(nextFileNumber.get());
            edit.setLastSequenceNumber(lastSequence);

            LogWriter log = Logs.createLogWriter(new File(databaseDir, Filename.descriptorFileName(manifestFileNumber)), manifestFileNumber);
            try {
                writeSnapshot(log);
                log.addRecord(edit.encode(), false);
            }
            finally {
                log.close();
            }

            Filename.setCurrentFile(databaseDir, log.getFileNumber());
        }
    }

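initializeIfNeeded() only does work for a brand-new database, i.e. when there is no CURRENT file. It builds a VersionEdit holding:
- the comparator name (internalKeyComparator.name());
- the log number (taken from prevLogNumber, which is still 0 at this point);
- the next file number (nextFileNumber);
- the last sequence number (lastSequence).

It then creates the first manifest file (number manifestFileNumber, i.e. 1) in databaseDir and writes two records into it: a snapshot of the current, still empty version, followed by the edit. It closes the log and finally writes CURRENT so that it points to this manifest.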
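The CURRENT file is what ties a database directory to its active manifest. Below is a minimal sketch of writing and reading it, under two assumptions:
- manifests use LevelDB-style names (`MANIFEST-000001` and so on);
- CURRENT is written through a temporary file that is then renamed.

The newline check matches the one in recover(). `CurrentFileSketch`, its methods and the `.dbtmp` temp-file name are hypothetical, not dain/leveldb's `Filename` API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the CURRENT-file convention; not the dain/leveldb API.
final class CurrentFileSketch
{
    static String manifestName(long number)
    {
        // Assumed LevelDB-style manifest name, e.g. MANIFEST-000001
        return String.format("MANIFEST-%06d", number);
    }

    static void setCurrent(Path dbDir, long manifestNumber)
    {
        try {
            // Write the new content to a temporary file first ...
            Path tmp = dbDir.resolve(String.format("%06d.dbtmp", manifestNumber));
            Files.write(tmp, (manifestName(manifestNumber) + "\n").getBytes(StandardCharsets.UTF_8));
            // ... then rename it over CURRENT (on POSIX systems ATOMIC_MOVE maps to rename(),
            // which replaces an existing CURRENT), so readers never see a half-written file
            Files.move(tmp, dbDir.resolve("CURRENT"), StandardCopyOption.ATOMIC_MOVE);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readCurrent(Path dbDir)
    {
        try {
            String content = new String(Files.readAllBytes(dbDir.resolve("CURRENT")), StandardCharsets.UTF_8);
            // Same check as recover(): the manifest name must end with a newline
            if (content.isEmpty() || content.charAt(content.length() - 1) != '\n') {
                throw new IllegalStateException("CURRENT file does not end with newline");
            }
            return content.substring(0, content.length() - 1);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing manifest 1 and later manifest 5 leaves CURRENT containing `MANIFEST-000005`. The rename is what keeps a crash from leaving CURRENT half-written.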

public void destroy()
            throws IOException
    {
        if (descriptorLog != null) {
            descriptorLog.close();
            descriptorLog = null;
        }

        Version t = current;
        if (t != null) {
            current = null;
            t.release();
        }

        Set<Version> versions = activeVersions.keySet();
        // TODO:
        // log("DB closed with "+versions.size()+" open snapshots. This could mean your application has a resource leak.");
    }

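destroy() closes the manifest writer (descriptorLog) and releases the current version. Any versions still left in activeVersions at that point are open snapshots; logging a warning about them is still a TODO.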

private void appendVersion(Version version)
    {
        requireNonNull(version, "version is null");
        checkArgument(version != current, "version is the current version");
        Version previous = current;
        current = version;
        activeVersions.put(version, new Object());
        if (previous != null) {
            previous.release();
        }
    }

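appendVersion() does three things:
- it installs the given version as the current version;
- it registers the version in activeVersions;
- it releases the VersionSet's reference to the previous current version.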

public void removeVersion(Version version)
    {
        requireNonNull(version, "version is null");
        checkArgument(version != current, "version is the current version");
        boolean removed = activeVersions.remove(version) != null;
        assert removed : "Expected the version to still be in the active set";
    }

removeVersion() drops a version, which must no longer be the current one, from activeVersions.
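The excerpt does not show Version itself, but two details suggest reference counting:
- appendVersion() calls previous.release();
- removeVersion() insists the version is no longer current.

Below is a minimal sketch of such retain/release counting. It assumes the count starts at 1 (the VersionSet's own reference) and that reaching 0 removes the version from the active set. `RefCountSketch` and `Ver` are hypothetical stand-ins, not dain/leveldb classes.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins for Version/VersionSet reference counting.
final class RefCountSketch
{
    static final class Ver
    {
        private final RefCountSketch owner;
        private final AtomicInteger refs = new AtomicInteger(1); // the set's own reference

        Ver(RefCountSketch owner)
        {
            this.owner = owner;
        }

        void retain()
        {
            refs.incrementAndGet(); // e.g. an iterator or snapshot starts using this version
        }

        void release()
        {
            if (refs.decrementAndGet() == 0) {
                owner.remove(this); // nobody uses it any more: drop it from the active set
            }
        }
    }

    final Set<Ver> active = Collections.newSetFromMap(new IdentityHashMap<>());
    Ver current;

    void append(Ver v)
    {
        Ver previous = current;
        current = v;
        active.add(v);
        if (previous != null) {
            previous.release(); // give up the set's reference to the old current version
        }
    }

    void remove(Ver v)
    {
        if (v == current) {
            throw new IllegalArgumentException("version is the current version");
        }
        active.remove(v);
    }
}
```

An old version that an iterator has retained therefore stays in `active` after a newer version is appended. It disappears only when that last user releases it.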

public InternalKeyComparator getInternalKeyComparator()
    {
        return internalKeyComparator;
    }

    public TableCache getTableCache()
    {
        return tableCache;
    }

    public Version getCurrent()
    {
        return current;
    }

    public long getManifestFileNumber()
    {
        return manifestFileNumber;
    }

    public long getNextFileNumber()
    {
        return nextFileNumber.getAndIncrement();
    }

    public long getLogNumber()
    {
        return logNumber;
    }

    public long getPrevLogNumber()
    {
        return prevLogNumber;
    }

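These are getters for the VersionSet's state. Note that getNextFileNumber() is not a pure getter: it calls getAndIncrement(), so each call allocates a new file number.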

@Override
    public MergingIterator iterator()
    {
        return current.iterator();
    }

    public MergingIterator makeInputIterator(Compaction c)
    {
        // Level-0 files have to be merged together.  For other levels,
        // we will make a concatenating iterator per level.
        // TODO(opt): use concatenating iterator for level-0 if there is no overlap
        List<InternalIterator> list = new ArrayList<>();
        for (int which = 0; which < 2; which++) {
            if (!c.getInputs()[which].isEmpty()) {
                if (c.getLevel() + which == 0) {
                    List<FileMetaData> files = c.getInputs()[which];
                    list.add(new Level0Iterator(tableCache, files, internalKeyComparator));
                }
                else {
                    // Create concatenating iterator for the files from this level
                    list.add(Level.createLevelConcatIterator(tableCache, c.getInputs()[which], internalKeyComparator));
                }
            }
        }
        return new MergingIterator(list, internalKeyComparator);
    }

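Compaction has to merge several files and write the result into new files in the next level; LevelDB implements this with a MergingIterator. The children added to it depend on the level:
- Level-0 files may overlap, so all level-0 input files are combined in one Level0Iterator (in the C++ version, one TableIterator per level-0 file).
- For every other input level, whose files do not overlap, one concatenating iterator is added (Level.createLevelConcatIterator here). In the C++ code this is a TwoLevelIterator that uses a LevelFileNumIterator as the index iterator and each file's TableIterator as the data iterator.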
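The core of a merging iterator is a k-way merge: keep the current head of every child iterator in a priority queue and repeatedly emit the smallest. The sketch below makes simplifications:
- it uses plain Strings in place of InternalKey;
- it eagerly collects the result into a list.

`MergeSketch` is hypothetical. dain/leveldb's MergingIterator works lazily over InternalIterators with an InternalKeyComparator, which in LevelDB orders entries with the same user key by descending sequence number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge over already-sorted inputs (a stand-in for MergingIterator).
final class MergeSketch
{
    private static final class Head<T>
    {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source)
        {
            this.value = value;
            this.source = source;
        }
    }

    static <T> List<T> merge(List<? extends Iterable<T>> inputs, Comparator<? super T> cmp)
    {
        PriorityQueue<Head<T>> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.value, b.value));
        for (Iterable<T> input : inputs) {
            Iterator<T> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it)); // seed the heap with each input's first element
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll(); // smallest element across all inputs
            out.add(smallest.value);
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source)); // refill from the same input
            }
        }
        return out;
    }
}
```

Merging `[a, d, g]`, `[b, e]` and `[c, f, h]` yields `a` through `h` in order. Each step costs O(log k) for k inputs, which is why one concatenating iterator per non-overlapping level keeps k small.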

    public LookupResult get(LookupKey key)
    {
        return current.get(key);
    }

    public boolean overlapInLevel(int level, Slice smallestUserKey, Slice largestUserKey)
    {
        return current.overlapInLevel(level, smallestUserKey, largestUserKey);
    }

    public int numberOfFilesInLevel(int level)
    {
        return current.numberOfFilesInLevel(level);
    }

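    public long numberOfBytesInLevel(int level)
    {
        // Sum the file sizes; returning numberOfFilesInLevel(level) here would report a count, not bytes
        long bytes = 0;
        for (FileMetaData fileMetaData : current.getFiles(level)) {
            bytes += fileMetaData.getFileSize();
        }
        return bytes;
    }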

    public long getLastSequence()
    {
        return lastSequence;
    }

    public void setLastSequence(long newLastSequence)
    {
        checkArgument(newLastSequence >= lastSequence, "Expected newLastSequence to be greater than or equal to current lastSequence");
        this.lastSequence = newLastSequence;
    }

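These methods delegate queries to the current version:
- whether a key range overlaps the files of a level (overlapInLevel);
- the number of files and the number of bytes in a level.

There is also a getter and setter for lastSequence, which may only move forward.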

public void logAndApply(VersionEdit edit)
            throws IOException
    {
        if (edit.getLogNumber() != null) {
            checkArgument(edit.getLogNumber() >= logNumber);
            checkArgument(edit.getLogNumber() < nextFileNumber.get());
        }
        else {
            edit.setLogNumber(logNumber);
        }

        if (edit.getPreviousLogNumber() == null) {
            edit.setPreviousLogNumber(prevLogNumber);
        }

        edit.setNextFileNumber(nextFileNumber.get());
        edit.setLastSequenceNumber(lastSequence);

        Version version = new Version(this);
        Builder builder = new Builder(this, current);
        builder.apply(edit);
        builder.saveTo(version);

        finalizeVersion(version);

        boolean createdNewManifest = false;
        try {
            // Initialize new descriptor log file if necessary by creating
            // a temporary file that contains a snapshot of the current version.
            if (descriptorLog == null) {
                edit.setNextFileNumber(nextFileNumber.get());
                descriptorLog = Logs.createLogWriter(new File(databaseDir, Filename.descriptorFileName(manifestFileNumber)), manifestFileNumber);
                writeSnapshot(descriptorLog);
                createdNewManifest = true;
            }

            // Write new record to MANIFEST log
            Slice record = edit.encode();
            descriptorLog.addRecord(record, true);

            // If we just created a new descriptor file, install it by writing a
            // new CURRENT file that points to it.
            if (createdNewManifest) {
                Filename.setCurrentFile(databaseDir, descriptorLog.getFileNumber());
            }
        }
        catch (IOException e) {
            // New manifest file was not installed, so clean up state and delete the file
            if (createdNewManifest) {
                descriptorLog.close();
                // todo add delete method to LogWriter
                // Delete the manifest just created: it was named with descriptorFileName, not logFileName
                new File(databaseDir, Filename.descriptorFileName(descriptorLog.getFileNumber())).delete();
                descriptorLog = null;
            }
            throw e;
        }

        // Install the new version
        appendVersion(version);
        logNumber = edit.getLogNumber();
        prevLogNumber = edit.getPreviousLogNumber();
    }

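Version N + VersionEdit = Version N+1

logAndApply() proceeds in these steps:
1. It fills in the edit's log number, previous log number, next file number and last sequence.
2. It applies the edit to current through a Builder to produce the new version, and precomputes that version's compaction score with finalizeVersion.
3. If no manifest is open yet (descriptorLog == null), it creates a new manifest file (the "descriptor file", i.e. the writable manifest) and first writes a snapshot of the current state into it.
4. It appends the encoded edit to the manifest log. If a new manifest was created, it also rewrites CURRENT to point to it.
5. Only after the record has been written does it install the new version with appendVersion and update logNumber and prevLogNumber.

If writing fails, a newly created manifest is closed and deleted, and the exception is rethrown.

Restarting leveldb must restore the state it had before shutdown, so the db state has to be persisted; it is stored in the manifest. So that as much as possible can be recovered after a failure, the manifest keeps the history of state changes, not just the current state. Saving the full state on every change would cost too much space and time. The full state is therefore written only at the beginning of a manifest (writeSnapshot), and every later record is an incremental VersionEdit.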
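To make "Version N + VersionEdit = Version N+1" concrete, here is a minimal sketch:
- a version is modeled as the set of live file numbers per level;
- an edit is modeled as per-level deleted and added files.

It follows the order Builder uses (deletions first, then additions, so a file that is both deleted and re-added survives) and never mutates the base version. `EditSketch` is hypothetical and ignores compaction pointers, key ranges and sorting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: a "version" is just the set of live file numbers per level.
final class EditSketch
{
    static final class Edit
    {
        final Map<Integer, Set<Long>> deleted = new TreeMap<>();
        final Map<Integer, Set<Long>> added = new TreeMap<>();

        Edit delete(int level, long file)
        {
            deleted.computeIfAbsent(level, l -> new TreeSet<>()).add(file);
            return this;
        }

        Edit add(int level, long file)
        {
            added.computeIfAbsent(level, l -> new TreeSet<>()).add(file);
            return this;
        }
    }

    // Version N + Edit = Version N+1; the base list is copied, never modified
    static List<Set<Long>> apply(List<Set<Long>> base, Edit edit)
    {
        List<Set<Long>> next = new ArrayList<>();
        for (int level = 0; level < base.size(); level++) {
            Set<Long> files = new TreeSet<>(base.get(level));
            files.removeAll(edit.deleted.getOrDefault(level, Collections.emptySet()));
            files.addAll(edit.added.getOrDefault(level, Collections.emptySet()));
            next.add(Collections.unmodifiableSet(files));
        }
        return next;
    }
}
```

Take a compaction of level-0 files 7 and 8 with level-1 file 3 into a new level-1 file 9. As an edit, it deletes (0, 7), (0, 8) and (1, 3) and adds (1, 9). The old version still lists files 7, 8 and 3 for any reader that holds it.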

private void writeSnapshot(LogWriter log)
            throws IOException
    {
        // Save metadata
        VersionEdit edit = new VersionEdit();
        edit.setComparatorName(internalKeyComparator.name());

        // Save compaction pointers
        edit.setCompactPointers(compactPointers);

        // Save files
        edit.addFiles(current.getFiles());

        Slice record = edit.encode();
        log.addRecord(record, false);
    }

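writeSnapshot() writes the full state as a single VersionEdit record: the comparator name, the compaction pointers, and all files of the current version.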

public void recover()
            throws IOException
    {
        // Read "CURRENT" file, which contains a pointer to the current manifest file
        File currentFile = new File(databaseDir, Filename.currentFileName());
        checkState(currentFile.exists(), "CURRENT file does not exist");

        String currentName = Files.toString(currentFile, UTF_8);
        if (currentName.isEmpty() || currentName.charAt(currentName.length() - 1) != '\n') {
            throw new IllegalStateException("CURRENT file does not end with newline");
        }
        currentName = currentName.substring(0, currentName.length() - 1);

        // open file channel
        try (FileInputStream fis = new FileInputStream(new File(databaseDir, currentName));
             FileChannel fileChannel = fis.getChannel()) {
            // read log edit log
            Long nextFileNumber = null;
            Long lastSequence = null;
            Long logNumber = null;
            Long prevLogNumber = null;
            Builder builder = new Builder(this, current);

            LogReader reader = new LogReader(fileChannel, throwExceptionMonitor(), true, 0);
            for (Slice record = reader.readRecord(); record != null; record = reader.readRecord()) {
                // read version edit
                VersionEdit edit = new VersionEdit(record);

                // verify comparator
                // todo implement user comparator
                String editComparator = edit.getComparatorName();
                String userComparator = internalKeyComparator.name();
                checkArgument(editComparator == null || editComparator.equals(userComparator),
                        "Expected user comparator %s to match existing database comparator %s", userComparator, editComparator);

                // apply edit
                builder.apply(edit);

                // save edit values for verification below
                logNumber = coalesce(edit.getLogNumber(), logNumber);
                prevLogNumber = coalesce(edit.getPreviousLogNumber(), prevLogNumber);
                nextFileNumber = coalesce(edit.getNextFileNumber(), nextFileNumber);
                lastSequence = coalesce(edit.getLastSequenceNumber(), lastSequence);
            }

            List<String> problems = new ArrayList<>();
            if (nextFileNumber == null) {
                problems.add("Descriptor does not contain a meta-nextfile entry");
            }
            if (logNumber == null) {
                problems.add("Descriptor does not contain a meta-lognumber entry");
            }
            if (lastSequence == null) {
                problems.add("Descriptor does not contain a last-sequence-number entry");
            }
            if (!problems.isEmpty()) {
                throw new RuntimeException("Corruption: \n\t" + Joiner.on("\n\t").join(problems));
            }

            if (prevLogNumber == null) {
                prevLogNumber = 0L;
            }

            Version newVersion = new Version(this);
            builder.saveTo(newVersion);

            // Install recovered version
            finalizeVersion(newVersion);

            appendVersion(newVersion);
            manifestFileNumber = nextFileNumber;
            this.nextFileNumber.set(nextFileNumber + 1);
            this.lastSequence = lastSequence;
            this.logNumber = logNumber;
            this.prevLogNumber = prevLogNumber;
        }
    }

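recover() loads the on-disk version information into memory:
1. It reads CURRENT, whose content must end with a newline, to get the name of the current manifest.
2. It reads that manifest record by record with a LogReader and decodes each record as a VersionEdit. If a record carries a comparator name, it must match the comparator in use. Each edit is applied to the Builder.
3. Along the way it tracks the log number, previous log number, next file number and last sequence. Later records override earlier ones (coalesce).
4. If nextFileNumber, logNumber or lastSequence never appears, the manifest is treated as corrupt. A missing prevLogNumber defaults to 0.
5. The Builder saves the result into a new Version, finalizeVersion computes its compaction score, and appendVersion installs it.
6. Finally, manifestFileNumber is set to the recovered nextFileNumber, and nextFileNumber to that value + 1.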
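Here is a minimal sketch of how recover() folds the manifest's scalar fields. It assumes each record may or may not carry a log number, next file number and last sequence, with null meaning the field is absent. Later records override earlier ones through coalesce, and missing required fields are reported as corruption. `RecoverSketch` is hypothetical; it omits the file lists, prevLogNumber and the comparator check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scalar fields a manifest record may carry (null = not present).
final class RecoverSketch
{
    static final class Record
    {
        final Long logNumber;
        final Long nextFileNumber;
        final Long lastSequence;

        Record(Long logNumber, Long nextFileNumber, Long lastSequence)
        {
            this.logNumber = logNumber;
            this.nextFileNumber = nextFileNumber;
            this.lastSequence = lastSequence;
        }
    }

    // Same idea as VersionSet.coalesce: the first non-null argument wins
    static Long coalesce(Long newer, Long older)
    {
        return newer != null ? newer : older;
    }

    // Replays records in manifest order; returns {logNumber, nextFileNumber, lastSequence}
    static long[] replay(List<Record> records)
    {
        Long logNumber = null;
        Long nextFileNumber = null;
        Long lastSequence = null;
        for (Record r : records) {
            logNumber = coalesce(r.logNumber, logNumber);
            nextFileNumber = coalesce(r.nextFileNumber, nextFileNumber);
            lastSequence = coalesce(r.lastSequence, lastSequence);
        }
        List<String> problems = new ArrayList<>();
        if (nextFileNumber == null) {
            problems.add("missing meta-nextfile entry");
        }
        if (logNumber == null) {
            problems.add("missing meta-lognumber entry");
        }
        if (lastSequence == null) {
            problems.add("missing last-sequence-number entry");
        }
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Corruption: " + String.join(", ", problems));
        }
        return new long[] {logNumber, nextFileNumber, lastSequence};
    }
}
```

A snapshot record without these fields, followed by `(3, 4, 10)` and then `(5, 6, null)`, recovers log number 5, next file number 6 and last sequence 10. An empty manifest is rejected.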

private void finalizeVersion(Version version)
    {
        // Precomputed best level for next compaction
        int bestLevel = -1;
        double bestScore = -1;

        for (int level = 0; level < version.numberOfLevels() - 1; level++) {
            double score;
            if (level == 0) {
                // We treat level-0 specially by bounding the number of files
                // instead of number of bytes for two reasons:
                //
                // (1) With larger write-buffer sizes, it is nice not to do too
                // many level-0 compactions.
                //
                // (2) The files in level-0 are merged on every read and
                // therefore we wish to avoid too many files when the individual
                // file size is small (perhaps because of a small write-buffer
                // setting, or very high compression ratios, or lots of
                // overwrites/deletions).
                score = 1.0 * version.numberOfFilesInLevel(level) / L0_COMPACTION_TRIGGER;
            }
            else {
                // Compute the ratio of current size to size limit.
                long levelBytes = 0;
                for (FileMetaData fileMetaData : version.getFiles(level)) {
                    levelBytes += fileMetaData.getFileSize();
                }
                score = 1.0 * levelBytes / maxBytesForLevel(level);
            }

            if (score > bestScore) {
                bestLevel = level;
                bestScore = score;
            }
        }

        version.setCompactionLevel(bestLevel);
        version.setCompactionScore(bestScore);
    }

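finalizeVersion() precomputes the level and score for the next compaction:
- Level 0 is scored by its number of files, relative to L0_COMPACTION_TRIGGER = 4.
- Every other level is scored by its total bytes, relative to maxBytesForLevel(level).

The level with the highest score is recorded. A score of at least 1 means that level needs compaction.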
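Here is a stand-alone sketch of the scoring in finalizeVersion() and maxBytesForLevel(). It assumes the caller supplies per-level file counts and byte totals instead of a Version. `ScoreSketch` and `pickBest` are hypothetical names; the constants and formulas are taken from the code above.

```java
// Hypothetical stand-alone version of the scoring in finalizeVersion/maxBytesForLevel.
final class ScoreSketch
{
    static final int L0_COMPACTION_TRIGGER = 4;

    static double maxBytesForLevel(int level)
    {
        double result = 10 * 1048576.0; // 10 MB for level 1 (level 0 uses file counts instead)
        while (level > 1) {
            result *= 10;
            level--;
        }
        return result;
    }

    // levelFileCounts[i] and levelBytes[i] describe level i; returns {bestLevel, bestScore}
    static double[] pickBest(int[] levelFileCounts, long[] levelBytes)
    {
        int bestLevel = -1;
        double bestScore = -1;
        // The last level is not scored: there is no level below it to compact into
        for (int level = 0; level < levelFileCounts.length - 1; level++) {
            double score = level == 0
                    ? 1.0 * levelFileCounts[0] / L0_COMPACTION_TRIGGER
                    : 1.0 * levelBytes[level] / maxBytesForLevel(level);
            if (score > bestScore) {
                bestLevel = level;
                bestScore = score;
            }
        }
        return new double[] {bestLevel, bestScore};
    }
}
```

Consider 2 level-0 files, 15 MB in level 1 and 50 MB in level 2. The scores are 0.5, 1.5 and 0.5, so level 1 is picked with score 1.5, and needsCompaction() would return true.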

private static <V> V coalesce(V... values)
    {
        for (V value : values) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

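coalesce() returns its first non-null argument, or null if all arguments are null. recover() uses it so that a field set by a later manifest record overrides the earlier value.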

public List<FileMetaData> getLiveFiles()
    {
        ImmutableList.Builder<FileMetaData> builder = ImmutableList.builder();
        for (Version activeVersion : activeVersions.keySet()) {
            builder.addAll(activeVersion.getFiles().values());
        }
        return builder.build();
    }

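getLiveFiles() collects the FileMetaData of every file referenced by any active version, not just the current one. A file referenced by an older version that is still in use must not be deleted. The result is a list and can contain the same file more than once.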
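Here is a minimal sketch of why live files are collected across all active versions. It assumes each version is reduced to the file numbers it references. `LiveFilesSketch` is hypothetical. LevelDB's C++ implementation uses the equivalent AddLiveFiles when deciding which table files are obsolete and can be deleted.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical: each active version is represented by the file numbers it references.
final class LiveFilesSketch
{
    static Set<Long> liveFiles(Collection<? extends Collection<Long>> activeVersions)
    {
        Set<Long> live = new TreeSet<>();
        for (Collection<Long> version : activeVersions) {
            live.addAll(version); // a file stays live while any active version references it
        }
        return live;
    }

    // Files on disk that no active version references are candidates for deletion
    static Set<Long> obsolete(Collection<Long> onDisk, Collection<? extends Collection<Long>> activeVersions)
    {
        Set<Long> result = new TreeSet<>(onDisk);
        result.removeAll(liveFiles(activeVersions));
        return result;
    }
}
```

Suppose the current version references files 3 and 9, and an older version still pinned (for example by an open iterator) references 3 and 7. Then file 7 is still live, and of the files 3, 5, 7 and 9 on disk only 5 is obsolete.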

private static double maxBytesForLevel(int level)
    {
        // Note: the result for level zero is not really used since we set
        // the level-0 compaction threshold based on number of files.
        double result = 10 * 1048576.0;  // Result for both level-0 and level-1
        while (level > 1) {
            result *= 10;
            level--;
        }
        return result;
    }

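maxBytesForLevel() gives the size limit of each level: 10 MB for level 1, growing by a factor of 10 per level. The level-0 value is not used, because level 0 is scored by file count.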
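The loop in maxBytesForLevel() multiplies the 10 MB base by 10 once for every level above 1. The limit for level L is therefore (with 1 MB = 1048576 bytes):

```latex
\text{maxBytes}(L) = 10 \cdot 10^{L-1}\ \text{MB} = 10^{L}\ \text{MB}, \qquad L \ge 1
```

So level 1 holds 10 MB, level 2 holds 100 MB and level 3 holds 1000 MB.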

 public static long maxFileSizeForLevel(int level)
    {
        return TARGET_FILE_SIZE;  // We could vary per level to reduce number of files?
    }

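maxFileSizeForLevel() returns the target size of a single output file. It is currently TARGET_FILE_SIZE (2 MB) for every level.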

 public boolean needsCompaction()
    {
        return current.getCompactionScore() >= 1 || current.getFileToCompact() != null;
    }

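needsCompaction() reports whether a compaction is needed. That is the case if either:
- the best level's score is at least 1 (size-triggered), or
- some file has used up its seek budget (seek-triggered, getFileToCompact() != null).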

public Compaction compactRange(int level, InternalKey begin, InternalKey end)
    {
        List<FileMetaData> levelInputs = getOverlappingInputs(level, begin, end);
        if (levelInputs.isEmpty()) {
            return null;
        }

        return setupOtherInputs(level, levelInputs);
    }

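compactRange() builds a compaction for a manual key range [begin, end] at the given level. It picks the overlapping files of that level and lets setupOtherInputs add the level+1 inputs. It returns null if no file in the level overlaps the range.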

public Compaction pickCompaction()
    {
        // We prefer compactions triggered by too much data in a level over
        // the compactions triggered by seeks.
        boolean sizeCompaction = (current.getCompactionScore() >= 1);
        boolean seekCompaction = (current.getFileToCompact() != null);

        int level;
        List<FileMetaData> levelInputs;
        if (sizeCompaction) {
            level = current.getCompactionLevel();
            checkState(level >= 0);
            checkState(level + 1 < NUM_LEVELS);

            // Pick the first file that comes after compact_pointer_[level]
            levelInputs = new ArrayList<>();
            for (FileMetaData fileMetaData : current.getFiles(level)) {
                if (!compactPointers.containsKey(level) ||
                        internalKeyComparator.compare(fileMetaData.getLargest(), compactPointers.get(level)) > 0) {
                    levelInputs.add(fileMetaData);
                    break;
                }
            }
            if (levelInputs.isEmpty()) {
                // Wrap-around to the beginning of the key space
                levelInputs.add(current.getFiles(level).get(0));
            }
        }
        else if (seekCompaction) {
            level = current.getFileToCompactLevel();
            levelInputs = ImmutableList.of(current.getFileToCompact());
        }
        else {
            return null;
        }

        // Files in level 0 may overlap each other, so pick up all overlapping ones
        if (level == 0) {
            Entry<InternalKey, InternalKey> range = getRange(levelInputs);
            // Note that the next call will discard the file we placed in
            // c->inputs_[0] earlier and replace it with an overlapping set
            // which will include the picked file.
            levelInputs = getOverlappingInputs(0, range.getKey(), range.getValue());

            checkState(!levelInputs.isEmpty());
        }

        Compaction compaction = setupOtherInputs(level, levelInputs);
        return compaction;
    }

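pickCompaction() prefers size-triggered compactions over seek-triggered ones. The chosen file's metadata goes into levelInputs:
- For a size compaction, compactPointers records the largest key of the previous compaction at that level. It is initially absent, so the first file of the level is chosen. Otherwise the first file whose largest key comes after that pointer is picked, wrapping around to the first file at the end of the key space.
- For a seek compaction, the file whose seek budget ran out is picked.

Level-0 files may overlap each other, so at level 0 every file overlapping the chosen file's key range is added to levelInputs as well.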
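Here is a minimal sketch of the compact-pointer round robin used for size-triggered compactions. It assumes a level > 0 whose files are sorted and non-overlapping, each reduced to its largest key as a String; the real code compares InternalKeys with internalKeyComparator. `CompactPointerSketch` is hypothetical and assumes the level is non-empty.

```java
import java.util.List;

// Hypothetical model: the sorted, non-overlapping files of one level (level > 0),
// each described only by its largest key.
final class CompactPointerSketch
{
    // Index of the file to compact: the first file whose largest key is after
    // the level's compact pointer; wraps around to file 0 (assumes a non-empty level).
    static int pickFile(List<String> largestKeys, String compactPointer)
    {
        for (int i = 0; i < largestKeys.size(); i++) {
            if (compactPointer == null || largestKeys.get(i).compareTo(compactPointer) > 0) {
                return i;
            }
        }
        return 0; // wrap around to the beginning of the key space
    }
}
```

setupOtherInputs() moves the pointer to the largest key of each compaction. Successive compactions of a level whose files end at c, f and k therefore pick files 0, 1 and 2, and then wrap around to 0. This spreads compaction work evenly over the key space.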

private Compaction setupOtherInputs(int level, List<FileMetaData> levelInputs)
    {
        Entry<InternalKey, InternalKey> range = getRange(levelInputs);
        InternalKey smallest = range.getKey();
        InternalKey largest = range.getValue();

        List<FileMetaData> levelUpInputs = getOverlappingInputs(level + 1, smallest, largest);

        // Get entire range covered by compaction
        range = getRange(levelInputs, levelUpInputs);
        InternalKey allStart = range.getKey();
        InternalKey allLimit = range.getValue();

        // See if we can grow the number of inputs in "level" without
        // changing the number of "level+1" files we pick up.
        if (!levelUpInputs.isEmpty()) {
            List<FileMetaData> expanded0 = getOverlappingInputs(level, allStart, allLimit);

            if (expanded0.size() > levelInputs.size()) {
                range = getRange(expanded0);
                InternalKey newStart = range.getKey();
                InternalKey newLimit = range.getValue();

                List<FileMetaData> expanded1 = getOverlappingInputs(level + 1, newStart, newLimit);
                if (expanded1.size() == levelUpInputs.size()) {
//              Log(options_->info_log,
//                  "Expanding@%d %d+%d to %d+%d\n",
//                  level,
//                  int(c->inputs_[0].size()),
//                  int(c->inputs_[1].size()),
//                  int(expanded0.size()),
//                  int(expanded1.size()));
                    smallest = newStart;
                    largest = newLimit;
                    levelInputs = expanded0;
                    levelUpInputs = expanded1;

                    range = getRange(levelInputs, levelUpInputs);
                    allStart = range.getKey();
                    allLimit = range.getValue();
                }
            }
        }

        // Compute the set of grandparent files that overlap this compaction
        // (parent == level+1; grandparent == level+2)
        List<FileMetaData> grandparents = ImmutableList.of();
        if (level + 2 < NUM_LEVELS) {
            grandparents = getOverlappingInputs(level + 2, allStart, allLimit);
        }

//        if (false) {
//            Log(options_ - > info_log, "Compacting %d '%s' .. '%s'",
//                    level,
//                    EscapeString(smallest.Encode()).c_str(),
//                    EscapeString(largest.Encode()).c_str());
//        }

        Compaction compaction = new Compaction(current, level, levelInputs, levelUpInputs, grandparents);

        // Update the place where we will do the next compaction for this level.
        // We update this immediately instead of waiting for the VersionEdit
        // to be applied so that if the compaction fails, we will try a different
        // key range next time.
        compactPointers.put(level, largest);
        compaction.getEdit().setCompactPointer(level, largest);

        return compaction;
    }

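Once the overlapping files of the starting level have been added, setupOtherInputs() works out which files of level+1 take part in the compaction:
1. Every level+1 file that overlaps the key range of the level inputs must participate.
2. With those files known, it looks back at the starting level: if more files of that level can be included without increasing the number of level+1 files, the level inputs are expanded.
3. It collects the overlapping grandparent (level+2) files.
4. It immediately updates the level's compact pointer to the largest key of the level inputs, so that a failed compaction tries a different key range next time.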
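Here is a minimal sketch of the expansion rule in setupOtherInputs(). It assumes files are reduced to integer key ranges {smallest, largest} and ignores grandparents and compaction pointers. `ExpandSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a file is {smallestKey, largestKey} with integer user keys.
final class ExpandSketch
{
    static List<int[]> overlapping(List<int[]> level, int begin, int end)
    {
        List<int[]> out = new ArrayList<>();
        for (int[] f : level) {
            if (!(f[1] < begin || f[0] > end)) { // skip files entirely before or after [begin, end]
                out.add(f);
            }
        }
        return out;
    }

    static int[] range(List<int[]> a, List<int[]> b)
    {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        List<int[]> all = new ArrayList<>(a);
        all.addAll(b);
        for (int[] f : all) {
            lo = Math.min(lo, f[0]);
            hi = Math.max(hi, f[1]);
        }
        return new int[] {lo, hi};
    }

    // Returns {inputs at level, inputs at level+1} after the optional expansion step
    static List<List<int[]>> setupInputs(List<int[]> level, List<int[]> next, List<int[]> picked)
    {
        int[] r = range(picked, new ArrayList<>());
        List<int[]> up = overlapping(next, r[0], r[1]);
        if (!up.isEmpty()) {
            int[] all = range(picked, up);
            List<int[]> expanded0 = overlapping(level, all[0], all[1]);
            if (expanded0.size() > picked.size()) {
                int[] nr = range(expanded0, new ArrayList<>());
                List<int[]> expanded1 = overlapping(next, nr[0], nr[1]);
                if (expanded1.size() == up.size()) { // grow "level" only if level+1 does not grow
                    picked = expanded0;
                    up = expanded1;
                }
            }
        }
        List<List<int[]>> result = new ArrayList<>();
        result.add(picked);
        result.add(up);
        return result;
    }
}
```

Take level files [10,20], [25,30] and [60,70], level+1 files [5,35] and [50,80], and [10,20] picked. The level+1 input is [5,35]. Its range pulls in [25,30] without adding another level+1 file, so the expansion is accepted. If the extra level file would drag in another level+1 file, the original inputs are kept.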

List<FileMetaData> getOverlappingInputs(int level, InternalKey begin, InternalKey end)
    {
        ImmutableList.Builder<FileMetaData> files = ImmutableList.builder();
        Slice userBegin = begin.getUserKey();
        Slice userEnd = end.getUserKey();
        UserComparator userComparator = internalKeyComparator.getUserComparator();
        for (FileMetaData fileMetaData : current.getFiles(level)) {
            if (userComparator.compare(fileMetaData.getLargest().getUserKey(), userBegin) < 0 ||
                    userComparator.compare(fileMetaData.getSmallest().getUserKey(), userEnd) > 0) {
                // Either completely before or after range; skip it
            }
            else {
                files.add(fileMetaData);
            }
        }
        return files.build();
    }

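getOverlappingInputs() returns the files in a level whose key range overlaps [begin, end]. The comparison uses only the user keys, and both ends are inclusive: a file is skipped only if it lies entirely before or entirely after the range.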

public long getMaxNextLevelOverlappingBytes()
    {
        long result = 0;
        // Stop one level early: the last level has no level + 1 to overlap with
        for (int level = 1; level < NUM_LEVELS - 1; level++) {
            for (FileMetaData fileMetaData : current.getFiles(level)) {
                List<FileMetaData> overlaps = getOverlappingInputs(level + 1, fileMetaData.getSmallest(), fileMetaData.getLargest());
                long totalSize = 0;
                for (FileMetaData overlap : overlaps) {
                    totalSize += overlap.getFileSize();
                }
                result = Math.max(result, totalSize);
            }
        }
        return result;
    }

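getMaxNextLevelOverlappingBytes() builds on getOverlappingInputs(). For every file in levels >= 1, it finds the overlapping files in level+1 and sums their sizes (totalSize). It returns the maximum of these sums over all files.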
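Here is a minimal sketch of getMaxNextLevelOverlappingBytes() over a simplified model. It assumes each file is {smallestKey, largestKey, sizeInBytes} with integer keys. `OverlapBytesSketch` is hypothetical. The overlap test is the same closed-interval check as in getOverlappingInputs(), and the loop stops one level before the last so that level + 1 always exists.

```java
import java.util.List;

// Hypothetical model: a file is {smallestKey, largestKey, sizeInBytes}.
final class OverlapBytesSketch
{
    // Closed-interval overlap test, mirroring the skip condition in getOverlappingInputs
    static boolean overlaps(long[] file, long begin, long end)
    {
        return !(file[1] < begin || file[0] > end);
    }

    // levels.get(i) holds the files of level i; level 0 is skipped as in the original loop
    static long maxNextLevelOverlappingBytes(List<List<long[]>> levels)
    {
        long result = 0;
        for (int level = 1; level + 1 < levels.size(); level++) {
            for (long[] f : levels.get(level)) {
                long total = 0;
                for (long[] g : levels.get(level + 1)) {
                    if (overlaps(g, f[0], f[1])) {
                        total += g[2];
                    }
                }
                result = Math.max(result, total);
            }
        }
        return result;
    }
}
```

With level-1 files [0,10] and [20,30] over level-2 files [0,5] (40 bytes), [8,12] (60 bytes) and [25,40] (70 bytes), the first level-1 file overlaps 100 bytes and the second overlaps 70. The result is 100.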

/**
     * A helper class so we can efficiently apply a whole sequence
     * of edits to a particular state without creating intermediate
     * Versions that contain full copies of the intermediate state.
     */
    private static class Builder
    {
        private final VersionSet versionSet;
        private final Version baseVersion;
        private final List<LevelState> levels;

        private Builder(VersionSet versionSet, Version baseVersion)
        {
            this.versionSet = versionSet;
            this.baseVersion = baseVersion;

            levels = new ArrayList<>(baseVersion.numberOfLevels());
            for (int i = 0; i < baseVersion.numberOfLevels(); i++) {
                levels.add(new LevelState(versionSet.internalKeyComparator));
            }
        }

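Builder is a helper class constructed from the VersionSet and a base version. It accumulates the effect of one or more edits as an intermediate state without creating intermediate Versions. For each level, its levels field records the files added and deleted.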

/**
         * Apply the specified edit to the current state.
         */
        public void apply(VersionEdit edit)
        {
            // Update compaction pointers
            for (Entry<Integer, InternalKey> entry : edit.getCompactPointers().entrySet()) {
                Integer level = entry.getKey();
                InternalKey internalKey = entry.getValue();
                versionSet.compactPointers.put(level, internalKey);
            }

            // Delete files
            for (Entry<Integer, Long> entry : edit.getDeletedFiles().entries()) {
                Integer level = entry.getKey();
                Long fileNumber = entry.getValue();
                levels.get(level).deletedFiles.add(fileNumber);
                // todo missing update to addedFiles?
            }

            // Add new files
            for (Entry<Integer, FileMetaData> entry : edit.getNewFiles().entries()) {
                Integer level = entry.getKey();
                FileMetaData fileMetaData = entry.getValue();

                // We arrange to automatically compact this file after
                // a certain number of seeks.  Let's assume:
                //   (1) One seek costs 10ms
                //   (2) Writing or reading 1MB costs 10ms (100MB/s)
                //   (3) A compaction of 1MB does 25MB of IO:
                //         1MB read from this level
                //         10-12MB read from next level (boundaries may be misaligned)
                //         10-12MB written to next level
                // This implies that 25 seeks cost the same as the compaction
                // of 1MB of data.  I.e., one seek costs approximately the
                // same as the compaction of 40KB of data.  We are a little
                // conservative and allow approximately one seek for every 16KB
                // of data before triggering a compaction.
                int allowedSeeks = (int) (fileMetaData.getFileSize() / 16384);
                if (allowedSeeks < 100) {
                    allowedSeeks = 100;
                }
                fileMetaData.setAllowedSeeks(allowedSeeks);

                levels.get(level).deletedFiles.remove(fileMetaData.getNumber());
                levels.get(level).addedFiles.add(fileMetaData);
            }
        }

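apply() processes an edit in three steps:
1. It copies the compaction pointers recorded in the VersionEdit (level → InternalKey) into the VersionSet.
2. For each deleted file, it adds the file number to that level's deletedFiles.
3. For each added file (level and FileMetaData), it first sets the file's allowedSeeks (at least 100). It then records the file in addedFiles and removes it from deletedFiles, in case an earlier edit deleted it.

The added and deleted files thus accumulate in the builder's levels field. The allowedSeeks budget rests on these assumptions:

  1. One seek costs 10 ms.
  2. Reading or writing 1 MB costs 10 ms, i.e. the I/O rate is 100 MB/s.
  3. A compaction of 1 MB does 25 MB of I/O:
    1. 1 MB read from this level
    2. 10-12 MB read from the next level (boundaries may be misaligned)
    3. 10-12 MB written to the next level

So 25 seeks cost as much as compacting 1 MB of data, which means one seek amortizes to roughly the cost of compacting 40 KB. To be conservative, the code allows one seek for every 16 KB of data instead. This permits more seeks before a compaction is triggered.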
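Here is the seek budget from the comment as a stand-alone function. `SeekBudgetSketch` is a hypothetical name; the formula is the one in Builder.apply. The arithmetic behind the 40 KB figure:
- Compacting 1 MB costs 25 MB of I/O, which takes 250 ms at 100 MB/s.
- 250 ms is the same as 25 seeks of 10 ms each.
- One seek is therefore worth about 1 MB / 25 = 40 KB of compaction.

```java
// Stand-alone version of the allowedSeeks rule in Builder.apply (hypothetical class name)
final class SeekBudgetSketch
{
    static int allowedSeeks(long fileSize)
    {
        // Roughly one seek per 16 KB (16384 bytes) of file data, but never fewer than 100 seeks
        int allowed = (int) (fileSize / 16384);
        return Math.max(allowed, 100);
    }
}
```

A 2 MB file (TARGET_FILE_SIZE) gets 2097152 / 16384 = 128 seeks. Any file smaller than 100 × 16384 = 1638400 bytes gets the floor of 100.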

/**
         * Saves the current state in specified version.
         */
        public void saveTo(Version version)
                throws IOException
        {
            FileMetaDataBySmallestKey cmp = new FileMetaDataBySmallestKey(versionSet.internalKeyComparator);
            for (int level = 0; level < baseVersion.numberOfLevels(); level++) {
                // Merge the set of added files with the set of pre-existing files.
                // Drop any deleted files.  Store the result in *v.

                Collection<FileMetaData> baseFiles = baseVersion.getFiles().asMap().get(level);
                if (baseFiles == null) {
                    baseFiles = ImmutableList.of();
                }
                SortedSet<FileMetaData> addedFiles = levels.get(level).addedFiles;
                if (addedFiles == null) {
                    addedFiles = ImmutableSortedSet.of();
                }

                // files must be added in sorted order so assertion check in maybeAddFile works
                ArrayList<FileMetaData> sortedFiles = new ArrayList<>(baseFiles.size() + addedFiles.size());
                sortedFiles.addAll(baseFiles);
                sortedFiles.addAll(addedFiles);
                Collections.sort(sortedFiles, cmp);

                for (FileMetaData fileMetaData : sortedFiles) {
                    maybeAddFile(version, level, fileMetaData);
                }

                //#ifndef NDEBUG  todo
                // Make sure there is no overlap in levels > 0
                version.assertNoOverlappingFiles();
                //#endif
            }
        }

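cmp (FileMetaDataBySmallestKey) orders files by their smallest internal key, breaking ties by file number. The for loop processes one level at a time. For each level it:

  1. merges the added files into the level's files;
  2. drops the deleted files.

It collects the base version's files (baseFiles) and the files added by the edits (addedFiles), then sorts them together with cmp. The sort is required because files must be added in sorted order for the overlap check in maybeAddFile to work. Each file is passed to maybeAddFile, which skips deleted files and checks for overlap. Finally, version.assertNoOverlappingFiles() re-checks levels > 0.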

private void maybeAddFile(Version version, int level, FileMetaData fileMetaData)
                throws IOException
        {
            if (levels.get(level).deletedFiles.contains(fileMetaData.getNumber())) {
                // File is deleted: do nothing
            }
            else {
                List<FileMetaData> files = version.getFiles(level);
                if (level > 0 && !files.isEmpty()) {
                    // Must not overlap
                    boolean filesOverlap = versionSet.internalKeyComparator.compare(files.get(files.size() - 1).getLargest(), fileMetaData.getSmallest()) >= 0;
                    if (filesOverlap) {
                        // A memory compaction, while this compaction was running, resulted in a database state that is
                        // incompatible with the compaction.  This is rare and expensive to detect while the compaction is
                        // running, so we catch it here and simply discard the work.
                        throw new IOException(String.format("Compaction is obsolete: Overlapping files %s and %s in level %s",
                                files.get(files.size() - 1).getNumber(),
                                fileMetaData.getNumber(), level));
                    }
                }
                version.addFile(level, fileMetaData);
            }
        }

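maybeAddFile() skips files that the edits deleted. For levels > 0, it checks that the new file does not overlap the last file already added to the level, by comparing the previous file's largest key with the new file's smallest key. An overlap means that a concurrent memtable compaction made this compaction obsolete, so an IOException is thrown and the work is discarded.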
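Here is a minimal sketch of what saveTo() and maybeAddFile() do for one level. It assumes files are modeled as {number, smallestKey, largestKey} with integer keys, and that the deleted set holds file numbers. `SaveToSketch` is hypothetical, and it throws IllegalStateException where the original throws IOException.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical model: a file is {number, smallestKey, largestKey} with integer keys.
final class SaveToSketch
{
    // Merge base and added files for one level, drop deleted ones, and
    // reject overlapping files in levels > 0 (as maybeAddFile does).
    static List<long[]> buildLevel(int level, List<long[]> base, List<long[]> added, Set<Long> deleted)
    {
        List<long[]> sorted = new ArrayList<>(base);
        sorted.addAll(added);
        // Order by smallest key, then by file number (like FileMetaDataBySmallestKey)
        sorted.sort(Comparator.<long[]>comparingLong(f -> f[1]).thenComparingLong(f -> f[0]));

        List<long[]> result = new ArrayList<>();
        for (long[] f : sorted) {
            if (deleted.contains(f[0])) {
                continue; // file was deleted by the edits
            }
            if (level > 0 && !result.isEmpty() && result.get(result.size() - 1)[2] >= f[1]) {
                throw new IllegalStateException("Overlapping files " + result.get(result.size() - 1)[0]
                        + " and " + f[0] + " in level " + level);
            }
            result.add(f);
        }
        return result;
    }
}
```

Adding file 9 with keys [21,29] between files 3 [10,20] and 4 [30,40] keeps level 1 sorted and disjoint. Adding a file with keys [15,25] instead overlaps file 3 and is rejected.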

private static class FileMetaDataBySmallestKey
                implements Comparator<FileMetaData>
        {
            private final InternalKeyComparator internalKeyComparator;

            private FileMetaDataBySmallestKey(InternalKeyComparator internalKeyComparator)
            {
                this.internalKeyComparator = internalKeyComparator;
            }

            @Override
            public int compare(FileMetaData f1, FileMetaData f2)
            {
                return ComparisonChain
                        .start()
                        .compare(f1.getSmallest(), f2.getSmallest(), internalKeyComparator)
                        .compare(f1.getNumber(), f2.getNumber())
                        .result();
            }
        }
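
FileMetaDataBySmallestKey implements Comparator<FileMetaData>. It orders files by their smallest internal key (using internalKeyComparator) and breaks ties by file number, chaining the two comparisons with Guava's ComparisonChain (start() ... result()).
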
private static class LevelState
        {
            private final SortedSet<FileMetaData> addedFiles;
            private final Set<Long> deletedFiles = new HashSet<Long>();

            public LevelState(InternalKeyComparator internalKeyComparator)
            {
                addedFiles = new TreeSet<FileMetaData>(new FileMetaDataBySmallestKey(internalKeyComparator));
            }

            @Override
            public String toString()
            {
                final StringBuilder sb = new StringBuilder();
                sb.append("LevelState");
                sb.append("{addedFiles=").append(addedFiles);
                sb.append(", deletedFiles=").append(deletedFiles);
                sb.append('}');
                return sb.toString();
            }
        }

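LevelState holds two things for one level: the added files, as a TreeSet of FileMetaData ordered by FileMetaDataBySmallestKey, and the numbers of the deleted files.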

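References:

  1. "Implementation of Compaction in LevelDB" (LevelDB之Compaction实现), Calvin's Marbles (calvinneo.com)
  2. "Dissecting LevelDB: Version Control" (庖丁解LevelDB之版本控制), CatKang's blog
  3. "VersionSet: LevelDB Source Code Analysis" (VersionSet-levelDB源码解析), 道希's blog, CSDN
  4. "leveldb Notes 17: Major Compaction, Picking Files" (leveldb笔记之17:major compaction之筛选文件), Ying's Blog (izualzhy.cn)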

posted @ 2022-07-22 17:29  只能说运气有点好