The FSImage and edits log files hold the namenode's metadata: they persist the structure of the HDFS namespace. The FSImage is an on-disk file containing the cluster's directory tree and the mapping from each file to its blocks. (The block-to-datanode mapping is not persisted; the namenode rebuilds it at runtime from the block reports sent by datanodes.) Presumably for performance reasons, the FSImage is not updated in real time to reflect the current state of HDFS; instead, every file and directory operation is first recorded as a log entry in the edits file. To minimize restart time, a secondary namenode (despite the name, not a hot standby) periodically merges the edits log into the FSImage, coordinating with the namenode over IPC and transferring the image and edits files over HTTP. On the next restart, the namenode can then rebuild the in-memory namespace from the FSImage plus a short edits log in relatively little time. This feels somewhat similar to checkpointing in Oracle.
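For orientation, on a Hadoop 1.x namenode these files live under ${dfs.name.dir}/current. A typical layout looks roughly like the following (exact contents vary; an edits.new, for instance, appears while a checkpoint is in progress):

${dfs.name.dir}/current/
    VERSION    # layoutVersion, namespaceID, cTime, storageType
    fsimage    # the checkpointed namespace image examined below
    edits      # edit log: operations applied since the last checkpoint
    fstime     # timestamp of the last checkpoint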
The FSImage, then, stores the file and directory names of the cluster together with each file's list of blocks. What does its on-disk structure actually look like? We can get a good picture by reading org.apache.hadoop.hdfs.server.namenode.FSImage.
Hadoop 1.2.1:
boolean loadFSImage(File curFile) throws IOException {
  FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
  FSDirectory fsDir = fsNamesys.dir;

  //
  // Load in bits
  //
  boolean needToSave = true;
  DataInputStream in = new DataInputStream(new BufferedInputStream(
                           new FileInputStream(curFile)));
  try {
    // read image version: first appeared in version -1
    int imgVersion = in.readInt();
    // read namespaceID: first appeared in version -2
    this.namespaceID = in.readInt();

    // read number of files/directories; the field width depends on the layout version
    long numFiles;
    if (imgVersion <= -16) {
      numFiles = in.readLong();
    } else {
      numFiles = in.readInt();
    }

    this.layoutVersion = imgVersion;
    // read in the last generation stamp (a counter for block versions)
    if (imgVersion <= -12) {
      long genstamp = in.readLong();
      fsNamesys.setGenerationStamp(genstamp);
    }

    needToSave = (imgVersion != FSConstants.LAYOUT_VERSION);

    // read file info
    short replication = FSNamesystem.getFSNamesystem().getDefaultReplication();

    LOG.info("Number of files = " + numFiles);

    String path;
    String parentPath = "";
    INodeDirectory parentINode = fsDir.rootDir;
    // rebuild the directory tree, one inode record at a time
    for (long i = 0; i < numFiles; i++) {
      long modificationTime = 0;
      long atime = 0;
      long blockSize = 0;
      path = readString(in);            // path of the file or directory
      replication = in.readShort();     // replication factor: default 3, configurable (0 for directories)
      replication = FSEditLog.adjustReplication(replication);
      modificationTime = in.readLong(); // mtime
      if (imgVersion <= -17) {
        atime = in.readLong();          // atime
      }
      if (imgVersion <= -8) {
        blockSize = in.readLong();      // block size (0 for directories)
      }
      int numBlocks = in.readInt();     // number of blocks in the file
                                        // (directories: -1 in newer layouts, 0 in older ones)
      Block blocks[] = null;

      // for older versions, a blocklist of size 0
      // indicates a directory.
      if ((-9 <= imgVersion && numBlocks > 0) ||
          (imgVersion < -9 && numBlocks >= 0)) {
        blocks = new Block[numBlocks];
        for (int j = 0; j < numBlocks; j++) {
          blocks[j] = new Block();
          if (-14 < imgVersion) {
            blocks[j].set(in.readLong(), in.readLong(),
                          Block.GRANDFATHER_GENERATION_STAMP);
          } else {
            blocks[j].readFields(in);
          }
        }
      }
      // Older versions of HDFS do not store the block size in the inode.
      // If the file has more than one block, use the size of the
      // first block as the blocksize. Otherwise use the default block size.
      //
      if (-8 <= imgVersion && blockSize == 0) {
        if (numBlocks > 1) {
          blockSize = blocks[0].getNumBytes();
        } else {
          long first = ((numBlocks == 1) ? blocks[0].getNumBytes() : 0);
          blockSize = Math.max(fsNamesys.getDefaultBlockSize(), first);
        }
      }

      // get quota only when the node is a directory
      long nsQuota = -1L;
      if (imgVersion <= -16 && blocks == null) {
        nsQuota = in.readLong();        // nsQuota
      }
      long dsQuota = -1L;
      if (imgVersion <= -18 && blocks == null) {
        dsQuota = in.readLong();        // dsQuota
      }

      PermissionStatus permissions = fsNamesys.getUpgradePermission();
      if (imgVersion <= -11) {
        permissions = PermissionStatus.read(in);
      }
      if (path.length() == 0) { // it is the root
        // update the root's attributes
        if (nsQuota != -1 || dsQuota != -1) {
          fsDir.rootDir.setQuota(nsQuota, dsQuota);
        }
        fsDir.rootDir.setModificationTime(modificationTime);
        fsDir.rootDir.setPermissionStatus(permissions);
        continue;
      }
      // check if the new inode belongs to the same parent
      if (!isParent(path, parentPath)) {
        parentINode = null;
        parentPath = getParent(path);
      }
      // add new inode
      parentINode = fsDir.addToParent(path, parentINode, permissions,
                                      blocks, replication, modificationTime,
                                      atime, nsQuota, dsQuota, blockSize);
    }

    // load datanode info
    this.loadDatanodes(imgVersion, in);

    // load Files Under Construction
    this.loadFilesUnderConstruction(imgVersion, in, fsNamesys);

    this.loadSecretManagerState(imgVersion, in, fsNamesys);

  } finally {
    in.close();
  }

  return needToSave;
}

// org.apache.hadoop.hdfs.protocol.Block: the three fields persisted per block
public void set(long blkid, long len, long genStamp) {
  this.blockId = blkid;
  this.numBytes = len;
  this.generationStamp = genStamp;
}
Putting it together, the fields of a (recent-layout) FSImage are:

imgVersion (int): layout version of this image
namespaceID (int): identifier assigned when the filesystem is formatted; datanodes record the same ID, tying them to this namespace
numFiles (long): total number of files and directories in the filesystem
genStamp (long): the last block generation stamp issued (a counter, not a wall-clock timestamp)

then one record per file or directory:

path (String): path of this directory or file
replication (short): replication factor (0 for directories)
mtime (long): modification time
atime (long): access time
blockSize (long): preferred block size (0 for directories)
numBlocks (int): number of blocks in the file (-1 for directories)
if (numBlocks > 0), repeated numBlocks times:
    blockId (long): ID of a block belonging to this file
    numBytes (long): size of the block
    genStamp (long): generation stamp of the block
nsQuota (long): namespace quota, or -1 if none is set (directories only)
dsQuota (long): disk-space quota, or -1 if none is set (directories only)
... other fields (permissions, etc.)
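To make the record layout concrete, here is a hand-constructed sketch (not a dump of a real image) of the records a tiny namespace would produce under a recent layout version: the root, one directory /user, and one two-block file /user/a.txt. All values are illustrative, assuming the default 64 MB block size:

record 1 (root): path=""             replication=0  mtime=...  atime=0    blockSize=0         numBlocks=-1
                 nsQuota=...  dsQuota=...  perms=...
record 2 (dir):  path="/user"        replication=0  mtime=...  atime=0    blockSize=0         numBlocks=-1
                 nsQuota=-1   dsQuota=-1   perms=...
record 3 (file): path="/user/a.txt"  replication=3  mtime=...  atime=...  blockSize=67108864  numBlocks=2
                 block[0]={blockId=..., numBytes=67108864, genStamp=...}
                 block[1]={blockId=..., numBytes=1048576,  genStamp=...}
                 perms=...

Note how the quota fields appear only in directory records (blocks == null in the code above) while the block triples appear only in file records, and that records are written parent-first, which is what lets loadFSImage cache parentINode across consecutive entries.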
Remark: the actual code contains branches for different FSImage layout versions (the imgVersion checks above); the layout listed here corresponds to recent versions.
I have not found an official description of the FSImage file structure, so the layout above had to be inferred from the source code. Once the FSImage and edits log formats are understood, it should be possible to analyze HDFS offline, without going through the Hadoop client or API: for example, pulling the complete file listing, and each file's block IDs, straight out of an FSImage. (Locating those blocks on particular datanodes would require additional information, since the block-to-datanode mapping is not stored in the image.)
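As a minimal proof of concept for that kind of offline analysis, here is a sketch that reads just the image header, mirroring the first few reads in loadFSImage above. The class name and argument handling are mine, and it assumes the plain Hadoop 1.x image format (later releases changed the format substantially; 2.x eventually moved to a protobuf-based image). A real tool would also need the per-inode loop and the string codec behind readString.

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Sketch: dump the header of a Hadoop 1.x fsimage file offline,
// mirroring the first reads performed by FSImage.loadFSImage().
public class FsImageHeaderDump {
  public static void main(String[] args) throws IOException {
    // placeholder path; point this at a copy of ${dfs.name.dir}/current/fsimage
    String path = (args.length > 0) ? args[0] : "fsimage";
    try (DataInputStream in = new DataInputStream(
             new BufferedInputStream(new FileInputStream(path)))) {
      int imgVersion = in.readInt();       // layout version (a negative number)
      int namespaceID = in.readInt();      // cluster namespace ID
      long numFiles = (imgVersion <= -16)  // field width depends on the version
          ? in.readLong()
          : in.readInt();
      System.out.println("layout version  : " + imgVersion);
      System.out.println("namespace ID    : " + namespaceID);
      System.out.println("files + dirs    : " + numFiles);
      if (imgVersion <= -12) {
        System.out.println("generation stamp: " + in.readLong());
      }
    }
  }
}

Run it against a copy of the fsimage file rather than the live one; in Hadoop 1.x the file is only rewritten when a checkpoint or saveNamespace completes.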