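Notes on fine-tuning phylogenetic tree figures in iTOL


I already know the basics of iTOL and wrote some notes on it before, in a summary of phylogenetic (evolutionary) tree plotting. Using it again recently, I ran into a few more details while polishing figures, so I am writing them down for future reference.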

1. Registration

You can use iTOL without an account, but if you register and log in, your trees can be saved on the iTOL website.
image.png

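2. Removing branch lengths

Ideally a phylogenetic tree shows branch lengths, which help judge how distinct particular accessions and populations are. Still, most trees in published papers now omit branch lengths. That is understandable: with many samples, branch lengths make the figure look messy.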

Before:
image.png

After:
image.png

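3. Adding legends

When you annotate a tree with groupings, you have to set up the figure legend yourself. Otherwise you can only export the figure and add the legend in other software, or describe it in the caption.

For example, here I annotated three rings: the innermost uses range and the outer two use strip (if these terms are unfamiliar, see my earlier post or iTOL's example annotation files). The two types are configured slightly differently.

For range annotations, just enable the legend. You can drag it with the mouse to adjust its position relative to the tree.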
image.png

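Strip annotations need a custom legend. If you have several annotation datasets, check which dataset is selected and configure the legend for that one. Again, the legend can then be dragged anywhere.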
image.png

image.png
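If you would rather define the strip legend in the annotation file than in the web interface, iTOL's dataset templates accept LEGEND_TITLE, LEGEND_SHAPES, LEGEND_COLORS and LEGEND_LABELS lines (the same LEGEND_XXX fields documented in the symbol template further down). The Python sketch below assumes the color strip dataset accepts these fields as its official template indicates, and writes such a file from a sample-to-group table. The function name, group names and colors are hypothetical, and sample IDs are assumed to contain no tabs.

```python
def colorstrip_with_legend(sample_groups, group_colors, label="color_strip1", title="Group"):
    """Return the text of an iTOL DATASET_COLORSTRIP file that includes a legend.

    sample_groups: dict of sample ID -> group name (hypothetical example data)
    group_colors:  dict of group name -> hex color; legend rows follow its order
    """
    groups = list(group_colors)
    lines = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        "COLOR\t#ff0000",
        # Legend: one row per group, shape 1 (square) for every row
        f"LEGEND_TITLE\t{title}",
        "LEGEND_SHAPES\t" + "\t".join("1" for _ in groups),
        "LEGEND_COLORS\t" + "\t".join(group_colors[g] for g in groups),
        "LEGEND_LABELS\t" + "\t".join(groups),
        "DATA",
    ]
    for sample, group in sample_groups.items():
        # ID, strip color, optional label
        lines.append(f"{sample}\t{group_colors[group]}\t{group}")
    return "\n".join(lines) + "\n"
```

Write the returned string to a .txt file and drag it onto the tree like any other annotation file; the legend can still be moved or edited in the web interface afterwards.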

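4. Coloring unrooted trees

When there are many accessions we often show an unrooted tree, usually with branch lengths. It is simple: hide the labels and color the tree with an annotation file.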

image.png
image.png

Drag and drop the annotation file onto the tree to color it.
image.png
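The annotation file dropped onto the unrooted tree can be generated from a sample-to-group table instead of being typed by hand. A minimal sketch, assuming a Python dict of sample to group and a group-to-color palette (both hypothetical); it writes TREE_COLORS lines in the same `ID clade color style width` form as the branch annotation file later in this post.

```python
def branch_colors(sample_groups, group_colors, style="normal", width=2):
    """Return iTOL TREE_COLORS text that colors each sample's branch by group.

    Uses the same 'ID clade color style width' columns as the branch
    annotation example in this post; for a leaf, its clade is just its own branch.
    sample_groups / group_colors are hypothetical dicts supplied by the user.
    """
    lines = ["TREE_COLORS", "SEPARATOR SPACE", "DATA"]
    for sample, group in sample_groups.items():
        # SPACE is the separator, so an ID containing a space would split wrongly
        if " " in sample:
            raise ValueError(f"sample ID {sample!r} contains a space; use SEPARATOR TAB instead")
        lines.append(f"{sample} clade {group_colors[group]} {style} {width}")
    return "\n".join(lines) + "\n"
```

Keeping the palette in one dict makes it easy to reuse the same colors for the strip legend, so branches and legend stay consistent.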

Because the branches are numerous and dense, the colors are hard to see. A small adjustment helps: reduce the line width.
image.png

You can also try some more advanced adjustments, such as removing the scale bar or rotating the figure.
image.png

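However, because the unrooted tree has its labels (sample names) hidden, range annotations are not available, only strip-type ones, so the legend has to be set manually (see above). Alternatively, skip the legend and explain the colors in the figure caption.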

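5. Exporting figures

When exporting to PDF, PNG or other formats with fixed dimensions, choose Full image: it automatically sizes the output to fit the whole tree. With Screen, the export looks exactly like what is shown in the browser, which makes the size hard to control.

For PNG output, also set the resolution, and make it reasonably high.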
image.png


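Update
iTOL really is an excellent tool for polishing figures: every detail can be adjusted and the results look great. I tried ggtree, but the results were not as good. Most trees in top journals seem to be made with iTOL. It still takes some effort to learn, so I keep notes for reference.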

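6. Adding backgrounds to unrooted trees

Use Manual annotation. It has many options, including predefined shapes and freehand drawing, so experiment with them.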
image.png
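You can set the line or fill color, transparency and more, so there is plenty of room for customization. Here is an example I tried: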
image.png

7. Other details

Rotation, branch lengths, labels, fainter lines, etc.
image.png

Removing the tree scale box, showing leaf nodes, etc.
image.png

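8. Exporting text

You can export the clustering result as text, in tree order, for downstream analyses that need the sample groupings. This is a very practical feature: no more matching samples to groups one by one.

If your accessions cluster clearly, find the root node of each cluster and color its whole clade; the exported text then carries the group information.

If the clustering is less clear, so that the root nodes are hard to identify, or the groups are not what you expected, you can color the whole tree as one and export that. You then only need to locate the boundary samples of each group, instead of checking every sample.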

For example:
image.png
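In this example the tree roughly splits into three groups; find the root node of each, right-click it and color it.
If you cannot find the node for each group, coloring the whole tree also works (but you must color something, otherwise no text is exported, or only part of it is).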
image.png

After coloring, choose the option in the top-right corner to export the annotation text:
image.png
image.png

The exported list follows the clustering order of the tree exactly, so you can go straight on to downstream grouping analyses.
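If you also have the tree in Newick format, the "find the boundary samples" step can be scripted. The sketch below is not iTOL's export: it reads leaf names in the order they appear in a Newick string, then assigns a group number to every leaf given the first sample of each group. It assumes unquoted leaf names without spaces, and that the Newick leaf order matches what you see in iTOL (rotating or reordering nodes in the viewer may change the displayed order). The function names are hypothetical.

```python
import re


def leaf_order(newick):
    """Return leaf names in the order they appear in a Newick string.

    Minimal parser: a leaf name is taken to be text that directly follows
    '(' or ',' and ends at ':', ',', ')' or ';'. Internal node labels
    (e.g. bootstrap values after ')') are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def assign_groups(ordered_leaves, boundaries):
    """Split leaves (in tree order) into consecutive groups.

    boundaries: the first sample of each group, picked by eye from the tree.
    Returns a dict leaf -> group number starting at 1; leaves that come
    before the first boundary get group 0.
    """
    starts = set(boundaries)
    groups, current = {}, 0
    for leaf in ordered_leaves:
        if leaf in starts:
            current += 1  # a boundary sample opens a new group
        groups[leaf] = current
    return groups
```

For example, `assign_groups(leaf_order(tree_text), ["A", "C"])` labels everything from A up to the sample before C as group 1 and the rest as group 2, which is the same boundary logic described above.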

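Commonly used annotations

Multi-level classification: in population genetics, samples are often grouped in several ways (variety type: cultivar, landrace, wild; region: north vs. south; ecotype; etc.), which calls for multi-layer annotation. Different categories can be shown with colors, shapes and so on, placed on nodes, branches, labels (sample names), strips, shaded backgrounds, etc.

The annotations I use most often: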

  • range on labels
    The annotation can target a particular node, or the group of each sample (recommended). Annotation file:
TREE_COLORS
SEPARATOR TAB
DATA
I148	range	#eeffee	group1
I110	range	#ddddff	group2

image.png
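Because the recommended approach is one line per sample, the range file is easy to generate from a two-column table. A sketch, assuming a tab-separated input with a sample column and a group column (a hypothetical layout) and a hand-picked group-to-color dict; it reproduces lines like `I148	range	#eeffee	group1` above.

```python
import csv


def range_from_tsv(lines, group_colors):
    """Build an iTOL TREE_COLORS 'range' file from tab-separated sample/group rows.

    lines: any iterable of text lines, e.g. an open file (hypothetical layout:
           column 1 = sample ID, column 2 = group name; '#' lines are skipped)
    group_colors: dict of group name -> hex color
    """
    out = ["TREE_COLORS", "SEPARATOR TAB", "DATA"]
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        sample, group = row[0], row[1]
        # ID, type, color, label
        out.append("\t".join([sample, "range", group_colors[group], group]))
    return "\n".join(out) + "\n"
```

Typical use: `with open("groups.tsv") as fh: text = range_from_tsv(fh, {"group1": "#eeffee", "group2": "#ddddff"})`, then save `text` and drag the file onto the tree.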

  • strip
    Colored strips by group, with branch colors matching the strips. Annotation file tol_color_strip.txt:
DATASET_COLORSTRIP
#lines starting with a hash are comments and ignored during parsing
#select the separator which is used to delimit the data below (TAB, SPACE or COMMA). This separator must be used throughout this file (except in the SEPARATOR line, which uses space).

#SEPARATOR TAB
SEPARATOR SPACE
#SEPARATOR COMMA

#label is used in the legend table (can be changed later)
DATASET_LABEL color_strip1

#dataset color (can be changed later)
COLOR #ff0000

#optional settings

#all other optional settings can be set or changed later in the web interface (under 'Datasets' tab)
COLOR_BRANCHES 1
#maximum width
STRIP_WIDTH 25

#left margin, used to increase/decrease the spacing to the next dataset. Can be negative, causing datasets to overlap.
MARGIN 0

#border width; if set above 0, a black border of specified width (in pixels) will be drawn around the color strip 
BORDER_WIDTH 1
BORDER_COLOR #000

#show internal values; if set, values associated to internal nodes will be displayed even if these nodes are not collapsed. It could cause overlapping in the dataset display.
SHOW_INTERNAL 0

#In colored strip charts, each ID is associated to a color. Color can be specified in hexadecimal, RGB or RGBA notation
#Internal tree nodes can be specified using IDs directly, or using the 'last common ancestor' method described in iTOL help pages
#Actual data follows after the "DATA" keyword
DATA
#ID1 value1
#ID2 value2
160232 #caf390 COL#caf390
13773 #404c05 COL#404c05

image.png

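Strips only, without coloring the branches. Annotation file tol_color_strip2.txt:

DATASET_COLORSTRIP
SEPARATOR SPACE

#label is used in the legend table (can be changed later)
DATASET_LABEL color_strip2
COLOR #ff0000

#optional settings

#all other optional settings can be set or changed later in the web interface (under 'Datasets' tab)
COLOR_BRANCHES 0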
#maximum width
STRIP_WIDTH 25

#left margin, used to increase/decrease the spacing to the next dataset. Can be negative, causing datasets to overlap.
MARGIN 0

#border width; if set above 0, a black border of specified width (in pixels) will be drawn around the color strip 
BORDER_WIDTH 1
BORDER_COLOR #000

#show internal values; if set, values associated to internal nodes will be displayed even if these nodes are not collapsed. It could cause overlapping in the dataset display.
SHOW_INTERNAL 0

#In colored strip charts, each ID is associated to a color. Color can be specified in hexadecimal, RGB or RGBA notation
#Internal tree nodes can be specified using IDs directly, or using the 'last common ancestor' method described in iTOL help pages
#Actual data follows after the "DATA" keyword
DATA
#ID1 value1
#ID2 value2
160232 #caf390 COL#caf390
13773 #404c05 COL#404c05

image.png

  • Grouping by branch or label color
    Branch annotation file:
TREE_COLORS
SEPARATOR SPACE
DATA
s54 clade #377EB8 normal 2
s212 clade #377EB8 normal 2
s219 clade #377EB8 normal 2
......

image.png
Label annotation file:

TREE_COLORS
SEPARATOR SPACE
DATA
s54 clade #377EB8 normal 2
s212 clade #377EB8 normal 2
s219 clade #377EB8 normal 2
......

image.png
This can be used to group the labels: select at tips, or set shift to a negative value.
image.png
You can also put shapes at the branch tips (leaf nodes) to make the groups stand out more.
image.png
This approach is not ideal, though: with many samples the display gets cluttered. Setting symbols on the nodes directly works better.
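Instead of listing every sample, a whole clade can be colored with a single line using the 'last common ancestor' notation mentioned in the templates (two leaf IDs joined by `|`, as in `9606|184922`). A sketch, assuming you know two leaves that span each group (hypothetical names); iTOL should then color the clade under their common ancestor.

```python
def lca_clade_colors(spans, group_colors, style="normal", width=2):
    """Build iTOL TREE_COLORS text that colors whole clades.

    spans: dict group -> (leaf_a, leaf_b); iTOL interprets 'leaf_a|leaf_b' as
           the last common ancestor of the two leaves (hypothetical leaf names)
    group_colors: dict group -> hex color
    """
    out = ["TREE_COLORS", "SEPARATOR TAB", "DATA"]
    for group, (leaf_a, leaf_b) in spans.items():
        node = f"{leaf_a}|{leaf_b}"
        # ID, type, color, line style, width factor (same columns as the clade example)
        out.append("\t".join([node, "clade", group_colors[group], style, str(width)]))
    return "\n".join(out) + "\n"
```

Picking the first and last sample of each group in tree order (for instance the boundary samples from section 8) is usually enough to pin down the clade.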

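  • Symbols on (terminal) nodes
    Annotating terminal nodes is the most common tweak for unrooted trees: unlike circular or rectangular layouts, which can take any number of annotation rings, an unrooted tree can only distinguish groups through its branches and nodes.

iTOL's example_data does not include this annotation file. It took me a long time to find the template on the official site: https://itol.embl.de/help/dataset_symbols_template.txt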

DATASET_SYMBOL
#Symbol datasets allow the display of various symbols on the branches of the tree. For each node, one or more symbols can be defined.
#Each symbol's color, size and position along the branch can be specified.

#lines starting with a hash are comments and ignored during parsing
#=================================================================#
#                    MANDATORY SETTINGS                           #
#=================================================================#
#select the separator which is used to delimit the data below (TAB,SPACE or COMMA).This separator must be used throughout this file.
#SEPARATOR TAB
#SEPARATOR SPACE
SEPARATOR COMMA

#label is used in the legend table (can be changed later)
DATASET_LABEL,example symbols

#dataset color (can be changed later)
COLOR,#ffff00

#=================================================================#
#                    OPTIONAL SETTINGS                            #
#=================================================================#


#=================================================================#
#     all other optional settings can be set or changed later     #
#           in the web interface (under 'Datasets' tab)           #
#=================================================================#

#Each dataset can have a legend, which is defined using LEGEND_XXX fields below
#For each row in the legend, there should be one shape, color and label.
#Optionally, you can define an exact legend position using LEGEND_POSITION_X and LEGEND_POSITION_Y. To use automatic legend positioning, do NOT define these values
#Optionally, shape scaling can be present (LEGEND_SHAPE_SCALES). For each shape, you can define a scaling factor between 0 and 1.
#Optionally, shapes can be inverted (LEGEND_SHAPE_INVERT). When inverted, shape border will be drawn using the selected color, and the fill color will be white.
#Shape should be a number between 1 and 6, or any protein domain shape definition.
#1: square
#2: circle
#3: star
#4: right pointing triangle
#5: left pointing triangle
#6: checkmark

#LEGEND_TITLE,Dataset legend
#LEGEND_POSITION_X,100
#LEGEND_POSITION_Y,100
#LEGEND_SHAPES,1,2,3
#LEGEND_COLORS,#ff0000,#00ff00,#0000ff
#LEGEND_LABELS,value1,value2,value3
#LEGEND_SHAPE_SCALES,1,1,0.5
#LEGEND_SHAPE_INVERT,0,0,0


#largest symbol will be displayed with this size, others will be proportionally smaller.
MAXIMUM_SIZE,50

#symbols can be filled with solid color, or a gradient
#GRADIENT_FILL,1

#Internal tree nodes can be specified using IDs directly, or using the 'last common ancestor' method described in iTOL help pages
#=================================================================#
#       Actual data follows after the "DATA" keyword              #
#=================================================================#
#the following fields are required for each node:
#ID,symbol,size,color,fill,position,label
#symbol should be a number between 1 and 6:
#1: rectangle
#2: circle
#3: star
#4: right pointing triangle
#5: left pointing triangle
#6: checkmark

#size can be any number. Maximum size in the dataset will be displayed using MAXIMUM_SIZE, while others will be proportionally smaller
#color can be in hexadecimal, RGB or RGBA notation. If RGB or RGBA are used, dataset SEPARATOR cannot be comma.
#fill can be 1 or 0. If set to 0, only the outline of the symbol will be displayed.
#position is a number between 0 and 1 and defines the position of the symbol on the branch (for example, position 0 is exactly at the start of node branch, position 0.5 is in the middle, and position 1 is at the end)

DATA
#Examples

#internal node will have a red filled circle in the middle of the branch
#9606|184922,2,10,#ff0000,1,0.5

#node 100379 will have a blue star outline at the start of the branch, half the size of the circle defined above (size is 5 compared to 10 above)
#100379,3,5,#0000ff,0,0
#node 100379 will also have a filled green rectangle in the middle of the branch, same size as the circle defined above (size is 10)
#100379,1,10,#00ff00,1,0.5

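For the header settings, the comments above are enough. Data columns: #ID,symbol,size,color,fill,position,label. For example, in 100379,1,10,#00ff00,1,0.5: column 1 is the node or sample ID; column 2 the shape; column 3 the size (differences only show up when sizes differ; if all sizes are the same, adjust them with symbol size after importing); column 4 the color; column 5 the fill; column 6 the position (0 is the start of the branch, 0.5 the middle, 1 the end).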
image.png
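The symbol file can be generated the same way. A minimal sketch, assuming a sample-to-group dict and a group-to-(shape, color) mapping (all names hypothetical). It uses the template's comma separator, so colors must be hex rather than RGB and IDs must not contain commas. Every symbol gets the same size and sits at the branch end (position 1), and the legend is built from the LEGEND_XXX fields.

```python
def symbol_dataset(sample_groups, group_style, label="group symbols", size=10, position=1):
    """Return the text of an iTOL DATASET_SYMBOL file with a legend.

    sample_groups: dict sample ID -> group name (hypothetical)
    group_style:   dict group name -> (shape 1-6, hex color)
    All symbols share one size, so they are drawn at MAXIMUM_SIZE.
    """
    groups = list(group_style)
    lines = [
        "DATASET_SYMBOL",
        "SEPARATOR COMMA",
        f"DATASET_LABEL,{label}",
        "COLOR,#ffff00",
        "LEGEND_TITLE,Groups",
        "LEGEND_SHAPES," + ",".join(str(group_style[g][0]) for g in groups),
        "LEGEND_COLORS," + ",".join(group_style[g][1] for g in groups),
        "LEGEND_LABELS," + ",".join(groups),
        f"MAXIMUM_SIZE,{size}",
        "DATA",
    ]
    for sample, group in sample_groups.items():
        shape, color = group_style[group]
        # ID,symbol,size,color,fill (1 = filled),position (1 = branch end)
        lines.append(f"{sample},{shape},{size},{color},1,{position}")
    return "\n".join(lines) + "\n"
```

As noted above, identical sizes mean the symbol size is best tuned after import with the symbol size control.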

posted @ 2021-07-18 23:56  生物信息与育种