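Major incident
The `mergeall` command failed with the error `too many variables specified`. The cause: in some of the .dta files, the identifier variables did not uniquely identify the observations.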
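What I should have done:
Warning sign:
The dependent variable y has roughly 4,800 observations, yet some control-variable datasets still held as many as 40,000 observations after initial processing, so they must contain duplicates! Besides `keep if substr(Accper,6,2)=="12"`, also run `duplicates tag, gen()` (with the identifier variables and a new variable name filled in) and then `tab` the result to check.
Be especially wary of the variables "accounting period" (the data include quarterly reports) and "report type" (the data include parent-company statements).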
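
The two traps above can be sketched on a toy dataset. This is only an illustration under assumptions: the variable name `Typrep` and its codes ("A" for consolidated, "B" for parent-company statements) follow common CSMAR naming and are not taken from the original files, and the `roa` values are made up. Check the codebook of your own data before reusing it.

```stata
* Toy data: one firm has a semiannual record, and both firms have
* consolidated ("A") and parent-company ("B") statements.
* ASSUMPTION: Typrep and the codes "A"/"B" are hypothetical here.
clear
input str6 stkcd str10 Accper str1 Typrep double roa
"000001" "2015-06-30" "A" 0.010
"000001" "2015-12-31" "A" 0.021
"000001" "2015-12-31" "B" 0.018
"000002" "2015-12-31" "A" 0.035
"000002" "2015-12-31" "B" 0.030
end

keep if substr(Accper, 6, 2) == "12"    // keep year-end records only
keep if Typrep == "A"                   // assumed: keep consolidated statements
gen year = real(substr(Accper, 1, 4))   // numeric year taken from the period string
isid stkcd year                         // stops with an error if stkcd-year is not unique
```

Dropping only the non-December records would still leave two rows per firm-year here, which is why the report-type filter and the final `isid` check are both included.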

After processing each dataset, check for duplicates:
duplicates tag stkcd year,gen(dups)
tab dups
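
A slightly fuller version of this check might look like the sketch below. It uses a made-up three-row dataset (the `size` values are invented) and assumes `stkcd` and `year` are the identifiers, as in the check above. Note that `duplicates drop` with a variable list needs the `force` option and silently keeps only the first row of each group, so inspect the listed rows before dropping.

```stata
* Toy data with one duplicated stkcd-year pair (values are invented).
clear
input str6 stkcd int year double size
"000001" 2015 21.3
"000001" 2015 21.3
"000002" 2015 22.8
end

duplicates report stkcd year            // table of copies and surplus observations
duplicates tag stkcd year, gen(dups)    // dups = number of other rows sharing the key
list if dups > 0, sepby(stkcd)          // look at the offending rows first
duplicates drop stkcd year, force       // keep the first occurrence of each key
drop dups
isid stkcd year                         // errors out if the key is still not unique
```

Ending every cleaning do-file with `isid` makes the script stop at the dataset that has the problem, instead of letting it surface later during the merge.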

What `duplicates` does (from the Stata help file):
duplicates reports, displays, lists, tags, or drops duplicate observations, depending on the subcommand specified. Duplicates are observations with identical values either on all variables if no varlist is specified or on a specified varlist.
duplicates report produces a table showing observations that occur as one or more copies and indicating how many observations are "surplus" in the sense that they are the second (third, ...) copy of the first of each group of duplicates.
duplicates examples lists one example for each group of duplicated observations. Each example represents the first occurrence of each group in the dataset.
duplicates list lists all duplicated observations.
duplicates tag generates a variable representing the number of duplicates for each observation. This will be 0 for all unique observations.
duplicates drop drops all but the first occurrence of each group of duplicated observations. The word drop may not be abbreviated.
Any observations that do not satisfy specified if and/or in conditions are ignored when you use report, examples, list, or drop. The variable created by tag will have missing values for such observations.
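
The last point is easy to overlook, so here is a small sketch of it on invented data: when `duplicates tag` is run with an `if` condition, the excluded rows get a missing value rather than 0.

```stata
* Toy data: the 2014 row is excluded by the if condition below.
clear
input str6 stkcd int year
"000001" 2014
"000001" 2015
"000001" 2015
"000002" 2015
end

duplicates tag stkcd year if year == 2015, gen(dups)
list                                    // the 2014 row shows dups as missing (.)
count if missing(dups)                  // rows that were never checked
```

A check such as `tab dups` skips missing values by default, so rows outside the `if` condition do not show up in it; `tab dups, missing` or the `count` above makes them visible.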
