Generating large non-Big 4 audit firms: biglocal

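Question: what should the criterion for a "large" firm be? Audit fees? Or characteristics of the listed companies it audits? (The code below takes the second route: a firm's size in a year is the total assets, ta, of its clients in that year.)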

	*Core logic:
tab audfirm,missing //688 missing values of audfirm

preserve
 keep if audfirm!="" 
 bys yr audfirm: egen ms=total(ta)
 duplicates drop yr audfirm,force 
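 /*Note!
1. Because duplicates drop deletes observations,
     it must run on the temporary copy inside preserve/restore.
2. Why duplicates drop at all?
     merge needs the key variables to uniquely identify the observations in at least one of the two datasets (m:m merges should not be used).
     The master data cannot meet that requirement, so the using file has to. */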

 bys yr:egen rankta=rank(ms),field 
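      /*Two pitfalls on this line:
        1. We want the rank within each year. After duplicates drop above, the data are at the yr-audfirm level,
        so grouping by "yr audfirm" again would make every group a single row.
        2. field must be specified: it ranks from high to low, so the largest value gets rank 1. */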
 save temp.dta,replace
 restore
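The preserve block above does two jobs. First, it collapses the data to one row per year-firm holding the firm's total client assets (ms). Second, it ranks the firms within each year by the field rule: the rank is 1 plus the number of firms with a strictly larger total. The minimal Python sketch below (standard library only) checks that logic outside Stata. The tuple layout `(yr, audfirm, ta)` and the function name `firm_year_ranks` are hypothetical and only mirror the Stata steps.

```python
from collections import defaultdict

def firm_year_ranks(rows):
    """rows: iterable of (yr, audfirm, ta) tuples (hypothetical layout).
    Returns {(yr, audfirm): (ms, rank)} with one entry per year-firm."""
    # Mirrors `keep if audfirm != ""`, `bys yr audfirm: egen ms = total(ta)`
    # and `duplicates drop yr audfirm`: one total per (yr, audfirm) key.
    totals = defaultdict(float)
    for yr, firm, ta in rows:
        if firm == "":  # skip observations with a missing auditor
            continue
        totals[(yr, firm)] += ta

    # Collect every firm total of each year for the ranking step.
    by_year = defaultdict(list)
    for (yr, _firm), ms in totals.items():
        by_year[yr].append(ms)

    # Mirrors `bys yr: egen rankta = rank(ms), field`:
    # rank = 1 + number of firms in the same year with a strictly larger total.
    result = {}
    for (yr, firm), ms in totals.items():
        rank = 1 + sum(1 for other in by_year[yr] if other > ms)
        result[(yr, firm)] = (ms, rank)
    return result
```

Take one year with firm totals of 8, 8 and 1. The two tied firms both get rank 1 and the third gets rank 3, which is what "no correction for ties" means for the field option.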

 merge m:1 yr audfirm using temp.dta,keepus(rankta)
 drop if _m==2
 drop _m
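 tab audfirm,missing //the 688 missing values are still here: keep if audfirm!="" only ran on the preserved copy, so these rows have _merge==1 and a missing rankta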
 gen biglocal = (rankta <= 10 & big4 == 0) if audfirm !=""
 /*Note: use "big4==0", not "big4!=1"!
 "!=1" is also true when big4 is missing, so it would cover both 0 and missing. */
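Because temp.dta has exactly one row per (yr, audfirm), merge m:1 behaves like a dictionary lookup. The Python sketch below uses hypothetical names `biglocal_flags` and `ranks`, with `None` standing in for Stata's missing value. It shows three details of the step above:

- A firm that is not in the lookup gets no rank, like `_merge==1` with a missing rankta. In Stata a missing rankta also fails `rankta <= 10`, because missing sorts above every number.
- A row with a blank auditor keeps a missing biglocal because of `if audfirm != ""`.
- A missing big4 fails `big4 == 0` but would pass `big4 != 1`.

```python
def biglocal_flags(obs, ranks, top=10):
    """obs: list of (yr, audfirm, big4) with big4 in {0, 1, None};
    ranks: {(yr, audfirm): rank}, the hypothetical stand-in for temp.dta.
    Returns one flag per observation: 1, 0, or None (missing)."""
    flags = []
    for yr, firm, big4 in obs:
        if firm == "":  # `if audfirm != ""` leaves biglocal missing
            flags.append(None)
            continue
        rank = ranks.get((yr, firm))  # m:1 lookup; None plays the role of _merge==1
        in_top = rank is not None and rank <= top
        # big4 == 0 is False when big4 is missing (None), unlike big4 != 1.
        flags.append(int(in_top and big4 == 0))
    return flags
```

One design point to check against your own definition: rankta is computed over all firms, Big 4 included. As a result, `rankta <= 10 & big4 == 0` flags the non-Big 4 firms that place in the overall top ten, which is usually fewer than ten firms. If you want the ten largest non-Big 4 firms instead, one option is to drop big4==1 inside the preserve block before ranking. That is an alternative definition, not the one used here.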

And what does the procedure look like with real data? Here it is.

**In practice:

*Step 1: clean the audit firm names first.

		gen audfirm1 = audfirm
		tab audfirm1, missing

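local biglocal_simple 0  //1 = run the simple method (less accurate)
local biglocal_split 0  //1 = build the firm identifier with the split method
local biglocal_subinstr 1 //1 = build the firm identifier with the subinstr method

	**1. Simple method (less accurate): substr
		if `biglocal_simple' == 1 {
		gen audfirm2=substr(audfirm1,1,12)  //substr() counts bytes and a Chinese character takes 3 bytes in UTF-8, so this keeps the first 4 characters (in Stata 14+, usubstr(audfirm1,1,4) counts characters directly)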
			 preserve
				 keep if audfirm!="" //(worth repeating here, so this important command is not lost if the earlier block is skipped)
				 bys yr audfirm2: egen ms2=total(ta)
				 duplicates drop yr audfirm2,force /*pitfall: the firm variable here is the truncated audfirm2, not the original name;
													otherwise, what would the truncation be for?!*/
				 bys yr:egen rankta2=rank(ms2),field
				 save temp.dta,replace
			 restore
		 merge m:1 yr audfirm2 using temp.dta,keepus(rankta2)
		 drop if _m==2
		 drop _m
		 gen biglocal2=(rankta2<=10 & big4==0) if audfirm2!=""  //use rankta2 from this merge, not rankta from the earlier example
		 tab biglocal2
		 }
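Why 12 bytes: Stata's substr() counts bytes, and a common Chinese character occupies 3 bytes in UTF-8. So substr(audfirm1,1,12) keeps exactly 4 characters only when the first four are all Chinese. The Python sketch below reproduces byte-based truncation with a hypothetical helper, `first_n_bytes`, and shows how an ASCII letter in the name shifts the cut. Here a half-cut character is simply discarded, whereas Stata's substr() can leave an invalid partial character at the end.

```python
def first_n_bytes(name, n=12):
    """Mimic byte-based truncation like substr(name, 1, n) on a UTF-8 string."""
    raw = name.encode("utf-8")[:n]
    # A cut inside a multi-byte character leaves an incomplete tail;
    # errors="ignore" drops those leftover bytes.
    return raw.decode("utf-8", errors="ignore")
```

With a purely Chinese name such as 立信会计师事务所, the result is 立信会计 (4 characters, 12 bytes). With a made-up mixed name such as A立信会计师, the result is only A立信会, because the ASCII letter uses one byte and the next character no longer fits.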

	**2. Splitting methods (accurate): split / subinstr
		**2.1 split method
		if `biglocal_split' == 1{
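		*Split on 会计 ("accounting"): keeping only the text before it strips the common suffix 会计师事务所 and its variants, leaving the firm's distinctive short name
		 cap drop audfirm1?* //clear split pieces left over from an earlier run (audfirm1 itself is kept)
		split audfirm1,p("会计") //adding g(stub) or not gives the same result; only the new variable names differ: by default they are the source name plus a number, with g(stub) they are the custom stub plus a number.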

		/*With g(stub):
		split audfirm1,p("会计") g(af_)
		replace audfirm1 = af_1 //replace the full name with the text before 会计
		drop af_*
		*/
		replace audfirm1=audfirm11

		*Also cut off 有限责任 ("limited liability")
		split audfirm11, p("有限责任") 

		*Names containing 集团 ("group") or （集团） are split as well
		split audfirm111, p("集团")
		tab audfirm1111 //this tab checks whether the firm names now look the way I want
		split audfirm1111,p("（")
		tab audfirm11111
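		replace audfirm1 = audfirm11111 //write the cleaned name back to audfirm1, which the ranking step below uses
		drop audfirm1?* //drop only the split pieces; "drop audfirm1*" would also drop audfirm1 itself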
		}
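Each split call in the block above keeps only the first piece, that is, the text before the separator. Chained together, the calls amount to the loop in the Python sketch below. The function `split_clean` is hypothetical, and names such as 甲乙集团会计师事务所 are made up for illustration. strip() mirrors split's default trimming of blanks.

```python
# Separators in the same order as the Stata split calls above.
SEPARATORS = ["会计", "有限责任", "集团", "（"]

def split_clean(name, separators=SEPARATORS):
    """Keep only the text before each separator in turn (the first piece of each split)."""
    for sep in separators:
        name = name.split(sep, 1)[0].strip()
    return name
```

For example, 立信会计师事务所（特殊普通合伙） reduces to 立信, and the made-up 甲乙集团会计师事务所 reduces to 甲乙.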
		
		** 2.2 subinstr method:
		if `biglocal_subinstr' == 1{
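		keep if audfirm1!="" //crucial: otherwise, in egen ms=total(ta) later, the summed ta of blank names could take a rank. Note that this drops those rows from the master data (and from main.dta); to keep them, move this filter inside the preserve block of the ranking step below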
		cap drop audfirm11
		split audfirm,p("会计") g(af_)
		replace audfirm1 = af_1
		drop af_*
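		/*Logic of the three lines above: split can be run on audfirm directly, because it only creates new variables and leaves audfirm unchanged.
		Why g(af_)? So that after af_1 is copied into audfirm1, the pieces are easy to remove with drop af_*.
		
		*/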
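		*subinstr(s,old,"",1) deletes only the first occurrence of old and keeps the text after it, whereas the split branch keeps only the text before it, so the two branches can give different identifiers for the same name
		replace audfirm1=subinstr(audfirm1,"有限责任","",1)
		replace audfirm1=subinstr(audfirm1,"集团","",1)
		replace audfirm1=subinstr(audfirm1,"（","",1)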
		}
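The subinstr branch is not equivalent to the split branch. subinstr(s, old, "", 1) deletes only the first occurrence of old and keeps what follows, while split keeps only what precedes it. The Python sketch below puts the two side by side. The function names are hypothetical, and str.replace(old, "", 1) plays the role of subinstr(..., 1).

```python
# Fragments handled after cutting at 会计, in the order used above.
FRAGMENTS = ["有限责任", "集团", "（"]

def split_branch(name):
    """Split branch: keep the text before 会计, then before each fragment."""
    name = name.split("会计", 1)[0].strip()
    for fragment in FRAGMENTS:
        name = name.split(fragment, 1)[0].strip()
    return name

def subinstr_branch(name):
    """subinstr branch: keep af_1 (text before 会计), then delete the first occurrence of each fragment."""
    name = name.split("会计", 1)[0].strip()
    for fragment in FRAGMENTS:
        name = name.replace(fragment, "", 1)
    return name
```

For the made-up name 甲乙（北京）会计师事务所, the split branch yields 甲乙 while the subinstr branch yields 甲乙北京）. For a plain name such as 立信会计师事务所, both yield 立信.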

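*Step 2: generate the top-ten local firms
		cap drop rankta biglocal //in case the earlier example in this post was run in the same session
		preserve
			keep if audfirm1 != "" //drop blank names in the temporary copy only, so they cannot take a rank
			bys yr audfirm1: egen ms = total(ta)
			duplicates drop yr audfirm1,force /*pitfall: the firm variable here is the cleaned audfirm1, not the raw name;
												otherwise, what would the cleaning be for?!*/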
			bys yr: egen rankta = rank(ms),field
			save temp.dta,replace
		restore
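		merge m:1 yr audfirm1 using temp.dta,keepus(rankta) //match on the cleaned audfirm1, not the raw audfirm: temp.dta keeps only one raw spelling per cleaned name, so the other spellings would not match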
		drop if _m == 2
		drop _m
		gen biglocal = (rankta<=10 & big4 == 0) if audfirm != ""
		tab biglocal
save main.dta,replace
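Why the merge key must be the cleaned name: `duplicates drop yr audfirm1, force` keeps one row per cleaned name, and with it only one of the raw spellings. The Python sketch below uses a hypothetical function, `build_lookups`. It keeps the first row of each group; in Stata, which row survives depends on the sort order. The sketch shows that a lookup keyed on the raw name misses the spelling that was dropped, while the lookup keyed on the cleaned name finds every observation.

```python
def build_lookups(rows):
    """rows: observation-level (yr, raw_name, clean_name, rank) tuples (hypothetical layout).
    Mimics `duplicates drop yr audfirm1, force` by keeping one row per (yr, clean_name)."""
    kept = {}
    for yr, raw, clean, rank in rows:
        kept.setdefault((yr, clean), (raw, rank))  # first row in each group wins
    # Lookup keyed on the cleaned name: what `merge m:1 yr audfirm1` uses.
    by_clean = {key: rank for key, (_raw, rank) in kept.items()}
    # Lookup keyed on the surviving raw spelling: what `merge m:1 yr audfirm` would use.
    by_raw = {(yr, raw): rank for (yr, _clean), (raw, rank) in kept.items()}
    return by_clean, by_raw
```

With two raw spellings, 立信会计师事务所 and 立信会计师事务所（特殊普通合伙）, both cleaned to 立信, only one spelling survives in `by_raw`, so observations with the other spelling would get no rank.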

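Side note:
Track and field (Chinese 田径: Track 径 and Field 田) is the collective name for track events and field events: athletic competitions that cover running, jumping and throwing[1]. Jumping and throwing events, scored by height or distance, are "field" events; running events, scored by time, are "track" events. The names come from the typical venue: an oval running track (径) around the outside of the stadium encloses a grass area (田) where the jumping and throwing events take place. This is where Stata's field and track options get their names: in field events the highest mark is ranked first, in track events the lowest time is.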

The field option calculates the field rank of exp: the highest value is ranked 1, and there is no correction for ties.
That is, the field rank is 1 + the number of values that are higher.

The track option calculates the track rank of exp: the lowest value is ranked 1, and there is no correction for ties.
That is, the track rank is 1 + the number of values that are lower.

The unique option calculates the unique rank of exp: values are ranked 1,...,#, and values and ties are broken arbitrarily.
Two values that are tied for second are ranked 2 and 3.

Without any of these options, tied values are assigned the average rank.
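The ranking rules can be written out directly. This Python sketch (function names hypothetical) follows the definitions quoted above:

- field and track count strictly higher or strictly lower values.
- The default averages the positions of tied values.
- unique assigns 1 to n.

The sketch assumes unique ranks in ascending order. Stata breaks ties under unique arbitrarily; here they are broken by position through a stable sort.

```python
def field_rank(values):
    """rank(), field: 1 + number of strictly higher values; ties share a rank."""
    return [1 + sum(v > x for v in values) for x in values]

def track_rank(values):
    """rank(), track: 1 + number of strictly lower values; ties share a rank."""
    return [1 + sum(v < x for v in values) for x in values]

def average_rank(values):
    """Default rank(): tied values get the average of the positions they occupy."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2 for x in values]

def unique_rank(values):
    """rank(), unique: ranks 1..n (ascending order assumed); ties broken by position here."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks
```

For the values 5, 3, 5, 1 the rules give:

- field: 1, 3, 1, 4
- track: 3, 2, 3, 1
- default: 3.5, 2, 3.5, 1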

posted @ 2020-09-04 15:32 将军练码
