Background: when using Ranger for authorization, impersonation must be enabled (i.e., jobs execute as the submitting user rather than as a shared proxy user such as hive or spark). During execution, however, temporary files must be written to HDFS, which easily triggers permission-denied errors. We therefore need to understand how these paths are generated and used.
Path Analysis
The exception below is logged from DagUtils.java (stack trace truncated here):
ERROR : Failed to execute tez graph.
Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: user=your_name, access=WRITE, inode="/user":hdfs:hdfsadmingroup:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:255)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1879)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1863)
Source Code Analysis
UserGroupInformation ugi = Utils.getUGI();
String userName = ugi.getShortUserName();
String userPathStr = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USER_INSTALL_DIR);
Path userPath = new Path(userPathStr);
FileSystem fs = userPath.getFileSystem(conf);
String jarPathStr = userPathStr + "/" + userName;
Since HiveConf.ConfVars.HIVE_USER_INSTALL_DIR is not explicitly set, the default defined in HiveConf applies:
HIVE_JAR_DIRECTORY("hive.jar.directory", null,
"This is the location hive in tez mode will look for to find a site wide \n" +
"installed hive instance."),
HIVE_USER_INSTALL_DIR("hive.user.install.directory", "/user/",
"If hive (in tez mode only) cannot find a usable hive jar in \"hive.jar.directory\", \n" +
"it will upload the hive jar to \"hive.user.install.directory/user.name\"\n" +
"and use it to run queries."),
jarPathStr = "/user/user.name"
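The concatenation above can be sketched without any Hadoop dependencies. This is a simplified illustration (class and method names are hypothetical); note that the raw concatenation with the default "/user/" actually yields a doubled slash, which Hadoop's Path class normalizes away, mimicked here with replaceAll:

```java
public class JarPathDemo {
    // Simplified sketch of the DagUtils logic: install dir + "/" + user name.
    // replaceAll stands in for Hadoop Path's normalization of repeated slashes.
    static String jarPath(String installDir, String userName) {
        String raw = installDir + "/" + userName;   // "/user/" + "/" + "your_name" -> "/user//your_name"
        return raw.replaceAll("/{2,}", "/");        // normalized: "/user/your_name"
    }

    public static void main(String[] args) {
        System.out.println(jarPath("/user/", "your_name"));             // /user/your_name
        System.out.println(jarPath("/user/ranger/hive/", "your_name")); // /user/ranger/hive/your_name
    }
}
```

Each user thus gets their own subdirectory under the install dir, which is why the parent directory must be writable (or pre-created) for every submitting user.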
https://cwiki.apache.org/confluence/display/hive/configuration+properties#ConfigurationProperties-hive.user.install.directory
This matches the official documentation:
hive.user.install.directory
- Default Value: hdfs:///user/
- Added In: Hive 0.13.0 with HIVE-5003 and HIVE-6098
If Hive (in Tez mode only) cannot find a usable Hive jar in hive.jar.directory, it will upload the Hive jar to <hive.user.install.directory>/<user_name> and use it to run queries.
Similarly, when Spark runs on YARN it uploads files to a .sparkStaging/<applicationId> directory under the HDFS staging directory, which defaults to the submitting user's home directory:
hdfs://yourcluster/user/your_username
This matches the official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties
spark.yarn.stagingDir
- Default Value: Current user's home directory in the filesystem
- Meaning: Staging directory used while submitting applications.
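Putting the two pieces together, the path Spark actually writes to is <stagingDir>/.sparkStaging/<applicationId>. A minimal sketch of that layout (the helper below is hypothetical; the real logic lives in Spark's YARN Client and is simplified here):

```java
public class StagingPathDemo {
    // Hypothetical helper illustrating where Spark writes staging files on YARN:
    // the staging dir plus a per-application ".sparkStaging/<appId>" subdirectory.
    static String stagingPath(String stagingDir, String appId) {
        return stagingDir + "/.sparkStaging/" + appId;
    }

    public static void main(String[] args) {
        // Default: the submitting user's HDFS home directory.
        System.out.println(stagingPath("hdfs://yourcluster/user/your_username",
                                       "application_1700000000000_0001"));
        // After pointing spark.yarn.stagingDir at a shared, writable location:
        System.out.println(stagingPath("/user/ranger/spark/staging",
                                       "application_1700000000000_0001"));
    }
}
```

With impersonation, the submitting user may have no HDFS home directory at all, which is why redirecting the staging dir to a shared writable location resolves the error.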
Path Configuration
hadoop fs -mkdir -p /user/ranger/hive/
hadoop fs -chmod -R 777 /user/ranger/hive/
hadoop fs -mkdir -p /user/ranger/spark/staging/
hadoop fs -chmod -R 777 /user/ranger/spark/staging/
The hive.user.install.directory parameter in hive-site.xml defines the HDFS path used for the uploaded Hive jars.
Edit the configuration with sudo vi /etc/hive/conf/hive-site.xml and add the following:
<property>
  <name>hive.user.install.directory</name>
  <value>/user/ranger/hive/</value>
</property>
After saving, restart the service: sudo systemctl restart hive-server2.service
For Spark, set the staging directory (for example in spark-defaults.conf, or with --conf at submission time):
spark.yarn.stagingDir /user/ranger/spark/staging
No service restart is needed; YARN reads this setting each time an application is submitted.