Background: When using Ranger for authorization, impersonation must be enabled (queries execute as the submitting user rather than a shared proxy user such as hive or spark). During execution, however, jobs need to write temporary files to HDFS, and this is where permission-denied errors commonly appear. We therefore need to understand how these paths are generated and used.

Path analysis

The error below is thrown from DagUtils.java; the stack trace is truncated here:

ERROR : Failed to execute tez graph.
Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: user=your_name, access=WRITE, inode="/user":hdfs:hdfsadmingroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:255)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1879)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1863)

Source code analysis

// From DagUtils.java: resolve the per-user directory used to stage the Hive jar
UserGroupInformation ugi = Utils.getUGI();
String userName = ugi.getShortUserName();
String userPathStr = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USER_INSTALL_DIR);
Path userPath = new Path(userPathStr);
FileSystem fs = userPath.getFileSystem(conf);

// Upload target: <hive.user.install.directory>/<user_name>
String jarPathStr = userPathStr + "/" + userName;
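The concatenation above can be sketched in isolation. This is a minimal illustration, assuming the default value of hive.user.install.directory ("/user/") and a hypothetical user name "alice":

```shell
# Sketch of the DagUtils path concatenation (assumptions: default
# hive.user.install.directory value "/user/" and a hypothetical user "alice").
userPathStr="/user/"
userName="alice"
# Same concatenation as the Java code: userPathStr + "/" + userName
jarPathStr="${userPathStr}/${userName}"
echo "$jarPathStr"
```

Note the doubled slash in the raw string: Hadoop's Path constructor normalizes it, so the effective upload directory is /user/alice.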

Since hive.user.install.directory (HiveConf.ConfVars.HIVE_USER_INSTALL_DIR) is not explicitly set, the defaults from HiveConf apply:

HIVE_JAR_DIRECTORY("hive.jar.directory", null,
        "This is the location hive in tez mode will look for to find a site wide \n" +
        "installed hive instance."),
        
HIVE_USER_INSTALL_DIR("hive.user.install.directory", "/user/",
        "If hive (in tez mode only) cannot find a usable hive jar in \"hive.jar.directory\", \n" +
        "it will upload the hive jar to \"hive.user.install.directory/user.name\"\n" +
        "and use it to run queries."),

jarPathStr = "/user/<user.name>"

https://cwiki.apache.org/confluence/display/hive/configuration+properties#ConfigurationProperties-hive.user.install.directory

This matches the official documentation:

  • hive.user.install.directory

    If Hive (in Tez mode only) cannot find a usable Hive jar in hive.jar.directory, it will upload the Hive jar to <hive.user.install.directory>/<user_name> and use it to run queries.

     

Similarly, Spark uploads job files at submission time to a .sparkStaging/<applicationId> directory on HDFS, under the current user's home directory:

hdfs://yourcluster/user/your_username

This also matches the official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties

spark.yarn.stagingDir (default: current user's home directory in the filesystem) — staging directory used while submitting applications.
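Putting the two pieces together, the full staging path can be sketched as follows. Both the user home directory and the YARN application id below are hypothetical, for illustration only:

```shell
# Sketch of the Spark staging path layout (assumptions: a hypothetical
# user home "/user/alice" and a made-up YARN application id).
stagingDir="/user/alice"                # default: the submitting user's HDFS home
appId="application_1234567890123_0001"  # hypothetical YARN application id
stagingPath="${stagingDir}/.sparkStaging/${appId}"
echo "$stagingPath"
```

With impersonation enabled, the submitting user needs WRITE access on stagingDir, which is exactly what the default /user/<user_name> home directory fails to provide when it does not exist.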

 

Path configuration

hadoop fs -mkdir -p /user/ranger/hive/
hadoop fs -chmod -R 777 /user/ranger/hive/

hadoop fs -mkdir -p /user/ranger/spark/staging/
hadoop fs -chmod -R 777 /user/ranger/spark/staging/
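The effect of the commands above can be sketched locally. This substitutes a temporary POSIX directory for a live HDFS cluster, purely to show the resulting mode bits (assumption: GNU stat is available for the -c flag):

```shell
# Local POSIX sketch of the layout created above (assumption: a temp dir
# stands in for HDFS; GNU stat's -c %a prints the octal mode).
tmp=$(mktemp -d)
mkdir -p "$tmp/user/ranger/hive" "$tmp/user/ranger/spark/staging"
chmod -R 777 "$tmp/user/ranger"
mode=$(stat -c %a "$tmp/user/ranger/hive")
echo "$mode"
rm -rf "$tmp"
```

Mode 777 lets every impersonated user create subdirectories; on a shared cluster you may prefer 1777 (the sticky bit, as on /tmp) so users cannot delete each other's files.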

 

The hive.user.install.directory parameter in hive-site.xml defines the HDFS path Hive uses for the per-user jar upload.

Edit the file with sudo vi /etc/hive/conf/hive-site.xml and add the following:

<property>
  <name>hive.user.install.directory</name>
  <value>/user/ranger/hive/</value>
</property>
Save the file, then restart the service:

sudo systemctl restart hive-server2.service

 
 
Edit the Spark defaults with sudo vi /etc/spark/conf/spark-defaults.conf and add the line:

spark.yarn.stagingDir /user/ranger/spark/staging

No service restart is needed: spark-defaults.conf is read by the client each time an application is submitted, so new jobs pick up the setting immediately.

posted on 2022-11-16 16:04 by 我爱吃胡萝卜