Asked by: Sumit Kumar  Asked: 8/27/2023  Modified: 8/27/2023  Views: 28
Spark job throwing exceptions during loading data from Kafka to Hive
Q:
We have a big data cluster where the data sits in Kafka topics and is loaded into Hive by a Spark job (written in Java 8). I am on Cloudera 7.1.7 with Spark 2.4.7.7.1.7.1000-141, and have tried SP1, SP2, and even 7.1.6; the exceptions still occur. I suspect a permission issue is preventing the write to Hive: when I loaded some test data into a table from spark-shell, it was written to HDFS but not to Hive. Below are the Spark job exception and a screenshot of the security issue in spark-shell -
WARN metadata.Hive: No partition is generated by dynamic partitioning
ERROR streaming.AKafkaSparkStreamingService: null; org.apache.spark.sql.AnalysisException: null;
org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:937)
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:897)
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:185)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:212)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:101)
org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectBase$class.run(CreateHiveTableAsSelectCommand.scala:55)
org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:103)
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:704)
org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:502)
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:481)
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:429)
com.gamma.skybase.spark.services.loader.TransformedStreamHiveLoader.onTabularDataset(TransformedStreamHiveLoader.java:45)
com.gamma.skybase.spark.services.streaming.avro.AKafkaAvroSparkStreamingService.onInitDataset(AKafkaAvroSparkStreamingService.java:138)
com.gamma.skybase.spark.services.streaming.AKafkaSparkStreamingService.lambda$start$f87052e0$1(AKafkaSparkStreamingService.java:84)
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:3047)
org.apache.spark.sql.hive.client.Shim_cdpd.loadDynamicPartitions(HiveShim.scala:1605)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:940)
ERROR scheduler.JobScheduler: Error running job streaming job 1692949410000 ms.0
com.gamma.components.exceptions.AppUnexpectedException: Failed processing : , e -> null;
AKafkaSparkStreamingService.lambda$start$f87052e0$1(AKafkaSparkStreamingService.java:87)
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:704)
org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:502)
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:481)
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:429)
com.gamma.skybase.spark.services.loader.TransformedStreamHiveLoader.onTabularDataset(TransformedStreamHiveLoader.java:45)
com.gamma.skybase.spark.services.streaming.avro.AKafkaAvroSparkStreamingService.onInitDataset(AKafkaAvroSparkStreamingService.java:138)
com.gamma.skybase.spark.services.streaming.AKafkaSparkStreamingService.lambda$start$f87052e0$1(AKafkaSparkStreamingService.java:84)
Caused by: java.lang.NullPointerException
org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:3047)
org.apache.spark.sql.hive.client.Shim_cdpd.loadDynamicPartitions(HiveShim.scala:1605)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:940)
[screenshot: Spark job exception] [screenshot: security issue in spark-shell]
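For context, here is a minimal spark-shell (Scala) sketch of what the write path reduces to, based on the stack trace (DataFrameWriter.saveAsTable going through CreateHiveTableAsSelectCommand and Hive.loadDynamicPartitions). The database, table, column names, and sample rows are placeholders, not the real ones from the job:

import org.apache.spark.sql.SparkSession

// Placeholder reproduction; names and data are made up for illustration.
val spark = SparkSession.builder()
  .appName("kafka-to-hive-partition-test")
  .enableHiveSupport()
  .getOrCreate()

// Commonly required for dynamic-partition inserts into Hive tables.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")

import spark.implicits._

// One row per value of the assumed partition column "dt".
val df = Seq((1, "a", "2023-08-27"), (2, "b", "2023-08-27")).toDF("id", "value", "dt")

// Mirrors the DataFrameWriter.saveAsTable call shown in the stack trace.
df.write
  .mode("append")
  .partitionBy("dt")
  .format("hive")
  .saveAsTable("test_db.kafka_to_hive_test")

If something like this succeeds when run as the same user as the streaming job, a plain permission problem looks less likely, and an empty micro-batch or a NULL partition column (which could also explain the "No partition is generated by dynamic partitioning" warning) may be worth ruling out.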
Your help would be much appreciated. Thank you.
A: No answers yet