GCP hosted Databricks - DBFS temp files - Not Found

Question:

Hi everyone,

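I have been trying to get the DDL for every table in a schema from the Hive metastore in GCP-hosted Databricks. I wrote Python code that generates a SQL file in the DBFS /tmp directory. However, when I run it I get a "file path not found" error (FileNotFoundError). Strangely, the same code runs fine in an AWS-hosted Databricks account. Can anyone shed light on why GCP behaves differently?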
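
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: `dbutils.fs.mkdirs` talks to DBFS directly, but `open("/dbfs" + file_path, "w")` goes through the driver's local file API and only works if DBFS is exposed as a FUSE mount at `/dbfs`. If that mount is missing or restricted for the GCP workspace or cluster access mode, the parent directory `/dbfs/tmp` does not exist locally and Python's `open(..., "w")` raises `FileNotFoundError`. The sketch below checks for the mount before writing; the helper names `to_fuse_path` and `fuse_mount_available` are hypothetical.

```python
import os


def to_fuse_path(dbfs_path: str, mount: str = "/dbfs") -> str:
    """Map a DBFS path such as 'dbfs:/tmp/x.sql' or '/tmp/x.sql' to its FUSE path."""
    # Drop an optional 'dbfs:' scheme so both spellings map the same way.
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    # Join with exactly one slash between the mount point and the DBFS path.
    return mount.rstrip("/") + "/" + dbfs_path.lstrip("/")


def fuse_mount_available(mount: str = "/dbfs") -> bool:
    """Return True if the FUSE mount is visible to the driver's local file API."""
    return os.path.isdir(mount)


if __name__ == "__main__":
    target = to_fuse_path("/tmp/your_catalog_name_ddl.sql")
    print(target)
    print("FUSE mount available:", fuse_mount_available())
    # open(..., "w") raises FileNotFoundError when this parent directory is missing.
    print("Parent dir visible:", os.path.isdir(os.path.dirname(target)))
```

Run in a notebook cell on the GCP cluster, a `False` for either check would be consistent with the error; on the AWS cluster both would be expected to print `True`.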

I also tried retrieving the results through the UI, but because of UI limitations it does not show all of the DDL statements.

Are there any workarounds or suggestions for solving this?
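
One possible workaround, sketched under the assumption that the `/dbfs` FUSE path is the part that fails: build the whole script as a Python string and write it with `dbutils.fs.put`, which writes to DBFS without going through the local file API. `collect_ddl` and `build_ddl_script` are hypothetical helpers, and the error block is simplified compared with the original script's START/END ERROR markers. Only `build_ddl_script` runs outside Databricks; `collect_ddl` expects the notebook's `spark` session.

```python
from typing import Dict, List, Optional, Tuple

# One entry per table: (table_name, ddl_text or None, error_message or None).
TableResult = Tuple[str, Optional[str], Optional[str]]


def collect_ddl(spark, schemas: List[str]) -> Dict[str, List[TableResult]]:
    """Run SHOW CREATE TABLE for every non-temporary table (needs a live SparkSession)."""
    results: Dict[str, List[TableResult]] = {}
    for schema in schemas:
        entries: List[TableResult] = []
        for t in spark.catalog.listTables(schema):
            if t.isTemporary:
                continue  # skip temporary tables, as the original script does
            try:
                ddl = spark.sql(f"SHOW CREATE TABLE {schema}.{t.name}").first()[0]
                entries.append((t.name, ddl, None))
            except Exception as error:
                entries.append((t.name, None, str(error)))
        results[schema] = entries
    return results


def build_ddl_script(results: Dict[str, List[TableResult]]) -> str:
    """Assemble the collected DDL into one SQL script string."""
    lines: List[str] = []
    for schema, tables in results.items():
        lines.append(f"-- {schema}")
        for name, ddl, error in tables:
            if ddl is not None:
                lines.append(ddl + ";")
            else:
                lines.append(f"/* ERROR in {schema}.{name}: {error} */")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Offline demo with made-up data; in a notebook use collect_ddl(spark, schemas) instead.
    demo = {"sales": [("orders", "CREATE TABLE sales.orders (id INT)", None)]}
    script = build_ddl_script(demo)
    print(script)
    # In a Databricks notebook only (dbutils is not available elsewhere):
    #   dbutils.fs.put("dbfs:/tmp/your_catalog_name_ddl.sql", script, True)
```

Printing `script` directly in the notebook also shows the full DDL without reading the file back.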

Here is the Python code:

```python
# set your catalog name
catalog = "your_catalog_name"

# a comma-separated list of schemas, or a single schema name
schemas = [s.strip() for s in "schema_name".split(",")]

spark.catalog.setCurrentCatalog(catalog)

# prepare the output file
folder_for_script = "/tmp/"

# create the folder in DBFS if it does not exist
dbutils.fs.mkdirs(folder_for_script)

file_path = "{}{}_ddl.sql".format(folder_for_script, catalog)

# create and open the file through the /dbfs local path
# ("w" mode already truncates an existing file)
with open("/dbfs" + file_path, "w") as f:
    for schema in schemas:
        all_tables = spark.catalog.listTables(schema)
        f.write("-- {}\n".format(schema))
        for t in all_tables:
            # skip temporary tables
            if not t.isTemporary:
                try:
                    ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(schema, t.name))
                    f.write(ddl.first()[0] + ";\n")
                except Exception as error:
                    f.write("\n --- START ERROR --- \n /*\n")
                    f.write("name: {}.{},\ntableType: {} \n".format(t.namespace, t.name, t.tableType))
                    f.write("Unknown exception: {}".format(error))
                    f.write("*/\n --- END ERROR --- \n")

# console output: read the file back from DBFS and print it
script = spark.sparkContext.textFile(file_path)
for line in script.collect():
    print(line)
```
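
If keeping a file-based approach is preferred, another option (an assumption, not verified on GCP) is to write the script to the driver's local disk, which needs no FUSE mount, and then copy it into DBFS with `dbutils.fs.cp` using the `file:` scheme for the local source. The helper `write_local_script` is hypothetical; only the local write runs outside Databricks.

```python
import os
import tempfile


def write_local_script(script: str, file_name: str) -> str:
    """Write the script to a fresh temp directory on the driver and return its path."""
    local_dir = tempfile.mkdtemp(prefix="ddl_")
    local_path = os.path.join(local_dir, file_name)
    with open(local_path, "w", encoding="utf-8") as f:
        f.write(script)
    return local_path


if __name__ == "__main__":
    path = write_local_script("-- schema_name\n", "your_catalog_name_ddl.sql")
    print(path)
    # In a Databricks notebook only: copy from driver-local disk into DBFS.
    #   dbutils.fs.cp("file:" + path, "dbfs:/tmp/your_catalog_name_ddl.sql")
```

The driver's local disk is ephemeral, so the copy into DBFS is what makes the file outlive the cluster.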

Thanks.
