List of tables in a Database

Question:

I have built a DataFrame that contains every column, with its type, of every table in a Databricks database.

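Database	Table	Column	Column Type
default	table1	column 1	string
default	table1	column 2	boolean
default	table2	column 3	integer
default	table2	column 4	string
default	table2	column 5	string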

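Can anyone help me add two extra columns: one giving the number of null values in each column of each table, and another giving the percentage of null values in each column of each table?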

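Database	Table	Column	Column Type	Nulls	Percentage
default	table1	column 1	string	345	5%
default	table1	column 2	boolean	0	0%
default	table2	column 3	integer	98760	90%
default	table2	column 4	string	56721	52%
default	table2	column 5	string	1512	1%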

Thanks in advance!

Python code:

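from pyspark.sql.functions import col, count, when

table_name = 'table1'
df = spark.sql("SELECT * FROM {}".format(table_name))
# Count the null values in each column of the table
col_null_cnt_df = df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns])
col_null_cnt_df.show()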

Tags: python, database, null, databricks, azure-databricks

Answer:

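from pyspark.sql.functions import lit, col, concat, concat_ws

# For each metadata row, build a query selecting the rows where that column is null
df = df.withColumn('count_of_nulls', concat(lit('select * from '), concat_ws('.', *['Database', 'Table']), lit(' where isnull('), col('Column'), lit(')')))

# Build a query selecting all rows of the table (used for the total row count)
df = df.withColumn('no_of_rows', concat(lit('select * from '), concat_ws('.', *['Database', 'Table'])))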
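
If a column name contains a space or another special character, the generated SQL text would need quoted identifiers to parse. The following is a minimal plain-Python sketch of building the same kind of query string with backtick quoting. The helper names `quote_ident` and `null_rows_query` are hypothetical and not part of PySpark, and the database, table, and column values are only illustrative.

```python
def quote_ident(name: str) -> str:
    """Quote a Spark SQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def null_rows_query(database: str, table: str, column: str) -> str:
    """Build a query that selects the rows where `column` is NULL (hypothetical helper)."""
    full_name = f"{quote_ident(database)}.{quote_ident(table)}"
    return f"SELECT * FROM {full_name} WHERE {quote_ident(column)} IS NULL"


# Illustrative values only
print(null_rows_query("default", "table1", "column 1"))
```

The resulting string can be passed to `spark.sql(...)` in the same way as the queries stored in the `count_of_nulls` column.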


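Next, I convert this DataFrame to a pandas-on-Spark DataFrame so that I can loop over the generated queries, run each one, and update the DataFrame with the results: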

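pdf = df.to_pandas_on_spark()

null_count = []
total_count = []
# Run each generated query and count the rows that have a null in that column
for i in pdf['count_of_nulls'].to_numpy():
    null_count.append(spark.sql(i).count())

# Run each generated query and count all rows of the table
for i in pdf['no_of_rows'].to_numpy():
    total_count.append(spark.sql(i).count())

print(null_count, total_count)
# Replace the query strings with the computed counts
pdf['count_of_nulls'] = null_count
pdf['no_of_rows'] = total_count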
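
The loop above issues one query per column plus one full row count per metadata row. As a possible alternative, and only a sketch under the assumption that one aggregate query per table is acceptable, the total row count and all per-column null counts can be requested in a single statement. `table_null_counts_query` and the alias `__total_rows` are hypothetical names; the function only builds SQL text and assumes the identifiers contain no backticks.

```python
from typing import List


def table_null_counts_query(database: str, table: str, columns: List[str]) -> str:
    """Build one aggregate query returning the row count and per-column null counts."""
    # COUNT(*) counts every row, while COUNT(col) skips NULLs,
    # so their difference is the number of NULL values in that column.
    parts = ["COUNT(*) AS `__total_rows`"]
    for c in columns:
        parts.append(f"COUNT(*) - COUNT(`{c}`) AS `{c}`")
    return f"SELECT {', '.join(parts)} FROM `{database}`.`{table}`"


# Illustrative values only
query = table_null_counts_query("default", "table2", ["column 3", "column 4"])
print(query)
# On a cluster, assuming an active `spark` session:
#   row = spark.sql(query).first()
#   total, nulls = row["__total_rows"], row["column 3"]
```

This trades many small jobs for one scan per table, which may matter when the database has many wide tables.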


I then convert it back to a PySpark DataFrame and calculate the percentage.

df = pdf.to_spark()
df.withColumn('percentage_of_nulls', col('count_of_nulls')/col('no_of_rows')*100).show()
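
For an empty table `no_of_rows` is 0, so the ratio is undefined. The same arithmetic can be sketched in plain Python with that case handled explicitly. The helper name `null_percentage` is hypothetical, returning `None` for an empty table is an assumed convention, and the numbers are made up for illustration.

```python
from typing import Optional


def null_percentage(null_count: int, total_rows: int) -> Optional[float]:
    """Return the share of nulls as a percentage, or None for an empty table."""
    if total_rows == 0:
        # Assumed convention: no rows means no meaningful percentage
        return None
    return round(null_count / total_rows * 100, 2)


print(null_percentage(1, 4))  # 25.0
print(null_percentage(0, 0))  # None
```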

