Asked by: Shyam  Asked: 8/16/2023  Last edited by: Shyam  Updated: 8/18/2023  Views: 40
Spark Java sum is giving incorrect value
Q:
The sum function in Spark (Java) is returning an incorrect value.
The Java sample code is as follows:
List<Double> points = Arrays.asList(-6221.4, 6380.46);
Dataset<Row> dt = spark.createDataset(points, Encoders.DOUBLE()).toDF("double_vals");
dt.createOrReplaceTempView("dual_table");
spark.sql("select sum(double_vals) from dual_table").show(false);
The expected result is 159.06, but I get the following:
+-----------------+
|sum(double_vals) |
+-----------------+
|159.0600000000004|
+-----------------+
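Am I doing something wrong?
An extended example: if the number of decimal places after summation is dynamic, as shown below, I cannot simply limit the result to 2 decimal places. Is there a solution?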
Tuple3<String,String,Double> val1 = new Tuple3<>("Day1","Ram", -6221.4);
Tuple3<String,String,Double> val2 = new Tuple3<>("Day2","Ram", 6380.46);
Tuple3<String,String,Double> val3 = new Tuple3<>("Day1","Sam", 380.46);
Tuple3<String,String,Double> val4 = new Tuple3<>("Day2","Sam", 6380.462);
List<Tuple3<String,String,Double>> points = Arrays.asList(val1,val2,val3,val4);
Dataset<Row> dt = spark.createDataset(points, Encoders.tuple(Encoders.STRING(),Encoders.STRING(),Encoders.DOUBLE())).toDF("day","name","profit");
dt.createOrReplaceTempView("dual_table");
Dataset<Row> newDs = spark.sql("select NAME, sum(profit) sum_val from dual_table group by name");
newDs.show();
The result will be:
+----+------------------+
|NAME|           sum_val|
+----+------------------+
| Ram| 159.0600000000004|  This needs 2 decimal places
| Sam|6760.9220000000005|  This needs 3 decimal places
+----+------------------+
A:
1 upvote
Srinivas
8/16/2023
#1
Use the round function to round the sum to a fixed number of decimal places:
spark.sql("select round(sum(double_vals), 2) as sum_value from dual_table").show(false);
+---------+
|sum_value|
+---------+
|159.06   |
+---------+
Alternatively, use cast(sum(<column name>) AS decimal(10, 2)):
spark.sql("select cast(sum(double_vals) as decimal(10, 2)) as sum_value from dual_table").show(false);
+---------+
|sum_value|
+---------+
|159.06   |
+---------+
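Both queries above clean up the double sum after the floating-point error has already crept in. As a rough illustration of where the extra digits come from, here is a plain-Java sketch (no Spark needed) that adds the same two values once as double and once as exact decimals. The class and method names are made up for this example. In Spark SQL the analogous idea would presumably be to cast the column to a decimal type before summing, e.g. sum(cast(double_vals as decimal(20, 3))), but that is an assumption and not part of the original answer.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class DecimalSumSketch {
    // Sums with binary floating point, which is what sum() does on a DOUBLE column.
    static double sumAsDouble(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Converts each value to an exact decimal first, then sums the decimals.
    static BigDecimal sumAsDecimal(List<Double> values) {
        BigDecimal total = BigDecimal.ZERO;
        for (double v : values) {
            // BigDecimal.valueOf uses the double's canonical string form, e.g. "6380.46".
            total = total.add(BigDecimal.valueOf(v));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Double> points = Arrays.asList(-6221.4, 6380.46);
        System.out.println(sumAsDouble(points));  // close to, but not exactly, 159.06
        System.out.println(sumAsDecimal(points)); // 159.06
    }
}
```

Neither -6221.4 nor 6380.46 has an exact binary representation, so the double sum carries a tiny error, while the decimal sum keeps exactly the digits that were typed in.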
Comments
0 upvotes
Coding thermodynamist
8/16/2023
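Rounding a result that will be used in intermediate calculations is bad practice; doing so only amplifies floating-point error. Rounding is acceptable only in the application's presentation layer, to convert the number into a human-friendly format and to match the number of digits to the precision of the input.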
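A minimal Java sketch of the point made in this comment, assuming the sum has already been collected into a plain double: the unrounded value is kept for further arithmetic, and rounding happens only when the number is turned into text. The class name and the display helper are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class PresentationRoundingSketch {
    // Formats a value for display; the caller keeps the unrounded double.
    static String display(double value, int places) {
        return String.format(Locale.ROOT, "%." + places + "f", value);
    }

    public static void main(String[] args) {
        double sum = -6221.4 + 6380.46; // raw double, slightly off from 159.06
        double doubled = sum * 2;       // intermediate math uses the raw value

        // Rounding is applied only at the point of output.
        System.out.println(display(sum, 2));     // 159.06
        System.out.println(display(doubled, 2)); // 318.12
    }
}
```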
0 upvotes
Shyam
8/17/2023
This does not work if we use the sum function on a column with group by, because the number of decimal places varies after aggregation. Ex: 238.98355, 34.33, -263.4
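One possible way around the varying number of decimal places, sketched in plain Java rather than Spark: sum each group as exact decimals, then strip trailing zeros, so each total keeps only as many decimal places as its inputs actually had. The class and method names are hypothetical, and the sketch assumes the grouped values are available on the driver. It is not something proposed in the thread.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class GroupedDecimalSumSketch {
    // Adds one (name, profit) row to the running per-name totals.
    static void add(Map<String, BigDecimal> totals, String name, double profit) {
        totals.merge(name, BigDecimal.valueOf(profit), BigDecimal::add);
    }

    // Renders a total with only as many decimal places as it needs.
    static String render(BigDecimal total) {
        return total.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        // Same rows as in the question's extended example.
        Map<String, BigDecimal> totals = new LinkedHashMap<>();
        add(totals, "Ram", -6221.4);
        add(totals, "Ram", 6380.46);
        add(totals, "Sam", 380.46);
        add(totals, "Sam", 6380.462);

        totals.forEach((name, total) -> System.out.println(name + " " + render(total)));
        // Ram 159.06
        // Sam 6760.922
    }
}
```

Because the decimal sum has the largest scale among its inputs, Ram's total ends up with 2 decimal places and Sam's with 3, with no fixed scale hard-coded.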