Pandas 2.0 dtype promotion seems to produce an arbitrary-precision float

Asked by: Wyatt Gormley  Asked: 8/25/2023  Updated: 8/25/2023  Views: 26

Q:

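My work involves financial reporting. While testing dtype promotion, I wrote the following script (knowing that explicit dtype casting should be used in production code) and noticed that for values beyond the UInt64 range, pandas falls back to the column dtype object, storing Python int objects. When this column is divided, the output column no longer holds Python int values. I would expect something like np.float64 or float (with double precision). Checking the type of the output returns float, although the printed representation looks like octuple precision, in a non-floating-point format. Can anyone help me understand what is happening here?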

import pandas as pd
import numpy as np


def main():
    df = setup()
    print_full(df.dtypes)
    print('###########################')
    print_full(df['UInt256'].head())
    print(type(df.loc[0, 'UInt256']))
    print('###########################')
    df['A'] = df['UInt256'].div(2)
    print_full(df['A'].head())
    print(type(df.loc[0, 'A']))
    return


def setup():
    df = pd.DataFrame(
        pd.Series(np.arange(0, 2 ** 8), dtype=pd.UInt8Dtype()),
        columns=['UInt8'])
    df['UInt16'] = df['UInt8'] * 2 ** (2**3)
    df['UInt32'] = df['UInt16'] * 2 ** (2**4)
    df['UInt64'] = df['UInt32'] * 2 ** (2**5)
    df['UInt128'] = df['UInt64'] * 2 ** (2**6)
    df['UInt256'] = df['UInt128'] * 2 ** (2**7)
    return df


def print_full(x):
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 2000)
    pd.set_option('display.float_format', '{:20,.2f}'.format)
    pd.set_option('display.max_colwidth', None)
    print(x)
    if 'dtypes' in dir(x):
        print(x.dtypes)
    pd.reset_option('display.max_rows')
    pd.reset_option('display.max_columns')
    pd.reset_option('display.width')
    pd.reset_option('display.float_format')
    pd.reset_option('display.max_colwidth')


if __name__ == '__main__':
    main()

Running this produces:

UInt8       UInt8
UInt16     UInt16
UInt32     UInt32
UInt64     UInt64
UInt128    object
UInt256    object
dtype: object
object
###########################
0                                                                               0
1     452312848583266388373324160190187140051835877600158453279131187530910662656
2     904625697166532776746648320380374280103671755200316906558262375061821325312
3    1356938545749799165119972480570561420155507632800475359837393562592731987968
4    1809251394333065553493296640760748560207343510400633813116524750123642650624
Name: UInt256, dtype: object
object
<class 'int'>
###########################
0                                                                                                     0.00
1   226,156,424,291,633,194,186,662,080,095,093,570,025,917,938,800,079,226,639,565,593,765,455,331,328.00
2   452,312,848,583,266,388,373,324,160,190,187,140,051,835,877,600,158,453,279,131,187,530,910,662,656.00
3   678,469,272,874,899,582,559,986,240,285,280,710,077,753,816,400,237,679,918,696,781,296,365,993,984.00
4   904,625,697,166,532,776,746,648,320,380,374,280,103,671,755,200,316,906,558,262,375,061,821,325,312.00
Name: A, dtype: object
object
<class 'float'>
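
The last block of output can be reproduced without pandas. This is a minimal sketch, assuming that the object-dtype column simply hands each stored Python int to the / operator (that is my reading of the output above, not something confirmed from the pandas source). It uses the value from row 1 and the same '{:20,.2f}' format string that print_full passes to display.float_format.

```python
# Row 1 of the 'UInt256' column in the output above equals 2**248.
n = 2 ** 248

# True division of a Python int returns an ordinary 64-bit Python float.
half = n / 2
print(type(half))
print(repr(half))  # scientific notation, the usual float repr

# Same format string as the display.float_format option in print_full.
# A fixed-point format with 2 decimals writes out every digit of the
# value, and the comma adds the thousands separators.
formatted = '{:20,.2f}'.format(half)
print(formatted)
```

If that assumption holds, the comma-separated digits come from the display format rather than from the stored value, which is still a plain double.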

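I tried pandas column division on an object-dtype column of int objects, expecting double-precision Python floats as output. The result type-checks as float, but it is displayed with comma-separated formatting and 77 significant digits. Using Python 3.11.2 and pandas 2.0.0rc...... on a 2015 MacBook.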
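
One way to check whether those digits reflect stored precision is to compare true division with exact integer arithmetic in plain Python. This is a sketch only: the value n below is hypothetical, chosen so that half of it is not a power of two, and Fraction is used purely as an exact reference rather than as anything pandas does internally.

```python
import sys
from fractions import Fraction

# Hypothetical value: 2**248 plus a small offset, so n / 2 is not a power of two.
n = 2 ** 248 + 2

as_float = n / 2              # true division: rounded to the nearest double
as_int = n // 2               # floor division: stays an exact Python int
as_fraction = Fraction(n, 2)  # exact rational reference value

print(sys.float_info.mant_dig)   # significand bits carried by a Python float
print(int(as_float) == as_int)   # False: the float lost the low-order part
print(as_fraction == as_int)     # True: // agrees with exact arithmetic
print(len(str(int(as_float))))   # digits printed, not digits of precision
```

The float still prints a long run of digits because a fixed-point format writes out the exact decimal expansion of the stored double. Only the leading 15 to 17 of those digits carry information about the original integer.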

pandas decimal precision


A: No answers yet