Asked by: Wyatt Gormley  Asked: 8/25/2023  Updated: 8/25/2023  Views: 26
Pandas 2.0 dtype promotion seeming to produce arbitrary precision float
Q:
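My work involves financial reporting. While testing dtype promotion, I wrote the following script (knowing that production code should use explicit dtype casts) and noticed that for values beyond UInt64, pandas converts the column to the object dtype and stores Python int objects. When that column is divided, the output column no longer holds Python ints. I would expect something like np.float64 or float (with double precision). Checking the output returns float, although the representation appears to have octuple precision and is not in a typical float format. Can anyone help me understand what is happening here?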
import pandas as pd
import numpy as np


def main():
    df = setup()
    print_full(df.dtypes)
    print('###########################')
    print_full(df['UInt256'].head())
    print(type(df.loc[0, 'UInt256']))
    print('###########################')
    df['A'] = df['UInt256'].div(2)
    print_full(df['A'].head())
    print(type(df.loc[0, 'A']))
    return


def setup():
    df = pd.DataFrame(
        pd.Series(np.arange(0, 2 ** 8), dtype=pd.UInt8Dtype()),
        columns=['UInt8'])
    df['UInt16'] = df['UInt8'] * 2 ** (2**3)
    df['UInt32'] = df['UInt16'] * 2 ** (2**4)
    df['UInt64'] = df['UInt32'] * 2 ** (2**5)
    df['UInt128'] = df['UInt64'] * 2 ** (2**6)
    df['UInt256'] = df['UInt128'] * 2 ** (2**7)
    return df


def print_full(x):
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 2000)
    pd.set_option('display.float_format', '{:20,.2f}'.format)
    pd.set_option('display.max_colwidth', None)
    print(x)
    if 'dtypes' in dir(x):
        print(x.dtypes)
    pd.reset_option('display.max_rows')
    pd.reset_option('display.max_columns')
    pd.reset_option('display.width')
    pd.reset_option('display.float_format')
    pd.reset_option('display.max_colwidth')


if __name__ == '__main__':
    main()
Running this produces:
UInt8 UInt8
UInt16 UInt16
UInt32 UInt32
UInt64 UInt64
UInt128 object
UInt256 object
dtype: object
object
###########################
0 0
1 452312848583266388373324160190187140051835877600158453279131187530910662656
2 904625697166532776746648320380374280103671755200316906558262375061821325312
3 1356938545749799165119972480570561420155507632800475359837393562592731987968
4 1809251394333065553493296640760748560207343510400633813116524750123642650624
Name: UInt256, dtype: object
object
<class 'int'>
###########################
0 0.00
1 226,156,424,291,633,194,186,662,080,095,093,570,025,917,938,800,079,226,639,565,593,765,455,331,328.00
2 452,312,848,583,266,388,373,324,160,190,187,140,051,835,877,600,158,453,279,131,187,530,910,662,656.00
3 678,469,272,874,899,582,559,986,240,285,280,710,077,753,816,400,237,679,918,696,781,296,365,993,984.00
4 904,625,697,166,532,776,746,648,320,380,374,280,103,671,755,200,316,906,558,262,375,061,821,325,312.00
Name: A, dtype: object
object
<class 'float'>
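One possible reading of the output above, offered as a sketch and not as a confirmed answer: dividing an object-dtype column applies Python's own int / int to each element, so every result is an ordinary Python float, and the comma-separated digit string comes from the display.float_format option that print_full installs. The series s below is a hypothetical stand-in for the UInt256 column, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the 'UInt256' column: Python ints in an object column.
s = pd.Series([k * 2 ** 248 for k in range(5)], dtype=object)

# Division on an object column falls back to element-wise Python division.
halved = s.div(2)

result_dtype = halved.dtype          # the column stays object
element_type = type(halved.iloc[1])  # each element is a plain Python float

# Same format string that print_full passes to display.float_format.
formatted = '{:20,.2f}'.format(halved.iloc[1])

print(result_dtype, element_type)
print(formatted)
```

If this reading is right, the digits and commas are produced by the format string applied to a double, not by a wider numeric type.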
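I tried pandas column division on an object-dtype column of int objects, expecting double-precision Python floats as output. The result type-checks as float, but it is displayed with comma separators and 77 significant digits. Environment: a 2015 macbook with python 3.11.2, pandas 2.0.0rc......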
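A long digit string does not by itself imply more than double precision. The values in the question are small multiples of a power of two, which a 64-bit double can hold exactly, and fixed-point formatting then prints the full decimal expansion of that double. The pure-Python sketch below illustrates this, assuming a platform with IEEE 754 doubles; big, half and odd are illustrative names.

```python
import sys

big = 2 ** 248   # row 1 of the UInt256 column in the question
half = big / 2   # int / int yields a Python float (a 64-bit double)

# A double carries 53 significant bits; a power of two needs only one.
mantissa_bits = sys.float_info.mant_dig
exact = int(half) == big // 2

# An integer that needs more than 53 significant bits is silently rounded.
odd = big + 1
lossy = int(float(odd)) != odd

print(mantissa_bits, exact, lossy)
```

For financial data this means the float column only looks exact because of these particular inputs; arbitrary large integers would lose low-order digits after the division.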
A: No answers yet