New DataFrame is being made when updating dataframe inside a loop

Asked by: Prakhar Rathi  Asked: 3/6/2022  Updated: 3/6/2022  Views: 124

Q:

I am trying to make some changes to three dataframes inside a loop, like this:

for sheet in [f1, f2, f3]: 
    sheet = preprocess_df(sheet)

The function preprocess_df looks like this:

def preprocess_df(df): 
    """ Making a function to preprocess a dataframe individually rather than all three together """
    
    # make column names uniform
    columns = [
        "Reporting_Type",
        "AA_name",
        "Date_DD/MM/YYYY",
        "Time_HHMMSS",
        "Type",
        "Name",
        "FI_Type",
        "Count_linked",
        "Average_timelag_FI_Notification",
        "FI_Ready_to_FI_request_ratio",
        "Count_Consent_Raised",
        "Actioned_to_raised_ratio",
        "Approved_to_raised_ratio",
        "FI_Ready_to_FI_request_ratio(Daily)",
        "Daily_Consent_Requests_Data_Delivered",
        "Total_Consent_Requests_Data_Delivered",
        "Consent_Requests_Data_Delivered_To_Raised_Ratio",
        "Daily_Consent_Requests_Raised",
        "Daily Consent_Requests_Data_Delivered_To_Raised_Ratio",
    ]
    
    # Set the sheet size 
    df = df.iloc[:, :19]
    
    # Set the column names 
    df.columns = columns

    return df 
    

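I am basically updating the column names and fixing the dataframe size. The problem I am facing is that if I print the dataframe inside the loop, the sheet variable is indeed updated; however, the original f1, f2 and f3 dataframes are not. I think this is because the sheet variable creates a copy of f1 etc. rather than actually using the same dataframe. This seems like an extension of the pass-by-reference versus pass-by-value concept. Is there a way to make in-place changes to all the sheets inside the loop?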
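
The behaviour described above can be reproduced with a small sketch. The frame below is a hypothetical stand-in, not one of the real sheets. It suggests that the loop variable starts out as another name for f1 (not a copy), and that the assignment inside the loop only rebinds that name to the new object returned by iloc, so f1 itself is never touched.

```python
import pandas as pd

# Hypothetical stand-in for one of the sheets; not the asker's data.
f1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

for sheet in [f1]:
    same_before = sheet is f1   # the loop variable is another name for f1
    sheet = sheet.iloc[:, :2]   # iloc returns a new object; `sheet` is rebound to it
    same_after = sheet is f1    # the name no longer refers to f1

print(same_before, same_after)  # True False
print(f1.shape)                 # (2, 3): f1 still has all three columns
```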

Python Pandas DataFrame pass-by-reference

A:

0 votes  Stubborn  3/6/2022  #1

Indeed, a new DataFrame is created when you do df = df.iloc[:, :19]. However, you can work around this by using drop with inplace=True:

import pandas as pd
import numpy as np

def preprocess_df(df): 
    columns = [
        "a",
        "b",
    ] # Swap this list with yours
    df.drop(df.columns[2:], axis=1, inplace=True) # Keeps the first 2 columns; replace 2 with 19 in your code
    df.columns = columns

f1 = pd.DataFrame(np.arange(12).reshape(3, 4),columns=['A', 'B', 'C', 'D']) # Just an example
preprocess_df(f1) # You can put this in your for loop
print(f1)

The code above will output the following:

   a  b
0  0  1
1  4  5
2  8  9
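
If mutating the frames in place is not required, another option is to keep a function that returns a new frame and rebind the original names to the results. The following is only a sketch under stated assumptions: the extra columns parameter and the example frames are hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd

def preprocess_df(df, columns):
    """Return a trimmed, renamed copy; the input frame is left untouched."""
    out = df.iloc[:, :len(columns)].copy()  # keep only the leading columns
    out.columns = columns                   # apply the uniform column names
    return out

# Hypothetical example frames standing in for f1, f2 and f3.
f1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])
f2 = f1 + 100
f3 = f1 + 200

# Rebind the original names to the processed frames in one step.
f1, f2, f3 = [preprocess_df(f, ["a", "b"]) for f in (f1, f2, f3)]

print(list(f1.columns))  # ['a', 'b']
print(f3.shape)          # (3, 2)
```

Compared with drop(..., inplace=True), this leaves the input frames unmodified, which may be easier to reason about when the same frame is referenced elsewhere.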