Python: how to insert the result of a data frame into the read_block function

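Q:

I am trying to take the result of a data frame generated from Azure Blob Storage and feed it into the next step, which extracts data in a particular way.

I have tested both ends separately: generating the data from Azure Blob Storage, and extracting the data with a regex (which works when tested on its own). My challenge now is putting the two pieces of code together.

Here is the first part (getting the data frame from Azure Blob Storage):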

import re 
from io import StringIO
import pandas as pd
from azure.storage.blob import BlobClient


blob = BlobClient(account_url="https://test.blob.core.windows.net",
              container_name="xxxx",
              blob_name="Text.csv",
              credential="xxxx")

data = blob.download_blob()
df = pd.read_csv(data)

Here is the second part (extracting only certain sections from a CSV file):

def read_block(names, igidx=True):
    with open("Test.csv") as f:   ###<<<This is where I would like to modify<<<###              
        pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
        return pd.concat([
            pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
                .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                    pat, f.read(), flags=re.M|re.S) if m.group(1) in names # optional
        ], keys=names, ignore_index=igidx)

df2 = read_block(names=["Admissions", "Readmissions"],igidx=False).droplevel(1).reset_index(names="Admission")

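So what I am trying to do is take df from the first piece of code and use it as the input of the second piece, where it says with open("Test.csv") as f.

How can I modify the second part of the code so that it takes the data from the first part?

Or, if that does not work, is there a way to use the file path ID (data) generated from Azure, shown below?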

<azure.storage.blob._download.StorageStreamDownloader object at 0x00000xxxxxxx>
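One way the two parts could be connected, sketched under assumptions: the regex needs the raw CSV text rather than a parsed DataFrame, so read_block can take that text as an argument instead of opening a file. The `text` parameter, the `PAT` constant, and `sample_text` are illustrative names and made-up data, not part of the original code. With Azure, the text would come from the downloader's `readall()` method, which returns bytes unless an encoding was passed to `download_blob()`.

```python
import re
from io import StringIO

import pandas as pd

# Pattern from the question: block name, summary line, separator line, table body.
PAT = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"


def read_block(text, names, igidx=True):
    """Extract the named blocks from raw CSV text (str or bytes, not a DataFrame)."""
    if isinstance(text, bytes):
        # readall() yields bytes unless download_blob() was given an encoding.
        text = text.decode("utf-8")
    # open() in text mode translates "\r\n" to "\n"; raw blob content does not.
    text = text.replace("\r\n", "\n")
    frames = [
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
        .iloc[:, 1:]  # drop the first column (the empty "Division" column here)
        .dropna(how="all")
        for m in re.finditer(PAT, text, flags=re.M | re.S)
        if m.group(1) in names
    ]
    return pd.concat(frames, keys=names, ignore_index=igidx)


# Made-up sample that mimics the layout described in the question.
sample_text = (
    "Admissions,,,,,\n"
    "Not Started: 12 Sent: 3 Completed: 3,,,,,\n"
    ",,,,,\n"
    "Division,Community,ResiName,Date,DocStatus,LastUpdate\n"
    ",TestStation,Jane Doe,9/12/2023,Sent,9/12/2023\n"
    ",TestStation2,John Doe,9/12/2023,NotStarted,\n"
    ",,,,,\n"
    ",,,,,\n"
    "Readmissions,,,,,\n"
    "Not Started: 1 Sent: 0 Completed: 1,,,,,\n"
    ",,,,,\n"
    "Division,Community,ResidentName,Date,DocStatus,Last Update\n"
    ",StationK,PrettyWoman,9/12/2023,Not Started,\n"
    ",MyGoodness,UglyMan,7/21/2023,Completed,7/26/2023\n"
    ",,,,,\n"
    ",,,,,\n"
)

# With Azure, the text would instead come from something like:
#     text = blob.download_blob().readall()
df2 = (
    read_block(sample_text, names=["Admissions", "Readmissions"], igidx=False)
    .droplevel(1)
    .rename_axis("Admission")  # same effect as reset_index(names=...) on pandas >= 1.5
    .reset_index()
)
print(df2)
```

Passing the text in, rather than a path or a DataFrame, keeps the regex logic unchanged and lets the same function be exercised with a local string before touching the blob.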

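Update:

I modified the code as follows, and now I get a concat error:

I am not sure whether this is because there is no longer any looping (since I removed the with open("Test.csv") as f: line).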

...

data = blob.download_blob()
df = pd.read_csv(data)
df1 = df.to_csv(index=False, header=False)

def read_block(names, igidx=True):    
    pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
    return pd.concat([
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
            .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                pat, df1, flags=re.M|re.S) if m.group(1) in names 
    ], keys=names, ignore_index=igidx)

df2 = read_block(names=["Admissions", "Readmissions"], igidx=False).droplevel(1).reset_index(names="Admission")   
print(df2)
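A possible explanation, assuming the concat error was `No objects to concatenate`: pd.concat raises that ValueError when it receives an empty list, which happens here whenever the regex matches nothing. Text that has been round-tripped through `df.to_csv()` may no longer have the layout the pattern expects; for instance, `to_csv` defaults its line terminator to `os.linesep`, and the pattern's `$\n` does not match `"\r\n"` line endings. The sketch below uses made-up sample text and a hypothetical helper `block_names` to show the effect.

```python
import re

import pandas as pd

# Pattern from the question.
PAT = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"


def block_names(text):
    """Return the block names the pattern finds in the given text."""
    return [m.group(1) for m in re.finditer(PAT, text, flags=re.M | re.S)]


# Made-up sample with a single block.
unix_text = (
    "Admissions,,,\n"
    "Not Started: 1,,,\n"
    ",,,\n"
    "Division,Community,Name,Date\n"
    ",StationK,Jane,9/12/2023\n"
    ",,,\n"
)
windows_text = unix_text.replace("\n", "\r\n")

print(block_names(unix_text))     # one block found
print(block_names(windows_text))  # nothing found: "$\n" cannot match before "\r"

# An empty list of frames is what makes pd.concat fail.
try:
    pd.concat([])
    concat_error = None
except ValueError as exc:
    concat_error = str(exc)
print(concat_error)
```

Printing the matched names before calling pd.concat is a quick way to tell a regex problem apart from a pandas problem.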

New image:

Here is the error message:

Here is the latest code (11/13/2023):

import re 
from io import StringIO
import pandas as pd
from azure.storage.blob import BlobClient
blob = BlobClient(account_url="https://xxxx.blob.core.windows.net",
              container_name="xxxx",
              blob_name="SampleSafe.csv",
              credential="xxxx")

data = blob.download_blob()
df = pd.read_csv(data)
df1 = df.to_csv(index=False)

def read_block(names, igidx=True):    
    pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
    return pd.concat([
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
            .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                pat, data.readall(), flags=re.M|re.S)
               if m.group(1) in names], keys=names, ignore_index=igidx)

df2 = read_block(names=["Admissions", "Readmissions"], igidx=False).droplevel(1).reset_index(names="block")
print(df2)

Here is the detailed error message (updated 11/13/2023):

Here is df1 (11/18/2023):

Division  FacilityName Census Admiss Readmiss  Discharges
          Test1         57    0      0         0
          Test3         2     0      0         1
          Test5         135   0      0         0
          Test6         9     0      0         0
          Test4         3     0      0         1
          Test2         76    0      0         0
          Blindsection  55    1      0         2
                
                
Admissions                  
Not Started: 12 Sent: 3 Completed: 3            
                
Division Community    ResiName  Date      DocStatus  LastUpdate
         TestStation  Jane Doe  9/12/2023 Sent       9/12/2023
         TestStation2 John Doe  9/12/2023 NotStarted    
         Alibaba      SuperMan  9/12/2023 NotStarted    
         Iceland      SuperWoma 9/12/2023 NotStarted    
                
                
Readmissions                    
Not Started: 1  Sent: 0 Completed: 1            
                
Division Community  ResidentName Date      DocStatus    Last Update
         StationK   PrettyWoman  9/12/2023 Not Started  
         MyGoodness UglyMan      7/21/2023 Completed    7/26/2023
                
                
Discharge                   
                
Division Community       ResidentName   Date        
         StationKingdom1 PrettyWoman2   8/22/2023       
         MyGoodness1     UglyMan1       4/8/2023        
         Landmark2       NiceGuys       9/12/2023       
         IcelandKingdom2 Mr.Heroshi2    7/14/2023       
         MoreKingdom2    KingKong       8/31/2023

Here is the error message when using the previous code (updated 11/18):

Here is the error message (updated 11/19):

python pandas regex azure-blob-storage

A:

data = blob.download_blob(max_concurrency=1, encoding="UTF-8")

def read_block(names, igidx=True):    
    pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
    return pd.concat([
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
            .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                pat, data.readall(), flags=re.M|re.S)
               if m.group(1) in names], keys=names, ignore_index=igidx)
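Why the `encoding` argument matters can be illustrated without an Azure account. The pattern is a str, so the text handed to re.finditer must be a str too; applying a str pattern to bytes raises a TypeError. The sketch below uses a hypothetical `FakeDownloader` class as a stand-in for the object returned by `download_blob()`, assuming only that `readall()` returns bytes by default and a str when an encoding is set. The payload is made-up data.

```python
import re

# Pattern from the code above.
PAT = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"


class FakeDownloader:
    """Hypothetical stand-in for the object returned by download_blob()."""

    def __init__(self, payload, encoding=None):
        self._payload = payload
        self._encoding = encoding

    def readall(self):
        # Assumed behaviour: bytes by default, str when an encoding is set.
        if self._encoding is None:
            return self._payload
        return self._payload.decode(self._encoding)


payload = (
    b"Admissions,,,\n"
    b"Not Started: 1,,,\n"
    b",,,\n"
    b"Division,Community,Name,Date\n"
    b",StationK,Jane,9/12/2023\n"
    b",,,\n"
)

# Without an encoding, readall() gives bytes and the str pattern is rejected.
try:
    list(re.finditer(PAT, FakeDownloader(payload).readall(), flags=re.M | re.S))
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

# With an encoding, readall() gives str and the pattern can match.
text = FakeDownloader(payload, encoding="UTF-8").readall()
names_found = [m.group(1) for m in re.finditer(PAT, text, flags=re.M | re.S)]
print(type(bytes_error).__name__, names_found)
```

Note that the code above calls `data.readall()` directly and does not pass `data` to `pd.read_csv` first, so the downloaded content is read only once.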
