从数据帧中提取数字后,如何统一转换为一个计量单位后计算平均价格

After extracting numbers from dataframe, how can I calculate the average price after converting to one unit of measure uniformly

提问人:X.x 提问时间:9/10/2023 更新时间:9/11/2023 访问量:59

问:

我有一个数据集,其中包含不同的房产租金价格。它看起来像这样:

data = {
    'prices': [
        '$350.00',
        '$450.00 pw',
        '$325 per week',
        '$495pw - Views! White goods!',
        '$460p/w + gst and outgoings',
        '$300 wk',
        '$390pw / $1695pcm',
        '$180 pw / $782 pm',
        '$375 Per Week/Fully Furnished',
        '$350 pw + GST & Outgoings',
        'APPLY NOW - From $270 per week',
        '$185 per night',
        '$400pw incl. power',
        '$500 weekly',
        '$600 per week pw',
        '$850 per week (Fully furnished)',
        'FROM $400PW, FURNITURE AND BILLS INCLUDED',
        'THE DEAL- $780 PER WEEK',
        'THE DEAL: $1,400 PER WEEK',
        '$750/W Unfurnished',
        '$320 - fully furnished pw',
        '$330 PER WEEK | $1,430 P.C.M',
        'Enquire Now: $690 per week',
        '$460 per week / $1999 per month',
        '$490 per week/Under Application approved',
        '$1550pw - Location! Rare gem!',
        '295 per week',  # Example without a dollar sign
        'unit 2 - $780pw unit 3 - $760pw',  # Example with multiple prices
        '$2500 pw high, $1600pw low,$380 pn',  # Example with multiple prices
        'from $786 - $1572 per week',  # Example with multiple prices
        '$590 to $639',  # Example with a range
        '$280 - $290 pw'  # Example with a range
    ]
}

我的目标是清理这个“价格”列,以便只显示每周租金价格。

我未能管理最后五种数据,这就是我所做的:

df = pd.DataFrame(data)

def extract_weekly_price(text):
    price_match = re.search(r'\$?([\d,]+)', text)
    if price_match:
        price_str = price_match.group(1)
        price = int(price_str.replace(',', ''))
        
        # convert to weekly if not
        if re.search(r'(per week|p\.w\.|p/w|pw|/w|weekly)', text):
            return price  
        elif 'p.a' in text:
            return price / 52  
        elif re.search(r'(p\.c\.m|pcm|mth|pm)', text):
            return price / 4.33 
        elif 'per night' in text:
            return price * 7
        else:
            return price  
    else:
        return None

df['prices'] = df['prices'].str.lower()
df['Weekly_rent'] = df['prices'].apply(extract_weekly_price).round(3)

如何修改我的代码,以便获得这些数据的平均每周价格,例如“590 美元到 639 美元”或“每周 280 美元 - 290 美元”?如果您能提供帮助,我将不胜感激。

Python Pandas 正则表达式 数据帧

评论

0赞 Timeless 9/11/2023
应该如何计算这个的平均价格?(其中有两个每周价格和一个每晚/每日价格)。'$2500 pw high, $1600pw low,$380 pn'

答:

1赞 Andrej Kesely 9/11/2023 #1

下面是一个示例,如何使用 + 关键字从文本中获取价格(链接到 regex101):rematch

import re

data = {
    "prices": [
        "$350.00",
        "$450.00 pw",
        "$325 per week",
        "$495pw - Views! White goods!",
        "$460p/w + gst and outgoings",
        "$300 wk",
        "$390pw / $1695pcm",
        "$180 pw / $782 pm",
        "$375 Per Week/Fully Furnished",
        "$350 pw + GST & Outgoings",
        "APPLY NOW - From $270 per week",
        "$185 per night",
        "$400pw incl. power",
        "$500 weekly",
        "$600 per week pw",
        "$850 per week (Fully furnished)",
        "FROM $400PW, FURNITURE AND BILLS INCLUDED",
        "THE DEAL- $780 PER WEEK",
        "THE DEAL: $1,400 PER WEEK",
        "$750/W Unfurnished",
        "$320 - fully furnished pw",
        "$330 PER WEEK | $1,430 P.C.M",
        "Enquire Now: $690 per week",
        "$460 per week / $1999 per month",
        "$490 per week/Under Application approved",
        "$1550pw - Location! Rare gem!",
        "295 per week",  # Example without a dollar sign
        "unit 2 - $780pw unit 3 - $760pw",  # Example with multiple prices
        "$2500 pw high, $1600pw low,$380 pn",  # Example with multiple prices
        "from $786 - $1572 per week",  # Example with multiple prices
        "$590 to $639",  # Example with a range
        "$280 - $290 pw",  # Example with a range
    ]
}

pat = re.compile(
    r"(?i)(?:\$([\d+,.]+)\s*(?:-|to)\s*)?\$?([\d+,.]+)(?=\s*(?:(pw|per week|wk|p?/w|weekly|- fully furnished pw$|$)|(per night|pn)|(pm|per month|p\.c\.m|pcm)))"
)

for i, p in enumerate(data["prices"]):
    for price1, price2, week, night, month in pat.findall(p):
        match [price1, price2, week, night, month]:
            # single price/week
            case ["", _, _, "", ""]:
                price = float(price2.replace(",", ""))
                print(f"{i:<3} CASE 1: {p:<50} {price:<10} / week")

            # single price/night
            case ["", _, "", _, ""]:
                price = float(price2.replace(",", "").replace(".", ""))
                print(f"{i:<3} CASE 2: {p:<50} {price:<10} / night")

            # single price/month
            case ["", _, "", "", _]:
                price = float(price2.replace(",", "").replace(".", ""))
                print(f"{i:<3} CASE 3: {p:<50} {price:<10} / month")

            # range price/week
            case [_, _, _, "", ""]:
                price1 = float(price1.replace(",", "").replace(".", ""))
                price2 = float(price2.replace(",", "").replace(".", ""))
                price = (price1 + price2) / 2
                print(f"{i:<3} CASE 4: {p:<50} {price:<10} / week (mean)")

            # add cases here
            # ...

            case _:
                print(f"{i:<3} UNKNOWN: {p:<50}")

指纹:

0   CASE 1: $350.00                                            350.0      / week
1   CASE 1: $450.00 pw                                         450.0      / week
2   CASE 1: $325 per week                                      325.0      / week
3   CASE 1: $495pw - Views! White goods!                       495.0      / week
4   CASE 1: $460p/w + gst and outgoings                        460.0      / week
5   CASE 1: $300 wk                                            300.0      / week
6   CASE 1: $390pw / $1695pcm                                  390.0      / week
6   CASE 3: $390pw / $1695pcm                                  1695.0     / month
7   CASE 1: $180 pw / $782 pm                                  180.0      / week
7   CASE 3: $180 pw / $782 pm                                  782.0      / month
8   CASE 1: $375 Per Week/Fully Furnished                      375.0      / week
9   CASE 1: $350 pw + GST & Outgoings                          350.0      / week
10  CASE 1: APPLY NOW - From $270 per week                     270.0      / week
11  CASE 2: $185 per night                                     185.0      / night
12  CASE 1: $400pw incl. power                                 400.0      / week
13  CASE 1: $500 weekly                                        500.0      / week
14  CASE 1: $600 per week pw                                   600.0      / week
15  CASE 1: $850 per week (Fully furnished)                    850.0      / week
16  CASE 1: FROM $400PW, FURNITURE AND BILLS INCLUDED          400.0      / week
17  CASE 1: THE DEAL- $780 PER WEEK                            780.0      / week
18  CASE 1: THE DEAL: $1,400 PER WEEK                          1400.0     / week
19  CASE 1: $750/W Unfurnished                                 750.0      / week
20  CASE 1: $320 - fully furnished pw                          320.0      / week
21  CASE 1: $330 PER WEEK | $1,430 P.C.M                       330.0      / week
21  CASE 3: $330 PER WEEK | $1,430 P.C.M                       1430.0     / month
22  CASE 1: Enquire Now: $690 per week                         690.0      / week
23  CASE 1: $460 per week / $1999 per month                    460.0      / week
23  CASE 3: $460 per week / $1999 per month                    1999.0     / month
24  CASE 1: $490 per week/Under Application approved           490.0      / week
25  CASE 1: $1550pw - Location! Rare gem!                      1550.0     / week
26  CASE 1: 295 per week                                       295.0      / week
27  CASE 1: unit 2 - $780pw unit 3 - $760pw                    780.0      / week
27  CASE 1: unit 2 - $780pw unit 3 - $760pw                    760.0      / week
28  CASE 1: $2500 pw high, $1600pw low,$380 pn                 2500.0     / week
28  CASE 1: $2500 pw high, $1600pw low,$380 pn                 1600.0     / week
28  CASE 2: $2500 pw high, $1600pw low,$380 pn                 380.0      / night
29  CASE 4: from $786 - $1572 per week                         1179.0     / week (mean)
30  CASE 4: $590 to $639                                       614.5      / week (mean)
31  CASE 4: $280 - $290 pw                                     285.0      / week (mean)
0赞 Timeless 9/11/2023 #2

您可以测试此方法,看看平均价格是否计算正确:

import numpy as np

pat = (
    r"(?i)"
    r"(\d+[.,]?\d+)\s*" # prices
    r"(?:-.*\bfurnished\s*)?" # optional text
    r"(p/?w|wdk|per week|weekly|/w|" # weekly
    r"p\.?c\.?m|mth|pm|per month|" # monthly
    r"per night|pn|" # daily
    r"p.a)?" # ??
)

tmp = df["prices"].str.extractall(pat)
fn = lambda x: tmp[1].str.contains(x, case=False, na=False)
s0 = tmp[0].replace(",", "", regex=True).astype(float)

averge = np.select(
    [fn("w"), fn("n"), fn("m"), fn("p.a")],
    [s0, s0.mul(7), s0.div(4.33), s0.div(52)], default=s0
)

out = (
    df[["prices"]].join(
        tmp.assign(all_prices=averge.round(2)).groupby(level=0)
            .agg(
                computed_prices=("all_prices", list), # optional
                average=("all_prices", "mean")
            )
    )
)

正则表达式 : [demo]

输出:

print(out)


                                       prices           computed_prices  average
0                                     $350.00                   [350.0]   350.00
1                                  $450.00 pw                   [450.0]   450.00
2                               $325 per week                   [325.0]   325.00
3                $495pw - Views! White goods!                   [495.0]   495.00
4                 $460p/w + gst and outgoings                   [460.0]   460.00
5                                     $300 wk                   [300.0]   300.00
6                           $390pw / $1695pcm           [390.0, 391.45]   390.73
7                           $180 pw / $782 pm            [180.0, 180.6]   180.30
8               $375 Per Week/Fully Furnished                   [375.0]   375.00
9                   $350 pw + GST & Outgoings                   [350.0]   350.00
10             APPLY NOW - From $270 per week                   [270.0]   270.00
11                             $185 per night                  [1295.0]  1295.00
12                         $400pw incl. power                   [400.0]   400.00
13                                $500 weekly                   [500.0]   500.00
14                           $600 per week pw                   [600.0]   600.00
15            $850 per week (Fully furnished)                   [850.0]   850.00
16  FROM $400PW, FURNITURE AND BILLS INCLUDED                   [400.0]   400.00
17                    THE DEAL- $780 PER WEEK                   [780.0]   780.00
18                  THE DEAL: $1,400 PER WEEK                  [1400.0]  1400.00
19                         $750/W Unfurnished                   [750.0]   750.00
20                  $320 - fully furnished pw                   [320.0]   320.00
21               $330 PER WEEK | $1,430 P.C.M           [330.0, 1430.0]   880.00
22                 Enquire Now: $690 per week                   [690.0]   690.00
23            $460 per week / $1999 per month          [460.0, 13993.0]  7226.50
24   $490 per week/Under Application approved                   [490.0]   490.00
25              $1550pw - Location! Rare gem!                  [1550.0]  1550.00
26                               295 per week                   [295.0]   295.00
27            unit 2 - $780pw unit 3 - $760pw            [780.0, 760.0]   770.00
28         $2500 pw high, $1600pw low,$380 pn  [2500.0, 1600.0, 2660.0]  2253.33
29                 from $786 - $1572 per week           [786.0, 1572.0]  1179.00
30                               $590 to $639            [590.0, 639.0]   614.50
31                             $280 - $290 pw            [280.0, 290.0]   285.00

评论

1赞 X.x 9/11/2023
非常感谢!这帮了大忙!我根据您的代码进行了一些修改,基本上只是为了部分np.select