Python: take email_list and remove duplicate usernames, append to new update_list?

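Asked by: Crimson Asked: 11/17/2022 Last edited by: Crimson Updated: 11/17/2022 Views: 48

Q:

How do you take an email_list of emails in the format first.last@domain.com and append the unique names to a new update_list? I would then take this update_list and convert it to CamelCase, but I'm not sure how to take just part of each entry to search for duplicates. Is there a way to use regex? I keep getting TypeError: expected string or bytes-like object.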

import re

input_list = []
email_list = []
dup_email_list = []
domain_gmail = []
domain_outlook = []
dup_domain_gmail = []
dup_domain_outlook = []
update_list = []
camel_list = []
n = 0

while n < 5:
    input_list = []
    email = []
    # input_string split by ','; ignores whitespace
    input_string = input('enter first, last name, ID and email domain: ')
    if input_string == 'done':
        n=5
        break
    else:
        input_list = [x.strip() for x in input_string.split(',')]
        print(input_list)
    # convert input_list into email format first.last@domain.com
    email = "{0}.{1}@{3}.com".format(*input_list)
    # convert email to lowercase
    email_lower =email.lower()
    print(email)
    # check ID validity (9 digits)
    if input_list[2].isdigit() and len(input_list[2]) == 9:
        print('valid ID')
        continue
    else:
        print('invalid ID')
        n = 0
    # check domain validity (gmail or outlook)
    if input_list[3] == 'gmail':
        email_list.append(email)
        domain_gmail.append(email)
        n = 0
    elif input_list[3] == 'outlook':
        email_list.append(email)
        domain_outlook.append(email)
        n = 0
    else:
        print('invalid domain!')
        n = 0
if n == 5:
    # append unique email_list indexes to dup_email_list
    for x in email_list:
        if x not in dup_email_list:
            dup_email_list.append(x)
    # append unique emails from domain_gmail to new list
    for x in domain_gmail:
        if x not in dup_domain_gmail:
            dup_domain_gmail.append(x)
    # append unique emails from domain_outlook to new list
    for x in domain_outlook:
        if x not in dup_domain_outlook:
            dup_domain_outlook.append(x)
    # append dup_email_list to update_list
    for string in dup_email_list:
       update_list = re.match(r'[a-z]{1}[.]{1}[a-z]{1}', dup_email_list)
    # append names from update_list to camel_list in CamelCase format FirstLast
    for x in dup_email_list:
        while i < len.update_list[i]:
            camel_list = re.split(r'[a-z]{1}[.]{1}[a-z]{1}', dup_email_list)
    # print cases
    print('mail list: ', dup_email_list)
    print('After grouping: ', dup_domain_gmail, dup_domain_outlook)
    print('After updating: ', update_list)
    print('CamelCase list: ', camel_list)

python string email input format

Comments

0 votes Alexander 11/17/2022
if input_list[2].isdigit() and len(input_list[2]) == 9: print('valid ID') continue ... remove the continue
0 votes Crimson 11/17/2022
Thanks - it's making progress now, but I'm still stuck on the regex problem of finding the unique first and last names from dup_email_list for update_list and the CamelCase list

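Answers:

0 votes Alexander 11/17/2022 #1

This should work. I ended up not using regex at all and just stuck with string methods to split the email address and convert it to camel case.

I also did some minor cleanup on the rest of the script, mostly removing unnecessary lines.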

email_list = []
dup_email_list = []
domain_gmail = []
domain_outlook = []
dup_domain_gmail = []
dup_domain_outlook = []
update_list = []
camel_list = []
n = 0

while True:
    # input_string split by ','; ignores whitespace
    input_string = input('enter first, last name, ID and email domain: ')
    if input_string == 'done':
        n=5
        break
    input_list = [x.strip() for x in input_string.split(',')]
    print(input_list)
    # convert input_list into email format first.last@domain.com
    email = "{0}.{1}@{3}.com".format(*input_list)
    # convert email to lowercase so entries differing only in case count as duplicates
    email = email.lower()
    print(email)
    # check ID validity (9 digits)
    if input_list[2].isdigit() and len(input_list[2]) == 9:
        print('valid ID')
    else:
        print('invalid ID')
        continue
    # check domain validity (gmail or outlook)
    if input_list[3] not in ['gmail', 'outlook']:
        print('invalid domain!')
        continue
    email_list.append(email)
    if input_list[3] == 'gmail':
        domain_gmail.append(email)
    else:
        domain_outlook.append(email)

# append unique email_list indexes to dup_email_list
for x in email_list:
    if x not in dup_email_list:
        dup_email_list.append(x)
# append unique emails from domain_gmail to new list
for x in domain_gmail:
    if x not in dup_domain_gmail:
        dup_domain_gmail.append(x)
# append unique emails from domain_outlook to new list
for x in domain_outlook:
    if x not in dup_domain_outlook:
        dup_domain_outlook.append(x)
# append dup_email_list to update_list
update_list += dup_email_list
# append names from update_list to camel_list in CamelCase format FirstLast
for x in update_list:
    name, domain = x.split('@')
    first, last = name.split('.')
    name = first.title() + last.title()
    camel_email = "@".join([name, domain])
    camel_list.append(camel_email)
# print cases
print('mail list: ', dup_email_list)
print('After grouping: ', dup_domain_gmail, dup_domain_outlook)
print('After updating: ', update_list)
print('CamelCase list: ', camel_list)
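
If you do want the regex route from the question: the TypeError comes from passing the whole list (dup_email_list) to re.match, which expects a single string. Below is a minimal sketch, assuming addresses of the form first.last@domain.com and assuming that "duplicate" means the same first.last username regardless of domain. The names EMAIL_RE, unique_by_username and to_camel_case are hypothetical and not part of the original script.

```python
import re

# Assumed address shape: first.last@domain.com (letters only in each part)
EMAIL_RE = re.compile(r'^([a-z]+)\.([a-z]+)@([a-z]+\.com)$', re.IGNORECASE)


def unique_by_username(emails):
    """Keep the first email seen for each first.last username, in order."""
    seen = set()
    unique = []
    for email in emails:              # match one string at a time, not the list
        match = EMAIL_RE.match(email)
        if match is None:
            continue                  # skip anything not shaped like first.last@domain.com
        first, last, _domain = match.groups()
        username = (first.lower(), last.lower())
        if username not in seen:
            seen.add(username)
            unique.append(email)
    return unique


def to_camel_case(email):
    """Turn first.last@domain.com into FirstLast@domain.com.

    Assumes the email already matches EMAIL_RE (e.g. it came from
    unique_by_username); otherwise match() returns None and this fails.
    """
    first, last, domain = EMAIL_RE.match(email).groups()
    return f'{first.title()}{last.title()}@{domain}'


emails = ['sam.smith@gmail.com', 'jane.smith@gmail.com', 'sam.smith@outlook.com']
update_list = unique_by_username(emails)
camel_list = [to_camel_case(email) for email in update_list]
print(update_list)
print(camel_list)
```

The capture groups are what let you compare only part of each entry: the first two groups form the username used for the duplicate check, and the third keeps the domain for rebuilding the address.
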
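0 votes ukBaz 11/17/2022 #2

Having the script as one large function makes it hard to test and debug.

I suggest breaking it into smaller functions that are easier to test. Have small functions that each process the data, then a main function that holds the script's business logic.

I would also use a Python dataclass to store the information rather than lists. That makes it more readable. The dataclass also becomes the single source of truth; to get the other lists, use functions that extract the information needed.

To get a list of unique strings, you can use Python's set, which removes duplicates from a list.

Here is an example of how this might look: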

from dataclasses import dataclass


@dataclass
class UserInfo:
    first_name: str
    last_name: str
    user_id: int
    domain: str
    email: str = ''

    def __post_init__(self):
        self.email = f"{self.first_name}.{self.last_name}@{self.domain}.com".casefold()


def check_id_valid(user_info):
    if not len(str(user_info.user_id)) == 9:
        print("User ID needs to be 9 digits")
        return False
    return True


def check_domain_valid(user_info):
    valid_domain = ['gmail', 'outlook']
    if user_info.domain not in valid_domain:
        print(f'{user_info.domain} is not a valid domain')
        return False
    return True


def user_input():
    # Keep looping, asking for information until it is in the correct
    # format or `done` is entered
    while True:
        input_string = input('enter first, last name, ID and email domain: ')
        if input_string.casefold() == 'done':
            return None
        input_list = [field.strip() for field in input_string.split(',')]
        if len(input_list) == 4 and input_list[2].isdigit():
            user_info = UserInfo(first_name=input_list[0],
                                 last_name=input_list[1],
                                 user_id=int(input_list[2]),
                                 domain=input_list[3])
            if all((check_id_valid(user_info), check_domain_valid(user_info))):
                return user_info
        print('Enter the four fields separated by commas (,)')


def get_unique_emails(group_info):
    unique_emails = set()
    for user_info in group_info:
        unique_emails.add(user_info.email)
    return unique_emails


def get_domain(domain_name, group_info):
    emails_in_domain = set()
    for user_info in group_info:
        if domain_name == user_info.domain:
            emails_in_domain.add(user_info.email)
    return emails_in_domain


def create_camelcase_email(group_info):
    unique_emails = set()
    for user_info in group_info:
        unique_emails.add((f'{user_info.first_name.title()}'
                           f'{user_info.last_name.title()}@'
                           f'{user_info.domain}.com'))
    return unique_emails


def collect_input(max_user=5):
    group_info = []
    collect_data = True
    while collect_data:
        user_info = user_input()
        if user_info:
            group_info.append(user_info)
        else:
            collect_data = False
        if len(group_info) == max_user:
            collect_data = False
    return group_info


def show_results(group_info):
    print('\n+++++++++++ Results  ++++++++++++')
    print('mail list:', get_unique_emails(group_info))
    for domain in ['gmail', 'outlook']:
        emails = get_domain(domain, group_info)
        if emails:
            print(f'Group {domain}: {emails}')
    print('CamelCase list: ', create_camelcase_email(group_info))


def main():
    group_info = collect_input()
    show_results(group_info)


if __name__ == '__main__':
    main()

Here is a transcript of the test I ran:

enter first, last name, ID and email domain: sam, smith, 123456789, gmail
enter first, last name, ID and email domain: sam, smith, 123456789, gmail
enter first, last name, ID and email domain: sam, smith, 123456789, gmail
enter first, last name, ID and email domain: jane, smith, 123456789, gmail
enter first, last name, ID and email domain: jane, smith, 123456789, outlook

+++++++++++ Results  ++++++++++++
mail list: {'sam.smith@gmail.com', 'jane.smith@gmail.com', 'jane.smith@outlook.com'}
Group gmail: {'sam.smith@gmail.com', 'jane.smith@gmail.com'}
Group outlook: {'jane.smith@outlook.com'}
CamelCase list:  {'SamSmith@gmail.com', 'JaneSmith@gmail.com', 'JaneSmith@outlook.com'}
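
One thing to be aware of: a set does not keep insertion order, so the addresses in the results above may print in a different order from one run to the next. If the order matters, a minimal sketch of an order-preserving alternative is to use dict.fromkeys, since dict keys are unique and keep insertion order in Python 3.7+. The function name unique_in_order is hypothetical and not part of the script above.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


emails = ['sam.smith@gmail.com', 'sam.smith@gmail.com', 'jane.smith@gmail.com']
print(unique_in_order(emails))
```
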