Asked by: Crimson Asked: 11/17/2022 Last edited by: Crimson Updated: 11/17/2022 Views: 48
Python: take email_list and remove duplicate usernames, append to new update_list?
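Q:
How would you take an email_list of addresses in the format first.last@domain.com and append the unique names to a new update_list? I would then take this update_list and convert it to CamelCase, but I'm not sure how to grab only part of each item to search for duplicates. Is there a way to do this with regex? I keep getting TypeError: expected string or bytes-like object.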
import re

input_list = []
email_list = []
dup_email_list = []
domain_gmail = []
domain_outlook = []
dup_domain_gmail = []
dup_domain_outlook = []
update_list = []
camel_list = []
n = 0
while n < 5:
    input_list = []
    email = []
    # input_string split by ','; ignores whitespace
    input_string = input('enter first, last name, ID and email domain: ')
    if input_string == 'done':
        n=5
        break
    else:
        input_list = [x.strip() for x in input_string.split(',')]
        print(input_list)
        # convert input_list into email format first.last@domain.com
        email = "{0}.{1}@{3}.com".format(*input_list)
        # convert email to lowercase
        email_lower =email.lower()
        print(email)
        # check ID validity (9 digits)
        if input_list[2].isdigit() and len(input_list[2]) == 9:
            print('valid ID')
            continue
        else:
            print('invalid ID')
            n = 0
        # check domain validity (gmail or outlook)
        if input_list[3] == 'gmail':
            email_list.append(email)
            domain_gmail.append(email)
            n = 0
        elif input_list[3] == 'outlook':
            email_list.append(email)
            domain_outlook.append(email)
            n = 0
        else:
            print('invalid domain!')
            n = 0
if n == 5:
    # append unique email_list indexes to dup_email_list
    for x in email_list:
        if x not in dup_email_list:
            dup_email_list.append(x)
    # append unique emails from domain_gmail to new list
    for x in domain_gmail:
        if x not in dup_domain_gmail:
            dup_domain_gmail.append(x)
    # append unique emails from domain_outlook to new list
    for x in domain_outlook:
        if x not in dup_domain_outlook:
            dup_domain_outlook.append(x)
    # append dup_email_list to update_list
    for string in dup_email_list:
        update_list = re.match(r'[a-z]{1}[.]{1}[a-z]{1}', dup_email_list)
    # append names from update_list to camel_list in CamelCase format FirstLast
    for x in dup_email_list:
        while i < len.update_list[i]:
            camel_list = re.split(r'[a-z]{1}[.]{1}[a-z]{1}', dup_email_list)
    # print cases
    print('mail list: ', dup_email_list)
    print('After grouping: ', dup_domain_gmail, dup_domain_outlook)
    print('After updating: ', update_list)
    print('CamelCase list: ', camel_list)
Answers:
0 votes
Alexander
11/17/2022
#1
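This should work... I actually ended up not using regex at all and just stuck with string methods to split the email address and convert it to camel case...
I also did some minor cleanup on the rest of the script, mainly removing unnecessary lines.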
email_list = []
dup_email_list = []
domain_gmail = []
domain_outlook = []
dup_domain_gmail = []
dup_domain_outlook = []
update_list = []
camel_list = []
n = 0
while True:
    # input_string split by ','; ignores whitespace
    input_string = input('enter first, last name, ID and email domain: ')
    if input_string == 'done':
        n = 5
        break
    input_list = [x.strip() for x in input_string.split(',')]
    print(input_list)
    # convert input_list into email format first.last@domain.com
    email = "{0}.{1}@{3}.com".format(*input_list)
    # convert email to lowercase (reassign so the lowercase form is what gets stored)
    email = email.lower()
    print(email)
    # check ID validity (9 digits)
    if input_list[2].isdigit() and len(input_list[2]) == 9:
        print('valid ID')
    else:
        print('invalid ID')
        continue
    # check domain validity (gmail or outlook)
    if input_list[3] not in ['gmail', 'outlook']:
        print('invalid domain!')
        continue
    email_list.append(email)
    if input_list[3] == 'gmail':
        domain_gmail.append(email)
    else:
        domain_outlook.append(email)
# append unique email_list indexes to dup_email_list
for x in email_list:
    if x not in dup_email_list:
        dup_email_list.append(x)
# append unique emails from domain_gmail to new list
for x in domain_gmail:
    if x not in dup_domain_gmail:
        dup_domain_gmail.append(x)
# append unique emails from domain_outlook to new list
for x in domain_outlook:
    if x not in dup_domain_outlook:
        dup_domain_outlook.append(x)
# append dup_email_list to update_list
update_list += dup_email_list
# append names from update_list to camel_list in CamelCase format FirstLast
for x in update_list:
    name, domain = x.split('@')
    first, last = name.split('.')
    name = first.title() + last.title()
    camel_email = "@".join([name, domain])
    camel_list.append(camel_email)
# print cases
print('mail list: ', dup_email_list)
print('After grouping: ', dup_domain_gmail, dup_domain_outlook)
print('After updating: ', update_list)
print('CamelCase list: ', camel_list)
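On the regex part of the question: the TypeError comes from passing the whole list to re.match, which expects a single string as its second argument. If you do want regex, a minimal sketch might look like the following. It assumes addresses shaped like first.last@domain.com with letters only in the name parts; NAME_PATTERN and unique_camel_names are names made up for this illustration, not part of the original script.

```python
import re

# Assumption: addresses look like first.last@domain.com, letters only in the name.
NAME_PATTERN = re.compile(r'([a-z]+)\.([a-z]+)@', re.IGNORECASE)


def unique_camel_names(emails):
    """Return FirstLast names, one per distinct first.last username."""
    seen = set()
    names = []
    for email in emails:
        # Match one string at a time; passing the list itself raises TypeError.
        match = NAME_PATTERN.match(email)
        if match is None:
            continue  # skip anything not shaped like first.last@...
        first, last = (part.lower() for part in match.groups())
        if (first, last) not in seen:
            seen.add((first, last))
            names.append(first.capitalize() + last.capitalize())
    return names


emails = ['sam.smith@gmail.com', 'Sam.Smith@outlook.com', 'jane.smith@gmail.com']
print(unique_camel_names(emails))  # ['SamSmith', 'JaneSmith']
```

Here duplicates are judged on the username alone, so the same name at two domains counts once. If you would rather treat those as separate entries, key the seen set on the full lowercased address instead.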
0 votes
ukBaz
11/17/2022
#2
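Having the script as one big block makes it difficult to test and debug.
I would suggest breaking it up into smaller functions that are easier to test: small functions that process the data, and then a main function that holds the script's business logic.
I would also use a Python dataclass to store the information rather than lists. This makes it more readable. The dataclass also becomes the single source of truth; to get the other lists, use functions that extract the required information.
To get a list of unique strings, you can use Python's set to remove duplicates from a list.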
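As a quick sketch of that idea, using made-up addresses: a set drops duplicates but does not keep the original order, so if order matters, dict.fromkeys is one common alternative.

```python
emails = ['sam.smith@gmail.com', 'jane.smith@gmail.com', 'sam.smith@gmail.com']

# A set keeps one copy of each address; iteration order is not guaranteed.
unique_unordered = set(emails)

# dict keys are unique and keep insertion order, so this preserves first-seen order.
unique_ordered = list(dict.fromkeys(emails))

print(unique_unordered)
print(unique_ordered)
```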
Here is an example of how this might look:
from dataclasses import dataclass


@dataclass
class UserInfo:
    first_name: str
    last_name: str
    user_id: int
    domain: str
    email: str = ''

    def __post_init__(self):
        # Build the lowercase email once, so every other list derives from it
        self.email = f"{self.first_name}.{self.last_name}@{self.domain}.com".casefold()


def check_id_valid(user_info):
    if not len(str(user_info.user_id)) == 9:
        print("User ID needs to be 9 digits")
        return False
    return True


def check_domain_valid(user_info):
    valid_domain = ['gmail', 'outlook']
    if user_info.domain not in valid_domain:
        print(f'{user_info.domain} is not a valid domain')
        return False
    return True


def user_input():
    # Keep looping round asking for information until it is the correct
    # format or `done` is entered
    while True:
        input_string = input('enter first, last name, ID and email domain: ')
        if input_string.casefold() == 'done':
            return None
        input_list = [field.strip() for field in input_string.split(',')]
        if len(input_list) == 4 and input_list[2].isdigit():
            user_info = UserInfo(first_name=input_list[0],
                                 last_name=input_list[1],
                                 user_id=int(input_list[2]),
                                 domain=input_list[3])
            if all((check_id_valid(user_info), check_domain_valid(user_info))):
                return user_info
        print('Enter the four fields separated by commas (,)')


def get_unique_emails(group_info):
    unique_emails = set()
    for user_info in group_info:
        unique_emails.add(user_info.email)
    return unique_emails


def get_domain(domain_name, group_info):
    email_in_domain = set()
    for user_info in group_info:
        if domain_name == user_info.domain:
            email_in_domain.add(user_info.email)
    return email_in_domain


def create_camelcase_email(group_info):
    unique_emails = set()
    for user_info in group_info:
        unique_emails.add((f'{user_info.first_name.title()}'
                           f'{user_info.last_name.title()}@'
                           f'{user_info.domain}.com'))
    return unique_emails


def collect_input(max_user=5):
    # Stop when `done` is entered or max_user valid entries have been collected
    group_info = []
    collect_data = True
    while collect_data:
        user_info = user_input()
        if user_info:
            group_info.append(user_info)
        else:
            collect_data = False
        if len(group_info) == max_user:
            collect_data = False
    return group_info


def show_results(group_info):
    print('\n+++++++++++ Results ++++++++++++')
    print('mail list:', get_unique_emails(group_info))
    for domain in ['gmail', 'outlook']:
        emails = get_domain(domain, group_info)
        if emails:
            print(f'Group {domain}: {emails}')
    print('CamelCase list: ', create_camelcase_email(group_info))


def main():
    group_info = collect_input()
    show_results(group_info)


if __name__ == '__main__':
    main()
Here is a transcript from the test I ran:
enter first, last name, ID and email domain: sam, smith, 123456789, gmail
enter first, last name, ID and email domain: sam, smith, 123456789, gmail
enter first, last name, ID and email domain: sam, smith, 123456789, gmail
enter first, last name, ID and email domain: jane, smith, 123456789, gmail
enter first, last name, ID and email domain: jane, smith, 123456789, outlook
+++++++++++ Results ++++++++++++
mail list: {'sam.smith@gmail.com', 'jane.smith@gmail.com', 'jane.smith@outlook.com'}
Group gmail: {'sam.smith@gmail.com', 'jane.smith@gmail.com'}
Group outlook: {'jane.smith@outlook.com'}
CamelCase list: {'SamSmith@gmail.com', 'JaneSmith@gmail.com', 'JaneSmith@outlook.com'}
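One payoff of the small functions is that each can be checked in isolation, without typing at the input() prompt. A minimal sketch, using simplified stand-ins for the validation helpers above: is_valid_id and is_valid_domain are hypothetical names, and taking the raw ID as a string (so leading zeros still count as digits) is an assumption made for this illustration.

```python
def is_valid_id(raw_id):
    """True if raw_id is exactly nine ASCII digits (leading zeros allowed)."""
    return len(raw_id) == 9 and raw_id.isascii() and raw_id.isdigit()


def is_valid_domain(domain, valid_domains=('gmail', 'outlook')):
    """True if domain is one of the accepted provider names."""
    return domain in valid_domains


# Each rule can be exercised directly with plain assertions.
assert is_valid_id('123456789')
assert is_valid_id('012345678')
assert not is_valid_id('12345')
assert not is_valid_id('12345678x')
assert is_valid_domain('gmail')
assert not is_valid_domain('yahoo')
print('all checks passed')
```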
Comments
if input_list[2].isdigit() and len(input_list[2]) == 9:
    print('valid ID')
    continue
............. remove the continue
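To see why that continue matters: it jumps straight to the next loop iteration, so the domain check and the append further down never run for a valid ID. A small sketch with hard-coded rows instead of input(); the collect function and its continue_on_valid_id flag are illustrative only, not taken from the question.

```python
def collect(rows, continue_on_valid_id):
    """Collect emails from rows, optionally reproducing the stray continue."""
    collected = []
    for first, last, user_id, domain in rows:
        if user_id.isdigit() and len(user_id) == 9:
            if continue_on_valid_id:
                continue  # mirrors the question: skips everything below
        else:
            continue  # invalid ID: skipping the row is intended here
        if domain in ('gmail', 'outlook'):
            collected.append(f'{first}.{last}@{domain}.com')
    return collected


rows = [('sam', 'smith', '123456789', 'gmail'),
        ('jane', 'smith', '12345', 'outlook')]

print(collect(rows, continue_on_valid_id=True))   # []
print(collect(rows, continue_on_valid_id=False))  # ['sam.smith@gmail.com']
```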