Asked by: Kaley  Asked: 11/6/2023  Last edited by: Kaley  Updated: 11/10/2023  Views: 60
CS50 week 6 DNA program incorrectly identifies DNA sequence
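Q:
My code mostly works, but it is selective about which inputs it handles correctly. It prints the right name for some sequences and the wrong one for others.
For instance, it correctly identifies that one strand belongs to Bob, but it matches a strand that should give "No match" to "Charlie", who doesn't even exist in the list provided to us by cs50.
This is really odd. I've cross-checked my code against other people's and they seem mostly similar. I'm not sure why this is happening and would appreciate some help.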
import csv
import sys


def main():

    # TODO: Check for command-line usage
    if len(sys.argv) != 3:
        sys.exit("Usage: python dna.py data.csv sequence.txt")

    # TODO: Read database file into a variable
    database = []
    with open(sys.argv[1], 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            database.append(row)

    # TODO: Read DNA sequence file into a variable
    with open(sys.argv[2], 'r') as file:
        dna_sequence = file.read()

    # TODO: Find longest match of each STR in DNA sequence
    subsequences = list(database[0].keys())[1:]
    results = {}
    for subsequence in subsequences:
        match = 0
        results[subsequence] = longest_match(dna_sequence, subsequence)
        match += 1

    # TODO: Check database for matching profiles
    for person in database:
        for subsequence in subsequences:
            if int(person[subsequence]) == results[subsequence]:
                match += 1
            if match == len(subsequence):
                print(person["name"])
                return
    print("No match")
    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()
A:
0 votes
kcw78
11/10/2023
#1
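Are you still working on this? If so, there are 2 databases and 20 sequences to test. (They are listed with the correct answers at the end of the DNA PSET.) Which one gives you the error above? I suspect it is the third test, which is run as:
    python dna.py databases/small.csv sequences/3.txt
Your program should output "No match". When I run it, your program outputs "Charlie" instead of "No match".
The subsequences you need to check are:
    ['AGATC', 'AATG', 'TATC']
Your subsequence counts are:
    {'AGATC': 3, 'AATG': 3, 'TATC': 5}
This doesn't match anyone in the small.csv file. Charlie is close, but his DNA subsequence counts are:
    ('AGATC', '3'), ('AATG', '2'), ('TATC', '5')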
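To make the mismatch concrete, here is a minimal sketch that compares the computed counts with Charlie's row one STR at a time. The two dictionaries are copied from the values quoted above; the variable names `charlie` and `mismatches` are illustrative and do not appear in the original program.

```python
# Counts computed from sequences/3.txt (values quoted above).
results = {"AGATC": 3, "AATG": 3, "TATC": 5}

# Charlie's row in the shape csv.DictReader yields: every value is a string.
charlie = {"name": "Charlie", "AGATC": "3", "AATG": "2", "TATC": "5"}

# Collect every STR whose count differs from Charlie's profile.
mismatches = [s for s in results if int(charlie[s]) != results[s]]
print(mismatches)  # ['AATG']
```

Because one STR differs, Charlie should not be reported as a match.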
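The error occurs when you compare each person to the subsequence counts. There are 3 things to fix:
- The value of match is set in the previous loop (for subsequence in subsequences:). It needs to be set inside the for person in database: loop.
- The indentation of the test on match needs to be modified. (It is currently inside the second for subsequence in subsequences: loop.)
- You are testing match against len(subsequence). Think about it....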
I made these changes and it works for all 4 tests with small.csv and the 3 tests I tried with large.csv.
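One possible way to apply the three fixes is sketched below. It is not necessarily the exact code the answerer ended up with. The helper name `find_match` is hypothetical, and the "Sample" row is invented purely for illustration; only Charlie's counts come from the thread.

```python
def find_match(database, subsequences, results):
    """Return the name of the first person whose STR counts all match, else None."""
    for person in database:
        # Fix 1: reset the counter for each person so matches do not carry over.
        match = 0
        for subsequence in subsequences:
            if int(person[subsequence]) == results[subsequence]:
                match += 1
        # Fix 2: test once per person, after every STR has been compared.
        # Fix 3: compare against the number of STRs, not the length of one STR string.
        if match == len(subsequences):
            return person["name"]
    return None


subsequences = ["AGATC", "AATG", "TATC"]

# Rows in the shape csv.DictReader produces (values are strings).
database = [
    {"name": "Sample", "AGATC": "1", "AATG": "1", "TATC": "1"},  # invented row
    {"name": "Charlie", "AGATC": "3", "AATG": "2", "TATC": "5"},
]

# Counts from sequences/3.txt: nobody matches.
print(find_match(database, subsequences, {"AGATC": 3, "AATG": 3, "TATC": 5}))  # None

# Counts equal to Charlie's profile: Charlie matches.
print(find_match(database, subsequences, {"AGATC": 3, "AATG": 2, "TATC": 5}))  # Charlie
```

In main(), the caller would print the returned name, or "No match" when the result is None.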