Finding longest common substring between DNA sequences (Python)

Asked by: Anton Holt  Asked: 10/6/2023  Updated: 10/6/2023  Views: 30

Q:

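I'm working on the "Finding a Shared Motif" problem from rosalind.info (https://rosalind.info/problems/lcsm/). The problem asks you to find the longest common substring in a collection of fewer than 100 DNA strings, each about 1000 nucleotides long. The code I wrote seems to return correct results on any test data I can generate myself, but it consistently fails Rosalind's actual check. I'm not sure what the real problem is, and after repeatedly trying to fix it by editing single lines, my code has become somewhat messy. I use three functions in total to solve the problem, and I suspect the issue is in the "check" function. In addition, the final function returns "None" instead of the expected value of newkeys[-1].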

import itertools

def readfile(filepath):
    with open(filepath,'r') as f:
        return [l.strip() for l in f.readlines()]

def check(key,lst):
    while True:
        for line in lst:
            if key not in line:
                return False
        else:
            return True


def Sharedmotif(filepath):
    Fastafile = readfile(filepath)
    Fastadict = {}
    Fastalabel = ''
    for line in Fastafile:
        if '>' in line:
            Fastalabel = line
            Fastadict[Fastalabel] = ''
        else:
            Fastadict[Fastalabel] += line
    Fastalist = list(Fastadict.values())
    #Actual solution
    alphabet=['A','C','T','G']
    tuples=list(itertools.combinations_with_replacement(alphabet,3))
    keys=[''.join(item) for item in tuples]
    newkeys=[]
    merged_list = []
    i=1
    for i in range(100):
        for key in keys:
            if check(key,Fastalist):
                newkeys.append(key)
                newkeys=newkeys[-4:]
        for key in newkeys:
            Tadd=[key+'T']
            Cadd=[key+'C']
            Aadd=[key+'A']
            Gadd=[key+'G']
            allkeys=list(Tadd+Cadd+Aadd+Gadd)
            for minis in allkeys:
                merged_list.append(minis)
        keys=merged_list
        length=len(newkeys[-1])
        i+=1
        print(newkeys)
        if i>=length:
            return newkeys[-1]
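
For comparison, here is a minimal sketch of one common way to approach this problem. It is not the posted code fixed line by line, and the names `has_common_substring_of_length` and `longest_common_substring` are hypothetical. It assumes the FASTA file has already been parsed into a list of plain sequence strings. The first few lines also illustrate two points that may be worth checking in the code above: `itertools.combinations_with_replacement` yields only the 3-mers whose letters follow the order of `alphabet` (20 of them), while `itertools.product` yields all 64; and a function that runs off the end of its loop without reaching a `return` statement returns `None`.

```python
import itertools

# combinations_with_replacement never emits e.g. "CAT" for this alphabet order,
# so many 3-mer seeds are missing; product covers every 3-mer.
alphabet = ["A", "C", "T", "G"]
n_sorted_only = len(list(itertools.combinations_with_replacement(alphabet, 3)))  # 20
n_all = len(list(itertools.product(alphabet, repeat=3)))                         # 64


def has_common_substring_of_length(seqs, k):
    """Return some substring of length k found in every sequence, else None."""
    shortest = min(seqs, key=len)
    others = [s for s in seqs if s is not shortest]
    # Every common substring must occur in the shortest sequence,
    # so only its windows need to be tried as candidates.
    for start in range(len(shortest) - k + 1):
        candidate = shortest[start:start + k]
        if all(candidate in s for s in others):
            return candidate
    return None


def longest_common_substring(seqs):
    """Binary-search the answer length (assumption: seqs is a list of str).

    If a common substring of length k exists, one of every shorter
    length exists too, so the search over k is monotonic.
    """
    if not seqs:
        return ""
    lo, hi = 0, min(len(s) for s in seqs)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = has_common_substring_of_length(seqs, mid)
        if found is not None:
            best, lo = found, mid
        else:
            hi = mid - 1
    return best  # always returns a string, never falls through to None
```

With the sample dataset from the Rosalind problem page, `longest_common_substring(["GATTACA", "TAGACCA", "ATACA"])` returns a common substring of length 2; the problem accepts any one of the longest common substrings.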

python function loops substring rosalind


A: No answers yet