Codon generation in Python

Asked by: arteaga_m  Asked: 9/11/2022  Last edited by: arteaga_m  Updated: 9/11/2022  Views: 785

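Q:

I have this code that converts a DNA string into a list of codons and then converts that list into a string of the corresponding amino acids. However, when I run the code and the DNA string ends with a pair of nucleotides (for example CT) instead of a triplet, the code does not produce the amino acid sequence, as you can see in the output.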

from collections import defaultdict
from collections import Counter


dna_sequence = "GAGCGTCTGCTCCGTGTATAAGCCACGTCGGAGCT"
# Split the sequence into chunks of three; the last chunk may be shorter.
codons = [dna_sequence[i:i+3]
          for i in range(0, len(dna_sequence), 3)]
print(codons)

genetic_code = {
   "GCG" :"A","GCA" :"A","GCT" :"A","GCC" :"A",
   "AGG" :"R","AGA" :"R","CGG" :"R", "CGA" :"R","CGT" :"R","CGC" :"R",
   "AAT" :"N","AAC" :"N",
   "GAT" :"D", "GAC":"D", "TGT" :"C","TGC" :"C",
   "TGA" :"*","TAG" :"*","TAA" :"*", # * Stop codon
   "CAG" :"Q","CAA" :"Q",
   "GAG" :"E","GAA" :"E",
   "GGG" :"G","GGA" :"G","GGT" :"G","GGC" :"G",
   "CAT" :"H","CAC" :"H",
   "ATA" :"I","ATT" :"I","ATC" :"I",
   "TTG" :"L","TTA" :"L","CTG" : "L","CTA" :"L","CTT" :"L","CTC" :"L",
   "AAG" :"K","AAA" :"K",
   "ATG" :"M" , # Start codon
   "TTT" :"F" ,"TTC" :"F" ,
   "CCG" :"P" ,"CCA" :"P" ,"CCT" :"P" ,"CCC" :"P" ,
   "AGT" :"S" ,"AGC" :"S" ,"TCG" :"S" ,"TCA" :"S" ,"TCT" :"S" ,"TCC" :"S" ,
   "ACG" :"T" ,"ACA" :"T" ,"ACT" :"T" ,"ACC" :"T" ,
   "TGG" :"W" ,
   "TAT" :"Y" ,"TAC" :"Y" ,
   "GTG" :"V" ,"GTA" :"V" ,"GTT" :"V" ,"GTC" :"V" 


}
def codon_seq(seq):
    tmpList = []
    # Stop at len(seq) - 2 so that only complete triplets are looked up.
    for i in range(0, len(seq) - 2, 3):
        if genetic_code[seq[i:i + 3]]:
            tmpList.append(seq[i:i + 3])
            print(tmpList)

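def amino_seq(seq):
    protein = ""
    # Only translates when the length is a multiple of 3;
    # otherwise the empty string is returned unchanged.
    if len(seq) % 3 == 0:
        for i in range(0, len(seq), 3):
            codon = seq[i:i + 3]
            protein += genetic_code[codon]
    return protein

print("Aminoacids: ")
print(amino_seq(dna_sequence))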

Output:

Codons: ['GAG', 'CGT', 'CTG', 'CTC', 'CGT', 'GTA', 'TAA', 'GCC', 'ACG', 'TCG', 'GAG', 'CT']

Amino acids: ''

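I would like a solution that uses the last two nucleotides of the string to predict the next amino acid: find the codons in genetic_code that start with those two nucleotides, and then choose one of them. How can I do this? Any suggestions are welcome.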
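
One way to make this idea concrete, offered as a rough sketch rather than a definitive method, is to collect every codon that starts with the leftover nucleotides and look at which amino acids those codons encode. The helper names `candidate_codons` and `possible_amino_acids` are hypothetical, and the table is rebuilt compactly from the standard genetic code only so that the snippet runs on its own; it is intended to hold the same mapping as the genetic_code dictionary above.

```python
from itertools import product

# Standard genetic code, built compactly so this sketch is self-contained.
# Codons are enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
genetic_code = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def candidate_codons(prefix):
    """Return all codons that start with the given nucleotide prefix (hypothetical helper)."""
    return [codon for codon in genetic_code if codon.startswith(prefix)]


def possible_amino_acids(prefix):
    """Return the sorted, de-duplicated amino acids reachable from the prefix (hypothetical helper)."""
    return sorted({genetic_code[codon] for codon in candidate_codons(prefix)})


print(candidate_codons("CT"))      # ['CTT', 'CTC', 'CTA', 'CTG']
print(possible_amino_acids("CT"))  # ['L']
print(possible_amino_acids("GA"))  # ['D', 'E']
```

For the sequence in the question the trailing CT is unambiguous, because all four codons beginning with CT encode leucine. A prefix such as GA leaves two possibilities, so some rule for choosing among them (random or otherwise) is still needed in general.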

python-3.x string list dna-sequence

Comments

0 upvotes Tom McLean 9/11/2022
Your indentation is wrong on the lines if len(seq)%3 == 0: and for i in range(0, len(seq), 3): See docs.python.org/3/tutorial/controlflow.html#if-statements and docs.python.org/3/tutorial/controlflow.html#for-statements
0 upvotes Tom McLean 9/11/2022
In fact, you have a lot of indentation errors. Was this code pasted correctly?
0 upvotes arteaga_m 9/11/2022
@TomMcLean Yes, I'm sorry. I'm trying to fix it.

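A:

0 upvotes tobias_k 9/11/2022 #1

You can define a function random_protein that collects all codons starting with a given prefix and picks one of them with random.choice, then use that function whenever the current codon has len != 3.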

import random
def random_protein(prefix):
    codons = [c for c in genetic_code if c.startswith(prefix)]
    return genetic_code[random.choice(codons)]

dna_sequence = "GAGCGTCTGCTCCGTGTATAAGCCACGTCGGAGCT"
codons = [dna_sequence[i:i+3] for i in range (0, len(dna_sequence), 3)]
proteins = [genetic_code[c] if len(c) == 3 else random_protein(c) for c in codons]
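
The snippet above relies on the genetic_code dictionary from the question. A self-contained version might look like the following sketch. The `translate` helper and the optional `rng` parameter are assumptions added here for illustration (passing a seeded `random.Random` makes the random choice repeatable), and the table is rebuilt compactly from the standard genetic code so the example runs on its own.

```python
import random
from itertools import product

# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
genetic_code = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def random_protein(prefix, rng=random):
    """Pick a random codon starting with prefix and return its amino acid."""
    codons = [c for c in genetic_code if c.startswith(prefix)]
    return genetic_code[rng.choice(codons)]


def translate(dna, rng=random):
    """Translate dna codon by codon; an incomplete trailing codon is completed at random.

    Hypothetical helper: stop codons are kept as "*" and translation continues past them.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(
        genetic_code[c] if len(c) == 3 else random_protein(c, rng)
        for c in codons
    )


dna_sequence = "GAGCGTCTGCTCCGTGTATAAGCCACGTCGGAGCT"
print(translate(dna_sequence, random.Random(0)))  # ERLLRV*ATSEL
```

The result for this particular sequence does not depend on the seed, since every codon starting with CT encodes L. With other trailing prefixes (for example GA, which can become D or E) the last amino acid can vary from run to run unless a seeded generator is passed in.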