Asked by: user6048670 Asked: 8/10/2016 Last edited: Tyler Durden, user6048670 Updated: 1/19/2020 Views: 3175
Smallest substring that can be replaced to make the string have the same number of each character
Question:
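I'm trying to solve an almost identical problem. In particular, I'm given a string s such that s.Length % 4 == 0 and each s[i] is one of 'A', 'C', 'T' or 'G'. I want to find the smallest substring I can replace so that each of 'A', 'C', 'T' and 'G' appears exactly s.Length / 4 times.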
For example, with s = "GAAATAAA", one optimal solution is to replace the substring "AAATA" with "TTCCG", which gives "GTTCCGAA".
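I've described my approach in the comments in the code below, and I'd like to know whether it is generally correct, in the sense that it will get me the right answer.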
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

class Solution
{
    static string ReplacementForSteadiness(string s)
    {
        var counter = new Dictionary<char, int>() {
            { 'A', 0 }, { 'C', 0 }, { 'G', 0 }, { 'T', 0 }
        };
        for (int i = 0; i < s.Length; ++i)
            counter[s[i]] += 1;

        int div = s.Length / 4;

        var pairs = counter.ToList();
        if (pairs.All(p => p.Value == div))
            return "";

        // If here, that means there is an uneven count of characters in s. For example, if
        // s = "AAATGTTCTTGCGGGG", then counter = { A -> 3, T -> 5, C -> 2, G -> 6 },
        // div = 4, and we know that we need to increase the number of As by 1, decrease
        // the number of Ts by 1, increase the number of Cs by 2 and decrease the number
        // of Gs by 2.

        // The smallest strings to replace will have 1 T and 2 Gs, to be replaced with 1 A and
        // 2 Cs (the order of characters in the replacement string doesn't matter).
        // "TGG" --> "ACC"
        // "GTG" --> "ACC"
        // "GGT" --> "ACC"

        // None of those strings exist in s. The next smallest strings that could be replaced
        // would have 1 T and 3 Gs (the T replaced by an A and 2 of the Gs by Cs), or
        // 2 Ts and 2 Gs (1 of the Ts replaced by an A and both Gs by Cs).
        // "TGGG" --> "AGCC"
        // "GTGG" --> "AGCC"
        // "GGTG" --> "AGCC"
        // "GGGT" --> "AGCC"
        // "TTGG" --> "ATCC"
        // "TGTG" --> "ATCC"
        // "GTGT" --> "ATCC"
        // "GGTT" --> "ATCC"

        // None of those strings exist in s. Etc.

        string r;

        // ...

        return r;
    }

    static void Main(String[] args)
    {
        Console.ReadLine(); // n
        string str = Console.ReadLine();
        string replacement = ReplacementForSteadiness(str);
        Console.WriteLine(replacement.Length);
    }
}
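The comments above only list candidate windows made up of the surplus letters (T and G). A window may also contain letters that are not in surplus: in "AAATGTTCTTGCGGGG" no 3-letter window works, but "TGCG" (indices 9 to 12) holds one T and two Gs and has length 4. A brute-force Python sketch for checking small cases (the name min_window_brute_force is hypothetical) uses the condition that s[i..j] can be replaced iff every letter occurs at most len(s) // 4 times outside it:

```python
from collections import Counter

def min_window_brute_force(s: str) -> int:
    """Length of the shortest substring whose replacement can balance s.
    s[i..j] works iff every letter occurs at most len(s) // 4 times outside it;
    the window can then be refilled with exactly the missing letters.
    O(n^2), meant only for checking small examples."""
    n = len(s)
    target = n // 4
    total = Counter(s)
    if all(total[ch] == target for ch in "ACGT"):
        return 0  # already balanced
    best = n
    for i in range(n):
        outside = total.copy()
        for j in range(i, n):
            outside[s[j]] -= 1  # s[j] is now inside the window
            if all(outside[ch] <= target for ch in "ACGT"):
                best = min(best, j - i + 1)
                break  # extending the window further only makes it longer
    return best

print(min_window_brute_force("AAATGTTCTTGCGGGG"))  # 4
```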
Answer:
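If the string already has a balanced set of characters, you are done and don't need to do anything.
Otherwise, you can always solve the problem by replacing the minimum of zero characters. You do this by adding whatever characters are missing. For example, take your test case:
GAAATAAA
The most frequent character is A, with 6 occurrences. You need 5 extra Gs, 5 extra Ts and 6 extra Cs. So replace one A with the required characters, including the A itself:
GAAATAAAGGGGGTTTTTCCCCCC
Since the original A is replaced with an A, you have effectively replaced zero characters, the minimum possible.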
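The counts used in this answer can be reproduced with a short Python sketch (the function name is hypothetical): it finds the most frequent letter and how many copies of each letter would have to be added to match it.

```python
from collections import Counter

def extra_needed_to_match_max(s: str) -> dict:
    """For each of A, C, G, T: how many copies must be added so that every
    letter reaches the count of the most frequent letter in s."""
    counts = Counter(s)
    top = max(counts[ch] for ch in "ACGT")
    return {ch: top - counts[ch] for ch in "ACGT"}

print(extra_needed_to_match_max("GAAATAAA"))  # {'A': 0, 'C': 6, 'G': 5, 'T': 5}
```

Adding those 16 letters yields a 24-character string with six of each letter, rather than the s.Length / 4 = 2 of each that the question asks for.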
Answer:
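I think your solution will work, but its complexity is too high.
Here is an alternative solution: if counting the characters in the string returns { 'A', 4 }, { 'C', 6 }, { 'G', 6 }, { 'T', 4 }, then the substring to replace must start with C or G, end with C or G, and have length >= 2.
So what we need to do is take every substring that satisfies these conditions and test whether it contains the "bad characters", in our case one C and one G. If its length is 2, we win; otherwise we save it in a temporary variable and continue testing.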
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

class Solution
{
    static void Main(String[] args)
    {
        string[] inputs = { "GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA", "GAAAAAAA", "GATGAATAACCA", "ACGT" };
        List<string> replacement = new List<string>();
        foreach (var item in inputs)
        {
            var r = StringThatHasToBeReplaced(item);
            replacement.Add(r);
            // null means the input was already balanced
            Console.WriteLine(item + ": " + (r ?? "(already balanced)"));
        }
    }

    static string StringThatHasToBeReplaced(string s)
    {
        var counter = new Dictionary<char, int>() {
            { 'A', 0 }, { 'C', 0 }, { 'G', 0 }, { 'T', 0 }
        };
        for (int i = 0; i < s.Length; ++i)
            counter[s[i]] += 1;

        int div = s.Length / 4;

        var pairs = counter.ToList();
        if (pairs.Where(p => p.Value != div).Count() == 0)
        {
            return null;
        }

        // Characters that occur more than div times, and the total number of excess copies
        List<char> surplusCharacter = pairs.Where(p => p.Value > div).Select(p => p.Key).ToList();
        int minLength = pairs.Where(p => p.Value > div).Sum(p => p.Value - div);

        string result = s;
        for (int i = 0; i < s.Length - minLength + 1; i++) // i is the start index
        {
            if (surplusCharacter.Contains(s[i]))
            {
                if (minLength == 1)
                    return s[i].ToString();
                for (int j = i + minLength - 1; j < s.Length; j++) // j is the (inclusive) end index
                {
                    if (surplusCharacter.Contains(s[j]))
                    {
                        var substring = s.Substring(i, j - i + 1); // length j - i + 1 so s[j] is included
                        if (substring.Length >= result.Length)
                        {
                            break;
                        }
                        // we test whether substring can be the string that needs to be replaced
                        var isValid = true;
                        foreach (var c in surplusCharacter)
                        {
                            if (substring.Count(f => f == c) < counter[c] - div)
                            {
                                isValid = false;
                                break;
                            }
                        }
                        if (isValid)
                            result = substring;
                    }
                }
            }
        }
        return result;
    }
}
I made some modifications to handle edge cases.
Here are some test samples; the results I get look good.
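The pruning described in this answer (only windows that start and end on an over-represented letter are tried) can be sketched in Python. This is an illustrative re-implementation, not the answerer's code; the function name is hypothetical, and the window is taken as s[i:j + 1] so that the character at the end index j belongs to it. Trimming a non-surplus letter from either end of a valid window keeps it valid, which is why the restriction loses nothing.

```python
from collections import Counter

def min_window_surplus_endpoints(s: str) -> int:
    """Length of the shortest replaceable window, trying only windows whose
    first and last characters are over-represented letters.
    The window is s[i:j + 1], i.e. the end index j is inclusive."""
    n = len(s)
    target = n // 4
    counts = Counter(s)
    # how many extra copies of each over-represented letter the window needs
    need = {ch: counts[ch] - target for ch in "ACGT" if counts[ch] > target}
    if not need:
        return 0  # already balanced
    best = n  # the whole string can always be replaced
    for i in range(n):
        if s[i] not in need:
            continue  # a shortest window never starts on a non-surplus letter
        seen = Counter()
        for j in range(i, n):
            seen[s[j]] += 1
            if j - i + 1 >= best:
                break  # cannot beat the best window found so far
            if s[j] in need and all(seen[c] >= k for c, k in need.items()):
                best = j - i + 1
                break  # longer windows starting at i are no shorter
    return best

inputs = ["GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA",
          "GAAAAAAA", "GATGAATAACCA", "ACGT"]
print([min_window_surplus_endpoints(x) for x in inputs])
```

On the answer's seven test strings it gives 5, 6, 1, 6, 5, 4 and 0.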
Answer:
Thoughts? Sorry for the messy code + Python solution. I originally wrote it on my phone and was feeling lazy.
import re
from itertools import permutations

def find_min(s):
    # count each letter
    freq = {ch: 0 for ch in 'ATGC'}
    for ch in s:
        freq[ch] += 1
    desired_len = int(len(s) / 4)
    # positive: the letter is missing; negative: the letter is in surplus
    fixes = {ch: desired_len - freq[ch] for ch in 'ATGC'}
    # the surplus letters that the replaced substring must contain
    replacement = ''
    for ch in fixes:
        adj = fixes[ch]
        if adj < 0:
            replacement += ch * (-1 * adj)
    perms = set(permutations(replacement))
    m = len(s)
    to_replace = ''
    # try every ordering of the surplus letters as a lazy regex, e.g. 'A.*?A.*?A'
    for rep in perms:
        regex = '.*?'.join([ch for ch in rep])
        finds = re.findall(regex, s)
        if finds:
            x = sorted(finds, key=lambda x: len(x))[0]
            if m >= len(x):
                m = len(x)
                to_replace = x
    print_replacement(s, to_replace, fixes)

def print_replacement(inp, to_replace, fixes):
    # keep the missing letters that are already in the substring,
    # then append every missing letter the required number of times
    replacement = ''
    for ch in to_replace:
        if fixes[ch] > 0:
            replacement += ch
    for ch in fixes:
        if fixes[ch] > 0:
            replacement += ch * fixes[ch]
    print('{0}\t\t- Replace {1} with {2} (min length: {3})'.format(inp, to_replace, replacement, len(replacement)))

def main():
    inputs = ["GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA", "GAAAAAAA", "GATGAATAACCA", "ACGT"]
    for inp in inputs:
        find_min(inp)

if __name__ == '__main__':
    main()
Thanks to @AnotherGeek for the test inputs! Here is the output.
GAAATAAA - Replace AAATA with TCCGT (min length: 5)
CACCGCTACCGC - Replace CACCGC with AGAGTT (min length: 6)
CAGCTAGC - Replace C with T (min length: 1)
AAAAAAAA - Replace AAAAAA with CCGGTT (min length: 6)
GAAAAAAA - Replace AAAAA with CCGTT (min length: 5)
GATGAATAACCA - Replace ATGAA with TGCGT (min length: 5)
ACGT - Replace with (min length: 0)
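Each line of this output can be checked by applying the replacement and counting letters. A small Python sketch; the helper apply_and_check is hypothetical and assumes the replacement is made at the first occurrence of the listed substring, since the output does not say which occurrence is meant:

```python
from collections import Counter

def apply_and_check(s: str, old: str, new: str) -> bool:
    """Replace the first occurrence of old with new and report whether every
    letter then appears len(s) // 4 times."""
    if len(old) != len(new) or old not in s:
        raise ValueError("replacement must have the same length and occur in s")
    result = s.replace(old, new, 1)  # first occurrence only (an assumption)
    counts = Counter(result)
    return all(counts[ch] == len(s) // 4 for ch in "ACGT")

cases = [
    ("GAAATAAA", "AAATA", "TCCGT"),
    ("CACCGCTACCGC", "CACCGC", "AGAGTT"),
    ("CAGCTAGC", "C", "T"),
    ("AAAAAAAA", "AAAAAA", "CCGGTT"),
    ("GAAAAAAA", "AAAAA", "CCGTT"),
    ("GATGAATAACCA", "ATGAA", "TGCGT"),
    ("ACGT", "", ""),
]
print([apply_and_check(*c) for c in cases])
```

All seven cases come out balanced, which confirms that each listed replacement is valid; it does not show that each one is the shortest.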
I realize this is very inefficient. Any suggestions for improvement?
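class Solution {
    // Sliding window: count[] holds the letters outside the window s[i..j].
    // The window can be replaced iff every letter outside it occurs at most n / 4 times.
    public int balancedString(String s) {
        int[] count = new int[128];
        int n = s.length(), res = n, i = 0, k = n / 4;
        for (int j = 0; j < n; ++j) {
            ++count[s.charAt(j)];
        }
        for (int j = 0; j < n; ++j) {
            --count[s.charAt(j)]; // extend the window to include s[j]
            while (i < n && count['A'] <= k && count['C'] <= k && count['T'] <= k && count['G'] <= k) {
                res = Math.min(res, j - i + 1);
                ++count[s.charAt(i++)]; // shrink the window from the left
            }
        }
        return res;
    }
}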
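The Java method above is a sliding-window (two-pointer) solution that runs in O(n): count holds the letters outside the window s[i..j], and for each end index j the start index i is advanced while every outside count stays at most n / 4. A direct Python translation, with a hypothetical function name:

```python
from collections import Counter

def balanced_string(s: str) -> int:
    """Python translation of the Java balancedString method.
    count holds the letters outside the window s[i..j]; the window can be
    replaced once every outside count is at most n // 4."""
    n = len(s)
    k = n // 4
    count = Counter(s)
    res = n
    i = 0
    for j in range(n):
        count[s[j]] -= 1  # s[j] joins the window
        while i < n and all(count[c] <= k for c in "ACGT"):
            res = min(res, j - i + 1)  # s[i..j] is replaceable
            count[s[i]] += 1           # s[i] leaves the window
            i += 1
    return res

tests = ["GAAATAAA", "CACCGCTACCGC", "CAGCTAGC", "AAAAAAAA",
         "GAAAAAAA", "GATGAATAACCA", "ACGT"]
print([balanced_string(t) for t in tests])
```

On the seven test strings used in the earlier answers it returns 5, 6, 1, 6, 5, 4 and 0. For an already balanced string the window shrinks to empty (i = j + 1), which is how 0 comes out. For GATGAATAACCA the window "AATA" (indices 4 to 7) already contains the three surplus As, so the minimum there is 4 rather than the 5 shown in the regex-based output above; re.findall returns only non-overlapping matches, so that search can miss the shortest window.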