Regex removes whitespace and empty line

Asked by: Briedis  Asked: 1/5/2021  Updated: 1/5/2021  Views: 90

Q:

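I am trying to create a program that deciphers a text file into readable text. The encoded text follows two rules: each letter is marked as a consonant or a vowel, written as C or V, and the marker is followed by the letter's index position (for example, V1C1C3 would be "abc").

I created two arrays to hold the vowels and consonants ("0" sits at index 0 so that the rest of each array can start from index 1):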

String[] vowels = {"0", "a", "A", "e", "E", "i", "I", "o", "O", "u", "U", "y", "Y"};
String[] cons = {"0","b", "B", "c", "C", "d", "D", "f", "F", "g", "G", "h", "H", "j", "J", "k", "K", "l", "L", "m", "M", "n", "N", "p", "P", "q", "Q", "r", "R", "s", "S", "t", "T", "v", "V", "w", "W", "x", "X", "z", "Z"};

I am using a Scanner to split the code into vowels and consonants and to get the index values:

while(scan.hasNext()){
    String[] parts = scan.nextLine().split("(?=[CV])");

    for (String part : parts) {
        Scanner num = new Scanner(part).useDelimiter("[^0-9]+");
        int value = num.nextInt();

        if(part.charAt(0) == 'C'){
           System.out.print(cons[value]);
        }
        else if (part.charAt(0) == 'V'){
           System.out.print(vowels[value]);
        }
    }
}

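Encrypted text: V6 C17V7C33V3 V1C23C23C17V3C29 (the result should be: I love apples)

The result I get: Iloveapples

PS: If I have several paragraphs of encrypted text, the Scanner stops after the first paragraph and outputs an error. This does not happen if I use "scan.next()" instead of "scan.nextLine()".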

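regex java.util.scanner

Comments

1 upvote AdrianHHH 1/5/2021
There is nothing in the code that detects and outputs the missing spaces.
0 upvotes Briedis 1/5/2021
@AdrianHHH So, are you suggesting that I create a new Scanner to check for spaces, or add "\\s" to the existing Scanner?
0 upvotes user1442498 1/5/2021
You need another loop around the parts loop. Split on spaces, then feed that output into the parts loop. Print a space at the end of the new loop. Alternatively, you could use streams or a regex to remap your characters and preserve the spaces.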
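
The outer-loop suggestion in the last comment could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name SpacePreservingDecoder and the method decodeLine are made up here, and it assumes Java 8+ (where split with a lookahead yields no leading empty string) and that every word consists solely of C/V tokens. Skipping empty words also appears to avoid the multi-paragraph error from the PS, which is most likely caused by a blank line producing an empty part on which nextInt() then fails.

```java
import java.util.Scanner;

class SpacePreservingDecoder {
    // Same lookup tables as in the question; "0" is a placeholder so indices start at 1.
    static final String[] VOWELS = {"0", "a", "A", "e", "E", "i", "I", "o", "O", "u", "U", "y", "Y"};
    static final String[] CONS = {"0", "b", "B", "c", "C", "d", "D", "f", "F", "g", "G", "h", "H",
            "j", "J", "k", "K", "l", "L", "m", "M", "n", "N", "p", "P", "q", "Q", "r", "R",
            "s", "S", "t", "T", "v", "V", "w", "W", "x", "X", "z", "Z"};

    // Hypothetical helper: decodes one line, re-emitting the spaces between words.
    static String decodeLine(String line) {
        StringBuilder out = new StringBuilder();
        String[] words = line.split(" ", -1); // limit -1 keeps trailing empty strings
        for (int w = 0; w < words.length; w++) {
            if (w > 0) {
                out.append(' '); // restore the space that split() consumed
            }
            if (words[w].isEmpty()) {
                continue; // blank line or consecutive spaces: nothing to decode
            }
            // Assumption: the word contains only tokens such as C17 or V3.
            for (String part : words[w].split("(?=[CV])")) {
                int value = Integer.parseInt(part.substring(1));
                out.append(part.charAt(0) == 'C' ? CONS[value] : VOWELS[value]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two encoded paragraphs separated by a blank line.
        Scanner scan = new Scanner("V6 C17V7C33V3 V1C23C23C17V3C29\n\nV1C1C3");
        while (scan.hasNextLine()) {
            System.out.println(decodeLine(scan.nextLine()));
        }
    }
}
```

Punctuation attached to a word (for example "C5,") would still break parseInt in this sketch, which is one reason a regex replacement may be the more robust route.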

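A:

0 upvotes Andreas 1/5/2021 #1

It seems you want anything that is not an encoded token to stay unchanged, such as the space in the given example, but possibly also symbols like commas and question marks.

That means a regex replacement is a better fit: use a regex to locate the encoded tokens, i.e. the letter C or V followed by digits.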

private static final String VOWELS = "aAeEiIoOuUyY";
private static final String CONSONANTS = "bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ";
// Requires Java 9+
public static String decode(String encodedText) {
    return Pattern.compile("[CV]\\d+").matcher(encodedText)
            .replaceAll(r -> String.valueOf((r.group().charAt(0) == 'V' ? VOWELS : CONSONANTS)
                                            .charAt(Integer.parseInt(r.group().substring(1)) - 1)));
}
// For Java 1.4+
public static String decode(String encodedText) {
    StringBuffer buf = new StringBuffer();
    Matcher m = Pattern.compile("[CV]\\d+").matcher(encodedText);
    while (m.find()) {
        String token = m.group();
        int value = Integer.parseInt(token.substring(1));
        char ch = (token.charAt(0) == 'V' ? VOWELS : CONSONANTS).charAt(value - 1);
        m.appendReplacement(buf, String.valueOf(ch));
    }
    return m.appendTail(buf).toString();
}

Test

System.out.println(decode("V6 C17V7C33V3 V1C23C23C17V3C29"));
System.out.println(decode("C30C11V3 V1C29C15V3C5, \"C36C11V1C31 V1C27V3 V11V7V9 C5V7V5C21C9?\""));

Output

I love apples
She asked, "What are you doing?"
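
Because only the C/V tokens are replaced, line breaks and blank lines should pass through untouched as well, which is relevant to the multi-paragraph problem mentioned in the question. A minimal sketch of that, assuming Java 9+ and using the same technique as the first decode method; the class name MultiParagraphDemo and the two-paragraph input are made up for illustration:

```java
import java.util.regex.Pattern;

class MultiParagraphDemo {
    private static final String VOWELS = "aAeEiIoOuUyY";
    private static final String CONSONANTS = "bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ";
    private static final Pattern TOKEN = Pattern.compile("[CV]\\d+");

    // Replaces every C/V token with its letter; everything else is copied as-is.
    // Requires Java 9+ for Matcher.replaceAll(Function).
    static String decode(String encodedText) {
        return TOKEN.matcher(encodedText).replaceAll(r -> {
            String table = r.group().charAt(0) == 'V' ? VOWELS : CONSONANTS;
            int index = Integer.parseInt(r.group().substring(1)) - 1; // tokens are 1-based
            return String.valueOf(table.charAt(index));
        });
    }

    public static void main(String[] args) {
        // Hypothetical input: two encoded paragraphs separated by a blank line.
        String twoParagraphs = "V6 C17V7C33V3 V1C23C23C17V3C29\n\nV1C1C3";
        System.out.println(decode(twoParagraphs));
    }
}
```

With this approach the whole file content can be decoded in a single call, so no per-line splitting is needed.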

For completeness, here is a simple, unoptimized encode method that was used to create the second example above:

// For Java 1.5+
public static String encode(String plainText) {
    int index;
    StringBuilder buf = new StringBuilder();
    for (int i = 0; i < plainText.length(); i++) {
        char ch = plainText.charAt(i);
        if ((index = VOWELS.indexOf(ch)) != -1)
            buf.append('V').append(index + 1);
        else if ((index = CONSONANTS.indexOf(ch)) != -1)
            buf.append('C').append(index + 1);
        else
            buf.append(ch);
    }
    return (buf.length() == plainText.length() ? plainText : buf.toString());
}
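
A quick round-trip check can show how the two methods fit together. The sketch below copies the encode logic and the Java 1.4+ decode variant into one hypothetical class (RoundTripCheck is a name invented here). It also illustrates a limitation that follows from the scheme itself: a digit directly after a letter in the plain text becomes part of the token, so such text does not survive the round trip.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoundTripCheck {
    private static final String VOWELS = "aAeEiIoOuUyY";
    private static final String CONSONANTS = "bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ";

    // Letters become V<n> or C<n> (1-based); every other character is copied unchanged.
    static String encode(String plainText) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < plainText.length(); i++) {
            char ch = plainText.charAt(i);
            int index;
            if ((index = VOWELS.indexOf(ch)) != -1) {
                buf.append('V').append(index + 1);
            } else if ((index = CONSONANTS.indexOf(ch)) != -1) {
                buf.append('C').append(index + 1);
            } else {
                buf.append(ch);
            }
        }
        return buf.toString();
    }

    // Same logic as the Java 1.4+ decode variant above.
    static String decode(String encodedText) {
        StringBuffer buf = new StringBuffer();
        Matcher m = Pattern.compile("[CV]\\d+").matcher(encodedText);
        while (m.find()) {
            String token = m.group();
            int value = Integer.parseInt(token.substring(1));
            char ch = (token.charAt(0) == 'V' ? VOWELS : CONSONANTS).charAt(value - 1);
            m.appendReplacement(buf, String.valueOf(ch));
        }
        return m.appendTail(buf).toString();
    }

    public static void main(String[] args) {
        String plain = "She asked, \"What are you doing?\"";
        System.out.println(decode(encode(plain)).equals(plain)); // prints true
        // Limitation: "a1" encodes to "V11", which decodes to "y".
        System.out.println(decode(encode("a1")));
    }
}
```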

0 upvotes WJS 1/5/2021 #2

The first part encodes some test phrases. The second part decodes them, preserving punctuation.

String[] vowels = { "0", "a", "A", "e", "E", "i", "I", "o",
        "O", "u", "U", "y", "Y" };
String[] cons = { "0", "b", "B", "c", "C", "d", "D", "f", "F",
        "g", "G", "h", "H", "j", "J", "k", "K", "l", "L", "m",
        "M", "n", "N", "p", "P", "q", "Q", "r", "R", "s", "S",
        "t", "T", "v", "V", "w", "W", "x", "X", "z", "Z" };

String c = String.join("", cons);
String v = String.join("", vowels);
    
String[] testData = { "I Love Apples\n",
        "To be or not to be that is the question!\n",
        "This also handles (parens) and `single` and \"double\" quotes\n" };
    
// this section iterates over the phrases and encodes each one with a StringBuilder,
// collecting the encoded strings in a list
List<String> coded = new ArrayList<>();
for (String text : testData) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        int index = v.indexOf(ch);
        if (index >= 0) {
            sb.append("V").append(index);
            continue;
        }
        index = c.indexOf(ch);
        if (index >= 0) {
            sb.append("C").append(index);
            continue;
        }
        sb.append(ch);
    }
    coded.add(sb.toString());
}
    
for (String code : coded) {
    System.out.println(code);
}
// Now simply search on encoded values or a single character. Keep finding the
// pattern and converting back to a letter.  Punctuation is preserved.
String pat = "([CV]\\d+|.)";
for (String code : coded) {
    StringBuilder plain = new StringBuilder();
    Matcher m = Pattern.compile(pat).matcher(code);
    while (m.find()) {
        String group = m.group();
        if (group.charAt(0) == 'V') {
            plain.append(v.charAt(
                    Integer.valueOf(group.substring(1))));
        } else if (group.charAt(0) == 'C') {
            plain.append(c.charAt(
                    Integer.valueOf(group.substring(1))));
        } else {
            plain.append(group);
        }
    }
    
    System.out.println(plain);
}

The code above first prints the encoded strings, then the decoded strings.

V6 C18V7C33V3 V2C23C23C17V3C29

C32V7 C1V3 V7C27 C21V7C31 C31V7 C1V3 C31C11V1C31 V5C29 C31C11V3 C25V9V3C29C31V5V7C21!

C32C11V5C29 V1C17C29V7 C11V1C21C5C17V3C29 (C23V1C27V3C21C29) V1C21C5 `C29V5C21C9C17V3` V1C21C5 "C5V7V9C1C17V3" C25V9V7C31V3C29

I Love Apples
To be or not to be that is the question!
This also handles (parens) and `single` and "double" quotes
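
One detail worth noting about the pattern in this answer: in Java, "." does not match line terminators by default, so the trailing "\n" of each test phrase is silently dropped during decoding (println then adds a fresh line break). If line breaks inside the text should be preserved, the pattern can be compiled with Pattern.DOTALL. The following is a rough sketch under that assumption; the class name DotAllDecodeSketch and the use of capture groups are choices made here, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDecodeSketch {
    // Lookup strings with a leading "0" placeholder, mirroring the joined arrays above.
    static final String V = "0aAeEiIoOuUyY";
    static final String C = "0bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ";

    // Group 1 is the C/V marker, group 2 the index; otherwise any single character.
    // DOTALL lets "." match line terminators too.
    static final Pattern TOKEN = Pattern.compile("([CV])(\\d+)|.", Pattern.DOTALL);

    static String decode(String code) {
        StringBuilder plain = new StringBuilder();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            if (m.group(1) != null) {
                // Encoded token: look the letter up in the matching table.
                String table = m.group(1).equals("V") ? V : C;
                plain.append(table.charAt(Integer.parseInt(m.group(2))));
            } else {
                // Anything else, including "\n", is copied unchanged.
                plain.append(m.group());
            }
        }
        return plain.toString();
    }

    public static void main(String[] args) {
        System.out.print(decode("V6 C18V7C33V3 V2C23C23C17V3C29\n"));
    }
}
```

Checking group 1 instead of the first character also means a bare C or V without digits is copied through rather than triggering a NumberFormatException.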