Is there a way to convert number words to Integers?

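Asked: 1/30/2009 Last edited by: Jonathan Leffler Updated: 11/21/2023 Views: 154140

Q:

I need to convert one into 1, two into 2, and so on.

Is there a way to do this with a library or a class or anything?

python string text integer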

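Comments

3 votes tzot 1/31/2009
See also: stackoverflow.com/questions/70161/...
0 votes alvas 10/26/2015
Maybe this helps: pastebin.com/WwFCjYtt
5 votes stackErr 3/31/2019
If anyone is still looking for an answer to this, I took inspiration from all the answers below and created a Python package: github.com/careless25/text2digits
1 vote Alejandro Alcalde 6/28/2019
I used the example below to develop and extend this process, but for Spanish. For future reference: github.com/elbaulp/text2digits_es
1 vote Tomerikoo 8/10/2021
For anyone arriving here who is not looking for a Python solution, here is the parallel C# question: Convert words (string) to Int, and here is the Java one: Converting Words to Numbers in Java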

Answers:

6 votes Jeff Bauer 1/30/2009 #1

Here's the trivial case approach:

>>> number = {'one':1,
...           'two':2,
...           'three':3,}
>>> 
>>> number['two']
2

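Or are you looking for something that can handle "twelve thousand, one hundred seventy-two"?

Comments

1 vote yeliabsalohcin 3/7/2023
This helped me, thanks. A useful answer when you are coding text from something like a questionnaire that offers only a limited number of text-number options.
3 votes Kena 1/30/2009 #2

If you have a limited set of numbers to parse, you can easily hardcode them into a dictionary.

For slightly more complex cases, you will probably want to generate this dictionary automatically, based on the relatively simple grammar of number words. Something along these lines (generalized, of course...):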

for i in range(10):
   myDict[30 + i] = "thirty-" + singleDigitsDict[i]
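
A minimal sketch of the generation idea above, inverted so that it maps words to integers (the direction the question asks for). The names `UNITS`, `TENS`, and `build_word_table` are illustrative and not from the original answer, and the table only covers 0 to 99.

```python
# Hypothetical helper names; the generated table covers 0-99 only.
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
        60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety"}


def build_word_table():
    """Return a dict mapping number words ("thirty-one") to ints (31)."""
    table = {word: value for value, word in enumerate(UNITS)}
    for base, tens_word in TENS.items():
        table[tens_word] = base
        for digit in range(1, 10):
            # Compound forms use a hyphen, e.g. "thirty-one"
            table[tens_word + "-" + UNITS[digit]] = base + digit
    return table


word_table = build_word_table()
print(word_table["thirty-one"])   # 31
print(len(word_table))            # 100
```

Anything beyond 99 needs scale words such as "hundred" and "thousand", which a flat lookup table like this does not model.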

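If you need something more extensive, then it looks like you will need natural language processing tools. This article might be a good starting point.

142 votes recursive 1/30/2009 #3

The majority of this code sets up the numwords dict, which is only done on the first call.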

def text2int(textnum, numwords={}):
    if not numwords:
      units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
      ]

      tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

      scales = ["hundred", "thousand", "million", "billion", "trillion"]

      numwords["and"] = (1, 0)
      for idx, word in enumerate(units):    numwords[word] = (1, idx)
      for idx, word in enumerate(tens):     numwords[word] = (1, idx * 10)
      for idx, word in enumerate(scales):   numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
          raise Exception("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current

print(text2int("seven billion one hundred million thirty one thousand three hundred thirty seven"))
#7100031337
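
To see how the (scale, increment) pairs drive the loop, here is a small tracing sketch. It is a stripped-down rewrite for illustration, not the answer's code: the table `PAIRS` holds only the words needed for one example, and `trace` is a hypothetical helper name.

```python
# Illustrative only: a tiny (scale, increment) table, enough for one example.
PAIRS = {
    "two": (1, 2), "three": (1, 3), "five": (1, 5),
    "forty": (1, 40),
    "hundred": (100, 0), "thousand": (1000, 0),
}


def trace(text):
    """Print the running state after each word and return the final value."""
    current = result = 0
    for word in text.split():
        scale, increment = PAIRS[word]
        current = current * scale + increment
        if scale > 100:          # "thousand" and above close off a group
            result += current
            current = 0
        print(f"{word:>9}: current={current:<4} result={result}")
    return result + current


total = trace("two thousand three hundred forty five")
print(total)  # 2345
```

Note that "hundred" multiplies `current` but does not flush it into `result`, which is why "three hundred forty five" keeps accumulating in the same group.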

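Comments

1 vote Nick Ruiz 5/13/2014
FYI, this will not work for dates. Try: print text2int("nineteen ninety six") # 115
28 votes recursive 5/13/2014
The correct way to write 1996 as a number in words is "one thousand nine hundred ninety six". If you want to support years, you will need different code.
0 votes dimid 3/6/2015
There's a ruby gem by Marc Burns that does it. I recently forked it to add support for years. You can call ruby code from python.
1 vote Harish Kayarohanam 2/26/2017
It breaks for "hundred and six". Try print(text2int("hundred and six")), and also print(text2int("thousand")).
2 votes recursive 10/24/2019
"What would people expect". I guess different users have different expectations. Personally, mine would be that it wouldn't be called with that input, since it's not a valid number. It's two.
12 votes Jarret Hardie 3/1/2009 #4

I needed to handle a couple of extra parsing cases, such as ordinal words ("first", "second"), hyphenated words ("one-hundred"), and hyphenated ordinal words (like "fifty-seventh"), so I added a couple of lines: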

def text2int(textnum, numwords={}):
    if not numwords:
        units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):       numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
        
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current

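Comments

2 votes rohithpr 3/27/2016
Note: this returns zero for hundredth, thousandth, and so on. Use one hundredth to get 100!
1 vote Neil 9/6/2021
Mutable default arguments are an antipattern.
1 vote Dawa 4/22/2010 #5

Made a change so that text2int(scale) returns the correct conversion. For example, text2int("hundred") => 100.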

import re

numwords = {}


def text2int(textnum):

    if not numwords:

        units = [ "zero", "one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve",
                "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                "eighteen", "nineteen"]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", 
                "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion", 
                'quadrillion', 'quintillion', 'sextillion', 'septillion', 
                'octillion', 'nonillion', 'decillion' ]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 
            'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    current = result = 0
    tokens = re.split(r"[\s-]+", textnum)
    for word in tokens:
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        if scale > 1:
            current = max(1, current)

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
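
The change to the accumulation step is the `max(1, current)` guard, which treats a bare scale word as if "one" preceded it. A minimal sketch of that step in isolation (the `apply_word` helper and its `guard` flag are hypothetical and exist only for this illustration):

```python
def apply_word(current, scale, increment, guard):
    """One accumulation step; `guard` toggles the max(1, current) fix."""
    if guard and scale > 1:
        current = max(1, current)   # bare "hundred" acts like "one hundred"
    return current * scale + increment


# "hundred" arrives as the first word, so current is still 0.
without_guard = apply_word(0, 100, 0, guard=False)
with_guard = apply_word(0, 100, 0, guard=True)
print(without_guard, with_guard)  # 0 100

# An explicit multiplier is left untouched: "three hundred".
print(apply_word(3, 100, 0, guard=True))  # 300
```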

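Comments

0 votes recursive 4/28/2011
I think the correct English spelling of 100 is "one hundred".
0 votes Neil 12/30/2016
@recursive you are absolutely right, but the advantage of this code is that it handles "hundredth" (perhaps that is what Dawa was trying to highlight). From the sound of the description, the other similar code needs "one hundredth", and that isn't always the commonly used term (e.g. "she picked out the hundredth item to discard").
1 vote alukach 2/10/2014 #6

A quick solution is to use inflect.py to generate a dictionary for translation.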

inflect.py has a number_to_words() function that turns a number (e.g. 2) into its word form (e.g. 'two'). Unfortunately, the reverse (which would let you avoid the translation dictionary route) isn't offered. All the same, you can use that function to build the translation dictionary:

>>> import inflect
>>> p = inflect.engine()
>>> word_to_number_mapping = {}
>>>
>>> for i in range(1, 100):
...     word_form = p.number_to_words(i)  # 1 -> 'one'
...     word_to_number_mapping[word_form] = i
...
>>> print(word_to_number_mapping['one'])
1
>>> print(word_to_number_mapping['eleven'])
11
>>> print(word_to_number_mapping['forty-three'])
43

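If you're willing to commit some time, it might be possible to examine the inner workings of inflect.py's number_to_words() function and build your own code to do this dynamically (I haven't tried to do this).

1 vote dimid 3/6/2015 #7

There's a ruby gem by Marc Burns that does it. I recently forked it to add support for years. You can call ruby code from python.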

  require 'numbers_in_words'
  require 'numbers_in_words/duck_punch'

  nums = ["fifteen sixteen", "eighty five sixteen",  "nineteen ninety six",
          "one hundred and seventy nine", "thirteen hundred", "nine thousand two hundred and ninety seven"]
  nums.each {|n| p n; p n.in_numbers}

Results:
"fifteen sixteen"
1516
"eighty five sixteen"
8516
"nineteen ninety six"
1996
"one hundred and seventy nine"
179
"thirteen hundred"
1300
"nine thousand two hundred and ninety seven"
9297

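Comments

0 votes yekta 10/10/2016
Please don't call ruby code from python or python code from ruby. They are close enough that something like this should just be ported over.
1 vote dimid 10/10/2016
Agreed, but until it's ported, calling ruby code is better than nothing.
0 votes yekta 10/10/2016
It's not very complex; @recursive's answer provides logic (in a few lines of code) that can be used.
0 votes PascalVKooten 10/29/2016
It actually seems to me that "fifteen sixteen" is wrong?
0 votes dimid 10/30/2016
@yekta Right, I think recursive's answer is good within the scope of an SO answer. However, the gem provides a complete package with tests and other features. Anyhow, I think both have their place.
43 votes akshaynagpal 1/3/2016 #8

I have just released a python module to PyPI called word2number for this exact purpose. https://github.com/akshaynagpal/w2n

Install it using:

pip install word2number

Make sure your pip is updated to the latest version.

Usage: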

from word2number import w2n

print(w2n.word_to_num("two million three thousand nine hundred and eighty four"))
2003984

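Comments

2 votes Ray 5/5/2016
Tried your package. I would suggest handling strings like "1 million" or "1M". w2n.word_to_num("1 million") throws an error.
1 vote akshaynagpal 5/5/2016
@Ray Thanks for trying it out. Could you please raise an issue at github.com/akshaynagpal/w2n/issues. You can also contribute if you want to. Otherwise, I will definitely look at this issue in the next release. Thanks again!
16 votes akshaynagpal 8/7/2016
Robert, open source software is all about people improving it collaboratively. I wanted a library, and saw that other people wanted one too, so I made it. It may not be ready for production-level systems or conform to textbook buzzwords, but it works for its purpose. Also, it would be great if you could submit a PR so that it can be further improved for all users.
0 votes S.Jackson 11/5/2020
Does it do calculations? Say: nineteen % fifty-seven? Or any other operator, i.e. +, -, * and /
1 vote akshaynagpal 11/6/2020
As of now, it doesn't, @S.Jackson.
16 votes Andrew 8/4/2016 #9

If anyone is interested, I hacked up a version that maintains the rest of the string (though it may have bugs; I haven't tested it much).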

def text2int (textnum, numwords={}):
    if not numwords:
        units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):       numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ""
    onnumber = False
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                if onnumber:
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
            else:
                scale, increment = numwords[word]

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                onnumber = True

    if onnumber:
        curstring += repr(result + current)

    return curstring

Example:

 >>> text2int("I want fifty five hot dogs for two hundred dollars.")
 I want 55 hot dogs for 200 dollars.

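There could be issues if you have, say, "$200". But this was really rough.

Comments

7 votes stackErr 3/31/2019
I took this and other code snippets from here and made them into a python library: github.com/careless25/text2digits
-3 votes Shriram Jadhav 8/21/2017 #10

This code works only for numbers below 99. It goes both ways, word to int and int to word (for the rest you need to implement 10-20 lines of code and simple logic; this is just simple code for beginners):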

num = input("Enter the number you want to convert : ")
mydict = {'1': 'One', '2': 'Two', '3': 'Three', '4': 'Four', '5': 'Five','6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine', '10': 'Ten','11': 'Eleven', '12': 'Twelve', '13': 'Thirteen', '14': 'Fourteen', '15': 'Fifteen', '16': 'Sixteen', '17': 'Seventeen', '18': 'Eighteen', '19': 'Nineteen'}
mydict2 = ['', '', 'Twenty', 'Thirty', 'Forty', 'Fifty', 'Sixty', 'Seventy', 'Eighty', 'Ninety']

if num.isdigit():
    if(int(num) < 20):
        print(" :---> " + mydict[num])
    else:
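        var1 = int(num) % 10
        var2 = int(num) // 10  # integer division, so var2 is a valid list index
        # Skip the units word for exact multiples of ten (e.g. 20 -> "Twenty")
        print(" :---> " + mydict2[var2] + (mydict[str(var1)] if var1 else ''))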
else:
    num = num.lower()
    dict_w = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
    mydict2 = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    divide = num[num.find("ty")+2:]
    if num:
        if(num in dict_w.keys()):
            print(" :---> " + str(dict_w[num]))
        elif divide == '' :
            for i in range(0, len(mydict2)):  # include the last entry ('ninety')
                if mydict2[i] == num:
                    print(" :---> " + str(i * 10))
        else :
            str3 = 0
            str1 = num[num.find("ty")+2:]
            str2 = num[:-len(str1)]
            for i in range(0, len(mydict2)):
                if mydict2[i] == str2:
                    str3 = i
            if str2 not in mydict2:
                print("----->Invalid Input<-----")                
            else:
                try:
                    print(" :---> " + str((str3*10) + dict_w[str1]))
                except:
                    print("----->Invalid Input<-----")
    else:
        print("----->Please Enter Input<-----")

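Comments

1 vote Luuklag 8/21/2017
Please explain what this code does and how it does it. That way your answer is more valuable to those who don't yet understand coding that well.
0 votes Shriram Jadhav 12/6/2017
If the user enters a digit, the program returns it in words, and vice versa, for example 5->five and five->5. The program works for numbers below 100 but can be extended to any range just by adding a few lines of code.
18 votes totalhack 11/21/2018 #11

I needed something a bit different, since my input comes from a speech-to-text conversion and the solution is not always to sum the numbers. For example, "my zipcode is one two three four five" should not convert to "my zipcode is 15".

I took Andrew's answer and tweaked it to handle a few other cases people highlighted as errors, and also added support for examples like the zipcode one mentioned above. Some basic test cases are shown below, but I'm sure there is still room for improvement.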

def is_number(x):
    if type(x) == str:
        x = x.replace(',', '')
    try:
        float(x)
    except:
        return False
    return True

def text2int (textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False

    def is_numword(x):
        if is_number(x):
            return True
        if x in numwords:
            return True
        return False

    def from_numword(x):
        if is_number(x):
            scale = 0
            increment = int(x.replace(',', ''))
            return scale, increment
        return numwords[x]

    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True

                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0

                if scale > 1:
                    current = max(1, current)

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0

                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True

    if onnumber:
        curstring += repr(result + current)

    return curstring

Some tests...

one two three -> 123
three forty five -> 345
three and forty five -> 3 and 45
three hundred and forty five -> 345
three hundred -> 300
twenty five hundred -> 2500
three thousand and six -> 3006
three thousand six -> 3006
nineteenth -> 19
twentieth -> 20
first -> 1
my zip is one two three four five -> my zip is 12345
nineteen ninety six -> 1996
fifty-seventh -> 57
one million -> 1000000
first hundred -> 100
I will buy the first thousand -> I will buy the 1000  # probably should leave ordinal in the string
thousand -> 1000
hundred and six -> 106
1 million -> 1000000
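
The zipcode case can be illustrated on its own with a much narrower sketch than the answer's code. It assumes only single-digit words matter and ignores tens, scales, and ordinals entirely; `DIGITS` and `join_spoken_digits` are hypothetical names used for this illustration.

```python
# Narrow illustration: only the ten single-digit words are recognised.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def join_spoken_digits(text):
    """Collapse runs of single-digit words into one digit string."""
    out = []
    run = ""                      # digits collected from the current run
    for word in text.split():
        if word in DIGITS:
            run += DIGITS[word]
        else:
            if run:               # a non-digit word ends the run
                out.append(run)
                run = ""
            out.append(word)
    if run:                       # flush a run that ends the sentence
        out.append(run)
    return " ".join(out)


print(join_spoken_digits("my zip is one two three four five"))
# my zip is 12345
```

The digits are concatenated as strings rather than summed, which is the behaviour the speech-to-text use case calls for.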

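Comments

2 votes stackErr 3/31/2019
I took your answer and fixed some bugs. Added support for "twenty ten" -> 2010 and for all tens in general. You can find it here: github.com/careless25/text2digits
0 votes S.Jackson 11/5/2020
Does it do calculations? Say: nineteen % fifty-seven? Or any other operator, i.e. +, -, * and /
0 votes totalhack 1/12/2021
@S.Jackson it does not do calculations. If your text snippet is a valid equation in python, I suppose you could use this to first do the conversion to integers, and then eval the result (assuming you are familiar and comfortable with the security concerns of that). So "ten + five" becomes "10 + 5", and then eval("10 + 5") gives you 15. This would only handle the simplest of cases, though. No floats, no parentheses to control order, no support for saying plus/minus/etc. in speech-to-text.
4 votes Abhishek Rawat 5/30/2020 #12

Use the Python package: WordToDigits

pip install wordtodigits

It can find numbers written in word form in a sentence and convert them to the proper numeric format. It also takes care of the decimal part, if present. The word representation of a number can appear anywhere in the passage.

0 votes whatapalaver 7/22/2020 #13

I took @recursive's logic and converted it to Ruby. I've also hardcoded the lookup table, so it's not as cool, but it might help a newbie understand what is going on.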

WORDNUMS = {"zero"=> [1,0], "one"=> [1,1], "two"=> [1,2], "three"=> [1,3],
            "four"=> [1,4], "five"=> [1,5], "six"=> [1,6], "seven"=> [1,7], 
            "eight"=> [1,8], "nine"=> [1,9], "ten"=> [1,10], 
            "eleven"=> [1,11], "twelve"=> [1,12], "thirteen"=> [1,13], 
            "fourteen"=> [1,14], "fifteen"=> [1,15], "sixteen"=> [1,16], 
            "seventeen"=> [1,17], "eighteen"=> [1,18], "nineteen"=> [1,19], 
            "twenty"=> [1,20], "thirty" => [1,30], "forty" => [1,40], 
            "fifty" => [1,50], "sixty" => [1,60], "seventy" => [1,70], 
            "eighty" => [1,80], "ninety" => [1,90],
            "hundred" => [100,0], "thousand" => [1000,0], 
            "million" => [1000000, 0]}

def text_2_int(string)
  numberWords = string.gsub('-', ' ').split(/ /) - %w{and}
  current = result = 0
  numberWords.each do |word|
    scale, increment = WORDNUMS[word]
    current = current * scale + increment
    if scale > 100
      result += current
      current = 0
    end
  end
  return result + current
end

I was looking to handle strings like two thousand one hundred and forty-six

-2 votes WireData india 8/3/2020 #14

This code works for series data:

import pandas as pd
mylist = pd.Series(['one','two','three'])
mylist1 = []
for x in range(len(mylist)):
    mylist1.append(w2n.word_to_num(mylist[x]))
print(mylist1)

Comments

0 votes Tomerikoo 8/10/2021
What? w2n isn't defined anywhere.
6 votes hassan27sn 4/26/2021 #15

def parse_int(string):
    ONES = {'zero': 0,
            'one': 1,
            'two': 2,
            'three': 3,
            'four': 4,
            'five': 5,
            'six': 6,
            'seven': 7,
            'eight': 8,
            'nine': 9,
            'ten': 10,
            'eleven': 11,
            'twelve': 12,
            'thirteen': 13,
            'fourteen': 14,
            'fifteen': 15,
            'sixteen': 16,
            'seventeen': 17,
            'eighteen': 18,
            'nineteen': 19,
            'twenty': 20,
            'thirty': 30,
            'forty': 40,
            'fifty': 50,
            'sixty': 60,
            'seventy': 70,
            'eighty': 80,
            'ninety': 90,
              }

    numbers = []
    for token in string.replace('-', ' ').split(' '):
        if token in ONES:
            numbers.append(ONES[token])
        elif token == 'hundred':
            numbers[-1] *= 100
        elif token == 'thousand':
            numbers = [x * 1000 for x in numbers]
        elif token == 'million':
            numbers = [x * 1000000 for x in numbers]
    return sum(numbers)

Tested with 700 random numbers in the range 1 to a million; works well.
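
A randomized check like the one described could be set up as a round trip: spell each number out, parse it back, and compare. The sketch below is one possible arrangement under stated assumptions, not the answerer's actual test. `int_to_words`, `words_to_int`, and `round_trip_failures` are hypothetical names, and `words_to_int` is a compact stand-in parser included only so the block runs on its own; any parser from this thread could be passed in instead.

```python
import random

UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n):
    """Spell out 0 < n < 1_000_000 in words, without "and" or hyphens."""
    parts = []
    thousands, rest = divmod(n, 1000)
    for chunk, suffix in ((thousands, "thousand"), (rest, "")):
        if chunk == 0:
            continue
        hundreds, below = divmod(chunk, 100)
        if hundreds:
            parts += [UNITS[hundreds], "hundred"]
        if below >= 20:
            parts.append(TENS[below // 10])
            below %= 10
        if below:
            parts.append(UNITS[below])
        if suffix:
            parts.append(suffix)
    return " ".join(parts)


def words_to_int(text):
    """Stand-in parser (assumption): handles units, tens, hundred, thousand."""
    current = result = 0
    for word in text.split():
        if word in UNITS:
            current += UNITS.index(word)
        elif word in TENS:
            current += TENS.index(word) * 10
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            result += current * 1000
            current = 0
        else:
            raise ValueError("Illegal word: " + word)
    return result + current


def round_trip_failures(parser, samples):
    """Return the numbers that do not survive int -> words -> int."""
    return [n for n in samples if parser(int_to_words(n)) != n]


rng = random.Random(0)                       # fixed seed for repeatability
samples = [rng.randint(1, 999_999) for _ in range(700)]
failures = round_trip_failures(words_to_int, samples)
print(len(failures))  # 0
```

A round trip only exercises the phrasings that `int_to_words` produces, so inputs with "and", hyphens, or a bare leading "hundred" would need separate cases.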

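Comments

0 votes Eric 3/14/2022
This doesn't work for numbers in the hundreds of millions.
0 votes Hemant Hegde 6/28/2021 #16

This handles numbers in Indian-style words, some fractions, combinations of digits and words, and also addition.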

def words_to_number(words):
    numbers = {"zero":0, "a":1, "half":0.5, "quarter":0.25, "one":1,"two":2,
               "three":3, "four":4,"five":5,"six":6,"seven":7,"eight":8,
               "nine":9, "ten":10,"eleven":11,"twelve":12, "thirteen":13,
               "fourteen":14, "fifteen":15,"sixteen":16,"seventeen":17,
               "eighteen":18,"nineteen":19, "twenty":20,"thirty":30, "forty":40,
               "fifty":50,"sixty":60,"seventy":70, "eighty":80,"ninety":90}

    groups = {"hundred":100, "thousand":1_000, 
              "lac":1_00_000, "lakh":1_00_000, 
              "million":1_000_000, "crore":10**7, 
              "billion":10**9, "trillion":10**12}
    
    split_at = ["and", "plus"]
    
    n = 0
    skip = False
    words_array = words.split(" ")
    for i, word in enumerate(words_array):
        if not skip:
            if word in groups:
                n*= groups[word]
            elif word in numbers:
                n += numbers[word]
            elif word in split_at:
                skip = True
                remaining = ' '.join(words_array[i+1:])
                n+=words_to_number(remaining)
            else:
                try:
                    n += float(word)
                except ValueError as e:
                    raise ValueError(f"Invalid word {word}") from e
    return n

Tests:

print(words_to_number("a million and one"))
>> 1000001

print(words_to_number("one crore and one"))
>> 10000001

print(words_to_number("0.5 million one"))
>> 500001.0

print(words_to_number("half million and one hundred"))
>> 500100.0

print(words_to_number("quarter"))
>> 0.25

print(words_to_number("one hundred plus one"))
>> 101

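Comments

0 votes Hemant Hegde 6/28/2021
I did some more tests: "seventeen hundred" = 1700 and "one thousand and seven hundred" = 1700, but "one thousand seven hundred" = (one thousand seven) hundred = 1007*100 = 100700. Is it technically wrong to say "one thousand seven hundred" instead of "one thousand and seven hundred"?!
-1 votes user20549697 11/20/2022 #17

I found a quicker way: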

Da_Unità_a_Cifre = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11,
 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}

Da_Lettere_a_Decine = {"tw": 20, "th": 30, "fo": 40, "fi": 50, "si": 60, "se": 70, "ei": 80, "ni": 90, }

elemento = input("insert the word:")
Val_Num = 0
try:
    # str methods return new strings, so the result must be assigned back
    elemento = elemento.lower().strip()
    Unità = elemento[elemento.find("ty")+2:]  # the units part after "ty", e.g. "five"

    if elemento[-1] == "y":
        Val_Num = int(Da_Lettere_a_Decine[elemento[0] + elemento[1]])
        print(Val_Num)
    elif elemento == "onehundred":
        Val_Num = 100
        print(Val_Num)
    else:
        Cifre_Unità = int(Da_Unità_a_Cifre[Unità])
        Cifre_Decine = int(Da_Lettere_a_Decine[elemento[0] + elemento[1]])
        Val_Num = int(Cifre_Decine + Cifre_Unità)
        print(Val_Num)
except:
    print("invalid input")
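1 vote Harshit Dalal 10/16/2023 #18

I was looking for a library that would support all the cases above plus more edge cases such as ordinal numbers (first, second), bigger numbers, operators, and so on, and I found this one: numwords_to_nums

You can install it via

pip install numwords_to_nums

Below is a basic example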

from numwords_to_nums.numwords_to_nums import NumWordsToNum
num = NumWordsToNum()
   
result = num.numerical_words_to_numbers("twenty ten and twenty one")
print(result)  # Output: 2010 and 21
   
eval_result = num.evaluate('Hey calculate 2+5')
print(eval_result) # Output: 7

result = num.numerical_words_to_numbers('first')
print(result) # Output: 1st
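0 votes cruz0e 11/15/2023 #19

This is a cool solution, so I took @recursive's Python code from their answer and converted it to C# with the help of ChatGPT, then simplified it, formatted it, and made it more compact.

Yes, I had to give ChatGPT a whole bunch of instructions. It took me a while, but here it is.

I believe this version is clearer and makes it easier to understand how the algorithm works: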

using System.Collections.Generic;

public class Parser
{
    public static int ParseInt(string s)
    {
        Dictionary<string, (int scale, int increment)> numwords = new Dictionary<string, (int, int)>
        {
            {"and", (1, 0)}, {"zero", (1, 0)}, {"one", (1, 1)}, {"two", (1, 2)}, {"three", (1, 3)},
            {"four", (1, 4)}, {"five", (1, 5)}, {"six", (1, 6)}, {"seven", (1, 7)}, {"eight", (1, 8)},
            {"nine", (1, 9)}, {"ten", (1, 10)}, {"eleven", (1, 11)}, {"twelve", (1, 12)}, {"thirteen", (1, 13)},
            {"fourteen", (1, 14)}, {"fifteen", (1, 15)}, {"sixteen", (1, 16)}, {"seventeen", (1, 17)}, {"eighteen", (1, 18)},
            {"nineteen", (1, 19)}, {"twenty", (1, 20)}, {"thirty", (1, 30)}, {"forty", (1, 40)}, {"fifty", (1, 50)},
            {"sixty", (1, 60)}, {"seventy", (1, 70)}, {"eighty", (1, 80)}, {"ninety", (1, 90)}, {"hundred", (100, 0)},
            {"thousand", (1000, 0)}, {"million", (1000000, 0)}, {"billion", (1000000000, 0)}
        };

        int current = 0;
        int result = 0;

        foreach (string word in s.Replace("-", " ").Split())
        {
            var (scale, increment) = numwords[word];

            current = current * scale + increment;

            if (scale > 100)
            {
                result += current;
                current = 0;
            }
        }

        return result + current;
    }
}