Getting RuntimeError: generator raised StopIteration while generating bigrams using nltk module

Asked by: gmohor21 Asked: 8/26/2023 Last edited by: gmohor21 Updated: 8/26/2023 Views: 33

Q:

I am trying to generate bigrams using nltk.ngrams, but I get the error RuntimeError: generator raised StopIteration. How can I fix this error for my specific problem?

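My dataframe df has multiple columns, of which only two are of interest to me, namely FonctionsStagiaire and ExigencesParticulieres. Before generating the unigrams and bigrams, these two columns look like this as tokens.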

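FonctionsStagiaire                                  ExigencesParticulieres
[fall, 2022, human, resources, training, inter...   [required, skills, what, you, need, to, s...
[specifically, looking, software, engineer, r...    [for, game, mechanics, core, engine, tools, li...
[what, do, you, value, in, a, career, at, ...       [bachelor, degree, student, electrical, engineering...
[your, contribution, reporting, reliability, s...   [fluent, english, spoken, skills, ability, swim...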

Output of df[['FonctionsStagiaire', 'ExigencesParticulieres']].head(4).to_dict()

{'FonctionsStagiaire': {21: ['fall',
   '2022',
   'human',
   'resources',
   'training',
   'internship',
   'remote',
   'mea01229',
   'what',
   'do',
   'you',
   'value',
   'in',
   'a',
   'career',
   'at',
   'agnico',
   'eagle',
   'values',
   'never',
   'waver',
   'we',
   'believe',
   'trust',
   'respect',
   'equality',
   'family',
   'responsibility',
   'why',
   'because',
   'express',
   'helped',
   'us',
   'succeed',
   'business',
   '60',
   'years',
   'about',
   'meadowbank',
   'our',
   'nunavut',
   'operations',
   'agnico',
   'eagle',
   'always',
   'looking',
   'new',
   'talented',
   'team',
   'members',
   'join',
   'nunavut',
   'mining',
   'operations',
   'we',
   'operating',
   'meadowbank',
   'first',
   'low',
   'arctic',
   'mine',
   'near',
   'baker',
   'lake',
   'nine',
   'years',
   'the',
   'mine',
   'produced',
   'three',
   'millionth',
   'ounce',
   'gold',
   '2018',
   '2019',
   'marked',
   'last',
   'year',
   'production',
   'meadowbank',
   'mine',
   'since',
   'transitioned',
   'process',
   'ore',
   'amaruq',
   'satellite',
   'deposit',
   'with',
   'official',
   'opening',
   'amaruq',
   'whale',
   'tail',
   'project',
   'august',
   '2019',
   'project',
   'referred',
   'meadowbank',
   'complex',
   'your',
   'contribution',
   'reporting',
   'training',
   'coordinator',
   'training',
   'intern',
   'part',
   'people',
   'development',
   'department',
   'collaborates',
   'departments',
   'mine',
   'shehe',
   'ensure',
   'goals',
   'objectives',
   'achieved',
   'promoting',
   'respecting',
   'agnico',
   "eagle's",
   'culture',
   'health',
   'safety',
   'code',
   'conduct',
   'environment',
   'coordinate',
   'manage',
   'projects',
   'related',
   'training',
   'northern',
   'mining',
   'environment',
   'develop',
   'modify',
   'training',
   'content',
   'assist',
   'planning',
   'tracking',
   'training',
   'activities',
   'develop',
   'maintain',
   'effective',
   'training',
   'materials',
   'assist',
   'management',
   'elearning',
   'training',
   'platform',
   'participate',
   'creation',
   'training',
   'practices',
   'procedures',
   'your',
   'work',
   'schedule',
   'schedule',
   '14',
   'days',
   'work',
   '12',
   'hour',
   'shifts',
   'followed',
   '14',
   'days',
   'transportation',
   'rest',
   'flights',
   'departing',
   'communities',
   'kivalliq',
   'region',
   'mirabel',
   "vald'or",
   'quebec',
   'travel',
   'room',
   'board',
   'provided',
   'agnico',
   'eagle',
   'to',
   'apply',
   'position',
   'please',
   'use',
   'following',
   'url',
   'httpsars2equestcomresponse_id1cf099f2f50a501d123d332ae1931084'],
  22: ['fall',
   '2022',
   'mechanical',
   'engineering',
   'mobile',
   'maintenance',
   'planner',
   'intern',
   'remote',
   'mea01233',
   'your',
   'contribution',
   'reporting',
   'reliability',
   'specialist',
   'part',
   'maintenance',
   'department',
   'collaborates',
   'departments',
   'mine',
   'heshe',
   'ensure',
   'goals',
   'objectives',
   'achieved',
   'promoting',
   'respecting',
   'agnico',
   "eagle's",
   'culture',
   'health',
   'safety',
   'code',
   'conduct',
   'environment',
   'your',
   'task',
   'be',
   'optimize',
   'implement',
   'preventive',
   'maintenance',
   'plans',
   'monitor',
   'oil',
   'samples',
   'fleet',
   'schedule',
   'preventive',
   'replacement',
   'benchmarked',
   'components',
   'update',
   'prediction',
   'report',
   'be',
   'responsible',
   'various',
   'optimization',
   'projects',
   'maintenance',
   'department',
   'monitor',
   '23',
   'new',
   'long',
   'haul',
   'trucks',
   'mine',
   'acquired',
   'support',
   'reliability',
   'specialist',
   'different',
   'tasks',
   'your',
   'work',
   'schedule',
   'schedule',
   '14',
   'days',
   'work',
   'followed',
   '14',
   'days',
   'transportation',
   'rest',
   'flights',
   'departing',
   'communities',
   'kivalliq',
   'region',
   'mirabel',
   "vald'or",
   'quebec',
   'to',
   'apply',
   'position',
   'please',
   'use',
   'following',
   'url',
   'httpsars2equestcomresponse_ide5baa1f06dae8d5dcf05e2b228680085'],
  23: ['fall',
   '2022',
   'mine',
   'engineering',
   'drill',
   'blast',
   'internship',
   'remote',
   'mea01235',
   'your',
   'contribution',
   'reporting',
   'production',
   'engineering',
   'coordinator',
   'production',
   'engineering',
   'intern',
   'part',
   'engineering',
   'department',
   'collaborates',
   'departments',
   'mine',
   'shehe',
   'ensure',
   'goals',
   'objectives',
   'achieved',
   'promoting',
   'respecting',
   'agnico',
   "eagle's",
   'culture',
   'health',
   'safety',
   'code',
   'conduct',
   'environment',
   'there',
   'two',
   'primary',
   'tasks',
   'engineering',
   'intern',
   'the',
   'first',
   'task',
   'position',
   'fulfilling',
   'quality',
   'assurancequality',
   'control',
   'qaqc',
   'duties',
   'drill',
   'blast',
   'by',
   'performing',
   'qaqc',
   'drill',
   'blast',
   'patterns',
   'field',
   'henceforth',
   'referred',
   'qaqc',
   'drilling',
   'loading',
   'collecting',
   'compiling',
   'qaqc',
   'data',
   'communicating',
   'qaqc',
   'data',
   'engineering',
   'team',
   'the',
   'second',
   'task',
   'position',
   'performing',
   'fragmentation',
   'analysis',
   'drill',
   'blast',
   'engineers',
   'by',
   'taking',
   'pictures',
   'muck',
   'faces',
   'field',
   'performing',
   'split',
   'desktop',
   'analysis',
   'pictures',
   'communicating',
   'fragmentation',
   'results',
   'engineering',
   'team',
   'primary',
   'duties',
   'tracking',
   'progression',
   'drilling',
   'mucking',
   'morning',
   'meeting',
   'ensure',
   'priorities',
   'qaqc',
   'fragmentation',
   'analysis',
   'met',
   'quality',
   'assurancequality',
   'control',
   'drilling',
   'loading',
   'practices',
   'field',
   'fragmentation',
   'analysis',
   'active',
   'mucking',
   'faces',
   'promote',
   'health',
   'safety',
   'participating',
   'monthly',
   'departmental',
   'hs',
   'meeting',
   'secondary',
   'duties',
   'provide',
   'technical',
   'support',
   'drill',
   'blast',
   'engineer',
   'blast',
   'optimization',
   'project',
   'proposals',
   'or',
   'needsinterests',
   'floor',
   'analysis',
   'muck',
   'floor',
   'water',
   'presence',
   'drill',
   'patterns',
   'loading',
   'statistics',
   'powder',
   'factor',
   'analysis',
   'provide',
   'relief',
   'support',
   'mine',
   'clerk',
   'vacations',
   'special',
   'projects',
   'according',
   'needs',
   'engineering',
   'mine',
   'department',
   'providing',
   'relief',
   'support',
   'engineering',
   'team',
   'vacations',
   'drill',
   'pattern',
   'design',
   'blast',
   'timing',
   'design',
   'your',
   'work',
   'schedule',
   'schedule',
   '14',
   'days',
   'work',
   'followed',
   '14',
   'days',
   'transportation',
   'rest',
   'flights',
   'departing',
   'communities',
   'kivalliq',
   'region',
   'mirabel',
   "vald'or",
   'quebec',
   'to',
   'apply',
   'position',
   'please',
   'use',
   'following',
   'url',
   'httpsars2equestcomresponse_id0fc30561bcc0715fe84e7690663f1bc8'],
  24: ['what',
   'do',
   'you',
   'value',
   'in',
   'a',
   'career',
   'at',
   'agnico',
   'eagle',
   'values',
   'never',
   'waver',
   'we',
   'believe',
   'trust',
   'respect',
   'equality',
   'family',
   'responsibility',
   'why',
   'because',
   'express',
   'helped',
   'us',
   'succeed',
   'business',
   '60',
   'years',
   'about',
   'meadowbank',
   'our',
   'nunavut',
   'operations',
   'agnico',
   'eagle',
   'always',
   'looking',
   'new',
   'talented',
   'team',
   'members',
   'join',
   'nunavut',
   'mining',
   'operations',
   'we',
   'operating',
   'meadowbank',
   'first',
   'low',
   'arctic',
   'mine',
   'near',
   'baker',
   'lake',
   'nine',
   'years',
   'the',
   'mine',
   'produced',
   'three',
   'millionth',
   'ounce',
   'gold',
   '2018',
   '2019',
   'marked',
   'last',
   'year',
   'production',
   'meadowbank',
   'mine',
   'since',
   'transitioned',
   'process',
   'ore',
   'amaruq',
   'satellite',
   'deposit',
   'with',
   'official',
   'opening',
   'amaruq',
   'whale',
   'tail',
   'project',
   'august',
   '2019',
   'project',
   'referred',
   'meadowbank',
   'complex',
   'your',
   'contribution',
   'reporting',
   'senior',
   'grade',
   'control',
   'technician',
   'geology',
   'intern',
   'part',
   'mine',
   'geology',
   'department',
   'collaborates',
   'departments',
   'mine',
   'shehe',
   'ensure',
   'goals',
   'objectives',
   'achieved',
   'promoting',
   'respecting',
   'agnico',
   "eagle's",
   'culture',
   'health',
   'safety',
   'code',
   'conduct',
   'environment',
   'drill',
   'blast',
   'excavation',
   'monitoring',
   'regards',
   'mine',
   'geology',
   'grade',
   'control',
   'standards',
   'qaqc',
   'field',
   'regular',
   'audits',
   'quality',
   'sampling',
   'layout',
   'ore',
   'packets',
   'blasted',
   'muck',
   'define',
   'different',
   'ore',
   'zones',
   'mining',
   'daily',
   'sample',
   'collection',
   'mine',
   'shipment',
   'lab',
   'monitoring',
   'any',
   'tasks',
   'senior',
   'grade',
   'control',
   'andor',
   'production',
   'geologist',
   'might',
   'identify',
   'position',
   'approximately',
   '90',
   'field',
   'work',
   '10',
   'office',
   'work',
   'your',
   'work',
   'schedule',
   'schedule',
   '14',
   'days',
   'work',
   'followed',
   '14',
   'days',
   'transportation',
   'rest',
   'flights',
   'departing',
   'communities',
   'kivalliq',
   'region',
   'mirabel',
   "vald'or",
   'quebec',
   'travel',
   'room',
   'board',
   'provided',
   'agnico',
   'eagle',
   'to',
   'apply',
   'position',
   'please',
   'use',
   'following',
   'url',
   'httpsars2equestcomresponse_idb79a50f3edc987d26ffdf6568a7c1604']},
 'ExigencesParticulieres': {21: ['required',
   'skills',
   'what',
   'you',
   'need',
   'to',
   'succeed',
   'enrolled',
   'graduated',
   "bachelor's",
   'degree',
   'human',
   'resources',
   'administration',
   'management',
   'industrial',
   'relations',
   'related',
   'field',
   'mining',
   'experience',
   'asset',
   'strong',
   'sense',
   'organization',
   'quick',
   'learner',
   'experience',
   'working',
   'multicultural',
   'environment',
   'asset',
   'excellent',
   'communication',
   'skills',
   'english',
   'written',
   'spoken',
   'must',
   'strong',
   'interpersonal',
   'communication',
   'team',
   'building',
   'skills',
   'strong',
   'computer',
   'skills',
   'including',
   'use',
   'word',
   'excel',
   'powerpoint'],
  22: ['required',
   'skills',
   'what',
   'you',
   'need',
   'to',
   'succeed',
   'being',
   'autonomous',
   'proactive',
   'valuable',
   'must',
   'quick',
   'learner',
   'organizational',
   'skills',
   'required',
   'enrolled',
   'graduated',
   "bachelor's",
   'degree',
   'mechanical',
   'mining',
   'engineering',
   'related',
   'field',
   'mining',
   'experience',
   'asset',
   'underground',
   'experience',
   'asset',
   'experience',
   'working',
   'multicultural',
   'environment',
   'asset',
   'excellent',
   'communication',
   'skills',
   'english',
   'written',
   'spoken',
   'must',
   'strong',
   'interpersonal',
   'communication',
   'team',
   'building',
   'skills'],
  23: ['required',
   'skills',
   'what',
   'you',
   'need',
   'to',
   'succeed',
   'valid',
   "driver's",
   'license',
   'enrolled',
   'graduated',
   "bachelor's",
   'degree',
   'mining',
   'engineering',
   'related',
   'field',
   'mining',
   'experience',
   'asset',
   'experience',
   'working',
   'multicultural',
   'environment',
   'asset',
   'excellent',
   'communication',
   'skills',
   'english',
   'written',
   'spoken',
   'must',
   'strong',
   'interpersonal',
   'communication',
   'team',
   'building',
   'skills'],
  24: ['required',
   'skills',
   'what',
   'you',
   'need',
   'to',
   'succeed',
   'enrolled',
   'graduated',
   "bachelor's",
   'degree',
   'kind',
   'geosciences',
   'related',
   'fields',
   'mining',
   'experience',
   'asset',
   'experience',
   'working',
   'multicultural',
   'environment',
   'asset',
   'excellent',
   'communication',
   'skills',
   'english',
   'written',
   'spoken',
   'must',
   'strong',
   'interpersonal',
   'communication',
   'team',
   'building',
   'skills']}}

The code for generating the unigrams and bigrams and their counts

import nltk  # the calls below use nltk.ngrams and nltk.FreqDist
from nltk import FreqDist
from collections import Counter

col_list = ['FonctionsStagiaire', 'ExigencesParticulieres']


for col in col_list:
    
    df[col+'_unigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 1))) 
    #try:
    df[col+'_bigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 2)))
    #except RuntimeError:
    #    for i in df.index:
    #            print(df.index)
    df[col+'_unigrams_freq_dist'] = df[col+'_unigrams'].apply(lambda row: list(nltk.FreqDist(row)))
    df[col+'_bigrams_freq_dist'] = df[col+'_bigrams'].apply(lambda row: list(nltk.FreqDist(row)))
    df[col+'_unigrams_counts'] = df[col+'_unigrams_freq_dist'].apply(lambda row: list(Counter(row).most_common()))
    df[col+'_bigrams_counts'] = df[col+'_bigrams_freq_dist'].apply(lambda row: list(Counter(row).most_common()))
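
A side note on the counting step, offered as an observation rather than as the cause of the error: nltk.FreqDist is a subclass of collections.Counter, and calling list() on a counter keeps only its keys, so the _counts columns above would likely report every n-gram exactly once. The sketch below uses Counter directly on made-up tokens (not rows from df) to show the difference.

```python
from collections import Counter

# Made-up tokens for illustration only.
tokens = ["mine", "drill", "blast", "drill", "blast"]
bigrams = list(zip(tokens, tokens[1:]))

# Counting the bigrams directly keeps their frequencies.
direct = Counter(bigrams).most_common()

# Converting a counter to a list keeps only its distinct keys, so
# re-counting that list reports every bigram exactly once.
recounted = Counter(list(Counter(bigrams))).most_common()

print(direct)
print(recounted)
```

If the frequencies are what is wanted, applying FreqDist (or Counter) once to the list of n-grams and calling most_common() on the result should be enough.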

The error in more detail

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/nltk/util.py:468, in ngrams(sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol)
    467 while n > 1:
--> 468     history.append(next(sequence))
    469     n -= 1

StopIteration: 

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[100], line 12
     10 df[col+'_unigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 1))) 
     11 #try:
---> 12 df[col+'_bigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 2)))
     13 #except RuntimeError:
     14 #    for i in df.index:
     15 #            print(df.index)
     16 df[col+'_unigrams_freq_dist'] = df[col+'_unigrams'].apply(lambda row: list(nltk.FreqDist(row)))

File /opt/conda/lib/python3.10/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4661 def apply(
   4662     self,
   4663     func: AggFuncType,
   (...)
   4666     **kwargs,
   4667 ) -> DataFrame | Series:
   4668     """
   4669     Invoke function on values of Series.
   4670 
   (...)
   4769     dtype: float64
   4770     """
-> 4771     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File /opt/conda/lib/python3.10/site-packages/pandas/core/apply.py:1123, in SeriesApply.apply(self)
   1120     return self.apply_str()
   1122 # self.f is Callable
-> 1123 return self.apply_standard()

File /opt/conda/lib/python3.10/site-packages/pandas/core/apply.py:1174, in SeriesApply.apply_standard(self)
   1172     else:
   1173         values = obj.astype(object)._values
-> 1174         mapped = lib.map_infer(
   1175             values,
   1176             f,
   1177             convert=self.convert_dtype,
   1178         )
   1180 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1181     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1182     #  See also GH#25959 regarding EA support
   1183     return obj._constructor_expanddim(list(mapped), index=obj.index)

File /opt/conda/lib/python3.10/site-packages/pandas/_libs/lib.pyx:2924, in pandas._libs.lib.map_infer()

Cell In[100], line 12, in <lambda>(row)
     10 df[col+'_unigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 1))) 
     11 #try:
---> 12 df[col+'_bigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 2)))
     13 #except RuntimeError:
     14 #    for i in df.index:
     15 #            print(df.index)
     16 df[col+'_unigrams_freq_dist'] = df[col+'_unigrams'].apply(lambda row: list(nltk.FreqDist(row)))

RuntimeError: generator raised StopIteration
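
The first frame of the traceback points at history.append(next(sequence)) inside nltk.ngrams. One plausible reading, assuming at least one row holds an empty token list, is that next() runs out of items inside a generator; since Python 3.7 (PEP 479), a StopIteration that escapes a generator body is re-raised as RuntimeError. The sketch below imitates that pattern without NLTK. toy_ngrams and try_bigrams are hypothetical stand-ins, not NLTK's actual code.

```python
def toy_ngrams(sequence, n):
    """Hypothetical stand-in that mimics the priming loop shown in the traceback."""
    sequence = iter(sequence)
    history = []
    while n > 1:
        # Raises StopIteration if the input has fewer than n - 1 items.
        history.append(next(sequence))
        n -= 1
    for item in sequence:
        history.append(item)
        yield tuple(history)
        del history[0]


def try_bigrams(tokens):
    """Return the bigrams, or a description of the RuntimeError that was raised."""
    try:
        return list(toy_ngrams(tokens, 2))
    except RuntimeError as exc:
        return type(exc).__name__ + ": " + str(exc)


print(try_bigrams(["fall", "2022", "human"]))
print(try_bigrams([]))
```

In this sketch only the empty list triggers the error for n = 2; a list with a single token just produces no bigrams.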

I can't understand why this error is thrown. I have also run the same code on other datasets, and it worked fine on each of them.
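
Since the same code runs on other datasets, one thing worth checking is whether this dataset contains rows with no tokens at all. The sketch below is a hedged workaround under that assumption, not a confirmed fix: short_rows and safe_ngrams are hypothetical helper names, and plain lists stand in for one tokenized DataFrame column.

```python
def short_rows(rows, n):
    """Return the positions of token lists with fewer than n tokens."""
    return [i for i, tokens in enumerate(rows) if len(tokens) < n]


def safe_ngrams(tokens, n):
    """Build n-grams with zip; a list shorter than n simply yields no n-grams."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Made-up rows standing in for one tokenized column.
rows = [["required", "skills", "what"], [], ["english"]]

print(short_rows(rows, 2))
print([safe_ngrams(tokens, 2) for tokens in rows])
```

With pandas, the same helper could be applied as df[col].apply(lambda row: safe_ngrams(row, 2)), and df[col].apply(len) would show which rows are short.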

Any help would be greatly appreciated.

Python Pandas DataFrame NLP frequency-analysis

Comments

0 upvotes gmohor21 8/26/2023
Yes, added the output.

A: No answers yet