Asked by: gmohor21 | Asked: 8/26/2023 | Last edited by: gmohor21 | Updated: 8/26/2023 | Views: 33
Getting RuntimeError: generator raised StopIteration while generating bigrams using nltk module
Q:
I am trying to generate bigrams using nltk.ngrams, but I get the error below. How can I fix this error for my specific problem?
RuntimeError: generator raised StopIteration
My dataframe df has multiple columns, of which only two interest me: FonctionsStagiaire and ExigencesParticulieres. Before generating the unigrams and bigrams, these two columns look like this, as tokens:
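FonctionsStagiaire | ExigencesParticulieres |
---|---|
[fall, 2022, human, resources, training, inter... | [required, skills, what, you, need, to, ... |
[specifically, looking, software, engineer, r... | [for, game, mechanics, core, engine, tools, li... |
[what, do, you, value, in, a, career, at, ... | [bachelor, degree, student, electrical, engineering... |
[your, contribution, reporting, reliability, s... | [fluent, english, spoken, skills, ability, swim... |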
Output of df[['FonctionsStagiaire', 'ExigencesParticulieres']].head(4).to_dict():
{'FonctionsStagiaire': {21: ['fall',
'2022',
'human',
'resources',
'training',
'internship',
'remote',
'mea01229',
'what',
'do',
'you',
'value',
'in',
'a',
'career',
'at',
'agnico',
'eagle',
'values',
'never',
'waver',
'we',
'believe',
'trust',
'respect',
'equality',
'family',
'responsibility',
'why',
'because',
'express',
'helped',
'us',
'succeed',
'business',
'60',
'years',
'about',
'meadowbank',
'our',
'nunavut',
'operations',
'agnico',
'eagle',
'always',
'looking',
'new',
'talented',
'team',
'members',
'join',
'nunavut',
'mining',
'operations',
'we',
'operating',
'meadowbank',
'first',
'low',
'arctic',
'mine',
'near',
'baker',
'lake',
'nine',
'years',
'the',
'mine',
'produced',
'three',
'millionth',
'ounce',
'gold',
'2018',
'2019',
'marked',
'last',
'year',
'production',
'meadowbank',
'mine',
'since',
'transitioned',
'process',
'ore',
'amaruq',
'satellite',
'deposit',
'with',
'official',
'opening',
'amaruq',
'whale',
'tail',
'project',
'august',
'2019',
'project',
'referred',
'meadowbank',
'complex',
'your',
'contribution',
'reporting',
'training',
'coordinator',
'training',
'intern',
'part',
'people',
'development',
'department',
'collaborates',
'departments',
'mine',
'shehe',
'ensure',
'goals',
'objectives',
'achieved',
'promoting',
'respecting',
'agnico',
"eagle's",
'culture',
'health',
'safety',
'code',
'conduct',
'environment',
'coordinate',
'manage',
'projects',
'related',
'training',
'northern',
'mining',
'environment',
'develop',
'modify',
'training',
'content',
'assist',
'planning',
'tracking',
'training',
'activities',
'develop',
'maintain',
'effective',
'training',
'materials',
'assist',
'management',
'elearning',
'training',
'platform',
'participate',
'creation',
'training',
'practices',
'procedures',
'your',
'work',
'schedule',
'schedule',
'14',
'days',
'work',
'12',
'hour',
'shifts',
'followed',
'14',
'days',
'transportation',
'rest',
'flights',
'departing',
'communities',
'kivalliq',
'region',
'mirabel',
"vald'or",
'quebec',
'travel',
'room',
'board',
'provided',
'agnico',
'eagle',
'to',
'apply',
'position',
'please',
'use',
'following',
'url',
'httpsars2equestcomresponse_id1cf099f2f50a501d123d332ae1931084'],
22: ['fall',
'2022',
'mechanical',
'engineering',
'mobile',
'maintenance',
'planner',
'intern',
'remote',
'mea01233',
'your',
'contribution',
'reporting',
'reliability',
'specialist',
'part',
'maintenance',
'department',
'collaborates',
'departments',
'mine',
'heshe',
'ensure',
'goals',
'objectives',
'achieved',
'promoting',
'respecting',
'agnico',
"eagle's",
'culture',
'health',
'safety',
'code',
'conduct',
'environment',
'your',
'task',
'be',
'optimize',
'implement',
'preventive',
'maintenance',
'plans',
'monitor',
'oil',
'samples',
'fleet',
'schedule',
'preventive',
'replacement',
'benchmarked',
'components',
'update',
'prediction',
'report',
'be',
'responsible',
'various',
'optimization',
'projects',
'maintenance',
'department',
'monitor',
'23',
'new',
'long',
'haul',
'trucks',
'mine',
'acquired',
'support',
'reliability',
'specialist',
'different',
'tasks',
'your',
'work',
'schedule',
'schedule',
'14',
'days',
'work',
'followed',
'14',
'days',
'transportation',
'rest',
'flights',
'departing',
'communities',
'kivalliq',
'region',
'mirabel',
"vald'or",
'quebec',
'to',
'apply',
'position',
'please',
'use',
'following',
'url',
'httpsars2equestcomresponse_ide5baa1f06dae8d5dcf05e2b228680085'],
23: ['fall',
'2022',
'mine',
'engineering',
'drill',
'blast',
'internship',
'remote',
'mea01235',
'your',
'contribution',
'reporting',
'production',
'engineering',
'coordinator',
'production',
'engineering',
'intern',
'part',
'engineering',
'department',
'collaborates',
'departments',
'mine',
'shehe',
'ensure',
'goals',
'objectives',
'achieved',
'promoting',
'respecting',
'agnico',
"eagle's",
'culture',
'health',
'safety',
'code',
'conduct',
'environment',
'there',
'two',
'primary',
'tasks',
'engineering',
'intern',
'the',
'first',
'task',
'position',
'fulfilling',
'quality',
'assurancequality',
'control',
'qaqc',
'duties',
'drill',
'blast',
'by',
'performing',
'qaqc',
'drill',
'blast',
'patterns',
'field',
'henceforth',
'referred',
'qaqc',
'drilling',
'loading',
'collecting',
'compiling',
'qaqc',
'data',
'communicating',
'qaqc',
'data',
'engineering',
'team',
'the',
'second',
'task',
'position',
'performing',
'fragmentation',
'analysis',
'drill',
'blast',
'engineers',
'by',
'taking',
'pictures',
'muck',
'faces',
'field',
'performing',
'split',
'desktop',
'analysis',
'pictures',
'communicating',
'fragmentation',
'results',
'engineering',
'team',
'primary',
'duties',
'tracking',
'progression',
'drilling',
'mucking',
'morning',
'meeting',
'ensure',
'priorities',
'qaqc',
'fragmentation',
'analysis',
'met',
'quality',
'assurancequality',
'control',
'drilling',
'loading',
'practices',
'field',
'fragmentation',
'analysis',
'active',
'mucking',
'faces',
'promote',
'health',
'safety',
'participating',
'monthly',
'departmental',
'hs',
'meeting',
'secondary',
'duties',
'provide',
'technical',
'support',
'drill',
'blast',
'engineer',
'blast',
'optimization',
'project',
'proposals',
'or',
'needsinterests',
'floor',
'analysis',
'muck',
'floor',
'water',
'presence',
'drill',
'patterns',
'loading',
'statistics',
'powder',
'factor',
'analysis',
'provide',
'relief',
'support',
'mine',
'clerk',
'vacations',
'special',
'projects',
'according',
'needs',
'engineering',
'mine',
'department',
'providing',
'relief',
'support',
'engineering',
'team',
'vacations',
'drill',
'pattern',
'design',
'blast',
'timing',
'design',
'your',
'work',
'schedule',
'schedule',
'14',
'days',
'work',
'followed',
'14',
'days',
'transportation',
'rest',
'flights',
'departing',
'communities',
'kivalliq',
'region',
'mirabel',
"vald'or",
'quebec',
'to',
'apply',
'position',
'please',
'use',
'following',
'url',
'httpsars2equestcomresponse_id0fc30561bcc0715fe84e7690663f1bc8'],
24: ['what',
'do',
'you',
'value',
'in',
'a',
'career',
'at',
'agnico',
'eagle',
'values',
'never',
'waver',
'we',
'believe',
'trust',
'respect',
'equality',
'family',
'responsibility',
'why',
'because',
'express',
'helped',
'us',
'succeed',
'business',
'60',
'years',
'about',
'meadowbank',
'our',
'nunavut',
'operations',
'agnico',
'eagle',
'always',
'looking',
'new',
'talented',
'team',
'members',
'join',
'nunavut',
'mining',
'operations',
'we',
'operating',
'meadowbank',
'first',
'low',
'arctic',
'mine',
'near',
'baker',
'lake',
'nine',
'years',
'the',
'mine',
'produced',
'three',
'millionth',
'ounce',
'gold',
'2018',
'2019',
'marked',
'last',
'year',
'production',
'meadowbank',
'mine',
'since',
'transitioned',
'process',
'ore',
'amaruq',
'satellite',
'deposit',
'with',
'official',
'opening',
'amaruq',
'whale',
'tail',
'project',
'august',
'2019',
'project',
'referred',
'meadowbank',
'complex',
'your',
'contribution',
'reporting',
'senior',
'grade',
'control',
'technician',
'geology',
'intern',
'part',
'mine',
'geology',
'department',
'collaborates',
'departments',
'mine',
'shehe',
'ensure',
'goals',
'objectives',
'achieved',
'promoting',
'respecting',
'agnico',
"eagle's",
'culture',
'health',
'safety',
'code',
'conduct',
'environment',
'drill',
'blast',
'excavation',
'monitoring',
'regards',
'mine',
'geology',
'grade',
'control',
'standards',
'qaqc',
'field',
'regular',
'audits',
'quality',
'sampling',
'layout',
'ore',
'packets',
'blasted',
'muck',
'define',
'different',
'ore',
'zones',
'mining',
'daily',
'sample',
'collection',
'mine',
'shipment',
'lab',
'monitoring',
'any',
'tasks',
'senior',
'grade',
'control',
'andor',
'production',
'geologist',
'might',
'identify',
'position',
'approximately',
'90',
'field',
'work',
'10',
'office',
'work',
'your',
'work',
'schedule',
'schedule',
'14',
'days',
'work',
'followed',
'14',
'days',
'transportation',
'rest',
'flights',
'departing',
'communities',
'kivalliq',
'region',
'mirabel',
"vald'or",
'quebec',
'travel',
'room',
'board',
'provided',
'agnico',
'eagle',
'to',
'apply',
'position',
'please',
'use',
'following',
'url',
'httpsars2equestcomresponse_idb79a50f3edc987d26ffdf6568a7c1604']},
'ExigencesParticulieres': {21: ['required',
'skills',
'what',
'you',
'need',
'to',
'succeed',
'enrolled',
'graduated',
"bachelor's",
'degree',
'human',
'resources',
'administration',
'management',
'industrial',
'relations',
'related',
'field',
'mining',
'experience',
'asset',
'strong',
'sense',
'organization',
'quick',
'learner',
'experience',
'working',
'multicultural',
'environment',
'asset',
'excellent',
'communication',
'skills',
'english',
'written',
'spoken',
'must',
'strong',
'interpersonal',
'communication',
'team',
'building',
'skills',
'strong',
'computer',
'skills',
'including',
'use',
'word',
'excel',
'powerpoint'],
22: ['required',
'skills',
'what',
'you',
'need',
'to',
'succeed',
'being',
'autonomous',
'proactive',
'valuable',
'must',
'quick',
'learner',
'organizational',
'skills',
'required',
'enrolled',
'graduated',
"bachelor's",
'degree',
'mechanical',
'mining',
'engineering',
'related',
'field',
'mining',
'experience',
'asset',
'underground',
'experience',
'asset',
'experience',
'working',
'multicultural',
'environment',
'asset',
'excellent',
'communication',
'skills',
'english',
'written',
'spoken',
'must',
'strong',
'interpersonal',
'communication',
'team',
'building',
'skills'],
23: ['required',
'skills',
'what',
'you',
'need',
'to',
'succeed',
'valid',
"driver's",
'license',
'enrolled',
'graduated',
"bachelor's",
'degree',
'mining',
'engineering',
'related',
'field',
'mining',
'experience',
'asset',
'experience',
'working',
'multicultural',
'environment',
'asset',
'excellent',
'communication',
'skills',
'english',
'written',
'spoken',
'must',
'strong',
'interpersonal',
'communication',
'team',
'building',
'skills'],
24: ['required',
'skills',
'what',
'you',
'need',
'to',
'succeed',
'enrolled',
'graduated',
"bachelor's",
'degree',
'kind',
'geosciences',
'related',
'fields',
'mining',
'experience',
'asset',
'experience',
'working',
'multicultural',
'environment',
'asset',
'excellent',
'communication',
'skills',
'english',
'written',
'spoken',
'must',
'strong',
'interpersonal',
'communication',
'team',
'building',
'skills']}}
The code for generating the unigrams and bigrams and their counts
import nltk  # needed because the calls below use nltk.ngrams / nltk.FreqDist
from nltk.util import ngrams
from nltk import FreqDist
from collections import Counter

col_list = ['FonctionsStagiaire', 'ExigencesParticulieres']
for col in col_list:
    df[col+'_unigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 1)))
    #try:
    df[col+'_bigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 2)))
    #except RuntimeError:
    #    for i in df.index:
    #        print(df.index)
    df[col+'_unigrams_freq_dist'] = df[col+'_unigrams'].apply(lambda row: list(nltk.FreqDist(row)))
    df[col+'_bigrams_freq_dist'] = df[col+'_bigrams'].apply(lambda row: list(nltk.FreqDist(row)))
    df[col+'_unigrams_counts'] = df[col+'_unigrams_freq_dist'].apply(lambda row: list(Counter(row).most_common()))
    df[col+'_bigrams_counts'] = df[col+'_bigrams_freq_dist'].apply(lambda row: list(Counter(row).most_common()))
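A side note on the counting step, separate from the error itself: FreqDist is a subclass of collections.Counter, so list(FreqDist(row)) keeps only the distinct keys and discards the counts. Feeding that list back into Counter then counts every key exactly once. The sketch below illustrates this with Counter alone standing in for FreqDist; the token list is a made-up sample, not taken from the dataframe.

```python
from collections import Counter

# Hypothetical sample tokens (illustration only).
tokens = ["training", "content", "assist", "training", "content"]
bigrams = list(zip(tokens, tokens[1:]))

# Converting a Counter (or FreqDist) to a list keeps the keys and drops the counts.
keys_only = list(Counter(bigrams))

# Counting the de-duplicated keys again gives 1 for every bigram.
recounted = Counter(keys_only).most_common()

# Counting the bigrams directly preserves the real frequencies.
direct = Counter(bigrams).most_common()

print(recounted)
print(direct)
```

If real frequencies are wanted, calling most_common() on the FreqDist (or Counter) built straight from the n-gram list would likely be enough.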
The more detailed error:
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/nltk/util.py:468, in ngrams(sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol)
467 while n > 1:
--> 468 history.append(next(sequence))
469 n -= 1
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[100], line 12
10 df[col+'_unigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 1)))
11 #try:
---> 12 df[col+'_bigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 2)))
13 #except RuntimeError:
14 # for i in df.index:
15 # print(df.index)
16 df[col+'_unigrams_freq_dist'] = df[col+'_unigrams'].apply(lambda row: list(nltk.FreqDist(row)))
File /opt/conda/lib/python3.10/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
4661 def apply(
4662 self,
4663 func: AggFuncType,
(...)
4666 **kwargs,
4667 ) -> DataFrame | Series:
4668 """
4669 Invoke function on values of Series.
4670
(...)
4769 dtype: float64
4770 """
-> 4771 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File /opt/conda/lib/python3.10/site-packages/pandas/core/apply.py:1123, in SeriesApply.apply(self)
1120 return self.apply_str()
1122 # self.f is Callable
-> 1123 return self.apply_standard()
File /opt/conda/lib/python3.10/site-packages/pandas/core/apply.py:1174, in SeriesApply.apply_standard(self)
1172 else:
1173 values = obj.astype(object)._values
-> 1174 mapped = lib.map_infer(
1175 values,
1176 f,
1177 convert=self.convert_dtype,
1178 )
1180 if len(mapped) and isinstance(mapped[0], ABCSeries):
1181 # GH#43986 Need to do list(mapped) in order to get treated as nested
1182 # See also GH#25959 regarding EA support
1183 return obj._constructor_expanddim(list(mapped), index=obj.index)
File /opt/conda/lib/python3.10/site-packages/pandas/_libs/lib.pyx:2924, in pandas._libs.lib.map_infer()
Cell In[100], line 12, in <lambda>(row)
10 df[col+'_unigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 1)))
11 #try:
---> 12 df[col+'_bigrams'] = df[col].apply(lambda row: list(nltk.ngrams(row, 2)))
13 #except RuntimeError:
14 # for i in df.index:
15 # print(df.index)
16 df[col+'_unigrams_freq_dist'] = df[col+'_unigrams'].apply(lambda row: list(nltk.FreqDist(row)))
RuntimeError: generator raised StopIteration
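The traceback is consistent with PEP 479 behaviour (Python 3.7 and later): a StopIteration that escapes inside a generator body is converted into RuntimeError. In the NLTK version shown, ngrams calls next(sequence) at util.py:468 before its main loop, so with n=2 an input that yields no items at all, most likely an empty token list in at least one row, would take this path. The unigram line is unaffected because with n=1 that next() call is never made. The sketch below reproduces the mechanism with a hypothetical stand-in generator (first_then_rest); it is not NLTK's code.

```python
def first_then_rest(iterable):
    """Hypothetical stand-in: calls next() inside a generator, as in the traceback."""
    it = iter(iterable)
    first = next(it)  # raises StopIteration when `it` is already exhausted
    for item in it:
        yield (first, item)
        first = item


def try_pairs(tokens):
    """Return the pairs, or a description of the RuntimeError that was raised."""
    try:
        return list(first_then_rest(tokens))
    except RuntimeError as exc:
        return type(exc).__name__ + ": " + str(exc)


print(try_pairs(["fall", "2022", "human"]))
print(try_pairs([]))
```

Under this reading, the other datasets would have worked simply because none of their rows was empty.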
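I can't understand why this error is thrown. I have run the same code on other datasets as well, and it worked fine on each of them.
Any help would be greatly appreciated.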
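One possible workaround, sketched under the assumption that some rows contain fewer tokens than the n-gram size: locate those rows, and build the n-grams with a helper that simply returns an empty list for short inputs. The names safe_ngrams and short_rows and the sample rows are hypothetical, and the helper uses plain zip rather than nltk.ngrams.

```python
def safe_ngrams(tokens, n):
    """Return the n-grams of `tokens` as tuples; an empty list if there are fewer than n tokens."""
    tokens = list(tokens)
    # zip stops at the shortest slice, so short inputs yield no n-grams instead of raising.
    return list(zip(*(tokens[i:] for i in range(n))))


def short_rows(rows, n):
    """Return the keys of rows whose token list has fewer than n tokens."""
    return [key for key, tokens in rows.items() if len(tokens) < n]


# Hypothetical rows keyed by index (illustration only, not the real dataframe).
rows = {0: ["fall", "2022", "human"], 1: [], 2: ["required"]}

print(short_rows(rows, 2))
print({key: safe_ngrams(tokens, 2) for key, tokens in rows.items()})
```

With pandas, the same idea could be applied as df[col].apply(lambda row: safe_ngrams(row, 2)), and df[col].str.len() < 2 should flag the short rows. Upgrading NLTK may also be worth trying, since later releases are reported to handle short sequences without raising; this has not been checked against the exact version in the traceback.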
A: No answers yet