Asked by: drpawelo · Asked: 10/10/2023 · Updated: 10/14/2023 · Views: 69
Python NLTK text dispersion plot has its y (vertical) axis in backwards / reversed order
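Q:
Since last month, NLTK's dispersion_plot seems to order the y (vertical) axis in reverse on my machine. This may be related to my software versions (I am on a school virtual machine).
Versions: NLTK 3.8.1, matplotlib 3.7.2, Python 3.9.13
Code: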
from nltk.draw.dispersion import dispersion_plot
words=['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets=['aa','bbb', 'f', 'cccc']
dispersion_plot(words, targets)
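Expected: aa at the start, cccc at the end. Actual: it is backwards! Also note that f should have no marks at all; instead, bbb has none.
Conclusion: the y axis is backwards.
A:
I found the source code of nltk.draw.dispersion, and it seems to contain a bug.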
def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.

    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e

    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax


if __name__ == "__main__":
    import matplotlib.pyplot as plt
    from nltk.corpus import gutenberg

    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()
It builds word2y using reversed(words):
for y, word in enumerate(reversed(words))
but later it calls ax.set_yticks() with words, when it should use reversed(words):
ax.set_yticks(list(range(len(words))), words, color="C0")
(or it should compute word2y without using reversed()).
I marked these places in the code above with # <--- HERE.
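To make the mismatch concrete, here is a minimal sketch in plain Python that mirrors the two marked lines, using the targets from the question. The names buggy_labels and fixed_labels are only illustrative and do not appear in NLTK.

```python
# Targets from the question.
targets = ['aa', 'bbb', 'f', 'cccc']

# How the library assigns a y position to each target (reversed order).
word2y = {word: y for y, word in enumerate(reversed(targets))}

# How the library labels the ticks: position i gets targets[i] (not reversed).
buggy_labels = dict(enumerate(targets))

# Labels that would agree with word2y: position i gets reversed(targets)[i].
fixed_labels = dict(enumerate(reversed(targets)))

for word, y in word2y.items():
    print(f"{word!r} is drawn at y={y}, "
          f"labelled {buggy_labels[y]!r}, should be {fixed_labels[y]!r}")
```

With these targets, the marks for bbb land on the row labelled f, and the empty f row is labelled bbb, which matches what the question describes.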
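This should probably be reported as an issue.
For now, you can take the returned ax and call set_yticks with reversed labels to correct it. In your code the list is targets rather than words: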
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
Full working code:
import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot
words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
plt.show()
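EDIT: It seems someone reported this problem a few months ago, and reversed() has been added to the code on GitHub, so it may work in the next release.
Dispersion plot not working properly · Issue #3133 · nltk/nltk
Dispersion plot not working properly by Apros7 · Pull Request #3134 · nltk/nltk
Building on @furas's answer ❤️, I went one step further and added an if condition that reverses the y tick labels only when they are actually broken/backwards. This means the code will still work once the library bug is fixed (a fix is on the way).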
import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot
targets = ['a', 'b']
filtered_text = ["a", "a", "b"]
my_plot = dispersion_plot(filtered_text, targets, ignore_case=True)
# THIS IS NEW: if the tick labels are wrong, fix them (reverse them).
# reversed() returns an iterator, so convert it to a list before comparing.
expected = list(reversed(targets))
if [label.get_text() for label in my_plot.get_yticklabels()] != expected:
    my_plot.set_yticks(list(range(len(targets))), expected)
plt.show()
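The same check can be wrapped in a small reusable function. This is only a sketch: fix_dispersion_labels is a hypothetical helper name, not part of NLTK, and it assumes the library keeps drawing the last target on row 0 (that is, the upstream fix reverses the labels rather than the positions, as the edit above suggests).

```python
def fix_dispersion_labels(ax, targets):
    """Relabel the y ticks only if they disagree with the plotted rows.

    ax is assumed to be the Axes returned by dispersion_plot.
    Returns True if the labels were changed, False otherwise.
    """
    # Assumption: row 0 holds the last target, as in NLTK 3.8.1.
    expected = list(reversed(targets))
    current = [label.get_text() for label in ax.get_yticklabels()]
    if current == expected:
        return False  # labels already match, nothing to do
    ax.set_yticks(list(range(len(targets))), expected)
    return True
```

It would be called as fix_dispersion_labels(my_plot, targets) before plt.show().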