How to use/parse tr11.wc.arff, CSTR.arff data in python for clustering?

Asked by: Uadip  Asked: 11/10/2023  Last edited by: Uadip  Updated: 11/13/2023  Views: 33

Q:

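I am a student and new to this. I found a paper that uses tr11.wc.arff, tr23.arff, CSTR.arff and similar datasets for clustering. (The data can be found here: http://sites.labic.icmc.usp.br/ragero/arffs/) I am trying to use the same datasets for clustering in Python, but I cannot read or parse them, and I only understand the gist of the ARFF format. Sample content of tr11.wc.arff: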

% ARFF format training set
@RELATION tr11.mat

@ATTRIBUTE outfit   integer
@ATTRIBUTE hasn integer
@ATTRIBUTE calm integer
@ATTRIBUTE gene integer
.
.
.

@DATA

{28 1,30 9,33 3,258 1,329 1,346 2,351 2,352 1,353 7,367 2,376 2,379 1,381 1,385 4,387 1,391 1,392 2,404 3,405 1,1162 4,1221 1,1460 1,1462 1,1470 1,1498 4,1499 1,1501 1,1502 1,1505 2,1506 2,1563 1,1576 1,1695 1,1708 1,1743 1,1755 1,1779 1,1828 1,1877 1,1915 1,1934 1,1973 1,2008 1,2130 1,2133 1,2149 2,2173 2,2186 1,2202 1,2219 1,2231 2,2235 1,2276 1,2282 1,2284 1,2325 1,2369 2,2376 1,2390 1,2401 3,2431 1,2457 1,2467 2,2498 1,2587 1,2726 2,2744 2,2747 1,2769 2,2774 1,2796 1,3005 1,3025 1,3192 1,3203 1,3207 1,3224 1,3228 3,3267 1,3268 1,3269 1,3270 1,3337 1,3367 1,3384 1,3413 1,3451 2,3472 4,3488 3,3505 1,3524 1,3528 1,3545 1,3546 1,3552 1,3589 3,3623 1,3675 1,3688 1,3690 2,3705 2,3724 3,3727 1,3732 3,3803 3,3814 6,3819 3,3825 1,3826 12,3839 2,3841 2,3846 2,3849 13,3868 1,3870 3,3882 1,3890 1,3917 2,3928 2,3980 4,4022 1,4049 1,4100 7,4137 1,4138 1,4242 1,4346 1,4376 1,4399 3,4405 2,4428 1,4430 3,4455 5,4485 1,4509 1,4520 1,4527 1,4542 3,4600 2,4616 1,4728 1,4770 1,4804 1,4824 15,4854 2,4863 1,4896 1,4901 1,4903 7,4943 1,4952 3,4957 1,5098 1,5110 1,5122 2,5161 1,5170 4,5191 1,5394 1,5401 1,5421 5,5486 1,5489 1,5494 5,5508 1,5512 1,5515 2,5543 1,5578 1,5774 1,5789 1,5810 1,5824 1,5828 2,6113 1,6234 2,6309 1,6429 4}
{30 7,36 1,256 

I have tried different libraries but could not load the data. Please help!

I tried this:

from scipy.io import arff
import numpy as np

file = '/Users/uadip/Documents/tr11.wc.arff'

data, meta = arff.loadarff(file)

# Extract feature names and data
attributes = meta.names()
data = np.array(data.tolist())

and got this error:

Traceback (most recent call last):
  File "/Users/uadip/my_workspace/python/code_gen/final tests/pg_2.py", line 6, in <module>
    data, meta = arff.loadarff(file)
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/scipy/io/arff/_arffread.py", line 802, in loadarff
    return _loadarff(ofile)
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/scipy/io/arff/_arffread.py", line 867, in _loadarff
    a = list(generator(ofile))
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/scipy/io/arff/_arffread.py", line 865, in generator
    yield tuple([attr[i].parse_data(row[i]) for i in elems])
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/scipy/io/arff/_arffread.py", line 865, in <listcomp>
    yield tuple([attr[i].parse_data(row[i]) for i in elems])
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/scipy/io/arff/_arffread.py", line 223, in parse_data
    return float(data_str)
ValueError: could not convert string to float: '{28 1'
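
A note on this error: the `'{28 1'` in the message is the start of a sparse ARFF row. In that notation each row is written as `{index value, index value, ...}`, where the index is the zero-based position of the attribute in the header and every attribute not listed is implicitly 0. The SciPy documentation for `loadarff` lists sparse data as unsupported, which would explain the failure. Below is a minimal standard-library sketch of reading such rows. `parse_sparse_arff` is a hypothetical helper name, not part of any library, and the sketch assumes unquoted attribute names and values that contain no commas; the sample rows are made up for illustration and are not taken from tr11.wc.arff.

```python
import io


def parse_sparse_arff(lines):
    """Parse a sparse ARFF file into (attribute_names, rows).

    Each row is a dict mapping a zero-based attribute index to its value,
    kept as a string. Attributes missing from a row are implicitly 0.
    Assumes unquoted attribute names and values without commas.
    """
    names, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                # Header line of the form "@ATTRIBUTE name type"
                names.append(line.split(None, 2)[1])
            elif keyword == "@data":
                in_data = True
            continue
        if not (line.startswith("{") and line.endswith("}")):
            raise ValueError(f"expected a sparse row, got: {line[:30]!r}")
        row = {}
        for pair in line[1:-1].split(","):
            if pair.strip():
                index, value = pair.split(None, 1)
                row[int(index)] = value.strip()
        rows.append(row)
    return names, rows


# Made-up miniature file in the same layout as the sample above.
sample = """% ARFF format training set
@RELATION tr11.mat

@ATTRIBUTE outfit   integer
@ATTRIBUTE hasn integer
@ATTRIBUTE calm integer
@ATTRIBUTE gene integer

@DATA

{0 1,2 9}
{1 7,3 2}
"""
names, rows = parse_sparse_arff(io.StringIO(sample))
print(names)
print(rows)
```

For a real file, the same function could be called on an open file handle, for example `with open(path) as fh: names, rows = parse_sparse_arff(fh)`. Values are left as strings so that any non-numeric attribute survives; numeric counts can be converted with `int(...)` afterwards.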

I also tried other approaches, without success. Another attempt:

import arff

file = '/Users/uadip/Documents/tr11.wc.arff'
data = arff.load(file)

data_values = []
for row in data:
    row_values = []
    for key in row:
        row_values.append(row[key])
        data_values.append(row_values)

Error:

Traceback (most recent call last):
  File "/Users/uadip/my_workspace/python/code_gen/final tests/pg_2.py", line 7, in <module>
    for row in data:
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/arff/__init__.py", line 240, in load
    for item in Reader(fhand):
  File "/Users/uadip/my_workspace/python/code_gen/venv/lib/python3.9/site-packages/arff/__init__.py", line 273, in __iter__
    field_type_text = space_separated[2].strip()
IndexError: list index out of range
python cluster-analysis arff

Comments

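0 votes Péter Szilvási 11/11/2023
You did not include the complete error message. Also, could you share the contents of the pg_2.py file? Based on the error message ValueError: could not convert string to float: '{28 1', your data may contain a string where a float is expected.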
0 votes Uadip 11/13/2023
Hi Péter Szilvási, thanks for your reply! I have updated the description with the code. I have also added part of the arff file to the description.
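0 votes Uadip 11/17/2023
I followed this comment and got it working with liac-arff (import arff). But now I am not sure how to use the data, what the data represents, and how to convert it to TF-IDF for clustering.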
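
On the TF-IDF part of the last comment: in a word-count ARFF file of this kind, each row appears to be one document and each `index value` pair a term index with its count, so the rows can be reweighted directly. The sketch below is one possible way to do that with the standard library only. It assumes the rows have already been read into `{term_index: count}` dicts with integer counts, uses the smoothed IDF form `ln((1 + n) / (1 + df)) + 1`, and L2-normalises each row; `tfidf` and `cosine` are hypothetical helper names, and the three documents are made up.

```python
import math


def tfidf(rows):
    """Turn sparse count rows ({term_index: count}) into L2-normalised TF-IDF rows."""
    n_docs = len(rows)
    # Document frequency: in how many rows each term index appears.
    df = {}
    for row in rows:
        for index in row:
            df[index] = df.get(index, 0) + 1
    weighted = []
    for row in rows:
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so every weight stays positive.
        vec = {i: c * (math.log((1 + n_docs) / (1 + df[i])) + 1) for i, c in row.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        weighted.append({i: w / norm for i, w in vec.items()})
    return weighted


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a plain dot product)."""
    return sum(w * b[i] for i, w in a.items() if i in b)


# Made-up counts in the same {index: count} shape as parsed sparse ARFF rows.
docs = [{0: 3, 1: 1}, {0: 2, 1: 2}, {2: 5, 3: 1}]
vectors = tfidf(docs)
print(round(cosine(vectors[0], vectors[1]), 3))  # documents sharing terms
print(cosine(vectors[0], vectors[2]))            # documents with no shared terms
```

Documents with overlapping vocabulary get a high cosine similarity and documents with none get 0, which is the signal a clustering algorithm works from. For the actual clustering step, these weights could be placed in a SciPy sparse matrix and passed to a routine such as scikit-learn's `KMeans`; that part is left out here to keep the sketch dependency-free.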

A: No answers yet