Python Pandas: Confirm if a substring of DataFrame A is part of a string in DataFrame B

Q:

I have two DataFrames:

import pandas as pd

countries_list_a = pd.DataFrame({'Country' : ['Australia', 'United Kingdom', 'United States'], 'Code' : ['A', 'B', 'C']})

countries_list_b = pd.DataFrame ({'Name' : ['Jack', 'Maria', 'David'], 'Geo' : ['New York, United States', 'Sydney, Australia', 'London, United Kingdom']})

I want to iterate over one column of countries_list_a and check whether the string in each cell can be found in any cell of one column of countries_list_b.

More specifically, these are the columns:

countries_list_a["Country"]
countries_list_b["Geo"]

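That is, I want to check whether the string in each cell of countries_list_a["Country"] can be found in any cell of countries_list_b["Geo"]. Note that I need to keep track of the indices and store them in separate arrays. If I verify the presence of the string manually like this: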

tmp_country_a = countries_list_a["Country"][0]
tmp_country_b = countries_list_b["Geo"][1]

if (tmp_country_a in tmp_country_b):
 print ("Correct")

everything works. Of course, doing this by hand is not an option.

If I try to build a loop like this:

import numpy as np

index_a = np.size(countries_list_a["Country"])
index_b = np.size(countries_list_b["Geo"])

for i in range(index_a):
 tmp_country_a = countries_list_a["Country"][i]
 for j in range(index_b):
  tmp_country_b = countries_list_b["Geo"][j]
  if (tmp_country_a in tmp_country_b):
   print("Correct")

I get the following error:

TypeError: argument of type 'int' is not iterable

Any ideas? Thanks in advance.

python-3.x pandas string dataframe find

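Regarding the second comment: this is a simplified example of the column. There is no specific order, and a cell can be 'Orlando, FL, United States' or 'Idaho, Moscow, United States of America', so I need to search for the string. Regarding the third comment: I need to know which row each country appears in, in the given column order.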

1 upvote mozway 11/14/2023

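Can you update your example to reflect this more complex format and, if duplicates can occur by chance, add one so we can see how it should be handled? Don't forget to provide the expected output for the provided input.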

A:

0 upvotes BERA 11/17/2023 #1

import pandas as pd

countries_list_a = pd.DataFrame({'Country' : ['Australia', 'United Kingdom', 'United States'], 'Code' : ['A', 'B', 'C']})
#         Country Code
#       Australia    A
#  United Kingdom    B
#  United States     C
countries_list_b = pd.DataFrame ({'Name' : ['Jack', 'Maria', 'David'], 'Geo' : ['New York, United States', 'Sydney, Australia', 'London, United Kingdom']})
 #  Name                      Geo
 # Jack  New York, United States
 # Maria        Sydney, Australia
 # David   London, United Kingdom

geo = countries_list_b.Geo
countries_list_a["country_found"] = [any(country in g for g in geo) for country in countries_list_a.Country]

# print(countries_list_a)
#         Country Code  country_found
#       Australia    A           True
#  United Kingdom    B           True
#  United States     C           True
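
The question also asks for the matching indices, which the boolean column above does not record. A minimal sketch of one way to collect them follows. It rests on an assumption: the sample data in the question does not reproduce the TypeError, so presumably the real "Geo" or "Country" column contains at least one non-string cell (for example an int). Converting both columns with astype(str) sidesteps that, at the cost of turning missing values into the literal string 'nan'. The names matches, idx_a and idx_b are illustrative only and do not come from the question.

```python
import pandas as pd

countries_list_a = pd.DataFrame({'Country': ['Australia', 'United Kingdom', 'United States'],
                                 'Code': ['A', 'B', 'C']})
countries_list_b = pd.DataFrame({'Name': ['Jack', 'Maria', 'David'],
                                 'Geo': ['New York, United States', 'Sydney, Australia',
                                         'London, United Kingdom']})

# Assumption: the real data may hold non-string cells; coerce to str so the
# `in` test below never sees an int.
countries = countries_list_a['Country'].astype(str)
geo = countries_list_b['Geo'].astype(str)

# Collect (index in A, index in B) for every substring match.
matches = [(i, j)
           for i, country in countries.items()
           for j, g in geo.items()
           if country in g]

# Separate index arrays, as requested in the question.
idx_a = [i for i, _ in matches]
idx_b = [j for _, j in matches]

print(matches)  # [(0, 1), (1, 2), (2, 0)]
```

This is still a nested loop over both columns, so the cost grows with len(A) * len(B); for the small frames shown here that should not matter.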
