How to get specific rows on conditions?

Asked by: Ranjan raghav  Asked: 11/12/2018  Updated: 11/12/2018  Views: 24

Q:

For data like the following:

Name      Stage           Start                 End

Hulk        1      21/10/2018 06:34:15    21/10/2018 07:34:15
Hulk        2      21/10/2018 07:34:15    21/10/2018 07:54:15
Hulk        3      21/10/2018 07:58:15    21/10/2018 08:14:15
Hulk        4      21/10/2018 08:14:15    21/10/2018 08:34:15
Sam         A1     21/10/2018 09:34:15    21/10/2018 10:34:15
Sam         A2     21/10/2018 10:34:15    21/10/2018 10:45:15
Sam         A3     21/10/2018 10:45:15    21/10/2018 11:00:15
Sam         A4     21/10/2018 11:00:15    21/10/2018 11:34:15
Bruce       1.1    21/10/2018 11:34:15    21/10/2018 11:45:15
Bruce       1.2    21/10/2018 11:45:15    21/10/2018 12:00:15
Bruce       1.3    21/10/2018 12:00:15    21/10/2018 12:25:15
Bruce       1.4    21/10/2018 12:25:15    21/10/2018 12:45:15
Peter        1     21/10/2018 12:45:15    21/10/2018 01:05:15
Peter        1     21/10/2018 01:05:15    21/10/2018 01:15:15

How can I get the first and last instance of Stage for each Name, where the first one starts at 1 and the last one ends at 4?

The resulting DataFrame should look like this:

Name      Stage           Start                 End

Hulk        1      21/10/2018 06:34:15    21/10/2018 07:34:15
Hulk        4      21/10/2018 08:14:15    21/10/2018 08:34:15
Sam         A1     21/10/2018 09:34:15    21/10/2018 10:34:15
Sam         A4     21/10/2018 11:00:15    21/10/2018 11:34:15
Bruce       1.1    21/10/2018 11:34:15    21/10/2018 11:45:15
Bruce       1.4    21/10/2018 12:25:15    21/10/2018 12:45:15

I tried groupby([Name,Stage]), but did not get the desired DataFrame shown above.

Tags: python-2.7 pandas dataframe slice

A:

3 upvotes  jezrael 11/12/2018 #1

First use boolean indexing with duplicated and str.contains to select the candidate rows, then use map with value_counts to keep only the groups that have exactly 2 rows:

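# first row of each Name, and rows whose Stage contains '1'
m1 = ~df['Name'].duplicated()
m2 = df['Stage'].str.contains('1')

# last row of each Name, and rows whose Stage contains '4'
m3 = ~df['Name'].duplicated(keep='last')
m4 = df['Stage'].str.contains('4')

df1 = df[(m1 & m2) | (m3 & m4)].copy()

# keep only the names for which both a first and a last row matched
df1 = df1[df1['Name'].map(df1['Name'].value_counts()) == 2]
print(df1)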
     Name Stage                Start                  End
0    Hulk     1  21/10/2018 06:34:15  21/10/2018 07:34:15
3    Hulk     4  21/10/2018 08:14:15  21/10/2018 08:34:15
4     Sam    A1  21/10/2018 09:34:15  21/10/2018 10:34:15
7     Sam    A4  21/10/2018 11:00:15  21/10/2018 11:34:15
8   Bruce   1.1  21/10/2018 11:34:15  21/10/2018 11:45:15
11  Bruce   1.4  21/10/2018 12:25:15  21/10/2018 12:45:15
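
For anyone who wants to reproduce the output above without the clipboard, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand (Stage is stored as strings, which is an assumption needed for the .str accessor) and applies the same masks; the variable names is_first, is_last, picked and result are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question. Stage is kept as text on purpose.
rows = [
    ("Hulk", "1", "21/10/2018 06:34:15", "21/10/2018 07:34:15"),
    ("Hulk", "2", "21/10/2018 07:34:15", "21/10/2018 07:54:15"),
    ("Hulk", "3", "21/10/2018 07:58:15", "21/10/2018 08:14:15"),
    ("Hulk", "4", "21/10/2018 08:14:15", "21/10/2018 08:34:15"),
    ("Sam", "A1", "21/10/2018 09:34:15", "21/10/2018 10:34:15"),
    ("Sam", "A2", "21/10/2018 10:34:15", "21/10/2018 10:45:15"),
    ("Sam", "A3", "21/10/2018 10:45:15", "21/10/2018 11:00:15"),
    ("Sam", "A4", "21/10/2018 11:00:15", "21/10/2018 11:34:15"),
    ("Bruce", "1.1", "21/10/2018 11:34:15", "21/10/2018 11:45:15"),
    ("Bruce", "1.2", "21/10/2018 11:45:15", "21/10/2018 12:00:15"),
    ("Bruce", "1.3", "21/10/2018 12:00:15", "21/10/2018 12:25:15"),
    ("Bruce", "1.4", "21/10/2018 12:25:15", "21/10/2018 12:45:15"),
    ("Peter", "1", "21/10/2018 12:45:15", "21/10/2018 01:05:15"),
    ("Peter", "1", "21/10/2018 01:05:15", "21/10/2018 01:15:15"),
]
df = pd.DataFrame(rows, columns=["Name", "Stage", "Start", "End"])

# True on the first / last occurrence of each Name.
is_first = ~df["Name"].duplicated()
is_last = ~df["Name"].duplicated(keep="last")

# First rows whose Stage contains '1', or last rows whose Stage contains '4'.
picked = df[(is_first & df["Stage"].str.contains("1"))
            | (is_last & df["Stage"].str.contains("4"))]

# Drop names that contributed only one row (Peter has no Stage with '4').
result = picked[picked["Name"].map(picked["Name"].value_counts()) == 2]
print(result)
```

Note that str.contains matches the digit anywhere in the value. If stages such as 14 or 41 can occur, str.endswith('1') and str.endswith('4') may be a stricter test; that is a guess about the intended matching rule, not something the answer states.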

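Comments

1 upvote  jezrael 11/12/2018
@RavinderSingh13 - Use df = pd.read_clipboard(sep='\s{2,}') - the separator is a regex matching 2 or more whitespace characters.
1 upvote  RavinderSingh13 11/12/2018
Thanks a TON sir, it helped me. You are awesome.
0 upvotes  Ranjan raghav 11/12/2018
@jezrael Thanks jezrael for the lovely solution, I really admire the power of your concepts :)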
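
The separator regex from the comment above can be tried without a clipboard. The sketch below swaps in pd.read_csv on an in-memory string (an assumption made for reproducibility; the comment itself uses read_clipboard) and uses only three of the question's rows. Passing dtype for Stage is also an addition here: it keeps values like 1 as text so that .str.contains works even when every stage in the sample looks numeric.

```python
import io
import pandas as pd

# A few rows from the question, columns separated by runs of 2+ spaces.
text = """Name      Stage           Start                 End
Hulk        1      21/10/2018 06:34:15    21/10/2018 07:34:15
Sam         A1     21/10/2018 09:34:15    21/10/2018 10:34:15
Peter        1     21/10/2018 12:45:15    21/10/2018 01:05:15
"""

# A regex separator needs the python parsing engine. The single space
# inside each timestamp is not split because the pattern needs 2+ spaces.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s{2,}",
    engine="python",
    dtype={"Stage": str},
)
print(df)
```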