Learning the Pandas Library, Part 2: Lookup - 路神

Lookup

isin(...)

    def isin(self, values) -> "DataFrame":
        """
        Examples
        --------
        >>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
        ...                   index=['falcon', 'dog'])
        >>> df
                num_legs  num_wings
        falcon         2          2
        dog            4          0

        When ``values`` is a list check whether every value in the DataFrame
        is present in the list (which animals have 0 or 2 legs or wings)

        >>> df.isin([0, 2])
                num_legs  num_wings
        falcon      True       True
        dog        False       True

        When ``values`` is a dict, we can pass values to check for each
        column separately:

        >>> df.isin({'num_wings': [0, 3]})
                num_legs  num_wings
        falcon     False      False
        dog        False       True

        When ``values`` is a Series or DataFrame the index and column must
        match. Note that 'dog' has no row in ``other``, so every entry in
        its row is False.

        >>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
        ...                      index=['spider', 'falcon'])
        >>> df.isin(other)
                num_legs  num_wings
        falcon      True       True
        dog        False      False
        """
        if isinstance(values, dict):
            from pandas.core.reshape.concat import concat

            values = collections.defaultdict(list, values)
            return concat(
                (
                    self.iloc[:, [i]].isin(values[col])
                    for i, col in enumerate(self.columns)
                ),
                axis=1,
            )
        elif isinstance(values, Series):
            if not values.index.is_unique:
                raise ValueError("cannot compute isin with a duplicate axis.")
            return self.eq(values.reindex_like(self), axis="index")
        elif isinstance(values, DataFrame):
            if not (values.columns.is_unique and values.index.is_unique):
                raise ValueError("cannot compute isin with a duplicate axis.")
            return self.eq(values.reindex_like(self))
        else:
            if not is_list_like(values):
                raise TypeError(
                    "only list-like or dict-like objects are allowed "
                    "to be passed to DataFrame.isin(), "
                    f"you passed a '{type(values).__name__}'"
                )
            return self._constructor(
                algorithms.isin(self.values.ravel(), values).reshape(self.shape),
                self.index,
                self.columns,
            )
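
The branches in the source above can be exercised with a short script. This is only a minimal sketch: the variable names (by_list, by_dict, by_series, scalar_error) are illustrative and not part of pandas, and the comments describe what the quoted code does for each input type.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# list-like: every cell is tested against the same set of values
by_list = df.isin([0, 2])

# dict: keys are column names; a column missing from the dict falls back
# to an empty list (the defaultdict in the source), so it is all False
by_dict = df.isin({'num_wings': [0, 3]})

# Series: aligned on the row index, then compared with every column
by_series = df.isin(pd.Series([2, 0], index=['falcon', 'dog']))

# a scalar is neither list-like nor dict-like, so isin raises TypeError
scalar_error = None
try:
    df.isin(2)
except TypeError as exc:
    scalar_error = str(exc)

print(by_list, by_dict, by_series, scalar_error, sep='\n\n')
```

Note that the Series and DataFrame branches test equality at matching labels, not membership, which is why the index (and columns) have to line up.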

PS: Filtering a DataFrame by specific conditions

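Return the rows that satisfy a specific condition, e.g. the rows whose column 'A' holds 4 or 8. (Negating the mask with ~ gives the complement of the selection, which has the effect of deleting the matching rows.)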

In [1]: data
Out[1]: 
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
 
In [2]: data[data['A'].isin([4,8])]  # rows where column A is 4 or 8
Out[2]: 
   A  B   C   D
1  4  5   6   7
2  8  9  10  11
 
In [3]: data[~data['A'].isin([4,8])]  # negation: drop the rows where column A is 4 or 8
Out[3]: 
   A  B  C  D
0  0  1  2  3
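
The session above does not show how data was built. The sketch below is self-contained under one assumption: that data is a 3x4 frame of consecutive integers, which reproduces the values printed in Out[1]. The np.arange construction and the names mask, kept and dropped are illustrative, not taken from the original.

```python
import numpy as np
import pandas as pd

# Assumed construction: reproduces the 3x4 frame shown in Out[1]
data = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

mask = data['A'].isin([4, 8])   # boolean Series, one entry per row
kept = data[mask]               # rows where A is 4 or 8
dropped = data[~mask]           # complement: every other row

print(kept)
print(dropped)
```

Storing the mask in a variable avoids computing isin twice when both the selection and its complement are needed.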

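Return the rows that satisfy several conditions at once, e.g. the rows whose column 'A' holds 4 and whose column 'B' holds 5. (Again, negating the combined mask with ~ gives the complement, which deletes the matching rows.)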

In [4]: data[data['A'].isin([4]) & data['B'].isin([5])]  # rows where A is 4 and B is 5
Out[4]: 
   A  B  C  D
1  4  5  6  7
 
# Negation: drop the rows where A is 4 and B is 5. When negating several combined conditions, always wrap them in ().
In [5]: data[~(data['A'].isin([4]) & data['B'].isin([5]))]
Out[5]: 
   A  B   C   D
0  0  1   2   3
2  8  9  10  11
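
Why the parentheses matter: in Python, ~ binds more tightly than &, so without them only the first condition is negated. The sketch below illustrates this, again assuming data is the 3x4 frame of consecutive integers shown earlier; the names a, b, both, not_both, unparenthesised and same_mask are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed construction of the frame used throughout this post
data = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

a = data['A'].isin([4])
b = data['B'].isin([5])

both = data[a & b]               # A is 4 AND B is 5
not_both = data[~(a & b)]        # every row except the one above

# Without parentheses, ~ applies to `a` only: this is (~a) & b
unparenthesised = data[~a & b]

# De Morgan: ~(a & b) is the same mask as ~a | ~b
same_mask = (~(a & b)).equals(~a | ~b)

print(not_both)
print(unparenthesised)
```

On this frame the unparenthesised version selects nothing at all, which is an easy mistake to miss because it raises no error.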

Return the row labels (index) of the rows that satisfy a condition

In [6]: list(data[data['A'].isin([4,8])].index)
Out[6]: [1, 2]
    
print(type(data[data['A'].isin([4])].index), data[data['A'].isin([4])].index)
# <class 'pandas.core.indexes.numeric.Int64Index'> Int64Index([1], dtype='int64')
# (output from pandas 1.x; pandas 2.x removed Int64Index and reports a plain Index)
print(list(data[data['A'].isin([4])].index))
# [1]
print(data[data['A'].isin([4])].index[0])
# 1
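
The labels can also be read straight from the index without building the filtered frame first. This is a small sketch under the same assumption about data as before; labels, positions, no_match and first are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed construction of the frame used throughout this post
data = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

mask = data['A'].isin([4, 8]).to_numpy()

labels = data.index[mask].tolist()         # index labels of matching rows
positions = np.flatnonzero(mask).tolist()  # 0-based row positions

# .index[0] raises IndexError when nothing matches, so check the length first
no_match = data.index[data['A'].isin([99]).to_numpy()]
first = no_match[0] if len(no_match) else None

print(labels, positions, first)
```

Labels and positions coincide here only because data uses the default integer index; with a custom index, labels would hold the index values while positions would stay 0-based.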

Posted on 2020-12-29 00:28 by 路神