alex_bn_lee

[990] Functions of pandas

Ref: pandas-cookbook


Series.isxxxx()

Series.isin(): Whether elements in Series are contained in values.

top_oceania_wines = reviews[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]
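
The `reviews` frame is not defined in these notes. A minimal sketch, assuming a small made-up table with `country` and `points` columns (the values are invented for illustration), shows the same filter running end to end:

```python
import pandas as pd

# Hypothetical stand-in for the `reviews` data; values are made up.
reviews = pd.DataFrame(
    {
        "country": ["Australia", "France", "New Zealand", "Australia"],
        "points": [96, 97, 95, 90],
    }
)

# isin() builds a boolean mask; & combines it with the points condition.
# Each condition is parenthesized because & binds tighter than >=.
top_oceania_wines = reviews[
    reviews["country"].isin(["Australia", "New Zealand"])
    & (reviews["points"] >= 95)
]
print(top_oceania_wines)
```

Only the rows whose country is in the list and whose score is at least 95 survive the filter.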

Series.str.islower(): Check whether all characters in each string are lowercase.

Series.str.isalpha(): Check whether all characters are alphabetic.

Series.str.isnumeric(): Check whether all characters are numeric.

Series.str.isalnum(): Check whether all characters are alphanumeric.

Series.str.isdigit(): Check whether all characters are digits.

Series.str.isdecimal(): Check whether all characters are decimal.

Series.str.isspace(): Check whether all characters are whitespace.

Series.str.isupper(): Check whether all characters are uppercase.

Series.str.istitle(): Check whether each string is titlecased (every word starts with an uppercase character and continues in lowercase).
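
The difference between isdecimal(), isdigit() and isnumeric() is easy to miss. A small sketch, assuming an object-dtype Series so that each check defers to Python's built-in string methods, compares them on a few hand-picked strings:

```python
import pandas as pd

# dtype=object so each check uses Python's own str.is*() methods.
s = pd.Series(["12", "²", "½", "abc", "Abc Def", " "], dtype=object)

checks = pd.DataFrame(
    {
        "value": s,
        "isdecimal": s.str.isdecimal(),  # base-10 digits only
        "isdigit": s.str.isdigit(),      # also accepts superscript digits
        "isnumeric": s.str.isnumeric(),  # also accepts fractions such as "½"
        "isalpha": s.str.isalpha(),
        "istitle": s.str.istitle(),
        "isspace": s.str.isspace(),
    }
)
print(checks)
```

Each method is progressively more permissive: "12" passes all three numeric checks, "²" fails only isdecimal(), and "½" passes only isnumeric().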


Series.str.xxxx()

Series.str.contains(): Test if pattern or regex is contained within a string of a Series or Index.

data[data.Department.str.contains("HR")]

Series.str.capitalize(): Capitalizes the first character of each string and lowercases the rest.

Series.str.lower(): Converts all characters to lowercase.

Series.str.upper(): Converts all characters to uppercase.

Series.str.title(): Converts first character of each word to uppercase and remaining to lowercase.
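
A short sketch of these string methods on made-up data (the `data` frame and its values are hypothetical). It also uses the `case` and `na` parameters of str.contains(), since a column with missing values would otherwise produce a mask that cannot be used for filtering:

```python
import pandas as pd

# Hypothetical data; the None entry stands for a missing department.
data = pd.DataFrame(
    {
        "Name": ["ann", "bob", "cy", "dee"],
        "Department": ["HR", "hr support", None, "Finance"],
    }
)

# case=False makes the match case-insensitive.
# na=False treats missing values as "no match", so the mask is purely boolean.
hr_rows = data[data["Department"].str.contains("hr", case=False, na=False)]
print(hr_rows)

# The case conversions return new Series; the original columns are unchanged.
capitalized = data["Name"].str.capitalize()
upper = data["Name"].str.upper()
print(capitalized)
print(upper)
```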


if-then

An if-then on one column

In [1]: df = pd.DataFrame(
   ...:     {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
   ...: )
   ...: 

In [2]: df
Out[2]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

In [3]: df.loc[df.AAA >= 5, "BBB"] = -1

In [4]: df
Out[4]: 
   AAA  BBB  CCC
0    4   10  100
1    5   -1   50
2    6   -1  -30
3    7   -1  -50

An if-then with assignment to 2 columns:

In [5]: df.loc[df.AAA >= 5, ["BBB", "CCC"]] = 555

In [6]: df
Out[6]: 
   AAA  BBB  CCC
0    4   10  100
1    5  555  555
2    6  555  555
3    7  555  555

Add another line with the opposite condition to handle the else case:

In [7]: df.loc[df.AAA < 5, ["BBB", "CCC"]] = 2000

In [8]: df
Out[8]: 
   AAA   BBB   CCC
0    4  2000  2000
1    5   555   555
2    6   555   555
3    7   555   555
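
The two `.loc` assignments above can also be collapsed into one step. The following is a sketch of an alternative technique, `numpy.where`, not the approach used in the transcript; it rebuilds the same starting frame and reaches the same final values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# np.where(condition, value_if_true, value_if_false) is evaluated row by row,
# so the "then" and "else" branches are written in a single expression.
new_value = np.where(df["AAA"] >= 5, 555, 2000)
df["BBB"] = new_value
df["CCC"] = new_value
print(df)
```

The `.loc` form is preferable when only some rows should change; `np.where` always produces a value for every row.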

Building criteria

Select with multi-column criteria

In [19]: df = pd.DataFrame(
   ....:     {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
   ....: )
   ....: 

In [20]: df
Out[20]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

…and (without assignment, returns a Series)

In [21]: df.loc[(df["BBB"] < 25) & (df["CCC"] >= -40), "AAA"]
Out[21]: 
0    4
1    5
Name: AAA, dtype: int64

…or (without assignment, returns a Series)

In [22]: df.loc[(df["BBB"] > 25) | (df["CCC"] >= -40), "AAA"]
Out[22]: 
0    4
1    5
2    6
3    7
Name: AAA, dtype: int64

…or (with assignment, modifies the DataFrame)

In [23]: df.loc[(df["BBB"] > 25) | (df["CCC"] >= 75), "AAA"] = 999

In [24]: df
Out[24]: 
   AAA  BBB  CCC
0  999   10  100
1    5   20   50
2  999   30  -30
3  999   40  -50
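
Criteria like these can become hard to read once several are combined. One possible way to organize them, sketched here with variable names that are purely illustrative, is to store each condition as a named boolean mask and combine the masks with `&`, `|` and `~`:

```python
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# Each comparison yields a boolean Series aligned with df's index.
low_bbb = df["BBB"] < 25
high_ccc = df["CCC"] >= -40

both = df.loc[low_bbb & high_ccc, "AAA"]        # and
either = df.loc[low_bbb | high_ccc, "AAA"]      # or
neither = df.loc[~(low_bbb | high_ccc), "AAA"]  # not (either)

print(both, either, neither, sep="\n\n")
```

When the comparisons are written inline instead, each one needs its own parentheses, because `&` and `|` bind more tightly than `<` and `>=` in Python.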

Selection

DataFrames

Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment, because iloc slices by position (end excluded) while loc slices by label (end included).

In [46]: data = {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}

In [47]: df2 = pd.DataFrame(data=data, index=[1, 2, 3, 4])  # Note index starts at 1.

In [48]: df2.iloc[1:3]  # Position-oriented
Out[48]: 
   AAA  BBB  CCC
2    5   20   50
3    6   30  -30

In [49]: df2.loc[1:3]  # Label-oriented
Out[49]: 
   AAA  BBB  CCC
1    4   10  100
2    5   20   50
3    6   30  -30
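
The transcript covers the non-zero start. For the non-unit increment case, here is a minimal sketch that assumes a hypothetical frame `df3` indexed by 10, 20, 30, 40:

```python
import pandas as pd

data = {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}

# Hypothetical frame: same data, but the labels step by 10.
df3 = pd.DataFrame(data=data, index=[10, 20, 30, 40])

by_position = df3.iloc[1:3]  # positions 1 and 2; the end position is excluded
by_label = df3.loc[10:30]    # labels 10 through 30; the end label is included
no_match = df3.loc[1:3]      # no labels lie between 1 and 3, so nothing is selected

print(by_position.index.tolist())
print(by_label.index.tolist())
print(len(no_match))
```

The same slice `1:3` selects two rows through iloc and none through loc, which is why it helps to pick the accessor deliberately whenever the index is integer-valued.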

Column selection, addition, deletion

You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations:
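
The `df` used in the following transcript is not the frame from the earlier sections, and its construction is not shown in these notes. A frame consistent with the outputs below could be built like this (an assumption, reconstructed from the displayed values):

```python
import pandas as pd

# Assumed construction: two Series with partly overlapping indexes.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# The resulting index is the union of both Series indexes;
# "one" has no value for label "d", so that cell becomes NaN.
df = pd.DataFrame(d)
print(df)
```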

In [72]: df["one"]
Out[72]: 
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

In [73]: df["three"] = df["one"] * df["two"]

In [74]: df["flag"] = df["one"] > 2

In [75]: df
Out[75]: 
   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    4.0  False
c  3.0  3.0    9.0   True
d  NaN  4.0    NaN  False

Columns can be deleted or popped, as with a dict:

In [76]: del df["two"]

In [77]: three = df.pop("three")

In [78]: df
Out[78]: 
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False

When inserting a scalar value, it will naturally be propagated to fill the column:

In [79]: df["foo"] = "bar"

In [80]: df
Out[80]: 
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar

When inserting a Series that does not have the same index as the DataFrame, it will be conformed to the DataFrame’s index:

In [81]: df["one_trunc"] = df["one"][:2]

In [82]: df
Out[82]: 
   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  NaN  False  bar        NaN

You can insert raw ndarrays but their length must match the length of the DataFrame’s index.
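
A small sketch of the ndarray rule, using a simplified stand-in frame (the column names `arr` and `short` are illustrative). Unlike a Series, an ndarray carries no index, so pandas assigns it by position and rejects a length mismatch instead of filling with NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, None]}, index=list("abcd"))

# Same length as the index: values are assigned by position, no alignment.
df["arr"] = np.array([10, 20, 30, 40])

# Wrong length: pandas raises ValueError rather than padding the column.
try:
    df["short"] = np.array([1, 2])
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(df)
print(length_error)
```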

By default, columns get inserted at the end. DataFrame.insert() inserts at a particular location in the columns:

In [83]: df.insert(1, "bar", df["one"])

In [84]: df
Out[84]: 
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  NaN  NaN  False  bar        NaN
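
One detail worth knowing about DataFrame.insert(): it modifies the frame in place, and by default it refuses to insert a column whose name already exists. A minimal sketch on a simplified stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "foo": ["bar", "bar"]}, index=["a", "b"])

# insert(loc, column, value) places the new column at position `loc`, in place.
df.insert(1, "bar", df["one"])
column_order = list(df.columns)

# Inserting the same column name again raises ValueError by default.
try:
    df.insert(0, "bar", df["one"])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(column_order)
print(duplicate_error)
```

Passing `allow_duplicates=True` lifts that restriction, although duplicate column names tend to make later selection ambiguous.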

 

posted on 2024-04-30 13:09 by McDelfino