
nunca

Do good deeds and don't ask what lies ahead

pandas.factorize()

Official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html

pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)

Encode the object as an enumerated type or categorical variable.

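In other words, it converts an object-typed variable into an enumerated or categorical representation: each distinct value is mapped to an integer code.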
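A common use of this is turning a text column into integer codes. The following is a minimal sketch, not taken from the official docs; the DataFrame, the column name `city`, and the new column `city_code` are made up for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"city": ["Paris", "Tokyo", "Paris", "Lima"]})

# factorize returns (codes, uniques); codes are assigned in order of
# first appearance unless sort=True is passed.
codes, uniques = pd.factorize(df["city"])
df["city_code"] = codes

print(df)
print(list(uniques))
```

The codes carry no ordering meaning by themselves; they are just positions into `uniques`.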

Parameters:

values : sequence

A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization.

sort : bool, default False

Sort uniques and shuffle labels to maintain the relationship.

order

Deprecated since version 0.23.0: This parameter has no effect and is deprecated.

na_sentinel : int, default -1

Value to mark “not found”.

size_hint : int, optional

Hint to the hashtable sizer.

Returns:

labels : ndarray

An integer ndarray that’s an indexer into uniques. uniques.take(labels) will have the same values as values.

uniques : ndarray, Index, or Categorical

The unique valid values. When values is Categorical, uniques is a Categorical. When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.

Note: Even if there’s a missing value in values, uniques will not contain an entry for it.
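The relationship between labels and uniques described above can be checked directly. A small sketch, assuming the input has no missing values and is passed as a NumPy object array; the variable name `rebuilt` is only for illustration.

```python
import numpy as np
import pandas as pd

values = np.array(["b", "b", "a", "c", "b"], dtype=object)
labels, uniques = pd.factorize(values)

# Each label is a position in uniques, so take() rebuilds the input.
rebuilt = uniques.take(labels)

print(labels)
print(uniques)
print(rebuilt)
```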

Example

1、 pd.factorize(values)

>>> import pandas as pd
>>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> labels
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)

2、 pd.factorize(values, sort = True)

>>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> labels
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)
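What "shuffle labels to maintain the relationship" means for sort=True can be seen by decoding both results: the labels change together with the order of uniques, so both decode back to the same input. A sketch under that reading; the `_u` / `_s` suffixes are names chosen here for the unsorted and sorted results.

```python
import numpy as np
import pandas as pd

values = np.array(["b", "b", "a", "c", "b"], dtype=object)

labels_u, uniques_u = pd.factorize(values)             # first-appearance order
labels_s, uniques_s = pd.factorize(values, sort=True)  # sorted uniques

# The codes differ, but indexing uniques with its own labels
# recovers the original values in both cases.
decoded_u = uniques_u[labels_u].tolist()
decoded_s = uniques_s[labels_s].tolist()

print(labels_u, uniques_u)
print(labels_s, uniques_s)
print(decoded_u == decoded_s)
```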

3、Missing values are indicated in labels with na_sentinel (-1 by default). Note that missing values are never included in uniques.

>>> labels, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> labels
array([ 0, -1,  1,  2,  0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
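One thing to watch when decoding such a result: with a NumPy array, index -1 means "last element", so uniques.take(labels) silently fills the missing slot with the last unique value. The sketch below shows this and one possible way around it (overwriting the masked positions); it is an illustration, not something the official docs prescribe.

```python
import numpy as np
import pandas as pd

values = np.array(["b", None, "a", "c", "b"], dtype=object)
labels, uniques = pd.factorize(values)

# Naive decoding: -1 is read as "last element" of uniques.
naive = uniques.take(labels)

# Masked decoding: take() returns a new array, so it is safe to
# overwrite the positions that were marked as missing.
decoded = uniques.take(labels)
decoded[labels == -1] = None

print(naive)
print(decoded)
```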

4、Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.

>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> labels, uniques = pd.factorize(cat)
>>> labels
array([0, 0, 1])
>>> uniques
[a, c]
Categories (3, object): [a, b, c]

Notice that 'b' is in uniques.categories, despite not being present in cat.values.

5、For all other pandas objects, an Index of the appropriate type is returned.

>>> cat = pd.Series(['a', 'a', 'c'])
>>> labels, uniques = pd.factorize(cat)
>>> labels
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='object')
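The three return types for uniques can be compared side by side. A minimal sketch; the variable names (`raw`, `from_array`, `from_series`, `from_cat`) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

raw = np.array(["a", "a", "c"], dtype=object)

_, from_array = pd.factorize(raw)
_, from_series = pd.factorize(pd.Series(raw))
_, from_cat = pd.factorize(pd.Categorical(raw, categories=["a", "b", "c"]))

# Print the concrete type of uniques for each kind of input.
for name, u in [("ndarray", from_array),
                ("Series", from_series),
                ("Categorical", from_cat)]:
    print(name, "->", type(u).__name__)

# The Categorical result keeps the unused category 'b'.
print(list(from_cat.categories))
```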

 

Since time will pass no matter what, why not choose to do something meaningful?

posted on 2018-05-28 09:13 by 乐晓东随笔
