python - data integrator

class -- ordinary data aggregation

https://python.land/objects-and-classes#What_is_a_Python_object

A class can be used to manage aggregated data, but classes themselves were not designed specifically for data management.

 

class Car:
    speed = 0
    started = False
    def start(self):
        self.started = True
        print("Car started, let's ride!")
    def increase_speed(self, delta):
        if self.started:
            self.speed = self.speed + delta
            print('Vrooooom!')
        else:
            print("You need to start the car first")
    def stop(self):
        self.speed = 0
        print('Halting')
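
To make the point about classes not being designed for data management concrete, here is a minimal sketch. The `PlainPoint` class is a hypothetical example, not taken from the linked tutorial. It shows that, without hand-written `__eq__` and `__repr__` methods, two instances holding the same values do not compare equal and print as an opaque object address.

```python
class PlainPoint:
    """Hypothetical plain class that only stores two values."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = PlainPoint(1, 2)
b = PlainPoint(1, 2)

# No __eq__ is defined, so comparison falls back to identity, not field values.
same_values_equal = (a == b)
print(same_values_equal)  # False

# No __repr__ is defined, so the default repr shows the class name and an address.
print(repr(a).startswith("<"))  # True
```

The tools described in the following sections generate these methods automatically.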

 

namedtuple --- simple data aggregation

https://www.runoob.com/note/25726

A more concise way to aggregate data.

 

from collections import namedtuple

# Define a namedtuple type User with the fields name, sex and age.
User = namedtuple('User', ['name', 'sex', 'age'])

# Create a User object
user = User(name='Runoob', sex='male', age=12)

# Get all field names
print( user._fields )
# ('name', 'sex', 'age')

# A User object can also be created from a list; note that this requires the "_make" method
user = User._make(['Runoob', 'male', 12])

print( user )
# User(name='Runoob', sex='male', age=12)

# Access the user's attributes
print( user.name )
print( user.sex )
print( user.age )

# namedtuples are immutable; "_replace" returns a new object with the changed field
user = user._replace(age=22)
print( user )
# User(name='Runoob', sex='male', age=22)

# Convert the User object to a dict; note the "_asdict" method
print( user._asdict() )
# {'name': 'Runoob', 'sex': 'male', 'age': 22}  (an OrderedDict on Python 3.7 and earlier)
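
As a supplementary sketch (not part of the linked note), the snippet below shows two properties that follow from a namedtuple being a real tuple: its fields cannot be reassigned, and it unpacks and compares like an ordinary tuple. It reuses the same `User` definition as above.

```python
from collections import namedtuple

User = namedtuple('User', ['name', 'sex', 'age'])
user = User(name='Runoob', sex='male', age=12)

# Fields cannot be reassigned; attempting to do so raises AttributeError.
try:
    user.age = 13
    mutated = True
except AttributeError:
    mutated = False
print(mutated)  # False

# A namedtuple is still a tuple: it unpacks and compares like one.
name, sex, age = user
print(user == ('Runoob', 'male', 12))  # True
```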

 

dataclass -- a dedicated, powerful data aggregation tool

https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/

 

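With a plain class, every field is declared in the __init__ signature and then copied to an attribute inside it, so the same parameter name appears three times, which is tedious.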

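from datetime import datetime
from typing import List

import dateutil.parser  # third-party package: python-dateutil

class StarWarsMovie: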

   def __init__(self,
                title: str,
                episode_id: int,
                opening_crawl: str,
                director: str,
                producer: str,
                release_date: datetime,
                characters: List[str],
                planets: List[str],
                starships: List[str],
                vehicles: List[str],
                species: List[str],
                created: datetime,
                edited: datetime,
                url: str
                ):

       self.title = title
       self.episode_id = episode_id
       self.opening_crawl = opening_crawl
       self.director = director
       self.producer = producer
       self.release_date = release_date
       self.characters = characters
       self.planets = planets
       self.starships = starships
       self.vehicles = vehicles
       self.species = species
       self.created = created
       self.edited = edited
       self.url = url

       # Accept date strings as well as datetime objects
       if isinstance(self.release_date, str):
           self.release_date = dateutil.parser.parse(self.release_date)

       if isinstance(self.created, str):
           self.created = dateutil.parser.parse(self.created)

       if isinstance(self.edited, str):
           self.edited = dateutil.parser.parse(self.edited)

 

Python 3.7 introduced dataclass

Each parameter name appears only once.

from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class StarWarsMovie:
   title: str
   episode_id: int
   opening_crawl: str
   director: str
   producer: str
   release_date: datetime
   characters: List[str]
   planets: List[str]
   starships: List[str]
   vehicles: List[str]
   species: List[str]
   created: datetime
   edited: datetime
   url: str
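
The dataclass version above no longer converts date strings the way the hand-written `__init__` did. One way to get that behaviour back is the `__post_init__` hook, which the generated `__init__` calls after assigning the fields. The following is a minimal sketch under stated assumptions: `Movie` is a hypothetical, shortened stand-in for `StarWarsMovie`, and it uses the standard library's `datetime.fromisoformat` instead of `dateutil`, so it only accepts ISO-formatted strings.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union


@dataclass
class Movie:
    """Hypothetical, shortened stand-in for StarWarsMovie."""
    title: str
    release_date: Union[datetime, str]

    def __post_init__(self):
        # Runs right after the generated __init__; convert ISO strings to datetime.
        if isinstance(self.release_date, str):
            self.release_date = datetime.fromisoformat(self.release_date)


m1 = Movie("A New Hope", "1977-05-25")
m2 = Movie("A New Hope", datetime(1977, 5, 25))

print(m1.release_date)  # 1977-05-25 00:00:00
print(m1 == m2)         # True: the generated __eq__ compares field values
```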

 

tutorial:

https://realpython.com/python-data-classes/

https://www.cnblogs.com/apocelipes/p/10284346.html

https://python.land/python-data-classes

 

Nested data support

https://github.com/MerleLiuKun/my-python/blob/master/sundries/dataclass/demo_with_dataclasses_json.py

"""
    Demo using dataclasses_json
"""
from dataclasses import dataclass, field
from typing import List

from dataclasses_json import DataClassJsonMixin


@dataclass
class Base(DataClassJsonMixin):
    pass


@dataclass
class Cover(Base):
    cover_id: str = field(repr=False)
    offset_x: int = None
    offset_y: int = None
    source: str = None
    id: str = None


@dataclass
class Point(Base):
    x: int = None
    y: int = None


@dataclass
class Page(Base):
    id: str = None
    about: str = field(default=None, repr=False)
    birthday: str = field(default=None, repr=False)
    name: str = None
    username: str = None
    fan_count: int = field(default=None, repr=False)
    cover: Cover = field(default=None, repr=False)
    point_list: List[Point] = field(default=None, repr=False)


if __name__ == '__main__':
    data = {
        "id": "20531316728",
        "about": "The Facebook Page celebrates how our friends inspire us, support us, and help us discover the world when we connect.",
        "birthday": "02/04/2004",
        "name": "Facebook",
        "username": "facebookapp",
        "fan_count": 214643503,
        "cover": {
            "cover_id": "10158913960541729",
            "offset_x": 50,
            "offset_y": 50,
            "source": "https://scontent.xx.fbcdn.net/v/t1.0-9/s720x720/73087560_10158913960546729_8876113648821469184_o.jpg?_nc_cat=1&_nc_ohc=bAJ1yh0abN4AQkSOGhMpytya2quC_uS0j0BF-XEVlRlgwTfzkL_F0fojQ&_nc_ht=scontent.xx&oh=2964a1a64b6b474e64b06bdb568684da&oe=5E454425",
            "id": "10158913960541729"
        },
        "point_list": [
            {"x": 1, "y": 2},
            {"x": 3, "y": 4},
        ]
    }
    p = Page.from_dict(data)
    print(p)
    print(p.cover)
    print(p.point_list)

    print(p.to_dict())
    print(p.to_json())
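
For comparison, here is a rough sketch of how far the standard library alone goes with nested data. The `Point` and `Page` classes are simplified, hypothetical versions of the ones in the demo above. `dataclasses.asdict` recurses into nested dataclasses when serializing, but there is no built-in counterpart that rebuilds nested objects from a dict, which is the gap a helper such as `dataclasses_json` fills.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Point:
    x: int = 0
    y: int = 0


@dataclass
class Page:
    name: str = ""
    point_list: List[Point] = field(default_factory=list)


page = Page(name="Facebook", point_list=[Point(1, 2), Point(3, 4)])

# asdict recurses into nested dataclasses and lists of dataclasses.
as_dict = asdict(page)
print(as_dict)
# {'name': 'Facebook', 'point_list': [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]}

# The reverse direction is not automatic: nested dicts stay plain dicts.
rebuilt = Page(**as_dict)
print(type(rebuilt.point_list[0]).__name__)  # dict
```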

 

 

Third-party libraries

attrs

https://www.attrs.org/en/stable/

Mature and flexible. Used in NASA projects.

attrs is the Python package that will bring back the joy of writing classes by relieving you from the drudgery of implementing object protocols (aka dunder methods). Trusted by NASA for Mars missions since 2020!

Its main goal is to help you to write concise and correct software without slowing down your code.

 

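from types import SimpleNamespace

from attrs import define


@define
class Point:
    x: float
    y: float

    @classmethod
    def from_row(cls, row):
        # Build a Point from any object exposing .x and .y (e.g. a database row)
        return cls(row.x, row.y)

# Stand-in for a database row, so that the example runs on its own
row = SimpleNamespace(x=1.0, y=2.0)
pt = Point.from_row(row)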

 

tutorial:

https://python.land/python-attrs#Python_attrs_converter_example

 

pydantic

https://pydantic-docs.helpmanual.io/

Its distinguishing feature is runtime validation. Widely used, mature and stable.

Data validation and settings management using python type annotations.

pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.

Define how data should be in pure, canonical python; validate it with pydantic.

 

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
print(user.id)
#> 123
print(repr(user.signup_ts))
#> datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)
#> [1, 2, 3]
print(user.dict())  # pydantic v1 API; in pydantic v2 use user.model_dump()
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""

 

 

dataclass vs attrs vs pydantic

https://stefan.sofa-rockers.org/2020/05/29/attrs-dataclasses-pydantic/

 

 

 

https://jackmckew.dev/dataclasses-vs-attrs-vs-pydantic.html

 

 

https://mpkocher.github.io/2019/05/22/Dataclasses-in-Python-3-7/

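At present, attrs and pydantic appear to be the more powerful options.

dataclass -- part of the standard library; no third-party package to install

attrs -- good performance; it is a pure-Python package (not a Cython or C extension) whose stated goal is to avoid slowing your code down

pydantic -- strong validation support.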
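
The validation difference can be illustrated with the standard library alone. In the sketch below, `Item` and `CheckedItem` are hypothetical classes. A dataclass stores whatever value it is given, regardless of the annotation, so any check has to be written by hand, for example in `__post_init__`.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical example: the annotation is not enforced at runtime."""
    id: int


item = Item(id="123")
print(type(item.id).__name__)  # str: the value is stored as given


@dataclass
class CheckedItem:
    """Same field, with a hand-written check in __post_init__."""
    id: int

    def __post_init__(self):
        if not isinstance(self.id, int):
            raise TypeError(f"id must be int, got {type(self.id).__name__}")


try:
    CheckedItem(id="123")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

In the pydantic example earlier, by contrast, the string '123' is coerced to the integer 123 without any extra code.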

Initially, I was intrigued by the addition of dataclasses to the standard library. However, after a deeper dive into the dataclasses, it’s not clear to me that these are particularly useful for Python developers. I believe third-party solutions such as attrs or pydantic might be a better fit due to their validation hooks and richer feature sets. It will be interesting to see the adoption of dataclasses by both the Python core as well as third-party developers.

 

posted @ 2022-07-08 23:03  lightsong