python - data integrator
class -- general-purpose data aggregation
https://python.land/objects-and-classes#What_is_a_Python_object
A class can be used to manage aggregated data, but classes themselves were not designed specifically for data management.
    class Car:
        speed = 0
        started = False

        def start(self):
            self.started = True
            print("Car started, let's ride!")

        def increase_speed(self, delta):
            if self.started:
                self.speed = self.speed + delta
                print('Vrooooom!')
            else:
                print("You need to start the car first")

        def stop(self):
            self.speed = 0
            print('Halting')
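One way to see why a plain class is not designed for data management: it gets no value-based equality and no readable repr unless you write them yourself. A minimal sketch, using a hypothetical `PlainPoint` class that is not part of the linked tutorial:

```python
class PlainPoint:
    """Hypothetical plain class that only stores two values."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = PlainPoint(1, 2)
b = PlainPoint(1, 2)

# Without a custom __eq__, comparison falls back to object identity.
same_value = (a == b)

# Without a custom __repr__, the default shows the type and address, not the fields.
default_repr = repr(a)

print(same_value)
print(default_repr)
```

The tools in the following sections generate these methods automatically.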
Named tuple (namedtuple) --- simple data aggregation
https://www.runoob.com/note/25726
A more concise way to aggregate data.
    from collections import namedtuple

    # Define a namedtuple type User with the fields name, sex and age.
    User = namedtuple('User', ['name', 'sex', 'age'])

    # Create a User object
    user = User(name='Runoob', sex='male', age=12)

    # Get all field names
    print(user._fields)

    # A User can also be created from a list; note that this requires the "_make" method
    user = User._make(['Runoob', 'male', 12])
    print(user)
    # User(name='Runoob', sex='male', age=12)

    # Read the user's attributes
    print(user.name)
    print(user.sex)
    print(user.age)

    # "Modify" an attribute; note that "_replace" returns a new object
    user = user._replace(age=22)
    print(user)
    # User(name='Runoob', sex='male', age=22)

    # Convert the User object to a dict with "_asdict"
    print(user._asdict())
    # {'name': 'Runoob', 'sex': 'male', 'age': 22}  (an OrderedDict before Python 3.8)
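Two properties explain why `_replace` is needed above: a namedtuple is still a tuple, and its fields are read-only. A short sketch of both, reusing the `User` type from the example:

```python
from collections import namedtuple

User = namedtuple('User', ['name', 'sex', 'age'])
user = User(name='Runoob', sex='male', age=12)

# A namedtuple is still a tuple: it unpacks and compares by value.
name, sex, age = user
is_equal = (user == ('Runoob', 'male', 12))

# Fields are read-only; direct assignment raises AttributeError.
try:
    user.age = 22
    mutated = True
except AttributeError:
    mutated = False

print(name, is_equal, mutated)
```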
dataclass -- a dedicated, powerful data aggregation tool
https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/
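When data is aggregated with a plain class, each field is declared in the __init__ signature and then assigned inside __init__, so the same parameter name appears three times, which is tedious.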
    from datetime import datetime
    from typing import List

    import dateutil.parser  # third-party package: python-dateutil


    class StarWarsMovie:
        def __init__(self,
                     title: str,
                     episode_id: int,
                     opening_crawl: str,
                     director: str,
                     producer: str,
                     release_date: datetime,
                     characters: List[str],
                     planets: List[str],
                     starships: List[str],
                     vehicles: List[str],
                     species: List[str],
                     created: datetime,
                     edited: datetime,
                     url: str
                     ):
            self.title = title
            self.episode_id = episode_id
            self.opening_crawl = opening_crawl
            self.director = director
            self.producer = producer
            self.release_date = release_date
            self.characters = characters
            self.planets = planets
            self.starships = starships
            self.vehicles = vehicles
            self.species = species
            self.created = created
            self.edited = edited
            self.url = url

            # Accept date strings as well and convert them to datetime
            if isinstance(self.release_date, str):
                self.release_date = dateutil.parser.parse(self.release_date)
            if isinstance(self.created, str):
                self.created = dateutil.parser.parse(self.created)
            if isinstance(self.edited, str):
                self.edited = dateutil.parser.parse(self.edited)
Python 3.7 introduced dataclass.
Each parameter name appears only once.
    from dataclasses import dataclass
    from datetime import datetime
    from typing import List


    @dataclass
    class StarWarsMovie:
        title: str
        episode_id: int
        opening_crawl: str
        director: str
        producer: str
        release_date: datetime
        characters: List[str]
        planets: List[str]
        starships: List[str]
        vehicles: List[str]
        species: List[str]
        created: datetime
        edited: datetime
        url: str
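The dataclass version above no longer contains the string-to-datetime conversion that the hand-written `__init__` performed. One way to keep it is the `__post_init__` hook, which runs after the generated `__init__`. A minimal sketch, assuming a reduced, hypothetical `Movie` class with two fields and using the standard library's `datetime.fromisoformat` as a stand-in for `dateutil.parser.parse`:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union


@dataclass
class Movie:
    """Hypothetical, reduced version of StarWarsMovie with two fields."""
    title: str
    release_date: Union[datetime, str]

    def __post_init__(self):
        # Runs after the generated __init__; convert ISO strings to datetime.
        # Assumption: fromisoformat stands in for dateutil.parser.parse.
        if isinstance(self.release_date, str):
            self.release_date = datetime.fromisoformat(self.release_date)


m1 = Movie("A New Hope", "1977-05-25")
m2 = Movie("A New Hope", datetime(1977, 5, 25))

print(m1)
# The generated __eq__ compares the two objects field by field.
print(m1 == m2)
```

`fromisoformat` only accepts ISO 8601 style strings, so it is a narrower parser than dateutil; the point here is the hook, not the parser.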
tutorial:
https://realpython.com/python-data-classes/
https://www.cnblogs.com/apocelipes/p/10284346.html
https://python.land/python-data-classes
Nested data support
https://github.com/MerleLiuKun/my-python/blob/master/sundries/dataclass/demo_with_dataclasses_json.py
""" 使用 dataclasses_json 的 demo """ from dataclasses import dataclass, field from typing import List from dataclasses_json import DataClassJsonMixin @dataclass class Base(DataClassJsonMixin): pass @dataclass class Cover(Base): cover_id: str = field(repr=False, ) offset_x: str = None offset_y: str = None source: str = None id: str = None @dataclass class Point(Base): x: int = None y: int = None @dataclass class Page(Base): id: str = None about: str = field(default=None, repr=False) birthday: str = field(default=None, repr=False) name: str = None username: str = None fan_count: int = field(default=None, repr=False) cover: Cover = field(default=None, repr=False) point_list: List[Point] = field(default=None, repr=False) if __name__ == '__main__': data = { "id": "20531316728", "about": "The Facebook Page celebrates how our friends inspire us, support us, and help us discover the world when we connect.", "birthday": "02/04/2004", "name": "Facebook", "username": "facebookapp", "fan_count": 214643503, "cover": { "cover_id": "10158913960541729", "offset_x": 50, "offset_y": 50, "source": "https://scontent.xx.fbcdn.net/v/t1.0-9/s720x720/73087560_10158913960546729_8876113648821469184_o.jpg?_nc_cat=1&_nc_ohc=bAJ1yh0abN4AQkSOGhMpytya2quC_uS0j0BF-XEVlRlgwTfzkL_F0fojQ&_nc_ht=scontent.xx&oh=2964a1a64b6b474e64b06bdb568684da&oe=5E454425", "id": "10158913960541729" }, "point_list": [ {"x": 1, "y": 2}, {"x": 3, "y": 4}, ] } p = Page.from_dict(data) print(p) print(p.cover) print(p.point_list) print(p.to_dict()) print(p.to_json())
Third-party libraries
attrs
https://www.attrs.org/en/stable/
Mature and flexible. Used in NASA projects.
attrs is the Python package that will bring back the joy of writing classes by relieving you from the drudgery of implementing object protocols (aka dunder methods). Trusted by NASA for Mars missions since 2020! Its main goal is to help you to write concise and correct software without slowing down your code.
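    from types import SimpleNamespace

    from attrs import define


    @define
    class Point:
        x: float
        y: float

        @classmethod
        def from_row(cls, row):
            return cls(row.x, row.y)


    # `row` is not defined in the original snippet; this stand-in is hypothetical
    row = SimpleNamespace(x=1.0, y=2.0)
    pt = Point.from_row(row)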
tutorial:
https://python.land/python-attrs#Python_attrs_converter_example
pydantic
https://pydantic-docs.helpmanual.io/
Its distinguishing feature is runtime validation. Widely used, mature and stable.
Data validation and settings management using python type annotations.
pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.
Define how data should be in pure, canonical python; validate it with pydantic.
    from datetime import datetime
    from typing import List, Optional

    from pydantic import BaseModel


    class User(BaseModel):
        id: int
        # Annotated explicitly: pydantic v1 infers the type from the default,
        # but pydantic v2 rejects fields without an annotation.
        name: str = 'John Doe'
        signup_ts: Optional[datetime] = None
        friends: List[int] = []


    external_data = {
        'id': '123',
        'signup_ts': '2019-06-01 12:22',
        'friends': [1, 2, '3'],
    }
    user = User(**external_data)
    print(user.id)
    #> 123
    print(repr(user.signup_ts))
    #> datetime.datetime(2019, 6, 1, 12, 22)
    print(user.friends)
    #> [1, 2, 3]
    print(user.dict())  # pydantic v1 API; v2 deprecates it in favor of model_dump()
    """
    {
        'id': 123,
        'name': 'John Doe',
        'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
        'friends': [1, 2, 3],
    }
    """
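The example above only shows valid input being coerced. The "user friendly errors when data is invalid" part can be sketched as follows, assuming a hypothetical `Account` model that is not taken from the pydantic docs:

```python
from pydantic import BaseModel, ValidationError


class Account(BaseModel):
    """Hypothetical model used only for this illustration."""
    id: int
    name: str = "John Doe"


try:
    Account(id="not-a-number")
    errors = []
except ValidationError as exc:
    # errors() returns a list of dicts, one per field that failed validation.
    errors = exc.errors()

for err in errors:
    print(err["loc"], err["msg"])
```

Each entry names the offending field in `loc` and carries a readable message in `msg`; the exact message text differs between pydantic versions.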
dataclass vs attrs vs pydantic
https://stefan.sofa-rockers.org/2020/05/29/attrs-dataclasses-pydantic/
https://jackmckew.dev/dataclasses-vs-attrs-vs-pydantic.html
https://mpkocher.github.io/2019/05/22/Dataclasses-in-Python-3-7/
At present, attrs and pydantic appear to be the more powerful options.
dataclass -- no third-party library to install
attrs -- high performance; it is a pure-Python package that generates the methods once, when the class is defined
pydantic -- strong validation
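The validation difference can be illustrated with the standard library alone: a dataclass does not check its annotations at runtime, so any check has to be written manually. A minimal sketch, with hypothetical `Reading` and `CheckedReading` classes:

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical record used only for this illustration."""
    sensor_id: int


# The annotation is not enforced: a str is stored as-is.
unchecked = Reading(sensor_id="abc")


@dataclass
class CheckedReading:
    sensor_id: int

    def __post_init__(self):
        # Manual runtime check, the kind of work a validation library automates.
        if not isinstance(self.sensor_id, int):
            raise TypeError("sensor_id must be an int")


try:
    CheckedReading(sensor_id="abc")
    rejected = False
except TypeError:
    rejected = True

print(type(unchecked.sensor_id).__name__, rejected)
```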
Initially, I was intrigued by the addition of dataclasses to the standard library. However, after a deeper dive into dataclasses, it's not clear to me that these are particularly useful for Python developers. I believe third-party solutions such as attrs or pydantic might be a better fit due to their validation hooks and richer feature sets. It will be interesting to see the adoption of dataclasses by both the Python core as well as third-party developers.