Solving the Pain Points of Building Nested Data Structures with pydantic-resolve

2023-04-08 06:00:48

Case Study

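Take a forum as an example. There is an endpoint that returns post information, and then a new requirement arrives: each post must also show its author's information.

At this point there are two options:

1. Join the author information in the posts query, and add fields such as author_id and author_name to the returned post.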

{'post': 'v2ex', 'author_name': 'tangkikodo'}

2. Query the list of authors separately, using the author ids collected from the posts, then loop over the posts and attach each author object to its post.

{'post': 'v2ex', 'author': {'name': 'tangkikodo'}}

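Method 1 requires changing the query and also the post schema. If a new field is added later, such as the user's avatar, both places have to change.

Method 2 requires one manual stitching step. After that, adding or removing fields only touches the author object.

So, relatively speaking, method 2 is more maintainable in the long run. Nested objects are easier to extend and maintain.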
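
To make the maintenance difference concrete, here is a small sketch that uses in-memory dictionaries in place of database rows. The sample data and the helper names `build_flat` and `build_nested` are made up for illustration; they are not part of the original example.

```python
# Hypothetical in-memory "tables" standing in for real queries.
POSTS = [
    {"id": 1, "post": "v2ex", "author_id": 1},
    {"id": 2, "post": "v3ex", "author_id": 2},
]
AUTHORS = {
    1: {"id": 1, "name": "tangkikodo", "avatar": "a.png"},
    2: {"id": 2, "name": "tangkikodo2", "avatar": "b.png"},
}

def build_flat(posts, authors):
    """Method 1: every exposed author field is copied onto the post."""
    return [
        {"id": p["id"], "post": p["post"],
         "author_name": authors[p["author_id"]]["name"]}
        for p in posts
    ]

def build_nested(posts, authors):
    """Method 2: attach the whole author object under a single key."""
    return [
        {"id": p["id"], "post": p["post"], "author": authors[p["author_id"]]}
        for p in posts
    ]

flat = build_flat(POSTS, AUTHORS)
nested = build_nested(POSTS, AUTHORS)
print(flat[0])
print(nested[0])
```

In this sketch the avatar field appears in the nested result without any change to `build_nested`, whereas `build_flat` would need a new `author_avatar` entry (and the real query would need a new column).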

The response structure for method 2:

[
  {
    "id": 1,
    "post": "v2ex",
    "author": {
      "name": "tangkikodo",
      "id": 1
    }
  },
  {
    "id": 2,
    "post": "v3ex",
    "author": {
      "name": "tangkikodo2",
      "id": 2
    }
  }
]

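Requirements always change, though. Suddenly a new and rather odd requirement arrives: the author information must also show the posts the author viewed recently. The response body becomes: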

[
  {
    "id": 1,
    "post": "v2ex",
    "author": {
      "name": "tangkikodo",
      "recent_views": [
        {
          "id": 2,
          "post": "v3ex"
        },
        {
          "id": 3,
          "post": "v4ex"
        }
      ]
    }
  }
]

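So what do we do now? Is your blood pressure rising a little?

Following method 2, the usual approach is: after fetching the authors, query each author's recently viewed posts, stitch them back onto the authors, then stitch the authors back onto the posts. The flow is to query layer by layer, then stitch back layer by layer. In pseudocode: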

# posts query
posts = query_all_posts()
# authors query
author_ids = fetch_unique_author_id(posts)
authors = query_author(author_ids)
recent_view_posts = fetch_recent_view_posts(author_ids)  # new requirement
recent_view_maps = calc_view_mapping(recent_view_posts)  # new requirement
# authors attach
authors = [attach_posts(a, recent_view_maps) for a in authors]
author_map = calc_author_mapping(authors)
# posts attach
posts = [attach_author(p, author_map) for p in posts]

It oddly brings callback hell to mind: every new level has to be wedged into the middle of the existing code.
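
For readers who want to run the layer-by-layer flow, here is a self-contained sketch with in-memory data instead of database queries. The sample rows, the `(visitor_id, post_id)` visit format, and the function name `stitch` are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical in-memory data standing in for the three queries.
POSTS = [
    {"id": 1, "post": "v2ex", "author_id": 1},
    {"id": 2, "post": "v3ex", "author_id": 1},
    {"id": 3, "post": "v4ex", "author_id": 2},
]
AUTHORS = [{"id": 1, "name": "tangkikodo"}, {"id": 2, "name": "tangkikodo2"}]
VISITS = [(1, 2), (1, 3), (2, 1)]  # (visitor_id, post_id)

def stitch(posts, authors, visits):
    posts_by_id = {p["id"]: p for p in posts}
    # layer 3: group recently viewed posts by visitor
    views = defaultdict(list)
    for visitor_id, post_id in visits:
        viewed = posts_by_id[post_id]
        views[visitor_id].append({"id": viewed["id"], "post": viewed["post"]})
    # layer 2: attach recent views to authors
    author_map = {
        a["id"]: {**a, "recent_views": views.get(a["id"], [])} for a in authors
    }
    # layer 1: attach authors to posts
    return [
        {"id": p["id"], "post": p["post"], "author": author_map[p["author_id"]]}
        for p in posts
    ]

result = stitch(POSTS, AUTHORS, VISITS)
print(result[0])
```

Each extra level of nesting adds another grouping step and another mapping in the middle of `stitch`, which is the maintenance cost described above.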

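Either way, it sounds like a hassle, right? And what if one more level of nesting shows up some day? The code becomes painful to change. If your blood pressure is a bit high by now, read on.

So, is there another way? There is a small library that might help.

Solution

Enter a small library: allmonday/pydantic-resolve

For the example above, the work breaks down into two parts:

1. Define dataloaders, which are responsible for querying and grouping data. The first half of each loader queries the database; the second half converts the rows into pydantic objects and returns them. This is pseudocode, just enough to get the gist.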

from collections import defaultdict

from aiodataloader import DataLoader
from sqlalchemy import select

# async_session, the ORM models (Author, Post, PostVisit) and some_timestamp
# are assumed to be defined elsewhere in the project.

class AuthorLoader(DataLoader):
    async def batch_load_fn(self, author_ids):
        async with async_session() as session:
            # query authors
            res = await session.execute(select(Author).where(Author.id.in_(author_ids)))
            rows = res.scalars().all()
            # transform into pydantic objects, keyed by author id
            dct = {row.id: AuthorSchema.from_orm(row) for row in rows}
            # return results in the same order as author_ids
            return [dct.get(k) for k in author_ids]

class RecentViewPostLoader(DataLoader):
    async def batch_load_fn(self, view_ids):
        async with async_session() as session:
            res = await session.execute(
                select(Post, PostVisit.visitor_id)  # join the visit association table
                .join(PostVisit, PostVisit.post_id == Post.id)
                .where(PostVisit.visitor_id.in_(view_ids))
                .where(PostVisit.created_at < some_timestamp))
            rows = res.all()  # (Post, visitor_id) tuples
            dct = defaultdict(list)
            for post, visitor_id in rows:
                dct[visitor_id].append(RecentPostSchema.from_orm(post))  # group by visitor
            return [dct.get(k, []) for k in view_ids]
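
The important contract in both loaders is that `batch_load_fn` receives a list of keys and returns one value per key, in the same order. The sketch below is a toy, standard-library-only illustration of why that contract makes batching possible; `TinyLoader` and the sample data are invented for this example, and it is not how aiodataloader is actually implemented (caching and error handling are left out).

```python
import asyncio

class TinyLoader:
    """Toy batching loader: load() calls made in the same tick share one batch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = []   # (key, future) pairs waiting for the next batch
        self.task = None    # the scheduled dispatch task, if any

    def load(self, key):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.pending.append((key, future))
        if self.task is None:
            # dispatch runs on the next loop iteration, after the current
            # synchronous code has finished queuing its keys
            self.task = loop.create_task(self.dispatch())
        return future

    async def dispatch(self):
        pending, self.pending, self.task = self.pending, [], None
        keys = list(dict.fromkeys(k for k, _ in pending))  # dedupe, keep order
        values = await self.batch_fn(keys)  # one value per key, same order
        by_key = dict(zip(keys, values))
        for key, future in pending:
            future.set_result(by_key[key])

AUTHORS = {1: "tangkikodo", 2: "tangkikodo2"}
batches = []  # records every batch that reaches the "database"

async def batch_load_authors(keys):
    batches.append(list(keys))
    return [AUTHORS.get(k) for k in keys]  # None for unknown ids

async def main():
    loader = TinyLoader(batch_load_authors)
    return await asyncio.gather(
        loader.load(1), loader.load(2), loader.load(1), loader.load(9)
    )

names = asyncio.run(main())
print(names, batches)
```

Four `load` calls end up as a single batch with three distinct keys, which is the behavior that removes the N+1 query pattern.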

2. Define the schemas and inject the DataLoaders they depend on. LoaderDepend takes care of the loader's cache within the async context.

from typing import Optional, Tuple

from pydantic import BaseModel
from pydantic_resolve import LoaderDepend

class RecentPostSchema(BaseModel):
    id: int
    name: str
    class Config:
        orm_mode = True

class AuthorSchema(BaseModel):
    id: int
    name: str
    img_url: str
    recent_views: Tuple[RecentPostSchema, ...] = tuple()
    # filled in by the resolver from RecentViewPostLoader
    def resolve_recent_views(self, loader=LoaderDepend(RecentViewPostLoader)):
        return loader.load(self.id)
    class Config:
        orm_mode = True

class PostSchema(BaseModel):
    id: int
    author_id: int
    name: str
    author: Optional[AuthorSchema] = None
    # filled in by the resolver from AuthorLoader
    def resolve_author(self, loader=LoaderDepend(AuthorLoader)):
        return loader.load(self.author_id)
    class Config:
        orm_mode = True

And then?

And then nothing. All that is left is to query the posts and simply... resolve them, and the job is done.

from pydantic_resolve import Resolver

# run inside an async function, with an open session
posts = (await session.execute(select(Post))).scalars().all()
posts = [PostSchema.from_orm(p) for p in posts]
results = await Resolver().resolve(posts)
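
To make the resolve step less magical, here is a toy sketch of the general idea: walk the objects, call every `resolve_<field>` method, store the result on `<field>`, and recurse into whatever came back. This is an illustration written for this article using dataclasses and in-memory data; `toy_resolve` and the sample classes are hypothetical, and pydantic-resolve's real implementation may differ.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import List, Optional

AUTHORS = {1: {"id": 1, "name": "tangkikodo"}}
RECENT_VIEWS = {1: [{"id": 2, "name": "v3ex"}, {"id": 3, "name": "v4ex"}]}

async def toy_resolve(obj):
    """Fill each field that has a resolve_<field> method, then recurse."""
    if isinstance(obj, (list, tuple)):
        await asyncio.gather(*(toy_resolve(item) for item in obj))
        return obj
    if not hasattr(obj, "__dict__"):
        return obj  # plain values (int, str, None) have nothing to resolve
    for attr in dir(obj):
        if attr.startswith("resolve_"):
            value = getattr(obj, attr)()
            if inspect.isawaitable(value):
                value = await value
            setattr(obj, attr[len("resolve_"):], value)
    # descend into nested objects, including the ones just resolved
    await asyncio.gather(*(toy_resolve(v) for v in vars(obj).values()))
    return obj

@dataclass
class RecentPost:
    id: int
    name: str

@dataclass
class Author:
    id: int
    name: str
    recent_views: List[RecentPost] = field(default_factory=list)

    async def resolve_recent_views(self):
        return [RecentPost(**row) for row in RECENT_VIEWS.get(self.id, [])]

@dataclass
class Post:
    id: int
    author_id: int
    name: str
    author: Optional[Author] = None

    async def resolve_author(self):
        row = AUTHORS.get(self.author_id)
        return Author(**row) if row else None

posts = asyncio.run(toy_resolve([Post(id=1, author_id=1, name="v2ex")]))
print(posts[0])
```

Adding another nested level in this style means adding one more class with its own `resolve_` method; the traversal code stays untouched.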

Once loaders and schemas are separated, any manipulation of the data is simple, and adding a new schema does not break existing code.

The complete example is in 6_sqlalchemy_loaderdepend_global_filter.py

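If you have used aiodataloader before, you know that the developer has to manage loader initialization manually for each request. With pydantic-resolve you do not need to worry about setting up the async context or maintaining DataLoader instances at all; pydantic-resolve manages all of it.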
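
As a rough picture of the bookkeeping being described, the sketch below shows a per-request registry that creates each loader class once per request, so loaders are shared inside a request but never leak their cache across requests. `LoaderRegistry` and `CountingLoader` are hypothetical names invented here; this illustrates the manual pattern only and makes no claim about pydantic-resolve's internals.

```python
import asyncio

class CountingLoader:
    """Stand-in for a DataLoader; it only counts how many instances exist."""
    created = 0

    def __init__(self):
        type(self).created += 1
        self.cache = {}

class LoaderRegistry:
    """One registry per request: each loader class is instantiated at most once."""

    def __init__(self):
        self.instances = {}

    def get(self, loader_cls):
        if loader_cls not in self.instances:
            self.instances[loader_cls] = loader_cls()
        return self.instances[loader_cls]

async def handle_request():
    registry = LoaderRegistry()             # fresh loaders, so no stale cache
    first = registry.get(CountingLoader)
    second = registry.get(CountingLoader)   # same instance within this request
    return first is second, first

async def main():
    (same_a, loader_a), (same_b, loader_b) = await asyncio.gather(
        handle_request(), handle_request()
    )
    return same_a, same_b, loader_a is loader_b

shared_a, shared_b, leaked = asyncio.run(main())
print(shared_a, shared_b, leaked, CountingLoader.created)
```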

And that is it. If I had to name a drawback, having to use async/await might count as one.
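
If the calling code is synchronous, one common way to bridge the gap is `asyncio.run`, as in the minimal sketch below. The function names are hypothetical and `resolve_posts` is only a placeholder for the real query-and-resolve work.

```python
import asyncio

async def resolve_posts():
    # placeholder for the real async work (query + resolve)
    await asyncio.sleep(0)
    return [{"id": 1, "post": "v2ex"}]

def get_posts_sync():
    """Synchronous entry point that drives the async pipeline to completion."""
    return asyncio.run(resolve_posts())

posts = get_posts_sync()
print(posts)
```

Note that `asyncio.run` cannot be called from code that is already running inside an event loop, so this only helps at a synchronous top level.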

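The project is already used in production at my company and maintains 100% test coverage. You are welcome to try it out; if you run into problems, please open an issue and I will fix it as soon as I can.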
