# Integrated Vietnam Politicians Dataset Notes

Generated from:

- `raw/240911. VCP data by XIII Congress.xlsx`, sheet `Data`
- `output/spreadsheet/cc14.csv`

Outputs:

- `people.csv`: one row per person. Includes the prior-study columns (prefixed `prior_`), normalized identifiers, Central Committee term flags from VI to XIV, Politburo term flags from VI to XIV, and National Assembly term flags from XI to XV where inferable.
- `memberships.csv`: long-form membership table with one row per person-institution-term.
- `matches.csv`: audit table for matching the 14th Central Committee source to the prior-study dataset.
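The wide term flags in `people.csv` and the long-form rows in `memberships.csv` carry the same information. The sketch below shows one way to derive the long form from the flags. It assumes hypothetical column names (`person_id`, and flag columns such as `cc_XIII`, `pb_XIII`, `na_XV` holding `"1"` for membership); the real column names in `people.csv` may differ.

```python
import csv
import io

# Hypothetical flag-column prefixes mapped to institution labels.
# The actual people.csv column names may differ.
INSTITUTION_PREFIXES = {
    "cc_": "central_committee",
    "pb_": "politburo",
    "na_": "national_assembly",
}


def flags_to_memberships(people_rows):
    """Expand wide per-term flag columns into one row per person-institution-term."""
    memberships = []
    for row in people_rows:
        for column, value in row.items():
            for prefix, institution in INSTITUTION_PREFIXES.items():
                # Only flags set to "1" become membership rows.
                if column.startswith(prefix) and value == "1":
                    memberships.append({
                        "person_id": row["person_id"],
                        "institution": institution,
                        "term": column[len(prefix):],
                    })
    return memberships


# Illustrative input only, not taken from the dataset.
sample = io.StringIO(
    "person_id,cc_XII,cc_XIII,pb_XIII,na_XV\n"
    "p001,1,1,0,1\n"
)
rows = flags_to_memberships(csv.DictReader(sample))
for r in rows:
    print(r)
```

Going from wide to long this way keeps `people.csv` as the single source of truth, so the two files cannot drift apart.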

Matching logic:

- Names are normalized by lowercasing, removing Vietnamese diacritics, converting `đ` to `d`, and collapsing punctuation/spacing.
- A unique normalized-name match is accepted unless both sources have birth years and they conflict.
- Ambiguous normalized-name matches are accepted only when the current-source birth year identifies exactly one prior-study record.
- Rejected or unmatched current-source records are kept as `current_14th_only` people, not dropped.
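The matching rules above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the record shapes (dicts with `id`, `name`, and an optional `birth_year`) and the function names are hypothetical. Note that `đ` is its own letter rather than `d` plus a combining mark, so Unicode decomposition alone does not strip it and it has to be mapped explicitly.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    # Lowercase first so "Đ" becomes "đ", then map "đ" to "d" explicitly:
    # NFD does not decompose "đ" into "d" plus a combining mark.
    name = name.lower().replace("đ", "d")
    decomposed = unicodedata.normalize("NFD", name)
    # Drop combining marks (tones, circumflex, breve, horn, dot below).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Collapse any run of punctuation or whitespace into a single space.
    return re.sub(r"[\W_]+", " ", stripped).strip()


def match_record(current, candidates):
    """Return the matched prior-study id, or None (kept as current_14th_only)."""
    key = normalize_name(current["name"])
    hits = [c for c in candidates if normalize_name(c["name"]) == key]
    year = current.get("birth_year")
    if len(hits) == 1:
        prior_year = hits[0].get("birth_year")
        # A unique name match is rejected only when both birth years exist and differ.
        if year is not None and prior_year is not None and year != prior_year:
            return None
        return hits[0]["id"]
    if len(hits) > 1 and year is not None:
        # An ambiguous match is accepted only if the birth year singles out one record.
        by_year = [c for c in hits if c.get("birth_year") == year]
        if len(by_year) == 1:
            return by_year[0]["id"]
    return None
```

Returning `None` instead of raising keeps unmatched and rejected records in the flow, so they can still be written out as `current_14th_only` people.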

Important limitations:

- Prior-study Central Committee terms are expanded from `startcc` and `endcc`; `Incumbent` is treated as XIII because the source was last updated before the XIV Congress.
- Prior-study Politburo and National Assembly term flags are inferred from available start/end years, so cases where a start or end year falls on a term boundary (a congress or election year) may need manual confirmation.
- The 14th source provides richer current profile text, but alternate-member status for previous Central Committee terms is parsed only where the profile text explicitly says `(dự khuyết)` ("alternate").
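The term expansion and alternate-status rules above can be sketched like this. It is a minimal sketch under stated assumptions: it assumes `startcc` and `endcc` hold Roman-numeral term labels (they may be stored differently in the prior-study sheet), and it assumes profile text has already been split into per-term fragments; `expand_cc_terms` and `mentions_alternate` are hypothetical helper names.

```python
import re
import unicodedata

# Central Committee terms covered by the flags in people.csv.
ROMAN_TERMS = ["VI", "VII", "VIII", "IX", "X", "XI", "XII", "XIII", "XIV"]

# Normalize the pattern to NFC so it matches regardless of how this file was saved.
ALTERNATE_MARKER = re.compile(
    unicodedata.normalize("NFC", r"\(\s*dự khuyết\s*\)"), re.IGNORECASE
)


def expand_cc_terms(startcc: str, endcc: str) -> list[str]:
    """Expand an inclusive startcc..endcc range into term labels.

    Assumes Roman-numeral labels; raises ValueError for unknown labels.
    """
    # The prior-study source predates the XIV Congress, so "Incumbent" ends at XIII.
    if endcc.strip().lower() == "incumbent":
        endcc = "XIII"
    start = ROMAN_TERMS.index(startcc.strip())
    end = ROMAN_TERMS.index(endcc.strip())
    return ROMAN_TERMS[start : end + 1]


def mentions_alternate(profile_fragment: str) -> bool:
    """True only if the fragment explicitly contains "(dự khuyết)"."""
    # Profile text may mix composed and decomposed Vietnamese characters.
    return bool(ALTERNATE_MARKER.search(unicodedata.normalize("NFC", profile_fragment)))
```

Requiring the explicit `(dự khuyết)` marker errs on the side of recording a previous term as full membership rather than guessing alternate status from looser wording.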
