# Data Model — Tian Mira BaZi Corpus

## Key Concepts

### Source Record
An individual birth data entry from an upstream dataset.
- **Astro-Databank C (ADB C)**: 3,604 records (Rodden Rating AA)
- **VedAstro 15k**: 15,790 records (Rodden Rating AA claimed)
- **Total source records**: 19,394

### Canonical Person/Profile
A deduplicated person identity in the unified corpus.
When two source records are confirmed to refer to the same person,
they produce a single canonical profile.
- **Total canonical profiles**: 18,255

### Expert Calculation Payload
A complete BaZi calculation result, stored as JSONL and computed by Tian Mira `advanced_v2`.
Each canonical profile has one payload.
- **Total expert payloads**: 18,255

### Shared Canonical Calculation
When an Astro-Databank C record and a VedAstro record are confirmed
identical, the ADB C calculation serves as the canonical payload.
The VedAstro record references it without duplicating the payload.
- **Shared references**: 1,139
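
A minimal sketch of how a shared reference might be resolved, assuming each source record stores the ID of its canonical profile and payloads are keyed by that ID. The names `SourceRecord`, `canonical_id`, `resolve_payload`, the source labels, and the sample IDs are all hypothetical; the corpus may encode the reference differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source: str        # hypothetical labels: "adb_c" or "vedastro"
    record_id: str     # ID within the upstream dataset (illustrative)
    canonical_id: str  # canonical profile this record resolves to


def resolve_payload(record: SourceRecord, payloads: dict) -> dict:
    """Look up the single payload stored for the record's canonical profile."""
    return payloads[record.canonical_id]


# One payload per canonical profile; the "engine" field is illustrative only.
payloads = {"p-0001": {"engine": "advanced_v2"}}

# Two source records confirmed identical point at the same canonical profile,
# so the VedAstro record reuses the ADB C payload instead of storing a copy.
adb_record = SourceRecord("adb_c", "adb-42", "p-0001")
ved_record = SourceRecord("vedastro", "ved-977", "p-0001")
```

Both lookups return the same object, which is the property the shared reference is meant to guarantee: the payload exists once, regardless of how many source records point at it.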

### Source-Specific Calculation Variant
When two source records for the same person have different birth inputs,
each may require its own calculation.
- **Current variants needed**: 0

### Cross-Source Link
A candidate match between an ADB C record and a VedAstro record. Only confirmed links merge the two records into one canonical profile; uncertain or false links leave them separate.
- **Confirmed**: 1,139
- **Non-merged (uncertain/false)**: 97
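
One way to model a link and its status is sketched below. The class names, field names, and the three status values are assumptions derived from the labels above (confirmed, uncertain, false), not the corpus's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class LinkStatus(Enum):
    CONFIRMED = "confirmed"  # records are merged into one canonical profile
    UNCERTAIN = "uncertain"  # not merged
    FALSE = "false"          # not merged


@dataclass(frozen=True)
class CrossSourceLink:
    adb_c_id: str       # hypothetical ID of the ADB C record
    vedastro_id: str    # hypothetical ID of the VedAstro record
    status: LinkStatus


def merged_pairs(links):
    """Return (adb_c_id, vedastro_id) pairs for confirmed links only."""
    return [
        (link.adb_c_id, link.vedastro_id)
        for link in links
        if link.status is LinkStatus.CONFIRMED
    ]


sample_links = [
    CrossSourceLink("adb-1", "ved-1", LinkStatus.CONFIRMED),
    CrossSourceLink("adb-2", "ved-2", LinkStatus.UNCERTAIN),
    CrossSourceLink("adb-3", "ved-3", LinkStatus.FALSE),
]
```

Under this sketch, only the first sample link would reduce the canonical profile count; the other two keep both source records as separate profiles.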

## Formulas

```
Source records:           3,604 + 15,790 = 19,394
Canonical profiles:       19,394 − 1,139  = 18,255
Expert payloads:          18,255
VedAstro source records:  14,651 (distinct) + 1,139 (shared) = 15,790
```
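
The totals above can be re-derived from three inputs, which makes a convenient consistency check when the counts change. This is a sketch: the constant and variable names are hypothetical, and the last derivation assumes that every duplicate is an ADB C / VedAstro pair, with no duplicates inside a single source.

```python
# Inputs taken from the counts in this document.
ADB_C_RECORDS = 3_604
VEDASTRO_RECORDS = 15_790
CONFIRMED_LINKS = 1_139

source_records = ADB_C_RECORDS + VEDASTRO_RECORDS
canonical_profiles = source_records - CONFIRMED_LINKS
vedastro_distinct = VEDASTRO_RECORDS - CONFIRMED_LINKS

# Assumption: each ADB C record keeps its own payload, and only VedAstro
# records without a confirmed link need a payload of their own.
expert_payloads = ADB_C_RECORDS + vedastro_distinct

assert source_records == 19_394
assert canonical_profiles == 18_255
assert vedastro_distinct == 14_651
assert expert_payloads == canonical_profiles
```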

## Important Distinctions

- **19,394 source records** ≠ 19,394 different people
  (1,139 VedAstro records duplicate an ADB C record)
- **18,255 canonical profiles** ≠ 18,255 records from a single source
  (the profiles span both ADB C and VedAstro)
- **15,790 VedAstro source records** ≠ 15,790 independent expert payloads
  (1,139 share their calculation with ADB C)
