Crosswalk Complexity in Classification Bridges
Context
When classification systems change, whether from SIC to NAICS, ICD-9 to ICD-10, or any taxonomy migration, historical data must be bridged to enable longitudinal analysis.
Problem
The relationship between old and new classification codes is rarely one-to-one. Codes split, merge, and shift in meaning. Naive crosswalks create false precision or lose important distinctions.
The Mapping Patterns
One-to-one: The old category maps cleanly to exactly one new category. This is the simplest case, but it is uncommon.
One-to-many (splits): A single old code divides into multiple new codes. Example: one SIC manufacturing code becomes three NAICS codes distinguishing different production methods.
Many-to-one (merges): Multiple old codes combine into a single new category. The distinctions the old codes drew cannot be carried forward, so historical granularity is lost.
Many-to-many: The most complex case. Old codes split and merge at the same time, so no clean mapping exists in either direction.
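The four patterns can be detected mechanically. Below is a minimal Python sketch, assuming the crosswalk is available as a list of (old_code, new_code) links. The function name `classify_links` and the codes A1 to A6 and N1 to N6 are hypothetical. Each link is labeled by asking two questions: does its old code point to more than one new code (a split), and does its new code receive more than one old code (a merge)?

```python
from collections import defaultdict


def classify_links(pairs):
    """Label each (old_code, new_code) link with its mapping pattern."""
    pairs = set(pairs)  # ignore duplicate rows in the crosswalk table
    new_per_old = defaultdict(set)  # old code -> new codes it maps to
    old_per_new = defaultdict(set)  # new code -> old codes mapping to it
    for old, new in pairs:
        new_per_old[old].add(new)
        old_per_new[new].add(old)

    labels = {}
    for old, new in pairs:
        splits = len(new_per_old[old]) > 1  # the old code divides
        merges = len(old_per_new[new]) > 1  # the new code absorbs several
        if splits and merges:
            labels[(old, new)] = "many-to-many"
        elif splits:
            labels[(old, new)] = "one-to-many"
        elif merges:
            labels[(old, new)] = "many-to-one"
        else:
            labels[(old, new)] = "one-to-one"
    return labels


# Hypothetical crosswalk covering all four patterns.
crosswalk = [
    ("A1", "N1"),                # one-to-one
    ("A2", "N2"), ("A2", "N3"),  # A2 splits
    ("A3", "N4"), ("A4", "N4"),  # A3 and A4 merge
    ("A5", "N5"), ("A5", "N6"),  # A5 splits, and
    ("A6", "N6"),                # N6 also absorbs A6
]
labels = classify_links(crosswalk)
print(labels[("A5", "N6")])  # many-to-many
```

This is a link-level view. A cluster of related codes can mix several patterns at once, as the A5/A6/N5/N6 group does here.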
Solution Approaches
Unweighted crosswalks: List all possible mappings without preference. Useful for reference but requires analyst judgment for each use case.
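An unweighted crosswalk still supports some quantitative use because it bounds each new-code total. The sketch below assumes the crosswalk lists every possible target for each old code, so nothing falls outside it, and that counts are non-negative. `bounds_from_unweighted` and the example codes are hypothetical. A new code's lower bound counts only old codes that map to it exclusively. Its upper bound assumes every ambiguous old code lands there in full.

```python
from collections import defaultdict


def bounds_from_unweighted(old_counts, crosswalk):
    """Return {new_code: (lower, upper)} implied by an unweighted crosswalk.

    old_counts: {old_code: count}
    crosswalk:  {old_code: set of candidate new codes}
    """
    lower = defaultdict(int)
    upper = defaultdict(int)
    for old, count in old_counts.items():
        targets = crosswalk[old]
        for new in targets:
            upper[new] += count  # all of this old code might land here
            if len(targets) == 1:
                lower[new] += count  # it has nowhere else to go
    # lower[new] defaults to 0 when only ambiguous codes reach `new`
    return {new: (lower[new], upper[new]) for new in upper}


old_counts = {"A1": 100, "A2": 60, "A3": 40}
crosswalk = {"A1": {"N1"}, "A2": {"N1", "N2"}, "A3": {"N2"}}
for code, (lo, hi) in sorted(bounds_from_unweighted(old_counts, crosswalk).items()):
    print(code, lo, hi)
# N1 100 160
# N2 40 100
```

The gap between the bounds (100 to 160 for N1) is the room that analyst judgment, or a weighting scheme, has to fill.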
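Weighted crosswalks: Assign probability weights based on empirical data (e.g., what proportion of establishments with old code X actually have new code Y). More actionable, but weights estimated in one setting (a particular year, industry mix, or unit of measure such as establishments versus employment) may not transfer to another.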
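Here is a sketch of the weighted approach. It assumes a set of dual-coded records, for example establishments coded under both systems during an overlap period, from which weights are estimated as simple proportions. `estimate_weights`, `apply_weights`, and the codes are hypothetical. Each old-code count is then allocated across new codes in proportion to those weights.

```python
from collections import Counter, defaultdict


def estimate_weights(dual_coded):
    """Estimate P(new | old) from records that carry both codes.

    dual_coded: iterable of (old_code, new_code), one tuple per record.
    """
    pair_counts = Counter(dual_coded)
    old_totals = Counter()
    for (old, _new), n in pair_counts.items():
        old_totals[old] += n
    weights = defaultdict(dict)
    for (old, new), n in pair_counts.items():
        weights[old][new] = n / old_totals[old]  # share of old's records
    return dict(weights)


def apply_weights(old_counts, weights):
    """Allocate each old-code count across new codes by weight.

    Raises KeyError if an old code never appeared in the dual-coded data,
    rather than silently dropping its count.
    """
    new_counts = defaultdict(float)
    for old, count in old_counts.items():
        for new, w in weights[old].items():
            new_counts[new] += count * w
    return dict(new_counts)


# Hypothetical overlap sample: 3 of 4 A2 establishments became N1.
dual = [("A2", "N1")] * 3 + [("A2", "N2")] + [("A1", "N1")] * 2
weights = estimate_weights(dual)
print(apply_weights({"A1": 100, "A2": 60}, weights))
# {'N1': 145.0, 'N2': 15.0}
```

These weights are establishment shares. Using them to allocate employment or revenue is one way the transfer problem described above can arise, since large and small establishments need not split the same way.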
Harmonized categories: Create a coarser intermediate classification that both systems can map to unambiguously. Sacrifices granularity for consistency.
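One common way to build harmonized categories is to treat the crosswalk as a bipartite graph and take its connected components. Every old and new code in a component joins one harmonized group, so both systems map into the groups without ambiguity. The sketch below uses union-find; the function `harmonize`, the group labels H1, H2, and so on, and the example codes are hypothetical.

```python
def harmonize(pairs):
    """Group crosswalk codes into harmonized categories (connected components).

    pairs: iterable of (old_code, new_code) links.
    Returns (old_to_group, new_to_group), each mapping a code to a group id.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Tag each node with its system so an old and a new code that happen
    # to share a label are kept distinct.
    for old, new in pairs:
        union(("old", old), ("new", new))

    roots = {}
    old_to_group, new_to_group = {}, {}
    for node in list(parent):
        group = roots.setdefault(find(node), f"H{len(roots) + 1}")
        side, code = node
        (old_to_group if side == "old" else new_to_group)[code] = group
    return old_to_group, new_to_group


crosswalk = [("A1", "N1"), ("A2", "N2"), ("A2", "N3"), ("A3", "N4"),
             ("A4", "N4"), ("A5", "N5"), ("A5", "N6"), ("A6", "N6")]
old_to_group, new_to_group = harmonize(crosswalk)
print(len(set(old_to_group.values())))  # 4
```

Connected components give the finest harmonization the links allow. Any coarser grouping must be a union of components. The trade-off shows up with long chains of many-to-many links, which can collapse into one very large group; that is the granularity this approach sacrifices.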
Time-split analysis: Analyze pre-transition and post-transition periods separately. Acknowledge the break in series rather than creating false continuity.
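Time-split analysis can be enforced in code by refusing to compute changes that straddle the transition. This sketch assumes a single, clean transition year and nonzero values. `within_regime_changes` and the numbers are hypothetical.

```python
def within_regime_changes(series, break_year):
    """Year-over-year percent changes that never cross a classification break.

    series:     {year: value} for one category (values assumed nonzero).
    break_year: first year coded under the new system.
    """
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        if prev < break_year <= curr:
            changes[curr] = None  # comparison would span two code systems
        else:
            changes[curr] = 100.0 * (series[curr] - series[prev]) / series[prev]
    return changes


series = {1995: 100, 1996: 110, 1997: 40, 1998: 44}
print(within_regime_changes(series, break_year=1997))
# {1996: 10.0, 1997: None, 1998: 10.0}
```

The apparent drop of roughly 64% between 1996 and 1997 is reported as `None` rather than as a change, while growth within each regime remains measurable.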
Consequences
Without careful bridging: Time series appear to show dramatic shifts that are actually classification artifacts. Researchers then draw false conclusions about economic or other substantive change.
With proper bridging: Longitudinal analysis remains valid but with acknowledged uncertainty bands around transition periods.
Related: [None yet]