Cross-round harmonisation

Comparing indicators across multiple survey rounds is essential for longitudinal policy evaluation. However, structural differences across the survey instruments pose a major barrier. This vignette walks through how ihsMW streamlines cross-round harmonisation and analysis.

1. The Harmonisation Challenge

The Malawi Integrated Household Survey (IHS) questionnaires have evolved from round to round. Variables have been added, retired, renamed, or moved to different modules.

For instance, household size is named hhsize in some files and hh_size in others, and in some cases it is only available through variables that count household members. Similarly, the names of nominal consumption expenditure variables and agricultural crops change frequently, which makes direct cross-round comparisons tedious and error-prone.

2. The Crosswalk

To resolve this, ihsMW bundles a static crosswalk database containing mappings for over 5,800 variables across IHS2, IHS3, IHS4, and IHS5. It acts as a translation layer, mapping round-specific variable names to consistent, harmonised names.
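
To make the idea of a translation layer concrete, here is a minimal base-R sketch of crosswalk-driven renaming. The toy crosswalk and its column names (round, raw_name, harmonised_name) are assumptions made for illustration. They are not the layout of the bundled file, and the helper rename_with_crosswalk() is hypothetical rather than a description of how ihs_harmonise() works internally.

```r
# Toy crosswalk: hypothetical columns, not the bundled file's schema
toy_crosswalk <- data.frame(
  round = c("IHS4", "IHS5"),
  raw_name = c("hhsize", "hh_size"),
  harmonised_name = c("hh_size", "hh_size"),
  stringsAsFactors = FALSE
)

# Rename the columns of `data` that have a mapping for the given round;
# columns without a mapping keep their original names
rename_with_crosswalk <- function(data, crosswalk, round) {
  cw <- crosswalk[crosswalk$round == round, ]
  idx <- match(names(data), cw$raw_name)  # NA where no mapping exists
  mapped <- !is.na(idx)
  names(data)[mapped] <- cw$harmonised_name[idx[mapped]]
  data
}

# Invented IHS4-style data with the round-specific name `hhsize`
toy_ihs4 <- data.frame(case_id = c("A1", "A2"), hhsize = c(4, 6))
names(rename_with_crosswalk(toy_ihs4, toy_crosswalk, "IHS4"))
```

Unmapped columns such as case_id pass through unchanged, so the sketch never drops data.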

You can inspect the crosswalk programmatically:

library(ihsMW)

# Read the crosswalk
crosswalk <- read.csv(system.file("extdata", "ihs_crosswalk.csv", package = "ihsMW"))
head(crosswalk)

3. Basic Harmonisation

Use ihs_harmonise() to standardise column names in raw dataframes:

library(haven)

# Load IHS4 raw household module
ihs4_raw <- read_dta("path/to/IHS4/hh_mod_a_filt.dta")

# Harmonise to standard names
ihs4_clean <- ihs_harmonise(ihs4_raw, round = "IHS4")

4. Multi-round Panel Work

To compile a cross-round dataset for longitudinal analysis, load each round, harmonise it separately, and then bind the harmonised data frames together:

library(dplyr)

# Load and harmonise IHS4
ihs4_raw <- read_dta("path/to/IHS4/hh_mod_a_filt.dta")
ihs4_harm <- ihs_harmonise(ihs4_raw, round = "IHS4")

# Load and harmonise IHS5
ihs5_raw <- read_dta("path/to/IHS5/hh_mod_a_filt.dta")
ihs5_harm <- ihs_harmonise(ihs5_raw, round = "IHS5")

# Bind rows - ihs_harmonise adds an `ihs_round` column automatically
pooled_data <- bind_rows(ihs4_harm, ihs5_harm)
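
Before pooling, it can help to check which harmonised columns the rounds share, because bind_rows() fills a column that exists in only one round with NA for the other. The following is a minimal base-R sketch on toy data frames; all values are invented, and region is a made-up column used only to show a mismatch.

```r
# Toy stand-ins for two harmonised rounds; values are invented
toy4 <- data.frame(case_id = c("A1", "A2"), hh_size = c(4, 6),
                   ihs_round = "IHS4")
toy5 <- data.frame(case_id = c("B1", "B2"), hh_size = c(3, 5),
                   region = c("North", "South"), ihs_round = "IHS5")

# Columns available in both rounds
shared_cols <- intersect(names(toy4), names(toy5))

# Columns that exist in only one round
only_in_4 <- setdiff(names(toy4), names(toy5))
only_in_5 <- setdiff(names(toy5), names(toy4))

# Pool only the shared columns so every column is populated in every round
toy_pooled <- rbind(toy4[shared_cols], toy5[shared_cols])
table(toy_pooled$ihs_round)
```

Restricting the pooled data to shared columns is one design choice among several. Keeping all columns and handling the resulting NA values explicitly is equally valid when a variable is only needed for one round.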

5. Merging Modules

Within a single survey round, information is split across multiple modules (e.g., household demographics, agriculture, food consumption). Use ihs_merge() to merge these dataframes:

# Load household demographics and crop harvest modules
hh_demog <- read_dta("path/to/IHS5/hh_mod_a_filt.dta") |> ihs_harmonise(round = "IHS5")
hh_agri <- read_dta("path/to/IHS5/ag_mod_i.dta") |> ihs_harmonise(round = "IHS5")

# Merge modules - automatically detects common ID columns (e.g., case_id)
merged_data <- ihs_merge(hh_demog, hh_agri)

ihs_merge() checks whether the join expands the row count unexpectedly and issues a warning when a many-to-many join occurs.
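
Row expansion can be illustrated with a small base-R sketch that is independent of ihs_merge(). The data are invented, case_id is assumed to be the join key, and key_is_unique() is a hypothetical helper rather than part of the package.

```r
# Household-level table: one row per case_id
toy_hh <- data.frame(case_id = c("A1", "A2"), hh_size = c(4, 6))

# Crop-level table: several rows per case_id (one per crop)
toy_crops <- data.frame(
  case_id = c("A1", "A1", "A2"),
  crop = c("maize", "groundnut", "maize")
)

# TRUE when the columns in `key` uniquely identify the rows of `data`
key_is_unique <- function(data, key) {
  anyDuplicated(data[key]) == 0
}

key_is_unique(toy_hh, "case_id")     # household table
key_is_unique(toy_crops, "case_id")  # crop table

# One-to-many join: the result has one row per crop record
toy_merged <- merge(toy_hh, toy_crops, by = "case_id")
nrow(toy_merged)
```

When the key is unique in at least one table, the merged row count cannot exceed that of the other table. If the key were duplicated in both tables, the join would be many-to-many and could produce more rows than either input.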

6. Price Deflation

When comparing monetary values (e.g., household consumption, crop sales) across different years, nominal values must be deflated to account for inflation. ihs_deflate() uses bundled Malawi Consumer Price Index (CPI) data to convert nominal values to real values, with 2019 (IHS5 baseline) as the default reference year:

# Deflate expenditure variables to 2019 real prices
real_data <- ihs_deflate(
  data = pooled_data,
  value_cols = c("consumption_nominal", "food_exp_nominal")
)
# This creates new columns: `consumption_nominal_real` and `food_exp_nominal_real`
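
The arithmetic behind deflation is that a real value equals the nominal value multiplied by the ratio of the reference-year CPI to the survey-year CPI. The base-R sketch below shows that calculation with made-up index numbers. They are not the bundled CPI figures, deflate() is a hypothetical helper, and the internals of ihs_deflate() may differ.

```r
# Hypothetical CPI index values, for illustration only
toy_cpi <- c("2016" = 80, "2019" = 100)

# Convert nominal values observed in `year` to reference-year prices
deflate <- function(nominal, year, cpi, ref_year = "2019") {
  nominal * cpi[[ref_year]] / cpi[[as.character(year)]]
}

# Nominal 2016 values expressed in 2019 prices
deflate(c(800, 1600), year = 2016, cpi = toy_cpi)
```

Values already measured in the reference year are multiplied by a ratio of one and stay unchanged.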

7. Quality Checks

To verify that the crosswalk mappings are valid and to see how many variables are mapped successfully, use ihs_crosswalk_check():

ihs_crosswalk_check()

Additionally, ihs_panel_ids() returns the standard ID columns (e.g., household ID, individual ID, enumeration area ID, strata, weights) for a given round, which helps when constructing panel keys or verifying the survey design structure:

# Get standard ID columns for IHS5
ihs_panel_ids("IHS5")
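
Once the ID columns are known, two quick checks are whether they are all present in the data and whether together they identify each row uniquely. The sketch below uses base R on invented data. It assumes the ID columns are available as a character vector of names, which this vignette does not confirm as the return type of ihs_panel_ids(), and the names case_id and ea_id are placeholders.

```r
# Hypothetical ID column names; substitute the names reported for your round
toy_ids <- c("case_id", "ea_id")

# Invented household-level data
toy_data <- data.frame(
  case_id = c("A1", "A2", "A3"),
  ea_id = c("E1", "E1", "E2"),
  hh_size = c(4, 6, 3)
)

# ID columns that are expected but absent from the data
missing_ids <- setdiff(toy_ids, names(toy_data))

# Number of rows that repeat an earlier combination of ID values
n_dup_keys <- sum(duplicated(toy_data[toy_ids]))

missing_ids
n_dup_keys
```

An empty missing_ids and a zero n_dup_keys suggest the columns can serve as a key for this table.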