Comparing indicators across multiple survey rounds is essential for
longitudinal policy evaluation. However, structural differences across
the survey instruments pose a major barrier. This vignette walks through
how ihsMW streamlines cross-round harmonisation and
analysis.
1. The Harmonisation Challenge
Over the years, the Malawi Integrated Household Survey (IHS) questionnaires have evolved. Variables are added, retired, renamed, or relocated to different modules.
For instance, household size is stored as hhsize in some files and as
hh_size in others, and elsewhere is represented only by variables that
count household members. Similarly, nominal
consumption expenditure variables and agricultural crop names frequently
change, making direct cross-round comparisons error-prone and
tedious.
2. The Crosswalk
To resolve this, ihsMW bundles a static crosswalk
database containing mappings for over 5,800 variables across IHS2, IHS3,
IHS4, and IHS5. It acts as a translation layer, mapping round-specific
variable names to consistent, harmonised names.
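Conceptually, a crosswalk lookup renames every raw column that has an entry for the given round and leaves the rest alone. The sketch below illustrates the idea in base R with a two-row toy table. The toy_crosswalk table, its column names, the harmonised name household_size, and the rename_with_crosswalk() helper are all hypothetical; they do not reflect the layout of the bundled crosswalk or the internals of ihs_harmonise().

```r
# Toy crosswalk: one row per (round, raw variable name) pair.
# All names and values here are made up for illustration.
toy_crosswalk <- data.frame(
  round      = c("IHS4", "IHS5"),
  raw_name   = c("hhsize", "hh_size"),
  harmonised = c("household_size", "household_size"),
  stringsAsFactors = FALSE
)

# Rename the columns of `data` that have a mapping for `round`;
# columns without a mapping keep their original names.
rename_with_crosswalk <- function(data, round, crosswalk) {
  map <- crosswalk[crosswalk$round == round, ]
  idx <- match(names(data), map$raw_name)
  found <- !is.na(idx)
  names(data)[found] <- map$harmonised[idx[found]]
  data
}

ihs4_toy <- data.frame(case_id = c("a1", "a2"), hhsize = c(4, 6))
ihs5_toy <- data.frame(case_id = c("b1", "b2"), hh_size = c(3, 5))

ihs4_renamed <- rename_with_crosswalk(ihs4_toy, "IHS4", toy_crosswalk)
ihs5_renamed <- rename_with_crosswalk(ihs5_toy, "IHS5", toy_crosswalk)

# Both data frames now share the same column names.
names(ihs4_renamed)
names(ihs5_renamed)
```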
You can inspect the crosswalk programmatically:
library(ihsMW)
# Read the crosswalk
crosswalk <- read.csv(system.file("extdata", "ihs_crosswalk.csv", package = "ihsMW"))
head(crosswalk)
3. Basic Harmonisation
Use ihs_harmonise() to standardise column names in raw
dataframes:
library(haven)
# Load IHS4 raw household module
ihs4_raw <- read_dta("path/to/IHS4/hh_mod_a_filt.dta")
# Harmonise to standard names
ihs4_clean <- ihs_harmonise(ihs4_raw, round = "IHS4")
4. Multi-round Panel Work
To compile a cross-round dataset for longitudinal analysis, load the data from each round, harmonise them separately, and bind them together:
library(dplyr)
# Load and harmonise IHS4
ihs4_raw <- read_dta("path/to/IHS4/hh_mod_a_filt.dta")
ihs4_harm <- ihs_harmonise(ihs4_raw, round = "IHS4")
# Load and harmonise IHS5
ihs5_raw <- read_dta("path/to/IHS5/hh_mod_a_filt.dta")
ihs5_harm <- ihs_harmonise(ihs5_raw, round = "IHS5")
# Bind rows - ihs_harmonise adds an `ihs_round` column automatically
pooled_data <- bind_rows(ihs4_harm, ihs5_harm)
5. Merging Modules
Within a single survey round, information is split across multiple
modules (e.g., household demographics, agriculture, food consumption).
Use ihs_merge() to merge these dataframes:
# Load household demographics and crop harvest modules
hh_demog <- read_dta("path/to/IHS5/hh_mod_a_filt.dta") |> ihs_harmonise(round = "IHS5")
hh_agri <- read_dta("path/to/IHS5/ag_mod_i.dta") |> ihs_harmonise(round = "IHS5")
# Merge modules - automatically detects common ID columns (e.g., case_id)
merged_data <- ihs_merge(hh_demog, hh_agri)
ihs_merge() checks whether the join causes unexpected row expansion
and issues a warning if a many-to-many join occurs.
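To make the row-expansion check concrete, here is a minimal base R sketch of the same idea. It is not the ihs_merge() implementation: merge_checked() and the toy data are hypothetical, and the check simply looks for duplicated keys on both sides before calling merge().

```r
# Merge two data frames on `by`, warning when the key is duplicated
# on BOTH sides (a many-to-many join, which multiplies rows).
merge_checked <- function(x, y, by) {
  dup_x <- anyDuplicated(x[by]) > 0
  dup_y <- anyDuplicated(y[by]) > 0
  if (dup_x && dup_y) {
    warning("Many-to-many join on: ", paste(by, collapse = ", "))
  }
  merge(x, y, by = by, all.x = TRUE)
}

# One row per household; several crop rows per household (toy values).
hh <- data.frame(case_id = c("h1", "h2"), region = c("North", "South"))
crops <- data.frame(
  case_id = c("h1", "h1", "h2"),
  crop    = c("maize", "tobacco", "maize")
)

# One-to-many: expected expansion to one row per crop record, no warning.
one_to_many <- merge_checked(hh, crops, by = "case_id")
nrow(one_to_many)

# Duplicate a household row to force a many-to-many join, then capture
# the warning message instead of letting it print.
hh_dup <- rbind(hh, hh[1, ])
msg <- tryCatch(
  {
    merge_checked(hh_dup, crops, by = "case_id")
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
msg
```

A one-to-many join (household to crop records) is usually intended, so the sketch only warns when both sides repeat the key.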
6. Price Deflation
When comparing monetary values (e.g., household consumption, crop
sales) across different years, nominal values must be deflated to
account for inflation. ihs_deflate() uses bundled Malawi
Consumer Price Index (CPI) data to convert nominal values to real
values, with 2019 (IHS5 baseline) as the default reference year:
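The arithmetic behind deflation is real = nominal × CPI(reference year) / CPI(survey year). The base R sketch below works through it with made-up index values. The toy_cpi table and the deflate_to_base() helper are hypothetical and not part of ihsMW, and the figures are not real Malawi CPI data.

```r
# Hypothetical CPI index by year (NOT real Malawi CPI figures).
toy_cpi <- data.frame(year = c(2016, 2019), cpi = c(80, 100))

# Convert nominal values to base-year prices:
# real = nominal * CPI(base_year) / CPI(year of observation)
deflate_to_base <- function(value, year, cpi_table, base_year = 2019) {
  cpi_year <- cpi_table$cpi[match(year, cpi_table$year)]
  cpi_base <- cpi_table$cpi[cpi_table$year == base_year]
  value * cpi_base / cpi_year
}

toy <- data.frame(
  year                = c(2016, 2019),
  consumption_nominal = c(400, 500)
)

toy$consumption_nominal_real <- deflate_to_base(
  toy$consumption_nominal, toy$year, toy_cpi
)

# 400 * 100 / 80 = 500 and 500 * 100 / 100 = 500, so the two
# households have equal real consumption despite different nominal values.
toy
```

With the package, the same step on the pooled data looks like this: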
# Deflate expenditure variables to 2019 real prices
real_data <- ihs_deflate(
data = pooled_data,
value_cols = c("consumption_nominal", "food_exp_nominal")
)
# This creates new columns: `consumption_nominal_real` and `food_exp_nominal_real`
7. Quality Checks
To verify that the crosswalk mappings are valid and to check how many
variables are successfully mapped, use
ihs_crosswalk_check().
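As a rough illustration of what a mapping-coverage check involves, the base R sketch below compares a set of raw column names against a toy crosswalk and reports the share that has a mapping. It does not call ihs_crosswalk_check(), whose arguments and output are not shown in this vignette; toy_crosswalk, its columns, and the variable names are hypothetical.

```r
# Toy crosswalk for one round (made-up names, not the bundled database).
toy_crosswalk <- data.frame(
  raw_name   = c("hhsize", "reside"),
  harmonised = c("household_size", "urban_rural"),
  stringsAsFactors = FALSE
)

# Column names as they might appear in a raw module.
raw_cols <- c("case_id", "hhsize", "reside", "extra_var")

# Which raw columns have a crosswalk entry?
mapped <- raw_cols %in% toy_crosswalk$raw_name

coverage <- mean(mapped)        # share of columns with a mapping
unmapped <- raw_cols[!mapped]   # columns that would keep their raw names

coverage
unmapped
```

Listing the unmapped columns is often more useful than the coverage share alone, because ID columns are expected to pass through unchanged while other gaps may need attention.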
Additionally, ihs_panel_ids() returns the standard ID
columns (e.g. household ID, individual ID, enumeration area ID, strata,
weights) for a given round to help you construct panel keys or verify
design structures:
# Get standard ID columns for IHS5
ihs_panel_ids("IHS5")
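Once the ID columns are known, a common first check is that they uniquely identify rows in the pooled data. The sketch below does this in base R on a toy data frame. It assumes the key is the pair ihs_round and case_id (both names appear earlier in this vignette); the actual columns returned by ihs_panel_ids() may differ, so treat key_cols as a placeholder.

```r
# Toy pooled data: two rounds, household IDs made up for illustration.
pooled_toy <- data.frame(
  ihs_round = c("IHS4", "IHS4", "IHS5", "IHS5"),
  case_id   = c("h1", "h2", "h1", "h3"),
  stringsAsFactors = FALSE
)

# Assumed key columns; replace with the ones relevant to your round.
key_cols <- c("ihs_round", "case_id")

# anyDuplicated() returns 0 when every key combination is unique.
key_is_unique <- anyDuplicated(pooled_toy[key_cols]) == 0

# Build a single composite key column for later joins.
pooled_toy$panel_key <- do.call(paste, c(pooled_toy[key_cols], sep = "_"))

key_is_unique
pooled_toy$panel_key
```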