The ihsMW package is a toolkit for cleaning, harmonising, and
aggregating household survey data from the Malawi
Integrated Household Survey (IHS) series. It supports
researchers and analysts working with the IHS2 (2004/05), IHS3
(2010/11), IHS4 (2016/17), and IHS5 (2019/20) datasets.
1. Installation
You can install the stable release of ihsMW from
CRAN:
install.packages("ihsMW")
Or install the development version from GitHub:
# Install using pak
pak::pak("vituk123/ihsMW")
# Or using remotes
remotes::install_github("vituk123/ihsMW")
2. The IHS Landscape
The Malawi National Statistical Office (NSO) conducts the Integrated Household Survey (IHS) periodically to track poverty, household expenditure, agriculture, and other socio-economic indicators. The primary rounds include:
- IHS2: 2004–2005
- IHS3: 2010–2011
- IHS4: 2016–2017
- IHS5: 2019–2020
Due to licensing restrictions, the raw microdata cannot be
redistributed directly within R packages. Researchers must first
register and manually download the survey data in Stata
(.dta) format from the World Bank Microdata
Library.
Once downloaded, place the files in a structured folder hierarchy on your local machine.
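As a minimal sketch of one possible layout, the base-R snippet below creates one sub-folder per survey round under a common root and then lists any Stata files found in each. The root path (`ihs_root`, placed in a temporary directory here) and the folder names are illustrative assumptions; ihsMW may not require this particular structure.

```r
# Hypothetical layout: one sub-folder per survey round under a common root.
# The root path and folder names are illustrative, not mandated by ihsMW.
ihs_root <- file.path(tempdir(), "ihs_data")
rounds <- c("IHS2", "IHS3", "IHS4", "IHS5")

round_dirs <- file.path(ihs_root, rounds)
for (d in round_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# After copying the downloaded files in, list the .dta files found per round
dta_files <- lapply(round_dirs, list.files, pattern = "\\.dta$", full.names = TRUE)
names(dta_files) <- rounds
```

Keeping each round in its own folder makes it harder to mix up files from different rounds that share a name.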
3. Loading and Harmonising
Each round of the IHS uses different variable names for the same
question. For example, household size is recorded under different column
names depending on the round. ihsMW uses a comprehensive
crosswalk to harmonise these variable names.
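The idea behind a crosswalk can be illustrated with a small base-R sketch. Everything in it is hypothetical: the `crosswalk` table, the raw column names, and the helper `harmonise_names()` are invented for illustration and are not the package's internal crosswalk or API.

```r
# Hypothetical crosswalk: one row per (round, raw name) pair.
# The raw column names below are invented for illustration only.
crosswalk <- data.frame(
  round      = c("IHS4", "IHS4", "IHS5", "IHS5"),
  raw_name   = c("hh_sz", "rgn", "hhsize", "region_cd"),
  harmonised = c("hh_size", "region", "hh_size", "region"),
  stringsAsFactors = FALSE
)

# Rename the columns of `data` using the crosswalk rows for one round
harmonise_names <- function(data, round, crosswalk) {
  cw <- crosswalk[crosswalk$round == round, ]
  idx <- match(names(data), cw$raw_name)
  # Rename only columns found in the crosswalk; leave the rest untouched
  names(data)[!is.na(idx)] <- cw$harmonised[idx[!is.na(idx)]]
  data
}

raw_ihs5 <- data.frame(case_id = c("a", "b"), hhsize = c(4, 6), region_cd = c(1, 3))
harmonised <- harmonise_names(raw_ihs5, "IHS5", crosswalk)
names(harmonised)
```

Once every round is renamed this way, the same downstream code can refer to `hh_size` regardless of which round the data came from.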
To load and harmonise a raw survey file:
4. Searching Variables
To find variables mapped in the crosswalk, use
ihs_search(). You can search by keywords or labels:
# Search for consumption-related variables
ihs_search("consumption")
# Search for age within a specific round
ihs_search("age", round = "IHS5")
To view a summary of the crosswalk coverage and flag variables
needing review, use ihs_crosswalk_check():
5. Data Cleaning
ihsMW provides tools to clean standard survey anomalies,
handle missing value codes, and winsorize extreme values:
# Convert standard survey missing codes (-99, -98, etc.) to NA
df_clean <- ihs_standardize_missing(harmonised_data)
# Winsorize outliers (e.g. food expenditure) stratified by urban/rural
df_winsor <- ihs_winsorize(df_clean, value_col = "food_exp", strata_col = "urban")
# Run the master cleaning wrapper which applies both steps and logs changes
df_cleaned <- ihs_clean(
  data = harmonised_data,
  missing_cols = c("food_exp", "nonfood_exp"),
  winsorize_cols = "food_exp",
  strata_col = "urban"
)
6. Unit Conversion
Agricultural modules in the IHS allow households to report harvest
quantities in non-standard units (e.g., pails, basins, ox-carts, bags)
rather than kilograms. ihsMW bundles official NSO
conversion factors to convert these quantities to
kilograms:
# Convert quantities reported in non-standard units to kilograms
crop_data <- data.frame(
  crop_code = c(1, 2),
  unit_code = c(3, 4),
  quantity = c(10, 5),
  region = c(1, 2)
)
crop_data_kg <- ihs_convert_units(
  data = crop_data,
  crop_col = "crop_code",
  unit_col = "unit_code",
  qty_col = "quantity",
  region_col = "region"
)
7. Aggregation
To aggregate member-level or agricultural plot-level data up to the
household level, use ihs_aggregate():
# Aggregate individual-level education to household level
hh_edu <- ihs_aggregate(
  data = member_data,
  id_cols = "case_id",
  val_cols = c("years_education", "completed_primary")
)
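For intuition about what household-level aggregation involves, the sketch below collapses toy member-level data with base R's `aggregate()`. The data values are made up, and the use of the mean is an assumption: the summary statistic that ihs_aggregate() applies is not stated above, so treat this as an illustration of the concept, not of the package's behaviour.

```r
# Toy member-level data; column names mirror the example above
member_data <- data.frame(
  case_id = c("hh1", "hh1", "hh2"),
  years_education = c(8, 12, 5),
  completed_primary = c(1, 1, 0)
)

# Collapse to one row per household. The mean is an assumed summary;
# it is not necessarily what ihs_aggregate() computes.
hh_edu_sketch <- aggregate(
  cbind(years_education, completed_primary) ~ case_id,
  data = member_data,
  FUN = mean
)
```

With the mean as the summary, `completed_primary` becomes the share of household members who completed primary school.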