This page explores metric distributions before any transformations. Use this to inform how we might want to deal with outliers or normalize data at the metric level.
1 Distributions
Highly skewed distributions might be good candidates for Box-Cox transformations or Winsorization. The figure below shows metrics with a skew > 2 in red, while those with a skew < 2 are in blue.
Code
pacman::p_load( ggplot2, purrr, ggpubr, psych, tibble)# Load metrics_df (created in refined_framework page)metrics_df <-readRDS('data/metrics_df.rds')# Get skews of variablesskewed <- psych::describe(metrics_df[, -1]) %>%as.data.frame() %>%rownames_to_column('variable_name') %>% dplyr::select(variable_name, skew) %>% dplyr::filter(abs(skew) >2) %>%pull(variable_name)plots <-map(names(metrics_df)[-1], \(var){# color based on skewnessif (var %in% skewed) { fill <-'red' color <-'darkred' } else { fill <-'lightblue' color <-'royalblue' }# Make plot for variable metrics_df %>%ggplot(aes(x =!!sym(var))) +geom_density(fill = fill,color = color,alpha =0.5 ) +theme_classic() +theme(plot.margin =unit(c(rep(0.5, 4)), 'cm'))}) # Arrange them in 4 columnsggarrange(plotlist = plots,ncol =4,nrow =33)
Distributions of metrics at the state level.
It seems most of our metrics fall along respectable somewhat-normal distributions. 29 metrics are skewed out of 130 total. They include several variables related to local farm economies (agrotourism sales as a percentage of total sales, direct to consumer sales as a percentage of total sales, and value added sales as a percentage of total sales), as well as a couple of the TreeMap 2016 variables (dead standing carbon and live trees) and GHG emissions from agriculture (CH4 and CO2, with an honorable mention for N2O). Just about the whole collection of ERS metrics are skewed, including imports and exports, indemnities, and real estate expenses.
There might be a case for transformations here, or it might make more sense to do it at the indicator level. Another option is to weight our metrics to take into account population, number of farms, acres of farmland, GDP, or some other appropriate variable for each metric. Béné et al. (2019) used Box Cox transformations for highly skewed indicators before normalizing all indicators with Min Max transformations.
Béné, Christophe, Steven D. Prager, Harold A. E. Achicanoy, Patricia Alvarez Toro, Lea Lamotte, Camila Bonilla, and Brendan R. Mapes. 2019. “Global Map and Indicators of Food System Sustainability.”Scientific Data 6 (1): 279. https://doi.org/10.1038/s41597-019-0301-5.