
November 3, 2008


Composite Measures: Matching the Method to the Purpose

By: Michael Shwartz, PhD and Arlene S. Ash, PhD

A composite score derived from a comprehensive set of individual health care quality indicators could be useful, especially if it reflected the extent to which management has created a quality culture, one with processes and behaviors that result in consistently-delivered high quality patient care. Such a score could assist third-party payers, consumers and others in comparing providers, and help policy makers in designing and evaluating programs that provide a continuum of high quality, cost-efficient care.

Traditional psychometric theory suggests that scales (in our case, a composite measure of quality) should be constructed from individual indicators that are highly correlated in order to increase validity of the scale's representation of the underlying construct. In the case of quality measures, however, individual indicators are not highly correlated. For example, in their analysis of the Hospital Quality Alliance process of care measures, Jha et al.(1) found that "a high quality of care for acute myocardial infarction closely predicted a high quality of care for congestive heart failure but not for pneumonia." Berlowitz et al.,(2) focusing on risk-adjusted rates of pressure ulcer development, functional decline, behavioral decline, and mortality in nursing homes, concluded that "nursing homes performing well on one outcome may perform poorly on other outcomes" and noted that "other studies examining nursing home outcomes have also found a poor correlation between different quality measures." Gandhi et al.,(3) after examining multiple quality measures in ambulatory care facilities, concluded "Sites that did well on one set of measures often did not do well on other measures. This was reflected in ... the nonsignificant correlations between scores for performance on HEDIS-like measures ... patient satisfaction and compliance with asthma and diabetes guidelines." The absence of strong correlation, particularly across quality domains such as patient satisfaction and adherence to evidence-based guidelines, leads some to caution against combining indicators into a single scale or composite score. However, the literature on the relationship between a construct and its indicators suggests that this conclusion may be misguided.

A construct, such as a quality culture, can be viewed as the cause of individual indicators, that is, the quality indicators (e.g., percent adherence to a process measure) reflect or manifest the extent to which the organization has achieved a quality culture. Alternatively, the construct can be viewed as formed from the indicators, usually by taking a weighted or unweighted average. The first type of relationship is called reflective and the second formative.(4)

These construct types can be distinguished in several ways.(5,6) A reflective construct exists independent of its indicators. Educational testing, in which a student's underlying ability in a certain area, such as math ability, is reflected in the answers to a series of test questions, is a classic example of a reflective construct. Other examples include personal traits like attitudes and personality that exist independent of the specific questions asked in order to obtain insight about them. In contrast, a formative scale is defined by its specific indicators. For example, a person's socio-economic status (SES) is measured by a combination of education, income, occupation and residence. If any one of these measures increases, so does SES. However, a person's SES does not exist independently of the indicators used to measure it. Another example is the Human Development Index (HDI), introduced by the United Nations Development Programme. The HDI, which combines indicators of longevity, knowledge and standard of living, provides a broader perspective on a country's development than a simple economic measure such as per capita Gross Domestic Product. In assessing health care, quality might be considered either a reflective or a formative construct. The literature on high performing health care organizations often implies a reflective construct, in the sense that an underlying quality culture is reflected in everything that the organization does. In contrast, quality of care scales created by combining indicators like access, patient satisfaction, and adherence to clinical guidelines are best thought of as formative scales.

In a reflective scale, we expect the indicators to be positively correlated. As a result, indicators can be interchanged with others that reflect the same construct without changing the construct's meaning. Indeed, one often searches for a relatively small set of relevant indicators to adequately reflect the underlying construct without sacrificing validity. In a formative model, the indicators define the construct. As a result, one wants a sufficiently broad set of indicators to capture all domains of interest. Correlation is not a concern. However, whereas a reflective score is intended to measure a single, underlying reality, a formative score needs to be disaggregated into its components in order to understand the information it contains.

A reflective construct assumes that causality flows from the construct to the indicators, while in a formative model causality flows from the indicators to the construct. Thus, for example, people score well on various tests of strength because they are strong, but "they do not become wealthy or educated because they are of high socio-economic status."(7) Methodologically, factor analysis, which attempts to identify underlying constructs based on the nature of correlation among the variables, assumes a reflective model; principal components analysis, which combines indicators that are not necessarily correlated so as to explain as much variation in the data as possible, assumes a formative model. Since reflective indicators are expected to be highly correlated, the reliability of reflective composite measures can be assessed empirically with measures such as factor loadings, communalities, and Cronbach's alpha. There are no comparable criteria for assessing the reliability of formative constructs. Formative constructs depend principally on face validity: do they make sense to users?
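To illustrate the empirical check mentioned above, the following is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The function name, the input layout (one row per provider, one column per indicator) and the toy data are assumptions made for illustration; they do not come from any of the studies cited here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of indicator scores.

    rows: list of equal-length lists, one row per provider (hypothetical
    layout), one column per indicator. Uses sample variances throughout.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two indicators")
    if any(len(row) != k for row in rows):
        raise ValueError("all rows must have the same number of indicators")

    # Variance of each indicator taken on its own.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed (unweighted) scale score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (illustrative only): three indicators that move together.
correlated = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data (illustrative only): two indicators that are uncorrelated.
uncorrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]

print(round(cronbach_alpha(correlated), 3))
print(round(cronbach_alpha(uncorrelated), 3))
```

In this sketch the perfectly correlated indicators give an alpha of 1 and the uncorrelated ones an alpha of 0. A low alpha of this kind is what one would expect from weakly correlated quality indicators, and under a formative model it is not in itself evidence against the composite.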

The direction of causality has important implications for the nature of interventions.(8) If quality is hypothesized as a reflective construct, the goal of interventions should be movement toward a quality culture, one from which broad benefits flow naturally and are reflected in the quality indicators. Overemphasis on interventions focused solely on individual indicators may divert resources from more useful activities, in the same sense that in education "teaching to the test" may cause important areas of education to be neglected. However, if designed correctly, improvement programs targeted to individual indicators can be used as a catalyst for more broad-based change.(9) If quality is hypothesized as a formative construct, interventions should be focused on specific areas related to the quality indicators that make up the construct, without necessarily expecting that interventions in one domain (e.g., access) will have an impact on other domains (e.g., adherence to clinical guidelines). As noted, having a broad set of indicators is important so that no major areas of the domain escape management attention.

Indicators must be combined to create a composite score. Although established analytical techniques exist (e.g., factor analysis and latent variable models) for creating reflective scales from individual indicators, there is rarely an objective basis for assigning the weights needed to calculate a formative scale from individual indicators. To create a formative scale of quality, one would like to assign weights in proportion to the impact of each indicator on patient health. However, there is no empirical basis for doing this. As a default, equal weights are often used. For example, the HDI gives equal weights to component indices measuring longevity, knowledge and standard of living. Other times, differential weights reflect the judgment of policy makers, perhaps advised by expert panels. For example, the Veterans Health Administration weights the three dimensions of patient access, clinical quality, and patient satisfaction 30%, 50% and 20%, respectively. It is worth noting that although equal weights might have some intuitive appeal, they are no less arbitrary than alternatives.
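The weighting choice described above can be sketched as a simple weighted average. The example below is illustrative only: the 30/50/20 weights are those attributed to the Veterans Health Administration in the text, while the function name, the dimension labels and the provider scores are hypothetical values chosen to show how the same provider can receive different composite scores under different weighting schemes.

```python
def composite_score(scores, weights=None):
    """Weighted average of indicator scores (a formative composite).

    scores:  dict mapping dimension name -> score on a common 0-1 scale.
    weights: dict mapping dimension name -> non-negative weight.
             If omitted, every dimension gets equal weight.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    if set(weights) != set(scores):
        raise ValueError("weights and scores must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalize so the weights need not sum to 1.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical provider scores on a 0-1 scale (not real data).
provider = {"access": 0.80, "clinical_quality": 0.90, "satisfaction": 0.70}

# Weights as described in the text for the Veterans Health Administration.
vha_weights = {"access": 0.30, "clinical_quality": 0.50, "satisfaction": 0.20}

print(round(composite_score(provider, vha_weights), 2))  # differential weights
print(round(composite_score(provider), 2))               # equal weights
```

With these made-up scores, the differential weights yield about 0.83 and equal weights yield 0.80. Neither number is more correct than the other; the difference comes entirely from the judgment embedded in the weights, which is why a formative composite is best reported alongside its components.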

In the clinical area, Feinstein(10) makes the same distinction in the nature of constructs, though using different names: "clinimetric" (formative) versus "psychometric or statistical" (reflective) approaches for converting an "intangible clinical phenomenon into a formally specified measurement." He cites the Apgar score (a construct measuring the condition of a newborn baby) to illustrate clinimetrics. Its components were selected using clinical judgment rather than statistical criteria. Dr. Apgar attempted to capture a range of factors that were individually and collectively important to the health status of newborns. Her goal was not to select a set of intercorrelated variables. As Feinstein notes: "Regardless of the intrinsic psychometric results, however, the rejection of diversity is a direct contradiction of clinical goals in forming composite indexes and rating scales. A clinician wants to combine different elements, not different expressions for essentially the same thing." To create the composite, equal weights were assigned to each indicator. The simplicity and transparency of this choice enhanced its popularity and contributed to its face validity: "Clinicians accustomed to implicitly rating the condition of newborn babies promptly recognized that the explicit score had the excellent face validity of clinical common sense."

Like Feinstein, we do not claim that formative composite measures of quality are superior. Both reflective and formative composite measures have a useful role to play. However, it is important to understand both types, and the distinction between them, when evaluating composite measures and designing interventions to improve quality.


Michael Shwartz, PhD
Boston University, MA

Arlene S. Ash, PhD
Boston University, MA


The views and opinions expressed are those of the author and do not necessarily state or reflect those of the National Quality Measures Clearinghouse™ (NQMC), the Agency for Healthcare Research and Quality (AHRQ), or its contractor, ECRI Institute.

Potential Conflicts of Interest

Dr. Shwartz is a deputy editor of Medical Care. He also has business/professional affiliations with Boston University and the VA.

Drs. Shwartz and Ash declared no potential financial conflicts of interest with respect to this commentary.


  1. Jha AK, Li Z, Orav EJ, et al. Care in U.S. hospitals - The Hospital Quality Alliance Program. New England Journal of Medicine. 2005; 353: 265-274.
  2. Berlowitz DR, Rosen AK, Wang F, et al. Purchasing or providing nursing home care: Can quality of care data provide guidance? Journal of the American Geriatrics Society. 2005; 53: 603-608.
  3. Gandhi TK, Cook EF, Puopolo AL, et al. Inconsistent report cards: Assessing the comparability of various measures of the quality of ambulatory care. Medical Care. 2002; 40: 155-165.
  4. Edwards JR and Bagozzi RP. On the nature and direction of relationships between constructs and measures. Psychological Methods. 2000; 5: 155-174.
  5. Jarvis CB, MacKenzie SB and Podsakoff PM. A critical review of construct indicators and measurement model misspecification in marketing and consumer research. Journal of Consumer Research. 2003; 30: 199-218.
  6. Coltman TR, Devinney TM, Midgley DF, et al. Formative versus reflective measurement models: Two applications of erroneous measurement. Forthcoming Journal of Business Research.
  7. Nunnally JC and Bernstein IH. Psychometric Theory. 3rd edition. New York: McGraw-Hill, p 449.
  8. Rhodes RE, Plotnikoff RC and Spence JC. Creating parsimony at the expense of precision? Conceptual and applied issues of aggregating belief-based constructs in physical activity research. Health Education Research. 2004; 19: 392-405.
  9. VanDeusen Lukas C, Holmes SK, Cohen AB, et al. Transformational change in health care systems: An organizational model. Health Care Management Review. 2007; 32: 309-320.
  10. Feinstein AR. Multi-item "instruments" vs. Virginia Apgar's principles of clinimetrics. Archives of Internal Medicine. 1999; 159: 125-128.