1 Introduction
Explaining the predictions of neural networks through higher-level concepts Kim et al. (2018); Ghorbani et al. (2019); Brocki and Chung (2019); Hamidi-Haines et al. (2018) enables model interpretation on data with complex manifold structure, such as images. It also allows the use of domain knowledge during the explanation process. Concept-based explanation has been used for medical imaging Cai et al. (2019), breast cancer histopathology Graziani et al. (2018), cardiac MRIs Clough et al. (2019), and meteorology Sprague et al. (2019). When the set of concepts is carefully selected, we can estimate a model in which the discriminative information flows from the feature vectors x through the concept vectors c and reaches the labels y. To this end, we train two models: one predicts the concept vector from the features, denoted by ĉ(x), and the other predicts the label from the predicted concept vector, denoted by ŷ(ĉ). This estimation process ensures that for each prediction we have the reasons for the prediction stated in terms of the predicted concept vector ĉ.
However, in reality, noise and confounding information (due to, e.g., non-discriminative context) can influence both the feature and concept vectors, resulting in confounded correlations between them. Figure 1 provides evidence for noise and confounding in the CUB-200-2011 dataset Wah et al. (2011). We train two predictors of the concept vectors, one based on the features x and one based on the labels y, and compare the Spearman correlation coefficients between their predictions and the true ordinal values of the concepts. Having concepts for which ĉ(y) is more accurate than ĉ(x) could be due to noise, or due to hidden variables, independent of the labels, that spuriously correlate x and c, leading to undesirable explanations that include confounding or noise.
In this work, using Concept Bottleneck Models (CBMs) Koh et al. (2020); Losch et al. (2019), we demonstrate a method for removing the confounding and noise from (i.e., debiasing) the explanations with concept vectors, and we extend the results to the Testing with Concept Activation Vectors (TCAV) Kim et al. (2018) technique. We provide a new causal prior graph that accounts for the confounding information and concept completeness Yeh et al. (2019). We describe the identifiability challenges in our causal prior graph and propose a two-stage estimation procedure. Our two-stage estimation technique defines and predicts debiased concepts such that the predictive information of the features maximally flows through them.
We show that, using the labels as instrumental variables, we can successfully remove the impact of the confounding and noise from the predicted concept vectors. The first stage of our proposed procedure has three steps: (1) debias the concept vectors using the labels, (2) predict the debiased concept vectors using the features, and (3) use the predicted concept vectors from the second step to predict the labels. In the second stage, we find the residual predictive information in the features that is not in the concepts. We validate the proposed method using a synthetic dataset and the CUB-200-2011 dataset.
2 Methodology
Notations.
We follow the notation of Goodfellow et al. (2016) and denote random vectors by bold font letters and their values by bold lowercase symbols. The notation P(x) denotes a probability measure and p(x)dx is the infinitesimal probability mass at x. We use ĉ(x) to denote the prediction of c given x. In the graphical models, we show the observed and unobserved variables using filled and hollow circles, respectively. To avoid clutter in the equations, without loss of generality, we state the relationships with additive noise.
Problem Statement.
We assume that during the training phase we are given (x, c, y) triplets for n data points. In addition to the regular features x and labels y, we are given a human-interpretable concept vector c for each data point. Each element of the concept vector measures the degree of existence of the corresponding concept in the features. Thus, the concept vector typically has binary or ordinal values. Our goal is to learn to predict y as a function of x and to use the concepts for explaining the predictions. We proceed in two steps: we first learn a function ĉ(x) and then learn another function ŷ(ĉ). The predicted concept vector ĉ is the explanation for our prediction ŷ. At test time, only the features are given, and the prediction+explanation algorithm predicts both ŷ and ĉ.
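The vanilla two-step CBM pipeline described above can be sketched on toy data. This is a minimal illustration, not the paper's implementation: the data-generating process is hypothetical and linear least-squares fits stand in for the neural networks ĉ(x) and ŷ(ĉ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: features x, concepts c, labels y (continuous here).
n, p, k = 500, 10, 3
x = rng.normal(size=(n, p))
A = rng.normal(size=(p, k))
c = x @ A + 0.1 * rng.normal(size=(n, k))   # concepts depend on features
w = rng.normal(size=k)
y = c @ w + 0.1 * rng.normal(size=n)        # labels depend on concepts

# Step 1: learn c_hat(x); a least-squares fit stands in for a network.
A_hat, *_ = np.linalg.lstsq(x, c, rcond=None)
c_hat = x @ A_hat

# Step 2: learn y_hat(c_hat) from the *predicted* concepts, so that the
# label predictor only sees information that passed through the bottleneck.
w_hat, *_ = np.linalg.lstsq(c_hat, y, rcond=None)
y_hat = c_hat @ w_hat

r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"label R^2 through the concept bottleneck: {r2:.2f}")
```

Because the label predictor consumes ĉ rather than x, each prediction ŷ comes with the concept values that produced it.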
2.1 A New Causal Prior Graph for CBMs
Figure 1(a) shows the ideal situation in explanation via high-level concepts. The generative model corresponding to Figure 1(a) states that, to generate each feature vector x, we first randomly draw the label y. Given the label, we draw the concepts c. Given the concepts, we draw the features. The hierarchy in this graph is from nodes with less detailed information (labels) to more detailed ones (features, images).
The model in Figure 1(a) is one explanation for the phenomenon in Figure 1, because the noise in the generation of the concepts allows the y–c edge to be stronger than the x–c edge. However, another (non-mutually exclusive) explanation for this phenomenon is the existence of hidden confounders, shown in Figure 1(b). In this graphical model, u represents the confounders and d represents the unconfounded concepts. Note that we assume that the confounders and labels are independent when x and c are not observed.
Another phenomenon captured in Figure 1(b) is the lack of concept completeness Yeh et al. (2019): the features, compared to the concepts, may have additional predictive information about the labels.
The nonlinear structural equations corresponding to the causal prior graph in Figure 1(b) are as follows:

d = g(y) + ε_d,   (1)
c = d + h(u) + ε_c,   (2)
x = f(d, y, u) + ε_x,   (3)

for some vector functions f, g, and h. We have u ⊥ y, and the noise vectors ε_d, ε_c, and ε_x are mutually independent and independent of (y, u). Our definition of d in Eq. (2) does not restrict c, because we simply attribute the difference between c and d to a function of the latent confounder u and noise.
Our causal prior graph in Figure 1(b) corresponds to a generative process in which, to generate an observed triplet, we first draw a label y and a confounder vector u independently. Then we draw the discriminative concepts d based on the label and generate the features x jointly based on the concepts, label, and confounder. Finally, we draw the observed concept vector c based on the drawn concept and confounder vectors.
Both causal graphs reflect our assumption that the direction of causality is from the labels to the concepts and then to the features, y → c → x, which ensures that u and y are marginally independent in Figure 1(b). This direction also corresponds to moving from more abstract class labels, to concepts, to detailed features. During estimation, we fit the functions in the x → c direction, because finding the statistical strength of an edge does not depend on its direction.
Estimation of the model in Figure 1(b) is challenging: because of the structure of the latent confounders, the model is unidentifiable (Pearl, 2009, Chapter 3). Our solution is to first ignore the direct y → x edge and estimate the y → d → x path, and then estimate the residuals of the regression using the y → x edge. Our two-stage estimation technique ensures that the predictive information of the features maximally flows through the concepts. In the next sections, we focus on the first stage and on using instrumental variables to eliminate the noise and confounding in the estimation of the d → x link.
2.2 Instrumental Variables
Background on Instrumental Variables.
In causal inference, instrumental variables Stock (2015); Pearl (2009), denoted by z, are commonly used to find the causal impact of a variable a on a variable b when a and b are jointly influenced by an unobserved confounder u. The key requirement is that z should be correlated with a but independent of the confounding variable (z ⊥ u). The commonly used two-stage least squares (2SLS) first regresses a on z to obtain â(z), followed by a regression of b on â(z). Because of the independence between z and u, â(z) is also independent of u. Thus, in the second regression, the confounding impact of u is eliminated. Our goal is to use the instrumental variable trick to remove the confounding factors impacting the feature and concept vectors.
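The 2SLS mechanics can be sketched numerically. In this hypothetical linear model, the true causal effect of a on b is 2.0, a naive regression is biased upward by the confounder u, and the two-stage procedure recovers the truth:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical linear model: u confounds both a and b; z is the instrument.
z = rng.normal(size=n)                 # correlated with a, independent of u
u = rng.normal(size=n)                 # unobserved confounder
a = z + u + 0.1 * rng.normal(size=n)
b = 2.0 * a + 3.0 * u + 0.1 * rng.normal(size=n)  # true effect of a on b: 2.0

# Naive regression of b on a absorbs the confounding path through u.
naive = np.dot(a, b) / np.dot(a, a)

# 2SLS: stage 1 regresses a on z; stage 2 regresses b on the stage-1 fit,
# which is independent of u by construction.
a_hat = z * (np.dot(z, a) / np.dot(z, z))
two_sls = np.dot(a_hat, b) / np.dot(a_hat, a_hat)

print(f"naive slope: {naive:.2f}, 2SLS slope: {two_sls:.2f}")
```

The naive slope lands well above 2.0 because u pushes a and b in the same direction, while the 2SLS slope concentrates around the causal effect.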
Instrumental Variables for CBMs.
In our causal graph in Figure 1(b), the label y is a valid instrument for the study of the relationship between the concepts c and the features x. We predict c as a function of y and use it in place of the concepts in the concept bottleneck models. The graphical model corresponding to this procedure is shown in Figure 1(c), where the u → c link is eliminated. In particular, given the independence relationship u ⊥ y, the prediction ĉ(y) is independent of the confounder u. This is the basis for our debiasing method in the next section.
2.3 The Estimation Method
Our estimation uses the observation that in the graph of Figure 1(b) the label vector y is a valid instrument for removing the correlations due to u. Combining Eqs. (1) and (2), we have c = g(y) + h(u) + ε_d + ε_c. Taking the expectation conditional on y, we obtain

d̂(y) ≜ E[c | y] = g(y) + E[h(u)] + E[ε_d + ε_c].   (4)

The last step holds because both u and the noise terms are independent of y. Thus, the last two terms are constant in y and can be eliminated after estimation. Eq. (4) allows us to remove the impact of u and ε_c and estimate the denoised and debiased d. We find E[c | y] using a neural network trained on (y, c) pairs and use its outputs d̂(y) as pseudo-observations in place of c. Given our debiased predictions for the discriminative concepts, we can perform the CBMs' two steps of estimating d̂ from x and ŷ from d̂.
Because we use expected values of c in place of c during the learning process (i.e., d̂(y) = E[c | y]), the debiased concept vectors have values within the range of the original concept vectors. Thus, we do not lose human readability with the debiased concept vectors.
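When the label is categorical, training a network on (y, c) pairs reduces to computing per-class averages of the observed concepts, since E[c | y] is constant within each class. The sketch below uses a hypothetical setup following Eq. (2), with a zero-mean confounder term, to show that the per-class average recovers the discriminative concepts:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, n_classes = 3000, 4, 5

# Hypothetical setup: observed concepts c = d + h(u) + noise, with d a
# function of the categorical label y and u an independent confounder.
y = rng.integers(0, n_classes, size=n)
d_per_class = rng.normal(size=(n_classes, k))  # g(y): true discriminative concepts
u = rng.normal(size=(n, k))
c = d_per_class[y] + 1.5 * u + 0.3 * rng.normal(size=(n, k))

# Instrumental estimate d_hat(y) = E[c | y]: a per-class average of the
# observed concepts (the role the (y, c)-trained network plays in general).
d_hat = np.stack([c[y == cls].mean(axis=0) for cls in range(n_classes)])

# h(u) and the noise have zero mean, so d_hat recovers g(y) up to a constant.
err = np.abs(d_hat - d_per_class).max()
print(f"max deviation from the true concepts: {err:.3f}")
```

The confounder contributes only its (constant) mean to each class average, so its per-sample variation is averaged out.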
Modeling Uncertainty in Prediction of Concepts.
Our empirical observations show that prediction of the concepts from the features can be highly uncertain. Hence, we present a CBM estimator that takes into account the uncertainties in the prediction of the concepts. We take the conditional expectation of the labels given the features as follows:
ŷ(x) = E[y | x] = ∫ ℓ(d) p(d | x; W) dd,   (5)

where p(d | x; W) is the probability function, parameterized by W, that captures the uncertainty in the prediction of the concepts from the features. The function ℓ predicts labels from the debiased concepts.
In summary, we perform the following steps to estimate Eq. (5):

1. Train a neural network d̂(y) = E[c | y] using (y, c) pairs.

2. Train a neural network as an estimator for p(d | x; W) using (x, d̂(y)) pairs.

3. Use (x, y) pairs to estimate the function ℓ by fitting ∫ ℓ(d) p(d | x; W) dd to y.

4. [Optional] Fit a neural network r(x) to the residuals of step 3. The function r captures the residual information in x. Compare the improvement in prediction accuracy over the accuracy in step 3 to quantify the degree of concept incompleteness.

Steps 1–3 describe the first stage of estimating the y → d → x path, and step 4 describes the second stage of estimating the residual y → x link. In step 3, we approximate the integral using a Monte Carlo approach, drawing d from the distribution estimated in step 2. Because we first predict the labels using the concepts and then fit r to the residuals, we ensure that the predictive information maximally goes through the debiased concepts. The last step is optional, because our goal is to compare the predictive power of the features going through the concepts (step 3) with that of the unrestricted features (step 4). We can also omit step 4 and instead learn an unrestricted predictive model ŷ(x) and use it for comparison.
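The first stage can be sketched end-to-end on simulated data. Everything here is a hypothetical stand-in: the data follow a simple linear instance of the causal prior graph, per-class averages replace the step-1 network, a linear-Gaussian model replaces the step-2 network, and a nearest-class-mean rule replaces the step-3 label predictor; the Monte Carlo draws play the role of the integral in Eq. (5).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, n_classes = 4000, 3, 6

# Hypothetical data following the causal prior graph: y -> d -> x, u -> {c, x}.
y = rng.integers(0, n_classes, size=n)
g = 2.0 * rng.normal(size=(n_classes, k))     # g(y), scaled for class separation
d = g[y] + 0.1 * rng.normal(size=(n, k))      # discriminative concepts
u = rng.normal(size=(n, 1))
c = d + u + 0.2 * rng.normal(size=(n, k))     # confounded observed concepts
x = np.concatenate([d, u], axis=1) + 0.1 * rng.normal(size=(n, k + 1))

# Step 1: d_hat(y) = E[c | y] via per-class averages (a network in general).
class_means = np.stack([c[y == cls].mean(axis=0) for cls in range(n_classes)])
d_hat = class_means[y]

# Step 2: estimate p(d | x) -- a linear-Gaussian stand-in: mean mu(x),
# diagonal variance from the residuals.
W, *_ = np.linalg.lstsq(x, d_hat, rcond=None)
mu, sigma = x @ W, (d_hat - x @ W).std(axis=0)

# Step 3: predict labels from 25 Monte Carlo draws of the concepts, with a
# nearest-class-mean classifier standing in for the label network.
draws = mu[None] + sigma * rng.normal(size=(25, n, k))
scores = ((draws[:, :, None, :] - class_means[None, None]) ** 2).sum(-1).mean(0)
pred = scores.argmin(axis=1)
acc = (pred == y).mean()
print(f"label accuracy through debiased concepts: {acc:.2f}")
```

Step 4 would fit a further model to the remaining errors to quantify concept incompleteness; it is omitted here since the simulated concepts are complete by construction.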
A Special Case and Application to TCAV.
Choosing a simple multivariate Gaussian distribution p(d | x; W) = N(d; μ(x), σ²I), we can show that the above steps simplify as follows:

1. Learn d̂(y) by predicting c from y.

2. Learn μ(x) by predicting d̂(y) from x.

3. Learn ℓ by predicting y from μ(x).

4. [Optional] Learn r(x) to predict the residuals y − ŷ.
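In a fully linear instance, each of the four learned maps reduces to a least-squares regression. The sketch below, on hypothetical data, runs the four regressions in order and checks that the debiased concept predictions track the discriminative concepts d rather than the confounded observations c:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Hypothetical linear instance: d = g(y), c = d + h(u) + noise, x = f(d, u).
y = rng.normal(size=(n, 2))
u = rng.normal(size=(n, 1))
d = y @ rng.normal(size=(2, 3))
c = d + u + 0.1 * rng.normal(size=(n, 3))
x = np.concatenate([d, u], axis=1)

def fit(inp, out):
    """Least-squares regression: the linear stand-in for each learned network."""
    W, *_ = np.linalg.lstsq(inp, out, rcond=None)
    return inp @ W

d_hat = fit(y, c)          # step 1: d_hat(y) ~ E[c | y]
mu = fit(x, d_hat)         # step 2: mu(x) predicts d_hat(y)
y_hat = fit(mu, y)         # step 3: predict y from mu(x)
res = fit(x, y - y_hat)    # step 4 (optional): residual predictor r(x)

# Debiased concepts should track d, not the confounded observation c.
bias_raw = np.abs(c - d).mean()
bias_deb = np.abs(mu - d).mean()
print(f"raw bias: {bias_raw:.2f}, debiased bias: {bias_deb:.3f}")
```

Regressing c on the instrument y strips out the u-driven component, so the downstream concept predictor μ(x) no longer inherits the confounding.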
The above special case suggests a simple method for debiasing the results of a TCAV Kim et al. (2018) analysis. The TCAV method is attractive because, unlike CBMs, it analyzes existing neural networks and does not need to define a new model. We can use the first step to remove the bias due to the confounding and perform TCAV with the debiased d̂(y) vectors instead of the raw concept vectors c.
Prior Work on Causal Concept-Based Explanation.
Among the existing works on causal concept-based explanation, Goyal et al. (2019) propose a different causal prior graph to model the spurious correlations among the concepts and remove them using conditional variational autoencoders. In contrast, we aim at handling noise and spurious correlations between the features and concepts, using the labels as instruments. Which approach is more appropriate for a given problem depends on the assumptions underlying that problem.
3 Experiments
3.1 Synthetic Data Experiments
We create a synthetic dataset according to the following steps:

1. Generate label vectors y with elements distributed according to the unit normal distribution.

2. Generate confounder vectors u with elements distributed according to a scaled normal distribution.

3. Generate noise vectors with elements distributed according to a scaled normal distribution.

4. Generate weight matrices with elements distributed according to a scaled normal distribution.

5. Compute the discriminative concepts d from y, following Eq. (1).

6. Compute the features x from d and u, following Eq. (3).

7. Compute the observed concepts c from d and u, following Eq. (2).
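A one-dimensional linear instance of this design (a hypothetical miniature, with scalar concepts and hand-picked scales) makes the key comparison reproducible: the concept estimator trained on raw confounded concepts stays biased no matter how large n gets, while the instrumented target removes the bias.

```python
import numpy as np

def concept_correlation(n, debias, seed=0):
    """Correlation between concepts estimated from features and the true
    discriminative concepts, in a hypothetical 1-D linear instance."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n)                        # labels
    u = rng.normal(size=n)                        # confounder
    d = 2.0 * y                                   # discriminative concept d = g(y)
    c = d + 2.0 * u + 0.1 * rng.normal(size=n)    # confounded observed concept
    x = np.stack([d, u], axis=1) + 0.1 * rng.normal(size=(n, 2))  # features

    # Debiasing replaces c with the fit of c on the instrument y, i.e. E[c | y].
    target = np.polyval(np.polyfit(y, c, 1), y) if debias else c
    W, *_ = np.linalg.lstsq(x, target, rcond=None)
    return float(np.corrcoef(x @ W, d)[0, 1])

for n in (1_000, 100_000):
    print(n, round(concept_correlation(n, False), 2),
          round(concept_correlation(n, True), 2))
```

Without debiasing, the feature-based concept predictor happily absorbs the confounder because doing so reduces its training loss; more data only makes it absorb the confounder more reliably.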
In Figure 3, we plot the correlation between the true unconfounded and noiseless concepts and the concept vectors estimated with the regular two-step procedure (without debiasing) and with our debiasing method, as a function of the sample size n. The results show that the bias due to confounding does not vanish as we increase the sample size, and that our debiasing technique brings the results closer to the true discriminative concepts.
3.2 CUB Data Experiments
Dataset and preprocessing.
We evaluate the performance of the proposed approach on the CUB-200-2011 dataset Wah et al. (2011). The dataset includes 11,788 pictures (in 5,994/5,794 train/test partitions) of 200 different types of birds, annotated both for the bird type and for 312 different concepts about each picture. The concept annotations are binary: whether the concept exists or not. However, for each annotation, a four-level certainty score has also been assigned: 1: not visible, 2: guessing, 3: probably, and 4: definitely. We combine the binary annotation and the certainty score to create a 7-level ordinal variable as the annotation for each image, as summarized in Table 1. For simplicity, we map the 7-level ordinal values to uniformly spaced values in the [0, 1] interval. We randomly choose 15% of the training set and hold it out as the validation set.

Annotation  Certainty  Ordinal Score  Numeric Map

Doesn’t Exist  definitely  0  0
Doesn’t Exist  probably  1  1/6
Doesn’t Exist  guessing  2  2/6
Doesn’t Exist  not visible  3  3/6
Exists  not visible  3  3/6
Exists  guessing  4  4/6
Exists  probably  5  5/6
Exists  definitely  6  1
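The combination of binary annotation and certainty score described above can be written as a small lookup. The helper name and the uniform mapping onto [0, 1] are our reading of Table 1, not code from the paper:

```python
# Hypothetical helper mapping the CUB annotation (exists or not) and the
# four-level certainty to the 7-level ordinal score of Table 1 and its
# uniformly spaced value in [0, 1].
ORDINAL = {
    (0, "definitely"): 0, (0, "probably"): 1, (0, "guessing"): 2,
    (0, "not visible"): 3, (1, "not visible"): 3,
    (1, "guessing"): 4, (1, "probably"): 5, (1, "definitely"): 6,
}

def concept_score(exists: int, certainty: str) -> float:
    """Return the ordinal score mapped uniformly onto [0, 1]."""
    return ORDINAL[(exists, certainty)] / 6.0

print(concept_score(0, "definitely"), concept_score(1, "definitely"))
```

Note that "not visible" collapses both annotation values onto the midpoint 0.5, reflecting that an invisible part carries no evidence either way.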
The result in Figure 1.
To compare the association strength between c and x with the association strength between c and y, we train two predictors of the concepts, ĉ(x) and ĉ(y). We use PyTorch's pretrained ResNet-152 network He et al. (2016) for prediction of the concepts from the images. Because the annotations are ordinal numbers, we use the Spearman correlation to measure the association strengths. Because y is a categorical variable, ĉ(y) is simply the average concept annotation score per class. The concept ids on the x-axis are sorted in order of increasing gap between the two correlations. The top ten concepts with the largest gaps are 'has back color::green', 'has upper tail color::green', 'has upper tail color::orange', 'has upper tail color::pink', 'has back color::rufous', 'has upper tail color::purple', 'has back color::pink', 'has upper tail color::iridescent', 'has back color::purple', 'has back color::iridescent'. These concepts are all related to color and can easily be confounded by the context of the images.
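The Figure 1 comparison can be mimicked on synthetic data. Everything here is hypothetical: a hand-rolled rank correlation stands in for a library Spearman routine, a label-plus-context model stands in for the annotations, and noisy predictors stand in for ĉ(x) and ĉ(y):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation via Pearson correlation of ranks (no
    tie-averaging; a simple stand-in for scipy.stats.spearmanr)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

rng = np.random.default_rng(5)
n, n_classes = 2000, 20

# Hypothetical concept: a label-driven component plus confounding context.
y = rng.integers(0, n_classes, size=n)
label_part = rng.normal(size=n_classes)[y]
c = label_part + 0.5 * rng.normal(size=n)         # annotated concept values
c_from_x = label_part + 2.0 * rng.normal(size=n)  # feature-based predictor, confounded
c_from_y = np.stack([c[y == cls].mean()           # per-class average, as in the paper
                     for cls in range(n_classes)])[y]

print(round(spearman(c_from_y, c), 2), round(spearman(c_from_x, c), 2))
```

When the feature-based predictor picks up confounded context, its rank correlation with the annotations falls below that of the label-based per-class average, which is the signature the figure reports.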
Training details for Eq. (5).
We model the distribution of the concept logits as independent Gaussians with their means equal to the ResNet-152 logit outputs. We estimate the variance for each dimension using the logits of the true annotation scores, clamped to a bounded interval to avoid large logit values. In each iteration of the training algorithm, we draw 25 samples from the estimated distribution p(d | x; W). The predictor of labels from concepts (the function ℓ in Eq. (5)) is a three-layer feed-forward neural network with hidden layer sizes (312, 312, 200). There is a skip connection from the input to the penultimate layer. We model the residual function r(x) with another pretrained ResNet-152. All algorithms are trained with the Adam optimization algorithm Kingma and Ba (2014).

Quantitative experiments.
Compared to the baseline algorithm, our debiasing technique increases the average Spearman correlation between the predicted and true concepts from 0.406 to 0.508. For the ten concepts listed above, our algorithm increases the average Spearman correlation from 0.283 to 0.389. Our debiasing algorithm also improves generalization in prediction of the image labels: it improves the top-5 accuracy of label prediction from 39.5% to 49.3%.
Analysis of the results.
In Figure 4, we show 12 images for which the annotated and debiased concepts differ significantly. A common pattern among the examples is that the context of the image does not allow accurate annotation by the annotators. In images 3, 4, 5, 6, 7, 11, and 12 in Figure 4, the ten color-related concepts listed above are all set to 0.5, indicating that the annotators have failed to annotate them. However, our algorithm correctly identifies that, for example, Ivory Gulls do not have green-colored backs, by predicting a value closer to the per-class average than the recorded annotation.
Another pattern is the impact of the color of the environment on the accuracy of the annotations. For example, the second image from the left is an image of a Pelagic Cormorant, whose back and upper tail colors are unlikely to be green, as reflected in the low per-class averages for these concepts. However, because of the color of the image and the reflections, the annotator has assigned positive values to both the 'has back color::green' and 'has upper tail color::green' concepts. Our algorithm predicts values for these two concepts that are closer to the per-class averages.
4 Conclusions and Future Work
Studying concept-based explanation techniques, we provided evidence for the potential existence of an unobserved latent variable, independent of the labels, that creates associations between the features and concepts. We proposed a new causal prior graph that models the impact of noise and latent confounding on the estimated concepts. We showed that, using the labels as instruments, we can remove the impact of the context from the explanations. Our experiments showed that our debiasing technique not only improves the quality of the explanations, but also improves the accuracy of predicting labels through the concepts. As future work, we will investigate other instrumental variable techniques to find the most accurate debiasing method.
References
Brocki and Chung (2019). Concept saliency maps to visualize relevant features in deep generative models. arXiv:1910.13140.
Cai et al. (2019). Human-centered tools for coping with imperfect algorithms during medical decision-making. In CHI.
Clough et al. (2019). Global and local interpretability for cardiac MRI classification. In MICCAI.
Ghorbani et al. (2019). Towards automatic concept-based explanations. In NeurIPS, pp. 9273–9282.
Goodfellow et al. (2016). Deep Learning. MIT Press.
Goyal et al. (2019). Explaining classifiers with causal concept effect (CaCE). arXiv:1907.07165.
Graziani et al. (2018). Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pp. 124–132.
Hamidi-Haines et al. (2018). Interactive naming for explaining deep neural networks: a formative study. arXiv:1812.07150.
He et al. (2016). Deep residual learning for image recognition. In CVPR.
Kim et al. (2018). Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In ICML, pp. 2668–2677.
Kingma and Ba (2014). Adam: a method for stochastic optimization. arXiv:1412.6980.
Koh et al. (2020). Concept bottleneck models. In ICML.
Losch et al. (2019). Interpretability beyond classification output: semantic bottleneck networks. arXiv:1907.10882.
Pearl (2009). Causality. Cambridge University Press.
Sprague et al. (2019). Interpretable AI for deep learning based meteorological applications. In American Meteorological Society Annual Meeting.
Stock (2015). Instrumental variables in statistics and econometrics. In International Encyclopedia of the Social & Behavioral Sciences.
Wah et al. (2011). The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.
Yeh et al. (2019). On concept-based explanations in deep neural networks. arXiv:1910.07969.