Creating reliable econometric models of the CJEU case law: a response to criticisms (by Arrebola, Mauricio & Jimenez)

One of the most satisfying activities in academia is to engage in debate and discussion. Only by subjecting ideas to tough scrutiny can we advance our knowledge. Thus, I am extremely pleased that Carlos Arrebola, Julia Mauricio and Hector Jimenez have reacted so quickly to my criticism of their recent paper (here) and come back with a thoughtful and forceful rebuttal. I am posting it below. You will see that there are important points of disagreement that will probably require two (or more) follow-up studies in the future. Seems like I need to brush up on my econometrics...

Creating reliable econometric models of the CJEU case law:
a response to Sanchez-Graells’ criticisms

by Carlos Arrebola, Julia Mauricio and Hector Jimenez


In a recent study, we used econometric methodology to quantify the degree of influence of the Advocate General on the Court of Justice. Based on data collected from 20 years of actions for annulment, we concluded that the Court is 67% more likely to annul an act if the Advocate General suggests so in her opinion. In a post last Tuesday, Sanchez-Graells examined our paper. As he said, our conclusion is ‘bold [...] and controversial [for its] implications’, and as such it should be subject to ‘tough scrutiny’. We most definitely agree on both the importance of our claim and the need to test it rigorously. As we stated in our paper, if the conclusions are true, the role of the Advocate General within the Court might need to be reconsidered in order to secure judicial independence.

However, Sanchez-Graells voiced several criticisms regarding our econometric model that prevent him from accepting the validity of our results. We greatly welcome the debate, and appreciate the comments in his post, although we ultimately disagree. While we acknowledge that quantitative methodology is not perfect, we argue that our results are a reliable estimation of the influence of the Advocate General (hereinafter, “AG”) on the Court. Even if the specific figure of a 67% increased probability of a judicial outcome is doubted, our results are at least an indication that the influence relationship is positive, as shown by the six different econometric models estimated in our study. In the spirit of discussion and debate of this blog, we address Sanchez-Graells’ criticisms along with several other factors that, in our opinion, should have been taken into account when assessing our paper’s reliability.

1. The impossibility of using Randomised Controlled Trials

In his post, Sanchez-Graells suggests that we were too quick to discard the possibility of testing the hypothesis of the influence of the AG on the Court using Randomised Controlled Trials (“RCTs”). In lay terms, RCTs are the type of scientific methodology used in many areas of science to study causality. Medicine is one of the main fields in which RCTs are used. In order to test the efficacy of a new drug, patients with similar features are randomly assigned to several groups. Normally, one of those groups would be the control group. The control group would receive a placebo, instead of the actual drug. In this way, the researchers can more easily infer whether the health outcome is caused only by the drug. If both the group taking the placebo and the group taking the drug had the same reaction, it would be clear that some external factor other than the drug had caused it. If, on the other hand, the group taking the drug and the placebo group reacted differently (for example, in the case of an illness, if the group taking the drug was the only one to recover), it could be said with some confidence that the drug caused the recovery.

In our paper, we suggested that RCTs are not a possibility because they would require using the Court of Justice as a laboratory, experimenting with cases, judges and AGs. Nevertheless, Sanchez-Graells argued that we should have considered those cases in which the AG does not participate as our “control group”. This is a misconception about how RCTs are designed. A vital feature in the design of RCTs is making sure that the observations that are included in the sample are randomly drawn. This is because, ideally, you would like every observation to be identical, so that the only factor that affects it is the treatment that you are examining in the experiment. In the case of medicine-related RCTs, you want patients with the same characteristics, symptoms, etc., so that whatever happens after taking the drug can only be traced back to the drug. In our study, we would need the same case to be repeated several times, with the same legal problem to be solved by the same judges, having access to the same amount of precedent, lawyers with the same ability to plead cases, etc. Only then could we observe what would happen if we took the element of the Advocate General out of the equation. However, cases are never the same. Unlike illnesses, where patients tend to have the same symptoms, cases are much more complex. Legal problems rarely have the same surrounding circumstances.

So, if we followed Sanchez-Graells’ suggestion, we would be ignoring a set of external factors that actually affect the outcome of a case. We would be wrongly attributing it to the Advocate General’s intervention, when actually it could be something else. That is, if we had two cases, one with an AG’s opinion, and one without, in which the Court reached different results, we could not say that the Advocate General caused that different result. It could be that the case had different facts, and that is why the Court decided differently. Or, it could well be that the judges were presented with different arguments by the parties, and it was the lawyers, and not the AGs, who persuaded the Court. Furthermore, Sanchez-Graells’ suggestion is unfeasible because there is a clear bias. As he explained, in cases where the CJEU considers that no problematic legal issues are going to arise, it decides not to have an AG opinion. This means that, from the very beginning of the case, the Court senses that it might have an easy or clear legal solution. In other words, Sanchez-Graells is suggesting that we compare in our analysis a simple cold with a more complicated condition, such as cancer, and that we can thus establish whether radiotherapy has any impact on health. Such a comparison would yield a misleading result, because the colds would have a rate of recovery close to 100%, whereas that of the cancer cases would be lower. However, that would not tell us anything about the effectiveness of radiotherapy. In the same way, if a case deals with unproblematic legal issues, the opinion of the AG will probably not do much to affect the Court, because the Court would have come to that conclusion by itself without any external influence. We cannot simply compare those two scenarios without losing information. After all, there would not be any “random” selection of groups, so the requirements for conducting an RCT would clearly not be fulfilled.
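The selection problem described above can be illustrated with a small simulation. This is only a sketch under invented assumptions, and it is not taken from the paper: the function name `simulate`, the share of "hard" cases (50%) and the annulment probabilities (0.4 for hard cases, 0.1 for easy ones) are all hypothetical. In this toy world the AG opinion has no causal effect at all, yet a naive comparison of cases with and without an opinion shows a large gap whenever opinions are reserved for hard cases.

```python
import random


def simulate(n_cases, assign_ag, seed=0):
    """Return the annulment rate with an AG opinion minus the rate without one."""
    rng = random.Random(seed)
    with_ag, without_ag = [], []
    for _ in range(n_cases):
        hard = rng.random() < 0.5  # hypothetical share of difficult cases
        # Assumption: the outcome depends ONLY on difficulty, never on the AG.
        p_annul = 0.4 if hard else 0.1
        annulled = rng.random() < p_annul
        if assign_ag(hard, rng):
            with_ag.append(annulled)
        else:
            without_ag.append(annulled)
    return sum(with_ag) / len(with_ag) - sum(without_ag) / len(without_ag)


# Selection on difficulty: only hard cases receive an AG opinion.
selected_gap = simulate(20000, lambda hard, rng: hard)

# Random assignment: a coin flip decides, independently of difficulty.
random_gap = simulate(20000, lambda hard, rng: rng.random() < 0.5)

print(f"gap when opinions go to hard cases only: {selected_gap:.3f}")
print(f"gap under random assignment:             {random_gap:.3f}")
```

Under these assumptions the first gap is driven entirely by case difficulty, while the second stays close to zero. That is the sense in which the two groups cannot simply be compared.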

For that reason, the only way to approximately estimate causality is to use regressions, in which you can account for as many variables as possible that may influence the Court, including the Advocate General, and including variables that account for how easy or clear a case is. That way we can estimate the magnitude of the effect of the AG variable on the Court.

2. Designing a reliable regression

Once we establish that the most accurate measure is a regression model accounting for variables that affect the outcome of the Court, the difficulty arises in deciding which variables to include and how to code them. It is in this respect that we think Sanchez-Graells raises his most valid criticism of our study. We acknowledge that our variables are not perfect. We will never be able to establish causality without a shadow of a doubt. This is simply because, as we said, we will always miss variables that affect the case that we will not be able to track, codify and insert in our database. Taking this to an extreme and absurd example, we will never be able to verify whether the judge in the deliberating room had a headache and wanted to go home soon, rushing her decision. However, the fact that we will always miss variables does not mean that our model cannot be reliable. We still include a number of important variables that can explain a substantial amount of what goes on in the courtroom. There are different ways in econometrics to determine the extent to which a model, albeit missing variables, is an accurate depiction of reality. For our study, these measures suggest that the model is indeed reliable. We will come back to this in a moment.

Another aspect of coding variables is, as Sanchez-Graells comments, the oversimplification. In our study, we used actions for annulment, where the outcomes of a case can be (i) annulment, (ii) partial annulment, (iii) dismissal of the case, or (iv) inadmissibility of the case. We decided to simplify this variable by looking only at whether the Court decided to annul (in any of its forms) or not. But this simplification is necessary to make the model more reliable, because a dataset capable of yielding significant results requires a representative sample. In our case, we only had data for a very small number of partial annulments. Including them as a separate variable from total annulment would have only created “noise” in our model, making the results less significant, statistically speaking.

Sanchez-Graells especially criticises our grouping of dismissal and inadmissibility cases together, because he says that dismissing a case and declaring it inadmissible are very different things. However, that discussion in his post is unnecessary, because as he himself notes later on, our results ‘cannot be interpreted regarding inverse AG recommendations (ie recommendations to inadmit/dismiss)’. Our results are only relevant for decisions to annul or partially annul; we do not make any claim about other types of cases, which Sanchez-Graells also criticises.

However, the fact that we decided to look at the question in terms of what happens if the AG suggests annulling the act, rather than if she suggests dismissing it or declaring it inadmissible, does not affect the reliability of our results. In fact, the only thing that Sanchez-Graells is postulating is a new hypothesis. He is saying that, in his opinion, we would have got other results if we had constructed the model differently. That is a point that we cannot falsify without fiddling for a few more weeks with our data in the econometrics software. But we invite others to carry out further studies, and we may do so ourselves in the future, with the same or different data, to check that the results are not affected if we look at things in a different way; by, for example, looking at what happens if the AG suggests dismissal, or what happens if we gather data from other periods of time. Nonetheless, the reliability of the results that we presented is a separate issue.

So, if we have acknowledged that we are not going to be able to include every variable, and that our data is only a sample, why are we confident in our results? In the paper we explain it more technically, but, basically, there are econometric measures that indicate that the model that we have created is accurate when the estimation that we get from the model is compared with actual data from reality. That is the reason why we know it is a fairly reliable model.
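One simple way of comparing a model's estimates with actual data is to count how often its predictions match the observed outcomes. The sketch below is purely illustrative and is not necessarily the measure used in the paper: the function name `share_correctly_predicted`, the 0.5 threshold, and the probabilities and outcomes are all hypothetical.

```python
def share_correctly_predicted(predicted_probs, actual_outcomes, threshold=0.5):
    """Fraction of cases where the thresholded prediction equals the observed outcome."""
    if len(predicted_probs) != len(actual_outcomes):
        raise ValueError("inputs must have the same length")
    if not predicted_probs:
        raise ValueError("inputs must not be empty")
    hits = sum(
        int(p >= threshold) == y
        for p, y in zip(predicted_probs, actual_outcomes)
    )
    return hits / len(predicted_probs)


# Hypothetical predicted probabilities of annulment and observed outcomes
# (1 = the Court annulled, 0 = it did not).
probs = [0.9, 0.2, 0.7, 0.4, 0.1]
outcomes = [1, 0, 0, 1, 0]
share = share_correctly_predicted(probs, outcomes)
print(share)  # 3 of the 5 hypothetical cases are matched, i.e. 0.6
```

A high share on its own says little if one outcome dominates the sample, so a measure of this kind is normally read alongside other diagnostics.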

3. Final caveat

Whilst reading Sanchez-Graells’ words, we could not help feeling something we have felt many times before. Lawyers are more comfortable sticking to arguing with words. We feel somehow threatened by this terra incognita called econometrics. There seems to be a certain reluctance to attempt to use mathematics to help us in our enquiries. It is worth saying that we are not accusing Sanchez-Graells of not wanting to engage with quantitative methodology. In fact, we know that he has used some statistics previously, and we would not expect a “more economic approach” type of person to disregard this evidence-based methodology.

We want to end this post with a final note about quantitative methodology. We want to say that although judicial proceedings and legal arguments cannot always be equated to numbers, and other methodologies are extremely valuable to legal research questions, quantitative analysis can help elucidate complex legal questions. As has happened in many other social science subjects before ours, statistics can become a tool at the service of legal researchers. In this sense, it is worth reminding readers that, a few centuries ago, economics was equally a merely discursive subject, and anyone who has read the Wealth of Nations can be a witness to that. But now economics and mathematics cannot be separated. Therefore, we would encourage researchers to embrace statistics and econometrics, and see how they can help with their enquiries. Quantitative analysis tries to be evidence-based and objective. Therefore, anyone who believes in the benefits of science will prefer a claim based on quantitative methodology to a hypothesis made, to follow the words of Sanchez-Graells, on the basis of ‘anecdotal impression’.

The difficulties in an econometric analysis of CJEU case law -- a propos Arrebola, Mauricio & Jiménez Portilla (2016)

Carlos Arrebola and Ana Julia Mauricio (PhD students at the University of Cambridge), together with Héctor Jiménez Portilla (of the Overseas Development Institute (ODI)), have published an interesting and thought-provoking paper (*) where they try to measure the influence of the Advocate General (AG) on the Court of Justice of the European Union (CJEU) [for a short summary of their paper, see here]. This is an area where EU law scholars have been struggling to find an objective way to measure/prove/dismiss any claim of AG influence over the CJEU, as Arrebola et al clearly stress in their excellent literature review.

In a nutshell, Arrebola et al claim that their 'findings suggest that the CJEU is approximately 67 percent more likely to annul an act (or part of it) if the AG advises the Court to annul than if it advises the Court to dismiss the case or declare it inadmissible. In their view, these results raise several questions as regards judicial independence and the relevance of the figure of the Advocate General, providing a grounded basis for future discussions and judicial reform.'

Their claim is as intuitively appealing as it is bold (and controversial, in terms of the implications they derive) and, in my view, it deserves tough scrutiny of the way they reached this conclusion. The following are some of the doubts that I have had while reading the paper, which I am limiting to the three main ones I am struggling with. Overall, these doubts leave me with the impression that, unfortunately, the paper does not actually deliver on its main goal of contributing 'to a more comprehensive understanding of the role of the Advocate General in the makeup of the Court of Justice of the European Union'.

Their model in a nutshell
Let me frame my doubts in a stylised summary of their econometric model. In short, they have looked at 'data from 20 years of actions for annulment procedures before the Court of Justice. Every case from January 1994 to January 2014 has been included, with the exception of appeals from the General Court and those cases that do not have an AG opinion. We collected a total of 285 observations. For these cases, we have examined the behaviour of the Court and the Advocate General as regards to their decision to annul or not to annul the legal act in question' (p. 15).

They have coded these cases to examine the relationship between two main variables: the recommendation of the AG and the final decision of the CJEU. There are other variables they take into account, but those do not affect my analysis, so I am sticking to the two main variables for simplicity of argument. They explain why they have chosen annulment cases in the following terms: 'we have created two dichotomous (also called dummy or binary) variables: ECJannulment and AGannulment. ECJannulment is the one that we have considered as the dependent variable. It takes the value of 1 if the Court decided to annul or partially annul an act, and 0 if it dismissed the case or deemed it inadmissible. AGannulment is the variable that we have considered independent. It takes the value of 1 if the Advocate General issued an opinion recommending the Court to annul or partially annul an act, and 0 if it recommended dismissing the case or declaring it inadmissible' (p. 15). 
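The coding scheme quoted above can be sketched in a few lines. This is a minimal illustration and not the authors' actual code or data: the helper name `code_binary`, the outcome labels and the four example cases are hypothetical; only the variable names ECJannulment and AGannulment come from the paper.

```python
ANNUL_OUTCOMES = {"annulment", "partial annulment"}
NON_ANNUL_OUTCOMES = {"dismissal", "inadmissibility"}


def code_binary(outcome):
    """Return 1 for (partial) annulment, 0 for dismissal or inadmissibility."""
    label = outcome.strip().lower()
    if label in ANNUL_OUTCOMES:
        return 1
    if label in NON_ANNUL_OUTCOMES:
        return 0
    raise ValueError(f"unknown outcome: {outcome!r}")


# Hypothetical cases: (AG recommendation, Court decision).
cases = [
    ("annulment", "partial annulment"),
    ("dismissal", "inadmissibility"),
    ("inadmissibility", "dismissal"),
    ("partial annulment", "dismissal"),
]

coded = [
    {"AGannulment": code_binary(ag), "ECJannulment": code_binary(court)}
    for ag, court in cases
]
for row in coded:
    print(row)
```

In this sketch the second and third cases end up with identical coded values (0, 0), even though dismissal and inadmissibility appear in opposite roles. The binary coding cannot tell those two situations apart.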

With this information, they have run a 'probit model [which] is a regression that explains the predicted probability of the dependent variable adopting the value 1. In our case, it outputs the predicted probability of the Court annulling an act, subject to the value given to the other variables included. Therefore, the probit model provides a simple way to interpret the results in terms of predicted probability from 0 to 1. Instead, if we had chosen a linear regression model, the result would not be enclosed between 0 and 1, making the interpretation impossible, as it could yield some predicted probabilities to be negative or above the unit' (p. 25). This is what allows them to reach their main finding that 'when the Advocate General recommends annulment, the Court is 67 per cent more likely to annul' (p. 30).
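For readers less familiar with the technique, a generic probit specification with a binary regressor can be written as follows. This is a textbook sketch rather than the paper's exact specification: the coefficients $\beta_0$, $\beta_1$, $\boldsymbol{\gamma}$ and the vector of other controls $\mathbf{z}$ are notation assumed here for illustration, and $\Phi$ denotes the standard normal cumulative distribution function.

```latex
\[
\Pr(\text{ECJannulment} = 1 \mid \text{AGannulment}, \mathbf{z})
  = \Phi\left(\beta_0 + \beta_1\,\text{AGannulment} + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)
\]
\[
\Delta(\mathbf{z})
  = \Phi\left(\beta_0 + \beta_1 + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)
  - \Phi\left(\beta_0 + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)
\]
```

Because $\Phi$ only takes values between 0 and 1, the predicted probability stays within that range, whereas the linear index $\beta_0 + \beta_1\,\text{AGannulment} + \boldsymbol{\gamma}^{\top}\mathbf{z}$ on its own is unbounded. The difference $\Delta(\mathbf{z})$ is one common way to express the change in predicted probability when the binary regressor switches from 0 to 1, holding the other variables fixed.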

My main doubts
Firstly, I am not sure that the model the authors use is the best suited to the analysis of such a complex issue as the influence of the AG on the CJEU. One of the reasons (probably the main reason) why the authors decide to use a probit model is that they consider that it is not possible to establish a group of annulment cases that can work as a control (ie what they call the impossibility of conducting a randomised controlled trial). They consider that this would be the best way to avoid selection bias, but that in their study 'it is not possible to create a randomised controlled trial to define the causal effect of the AG opinion on the Court of Justice. This would require having the ability to design empirical experiments using the Court of Justice as a laboratory, which is unfeasible in practice' (p. 13, with more details in fn 54).

I disagree with their view about the impossibility to use a randomised controlled trial. There is a group of annulment procedures where no AG Opinion was submitted, and this could be used as a control group. It is important to note that, according to the Statute of the CJEU, '[w]here it considers that the case raises no new point of law, the Court may decide, after hearing the Advocate General, that the case shall be determined without a submission from the Advocate General' [Art 20(5)]. 

This is organised according to the Rules of Procedure of the CJEU, according to which 'The preliminary report shall contain proposals as to whether particular measures of organisation of procedure, measures of inquiry or, if appropriate, requests to the referring court or tribunal for clarification should be undertaken, and as to the formation to which the case should be assigned. It shall also contain the Judge-Rapporteur’s proposals, if any, as to whether to dispense with a hearing and as to whether to dispense with an Opinion of the Advocate General pursuant to the fifth paragraph of Article 20 of the Statute. The Court shall decide, after hearing the Advocate General, what action to take on the proposals of the Judge-Rapporteur' [Art 59(2) and (3)].

Therefore, the annulment cases where there is no AG Opinion are an important instrument for potential control tests. These cases only come to be decided without an AG Opinion because both the CJEU (rectius, the Judge-Rapporteur) and the AG agreed that the case raised no new point of law. Thus, there is no indication that the AG can influence the CJEU on any other point than the existence or not of new issues to be considered. Admittedly, there could already be scope for some indication of the AG (and the CJEU's) position on the substance of the case in this first judgement. However, I would think that running controls on the basis of these cases could be useful.

In these cases, the CJEU (at least formally) decided whether to declare the case inadmissible, dismiss it, or annul the act (totally or partially) without a submission from the AG. If there was a significant divergence in the probability of annulment between these two groups of cases, the argument that the authors raise in the paper would be strengthened. On the contrary, if the CJEU showed the same likelihood of annulling/dismissing regardless of the existence or not of an AG Opinion, the claim would be significantly weakened. I do not imagine this to be the ultimate test for the arguments raised in the paper, but I would see it as an important one.
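A comparison of annulment rates between the two groups could be sketched as a two-proportion z-test. The code below is only an illustration: the function name `two_proportion_z_test` and all the counts are hypothetical, and no such figures appear in the paper. A raw comparison of this kind also does not control for any other differences between the two groups of cases.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("test undefined when the pooled proportion is 0 or 1")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value


# Hypothetical counts: annulments among cases with and without an AG Opinion.
diff, z, p_value = two_proportion_z_test(50, 100, 30, 100)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

With these invented counts the difference would be statistically significant at conventional levels. Identical rates in the two groups would instead give a z statistic of zero.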

Secondly, I am skeptical of the way in which the authors simplify the setting for annulment procedures. They construct them as binary: that is, the only options available to the AG and the CJEU are to either declare the case inadmissible/dismiss it (0) or annul the provision in question totally/partially (1). I understand the need to simplify decisions to annul grouping together full and partial annulments (which they explain in p. 17). I remain unconvinced by their arguments regarding declarations of inadmissibility and dismissals. They simply indicate that 'inadmissibility and dismissal are sometimes used as interchangeable terms, although technically the substance of the case is not analysed in cases of inadmissibility, whilst it is in cases that are dismissed' (p. 17). However, they do not consider this a major issue and proceed with the grouping of both types of results as a single outcome of the case.

The difficulty I have with this strategy is that the rules on admissibility/inadmissibility are procedural in nature and they set up a first filter for cases to come to a full analysis. It can also be argued that they are much simpler than the rules applicable to the potential annulment of the challenged provisions, which depend on much more complex assessments of both procedural and substantive EU law. Thus, grouping decisions on (procedural) inadmissibility with those on dismissal of the annulment claim after a full analysis seems to create a significant conceptual problem. At this point, it may be worth stressing that the authors had mentioned that 
'we have decided to estimate regressions including other variables that could potentially be biasing the results if we only looked at what the Advocate General said and whether the Court followed the Advocate General’s position. In particular, one of the bias factors is the clarity of the law in a given case. For example, the Court and the Advocate General could reach the same result in a case not because the Court decided to follow the AG opinion, but because the law was clear on what the outcome should be, and there was no room for different interpretations. Therefore, not accounting for the clarity of the case could overestimate our measure of the influence of the Advocate General' (pp. 14-15, emphasis added).
My problem is that the authors seem to have forgotten to include this very bias-check in the way they have constructed their variables. By grouping (relatively simpler) procedural checks with (relatively more complex) full assessments, they have created a variable that is very hard to reconcile with reality outside of their model.

Thirdly, even within the context of their model, I am not sure what to make of their results. Their findings indicate that, when the AG recommends the annulment of an act, the CJEU is almost 67 per cent more likely to annul the act than if the AG had not proposed its annulment (ie, had she advocated for either inadmissibility or dismissal). I have trouble interpreting this number due to the conceptual issue mentioned above (ie, conflation of inadmission and dismissal), which makes the recommendation of the AG (as coded) ambiguous. This makes me wary of the claim that 'even if the number of 67 per cent of increased probability is called into question, it is difficult to deny that there is some level of influence' of the AG on the CJEU (p. 34, emphasis added), and that 'our analysis shows that there is some component in the making of a decision that is simply attributed to what the Advocate General recommended' (p. 35, emphasis added).

From the numbers in the paper, I have been unable to work out the effect that an AG recommendation to inadmit/dismiss has on the CJEU's willingness to do so. Intuitively, I would expect that, if by itself the Opinion of the AG is such a relevant factor as the paper claims, then the CJEU should also be more inclined to inadmit/dismiss when the AG submitted such a recommendation. However, in that case, I would not necessarily find the causal link between the AG recommendation and the CJEU's decision persuasive. An alternative interpretation not linked to the influence of the AG over the CJEU would need to be dispelled: ie the zeal with which the CJEU keeps control of its docket. The intuition would be that the CJEU may be engaged in an interpretation of inadmissibility rules that prevents a flood of claims, which could well override whatever position the AG decides to take. In my personal opinion, and based on anecdotal impression, this is what has been happening regarding annulment procedures promoted by unprivileged applicants (with all the issues that the Plaumann, UPA and Inuit saga has created; see here).

In the end, the difficulty I have is that their results do not necessarily make a lot of intuitive sense because they cannot (or at least not immediately) be interpreted regarding inverse AG recommendations (ie recommendations to inadmit/dismiss) and their effect on the CJEU. Somehow, there seems to be an implicit assumption that 'influence' of the AG is stronger if it prompts the CJEU to annul than if it prompts the CJEU to inadmit/dismiss. If all of this is incorrect, then my only residual criticism is that the paper could have been made more accessible for non-statisticians.

Conclusion
Overall, I remain unconvinced that the results of Arrebola et al significantly contribute 'to a more comprehensive understanding of the role of the Advocate General in the makeup of the Court of Justice of the European Union'. Thus, I am not prepared to engage with the implications in terms of judicial independence and potential (further) reform of the CJEU that they draw (pp. 34-38). Given my disagreement with their methodology and the diversity of views as to how to interpret their results, I have contacted Carlos Arrebola and offered him the opportunity to reply to my criticisms in a guest post. He has kindly accepted. Keep an eye out for it in the coming days.

(*) The full reference for the paper is: C Arrebola, AJ Mauricio and H Jiménez Portilla, 'An Econometric Analysis of the Influence of the Advocate General on the Court of Justice of the European Union' (January 12, 2016). Cambridge Journal of Comparative and International Law, Vol. 5, No. 1, Forthcoming; University of Cambridge Faculty of Law Research Paper No. 3/2016. Available at SSRN: http://ssrn.com/abstract=2714259.