Further thoughts on data and policy indicators a-propos two recent papers on procurement regulation & competition: comments re (Tas: 2019a&b)

April 22, 2019

The EUI Robert Schuman Centre for Advanced Studies’ working papers series has two interesting recent additions on the economic analysis of procurement regulation and its effects on competition, efficiency and value for money. Both papers are by BKO Tas.

The first paper: ‘Bunching Below Thresholds to Manipulate Public Procurement’ explores the effects of a contracting authority’s ‘bunching strategy’ to seek to exercise more discretion by artificially estimating the value of future contracts just below the thresholds that would trigger compliance with EU procurement rules. This paper is relevant to the broader discussion on the usefulness and adequacy of current EU (and WTO GPA) value thresholds (see eg the work of Telles, here and here), as well as on the regulatory decisions that EU Member States face on whether to extend the EU rules to ‘below-threshold’ contracts.

The second paper: ‘Effect of Public Procurement Regulation on Competition and Cost-Effectiveness’ uses the World Bank’s ‘Benchmarking Public Procurement’ quality scores to empirically test the positive effects of improved regulation quality on competition and value for money, measured as increases in the number of bidders and in the probability that the procurement price is lower than the estimated cost. This paper is relevant in the context of recent discussions about the usefulness or not of procurement benchmarks, and regarding the increasing concern about the reduced number of bids in EU-regulated public tenders.

In this blog post, I reflect on the methodology and insights of both papers, paying particular attention to the fact that both papers build on datasets and/or indexes (TED, the WB benchmark) that I find rather imperfect and unsuitable for this type of analysis (regarding TED, in the context of the Single Market Scoreboard for Public Procurement (SMSPP) that builds upon it, see here; regarding the WB benchmark, see here). Therefore, not all criticisms below are aimed at the papers themselves; some are aimed at the distortions that skewed, incomplete or misleading data and indicators can introduce into more refined analysis that builds upon them.

Bunching Below Thresholds to Manipulate Procurement (Tas: 2019a)

It is well-known that the EU procurement rules are based on a series of jurisdictional triggers and that one of them concerns value thresholds—currently regulated in Arts 4 & 5 of Directive 2014/24/EU. Contracts with an estimated value above those thresholds are subjected to the entire EU procurement regulation, whereas contracts of a lower value are solely subjected to principles-based requirements where they are of ‘cross-border interest’. Given the obvious temptation/interest in keeping procurement shielded from EU requirements, the EU Directives have included an anti-circumvention rule aimed at preventing Member States from artificially splitting contracts in order to keep their award below the relevant jurisdictional thresholds (Art 5(3) Dir 2014/24). This rule has been interpreted expansively by the Court of Justice of the European Union (see eg here).

‘Bunching Below Thresholds to Manipulate Public Procurement’ examines the effects of a practice that would likely infringe the anti-circumvention rule, as it assesses a strategy of ‘bunching estimated costs just below thresholds’ ‘to exercise more discretion in public procurement’. The paper develops a methodology to identify contracting authorities ‘that have higher probabilities of bunching estimated values below EU thresholds’ (ie manipulative authorities) and finds that ‘[m]anipulative authorities have significantly lower probabilities of employing competitive procurement procedure. The bunching manipulation scheme significantly diminishes cost-effectiveness of public procurement. On average, prices of below threshold contracts are 18-28% higher when the authority has an elevated probability of bunching.’ These are quite striking (but perhaps not surprising) results.
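To make the intuition behind 'bunching' concrete, here is a deliberately crude sketch in Python. It is not the methodology of Tas (2019a), which I do not attempt to reproduce; it simply compares, per authority, how many estimated values fall just below a threshold with how many fall just above it. The function names, the 5% window, the 0.8 cutoff, the minimum of five notices and the 100,000 threshold are all assumptions chosen for illustration. Note also that any such count can only see the notices that are actually in the dataset.

```python
from collections import defaultdict


def near_threshold_counts(values, threshold, window=0.05):
    """Count estimated values just below and just above the threshold.

    'Just below' is [threshold * (1 - window), threshold) and
    'just above' is [threshold, threshold * (1 + window)).
    """
    lower = threshold * (1 - window)
    upper = threshold * (1 + window)
    below = sum(1 for v in values if lower <= v < threshold)
    above = sum(1 for v in values if threshold <= v < upper)
    return below, above


def bunching_share(values, threshold, window=0.05):
    """Share of near-threshold values that sit just below the threshold.

    Returns None when no value falls inside the window.
    """
    below, above = near_threshold_counts(values, threshold, window)
    total = below + above
    return below / total if total else None


def flag_authorities(notices, threshold, window=0.05, cutoff=0.8, min_near=5):
    """Return {authority: share} for authorities whose share reaches the cutoff.

    notices: iterable of (authority_id, estimated_value) pairs.
    cutoff and min_near are arbitrary illustration values, not calibrated.
    """
    by_authority = defaultdict(list)
    for authority, value in notices:
        by_authority[authority].append(value)

    flagged = {}
    for authority, values in by_authority.items():
        below, above = near_threshold_counts(values, threshold, window)
        if below + above < min_near:
            continue  # too few near-threshold notices to say anything
        share = below / (below + above)
        if share >= cutoff:
            flagged[authority] = share
    return flagged


# Hypothetical notices; 100_000 is a placeholder, not an actual EU threshold.
notices = (
    [("A", 99_000)] * 5 + [("A", 101_000)]
    + [("B", 99_000)] * 3 + [("B", 101_000)] * 3
    + [("C", 50_000)] * 10
)
print(flag_authorities(notices, threshold=100_000))
```

In this made-up example only authority A is flagged: B is evenly split around the threshold and C has no notices near it at all.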

The paper employs a regression discontinuity approach to determine the likelihood of bunching. In order to do that, the paper relies on the TED database. The paper is certainly difficult to read and hardly intelligible for a lawyer, but there are some issues that raise important questions. One concerns the author’s (mis)understanding of how the WTO GPA and the EU procurement rules operate, in particular when the paper states that ‘Contracts covered by the WTO GPA are subject to additional scrutiny by international organizations and authorities (sic). Accordingly, contracts covered by the WTO GPA are less likely to be manipulated by EU authorities’ (p. 12). This is simply an uncritical transplant of considerations made by the authors of a paper that examined procurement in the Czech Republic, where the relevant threshold between EU covered and non-EU covered procurement would make sense. Here, the distinction between WTO GPA and EU-covered procurement simply makes no sense, given that WTO GPA and EU thresholds are coordinated. This alone raises some issues concerning the tests designed by the author to check the robustness of the hypothesis that bunching leads to inefficiency in procurement expenditure.

Another issue concerns the way in which the author equates open procedures to a ‘first price auction mechanism’ (which they are not exactly) and dismisses other procedures (notably, the restricted procedure) as incapable of ensuring value for money or, more likely, as representative of a higher degree of discretion for the contracting authority—which is a highly questionable assumption.

More importantly, I am not sure that the author understood what is in the TED database and, crucially, what is not there (see section 2 of Tas (2019a) for methodology and data description). Albeit not very clearly, the author presents TED as a comprehensive database of procurement notices—ie, as if 100% of procurement expenditure by Member States was recorded there. However, in the specific context of bunching below thresholds, the TED database is very likely to be incomplete.

Contracting authorities tendering contracts below EU thresholds are under no obligation to publish a contract notice (Art 49 Dir 2014/24). They could publish voluntarily, in particular in the form of a voluntary ex ante transparency (VEAT) notice, but that would make no sense from the perspective of a contracting authority that seeks to avoid compliance with EU rules by bunching (ie manipulating) the estimated contract value, as that would expose it to potential litigation. Most authorities that are bunching their procurement needs (or, in simple terms, avoiding compliance with the EU rules) will not be reflected in the TED database at all, or will not be identified by the methodology used by Tas (2019a), as they will not have filed any notices for contracts below thresholds.

How is it possible that TED includes notices regarding contracts below the EU thresholds, then? Well, this is anybody’s guess, but mine is that a large proportion of those notices will be linked to either countries with a tradition of full transparency (over-reporting), to contracts where there are any doubts about the potential cross-border interest (sometimes assessed over-cautiously), or will be notices with mistakes, where the estimated value of the contract is erroneously indicated as below thresholds.

Even if my guess was incorrect and all notices for contracts with a value below thresholds were accurate and justified by the existence of a potential cross-border interest, the database cannot be considered complete. One of the issues raised (imperfectly) by the Single Market Scoreboard (indicator [3] publication rate) is the relatively low level of procurement that is advertised in TED compared to the (putative/presumptive) total volume of procurement expenditure by the Member States. Without information on the conditions of the vast majority of contract awards (below thresholds, unreported, etc), any analysis of potential losses of competitiveness / efficiency in public expenditure (due to bunching or otherwise) is bound to be misleading.

Moreover, Tas (2019a) is premised on the hypothesis that procurement below EU thresholds allows for significantly more discretion than procurement above those thresholds. However, this hypothesis fails to recognise the variety of transposition strategies at Member State level. While some countries have opted for less stringent below EU threshold regimes, others have extended the EU rules to the entirety of their procurement (or, perhaps, to contracts up to and including much lower values than the EU thresholds, with the exception of some class of ‘micropurchases’). This would require the introduction of a control that could refine Tas’ analysis and distinguish those cases of bunching that do lead to more discretion from those that do not (at least formally), which could perhaps separate the price effects derived from national-only transparency from those of more legally dubious maneuvering.

In my view, regardless of the methodology and the math underpinning the paper (which I am in no position to assess in detail), once these data issues are taken into account, the story the paper tries to tell breaks down. There are important shortcomings in its empirical strategy that raise significant issues around the strength of its findings when these are assessed not against the information in TED, but against the (largely unknown, unrecorded) reality of procurement in the EU.

I have no doubt that there is bunching in practice, and that the intuition that it raises procurement costs must be right, but I have serious doubts about the possibility of reliably identifying bunching or estimating its effects on the basis of the information in TED, as most culprits will not be included and the effects of below-threshold (national-only) competition will mostly not be accounted for.

(Good) Regulation, Competition & Cost-Effectiveness (Tas: 2019b)

It is also a very intuitive hypothesis that better regulation should lead to better procurement outcomes and, consequently, that more open and robust procurement rules should lead to more efficiency in the expenditure of public funds. As mentioned above, Tas (2019b) explores this hypothesis and seeks to empirically test it using the TED database and the World Bank’s Benchmarking Public Procurement (in its 2017 iteration, see here). I will not repeat my misgivings about the use of the TED database as a reliable source of information. In this second part, I will solely comment on the use of the WB’s benchmark.

The paper relies on four of the WB’s benchmark indicators (one further constructed by Djankov et al (2017)): the ‘bid preparation score, bid and contract management score, payment of suppliers score and PP overall index’. The paper includes a useful table with these values (see Tas (2019b: Table 4)), which allows the author to rank the countries according to the quality of their procurement regulation. The findings of Tas (2019b) are thus entirely dependent on the quality of the WB’s benchmark and its ability to capture (and distinguish) good procurement regulation.

In order to test the extent to which the WB’s benchmark is a good input for this sort of analysis, I have compared it to the indicator that results from the European Commission’s Single Market Scoreboard for Public Procurement (SMSPP, in its 2018 iteration). The comparison is rather striking …

Clearly, both sets of indicators are based on different methodologies and measure relatively different things. However, they are both intended to express relevant regulators’ views on what constitutes ‘good procurement regulation’. In my view, both of them fail to do so for reasons already given (see here and here).
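One simple way to put a number on how far two sets of country indicators disagree is a rank correlation. The sketch below implements Spearman's coefficient in plain Python. The country labels and scores are invented placeholders, not values taken from the WB benchmark or the SMSPP, and the function names are my own; it illustrates the kind of check I have in mind rather than reproducing the comparison above.

```python
def ranks(values):
    """Return average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


# Invented scores for four placeholder countries (NOT real index values).
index_one = {"A": 80, "B": 70, "C": 60, "D": 50}
index_two = {"A": -4, "B": 1, "C": 3, "D": 6}
countries = sorted(index_one)
rho = spearman([index_one[c] for c in countries],
               [index_two[c] for c in countries])
print(rho)
```

A coefficient close to 1 would mean that the two indexes rank countries in much the same order; a coefficient close to -1, as in this contrived example, would mean that they rank them in opposite order.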

The implication for work such as Tas (2019b) is that the reliability of the findings, regardless of the math underpinning them, is as weak as the indicators they are based on. Applying the same methods to the SMSPP instead of the WB’s index would likely yield very different results: perhaps that countries with very low quality of procurement regulation (as per the SMSPP index) achieve better economic results, which would not be a popular story with policy-makers… and the results with either index would also be different if the algorithms were not fed by TED, but by a more comprehensive and reliable database.

So, the most that can be said is that attempts to empirically show effects of good (or poor) procurement regulation remain doomed to fail or, in perhaps less harsh terms, doomed to tell a story based on a very skewed, narrow and anecdotal understanding of procurement and an incomplete recording of procurement activity. Believe those stories at your own peril…

Data and procurement policy: some thoughts on the Single Market Scoreboard for public procurement

April 16, 2019

There is a growing interest in the use of big data to improve public procurement performance and to strengthen procurement governance. This is a worthy endeavour and, like many others, I am concentrating my research efforts in this area. I have not been doing this for too long. However, soon after one starts researching the topic, a preliminary conclusion clearly emerges: without good data, there is not much that can be done. No data, no fun. So far so good.

It is thus a little discouraging to confirm that, as is widely accepted, there is no good data architecture underpinning public procurement practice and policy in the EU (and elsewhere). Consequently, there is a rather limited prospect of any real implementation of big data-based solutions, unless and until there is a significant investment in the creation of a proper data foundation that can enable advanced analysis and policy-making. Adopting the Open Contracting Data Standard for the European Union would be a good place to start. We could then discuss to what extent the data needs to be fully open (hint: it should not be, see here and here), but let’s save that discussion for another day.

What a recent Twitter thread has reminded me is that there is a bigger downside to the existence of poor data than being unable to apply advanced big data analytics: the formulation of procurement policy on the basis of poor data and poor(er) statistical analysis.

This reflection emerged on the basis of the 2018 iteration of the Single Market Scoreboard for Public Procurement (the SMSPP), which is the closest the European Commission is getting to data-driven policy analysis, as far as I can see. The SMSPP is still a work in progress. As such, it requires some close scrutiny and, in my view, strong criticism. As I will develop in the rest of this post, the SMSPP is problematic not solely in the way it presents information (which is clearly laden with implicit policy judgements of the European Commission) but, more importantly, due to its inability to inform either cross-sectional (ie comparative) or time series (ie trend) analysis of public procurement policy in the single market. Before developing these criticisms, I will provide a short description of the SMSPP (as I understand it).

The Single Market Scoreboard for Public Procurement: what is it?

The European Commission has developed the broader Single Market Scoreboard (SMS) as an instrument to support its effort of monitoring compliance with internal market law. The Commission itself explains that the “scoreboard aims to give an overview of the practical management of the Single Market. The scoreboard covers all those areas of the Single Market where sufficient reliable data are available. Certain areas of the Single Market such as financial services, transport, energy, digital economy and others are closely monitored separately by the responsible Commission services” (emphasis added). The SMS organises information in different ways, such as by stage in the governance cycle; by performance per Member State; by governance tool; by policy area or by state of trade integration and market openness (the latter two are still work in progress).

The SMS for public procurement (SMSPP) is an instance of SMS by policy area. It thus represents the Commission’s view that the SMSPP is (a) based on sufficiently reliable data, as it is fed from the database resulting from the mandatory publications of procurement notices in the Tenders Electronic Daily (TED), and (b) a useful tool to provide an overview of the functioning of the single market for public procurement or, in other words, of the ‘performance’ of public procurement, defined as a measure of ‘whether purchasers get good value for money’.

The SMSPP determines the overall performance of a given Member State by aggregating a number of indicators. Currently, the SMSPP is based on 12 indicators (it used to be based on a smaller number, as discussed below): [1] Single bidder; [2] No calls for bids; [3] Publication rate; [4] Cooperative procurement; [5] Award criteria; [6] Decision speed; [7] SME contractors; [8] SME bids; [9] Procedures divided into lots; [10] Missing calls for bids; [11] Missing seller registration numbers; [12] Missing buyer registration numbers. As the SMSPP explains, the addition of these indicators results in the measure of ‘overall performance’, which

is a sum of scores for all 12 individual indicators (by default, a satisfactory performance in an individual indicator increases the overall score by one point while an unsatisfactory performance reduces it by one point). The 3 most important are triple-weighted (Single bidder, No calls for bids and Publication rate). This is because they are linked with competition, transparency and market access–the core principles of good public procurement. Indicators 7-12 receive a one-third weighting. This is because they measure the same concepts from different perspectives: participation by small firms (indicators 7-9) and data quality (indicators 10-12).
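The weighting scheme just quoted can be written out in a few lines of Python, which makes it easier to see how much of the overall score hangs on the first three indicators. This is my own reading of the quoted description, not code published by the Commission. The rating labels, and the assumption that a middling ('average') performance contributes zero points, are mine.

```python
from fractions import Fraction

# Indicator names in the order [1] to [12] used by the SMSPP.
INDICATORS = [
    "Single bidder", "No calls for bids", "Publication rate",
    "Cooperative procurement", "Award criteria", "Decision speed",
    "SME contractors", "SME bids", "Procedures divided into lots",
    "Missing calls for bids", "Missing seller registration numbers",
    "Missing buyer registration numbers",
]

# Indicators 1-3 are triple-weighted, 4-6 count once, 7-12 count one third.
WEIGHTS = {}
for position, name in enumerate(INDICATORS, start=1):
    if position <= 3:
        WEIGHTS[name] = Fraction(3)
    elif position <= 6:
        WEIGHTS[name] = Fraction(1)
    else:
        WEIGHTS[name] = Fraction(1, 3)

# Assumption: a middling performance neither adds nor subtracts points.
POINTS = {"satisfactory": 1, "average": 0, "unsatisfactory": -1}


def overall_performance(ratings):
    """Weighted sum of the 12 indicator ratings, as an exact fraction.

    ratings: dict mapping each indicator name to a key of POINTS.
    """
    missing = [name for name in INDICATORS if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return sum(
        (WEIGHTS[name] * POINTS[ratings[name]] for name in INDICATORS),
        Fraction(0),
    )


# A hypothetical country that is satisfactory on everything.
best_case = {name: "satisfactory" for name in INDICATORS}
print(overall_performance(best_case))
```

On this reading the score ranges from -14 to 14, and the three triple-weighted indicators alone account for 9 of those 14 points, so a weakness in any one of them (for instance the publication rate) moves the total as much as nine of the one-third-weighted indicators put together.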

The most recent snapshot of overall procurement performance is represented in the map below, which would indicate that procurement policy is rather dysfunctional, as most EEA countries do not seem to be doing very well.

Source: European Commission, 2018 Single Market Scoreboard for Public Procurement (based on 2017 data).

In my view, this use of the available information is very problematic: (a) to begin with, because the data in TED can hardly be considered ‘sufficiently reliable’. The database in TED has problems of various sorts because it is constructed from data self-declared by the contracting authorities of the Member States, which makes its content very heterogeneous and difficult to analyse. It suffers from significant problems of under-inclusiveness, definitional fuzziness and a lack of error filtering, as repeatedly recognised in the methodology underpinning the SMSPP itself. This should make one take the results of the SMSPP with more than a pinch of salt. However, these are not all the problems implicit in the SMSPP.

More importantly: (b) the definition of procurement performance and the ways in which the SMSPP seeks to assess it are far from universally accepted. They are rather judgement-laden and reflect the policy biases of the European Commission without making this sufficiently explicit. This issue requires further elaboration.

The SMSPP as an expression of policy-making: more than dubious judgements

I already criticised the Single Market Scoreboard for public procurement three years ago, mainly on the basis that some of the thresholds adopted by the European Commission to establish whether countries performed well or poorly in relation to a given indicator were not properly justified or backed by empirical evidence. Unfortunately, this remains the case and the Commission is yet to make a persuasive case for its decision that eg, in relation to indicator [4] Cooperative procurement, countries that aggregate 10% or more of their procurement achieve good procurement performance, while countries that aggregate less than 10% do not.

Similar issues arise with other indicators, such as [3] Publication rate, which measures the value of procurement advertised on TED as a proportion of national Gross Domestic Product (GDP). It is given threshold values of more than 5% for good performance and less than 2.5% for poor performance. The Commission considers that this indicator is useful because ‘A higher score is better, as it allows more companies to bid, bringing better value for money. It also means greater transparency, as more information is available to the public.’ However, this is inconsistent with the fact that the SMSPP methodology stresses that it is affected by the ‘main shortcoming … that it does not reflect the different weight that government spending has in the economy of a particular’ Member State (p. 13). It also fails to account for different economic models where some Member States can retain a much larger in-house capability than others, as well as failing to reflect other issues such as fiscal policies, etc. Moreover, the SMSPP includes a note that says that ‘Due to delays in data availability, these results are based on 2015 data (also used in the 2016 scoreboard). However, given the slow changes to this indicator, 2015 results are still relevant.’ I wonder how it is possible to establish that there are ‘slow changes’ to the indicator where there is no more current information. On the whole, this is clearly an indicator that should be dropped, rather than included with such a phenomenal number of (partially hidden) caveats.
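The shortcoming acknowledged in the methodology is easy to illustrate with a back-of-the-envelope calculation, sketched here in Python. The 5% and 2.5% cut-offs are those of the SMSPP; everything else (the two hypothetical countries, their government spending shares, the share of spending advertised, and the 'average' label for the band in between) is invented for the example.

```python
GOOD_ABOVE = 0.05    # more than 5% of GDP advertised on TED: good
POOR_BELOW = 0.025   # less than 2.5% of GDP advertised on TED: poor


def publication_rate(ted_value, gdp):
    """Value of procurement advertised on TED as a proportion of GDP."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return ted_value / gdp


def classify(rate):
    """Apply the SMSPP cut-offs; the middle band label is an assumption."""
    if rate > GOOD_ABOVE:
        return "good"
    if rate < POOR_BELOW:
        return "poor"
    return "average"


def hypothetical_country(gdp, government_share, advertised_share):
    """TED value for a made-up country.

    government_share: government spending as a share of GDP.
    advertised_share: share of that spending advertised on TED.
    """
    return gdp * government_share * advertised_share


# Two invented countries advertising the SAME 10% of government spending.
large_state = hypothetical_country(100.0, government_share=0.55,
                                   advertised_share=0.10)
small_state = hypothetical_country(100.0, government_share=0.20,
                                   advertised_share=0.10)

for label, ted_value in [("large state", large_state),
                         ("small state", small_state)]:
    rate = publication_rate(ted_value, 100.0)
    print(label, round(rate, 3), classify(rate))
```

Both invented countries advertise exactly the same proportion of what their governments spend, yet one comes out as 'good' and the other as 'poor', purely because of the different weight of government spending in their economies.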

On the whole, then, the SMSPP and a number of the indicators on which it is based are reflective of the implicit policy biases of the European Commission. In my view, it is disingenuous to try to save this by simply stressing the following about the SMSPP and its indicators:

Like all indicators, however, they simplify reality. They are affected by country-specific factors such as what is actually being bought, the structure of the economies concerned, and the relationships between different tendering options, none of which are taken into account. Also, some aspects of public procurement have been omitted entirely or covered only indirectly, e.g. corruption, the administrative burden and professionalism. So, although the Scoreboard provides useful information, it gives only a partial view of EU countries' public procurement performance.

I would rather argue that, in these conditions, the SMSPP is not really useful, in particular because it fails to enable analysis that could offer some valuable insights even despite the shortcomings of the underlying indicators: first, a cross-sectional analysis by comparing different countries under a single indicator; second, a trend analysis of evolution of procurement “performance” in the single market and/or in a given country.

The SMSPP and cross-sectional analysis: not fit for purpose

This criticism is largely implicit in the previous discussion, as the creation of indicators that are not reflective of ‘country-specific factors such as what is actually being bought, the structure of the economies concerned, and the relationships between different tendering options’ by itself prevents meaningful comparisons across the single market. Moreover, a closer look at the SMSPP methodology reveals that there are further issues that make such cross-sectional analysis difficult. To continue the discussion concerning indicator [4] Cooperative procurement, it is remarkable that the SMSPP methodology indicates that

[In previous versions] the only information on cooperative procurement was a tick box indicating that "The contracting authority is purchasing on behalf of other contracting authorities". This was intended to mean procurement in one of two cases: "The contract is awarded by a central purchasing body" and "The contract involves joint procurement". This has been made explicit in the [current methodology], where these two options are listed instead of the option on joint procurement. However, as always, there are exceptions to how uniformly this definition has been accepted across the EU. Anecdotally, in Belgium, this field has been interpreted as meaning that the management of the procurement procedure has been outsource[d] (e.g. to a legal company) -which explains the high values of this indicator for Belgium.

In simple terms, what this means is that the data point for Belgium (and any other country?) should have been excluded from analysis. In contrast, the SMSPP presents Belgium as achieving a good performance under this indicator—which, in turn, skews the overall performance of the country (which is, by the way, one of the few achieving positive overall performance… perhaps due to these data issues?).

This should give us some pause before we decide to give any meaning to cross-country comparisons at all. Additionally, as discussed below, we cannot (simply) rely on year-on-year comparisons of the overall performance of any given country.

The SMSPP and time series analysis: not fit for purpose

Below is a comparison of the ‘overall performance’ maps published in the last five iterations of the SMSPP.

Source: own elaboration, based on the European Commission’s Single Market Scoreboard for Public Procurement for the years 2014-2018 (please note that this refers to publication years, whereas the data on which each of the reports is based corresponds to the previous year).

One would be tempted to read these maps as representing a time series and thus as allowing for trend analysis. However, that is not the case, for various reasons. First, the overall performance indicator has been constructed on the basis of different (sub)indicators in different iterations of the SMSPP:

the 2014 iteration was based on three indicators: bidder participation; accessibility and efficiency.
the 2015 SMSPP included six indicators: single bidder; no calls for bids; publication rate; cooperative procurement; award criteria and decision speed.
the 2016 SMSPP also included six indicators. However, compared to 2015, the 2016 SMSPP omitted ‘publication rate’ and instead added an indicator on ‘reporting problems’.
the 2017 SMSPP expanded to 9 indicators. Compared to 2016, the 2017 SMSPP reintroduced ‘publication rate’ and replaced ‘reporting problems’ with indicators on ‘missing values’, ‘missing calls for bids’ and ‘missing registration numbers’.
the 2018 SMSPP, as mentioned above, is based on 12 indicators. Compared to 2017, the 2018 SMSPP has added indicators on ‘SME contractors’, ‘SME bids’ and ‘procedures divided into lots’. It has also deleted the indicator ‘missing values’ and disaggregated the ‘missing registration numbers’ into ‘missing seller registration numbers’ and ‘missing buyer registration numbers’.

It is plain that there are no two consecutive iterations of the SMSPP based on comparable indicators. Moreover, the way that the overall performance is determined has also changed. While the SMSPP for 2014 to 2017 established the overall performance as a ‘deviation from the average’ of sorts, whereby countries were given ‘green’ for overall marks above 90% of the average mark, ‘yellow’ for overall marks between 80 and 90% of the average mark, and ‘red’ for marks below 80% of the average mark; in the 2018 SMSPP, ‘green’ indicates a score above 3, ‘yellow’ indicates a score below 3 and above -3, and ‘red’ indicates a score below -3. In other words, the colour coding for the maps has changed from a measure of relative performance to a measure of absolute performance—which, in fairness, could be more meaningful.
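To see why maps coloured under the two approaches cannot be read side by side, the two rules can be sketched as follows in Python. The cut-offs are the ones described above; how exact boundary values (eg a score of exactly 3, or exactly 90% of the average) are treated is not something I can confirm, so assigning them to 'yellow' is an assumption, as are the function names and the three invented scores.

```python
def colour_relative(score, average):
    """2014-2017 approach: colour by the score's ratio to the average mark."""
    if average <= 0:
        raise ValueError("a percentage of the average needs a positive average")
    ratio = score / average
    if ratio > 0.9:
        return "green"
    if ratio >= 0.8:
        return "yellow"   # boundary handling is an assumption
    return "red"


def colour_absolute(score):
    """2018 approach: colour by fixed cut-offs at 3 and -3."""
    if score > 3:
        return "green"
    if score < -3:
        return "red"
    return "yellow"       # boundary handling is an assumption


# Invented scores for three placeholder countries (NOT real SMSPP values).
scores = {"A": 2.0, "B": 1.9, "C": 1.5}
average = sum(scores.values()) / len(scores)

for country, score in scores.items():
    print(country, colour_relative(score, average), colour_absolute(score))
```

With these invented numbers, countries A and B would be 'green' under the relative rule simply because they sit above the pack, while all three would be 'yellow' under the absolute rule. A change of colour between the 2017 and 2018 maps may therefore say nothing at all about a change in performance.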

As a result of these (and, potentially, other) issues, the SMSPP is clearly unable to support trend analysis, either at single market or country level. However, despite the disclaimers in the published documents, the risk that it will be read as supporting such analysis remains (to the extent that anyone really engages with the SMSPP).

Overall conclusion

The example of the SMSPP does not augur very well for the adoption of data analytics-based policy-making. This is a case where, despite acknowledging shortcomings in the methodology and the data, the Commission has pressed on, seemingly on the premise that ‘some data (analysis) is better than none’. However, in my view, this is the wrong approach. To put it plainly, the SMSPP is rather useless. However, it may create the impression that procurement data is being used to design policy and support its implementation. It would be better for the Commission to stop publishing the SMSPP until the underlying data issues are corrected and the methodology is streamlined. Otherwise, the Commission is simply creating noise around data-based analysis of procurement policy, and this can only erode its reputation as a policy-making body and the guardian of the single market.