Creating (positive) friction in AI procurement

I had the opportunity to participate in the Inaugural AI Commercial Lifecycle and Procurement Summit 2024 hosted by Curshaw. This was a very interesting ‘unconference’ where participants offered to lead sessions on topics they wanted to talk about. I led a session on ‘Creating friction in AI procurement’.

This was clearly a counterintuitive way of thinking about AI and procurement, given that the ‘big promise’ of AI is that it will reduce friction (eg through automation, and/or delegation of ‘non-value-added’ tasks). Why would I want to create friction in this context?

The first clarification I was asked for was whether this was about ‘good friction’ (as opposed to the old, bad ‘red tape’ kind of friction), which of course it was (?!). The second was what I mean by friction.

My recent research on AI procurement (eg here and here for the book-long treatment) has led me to conclude that we need to slow down the process of public sector AI adoption and to create mechanisms that bring back to the table the ‘non-AI’ option and several ‘stop project’ or ‘deal breaker’ trumps to push back against the tidal wave of unavoidability that seems to dominate all discussions on public sector digitalisation. My preferred solution is to do so through a system of permissioning or licensing administered by an independent authority—but I am aware, and willing to concede, that there is no political will for it. I thus started thinking about second-best approaches to slowing public sector AI procurement. This is how I got to the idea of friction.

By creating friction, I mean the need for a structured decision-making process that allows for collective deliberation within and around the adopting institution, and which is supported by rigorous impact assessments that tease out the second- and third-order implications of AI adoption, as well as thoroughly interrogating first-order issues around data quality and governance, technological governance and organisational capability, in particular around risk management and mitigation. This is complementary to—but hopefully goes beyond—emerging frameworks to determine organisational ‘risk appetite’ for AI procurement, such as that developed by the AI Procurement Lab and the Centre for Inclusive Change.

The conversations around the focus on ‘good friction’ moved in different directions, but there are some takeaways and ideas that stuck with me (or that I managed to jot down in my notes while chatting to others), such as (in no particular order of importance or potential):

  • the potential for ‘AI minimisation’ or ‘non-AI equivalence’ to test the need for (specific) AI solutions—if you can sufficiently approximate, or replicate, the same functional outcome without AI, or with a simpler type of AI, why not do it that way?;

  • the need for a structured catalogue of solutions (and components of solutions) that are already available (sometimes in open access, where there is lots of duplication) to inform such considerations;

  • the importance of asking whether procuring AI is driven by considerations such as availability of funding (is this funded if done with AI but not funded, or hard to fund at the same level, if done in other ways?), which can clearly skew decision-making—the importance of considering the effects of ‘digital industrial policy’ on decision-making;

  • the power (and relevance) of the deceptively simple question ‘is there an interdisciplinary team to be dedicated to this, and exclusively to this’?;

  • the importance of knowledge and understanding of the tech and its implications from the beginning, and of expertise in the translation of technical and governance requirements into procurement requirements, to avoid ‘games of chance’ whereby the use of ‘trendy terms’ (such as ‘agile’ or ‘responsible’) may or may not lead to the award of the contract to the best-placed and best-fitting (tech) provider;

  • the possibility of adapting civic monitoring or social witnessing mechanisms used in other contexts, such as large infrastructure projects, so that they can be embedded in the contract performance and auditing phases;

  • the importance of understanding displacement effects and whether deploying a solution (AI or automation, or similar) to deal with a bottleneck will simply displace the issue to another (new) bottleneck somewhere along the process;

  • the importance of understanding the broader organisational changes required to capture the hoped-for (productivity) gains arising from the tech deployment;

  • the importance of carefully considering and resourcing the much needed engagement of the ‘intelligent person’ that needs to check the design and outputs of the AI, including frontline workers and those at the receiving end of the relevant decisions or processes and the affected communities—the importance of creating meaningful and effective deliberative engagement mechanisms;

  • relatedly, the need to ensure organisational engagement and alignment at every level and every step of the AI (pre)procurement process (on which I would recommend reading this recent piece by Kawakami and colleagues);

  • the need to assess the impacts of changes in scale, complexity, and error exposure;

  • the need to create adequate circuit-breakers throughout the process (a purely illustrative sketch of how such gating might look follows this list).
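
To make this a little more tangible, below is a minimal and purely illustrative sketch (in Python) of how some of these ideas (a ‘non-AI equivalence’ check, the dedicated interdisciplinary team question, and circuit-breakers) could be encoded as a pre-procurement gating checklist. All gate names, questions and the pass/stop logic are hypothetical assumptions for illustration, not drawn from any existing framework.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    question: str
    deal_breaker: bool  # a failed deal-breaker gate stops the project outright

# Hypothetical gates distilled from the ideas above; wording is illustrative only.
GATES = [
    Gate("Has a 'non-AI equivalence' assessment documented why a non-AI or "
         "simpler solution would not achieve the same functional outcome?", False),
    Gate("Is there an interdisciplinary team dedicated to this, and exclusively to this?", True),
    Gate("Are data quality, data governance and risk mitigation arrangements in place?", True),
    Gate("Would the project be funded (at the same level) if done without AI?", False),
]

def run_gates(answers: dict) -> str:
    """Walk the gates in order; failed deal-breakers act as circuit-breakers."""
    for gate in GATES:
        if not answers.get(gate.question, False):
            if gate.deal_breaker:
                return f"STOP PROJECT: {gate.question}"
            return f"PAUSE FOR DELIBERATION: {gate.question}"
    return "Proceed to full impact assessment (not automatic approval)."

# Example: a project that lacks a dedicated interdisciplinary team is stopped.
answers = {gate.question: True for gate in GATES}
answers[GATES[1].question] = False
print(run_gates(answers))
```

The point of the sketch is simply that friction can be operationalised: each gate forces a documented answer, and some answers take the AI option off the table rather than merely flagging a risk.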

Certainly lots to reflect on and try to embed in future research and outreach efforts. Thanks to all those who participated in the conversation. For those interested in joining it, a structured way to do so is through this LinkedIn group.

Where does the proposed EU AI Act place procurement?

Thinking about some of the issues raised in the earlier post ‘Can the robot procure for you?’, I have now taken a close look at the European Commission’s Proposal for an Artificial Intelligence Act (AIA) to see how it approaches the use of AI in procurement procedures. It may (not) come as a surprise that the AIA takes an extremely light-touch approach to the regulation of AI uses in procurement and simply subjects them to (yet to be developed) voluntary codes of conduct. In this post, I detail why this is the case, as well as some reasons why I do not find it satisfactory.

Before getting to the details, it is worth stressing that this is reflective of a broader feature of the AIA: its heavy private sector orientation. When it comes to AI uses by the public sector, other than prohibiting some forms of mass surveillance by the State (both for law enforcement purposes and to generate a system of social scoring) and classifying as high-risk the most obvious AI uses by law enforcement and judicial authorities (all of which are important, of course), the AIA remains silent on the use of AI in most administrative procedures, with the sole exception of those concerning social benefits.

This approach could be generally justified by the limits to EU competence and, in particular, those derived from the principle of administrative self-organisation of the Member States. However, given the very broad approach taken by the Commission on the interpretation and use of Article 114 TFEU (which is the legal basis for the AIA, more below), this is not entirely consistent. It could rather be that the specific uses of AI by the public sector covered in the proposal reflect the increasingly well-known problematic uses of (biased) AI solutions in narrow aspects of public sector activity, rather than a broader reflection on the (still unknown, or still unimplemented) uses that could be problematic.

While the AIA is ‘future-proofed’ by including criteria for the inclusion of further use cases in its ‘high-risk’ category (which determines the bulk of compliance obligations), it is difficult to see how those criteria are suited to a significant expansion of the regulatory constraints on AI uses by the public sector, including in procurement. Therefore, as a broader point, I submit that the proposed AIA needs some revision to make it better suited to the potential deployment of AI by the public sector. To reflect on that, I am co-organising a webinar on ‘Digitalization and AI decision-making in administrative law proceedings’, which will take place on 15 Nov 2021, 1pm UK time (save the date; registration and more details here). All welcome.

Background on the AIA

Summarising the AIA is difficult, and it has already been done elsewhere (see eg this quick explainer by the Centre for Data Innovation and, for an accessible overview of the rationale and regulatory architecture of the AIA, this master class by Prof Christiane Wendehorst). So I will just highlight here a few issues linked to the analysis of procurement’s position within its regulatory framework.

The AIA seeks to establish a proportionate approach to the regulation of AI deployment and use. While its primary concern is with the consolidation of the EU Single Digital Market and the avoidance of regulatory barriers to the circulation of AI solutions, its preamble also points to the need to ensure the effectiveness of EU values and, crucially, the fundamental rights in the Charter of Fundamental Rights of the EU.

Importantly for the purposes of our discussion, recital (28) AIA stresses that ‘The extent of the adverse impact caused by the AI system on the fundamental rights protected by the Charter is of particular relevance when classifying an AI system as high-risk. Those rights include ... right to an effective remedy and to a fair trial [Art 47 Charter] … [and] right to good administration [Art 41 Charter]’.

The AIA seeks to create such a proportionate approach to the regulation of AI by establishing four categories of AI uses: prohibited, high-risk, limited risk requiring transparency measures, and minimal risk. The two categories that carry regulatory constraints or compliance obligations are those concerning high-risk (Arts 8-15 AIA), and limited risk requiring transparency measures (Art 52 AIA, which also applies to some high-risk AI). Minimal risk AI uses are left unregulated, although the AIA (Art 69) seeks to promote the development of codes of conduct intended to foster voluntary compliance with the requirements applicable to high-risk AI systems.

Procurement within the AIA

Procurement AI practices could not be classified as prohibited uses (Art 5 AIA), except in the difficult-to-imagine circumstances in which they deployed subliminal techniques. It is also difficult to see how they could fall under the regime applicable to uses requiring special transparency (Art 52 AIA), because it only applies to AI systems intended to interact with natural persons, which must be ‘designed and developed in such a way that natural persons are informed that they are interacting with an AI system, unless this is obvious from the circumstances and the context of use.’ It would not be difficult for public buyers using external-facing AI solutions (eg chatbots seeking to guide tenderers through their e-submissions) to make it clear that tenderers are interacting with an AI solution. And, even if not, the transparency obligations are rather minimal.

So, the crux of the issue rests on whether procurement-related AI uses could be classified as high-risk. This is regulated in Art 6 AIA, which cross-refers to Annex III AIA. The Annex contains a numerus clausus of high-risk AI uses, which is however susceptible of amendment under the conditions specified in Art 7 AIA. Art 6/Annex III do not contain any procurement-related AI uses. The only type of AI use linked to administrative procedures concerns ‘AI systems intended to be used by public authorities or on behalf of public authorities to evaluate the eligibility of natural persons for public assistance benefits and services, as well as to grant, reduce, revoke, or reclaim such benefits and services’ (Annex III(5)(a) AIA).

Clearly, then, procurement-related AI uses are currently left to the default category of those with minimal risk and, thus, subjected only to voluntary self-regulation via codes of conduct.

Could this change in the future?

Art 7 AIA establishes the following two cumulative criteria: (a) the AI systems are intended to be used in any of the areas listed in points 1 to 8 of Annex III; and (b) the AI systems pose a risk of harm to the health and safety, or a risk of adverse impact on fundamental rights, that is, in respect of its severity and probability of occurrence, equivalent to or greater than the risk of harm or of adverse impact posed by the high-risk AI systems already referred to in Annex III.

The first hurdle in getting procurement-related AI uses included in Annex III in the future is formal and concerns the interpretation of the categories listed therein. There are only two potential options: nesting them under uses related to ‘Access to and enjoyment of essential private services and public services and benefits’, or under uses related to ‘Administration of justice and democratic processes’. It could (theoretically) be possible to squeeze them into one of those categories (perhaps the latter more easily than the former), but this is by no means straightforward and, given the existing AI uses in each of the two categories, I would personally be disinclined to engage in such a broad interpretation.

Even if that hurdle were cleared, the second hurdle is also challenging. Art 7(2) AIA establishes the criteria to assess whether an AI use poses a sufficient ‘risk of adverse impact on fundamental rights’. Of those criteria, there are four that in my view would make it very difficult to classify procurement-related AI uses as high-risk. Those criteria require the European Commission to consider:

(c) the extent to which the use of an AI system has already caused … adverse impact on the fundamental rights or has given rise to significant concerns in relation to the materialisation of such … adverse impact, as demonstrated by reports or documented allegations submitted to national competent authorities;

(d) the potential extent of such harm or such adverse impact, in particular in terms of its intensity and its ability to affect a plurality of persons;

(e) the extent to which potentially harmed or adversely impacted persons are dependent on the outcome produced with an AI system, in particular because for practical or legal reasons it is not reasonably possible to opt-out from that outcome;

(g) the extent to which the outcome produced with an AI system is easily reversible …;

Meeting these criteria would require the relevant AI systems to basically be making independent or fully automated decisions (eg on the award of contracts, or the exclusion of tenderers), so that their decisions would be seen to affect the effectiveness of the Art 41 and 47 Charter rights, as well as a (practical) understanding that those decisions cannot be easily reversed. Otherwise, the regulatory threshold is so high that procurement-related AI uses (screening, recommender systems, support to human decision-making (eg automated evaluation of tenders), etc) are unlikely to be considered to pose a sufficient ‘risk of adverse impact on fundamental rights’.

Could Member States go further?

As mentioned above, one of the potential explanations for the almost absolute silence on the use of AI in administrative procedures in the AIA could be that the Commission considers that this aspect of AI regulation belongs to each of the Member States. If that were true, then Member States could go further than the code of conduct self-regulatory approach resulting from the AIA regulatory architecture. An easy approach would be to eg legally mandate compliance with the AIA obligations for high-risk AI systems.

However, given the internal market justification of the AIA, to be honest, I have my doubts that such a regulatory intervention would withstand challenges on the basis of general EU internal market law.

The thrust of the AIA’s competence justification (under Art 114 TFEU, see point 2.1 of the Explanatory Memorandum) is that

The primary objective of this proposal is to ensure the proper functioning of the internal market by setting harmonised rules in particular on the development, placing on the Union market and the use of products and services making use of AI technologies or provided as stand-alone AI systems. Some Member States are already considering national rules to ensure that AI is safe and is developed and used in compliance with fundamental rights obligations. This will likely lead to two main problems: i) a fragmentation of the internal market on essential elements regarding in particular the requirements for the AI products and services, their marketing, their use, the liability and the supervision by public authorities, and ii) the substantial diminishment of legal certainty for both providers and users of AI systems on how existing and new rules will apply to those systems in the Union.

All of those issues would arise if each Member State adopted its own rules constraining the use of AI in administrative procedures not covered by the AIA (whether related to procurement or not). A challenge to that decentralised approach on grounds of internal market law by eg providers of procurement-related AI solutions capable of deployment in all Member States but burdened with uneven regulatory requirements thus seems quite straightforward (if controversial), especially given the high level of homogeneity in public procurement regulation resulting from the 2014 Public Procurement Package. Not to mention the possibility of challenging those domestic obligations on grounds that they go further than the AIA in breach of Art 16 Charter (freedom to conduct a business), even if this could face some issues resulting from the interpretation of Art 51 thereof.

Repositioning procurement (and other aspects of administrative law) in the AIA

In my view, there is a case to be made for the repositioning of procurement-related AI uses within the AIA, and its logic can apply to other areas of administrative law/activity with similar market effects.

The key issue is that the development of AI solutions to support decision-making in the public sector not only concerns the rights of those directly involved in or affected by those decisions, but also society at large. In the case of procurement, eg the development of biased procurement evaluation or procurement recommender systems can have negative social effects via their effects on the market (eg on value for money, to mention the most obvious) that are difficult to identify in individual procurement decisions.

Moreover, the public administration seems well placed to comply with the requirements of the AIA for high-risk AI systems as a matter of routine procedure, and the arguments on the need to take a proportionate approach to the regulation of AI so as not to stifle innovation lose steam, and barely have any punch, when it comes to imposing those requirements on public sector users. Further, to a large extent, the AIA requirements seem to me mostly aligned with the requirements for running a proper (and challenge-proof) eProcurement system, and they would also facilitate compliance with duties of good administration when specific decisions are challenged.

Therefore, on balance, I see no good reason not to expand the list in Annex III AIA to include the use of AI systems in all administrative procedures, and in particular in public procurement and in other regulatory sectors where ex post interventions to correct market distortions resulting from biased AI implementations can simply be practically impossible. I submit that this should be done before its adoption.

Some thoughts on evaluation framing, based on my academic experience with REF2021


As an academic, you are frequently required to evaluate other people’s work—and, for obvious reasons, your work is also permanently being assessed. After having completed quite a few evaluation tasks in the last few months, and having received feedback on my own work, I have generated some thoughts that I will seek to organise here.

These thoughts mainly concern a hypothesis or hunch I have developed, which would posit that framing the evaluation in ways that seek to obtain more information about the reasons for a specific ‘grade’ can diminish the quality of the evaluation. This is because the evaluator can externalise the uncertainty implicit in the qualitative evaluation. Let me try to explain.

Some background

Academic evaluations come in different flavours and colours. There is the rather obvious assessment of students’ work (ie marking). There are the also well-known peer-review assessments of academic papers. The scales used for these types of evaluations (eg 1 to 10 for students, or a four-point scale for papers involving rejection / major corrections (aka revise & resubmit) / minor corrections / acceptance) are generally well known in each relevant context, and can be applied with varying degrees of opacity regarding the reviewers’ and the reviewees’ identities.

There are perhaps less well-known evaluations of colleagues’ work for promotion purposes, as well as the evaluation of funding proposals or the assessment of academic outputs for other (funding-related) purposes, such as the REF2021 in the case of English universities.

The REF2021 provides the framework in which I can structure my thoughts more easily.

Internal REF2021 evaluations

REF2021 is an exercise whereby academic outputs (among other things) are rated on a five-point scale—from unclassified (= 0) to a maximum of 4*. The rating is supposed to be based on a holistic assessment of the three notoriously porous (let’s say) concepts of ‘originality, significance and rigour’—although there are lengthy explanations on their intended interpretation.

The difficult evaluation dynamic that the REF2021 has generated is a guessing game whereby universities try to identify which of the works produced by their academics (during the eligible period) are most likely to be ranked at 4* by the REF panel, as that is where the money is (and perhaps more importantly, the ‘marker of prestige’ that is supposed to follow the evaluation, which in turn feeds into university rankings… etc).

You would think that the relevant issue when asked to assess a colleague’s work (whether anonymously or not, let’s leave that aside) for ‘REF purposes’ would be for you to express your academic judgement in the same way as the experts on the panel will. That is, giving it a mark of 0 to 4*. That gives you five evaluation steps, and you need to place the work in one of them. This is very difficult, and there is a mix of conflicting loyalties, relative expertise gaps, etc that will condition that decision. That is why the evaluation is carried out by (at least) two independent evaluators, with the possible intervention of a third (or more) in case of significant discrepancies.

Having to choose a specific rating between 0 and 4* forces the evaluator to internalise any uncertainties in its decision. This is a notoriously invidious exercise and the role of internal REF evaluator is unenviable.

It also creates a difficulty for decision-makers tasked with establishing the overall REF submission—in the best case scenario, having to choose the ‘best 4*’ from a pool of academic outputs internally assessed at 4* that exceeds the maximum allowed submissions. Decision-makers have nothing but the rating (4*) on which to base their choice. So it is tempting to introduce additional mechanisms to gather more information from the internal assessors in order to perform comparisons.

Change in the evaluation framing

Some of the information the decision-makers would want to gather concerns ‘how strong’ the rating given by the evaluator is, with some more granularity. A temptation is to transform the 5-point scale (0 to 4) into a 9 (or even 10) point scale by halving each step (0, 0.5*, 1* etc up to 4* — or even 4.5* or 4*+)—and there are, of course, possibilities to create more steps. Another temptation is to disaggregate the rating and ask for separate marks for each of the criteria (originality, significance and rigour), with or without an overall rating.

Along the same lines, the decision-makers may also want to know how confident the evaluator is of its rating. This can be captured through narrative comments, or by asking the evaluator to indicate its confidence on any scale (from low to high confidence, with as many intermediate steps as you could imagine). While all of this may create more information about the evaluation process—as well as fuel the indecision or overconfidence of the evaluator, as the case may be—I would argue that it does not result in a better rating for the purposes of the REF2021.

A more complex framing of the decision allows the evaluator to externalise the uncertainty in its decision, in particular by allowing it to avoid hard choices by using ‘boundary steps’ in the evaluation scale, as well as by disclosing its level of confidence in the rating. When a 4* that had ‘only just made it’ in the mind of the evaluator morphs into a 3.5* with a moderate to high level of confidence and a qualitative indication that the evaluation could be higher, the uncertainty squarely falls with the decision-maker and not the evaluator.
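
A toy numerical sketch may help illustrate where that externalised uncertainty ends up. The ratings and confidence labels below are invented for the example, and the rounding rule is just one assumption about how a decision-maker might map expanded internal ratings back onto the rigid 0 to 4* bands used externally:

```python
import math

# Invented internal ratings on an expanded (half-step) scale, plus self-reported
# confidence. The external REF scale only has the integer bands 0 to 4, so the
# decision-maker must collapse these back, and the uncertainty the evaluator
# externalised (eg a 3.5* at moderate confidence) resurfaces as someone else's call.
internal_reviews = {
    "output_A": {"rating": 4.0, "confidence": "high"},
    "output_B": {"rating": 3.5, "confidence": "moderate"},  # the 'only just made it' case
    "output_C": {"rating": 3.5, "confidence": "high"},
}

def collapse_to_ref_band(rating: float, optimistic: bool) -> int:
    """Map a half-step rating back onto the 0-4 integer bands; whether a boundary
    rating rounds up or down is now the decision-maker's judgement, not the evaluator's."""
    band = math.ceil(rating) if optimistic else math.floor(rating)
    return max(0, min(4, band))

for name, review in internal_reviews.items():
    low = collapse_to_ref_band(review["rating"], optimistic=False)
    high = collapse_to_ref_band(review["rating"], optimistic=True)
    print(f"{name}: internal {review['rating']} ({review['confidence']} confidence) "
          f"-> external band {low}-{high}")
```

On this (assumed) mapping, the boundary ratings come back as a range rather than a single band, so the hard choice the evaluator avoided has to be made again downstream, by someone with less information about the output itself.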

Leaving aside other important governance reasons that need not worry us now, this is problematic in the specific REF2021 setting because of the need to reconcile more complex internal evaluations with the narrower and more rigid criteria to be applied by the external evaluators. Decision-makers faced with the task of identifying the specific academic outputs to be submitted need to deal with the uncertainty externalised by the evaluators, which creates an additional layer of uncertainty, in particular as not all evaluators will provide homogeneous (additional) information (think eg of the self-assessment of the degree of confidence).

I think this also offers broader insights into the different ways in which the framing of the evaluation affects it.

Tell me the purpose and I’ll tell you the frame

I think that one of the insights that can be extracted is that the framing of the evaluation changes the process and that different frames should be applied depending on the main purpose of the exercise—beyond reaching the best possible evaluation (as that depends on issues of expertise that do not necessarily change due to framing).

Where the purpose of the exercise is to extract the maximum information from the evaluator in a standardised manner, an evaluation frame that forces commitment amongst a limited range of possible outcomes seems preferable due to the internalisation of the uncertainty in the agent that can best assess it (ie the evaluator).

Conversely, where the purpose of the exercise is to monitor the way the evaluator carries out its assessment, then a frame that generates additional information can enhance oversight, but it generates fuzziness in the grades. It can also create a different set of incentives for the evaluator, depending on additional circumstances, such as identification of the evaluator and/or the author of the work being evaluated, whether this is a repeated game (thus triggering reputational issues), etc.

Therefore, where the change of frame alters the dynamics and the outputs of the evaluation process, there is a risk that an evaluation system initially designed to extract expert judgment ends up being perceived as a mechanism to judge the expert. The outcomes cannot be expected to simply improve, despite the system becoming (apparently) more decision-maker friendly.

An interesting different take on public procurement decision-making (reference to Crowder, 2015)

I have just read an interesting piece of research that sheds a different light on public procurement decision-making processes. That short and accessible piece, [M Crowder, "Public procurement: the role of cognitive heuristics" (2015) 35(2) Public Money & Management 127-34], explores the cognitive heuristics at work in public procurement processes. As the abstract makes clear:
Public procurement processes have been extensively studied, but previous research has not sought to explain public procurement in terms of cognitive heuristics. This paper examines the award of a large public sector contract and outlines how the decisions were made. Heuristics were used throughout the process. Three heuristics—EBA [elimination by aspects], conjunctive, and WADD [weighted additive]—were used in combination to reduce the number of bidders for the contract from a somewhat unmanageable 63 down to four. This paper allows the underlying stages to be viewed from this perspective and therefore it explores procurement in a way that sheds new light on the processes involved.
The paper is easy to follow if one has some experience in public procurement evaluation or, absent that, some knowledge of the rules on exclusion, qualitative selection and short-listing of tenderers [for a summary of the rules under the new Directive 2014/24, see A Sanchez Graells, “Exclusion, Qualitative Selection and Short-listing”, in F Lichère, R Caranta & S Treumer (eds), Modernising Public Procurement. The New Directive, vol. 6 European Procurement Law Series (Copenhagen, DJØF, 2014) 97-129]. As the conclusions stress, the paper shows:
that procurement decisions can be explained in terms of cognitive heuristics. The EBA heuristic makes a decision on the basis of a single aspect; the conjunctive heuristic makes a decision on the basis that a number of requirements are all met; and the WADD heuristic makes a decision by weighing up various factors and offsetting the good against the bad. This was reflected in the procurement under study, where the number of bidders under consideration was reduced in precisely this way.
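
For readers less familiar with these heuristics, here is a minimal sketch (in Python) of how the three of them could operate in combination to whittle a field of 63 bidders down to a shortlist of four. The bidder attributes, thresholds and weights are entirely made up for illustration and are not taken from Crowder's case study:

```python
import random

# Entirely made-up bidder data for illustration; not drawn from Crowder's case study.
random.seed(1)
bidders = [
    {"name": f"bidder_{i}",
     "certified": random.random() > 0.3,     # the single aspect used by EBA
     "financial_ok": random.random() > 0.2,  # minimum requirements checked conjunctively
     "capacity_ok": random.random() > 0.2,
     "quality": random.uniform(0, 10),       # factors weighed and offset by WADD
     "price_score": random.uniform(0, 10)}
    for i in range(63)
]

# 1) Elimination by aspects (EBA): eliminate bidders on the basis of a single aspect.
stage1 = [b for b in bidders if b["certified"]]

# 2) Conjunctive heuristic: keep only bidders that meet *all* minimum requirements.
stage2 = [b for b in stage1 if b["financial_ok"] and b["capacity_ok"]]

# 3) Weighted additive (WADD): weigh the factors, offsetting good against bad,
#    and keep the top four for the shortlist.
def wadd_score(b, w_quality=0.6, w_price=0.4):
    return w_quality * b["quality"] + w_price * b["price_score"]

shortlist = sorted(stage2, key=wadd_score, reverse=True)[:4]
print(len(bidders), "->", len(stage1), "->", len(stage2), "->",
      [b["name"] for b in shortlist])
```

The sequencing matters: the cheaper, cruder heuristics do the bulk of the culling, and the compensatory weighing is reserved for the small pool that survives, which mirrors the staged reduction described in the paper.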
The paper offers a good perspective to complement our understanding of procurement decision-making and provokes some thoughts on how to better regulate these processes in order to avoid weaknesses derived from cognitive biases. 

This is an area that promises to open avenues for interdisciplinary efforts to incorporate insights from psychology and other sciences into legal research on public procurement. It seems to me an area of high research potential, so it may be worth keeping an eye on it!