G7 Guiding Principles and Code of Conduct on Artificial Intelligence -- some comments from a UK perspective

On 30 October 2023, G7 leaders published the Hiroshima Process International Guiding Principles for Advanced AI system (the G7 AI Principles), a non-exhaustive list of guiding principles formulated as a living document that builds on the OECD AI Principles to take account of recent developments in advanced AI systems. The G7 stresses that these principles should apply to all AI actors, when and as applicable to cover the design, development, deployment and use of advanced AI systems.

The G7 AI Principles are supported by a voluntary Code of Conduct for Advanced AI Systems (the G7 AI Code of Conduct), which is meant to provide guidance to help seize the benefits and address the risks and challenges brought by these technologies.

The G7 AI Principles and Code of Conduct came just two days before the start of the UK’s AI Safety Summit 2023. Given that the UK is part of the G7 and has endorsed the G7 Hiroshima Process and its outcomes, the interaction between the G7’s documents, the UK Government’s March 2023 ‘pro-innovation’ approach to AI and its aspirations for the AI Safety Summit deserves some comment.

G7 AI Principles and Code of Conduct

The G7 AI Principles aim ‘to promote safe, secure, and trustworthy AI worldwide and will provide guidance for organizations developing and using the most advanced AI systems, including the most advanced foundation models and generative AI systems.’ The principles are meant to be cross-cutting, as they target ‘among others, entities from academia, civil society, the private sector, and the public sector.’ Importantly, also, the G7 AI Principles are meant to be a stop gap solution, as G7 leaders ‘call on organizations in consultation with other relevant stakeholders to follow these [principles], in line with a risk-based approach, while governments develop more enduring and/or detailed governance and regulatory approaches.’

The principles include the reminder that ‘[w]hile harnessing the opportunities of innovation, organizations should respect the rule of law, human rights, due process, diversity, fairness and non-discrimination, democracy, and human-centricity, in the design, development and deployment of advanced AI system’, as well as a reminder that organizations developing and deploying AI should not undermine democratic values, harm individuals or communities, ‘facilitate terrorism, enable criminal misuse, or pose substantial risks to safety, security, and human rights’. States (as AI users) are reminded of their ‘obligations under international human rights law to promote that human rights are fully respected and protected’ and private sector actors are called to align their activities ‘with international frameworks such as the United Nations Guiding Principles on Business and Human Rights and the OECD Guidelines for Multinational Enterprises’.

These are all very high level declarations and aspirations that do not go much beyond pre-existing commitments and (soft) law norms, if at all.

The G7 AI Principles comprise a non-exhaustive list of 11 high-level regulatory goals that organizations should abide by ‘commensurate to the risks’ (ie following the already mentioned risk-based approach). This introduces a first element of uncertainty, because the document does not establish any methodology or explanation of how risks should be assessed and tiered (one of the primary, and debated, features of the proposed EU AI Act). The principles are the following, prefaced by my own labelling between square brackets:

  1. [risk identification, evaluation and mitigation] Take appropriate measures throughout the development of advanced AI systems, including prior to and throughout their deployment and placement on the market, to identify, evaluate, and mitigate risks across the AI lifecycle;

  2. [misuse monitoring] Identify and mitigate vulnerabilities, and, where appropriate, incidents and patterns of misuse, after deployment including placement on the market;

  3. [transparency and accountability] Publicly report advanced AI systems’ capabilities, limitations and domains of appropriate and inappropriate use, to support ensuring sufficient transparency, thereby contributing to increase accountability.

  4. [incident intelligence exchange] Work towards responsible information sharing and reporting of incidents among organizations developing advanced AI systems including with industry, governments, civil society, and academia.

  5. [risk management governance] Develop, implement and disclose AI governance and risk management policies, grounded in a risk-based approach – including privacy policies, and mitigation measures, in particular for organizations developing advanced AI systems.

  6. [(cyber) security] Invest in and implement robust security controls, including physical security, cybersecurity and insider threat safeguards across the AI lifecycle.

  7. [content authentication and watermarking] Develop and deploy reliable content authentication and provenance mechanisms, where technically feasible, such as watermarking or other techniques to enable users to identify AI-generated content.

  8. [risk mitigation priority] Prioritize research to mitigate societal, safety and security risks and prioritize investment in effective mitigation measures.

  9. [grand challenges priority] Prioritize the development of advanced AI systems to address the world’s greatest challenges, notably but not limited to the climate crisis, global health and education.

  10. [technical standardisation] Advance the development of and, where appropriate, adoption of international technical standards.

  11. [personal data and IP safeguards] Implement appropriate data input measures and protections for personal data and intellectual property.

Each of the principles is accompanied by additional guidance or precision, where possible, and this is further developed in the G7 Code of Conduct.

In my view, the list is a bit of a mixed bag.

There are some very general aspirations or steers that can hardly be considered principles of AI regulation, for example principle 9 setting a grand challenges priority and, possibly, principle 8 setting a risk mitigation priority beyond the ‘requirements’ of principle 1 on risk identification, evaluation and mitigation—which thus seems to boil down to the more specific steer in the G7 Code of Conduct for (private) organisations to ‘share research and best practices on risk mitigation’.

Quite how current major AI developers could comply with these principles seems rather difficult to foresee, especially in relation to principle 9. Most developers of generative AI or other AI applications linked to eg social media platforms will have a hard time demonstrating their engagement with this principle, unless we accept a general justification of ‘general purpose application’ or ‘dual use application’, which to me seems quite unpalatable. What is the purpose of this principle if eg it pushes organisations away from engaging with the rest of the G7 AI Principles? Or if organisations are allowed to gloss over it in any (future) disclosures linked to an eventual mechanism of commitment, chartering, or labelling associated with the principles? It seems like the sort of purely political aspiration that may have been better left aside.

Some other principles seem to push at an open door, such as principle 10 on the development of international technical standards. Again, the only meaningful detail seems to be in the G7 Code of Conduct, which specifies that ‘In particular, organizations also are encouraged to work to develop interoperable international technical standards and frameworks to help users distinguish content generated by AI from non-AI generated content.’ However, this is closely linked to principle 7 on content authentication and watermarking, so it is not clear how much that adds. Moreover, this comes to further embed the role of industry-led technical standards as a foundational element of AI regulation, with all the potential problems that arise from it (for some discussion from the perspective of regulatory tunnelling, see here and here).

Yet other principles present as relatively soft requirements or ‘noble’ commitments issues that are, in reality, legal requirements already binding on entities and States and that, in my view, should have been placed as hard obligations and a renewed commitment from G7 States to enforce them. These include principle 11 on personal data and IP safeguards, where the G7 Code of Conduct includes as an apparent afterthought that ‘Organizations should also comply with applicable legal frameworks’. In my view, this should be the starting point.

This reduces the list of AI Principles ‘proper’. But, even then, they can be further grouped and synthesised, in my view. For example, principles 1 and 5 are both about risk management, with the (outward-looking) governance layer of principle 5 seeking to give transparency to the (inward-looking) governance layer in principle 1. Principle 2 seems to simply seek to extend the need to engage with risk-based management post-market placement, which is also closely connected to the (inward-looking) governance layer in principle 1. All of them focus on the (undefined) risk-based approach to development and deployment of AI underpinning the G7’s AI Principles and Code of Conduct.

Some aspects of the incident intelligence exchange also relate to principle 1, while some other aspects relate to (cyber) security issues encapsulated in principle 6. However, given that this principle may be a placeholder for the development of some specific mechanisms of collaboration—either based on cyber security collaboration or other approaches, such as the much touted aviation industry’s—it may be treated separately.

Perhaps, then, the ‘core’ AI Principles arising from the G7 document could be trimmed down to:

  • Life-cycle risk-based management and governance, inclusive of principles 1, 2, and 5.

  • Transparency and accountability, principle 3.

  • Incident intelligence exchange, principle 4.

  • (Cyber) security, principle 6.

  • Content authentication and watermarking, principle 7 (though perhaps narrowly targeted to generative AI).

Most of the value in the G7 AI Principles and Code of Conduct thus arises from the pointers for collaboration, the more detailed self-regulatory measures, and the more specific potential commitments included in the latter. This is the case, for example, of the potential AI risks identified as targets for the risk assessments expected of AI developers (under guidance related to principle 1), or of the desirable content of AI-related disclosures (under guidance related to principle 3).

It is however unclear how these principles will evolve when adopted at the national level, and to what extent they offer a sufficient blueprint to ensure international coherence in the development of the ‘more enduring and/or detailed governance and regulatory approaches’ envisaged by G7 leaders. It seems for example striking that both the EU and the UK have supported these principles, given that they have relatively opposing approaches to AI regulation—with the EU seeking to finalise the legislative negotiations on the first ‘golden standard’ of AI regulation and the UK taking an entirely deregulatory approach. Perhaps this is in itself an indication that, even at the level of detail achieved in the G7 AI Code of Conduct, the regulatory leeway is quite broad and still necessitates significant further concretisation for it to be meaningful in operational terms—as evidenced eg by the US President’s ‘Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence’, which calls for that concretisation and provides a good example of the many areas for detailed work required to translate high level principles into actionable requirements (even if it leaves enforcement still undefined).

How do the G7 Principles compare to the UK’s ‘pro-innovation’ ones?

In March 2023, the UK Government published its white paper ‘A pro-innovation approach to AI regulation’ (the ‘UK AI White Paper’; for a critique, see here). The UK AI White Paper indicated (at para 10) that its ‘framework is underpinned by five principles to guide and inform the responsible development and use of AI in all sectors of the economy:

  • Safety, security and robustness

  • Appropriate transparency and explainability

  • Fairness

  • Accountability and governance

  • Contestability and redress’.

A comparison of the UK and the G7 principles can show a few things.

First, that there are some areas where there seems to be a clear correlation—in particular concerning (cyber) security as a self-standing challenge requiring a direct regulatory focus.

Second, that it is hard to decide at which level to place incommensurable aspects of AI regulation. Notably, the G7 principles do not directly refer to fairness, while the UK’s do. However, the G7 Principles do spend some time in the preamble addressing the issue of fairness and unacceptable AI use (though in a woolly manner). Whether placing this type of ‘requirement’ at one level or another makes a difference (at all) is highly debatable.

Third, that there are different ways of ‘packaging’ principles or (soft) obligations. Just like some of the G7 principles are closely connected or fold into each other (as above), so do the UK’s principles in relation to the G7’s. For example, the G7 packaged together transparency and accountability (principle 3), while the UK had them separated. While the UK explicitly mentioned the issue of AI explainability, this remains implicit in the G7 principles (also in principle 3).

Finally, in line with the considerations above, that distinct regulatory approaches only emerge or become clear once the ‘principles’ become specific (so they arguably stop being principles). For example, it seems clear that the G7 Principles aspire to higher levels of incident intelligence governance and to a specific target of generative AI watermarking than the UK’s. However, whether the G7 or the UK principles are equally or more demanding on any other dimension of AI regulation is close to impossible to establish. In my view, this further supports the need for a much more detailed AI regulatory framework—else, technical standards will entirely occupy that regulatory space.

What do the G7 AI Principles tell us about the UK’s AI Safety Summit?

The Hiroshima Process that has led to the adoption of the G7 AI Principles and Code of Conduct emerged from the Ministerial Declaration of The G7 Digital and Tech Ministers’ Meeting of 30 April 2023, which explicitly stated that:

‘Given that generative AI technologies are increasingly prominent across countries and sectors, we recognise the need to take stock in the near term of the opportunities and challenges of these technologies and to continue promoting safety and trust as these technologies develop. We plan to convene future G7 discussions on generative AI which could include topics such as governance, how to safeguard intellectual property rights including copyright, promote transparency, address disinformation, including foreign information manipulation, and how to responsibly utilise these technologies’ (at para 47).

The UK Government’s ambitions for the AI Safety Summit largely focus on those same issues, albeit within the very narrow confines of ‘frontier AI’, which it has defined as ‘highly capable general-purpose AI models that can perform a wide variety of tasks and match or exceed the capabilities present in today’s most advanced models’. While the UK Government has published specific reports to focus discussion on (1) Capabilities and risks from frontier AI and (2) Emerging Processes for Frontier AI Safety, it is unclear how the level of detail of such a narrow approach could translate into broader international commitments.

The G7 AI Principles already claim to tackle ‘the most advanced AI systems, including the most advanced foundation models and generative AI systems (henceforth "advanced AI systems")’ within their scope. It seems unlikely that such an approach is based on a lack of knowledge or understanding of the detail the UK has condensed in those reports. It rather seems that the G7 was not ready to move quickly to a level of detail beyond that included in the G7 AI Code of Conduct. Whether significant further developments can be expected beyond the G7 AI Principles and Code of Conduct just two days after they were published seems hard to fathom.

Moreover, although the UK Government is downplaying the fact that eg Chinese participation in the AI Safety Summit is unclear and potentially rather marginal, it seems that, at best, the UK AI Safety Summit will be an opportunity for a continued conversation between G7 countries and a few others. It is also unclear whether significant progress will be made in a forum that seems rather clearly tilted towards industry voice and influence.

Let’s wait and see what the outcomes are, but I am not optimistic for significant progress other than, worryingly, a risk of further displacement of regulatory decision-making towards industry and industry-led (future) standards.

More model contractual AI clauses -- some comments on the SCL AI Clauses

Following the launch of the final version of the model contractual AI clauses sponsored by the European Commission earlier this month, the topic of how to develop and how to use contractual model clauses for AI procurement is getting hotter. As part of its AI Action Plan, New York City has announced that it is starting work to develop its own model clauses for AI procurement (to be completed in 2025). We can expect to see a proliferation of model AI clauses as more ‘AI legislation’ imposes constraints on contractual freedom and compliance obligations, and as different model clauses are revised to (hopefully) capture the learning from current experimentation in AI procurement.

Although not (closely) focused on procurement, a new set of interesting AI contractual clauses has been released by the Society for Computers & Law (SCL) AI Group (thanks to Gisele Waters for bringing them to my attention on LinkedIn!). In this post, I reflect on some aspects of the SCL AI clauses and try to answer Gisele’s question/challenge (below).

SCL AI Clauses

The SCL AI clauses have a clear commercial orientation and are meant as a starting point for supplier-customer negotiations, which is reflected in the fact that the proposed clauses contain two options: (1) a ‘pro-supplier’ drafting based on off-the-shelf provision, and (2) a ‘pro-customer’ drafting based on a bespoke arrangement. Following that commercial logic, most of the SCL AI clauses focus on an allocation of obligations (and thus costs and liability) between the parties (eg in relation to compliance with legal requirements).

The clauses include a few substantive requirements implicit in the allocation of the respective obligations (eg on data or third party licences) but mostly refer to detailed schedules of which there is no default proposal, or to industry standards (and thus have this limitation in common with eg the EU’s model AI clauses). The SCL AI clauses do contain some drafting notes that would help identify issues needing specific regulation in the relevant schedules, although this guidance necessarily remains rather abstract or generic.

This pro-supplier/pro-customer orientation prompted Gisele’s question/challenge, which is whether ‘there is EVER an opportunity for government (customer-buyer) to be better able to negotiate the final language with clauses like these in order to weigh the trade offs between interests?’, especially bearing in mind that the outcome of the negotiations could be strongly pro-supplier, strongly pro-customer, or balanced (and something in between those). I think that answering this question requires exploring what pro-supplier or pro-customer may mean in this specific context.

From a substantive regulation perspective, the SCL AI clauses include a few interesting elements, such as an obligation to establish a circuit-breaker capable of stopping the AI (aka an ‘off button’) and a roll-back obligation (to an earlier, non-faulty version of the AI solution) where the AI is malfunctioning or this is necessary to comply with applicable law. However, most of the substantive obligations are established by reference to ‘Good Industry Practice’, which requires some further unpacking.

SCL AI Clauses and ‘Good Industry Practice’

Most of the crucial proposed clauses refer to the benchmark of ‘Good Industry Practice’ as a primary qualifier for the relevant obligations. The proposed clause on explainability is a good example. The SCL AI clause (C1.15) reads as follows:

C1.15 The Supplier will ensure that the AI System is designed, developed and tested in a way which ensures that its operation is sufficiently transparent to enable the Customer to understand and use the AI System appropriately. In particular, the Supplier will produce to the Customer, on request, information which allows the Customer to understand:

C1.15.1 the logic behind an individual output from the AI System; and

C1.15.2 in respect of the AI System or any specific part thereof, which features contributed most to the output of the AI System, in each case, in accordance with Good Industry Practice.

A first observation is that the SCL AI clauses seem to presume that off-the-shelf AI solutions would not be (necessarily) explainable, as they include no clause under the ‘pro-supplier’ version.

Second, the ‘pro-customer’ version both limits the types of explanation that would be contractually owed (to a model-level or global explanation under C1.15.2 and a limited decision-level or local explanation under C1.15.1 — which leaves out eg a counterfactual explanation, as well as not setting any specific requirements on how the explanation needs to be produced, eg is a ‘post hoc’ explanation acceptable and if so how should it be produced?) and qualifies it in two important ways: (1) the overall requirement is that the AI system’s operation should be ‘sufficiently transparent’, with ‘sufficient’ creating a lot of potential issues here; and, (2) the reference to ‘Good Industry Practice’ [more on this below].

The issue of transparency is similarly problematic in its more general treatment under another specific clause (C4.6), which also only has a ‘pro-customer’ version:

C4.6 The Supplier warrants that, so far as is possible [to achieve the intended use of the AI System / comply with the Specification], the AI System is transparent and interpretable [such that its output can be traced back to the input data].

The qualifier ‘so far as is possible’ is again potentially quite problematic here, as are the open-ended references to transparency and interpretability of the system (with a potential conflict between interpretability for the purposes of this clause and explainability under C1.15).

What I find interesting about this clause is that the drafting notes explain that:

… the purpose of this provision is to ensure that the Supplier has not used an overly-complex algorithm if this is unnecessary for the intended use of the AI System or to comply with its Specification. That said, effectiveness and accuracy are often trade-offs for transparency in AI models.

From this perspective, I think the clause should be retitled and entirely redrafted to make explicit that the purpose is to establish a principle of ‘AI minimisation’, in the sense of the supplier guaranteeing that the AI system is the least complex that can provide the desired functionality. That, of course, leaves the tricky issues of the trade-off and of establishing the desired functionality itself to work around (which in a procurement context would have been dealt with pre-contract, eg in the context of technical specifications and/or tender evaluation). Interestingly, this issue is another one where reference could be made to ‘Good Industry Practice’ if one accepted that it should be best practice to always use the most explainable/interpretable and most simple model available for a given task.

As mentioned, reference to ‘Good Industry Practice’ is used extensively in the SCL AI clauses, including on crucial issues such as: explainability (above), user manual/user training, preventing unlawful discrimination, security (which is inclusive of cyber security and some aspects of data protection/privacy), or quality standards. The drafting notes are clear that

… while parties often refer to ‘best practice’ or ‘good industry practice’, these standards can be difficult to apply in developing industry. Accordingly a clear Specification is required, …

Which is the reason why the SCL AI clauses foresee that ‘Good Industry Practice’ will be a defined contract term, whereby the parties will specify the relevant requirements and obligations. And here lies the catch.

Defining ‘Good Industry Practice’?

In the SCL AI clauses, all references to ‘Good Industry Practice’ are used as qualifiers in the pro-customer version of the clauses. It is possible that the same term would be of relevance to establishing whether the supplier had discharged its reasonable duties/best efforts under the pro-supplier version (where the term would be defined but not explicitly used). In both cases, the need to define ‘Good Industry Practice’ is the Achilles heel of the model clauses, as well as a potential Trojan horse for customers seeking a seemingly pro-customer contractual design.

The fact is that the extent of the substantive obligations arising from the contract will entirely depend on how the concept of ‘Good Industry Practice’ is defined and specified. This leaves even seemingly strongly ‘pro-customer’ contracts exposed to weak substantive protections. The biggest challenge for buyers/procurers of AI will be that (1) it will be hard to know how to define the term and what standards to refer to, and (2) it will be difficult to monitor compliance with the standards, especially where those establish eg mechanisms of self-assessment by the tech supplier as the primary or sole quality control mechanisms.

So, my answer to Gisele’s question/challenge would be that the SCL AI clauses, much like the EU’s, do not (and cannot?) go far enough in ensuring that the contract for the procurement/purchase of AI embeds adequate substantive requirements. The model clauses are helpful in understanding who needs to do what when, and thus who shoulders the relevant cost and risk. But they do not address the all-important question of how it needs to be done. And that is the crucial issue that will determine whether the contract (and the AI solution) really is in the public buyer’s interest and, ultimately, in the public interest.

In a context where tech providers (almost always) have the upper hand in negotiations, this foundational weakness is all important, as suppliers could well ‘agree to pro-customer drafting’ and then immediately deactivate it through the more challenging and technical definition (and implementation) of ‘Good Industry Practices’.

That is why I think we need to cover this regulatory tunnelling risk and this foundational shortcoming of ‘AI regulation by contract’ by creating clear and binding requirements on the how (ie the ‘good (industry) practice’ or technical standards). The emergence of model AI contract clauses makes it clear to me that the most efficient contract design is one that refers to external benchmarks. Establishing adequate protections and an adequate balance of risks and benefits (from a social perspective) hinges on this. The contract can then deal with an apportionment of the burdens, obligations, costs and risks stemming from the already set requirements.

So I would suggest that the focus needs to be squarely on developing the regulatory architecture that will lead us to the development of such mandatory requirements and standards for the procurement and use of AI by the public sector — which may then become adequate good industry practice for strictly commercial or private contracts. My proposal in that regard is sketched out here.

Some further thoughts on setting procurement up to fail in 'AI regulation by contract'

The next bit of my research project concerns the leveraging of procurement to achieve ‘AI regulation by contract’ (ie to ensure in the use of AI by the public sector: trustworthiness, safety, explainability, human rights compliance, legality especially in data protection terms, ethical use, etc), so I have been thinking about it for the last few weeks to build on my previous views (see here).

In this post, I summarise my further thoughts — which have been prompted by the rich submissions to the House of Commons Science and Technology Committee [ongoing] inquiry on the ‘Governance of Artificial Intelligence’.

Let’s do it via procurement

As a starting point, it is worth stressing that the (perhaps unsurprising) increasingly generalised position is that procurement has a key role to play in regulating the adoption of digital technologies (and AI in particular) by the public sector—which consolidates procurement’s gatekeeping role in this regulatory space (see here).

More precisely, the generalised view is not that procurement ought to play such a role, but that it can do so (effectively and meaningfully). ‘AI regulation by contract’ via procurement is seen as an (easily?) actionable policy and governance mechanism despite the more generalised reluctance and difficulties in regulating AI through general legislative and policy measures, and in creating adequate governance architectures (more below).

This is very clear in several submissions to the ongoing Parliamentary inquiry (above). Without seeking to be exhaustive (I have read most, but not all submissions yet), the following points have been made in written submissions (liberally grouped by topics):

Procurement as (soft) AI regulation by contract & ‘Market leadership’

  • ‘Procurement processes can act as a form of soft regulation … Government should use its purchasing power in the market to set procurement requirements that ensure private companies developing AI for the public sector address public standards.’ (Committee on Standards in Public Life, at [25]-[26], emphasis added).

  • ‘For public sector AI projects, two specific strategies could be adopted [to regulate AI use]. The first … is the use of strategic procurement. This approach utilises government funding to drive change in how AI is built and implemented, which can lead to positive spill-over effects in the industry’ (Oxford Internet Institute, at 5, emphasis added).

  • ‘Responsible AI Licences (“RAILs”) utilise the well-established mechanisms of software and technology licensing to promote self-governance within the AI sector. RAILs allow developers, researchers, and companies to publish AI innovations while specifying restrictions on the use of source code, data, and models. These restrictions can refer to high-level restrictions (e.g., prohibiting uses that would discriminate against any individual) as well as application-specific restrictions (e.g., prohibiting the use of a facial recognition system without consent) … The adoption of such licenses for AI systems funded by public procurement and publicly-funded AI research will help support a pro-innovation culture that acknowledges the unique governance challenges posed by emerging AI technologies’ (Trustworthy Autonomous Systems Hub, at 4, emphasis added).

Procurement and AI explainability

  • ‘public bodies will need to consider explainability in the early stages of AI design and development, and during the procurement process, where requirements for transparency could be stipulated in tenders and contracts’ (Committee on Standards in Public Life, at [17], emphasis added).

  • ‘In the absence of strong regulations, the public sector may use strategic procurement to promote equitable and transparent AI … mandating various criteria in procurement announcements and specifying design criteria, including explainability and interpretability requirements. In addition, clear documentation on the function of a proposed AI system, the data used and an explanation of how it works can help. Beyond this, an approved vendor list for AI procurement in the public sector is useful, to which vendors that agree to meet the defined transparency and explainability requirements may be added’ (Oxford Internet Institute, at 2, referring to K McBride et al (2021) ‘Towards a Systematic Understanding on the Challenges of Procuring Artificial Intelligence in the Public Sector’, emphasis added).

Procurement and AI ethics

  • ‘For example, procurement processes should be designed so products and services that facilitate high standards are preferred and companies that prioritise ethical practices are rewarded. As part of the commissioning process, the government should set out the ethical principles expected of companies providing AI services to the public sector. Adherence to ethical standards should be given an appropriate weighting as part of the evaluation process, and companies that show a commitment to them should be scored more highly than those that do not’ (Committee on Standards in Public Life, at [26], emphasis added).

Procurement and algorithmic transparency

  • ‘… unlike public bodies, the private sector is not bound by the same safeguards – such as the Public Sector Equality Duty within the Equality Act 2010 (EA) – and is able to shield itself from criticisms regarding transparency behind the veil of ‘commercial sensitivity’. In addition to considering the private company’s purpose, AI governance itself must cover the private as well as public sphere, and be regulated to the same, if not a higher standard. This could include strict procurement rules – for example that private companies need to release certain information to the end user/public, and independent auditing of AI systems’ (Liberty, at [20]).

  • ‘… it is important that public sector agencies are duly empowered to inspect the technologies they’re procuring and are not prevented from doing so by the intellectual property rights. Public sector buyers should use their purchasing power to demand access to suppliers’ systems to test and prove their claims about, for example, accuracy and bias’ (BILETA, at 6).

Procurement and technical standards

  • ‘Standards hold an important role in any potential regulatory regime for AI. Standards have the potential to improve transparency and explainability of AI systems to detail data provenance and improve procurement requirements’ (Ada Lovelace Institute, at 10).

  • ‘The speed at which the technology can develop poses a challenge as it is often faster than the development of both regulation and standards. Few mature standards for autonomous systems exist and adoption of emerging standards need to be encouraged through mechanisms such as regulation and procurement, for example by including the requirement to meet certain standards in procurement specification’ (Royal Academy of Engineering, at 8).

Can procurement do it, though?

Implicit in most views about the possibility of using procurement to regulate public sector AI adoption (and to generate broader spillover effects through market-based propagation mechanisms) is an assumption that the public buyer does (or can get to) know and can (fully, or sufficiently) specify the required standards of explainability, transparency, ethical governance, and a myriad other technical requirements (on auditability, documentation, etc) for the use of AI to be in the public interest and fully legally compliant. Or, relatedly, that such standards can (and will) be developed and readily available for the public buyer to effectively refer to and incorporate them into its public contracts.

This is a BIG implicit assumption, at least in relation to non-trivial/open-ended proceduralised requirements and in relation to most of the complex issues raised by (advanced) forms of AI deployment. A sobering and persuasive analysis has shown that, at least for some forms of AI (based on neural networks), ‘it appears unlikely that anyone will be able to develop standards to guide development and testing that give us sufficient confidence in the applications’ respect for health and fundamental rights. We can throw risk management systems, monitoring guidelines, and documentation requirements around all we like, but it will not change that simple fact. It may even risk giving us a false sense of confidence’ [H Pouget, ‘The EU’s AI Act Is Barreling Toward AI Standards That Do Not Exist’ (Lawfare.com, 12 Jan 2023)].

Even for less complex AI deployments, the development of standards will be contested and protracted. This not only creates a transient regulatory gap that forces public buyers to ‘figure it out’ by themselves in the meantime, but can well result in a permanent regulatory gap that leaves procurement as the only safeguard (on paper) in the process of AI adoption in the public sector. If more general and specialised processes of standard setting are unlikely to plug that gap quickly or ever, how can public buyers be expected to do otherwise?

Seriously, can procurement do it?

Further, as I wrote in my own submission to the Parliamentary inquiry, ‘to effectively regulate by contract, it is at least necessary to have (i) clarity on the content of the obligations to be imposed, (ii) effective enforcement mechanisms, and (iii) public sector capacity to establish, monitor, and enforce those obligations. Given that the aim of regulation by contract would be to ensure that the public sector only adopts trustworthy AI solutions and deploys them in a way that promotes the public interest in compliance with existing standards of protection of fundamental and individual rights, exercising the expected gatekeeping role in this context requires a level of legal, ethical, and digital capability well beyond the requirements of earlier instances of regulation by contract to eg enforce labour standards’ (at [4]).

Even optimistically ignoring the issues above and adopting the presumption that standards will emerge or that the public buyer will be able to (eventually) figure it out (so we park requirement (i) for now), and also assuming that the public sector will be able to develop the required level of eg digital capability (so we also park (iii), but see here), other obstacles to leveraging procurement for ‘AI regulation by contract’ remain. In particular, these assumptions do not address the issue of whether there can be effective enforcement mechanisms within the contractual relationship resulting from a procurement process to impose compliance with the required standards (of explainability, transparency, ethical use, non-discrimination, etc).

I approach this issue as the challenge of enforcing not entirely measurable contractual obligations (ie obligations to comply with a contractual standard rather than a contractual rule), and the closest parallel that comes to my mind is the issue of enforcing quality requirements in public contracts, especially in the provision of outsourced or contracted-out public services. This is an issue on which there is a rich literature (on ‘regulation by contract’ or ‘government by contract’).

Quality-related enforcement problems relate to the difficulty of using contract law remedies to address quality shortcomings (other than perhaps price reductions or contractual penalties where those are permissible) that can do little to address the quality issues in themselves. Major quality shortcomings could lead to eg contractual termination, but replacing contractors can be costly and difficult (especially in a technological setting affected by several sources of potential vendor and technology lock-in). Other mechanisms, such as leveraging past performance evaluations to eg bar access to future procurements, can also do too little too late to control quality within a specific contract.

An illuminating analysis of the ‘problem of quality’ concluded that the ‘structural problem here is that reliable assurance of quality in performance depends ultimately not on contract terms but on trust and non-legal relations. Relations of trust and powerful non-legal sanctions depend upon the establishment of long-term … relations … The need for a governance structure and detailed monitoring in order to achieve co-operation and quality seems to lead towards the creation of conflictual relations between government and external contractors’ [see H Collins, Regulating Contracts (OUP 1999) 314-15].

To me, this raises important questions about the extent to which procurement and public contracts more generally can effectively deliver the expected safeguards and operate as an adequate system of ‘AI regulation by contract’. It seems to me that price clawbacks or financial penalties, even debarment decisions, are unlikely to provide an acceptable safety net in some (or most) cases, eg high-risk uses of complex AI. Not least because procurement disputes can take a long time to settle and because the incentives will not always be there to ensure strict enforcement anyway.

More thoughts to come

It seems increasingly clear to me that the expectations around the leveraging of procurement to ‘regulate AI by contract’ need reassessing in view of its likely effectiveness. Such effectiveness is constrained by the rules on the design of tenders for the award of public contracts, as well as those public contracts, and mechanisms to resolve disputes emerging from either tenders or contracts. The effectiveness of this approach is, of course, also constrained by public sector (digital) capability and by the broader difficulties in ascertaining the appropriate approach to (standards-based) AI regulation, which cannot so easily be set aside. I will keep thinking about all this in the process of writing my monograph. If this is of interest, keep an eye on this blog for further thoughts and analysis.

AI regulation by contract: submission to UK Parliament

In October 2022, the Science and Technology Committee of the House of Commons of the UK Parliament (STC Committee) launched an inquiry on the ‘Governance of Artificial Intelligence’. This inquiry follows the publication in July 2022 of the policy paper ‘Establishing a pro-innovation approach to regulating AI’, which outlined the UK Government’s plans for light-touch AI regulation. The inquiry seeks to examine the effectiveness of current AI governance in the UK, and the Government’s proposals that are expected to follow the policy paper and provide more detail. The STC Committee has published 98 pieces of written evidence, including submissions from UK regulators and academics that will make for interesting reading. Below is my submission, focusing on the UK’s approach to ‘AI regulation by contract’.

A. Introduction

01. This submission addresses two of the questions formulated by the House of Commons Science and Technology Committee in its inquiry on the ‘Governance of artificial intelligence (AI)’. In particular:

  • How should the use of AI be regulated, and which body or bodies should provide regulatory oversight?

  • To what extent is the legal framework for the use of AI, especially in making decisions, fit for purpose?

    • Is more legislation or better guidance required?

02. This submission focuses on the process of AI adoption in the public sector and, particularly, on the acquisition of AI solutions. It evidences how the UK is consolidating an inadequate approach to ‘AI regulation by contract’ through public procurement. Given the level of abstraction and generality of the current guidelines for AI procurement, major gaps in public sector digital capabilities, and potential structural conflicts of interest, procurement is currently an inadequate tool to govern the process of AI adoption in the public sector. Flanking initiatives, such as the pilot algorithmic transparency standard, are unable to address and mitigate governance risks. Contrary to the approach in the AI Regulation Policy Paper,[1] plugging the regulatory gap will require (i) new legislation supported by a new mechanism of external oversight and enforcement (an ‘AI in the Public Sector Authority’ (AIPSA)); (ii) a well-funded strategy to boost in-house public sector digital capabilities; and (iii) the introduction of a (temporary) mechanism of authorisation of AI deployment in the public sector. The Procurement Bill would not suffice to address the governance shortcomings identified in this submission.

B. ‘AI Regulation by Contract’ through Procurement

03. Unless the public sector develops AI solutions in-house, which is extremely rare, the adoption of AI technologies in the public sector requires a procurement procedure leading to their acquisition. This places procurement at the frontline of AI governance because the ‘rules governing the acquisition of algorithmic systems by governments and public agencies are an important point of intervention in ensuring their accountable use’.[2] In that vein, the Committee on Standards in Public Life stressed that the ‘Government should use its purchasing power in the market to set procurement requirements that ensure that private companies developing AI solutions for the public sector appropriately address public standards. This should be achieved by ensuring provisions for ethical standards are considered early in the procurement process and explicitly written into tenders and contractual arrangements’.[3] Procurement is thus erected as a public interest gatekeeper in the process of adoption of AI by the public sector.

04. However, to effectively regulate by contract, it is at least necessary to have (i) clarity on the content of the obligations to be imposed, (ii) effective enforcement mechanisms, and (iii) public sector capacity to establish, monitor, and enforce those obligations. Given that the aim of regulation by contract would be to ensure that the public sector only adopts trustworthy AI solutions and deploys them in a way that promotes the public interest in compliance with existing standards of protection of fundamental and individual rights, exercising the expected gatekeeping role in this context requires a level of legal, ethical, and digital capability well beyond the requirements of earlier instances of regulation by contract to eg enforce labour standards.

05. On a superficial reading, it could seem that the National AI Strategy tackled this by highlighting the importance of the public sector’s role as a buyer and stressing that the Government had already taken steps ‘to inform and empower buyers in the public sector, helping them to evaluate suppliers, then confidently and responsibly procure AI technologies for the benefit of citizens’.[4] The National AI Strategy referred, in particular, to the setting up of the Crown Commercial Service’s AI procurement framework (the ‘CCS AI Framework’),[5] and the adoption of the Guidelines for AI procurement (the ‘Guidelines’)[6] as enabling tools. However, a close look at these instruments will show their inadequacy to provide clarity on the content of procedural and contractual obligations aimed at ensuring the goals stated above (para 03), as well as their potential to widen the existing public sector digital capability gap. Ultimately, they do not enable procurement to carry out the expected gatekeeping role.

C. Guidelines and Framework for AI procurement

06. Despite setting out to ‘provide a set of guiding principles on how to buy AI technology, as well as insights on tackling challenges that may arise during procurement’, the Guidelines provide high-level recommendations that cannot be directly operationalised by inexperienced public buyers and/or those with limited digital capabilities. For example, the recommendation to ‘Try to address flaws and potential bias within your data before you go to market and/or have a plan for dealing with data issues if you cannot rectify them yourself’ (guideline 3) not only requires a thorough understanding of eg the Data Ethics Framework[7] and the Guide to using Artificial Intelligence in the public sector,[8] but also detailed insights on data hazards.[9] This leads the Guidelines to stress that it may be necessary ‘to seek out specific expertise to support this; data architects and data scientists should lead this process … to understand the complexities, completeness and limitations of the data … available’.

07. Relatedly, some of the recommendations are very open ended in areas without clear standards. For example, the effectiveness of the recommendation to ‘Conduct initial AI impact assessments at the start of the procurement process, and ensure that your interim findings inform the procurement. Be sure to revisit the assessments at key decision points’ (guideline 4) is dependent on the robustness of such impact assessments. However, the Guidelines provide no further detail on how to carry out such assessments, other than a list of some generic areas for consideration (eg ‘potential unintended consequences’) and a passing reference to emerging guidelines in other jurisdictions. This is problematic, as the development of algorithmic impact assessments is still at an experimental stage,[10] and emerging evidence shows vastly diverging approaches, eg to risk identification.[11] In the absence of clear standards, algorithmic impact assessments will lead to inconsistent approaches and varying levels of robustness. The absence of standards will also require access to specialist expertise to design and carry out the assessments.

08. Ultimately, understanding and operationalising the Guidelines requires advanced digital competency, including in areas where best practices and industry standards are still developing.[12] However, most procurement organisations lack such expertise, as a reflection of broader digital skills shortages across the public sector,[13] with recent reports placing vacancies for data and tech roles throughout the civil service alone close to 4,000.[14] This not only reduces the practical value of the Guidelines to facilitate responsible AI procurement by inexperienced buyers with limited capabilities, but also highlights the role of the CCS AI Framework for AI adoption in the public sector.

09. The CCS AI Framework creates a procurement vehicle[15] to facilitate public buyers’ access to digital capabilities. CCS’ description for public buyers stresses that ‘If you are new to AI you will be able to procure services through a discovery phase, to get an understanding of AI and how it can benefit your organisation.’[16] The Framework thus seeks to enable contracting authorities, especially those lacking in-house expertise, to carry out AI procurement with the support of external providers. While this can foster the uptake of AI in the public sector in the short term, it is highly unlikely to result in adequate governance of AI procurement, as this approach focuses at most on the initial stages of AI adoption but can hardly be sustainable throughout the lifecycle of AI use in the public sector and, crucially, would leave the enforcement of contractualised AI governance obligations in a particularly weak position (thus failing to meet the enforcement requirement at para 04). Moreover, it would generate a series of governance shortcomings whose avoidance requires an alternative approach.

D. Governance Shortcomings

10. Despite claims to the contrary in the National AI Strategy (above para 05), the approach currently followed by the Government does not empower public buyers to responsibly procure AI. The Guidelines are not susceptible of operationalisation by inexperienced public buyers with limited digital capabilities (above paras 06-08). At the same time, the Guidelines are too generic to support sophisticated approaches by more advanced digital buyers. The Guidelines do not reduce the uncertainty and complexity of procuring AI and do not include any guidance on eg how to design public contracts to perform the regulatory functions expected under the ‘AI regulation by contract’ approach.[17] This is despite existing recommendations on eg the development of ‘model contracts and framework agreements for public sector procurement to incorporate a set of minimum standards around ethical use of AI, with particular focus on expected levels of transparency and explainability, and ongoing testing for fairness’.[18] The Guidelines thus fail to address the first requirement for effective regulation by contract in relation to clarifying the relevant obligations (para 04).

11. The CCS Framework would also fail to ensure the development of public sector capacity to establish, monitor, and enforce AI governance obligations (para 04). Perhaps counterintuitively, the CCS AI Framework can generate a further disempowerment of public buyers seeking to rely on external capabilities to support AI adoption. There is evidence that reliance on outside providers and consultants to cover immediate needs further erodes public sector capability in the long term,[19] as well as creating risks of technical and intellectual debt in the deployment of AI solutions as consultants come and go and there is no capture of institutional knowledge and memory.[20] This can also exacerbate current trends of pilot AI graveyard spirals, where most projects do not reach full deployment, at least in part due to insufficient digital capabilities beyond the (outsourced) pilot phase. This tends to result in self-reinforcing institutional weaknesses that can limit the public sector’s ability to drive digitalisation, not least because technical debt quickly becomes a significant barrier.[21] It also runs counter to best practices towards building public sector digital maturity,[22] and to the growing consensus that public sector digitalisation first and foremost requires a prioritised investment in building up in-house capabilities.[23] On this point, it is important to note the large size of the CCS AI Framework, which was initially pre-advertised with a £90 mn value,[24] but this was then revised to £200 mn over 42 months.[25] Procuring AI consultancy services under the Framework can thus facilitate the funnelling of significant amounts of public funds to the private sector, rather than using those funds to build in-house capabilities. It can result in multiple public buyers entering contracts for the same expertise, which thus duplicates costs, as well as in a cumulative lack of institutional learning by the public sector because of atomised and uncoordinated contractual relationships.

12. Beyond the issue of institutional dependency on external capabilities, the cumulative effect of the Guidelines and the Framework would be to outsource the role of ‘AI regulation by contract’ to unaccountable private providers that can then introduce their own biases on the substantive and procedural obligations to be embedded in the relevant contracts, which would ultimately negate the effectiveness of the regulatory approach as a public interest safeguard. The lack of accountability of external providers would not only result from the weakness (or absolute inability) of the public buyer to control their activities and challenge important decisions (eg on data governance, or algorithmic impact assessments, as above (paras 06-07)) but also from the potential absence of effective and timely external checks. Market mechanisms are unlikely to deliver adequate checks due to market concentration and structural conflicts of interest affecting both providers that sometimes provide consultancy services and other times are involved in the development and deployment of AI solutions,[26] as well as a result of insufficiently effective safeguards on conflicts of interest resulting from quickly revolving doors. Equally, broader governance controls are unlikely to be facilitated by flanking initiatives, such as the pilot algorithmic transparency standard.

13. To try to foster accountability in the adoption of AI by the public sector, the UK is currently piloting an algorithmic transparency standard.[27] While the initial six examples of algorithmic disclosures published by the Government provide some details on emerging AI use cases and the data and types of algorithms used by publishing organisations, and while this information could in principle foster accountability, there are two primary shortcomings. First, completing the documentation requires resources and, in some respects, advanced digital capabilities. Organisations participating in the pilot are being supported by the Government, which makes it difficult to assess to what extent public buyers would generally be able to adequately prepare the documentation on their own. Moreover, the documentation also refers to some underlying requirements, such as algorithmic impact assessments, that are not yet standardised (para 07). In that, the pilot standard replicates the same shortcomings discussed above in relation to the Guidelines. Algorithmic disclosure will thus only be done by entities with high capabilities, or it will be outsourced to consultants (thus reducing the scope for the revelation of governance-relevant information).

14. Second, compliance with the standard is not mandatory—at least while the pilot is developed. If compliance with the algorithmic transparency standard remains voluntary, there are clear governance risks. It is easy to see how precisely the most problematic uses may not be the object of adequate disclosures under a voluntary self-reporting mechanism. More generally, even if the standard was made mandatory, it would be necessary to implement an external quality control mechanism to mitigate problems with the quality of self-reported disclosures that are pervasive in other areas of information-based governance.[28] Whether the Central Digital and Data Office (currently in charge of the pilot) would have capacity (and powers) to do so remains unclear, and it would in any case lack independence.

15. Finally, it should be stressed that the current approach to transparency disclosure following the adoption of AI (ex post) can be problematic where the implementation of the AI is difficult to undo and/or the effects of malicious or risky AI are high stakes or impossible to revert. It is also problematic in that the current approach places the burden of scrutiny and accountability outside the public sector, rather than establishing internal, preventative (ex ante) controls on the deployment of AI technologies that could potentially be very harmful for fundamental and individual socio-economic rights, as evidenced by the inclusion of some fields of application of AI in the public sector as ‘high risk’ in the EU’s proposed AI Act.[29] Given the particular risks that AI deployment in the public sector poses to fundamental and individual rights, the minimalistic and reactive approach outlined in the AI Regulation Policy Paper is inadequate.

E. Conclusion: An Alternative Approach

16. Ensuring that the adoption of AI in the public sector operates in the public interest and for the benefit of all citizens will require new legislation supported by a new mechanism of external oversight and enforcement. New legislation is required to impose specific minimum requirements of eg data governance and algorithmic impact assessment and related transparency across the public sector. Such legislation would then need to be developed in statutory guidance of a much more detailed and actionable nature than the current Guidelines. These developed requirements can then be embedded into public contracts by reference. Without such clarification of the relevant substantive obligations, the approach to ‘AI regulation by contract’ can hardly be effective other than in exceptional cases.

17. Legislation would also be necessary to create an independent authority—eg an ‘AI in the Public Sector Authority’ (AIPSA)—with powers to enforce those minimum requirements across the public sector. AIPSA is necessary, as oversight of the use of AI in the public sector does not currently fall within the scope of any specific sectoral regulator and the general regulators (such as the Information Commissioner’s Office) lack procurement-specific knowledge. Moreover, units within Cabinet Office (such as the Office for AI or the Central Digital and Data Office) lack the required independence.

18. It would also be necessary to develop a clear and sustainably funded strategy to build in-house capability in the public sector, including clear policies on the minimisation of expenditure directed at the engagement of external consultants and the development of guidance on how to ensure the capture and retention of the knowledge developed within outsourced projects (including, but not only, through detailed technical documentation).

19. Until sufficient in-house capability is built to ensure adequate understanding and ability to manage digital procurement governance requirements independently, the current reactive approach should be abandoned, and AIPSA should have to approve all projects to develop, procure and deploy AI in the public sector to ensure that they meet the required legislative safeguards in terms of data governance, impact assessment, etc. This approach could progressively be relaxed through eg block exemption mechanisms, once there is sufficiently detailed understanding and guidance on specific AI use cases and/or in relation to public sector entities that could demonstrate sufficient in-house capability, eg through a mechanism of independent certification.

20. The new legislation and statutory guidance would need to be self-standing, as the Procurement Bill would not provide the required governance improvements. First, the Procurement Bill pays limited to no attention to artificial intelligence and the digitalisation of procurement.[30] An amendment (46) that would have created minimum requirements on automated decision-making and data ethics was not moved at the Lords Committee stage, and it seems unlikely to be taken up again at later stages of the legislative process. Second, even if the Procurement Bill created minimum substantive requirements, it would lack adequate enforcement mechanisms, not least due to the limited powers and lack of independence of the foreseen Procurement Review Unit (to also sit within Cabinet Office).

_______________________________________
Note: all websites last accessed on 25 October 2022.

[1] Department for Digital, Culture, Media and Sport, Establishing a pro-innovation approach to regulating AI. An overview of the UK’s emerging approach (CP 728, 2022).

[2] Ada Lovelace Institute, AI Now Institute and Open Government Partnership, Algorithmic Accountability for the Public Sector (August 2021) 33.

[3] Committee on Standards in Public Life, Artificial Intelligence and Public Standards (2020) 51.

[4] Department for Digital, Culture, Media and Sport, National AI Strategy (CP 525, 2021) 47.

[5] AI Dynamic Purchasing System < https://www.crowncommercial.gov.uk/agreements/RM6200 >.

[6] Office for Artificial Intelligence, Guidelines for AI Procurement (2020) < https://www.gov.uk/government/publications/guidelines-for-ai-procurement/guidelines-for-ai-procurement >.

[7] Central Digital and Data Office, Data Ethics Framework (Guidance) (2020) < https://www.gov.uk/government/publications/data-ethics-framework >.

[8] Central Digital and Data Office, A guide to using artificial intelligence in the public sector (2019) < https://www.gov.uk/government/collections/a-guide-to-using-artificial-intelligence-in-the-public-sector >.

[9] See eg < https://datahazards.com/index.html >.

[10] Ada Lovelace Institute, Algorithmic impact assessment: a case study in healthcare (2022) < https://www.adalovelaceinstitute.org/report/algorithmic-impact-assessment-case-study-healthcare/ >.

[11] A Sanchez-Graells, ‘Algorithmic Transparency: Some Thoughts On UK's First Four Published Disclosures and the Standards’ Usability’ (2022) < https://www.howtocrackanut.com/blog/2022/7/11/algorithmic-transparency-some-thoughts-on-uk-first-disclosures-and-usability >.

[12] A Sanchez-Graells, ‘“Experimental” WEF/UK Guidelines for AI Procurement: Some Comments’ (2019) < https://www.howtocrackanut.com/blog/2019/9/25/wef-guidelines-for-ai-procurement-and-uk-pilot-some-comments >.

[13] See eg Public Accounts Committee, Challenges in implementing digital change (HC 2021-22, 637).

[14] S Klovig Skelton, ‘Public sector aims to close digital skills gap with private sector’ (Computer Weekly, 4 Oct 2022) < https://www.computerweekly.com/news/252525692/Public-sector-aims-to-close-digital-skills-gap-with-private-sector >.

[15] It is a dynamic purchasing system, or a list of pre-screened potential vendors public buyers can use to carry out their own simplified mini-competitions for the award of AI-related contracts.

[16] Above (n 5).

[17] This contrasts with eg the EU project to develop standard contractual clauses for the procurement of AI by public organisations. See < https://living-in.eu/groups/solutions/ai-procurement >.

[18] Centre for Data Ethics and Innovation, Review into bias in algorithmic decision-making (2020) < https://www.gov.uk/government/publications/cdei-publishes-review-into-bias-in-algorithmic-decision-making/main-report-cdei-review-into-bias-in-algorithmic-decision-making >.

[19] V Weghmann and K Sankey, Hollowed out: The growing impact of consultancies in public administrations (2022) < https://www.epsu.org/sites/default/files/article/files/EPSU%20Report%20Outsourcing%20state_EN.pdf >.

[20] A Sanchez-Graells, ‘Identifying Emerging Risks in Digital Procurement Governance’ in idem, Digital Technologies and Public Procurement. Gatekeeping and experimentation in digital public governance (OUP, forthcoming) < https://ssrn.com/abstract=4254931 >.

[21] M E Nielsen and C Østergaard Madsen, ‘Stakeholder influence on technical debt management in the public sector: An embedded case study’ (2022) 39 Government Information Quarterly 101706.

[22] See eg Kevin C Desouza, ‘Artificial Intelligence in the Public Sector: A Maturity Model’ (2021) IBM Centre for the Business of Government < https://www.businessofgovernment.org/report/artificial-intelligence-public-sector-maturity-model >.

[23] A Clarke and S Boots, A Guide to Reforming Information Technology Procurement in the Government of Canada (2022) < https://govcanadacontracts.ca/it-procurement-guide/ >.

[24] < https://ted.europa.eu/udl?uri=TED:NOTICE:600328-2019:HTML:EN:HTML&tabId=1&tabLang=en >.

[25] < https://ted.europa.eu/udl?uri=TED:NOTICE:373610-2020:HTML:EN:HTML&tabId=1&tabLang=en >.

[26] See S Boots, ‘“Charbonneau Loops” and government IT contracting’ (2022) < https://sboots.ca/2022/10/12/charbonneau-loops-and-government-it-contracting/ >.

[27] Central Digital and Data Office, Algorithmic Transparency Standard (2022) < https://www.gov.uk/government/collections/algorithmic-transparency-standard >.

[28] Eg in the context of financial markets, there have been notorious ongoing problems with ensuring adequate quality in corporate and investor disclosures.

[29] < https://artificialintelligenceact.eu/ >.

[30] P Telles, ‘The lack of automation ideas in the UK Gov Green Paper on procurement reform’ (2021) < http://www.telles.eu/blog/2021/1/13/the-lack-of-automation-ideas-in-the-uk-gov-green-paper-on-procurement-reform >.

Algorithmic transparency: some thoughts on UK's first four published disclosures and the standards' usability

© Fabrice Jazbinsek / Flickr.

The Algorithmic Transparency Standard (ATS) is one of the UK’s flagship initiatives for the regulation of public sector use of artificial intelligence (AI). The ATS encourages (but does not mandate) public sector entities to fill in a template to provide information about the algorithmic tools they use, and why they use them [see e.g. Kingsman et al (2022) for an accessible overview].

The ATS is currently being piloted, and has so far resulted in the publication of four disclosures relating to the use of algorithms in different parts of the UK’s public sector. In this post, I offer some thoughts based on these initial four disclosures, in particular from the perspective of the usability of the ATS in facilitating an enhanced understanding of AI use cases, and accountability for those.

The first four disclosed AI use cases

The ATS pilot has so far published information in two batches (on 1 June and 6 July 2022), comprising the following four AI use cases:

  1. Within Cabinet Office, the GOV.UK Data Labs team piloted the ATS for their Related Links tool, a recommendation engine built to aid navigation of GOV.UK (the primary UK central government website) by providing relevant onward journeys from a content page, with the aim of helping users find useful information and content.

  2. In the Department for Health and Social Care and NHS Digital, the QCovid team piloted the ATS with a COVID-19 clinical tool used to predict how at risk individuals might be from COVID-19. The tool was developed for use by clinicians in support of conversations with patients about personal risk, and it uses algorithms to combine a number of factors such as age, sex, ethnicity, height and weight (to calculate BMI), and specific health conditions and treatments in order to estimate the combined risk of catching coronavirus and being hospitalised or catching coronavirus and dying. Importantly, “The original version of the QCovid algorithms were also used as part of the Population Risk Assessment to add patients to the Shielded Patient List in February 2021. These patients were advised to shield at that time[,] were provided support for doing so, and were prioritised for COVID-19 vaccination.”

  3. The Information Commissioner's Office has piloted the ATS with its Registration Inbox AI, which uses a machine learning algorithm to categorise emails sent to the Information Commissioner's Office’s registration inbox and to send out an auto-reply where the algorithm “detects … a request about changing a business address. In cases where it detects this kind of request, the algorithm sends out an autoreply that directs the customer to a new online service and points out further information required to process a change request. Only emails with an 80% certainty of a change of address request will be sent an email containing the link to the change of address form.”

  4. The Food Standards Agency piloted the ATS with its Food Hygiene Rating Scheme (FHRS) – AI, an algorithmic tool that helps local authorities prioritise inspections of food businesses based on their predicted food hygiene rating, that is, by predicting which establishments might be at a higher risk of non-compliance with food hygiene regulations. Importantly, use of the tool is voluntary and “it is not intended to replace the current approach to generate a FHRS score. The final score will always be the result of an inspection undertaken by [a local authority] officer.”

Harmless (?) use cases

At first glance, and on the basis of the implications of the outcome of the algorithmic recommendation, it would seem that the four use cases are relatively harmless:

  1. If GOV.UK recommends links to content that is not relevant or helpful, the user may simply ignore them.

  2. The outcome of the QCovid tool simply informs the GPs’ (or other clinicians’) assessment of the risk of their patients, and the GPs’ expertise should mediate any incorrect (either over-inclusive, or under-inclusive) assessments by the AI.

  3. If the ICO sends an automatic email with information on how to change their business address to somebody that had submitted a different query, the receiver can simply ignore that email.

  4. Incorrect or imperfect prioritisation of food businesses for inspection could result in the early inspection of a low-risk restaurant, or the late(r) inspection of a higher-risk restaurant, but this is already a risk implicit in allowing restaurants to open pending inspection; AI does not add risk.

However, this approach could be too simplistic or optimistic. It can be helpful to think about what could really happen if the AI got it wrong ‘in a disaster scenario’ based on possible user reactions (a useful approach promoted by the Data Hazards project). It seems to me that, on ‘worst case scenario’ thinking (and without seeking to be exhaustive):

  1. If GOV.UK recommends content that is not helpful but is confusing, the user can either engage in red tape they did not need to complete (wasting both their time and public resources) or, worse, feel overwhelmed, confused or misled and abandon the administrative interaction they were initially seeking to complete. This can lead to exclusion from public services, and be particularly problematic if these situations can have a differential impact on different user groups.

  2. There could be over-reliance on the QCovid algorithm by (too busy) GPs. This could lead to advising ‘as a matter of routine’ the taking of excessive precautions with significant potential impacts on the day to day lives of those affected—as was arguably the case for some of the citizens included in shielding categories in the earlier incarnation of the algorithm. Conversely, GPs that identified problems in the early use of the algorithm could simply ignore it, thus potentially losing the benefits of the algorithm in other cases where it could have been helpful—potentially leading to under-precaution by individuals that could have otherwise been better safeguarded.

  3. Similarly to 1, the provision of irrelevant and potentially confusing information can lead to waste of resource (e.g. users seeking to change their business registration address because they wrongly think it is a requirement to process their query or, at a lower end of the scale, users having to read and consider information about an administrative process they have no interest in). Beyond that, the classification algorithm could generate loss of queries if there was no human check to verify that the AI classification was correct. If this check takes place anyway, the advantages of automating the sending of the initial email seem rather marginal.

  4. Similar to 2, the incorrect prediction of risk can lead to misuse of resources in the carrying out of inspections by local authorities, potentially pushing some high-risk restaurants down the list of those pending inspection, so that they could see their inspection repeatedly delayed. This could have important public health implications, at least for those citizens using the to-be-inspected restaurants for longer than they would otherwise have. Conversely, inaccurate prioritisations that did not seem to catch more ‘risky’ restaurants could also lead to local authorities abandoning the tool. There is also a risk of profiling of certain types of businesses (and their owners), which could lead to victimisation if the tool was improperly used, or used in relation to restaurants that have been active for a longer period (eg to trigger fresh (re)inspections).

No AI application is thus entirely harmless. Of course, this is just a matter of theoretical speculation. One could equally speculate whether reduced engagement with the AI would generate a second-tier negative effect, eg if ‘learning’ algorithms could not be revised and improved on the basis of ‘real-life’ feedback on whether or not their predictions were accurate.

I think that this sort of speculation offers a useful yardstick to assess the extent to which the ATS can be helpful and usable. I would argue that the ATS will be helpful to the extent that (a) it provides information capable of clarifying whether the relevant risks have been taken into account and properly mitigated or, failing that, (b) it provides information that can be used to challenge the insufficiency of any underlying risk assessments or mitigation strategies. Ultimately, AI transparency is not an end in itself, but simply a means of increasing accountability, at least in the context of public sector AI adoption. And it is clear that any degree of transparency generated by the ATS will be an improvement on the current situation, but is the ATS really usable?

Finding out more on the basis of the ATS disclosures

To try to answer that general question on whether the ATS is usable and serves to facilitate increased accountability, I have read the four disclosures in full. Here is my summary/extracts of the relevant bits for each of them.

GOV.UK Related Links

Since May 2019, the tool has been using an algorithm called node2vec (a machine learning algorithm that learns network node embeddings) to train a model on the last three weeks of user movement data (web analytics data). The benefits are described as “the tool … predicts related links for a page. These related links are helpful to users. They help users find the content they are looking for. They also help a user find tangentially related content to the page they are on; it’s a bit like when you are looking for a book in the library, you might find books that are relevant to you on adjacent shelves.”

The way the tool works is described in some more detail: “The tool updates links every three weeks and thus tracks changes in user behaviour.” “Every three weeks, the machine learning algorithm is trained using the last three weeks of analytics data and trains a model that outputs related links that are published, overwriting the existing links with new ones.” “The average click through rate for related links is about 5% of visits to a content page. For context, GOV.UK supports an average of 6 million visits per day (Jan 2022). True volumes are likely higher owing to analytics consent tracking. We only track users who consent to analytics cookies …”.

The decision process is fully automated, but there is “a way for publishers to add/amend or remove a link from the component. On average this happens two or three times a month.” “Humans have the capability to recommend changes to related links on a page. There is a process for links to be amended manually and these changes can persist. These human expert generated links are preferred to those generated by the model and will persist.” Moreover, “GOV.UK has a feedback link, “report a problem with this page”, on every page which allows users to flag incorrect links or links they disagree with.” The tool was subjected to a Data Protection Impact Assessment (DPIA), but no other impact assessments (IAs) are listed.

When it comes to risk identification and mitigation, the disclosure indicates: “A recommendation engine can produce links that could be deemed wrong, useless or insensitive by users (e.g. links that point users towards pages that discuss air accidents).” and that, as mitigation: “We added pages to a deny list that might not be useful for a user (such as the homepage) or might be deemed insensitive (e.g. air accident reports). We also enabled publishers or anyone with access to the tagging system to add/amend or remove links. GOV.UK users can also report problems through the feedback mechanisms on GOV.UK.”
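The precedence rules described in the disclosure (a deny list, plus manually set links that persist and are preferred to the model's output) can be sketched roughly as follows. This is only an illustration in Python and not GOV.UK's implementation: the function name, the data shapes and the example paths are hypothetical, the cap of five links is an assumption based on the 'up to five' suggestions per page mentioned in this post, and treating manual links as fully replacing model links (rather than being merged with them) is also an assumption.

```python
from typing import Dict, List, Set

MAX_LINKS = 5  # assumption: a page shows at most five related links


def links_to_publish(
    page: str,
    model_links: Dict[str, List[str]],
    manual_links: Dict[str, List[str]],
    deny_list: Set[str],
) -> List[str]:
    """Pick the related links for one page (hypothetical helper)."""
    # Links set by a human publisher persist and take precedence.
    if page in manual_links:
        return manual_links[page][:MAX_LINKS]
    # Otherwise keep the model's suggestions, dropping deny-listed targets.
    suggested = model_links.get(page, [])
    return [link for link in suggested if link not in deny_list][:MAX_LINKS]


# Illustrative data only; these paths are made up.
model_links = {
    "/vat-rates": ["/", "/vat-registration", "/air-accident-report"],
    "/renew-passport": ["/passport-fees"],
}
manual_links = {"/renew-passport": ["/get-a-passport-urgently"]}
deny_list = {"/", "/air-accident-report"}

vat_links = links_to_publish("/vat-rates", model_links, manual_links, deny_list)
passport_links = links_to_publish("/renew-passport", model_links, manual_links, deny_list)
```

On this reading, the deny list only screens out pages someone has already thought to list, and everything else the model suggests is published unless a publisher steps in.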

Overall, then, the risk I had identified is only superficially addressed, in that the ATS disclosure does not show awareness of the potential differing implications of incorrect or useless recommendations across the spectrum. The narrative equating the recommendations to browsing the shelves of a library is quite suggestive in that regard, as is the fact that the quality controls are rather limited.

Indeed, it seems that the quality control mechanisms require a high level of effort by every publisher, as they need to check every three weeks whether the (new) related links appearing in each of the pages they publish are relevant and unproblematic. This seems to have reversed the functional balance of convenience. Before the implementation of the tool, only approximately 2,000 out of 600,000 pieces of content on GOV.UK had related links, as they had to be created manually (and thus, hopefully, were relevant, if not necessarily unproblematic). Now, almost all pages have up to five related content suggestions, but only two or three out of 600,000 pages see their links manually amended per month. A question arises whether this extremely low rate of manual intervention reflects the high quality of the system or, conversely, is evidence of the same lack of resource to quality-assure pages that previously prevented 98% of pages from having this type of related information.

However, despite the queries as to the desirability of the AI implementation as described, the ATS disclosure is in itself useful because it allows the type of analysis above and, in case someone considers the situation unsatisfactory or would like to probe it further, there is a clear gateway to (try to) engage the entity responsible for this AI deployment.

QCovid algorithm

The algorithm was developed at the onset of the Covid-19 pandemic to drive government decisions on which citizens to advise to shield, support during shielding, and prioritise for vaccination rollout. Since the end of the shielding period, the tool has been modified. “The clinical tool for clinicians is intended to support individual conversations with patients about risk. Originally, the goal was to help patients understand the reasons for being asked to shield and, where relevant, help them do so. Since the end of shielding requirements, it is hoped that better-informed conversations about risk will have supported patients to make appropriate decisions about personal risk, either protecting them from adverse health outcomes or to some extent alleviating concerns about re-engaging with society.”

In essence, the tool creates a risk calculation based on scoring risk factors across a number of data fields pertaining to demographic, clinical and social patient information. “The factors incorporated in the model include age, ethnicity, level of deprivation, obesity, whether someone lived in residential care or was homeless, and a range of existing medical conditions, such as cardiovascular disease, diabetes, respiratory disease and cancer. For the latest clinical tool, separate versions of the QCOVID models were estimated for vaccinated and unvaccinated patients.”

It is difficult to assess how intensely the tool is (currently) used, although the ATS indicates that “In the period between 1st January 2022 and 31st March 2022, there were 2,180 completed assessments” and that “Assessment numbers often move with relative infection rate (e.g. higher infection rate leads to more usage of the tool).” The ATS also stresses that “The use of the tool does not override any clinical decision making but is a supporting device in the decision making process.” “The tool promotes shared decision making with the patient and is an extra point of information to consider in the decision making process. The tool helps with risk/benefit analysis around decisions (e.g. recommendation to shield or take other precautionary measures).”

The impact assessment of this tool is driven by those mandated for medical devices. The description is thus rather technical and not very detailed, although the selected examples it includes do capture the possibility of somebody being misidentified “as meeting the threshold for higher risk”, as well as someone not having “an output generated from the COVID-19 Predictive Risk Model”. The ATS does stress that “As part of patient safety risk assessment, Hazardous scenarios are documented, yet haven’t occurred as suitable mitigation is introduced and implemented to alleviate the risk.” That mitigation largely seems to be that “The tool is designed for use by clinicians who are reminded to look through clinical guidance before using the tool.”

I think this case shows two things. First, that it is difficult to understand how different parts of the analysis fit together when a tool that has had two very different uses is the object of a single ATS disclosure. There seems to be a good argument for use case specific ATS disclosures, even if the underlying AI deployment is the same (or a closely related one), as the implications of different uses from a governance perspective also differ.

Second, that in the context of AI adoption for healthcare purposes, there is a dual barrier to accessing relevant (and understandable) information: the tech barrier and the medical barrier. While the ATS does something to reduce the former, the latter very much remains in place and perhaps turns the issue of trustworthiness of the AI into one of trustworthiness of the clinician, which is not necessarily entirely helpful (not only in this specific use case, but in many others one can imagine). In that regard, it seems that the usability of the ATS is partially limited, and more could be done to increase meaningful transparency through AI-specific IAs, perhaps as proposed by the Ada Lovelace Institute.

In this case, the ATS disclosure has also provided some valuable information, but arguably to a lesser extent than the previous case study.

ICO’s Registration Inbox AI

This is a tool that very much resembles other forms of email classification (e.g. spam filters), as “This algorithmic tool has been designed to inspect emails sent to the ICO’s registration inbox and send out autoreplies to requests made about changing addresses. The tool has not been designed to automatically change addresses on the requester’s behalf. The tool has not been designed to categorise other types of requests sent to the inbox.”

The disclosure indicates that “In a significant proportion of emails received, a simple redirection to an online service is all that is required. However, sifting these types of emails out would also require time if done by a human. The algorithm helps to sift out some of these types of emails that it can then automatically respond to. This enables greater capacity for [Data Protection] Fees Officers in the registration team, who can, consequently, spend more time on more complex requests.” “There is no manual intervention in the process - the links are provided to the customer in a fully automated manner.”

The tool has been in use since May 2021 and classifies approximately 23,000 emails a month.

When it comes to risk identification and mitigation, the ATS disclosure stresses that “The algorithmic tool does not make any decisions, but instead provides links in instances where it has calculated the customer has contacted the ICO about an address change, giving the customer the opportunity to self-serve.” Moreover, it indicates that there is “No need for review or appeal as no decision is being made. Incorrectly classified emails would receive the default response which is an acknowledgement.” It further stresses that “The classification scope is limited to a change of address and a generic response stating that we have received the customer’s request and that it will be processed within an estimated timeframe. Incorrectly classified emails would receive the default response which is an acknowledgement. This will not have an impact on personal data. Only emails with an 80% certainty of a change of address request will be sent an email containing the link to the change of address form.”
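The routing rule described in the disclosure can be pictured with a minimal Python sketch. It is not the ICO's code: the `Email` type, the score field and the reply labels are hypothetical, the classifier itself is left out, and whether the 80% threshold is inclusive is an assumption.

```python
from dataclasses import dataclass

THRESHOLD = 0.80  # the disclosure refers to "an 80% certainty"


@dataclass
class Email:
    subject: str
    # Hypothetical classifier score: estimated probability that the
    # email is a change of address request.
    p_change_of_address: float


def choose_reply(email: Email, threshold: float = THRESHOLD) -> str:
    """Return which automated reply the email would receive."""
    # Assumption: a score exactly at the threshold counts as a match.
    if email.p_change_of_address >= threshold:
        return "change_of_address_link"
    return "default_acknowledgement"


# Made-up examples, including a query that is not about an address
# change but that the (hypothetical) classifier scores highly.
genuine = Email("We have moved offices", 0.93)
uncertain = Email("Question about our registration", 0.40)
false_positive = Email("Refund of fee paid for old address", 0.85)

replies = [choose_reply(e) for e in (genuine, uncertain, false_positive)]
```

Nothing in a rule of this kind checks whether a high-scoring email was actually about a change of address, so a misclassified query above the threshold would receive the link rather than the acknowledgement.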

In my view, this disclosure does not entirely clarify the way the algorithm works (e.g. what happens to emails classified as having requested information on change of address? Are they ‘deleted’ from the backlog of emails requiring a (human) non-automated response?). However, it does provide sufficient information to further consolidate the questions arising from the general description. For example, it seems that the identification of risks is clearly partial in that there is not only a risk of someone asking for change of address information not automatically receiving it, but also a risk of those asking for other information receiving the wrong information. There is also no consideration of additional risks (as above), and the general description makes the claim of benefits doubtful if there has to be a manual check to verify adequate classification.

The ATS disclosure does not provide sufficient contact information for the owner of the AI (perhaps because they were contracted on limited after-service terms…), although there is generic contact information for the ICO that could be used by someone that considered the situation unsatisfactory or would like to probe it further.

Food Hygiene Rating Scheme – AI

This tool is also based on machine learning to make predictions. “A machine learning framework called LightGBM was used to develop the FHRS AI model. This model was trained on data from three sources: internal Food Standards Agency (FSA) FHRS data, publicly available Census data from the 2011 census and open data from HERE API. Using this data, the model is trained to predict the food hygiene rating of an establishment awaiting its first inspection, as well as predicting whether the establishment is compliant or not.” “Utilising the service, the Environmental Health Officers (EHOs) are provided with the AI predictions, which are supplemented with their knowledge about the businesses in the area, to prioritise inspections and update their inspection plan.”

Regarding the justification for the development, the disclosure stresses that “the number of businesses classified as ‘Awaiting Inspection’ on the Food Hygiene Rating Scheme website has increased steadily since the beginning of the pandemic. This has been the key driver behind the development of the FHRS AI use case.” “The objective is to help local authorities become more efficient in managing the hygiene inspection workload in the post-pandemic environment of constrained resources and rapidly evolving business models.”

Interestingly, the disclosure states that the tool “has not been released to actual end users as yet and hence the maintenance schedule is something that cannot be determined at this point in time (June 2022). The Alpha pilot started at the beginning of April 2022, wherein the end users (the participating Local Authorities) have access to the FHRS AI service for use in their day-to-day workings. This section will be updated depending on the outcomes of the Alpha Pilot ...” It remains to be seen whether there will be future updates on the disclosure, but an error in copy-pasting in the ATS disclosure makes it contain the same paragraph but dated February 2022. This stresses the need to date and reference (eg v.1, v.2) the successive versions of the same disclosure, which does not seem to be a field of the current template, as well as to create a repository of earlier versions of the same disclosure.
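By way of illustration only, the kind of version tracking suggested here could be as simple as the following Python sketch. The class and field names are hypothetical and are not part of the ATS template, and the days of the month in the example are placeholders, since the disclosure only gives months.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class DisclosureVersion:
    version: int      # v.1, v.2, ...
    published: date   # date this version was published
    text: str         # full text of the disclosure at that date


@dataclass
class DisclosureHistory:
    tool_name: str
    versions: List[DisclosureVersion] = field(default_factory=list)

    def publish(self, published: date, text: str) -> DisclosureVersion:
        """Add a new version; earlier versions are kept, not overwritten."""
        new = DisclosureVersion(len(self.versions) + 1, published, text)
        self.versions.append(new)
        return new

    def current(self) -> DisclosureVersion:
        """Latest version (raises IndexError if nothing was published)."""
        return self.versions[-1]


# Placeholder dates and text, for illustration.
history = DisclosureHistory("FHRS - AI")
history.publish(date(2022, 2, 1), "Maintenance schedule: to be determined.")
history.publish(date(2022, 6, 1), "Maintenance schedule: to be determined.")
```

Keeping every version, rather than overwriting the text, would let a reader see that two entries are identical apart from their date.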

The section on oversight stresses that “the system has been designed to provide decision support to Local Authorities. FSA has advised Local Authorities to never use this system in place of the current inspection regime or use it in isolation without further supporting information”. It also stresses that “Since there will be no change to the current inspection process by introducing the model, the existing appeal and review mechanisms will remain in place. Although the model is used for prioritisation purposes, it should not impact how the establishment is assessed during the inspection and therefore any challenges to a food hygiene rating would be made using the existing FHRS appeal mechanism.”

The disclosure also provides detailed information on IAs: “The different impact assessments conducted during the development of the use case were 1. Responsible AI Risk Assessment; 2. Stakeholder Impact Assessment; [and] 3. Privacy Impact Assessment.” Concerning the responsible AI risk assessment, in addition to a personal data issue that should belong in the DPIA, the disclosure reports three identified risks very much in line with the ones I had hinted at above: “2. Potential bias from the model (e.g. consistently scoring establishments of a certain type much lower, less accurate predictions); 3. Potential bias from inspectors seeing predicted food hygiene ratings and whether the system has classified the establishment as compliant or not. This may have an impact on how the organisation is perceived before receiving a full inspection; 4. With the use of AI/ML there is a chance of decision automation bias or automation distrust bias occurring. Essentially, this refers to a user being over or under reliant on the system leading to a degradation of human-reasoning.”

The disclosure presents related mitigation strategies as follows: “2. Integration of explainability and fairness related tooling during exploration and model development. These tools will also be integrated and monitored post-alpha testing to detect and mitigate potential biases from the system once fully operational; 3. Continuously reflect, act and justify sessions with business and technical subject matter experts throughout the delivery of the project, along with the use of the three impact assessments outlined earlier to identify, assess and manage project risks; 4. Development of usage guidance for local authorities specifically outlining how the service is expected to be used. This document also clearly states how the service should not be used, for example, the model outcome must not be the only indicator used when prioritising businesses for inspection.”

In this instance, the ATS disclosure is in itself useful because it allows the type of analysis above and, in case someone considers the situation unsatisfactory or would like to probe it further, there is a clear gateway to (try to) engage the entity responsible for this AI deployment. It is also interesting to see that the disclosure specifies that the private provider was engaged “As well as [in] a development role [… to provide] Responsible AI consulting and delivery services, including the application of a parallel Responsible AI sprint to assess risk and impact, enable model explainability and assess fairness, using a variety of artefacts, processes and tools”. This is clearly reflected in the ATS disclosure and could be an example of good practice where organisations lack that in-house capability and/or outsource the development of the AI. Whether that role should fall with the developer, or should rather be separate to avoid organisational conflicts of interest is a discussion for another day.

Final thoughts

There seems to be a mixed picture on the usability of the ATS disclosures, with some of them not entirely providing (full) usability, or a clear pathway to engage with the specific entity in charge of the development of the algorithmic tool, specifically if it was an outsourced provider. In those cases, the public authority that has implemented the AI (even if not the owner of the project) will have to deal with any issues arising from the disclosure. There is also a mixed practice concerning linking to resources other than previously available (open) data (eg open source code, data sources), with only one project (GOV.UK) including them in the disclosures discussed above.

It will be interesting to see how this assessment scales up (to use a term) once disclosures increase in volume. There is clearly a research opportunity arising as soon as more ATS disclosures are published. As a hypothesis, I would submit that disclosure quality is likely to reduce with volume, as well as with the withdrawal of whichever support the pilot phase has meant for those participating institutions. Let’s see how that empirical issue can be assessed.

The other reflection I have to offer based on these first four disclosures is that there are points of information in the disclosures that can be useful, at least from an academic (and journalistic?) perspective, to assess the extent to which the public sector has the capabilities it needs to harness digital technologies (more on that soon in this blog).

The four reviewed disclosures show that there was one in-house development (GOV.UK), while the others were either procured (QCovid, whose disclosure includes a redacted copy of the contract), or contracted out, perhaps even directly awarded (ICO email classifier, FSA FHRS – AI). And there are some between-the-lines indications that some of the implementations may have been relatively randomly developed, unless there was strong pre-existing reliable statistical data (eg on information requests concerning change of business address). This in itself triggers questions on the procurement or commissioning strategy developed by institutions seeking to harness AI potential.

From this perspective, the ATS disclosures can be a useful source of information on the extent to which the adoption of AI by the public sector depends as strongly on third party capabilities as the literature generally hypothesises and/or is starting to demonstrate empirically.