Month: April 2026

Good AI Practice in Drug Development (cont.)

In my previous post, I argued that the medical technology sector offers valuable lessons for building reliable AI systems—particularly through Good Machine Learning Practice (GMLP). With the “Guiding Principles of Good AI Practice in Drug Development”, jointly published in January 2026 by the European Medicines Agency (EMA) and the US Food and Drug Administration (FDA), this perspective now extends beyond medical devices.

These principles span the entire drug‑development lifecycle: preclinical research, clinical trials, and manufacturing. This expansion is not only regulatory in nature—it also has important implications for how organisations learn, govern knowledge, and stabilise expertise around AI.

Human‑centric by design: making responsibility explicit
The emphasis on human‑centric and ethical design reframes AI as a socio‑technical system rather than a purely technical artefact. Human oversight is no longer assumed; it is a design requirement.
From a KM perspective, this matters because it forces organisations to make responsibility, judgement, and decision authority explicit—and therefore learnable. Tacit assumptions about “what the system does” or “who decides in the end” no longer suffice. AI design becomes a vehicle for clarifying roles, expectations, and accountability structures that are central to organisational knowledge.

AI systems as part of the GxP (‘good practice’ regulations in pharma) knowledge landscape
The explicit requirement for GxP compliance positions AI systems firmly within the organisation’s regulated knowledge infrastructure.
For AI-enabled computerised systems, such as analytical decision-support systems, this implies:

  • structured data governance as a shared organisational memory (not just a technical safeguard)
  • quality management across the entire AI lifecycle, turning development, deployment, monitoring, and change into learning loops rather than isolated events

In KM terms, AI is treated as institutionalised knowledge—codified, governed, audited, and continuously maintained.

Proportionate validation as organisational sense‑making
The call for risk‑based, proportionate validation once again supports a move away from schematic compliance towards contextual judgement.
This aligns closely with organisational learning: validation becomes an ongoing process of sense‑making about risk, impact, and uncertainty, rather than a checklist exercise. Different AI systems require different depths of scrutiny—not because standards are weakened, but because learning is situated.

Performance beyond metrics: learning in use
By extending performance evaluation beyond isolated model metrics to include human–AI interaction, the principles acknowledge an old KM insight: systems only reveal their quality in practice.
Interpretability and explainability are not technical luxuries; they are prerequisites for:

  • shared understanding
  • justified trust
  • reflective use

An AI system that cannot be meaningfully explained cannot become part of an organisation’s collective knowledge—no matter how accurate it is.

Plain language and monitoring: sustaining knowledge over time
Two further aspects strengthen the learning dimension.
First, the requirement for plain‑language communication treats understanding as a quality attribute. Knowledge about AI functionality, limits, and changes must be accessible—not only to experts, but to users and, where relevant, patients.
Second, the focus on continuous monitoring and data drift reinforces the idea that AI systems are never “finished”. They evolve with their context. Managing them therefore means learning over time, detecting change, and deliberately updating both models and organisational understanding.
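
To make the monitoring point concrete, here is a minimal sketch of what a recurring drift check might look like in code – a two-sample Kolmogorov–Smirnov test per input feature, comparing production data against a reference snapshot taken at validation time. The feature names, the synthetic data and the significance threshold are illustrative assumptions on my part, not part of the EMA–FDA principles.

```python
# Minimal drift check: compare the current distribution of each input feature
# against the reference distribution captured at validation time.
# Feature names, data and the alpha threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(reference, current, feature_names, alpha=0.01):
    """Return the names of features whose distribution has shifted significantly."""
    flagged = []
    for i, name in enumerate(feature_names):
        result = ks_2samp(reference[:, i], current[:, i])  # two-sample KS test
        if result.pvalue < alpha:
            flagged.append(name)
    return flagged

# Synthetic example: the second feature has drifted upwards in production.
rng = np.random.default_rng(seed=1)
reference = rng.normal(0.0, 1.0, size=(1000, 2))
current = np.column_stack([
    rng.normal(0.0, 1.0, size=1000),   # stable feature
    rng.normal(0.8, 1.0, size=1000),   # shifted feature
])
print(drifted_features(reference, current, ["lab_value_a", "lab_value_b"]))
```

In a GxP environment, such a check would of course live inside a validated monitoring procedure with defined alert limits and documented review, not in an ad-hoc script.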

My conclusion:
Seen through a KM lens, the EMA–FDA principles do more than regulate AI. They provide a framework for embedding AI into organisational learning structures – through transparency, lifecycle thinking, and explicit responsibility. Reliable AI, in this sense, is not primarily a technical achievement. It is the outcome of organisations that are able to learn about their systems, their data, and their own practices—continuously and collectively.

What We Can Learn from the Medical Technology Industry About Reliable AI

It is quite rare for my two main areas of consultancy – knowledge management and computerised system validation in the pharmaceutical sector – to overlap, but when it comes to AI…

Whilst many companies are still experimenting with AI prototypes, the medical technology sector has already established a strict framework for the safe use of these technologies: Good Machine Learning Practice (GMLP).
These ten principles, developed by leading regulatory bodies such as the FDA (US Food and Drug Administration) and the IMDRF (International Medical Device Regulators Forum), provide a blueprint for robust and trustworthy AI systems, not only in the regulated field of medical technology.

Here are the key GMLP principles:

  1. Interdisciplinary teams rather than siloed solutions
    An AI model is only as good as the understanding of its intended purpose. GMLP requires experts from various disciplines to collaborate throughout the entire lifecycle. Engineers, data specialists and the actual end-users must work together to define the clinical or business benefits the AI is intended to deliver, as well as the risks involved.
  2. Sound software engineering and security
    AI is software – and should be treated as such. This means consistently applying best practices in software engineering, cybersecurity and risk management. A methodical design process ensures that decisions are traceable and that the integrity and authenticity of the data are maintained.
  3. Data quality and representativeness
    A common mistake in AI projects is bias in the data. GMLP stipulates that datasets must reflect the actual target population. If, for example, only data from a specific user group is used, the model will fail in the real world. Careful data governance is key here (a small representativeness check is sketched after this list).
  4. Strict separation of training and test data
    To objectively evaluate a model’s performance, the training and test datasets must be strictly independent of one another. This prevents data leakage – a phenomenon whereby information from the evaluation data finds its way into training, so that the model appears to perform well without having learned genuine patterns. Only through independent testing can it be demonstrated that the AI also works with new, unknown data (a patient-level split is sketched after this list).
  5. Suitability of reference standards
    The best available (clinical) methods are used for generating reference data.
  6. Tailored model design
    The model design must be suited to the available data and the intended use in order to minimise risks such as overfitting.
  7. Focus on human-AI interaction
    In practice, AI rarely operates entirely independently. GMLP therefore focuses on the interaction between humans and machines. It is essential to ensure that users can interpret the AI’s outputs correctly and that no dangerous over-reliance on the system develops. Transparency and clear information about the system’s limitations are indispensable for this.
  8. (Clinically) Relevant test conditions
    Testing is carried out under conditions that simulate real-world clinical use.
  9. Clear user information
    Users are provided with transparent information about the system’s capabilities, limitations and updates/re-training.
  10. Control does not stop after roll-out
    An ML model is not a static product. GMLP requires continuous monitoring of performance in real-world use (‘post-market monitoring’). This allows performance declines caused by changing data patterns (dataset drift) to be detected at an early stage and risks associated with retraining the model to be managed.
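
To illustrate principle 3, here is a minimal sketch of a representativeness check: the subgroup shares in a training dataset are compared against the intended target population, and clearly under- or over-represented groups are flagged. The column name, the population shares and the ten-percentage-point threshold are purely illustrative assumptions.

```python
# Representativeness check: compare subgroup shares in the training data
# with the intended target population. All figures are illustrative.
import pandas as pd

# Assumed shares of age groups in the intended target population
target_population = {"18-40": 0.30, "41-65": 0.45, "66+": 0.25}

# Hypothetical training dataset in which the 66+ group is under-represented
training_data = pd.DataFrame(
    {"age_group": ["18-40"] * 500 + ["41-65"] * 450 + ["66+"] * 50}
)

observed = training_data["age_group"].value_counts(normalize=True)

for group, expected in target_population.items():
    actual = float(observed.get(group, 0.0))
    if abs(actual - expected) > 0.10:  # flag deviations above 10 percentage points
        print(f"{group}: training share {actual:.0%} vs. target share {expected:.0%}")
```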
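
And for principle 4, a minimal sketch of a patient-level split: splitting on patient identifiers rather than on individual records ensures that data from the same patient never ends up in both the training and the test set – one common source of leakage. The patient IDs, values and split ratio are again hypothetical.

```python
# Patient-level train/test split: all records of a patient land in exactly
# one of the two sets, so the evaluation cannot "see" familiar patients.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical dataset with several records per patient
records = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "lab_value":  [5.1, 4.9, 7.2, 7.0, 6.1, 6.3, 5.5, 5.4],
    "outcome":    [0, 0, 1, 1, 0, 0, 1, 1],
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(records, groups=records["patient_id"]))

train, test = records.iloc[train_idx], records.iloc[test_idx]
assert set(train["patient_id"]).isdisjoint(set(test["patient_id"]))
```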

At first glance, applying these principles may seem like a lot of work. However, it is simply necessary if we want to build trust in AI, isn’t it? And for me, the main message is: “It is the controlled data quality, stupid.”