This is the second post in a three-part series breaking down the Committee of Sponsoring Organizations of the Treadway Commission's (COSO) report, Achieving Effective Internal Control Over Generative AI. If you missed Part 1, which covers Shadow AI, GenAI-specific risks, the eight capability types, and the 17 COSO principles, you can find it here. The full report is available at coso.org.
Most organizations have crossed the threshold from experimenting with GenAI to depending on it. Outputs are being reviewed by staff, embedded in workflows, and in some cases informing decisions without anyone asking hard questions about how those outputs were produced or what happens when the underlying model changes. The governance gap from Part 1 is not just open. In many cases, it is being papered over with informal habits and unwritten assumptions.
The COSO report offers a corrective to that drift. The five takeaways in this post address the disciplines that separate organizations that are genuinely governing GenAI from those that are simply using it and hoping for the best.
Takeaway 1: Treat Prompts Like Code
One of the clearest and most practical points in the report, reflected in Principle 5 on accountability, is that prompts should be treated with the same rigor applied to any other controlled configuration. There is a tendency to think of prompting as informal, something closer to a conversation than a system design. That framing is a governance liability.
When you build a prompt, whether for a custom GPT, a Copilot agent, or any other GenAI tool, you are defining an input-to-output process. You are making decisions about what the system will do, how it will behave, and what constraints it will operate under. That is code in every meaningful sense. It should be documented, version-controlled, reviewed before deployment, and subject to change management the same way any other system configuration would be.
The report recommends treating prompts, system prompts, retrieval connectors, and transformation rules as governed configurations with version history, approval workflows, and rollback plans. For organizations that have not yet formalized this, the starting point is simply asking: if this prompt changed tomorrow, would anyone know?
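To make the idea concrete, here is a minimal sketch of what a governed prompt could look like in Python. It is an illustration under stated assumptions, not something taken from the COSO report: the class names `PromptVersion` and `GovernedPrompt`, the rule that an author cannot approve their own change, and the in-memory storage are all hypothetical choices. A real deployment would more likely rely on an existing version control system and change management tooling.

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class PromptVersion:
    """One immutable entry in a prompt's version history (hypothetical)."""
    number: int
    text: str
    author: str
    approved_by: Optional[str] = None

    @property
    def fingerprint(self) -> str:
        # Short content hash so reviewers can confirm which text is deployed.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:12]


@dataclass
class GovernedPrompt:
    """A prompt with version history, an approval step, and rollback."""
    name: str
    history: List[PromptVersion] = field(default_factory=list)
    active: Optional[int] = None  # number of the version currently in use

    def _get(self, number: int) -> PromptVersion:
        if not 1 <= number <= len(self.history):
            raise IndexError(f"no version {number} for prompt {self.name!r}")
        return self.history[number - 1]

    def propose(self, text: str, author: str) -> PromptVersion:
        # A proposal is recorded but does not become active until approved.
        draft = PromptVersion(len(self.history) + 1, text, author)
        self.history.append(draft)
        return draft

    def approve(self, number: int, approver: str) -> None:
        draft = self._get(number)
        # Assumed segregation-of-duties rule: no self-approval.
        if approver == draft.author:
            raise PermissionError("author cannot approve their own change")
        self.history[number - 1] = replace(draft, approved_by=approver)
        self.active = number

    def rollback(self, number: int) -> None:
        # Only previously approved versions are valid rollback targets.
        if self._get(number).approved_by is None:
            raise ValueError("can only roll back to an approved version")
        self.active = number

    def active_text(self) -> str:
        if self.active is None:
            raise LookupError("no approved version is active")
        return self._get(self.active).text
```

With something like this in place, the question "if this prompt changed tomorrow, would anyone know?" has an answer: the change would appear as a new entry in `history`, with a named author and a named approver.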
Takeaway 2: The Document's Illustrative Examples Address a Real Problem
One practical contribution of the COSO report that deserves recognition is its use of concrete examples throughout the text. The clause extraction assistant example, for instance, walks through how a legal team deployed GenAI to identify termination clauses in supplier contracts, what went wrong with lower-quality scanned documents, and how controls were redesigned to address it.
This matters because a genuine barrier to AI governance adoption is not skepticism but a lack of imagination. Many practitioners understand the principles in the abstract but struggle to see how they apply to the work their teams actually do. The report's examples close that gap. They are not hypothetical edge cases. They are the kinds of workflows that exist in finance, legal, compliance, and operations departments right now, and they illustrate both how GenAI can fail and what a well-designed control looks like in response.
If you are building an internal case for governance investment, the examples in the report are ready-made reference material.
Takeaway 3: An AI Policy Is Not Just a Risk Document, It Is a Decision Record
Principle 6 of the COSO framework calls for organizations to specify suitable objectives for each GenAI use case, and one of the most important applications of that principle is the AI policy itself. The report makes the case that having a documented AI policy is not simply about restricting what employees can do. It is about recording conscious, intentional decisions about how the organization has chosen to approach GenAI.
Consider a marketing department that has assessed its GenAI use as low risk and decided to allow broad access. That may be a perfectly reasonable conclusion. But if it is not documented, it is not a decision. It is an oversight waiting to become a problem. A policy that captures the reasoning, the scope, the acceptable use boundaries, and the data classification rules for a given context transforms an informal practice into a governed one. It also provides a baseline for future risk reassessment as the technology and the regulatory environment continue to evolve.
The organizations that will be best positioned as GenAI regulation matures are the ones that can demonstrate not just that they are using the technology, but that they made deliberate choices about how and why.
Takeaway 4: Model Drift Is a Real Risk
Principle 9 of the framework addresses the need to identify and analyze significant change, and in the GenAI context, few risks illustrate this more clearly than model drift. Model drift refers to the gradual or sudden degradation of a model's performance over time, which can result from changes in the underlying data, shifts in the operating environment, or updates pushed by the model vendor.
The practical implication is straightforward and easy to overlook. A prompt that produced reliable, accurate outputs on one version of a model may produce materially different outputs on a newer version, even if the prompt itself has not changed. This is not a theoretical concern. Organizations that have built workflows around specific model behavior need to treat vendor model updates the same way they would treat any other significant system change: test the outputs, document the comparison, and confirm that the behavior you are relying on has been preserved before the new version goes into production.
This requires building model version awareness into your governance process, something most organizations have not yet done. The COSO report's emphasis on continuous risk assessment rather than annual reviews reflects exactly this reality.
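One way to picture the "test, document, confirm" step is a small regression check that runs the same prompts against the current and the proposed model version and compares both to known-good answers. The sketch below is an assumption-laden illustration, not a method described in the report: `GoldenCase`, `compare_versions`, and `safe_to_promote` are hypothetical names, the model is represented as a plain function from prompt to text, and exact matching after normalization is a deliberately simple stand-in for whatever comparison fits the use case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]


@dataclass
class GoldenCase:
    """A prompt paired with an answer a human has already verified."""
    case_id: str
    prompt: str
    expected: str


def normalize(text: str) -> str:
    # Ignore case and whitespace differences when comparing outputs.
    return " ".join(text.lower().split())


def compare_versions(
    cases: List[GoldenCase], baseline: ModelFn, candidate: ModelFn
) -> List[Dict[str, str]]:
    """Run every case on both versions and record how the result changed."""
    report = []
    for case in cases:
        expected = normalize(case.expected)
        base_ok = normalize(baseline(case.prompt)) == expected
        cand_ok = normalize(candidate(case.prompt)) == expected
        if base_ok and not cand_ok:
            status = "regression"
        elif cand_ok and not base_ok:
            status = "improvement"
        elif cand_ok:
            status = "unchanged_pass"
        else:
            status = "both_fail"
        report.append({"case_id": case.case_id, "status": status})
    return report


def safe_to_promote(report: List[Dict[str, str]]) -> bool:
    # The new version is blocked if any previously correct case now fails.
    return all(row["status"] != "regression" for row in report)
```

The returned report doubles as the documented comparison: it can be stored alongside the model version identifiers as evidence that the behavior was checked before the update went into production.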
Takeaway 5: The Human in the Loop Is Not a Backup Plan. It Is the Control.
Principle 10 of the COSO framework addresses the selection and development of control activities, and the most important of these in the GenAI context is human review. The report is clear that GenAI outputs should be treated as assertions requiring evidence, not facts to accept by default, and that the level of human corroboration should be proportionate to the risk involved.
A useful frame for this is to think of GenAI as a junior employee. You would not take a first-year analyst's work product and send it directly to a client or use it to support a material business decision without reviewing it first. Not because the analyst lacks potential, but because the stakes of an undetected error are too high and the track record is not yet established. GenAI warrants exactly the same posture. The outputs can be valuable and the productivity gains are real, but the human reviewer is not a formality. They are the control.
The report identifies several approaches to operationalizing this, ranging from full re-performance of AI outputs in high-risk scenarios to risk-based sampling in lower-stakes contexts. The right level of review depends on the use case. The wrong answer is no review at all.
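The range from full re-performance to risk-based sampling can be sketched as a simple routing rule. The example below is illustrative only: the tier names and the sampling rates in `REVIEW_RATES` are assumptions chosen for the sketch, not figures from the report, and `needs_human_review` is a hypothetical helper. It selects outputs deterministically from a hash of the output ID so that the sampling decision can be reproduced later for audit purposes.

```python
import hashlib

# Assumed review rates per risk tier; real values would come from the
# organization's own risk assessment.
REVIEW_RATES = {
    "high": 1.0,     # every output is reviewed (full re-performance)
    "medium": 0.25,  # roughly one in four outputs is sampled
    "low": 0.05,     # roughly one in twenty outputs is sampled
}


def needs_human_review(output_id: str, risk_tier: str) -> bool:
    """Decide whether a given AI output must be routed to a human reviewer."""
    # Unknown tiers fall back to full review rather than no review.
    rate = REVIEW_RATES.get(risk_tier, 1.0)
    # Map the output ID to a stable number in [0, 1) so the same output
    # always gets the same decision.
    digest = hashlib.sha256(output_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Note the fallback for unrecognized tiers: in keeping with the point above, the sketch defaults to review when the risk level is unclear, never to no review at all.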
In the next post, we will conclude our review of COSO's GenAI framework with the final five takeaways.
Reference
Emett, S., Eulerich, M., Guthrie, J., Pikoos, J., & Wood, D. A. (2026). Achieving effective internal control over generative AI (GenAI). Committee of Sponsoring Organizations of the Treadway Commission. https://www.coso.org/generative-ai