What can agentic AI do now — and what will it soon be able to do?

By Paul Shotton, Advocacy Strategy
When people discuss agentic AI, the conversation often becomes too abstract. The more useful question is practical: which steps in a real workflow are current models actually good at, which steps are still risky or weak, and which are likely to become more feasible soon?
That matters because adoption is rarely an all-or-nothing choice. Teams are not deciding whether to automate an entire function. They are deciding whether a particular model or tool is good enough for a specific step in a workflow. OpenAI's current guide to building agents frames the issue in exactly those terms: agents are suited to workflows where systems need to manage execution, make decisions, use tools, and operate within clearly defined guardrails. Anthropic makes a similar point from an engineering perspective, arguing that the most successful implementations tend to use simple, composable patterns rather than overly complex frameworks or unchecked autonomy.
This means the starting point for any use-case discussion should be workflow mapping. Before asking whether to use a large language model, a custom GPT, or a more agentic tool, it is worth breaking the work into discrete stages and asking what type of retrieval, comparison, analysis, drafting, or judgement each stage requires. OpenAI is explicit that different tasks in a workflow may require different models, and that simpler steps — retrieval or intent classification — should not automatically be treated the same way as harder judgement tasks.
A monitoring report is a useful example
A monitoring report breaks naturally into stages. First, source acquisition: obtaining the transcript, the agenda, the legislative documents, and any earlier update. Then extraction and analysis: identifying what was said, what changed, and which parts of the material matter most. Then, possibly, a comparison stage: checking the latest material against a previous report or earlier document version to identify additions, omissions, or shifts in emphasis. After that comes drafting. The final stage is more strategic: deciding what the development means, what the next steps in the policy process are likely to be, which stakeholders matter most, and what the organisation should do next.
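The staged structure above can be made concrete with a minimal sketch. This is not a real product pipeline: the function names, data shapes, and sample text are invented for illustration, and in practice the extraction and comparison steps would be model-assisted rather than string operations.

```python
# Minimal sketch of the monitoring-report workflow as discrete, checkable
# stages. All names and sample data are illustrative assumptions.

def acquire_sources():
    # In practice: fetch the transcript, agenda, legislative documents,
    # and any earlier update.
    return {"transcript": "Minister confirmed the revised timetable.",
            "previous_report": "Timetable under discussion; no decision yet."}

def extract_key_points(sources):
    # In practice: model-assisted extraction of what was said and what changed.
    return [s.strip() for s in sources["transcript"].split(".") if s.strip()]

def compare_with_previous(points, previous_report):
    # Flag points absent from the earlier report (a crude stand-in for
    # model-based comparison of additions, omissions, and shifts).
    return [p for p in points if p not in previous_report]

def draft_update(new_points):
    # First-pass drafting only; strategic interpretation stays with a human.
    return "New developments:\n" + "\n".join(f"- {p}" for p in new_points)

sources = acquire_sources()
points = extract_key_points(sources)
new_points = compare_with_previous(points, sources["previous_report"])
print(draft_update(new_points))
```

Each stage has a clear input and output, which is what makes the earlier steps checkable in a way the final strategic layer is not.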
Current agentic AI is already reasonably well suited to several of the earlier and middle stages — retrieving materials, extracting information from documents, comparing versions, summarising proceedings, and drafting a first-pass note from source material. These are bounded, checkable tasks with relatively clear inputs and outputs, which is exactly the type of workflow that current agent guidance identifies as promising. Anthropic's guidance stresses that the strongest applications are those with clear success criteria, feedback loops, and meaningful human oversight.
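Version comparison is a good example of a bounded, checkable task. A deterministic diff, as sketched below with Python's standard-library `difflib`, already surfaces additions and omissions between two document versions; an agent's contribution is typically to interpret and summarise those differences rather than to find them. The article texts here are invented for illustration.

```python
import difflib

# Two versions of an (invented) legislative text. A line-level diff
# surfaces additions and omissions between them.
old = ["Article 1: Scope",
       "Article 2: Definitions",
       "Article 3: Review in 2030"]
new = ["Article 1: Scope",
       "Article 2: Definitions (amended)",
       "Article 3: Review in 2028"]

diff = list(difflib.unified_diff(old, new,
                                 fromfile="previous", tofile="latest",
                                 lineterm=""))
for line in diff:
    print(line)  # lines prefixed "-" were removed, "+" were added
```

Because the output is mechanical and reproducible, a reviewer can verify it directly, which is precisely the property that makes this stage a safe candidate for automation.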
Where current systems are weaker
Where current systems are less dependable is the final layer of strategic interpretation. Drafting a summary of what happened in a meeting is one thing. Recommending the right strategic response is another. The latter requires judgement about institutional dynamics, stakeholder incentives, organisational priorities, political timing, and risk.
This is where the distinction between task completion and trusted task completion becomes important. It is not enough that a model can produce an answer. You need to trust the answer, understand the conditions under which it was produced, and know how it will be checked. The practical question is not simply whether the system can do the task — it is whether the surrounding workflow makes the result dependable enough to use.
If an agent is retrieving source material, comparing documents, or drafting a note, teams still need review points, validation steps, and clear escalation rules. OpenAI explicitly recommends starting with strong foundations — capable models paired with well-defined tools, clear instructions, and human intervention where needed — rather than assuming a system should run end to end without supervision.
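Escalation rules of the kind described above can be expressed as explicit routing logic rather than left implicit. The sketch below is a toy illustration, not anyone's production guardrail: the fields and threshold are assumptions, and a real system would draw on richer signals than a single confidence score.

```python
# Toy validation gate: route low-confidence or high-impact outputs to a
# human reviewer instead of publishing automatically. The fields and the
# 0.8 threshold are illustrative assumptions.

def route_output(draft: dict, confidence_threshold: float = 0.8) -> str:
    if draft["confidence"] < confidence_threshold:
        # Uncertain output: never auto-publish.
        return "escalate: human review"
    if draft["contains_recommendation"]:
        # Strategic recommendations always get human sign-off.
        return "escalate: human review"
    return "publish"

print(route_output({"confidence": 0.92, "contains_recommendation": False}))
print(route_output({"confidence": 0.95, "contains_recommendation": True}))
```

Making the rules explicit in this way also makes them auditable: a team can see exactly which conditions trigger review and tighten or loosen them as confidence in the system grows.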
What will soon improve
The next step in capability is likely to be less about a dramatic leap in "intelligence" and more about systems carrying out more than one workflow stage coherently and with less manual prompting. In other words, the near-term shift is from using a model step by step to using more agentic systems that can move across retrieval, extraction, comparison, and first-draft generation with a degree of autonomy. Anthropic recommends increasing complexity only when necessary and treating additional agentic design as a deliberate trade-off rather than a default.
That is particularly relevant for monitoring and policy intelligence work. It is increasingly plausible that systems will soon be able to acquire source material, compare it with earlier records, identify key developments, draft a structured update, and flag likely next procedural steps with limited human prompting. The improvement to expect is not unlimited autonomy — it is more reliable chaining across a bounded workflow.
A second likely improvement is better procedural awareness. In policy work, one of the most valuable capabilities is not just summarising what happened, but understanding what comes next: where a file sits in the process, what meetings are coming up, which calendar signals matter, and when a political or institutional window may open. Current models can already help with parts of this when given structured context, but they remain uneven in reliability. The likely near-term gain is that they will become better at linking documents, timelines, and calendars into a more coherent picture of the next step in the process — especially when connected to external tools and explicit workflow logic.
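The "what comes next" question can be partly reduced to structured data even today. The sketch below, with invented calendar entries, shows the deterministic core of procedural awareness: given known dates for a file, surface the next step. What models are expected to improve at is building and maintaining this picture from unstructured documents and calendars, not the lookup itself.

```python
from datetime import date

# Invented calendar entries for a hypothetical legislative file.
calendar = [
    {"date": date(2025, 3, 12), "step": "Committee vote"},
    {"date": date(2025, 1, 20), "step": "Rapporteur draft report"},
    {"date": date(2025, 5, 6),  "step": "Plenary first reading"},
]

def next_step(entries, today):
    # Return the earliest procedural step on or after today, if any.
    upcoming = [e for e in entries if e["date"] >= today]
    return min(upcoming, key=lambda e: e["date"])["step"] if upcoming else None

print(next_step(calendar, date(2025, 2, 1)))  # -> Committee vote
```

The hard part is upstream of this function: extracting the entries reliably from agendas, dossiers, and announcements, which is where connecting models to external tools and explicit workflow logic pays off.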
Adoption should be phased, not over-ambitious
For organisations looking at adoption, the practical recommendation is to begin with the strongest current functionalities, not the most ambitious possible vision. That means identifying use cases where the task is relatively structured, the output can be checked, and the risk of error is manageable. From there, it makes sense to expand step by step — increasing complexity, the number of linked steps, and the sophistication of the tool only once the earlier stage is working well.
This is not just a theoretical recommendation. McKinsey's 2025 global survey found that 88% of respondents said their organisations were using AI in at least one business function, but most were still in experimentation or pilot phases rather than scaled deployment. 23% said they were scaling an agentic AI system somewhere in the enterprise, while another 39% were experimenting with agents. Use is broadening faster than deep operational embedding.
The same survey found that redesigning workflows is a major differentiator. High performers are far more likely than others to have fundamentally redesigned workflows around AI, and workflow redesign is one of the strongest contributors to meaningful business impact. The value does not come from dropping a model onto an old process and hoping for the best. It comes from understanding which steps are suitable, how outputs will be validated, and how the workflow itself should change.
Trust, maturity, and controls still lag
This is also why the trust and governance layer needs to sit inside the adoption conversation, not outside it. McKinsey's 2026 AI Trust Maturity Survey found that average responsible-AI maturity rose to 2.3 from 2.0 the previous year, but only around one-third of organisations had maturity levels of three or higher in strategy, governance, and agentic AI governance. Strategy, governance, and agentic controls are lagging behind technical and risk-management capabilities — which fits what many teams are now experiencing in practice: the tools are moving fast, but internal confidence, controls, and governance are developing more slowly.
That point strengthens the case for a phased approach. It is not only easier operationally; it is also more realistic organisationally. It allows teams to learn where the tool is strong, where checking is needed, what guardrails work, and how much autonomy is actually acceptable before they move into more complex or higher-risk parts of the workflow.
The role of monitoring providers
It is also worth recognising that many software and monitoring providers are already doing a large part of the groundwork. In legislative and public-affairs monitoring, the value of many platforms lies not just in the model, but in the fact that they have already spent years scraping, structuring, classifying, and connecting the underlying data. If the source material is already organised, searchable, and linked to relevant issue categories, AI can be layered on top much more effectively.
Quorum presents its legislative tracking and public-affairs intelligence products as AI-augmented workspaces that help teams monitor, analyse, compare, and act on legislative and regulatory developments, including AI-powered bill summaries and issue triage. FiscalNote describes PolicyNote as an AI-powered platform that accelerates and optimises policy tracking, summarising, briefing, and action — drawing on its existing breadth of policy data and analysis. Those examples matter because they show that the current opportunity is not just better models in isolation, but better models applied to structured data and operational workflows that already exist.
These are still early days
At the same time, these remain early days. Most organisations have not yet begun scaling AI across the enterprise, and even where agentic systems are being tested, scaling tends to be limited to one or two functions. The most credible near-term path is bounded autonomy, structured checking, and gradual expansion — not sweeping claims about full automation.
The practical conclusion
The real question is not whether agentic AI can "do monitoring". The better question is which parts of the monitoring workflow it can already do well, which parts it may soon do better, and where human judgement still adds the most value.
Today, the strongest fit is in bounded steps: retrieval, extraction, comparison, structured summarisation, and first-pass drafting. Soon, we should expect systems to perform more of those steps together, with less manual orchestration and better procedural awareness. But the strategic end of the workflow — deciding what matters most, which stakeholders to prioritise, and what the organisation should do next — is likely to remain more dependent on human judgement, trust, and oversight for longer.