How Con Edison Can Transform Utility Grid Management and Customer Services with Agentic AI
Agentic AI for utility grid management is quickly moving from an innovation buzzword to a practical operating model for utilities that need to improve reliability, accelerate outage response, and rebuild customer trust during high-stakes events. For Con Edison, the opportunity is especially clear: a dense urban service territory, complex infrastructure, increasingly volatile weather, and rising expectations for real-time updates create the perfect environment for AI agents that can coordinate work across systems, teams, and channels.
This isn’t about replacing grid operators, dispatchers, or customer care teams. It’s about reducing the friction that slows them down: searching across SCADA alarms, OMS tickets, AMI signals, GIS layers, asset history, playbooks, and customer records just to answer basic questions like “What’s happening?”, “What should we do next?”, and “Who needs to know?”
Agentic AI in utilities can help Con Edison move faster with better consistency by turning insights into actions, under strict human control and operational guardrails. Done well, it becomes a shared orchestration layer across grid operations automation and customer experience: one system that understands the situation, follows approved procedures, and executes repeatable steps at machine speed.
What “Agentic AI” Means for a Utility (and Why It’s Different)
Definition
Agentic AI for utility grid management refers to goal-driven AI systems that can plan tasks, make decisions within defined limits, and take actions across utility tools and workflows. Unlike a chatbot that only answers questions, an AI agent can read telemetry and tickets, follow operating procedures, call approved system APIs, generate drafts and recommendations, and route work for human approval when risk is high.
The key is that the agent is connected to real workflows and governed by strict controls. It doesn’t “freestyle” grid operations. It performs structured work: gather context, apply rules and policies, propose a next step, and either execute or escalate depending on risk.
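The structured loop described above (gather context, apply rules, propose a step, then execute or escalate) can be sketched in a few lines. This is a minimal illustration, not a production design: the risk scale, the 0.2 threshold, the event fields, and all function names are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real value would come from operating policy.
AUTO_EXECUTE_MAX_RISK = 0.2


@dataclass
class Proposal:
    action: str
    risk: float  # assumed scale: 0.0 (benign) to 1.0 (safety-critical)


def gather_context(event: dict) -> dict:
    """Collect the fields the rules need; a real agent would query OMS, GIS, etc."""
    return {"type": event["type"], "customers": event.get("customers", 0)}


def propose(context: dict) -> Proposal:
    """Apply simple illustrative rules to pick a next step and score its risk."""
    if context["type"] == "switching":
        return Proposal("draft switching plan", risk=0.9)
    return Proposal("draft customer status update", risk=0.1)


def decide(proposal: Proposal) -> str:
    """Execute only low-risk steps; escalate everything else to a human."""
    if proposal.risk <= AUTO_EXECUTE_MAX_RISK:
        return "execute"
    return "escalate"


def run_step(event: dict) -> tuple[str, str]:
    """One pass of the loop: context, proposal, then execute-or-escalate."""
    proposal = propose(gather_context(event))
    return proposal.action, decide(proposal)
```

Under these assumptions, a switching event is always routed to a person, while a status update can proceed automatically. The point of the sketch is that the execute-or-escalate decision lives in ordinary, reviewable code rather than inside the model.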
Here’s a practical way to distinguish the common approaches:
Traditional automation (rules/RPA): Executes fixed steps when conditions match, but can’t adapt well when inputs are messy or incomplete.
Predictive ML: Forecasts outcomes (failures, loads, churn), but usually stops short of doing the next operational step.
Chatbots: Provide conversation and knowledge retrieval, but typically don’t connect to tools to complete work.
Agentic AI in utilities: Combines understanding, planning, tool use, and controlled action to complete multi-step workflows end to end.
Why agentic AI is emerging now
Utilities have long used automation and analytics, but today’s constraints are different. A modern grid is more dynamic, and the work is more cross-functional. Agentic AI is emerging because the ingredients finally line up:
Better integration options: More mature APIs and event streams make it easier to connect OMS, ADMS/DMS, AMI/MDMS, CRM, and work management.
Stronger orchestration frameworks: Teams can build multi-step workflows with logging, approvals, retries, and versioned procedures.
Operational complexity is increasing: DER adoption, EV charging growth, and aging assets add new variables to every decision.
Customer expectations have shifted: People want proactive updates, self-service resolution, and consistent answers across channels, especially during outages.
For Con Edison, these forces converge in daily grid and service operations. Agentic AI for utility grid management becomes a way to standardize execution without forcing every team into yet another manual process.
Con Edison’s High-Impact Use Cases for Grid Management
Agentic AI for utility grid management becomes most valuable when it touches high-volume, high-variability workflows where decisions depend on multiple systems and time matters. The goal is not to create a single “super-agent,” but a set of focused AI agents for outage management, maintenance, distribution optimization, and storm response, each with clear inputs, outputs, and guardrails.
Outage detection, triage, and restoration orchestration
Outage response is a coordination problem. Signals arrive from everywhere: AMI last-gasp messages, SCADA alarms, customer calls, IVR selections, mobile app reports, OMS tickets, and weather feeds. Humans can manage this, but the first 15 to 30 minutes are where delays cascade.
An AI agent can continuously monitor these feeds and do the repetitive work at speed:
Cluster signals into probable feeder- or transformer-level events
Cross-check with known device topology from GIS and switching constraints from ADMS/DMS
Prioritize incidents based on impacted customers, critical facilities, and safety flags
Propose switching plans for human review, especially for FLISR-style restoration options
Update estimated time of restoration (ETR) confidence as milestones are completed
This is where AI agents for outage management can improve outcomes without taking unsafe actions. The agent prepares the decision package; operators approve the steps.
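As a rough sketch of the clustering and prioritization steps, the example below groups incoming signals by feeder and ranks the resulting incidents. The signal tuple layout, feeder IDs, and account IDs are hypothetical, and a real implementation would use GIS topology rather than a feeder label already attached to each signal.

```python
from collections import defaultdict

# Each signal: (source, feeder_id, account_id, critical_facility).
# All identifiers here are made up for illustration.
signals = [
    ("ami_last_gasp", "FDR-12", "A1", False),
    ("customer_call", "FDR-12", "A1", False),  # same account reporting twice
    ("ami_last_gasp", "FDR-12", "A2", False),
    ("scada_alarm", "FDR-07", "A9", True),
]


def cluster_by_feeder(signals):
    """Group raw signals into one probable incident per feeder."""
    incidents = defaultdict(
        lambda: {"sources": set(), "accounts": set(), "critical": False}
    )
    for source, feeder, account, critical in signals:
        incident = incidents[feeder]
        incident["sources"].add(source)
        incident["accounts"].add(account)  # a set avoids double-counting
        incident["critical"] = incident["critical"] or critical
    return dict(incidents)


def prioritize(incidents):
    """Order feeders: critical facilities first, then distinct accounts affected."""
    return sorted(
        incidents,
        key=lambda f: (incidents[f]["critical"], len(incidents[f]["accounts"])),
        reverse=True,
    )
```

With the sample data, the feeder serving a critical facility ranks first even though fewer accounts are affected. The ranked list is a decision package for operators, not an instruction to act.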
KPIs that can move with this approach include:
Faster fault location and triage time
Better ETR accuracy and consistency across channels
Reduced time to restoration when safe switching alternatives exist
Improved SAIDI/SAIFI performance where process delays are reduced
Predictive maintenance that turns insights into work orders
Most utilities already experiment with predictive maintenance, but many programs stall at the “nice dashboard” stage. The hard part is converting predictions into scheduled work that fits constraints: crew availability, parts, access windows, load conditions, and safety requirements.
This is a natural fit for agentic AI for utility grid management because agents can push the workflow forward:
Inputs the agent can use:
Asset health indices and failure history
Inspection notes and defect codes
Thermal imaging and condition monitoring alerts
Vegetation management data
Work backlog, crew calendars, and outage windows
Outputs the agent can produce:
A prioritized maintenance backlog with risk scoring and rationale
Draft work orders with recommended scope, parts/tools checklist, and safety notes
Suggested scheduling windows that consider load, weather, and operational constraints
Escalations when data is missing or risk crosses a threshold
Instead of asking planners to “go look at the model,” the agent delivers ready-to-review work packages and keeps the backlog current as new data arrives.
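One way to picture the step from prediction to work package is a scoring function that feeds a ranked list of draft work orders. The sketch below is illustrative only: the weights, the 0.8 escalation threshold, the health-index scale, and the field names are assumptions, not a recommended risk model.

```python
from dataclasses import dataclass

RISK_ESCALATION_THRESHOLD = 0.8  # assumed policy value


@dataclass
class Asset:
    asset_id: str
    health_index: float   # assumed scale: 0.0 (failed) to 1.0 (healthy)
    customers_served: int
    open_defects: int


def risk_score(asset: Asset, max_customers: int) -> float:
    """Illustrative score: poor condition, customer impact, and open defects."""
    condition = 1.0 - asset.health_index
    impact = asset.customers_served / max_customers
    defect_factor = min(asset.open_defects, 5) / 5  # cap at five defects
    return round(0.5 * condition + 0.3 * impact + 0.2 * defect_factor, 3)


def draft_backlog(assets: list[Asset]) -> list[dict]:
    """Return draft work orders sorted by descending risk."""
    max_customers = max(a.customers_served for a in assets)
    drafts = []
    for asset in assets:
        score = risk_score(asset, max_customers)
        status = (
            "escalate"
            if score >= RISK_ESCALATION_THRESHOLD
            else "pending_planner_review"
        )
        drafts.append({"asset_id": asset.asset_id, "risk": score, "status": status})
    return sorted(drafts, key=lambda d: d["risk"], reverse=True)
```

Every draft lands in a review state; nothing is scheduled without a planner, which mirrors the approval boundary described for maintenance work orders.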
Distribution optimization with ADMS/DMS plus DER coordination
As DER penetration increases, distribution operations become more variable. Voltage and load issues can change quickly, and optimization needs to account for local conditions and operating policies. Agentic AI can support ADMS optimization without bypassing safety rules.
Examples of agentic support in distribution management system (DMS) workflows include:
Assisting with voltage/VAR optimization setpoints by compiling constraints, recent events, and device statuses
Supporting FLISR by generating alternative restoration paths and highlighting constraint conflicts
Recommending peak load mitigation actions when constraints emerge (with operator approval)
AI-assisted DER orchestration becomes increasingly relevant as solar, storage, and EV charging change net load patterns. Even when a utility is not directly controlling customer DERs, an agent can coordinate available programs and operational levers:
Recommend demand response events or targeted outreach where programs allow
Identify feeders with recurring overload risk and propose staged mitigation plans
Suggest EV charging demand shaping strategies through partnerships or customer programs
The value is less about “AI makes the grid optimal” and more about “AI reduces the time to produce a safe, policy-compliant plan.”
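To make "feeders with recurring overload risk" concrete, the sketch below flags feeders whose daily peak load exceeded a fraction of their rating on several days. The 90% threshold, the three-day minimum, and the data layout are assumptions for illustration; real screening would use the utility's own planning criteria.

```python
def recurring_overload_feeders(peak_loads, ratings, threshold=0.9, min_days=3):
    """Flag feeders whose daily peak exceeded threshold * rating on min_days or more days.

    peak_loads: feeder ID -> list of daily peak loads (same unit as ratings)
    ratings:    feeder ID -> rated capacity
    Returns a dict of feeder ID -> number of days over the limit.
    """
    flagged = {}
    for feeder, peaks in peak_loads.items():
        limit = threshold * ratings[feeder]
        days_over = sum(1 for peak in peaks if peak > limit)
        if days_over >= min_days:
            flagged[feeder] = days_over
    return flagged
```

The output is a shortlist for a staged mitigation plan that an engineer reviews, not an automated control action.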
Storm response automation (before, during, after)
Storm operations are where process maturity is tested. The challenge is volume, speed, and uncertainty. AI for utility storm response works best as a set of focused agents that support each phase.
Pre-storm:
Identify circuits with higher vulnerability based on historical performance and forecasted conditions
Recommend staging locations for crews and materials based on predicted impact zones
Draft pre-event communications and preparedness reminders with approved language
During storm:
Continuously reconcile OMS, AMI signals, and customer reports to refine incident clusters
Recommend dynamic crew reallocation based on restoration progress and priorities
Generate situation reports at defined intervals without manual cut-and-paste
Post-storm:
Automate reporting packages by pulling timelines, actions taken, and outcomes
Support root-cause analysis by summarizing recurring failure modes
Propose resilience investments and operational changes based on patterns
The transition here is important: an agent doesn’t need to “run restoration.” It needs to reduce the coordination overhead so humans can focus on judgment calls and safety.
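Situation reporting is a good example of coordination overhead that can be scripted. The following sketch assembles a plain-text report from a list of incident records; the record fields and status values are hypothetical, and a real report would follow the utility's approved template.

```python
from datetime import datetime


def situation_report(incidents, as_of):
    """Build a plain-text situation report from incident records.

    incidents: list of dicts with assumed keys "id", "customers", "status"
    as_of:     datetime the report describes
    """
    open_incidents = [i for i in incidents if i["status"] != "restored"]
    customers_out = sum(i["customers"] for i in open_incidents)
    lines = [
        f"Situation report as of {as_of:%Y-%m-%d %H:%M}",
        f"Open incidents: {len(open_incidents)} of {len(incidents)}",
        f"Customers currently affected: {customers_out}",
    ]
    # List the largest open incidents first.
    for inc in sorted(open_incidents, key=lambda i: i["customers"], reverse=True):
        lines.append(f"- {inc['id']}: {inc['customers']} customers, status {inc['status']}")
    return "\n".join(lines)
```

Running this on a schedule replaces manual cut-and-paste, while the numbers still come from the system of record.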
Top 5 agentic AI grid use cases (quick list):
AI agents for outage management: detection, clustering, and triage support
ETR and customer update orchestration with consistent status logic
Predictive maintenance that generates work orders
ADMS/DMS decision support for switching and constraint management
Storm response automation: situational reporting and resource coordination
Transforming Customer Service with Agentic AI (Beyond Chatbots)
Customer experience improvements often start with a chatbot and end with frustration because the bot can’t do anything. Agentic AI changes this dynamic by connecting customer intent to real workflow actions, while still protecting sensitive data and enforcing policy constraints.
Proactive, personalized outage communications
The fastest way to reduce inbound volume and frustration is to reduce uncertainty. When customers don’t know what’s happening, they call.
An agent can trigger proactive updates via SMS, email, app notifications, or automated voice based on OMS status, geography, and customer preferences. The best version is not “one message to everyone,” but neighborhood- and incident-specific status:
ETR ranges with confidence, not false precision
Clear explanations of what has been completed (assessment, switching, repairs)
Safety guidance relevant to conditions (downed wire warnings, generator safety)
Links to local resources where appropriate
This is where agentic AI in utilities improves trust: consistent, timely updates that don’t contradict what customers see on the outage map or hear from a live agent.
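An ETR "range with confidence" can be as simple as a window around the point estimate. The sketch below shows one way to format an incident-specific update; the symmetric window, the message wording, and the parameter names are assumptions, and real messages would use approved templates and safety language.

```python
from datetime import datetime, timedelta


def etr_window(estimate, uncertainty_minutes):
    """Return an (earliest, latest) window around a point ETR estimate."""
    delta = timedelta(minutes=uncertainty_minutes)
    return estimate - delta, estimate + delta


def outage_message(neighborhood, completed_steps, estimate, uncertainty_minutes):
    """Format a customer update with a restoration window instead of a single time."""
    earliest, latest = etr_window(estimate, uncertainty_minutes)
    done = ", ".join(completed_steps) if completed_steps else "none yet"
    return (
        f"Outage update for {neighborhood}: completed so far: {done}. "
        f"Estimated restoration between {earliest:%H:%M} and {latest:%H:%M}. "
        "Stay away from downed wires."  # placeholder for approved safety text
    )
```

Because every channel would call the same function with the same OMS status, the outage map, SMS, and live agents would not contradict each other.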
“Resolve my issue” self-service agent (billing, service, field)
A customer self-service virtual agent becomes genuinely useful when it can take approved actions. For Con Edison, that means tying together CIS/billing, CRM, service orders, appointment scheduling, and field dispatch workflows.
Actions an agent can support, under guardrails:
Start, stop, or move service with identity verification and policy checks
Payment arrangement workflows, eligibility screening, and next-step instructions
Dispute triage: gather required information, categorize the case, and open the right ticket
Schedule appointments and coordinate field visits, including confirmations and reminders
Even when the agent hands off to a human, it can reduce transfers by summarizing intent, relevant account context, prior actions, and what the customer has already tried. That’s where utility contact center automation programs see the most immediate operational gains.
Step-by-step: how an AI agent resolves a billing issue
Authenticate the customer and confirm account scope (single or multiple locations)
Identify the issue type (high bill, payment posting, rate plan confusion, disputed charge)
Pull relevant data: recent bills, meter reads, usage patterns, payments, adjustments
Check for known events: estimated reads, service changes, outages, billing schedule shifts
Explain the situation in plain language and present options allowed by policy
If needed, create a case with the correct category and attach a summary plus evidence
Offer next actions: payment plan, review request, appointment, or escalation path
Confirm what will happen next and when the customer will hear back
Log the interaction for audit and quality review
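The steps above can be compressed into a small control flow. This sketch is heavily simplified: the keyword classifier stands in for a real intent model, the account fields are hypothetical, and no actual CIS or CRM calls are made.

```python
def classify_issue(text):
    """Keyword-based classifier standing in for a real intent model."""
    text = text.lower()
    if "dispute" in text or "wrong charge" in text:
        return "disputed_charge"
    if "payment" in text:
        return "payment_posting"
    if "high" in text or "expensive" in text:
        return "high_bill"
    return "other"


def resolve_billing(account, message, audit_log):
    """Authenticate, classify, check known events, then explain or open a case."""
    if not account.get("verified"):
        audit_log.append("authentication_failed")
        return {"outcome": "handoff", "reason": "identity not verified"}

    issue = classify_issue(message)

    # Check for known events that often explain a billing question.
    findings = []
    if account.get("last_read_estimated"):
        findings.append("latest bill used an estimated meter read")

    result = {"issue": issue, "findings": findings}
    if issue in ("disputed_charge", "other"):
        result["outcome"] = "case_created"  # needs human review
    else:
        result["outcome"] = "explained_with_options"

    # Every interaction is logged for audit and quality review.
    audit_log.append(f"{issue}:{result['outcome']}")
    return result
```

Note that the unauthenticated path exits before any account data is touched, and both paths write to the audit log.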
Contact center agent assist (real-time)
Not every call should be automated, especially complex or emotionally charged situations. But agent assist can reduce handle time and improve consistency without risking incorrect autonomous actions.
Real-time capabilities include:
Live summarization and structured notes during the call
Next-best action prompts based on policy and the customer’s issue type
Required disclosure reminders for compliance consistency
Fast knowledge retrieval from tariffs, program requirements, outage playbooks, and internal procedures
In practice, this improves first contact resolution (FCR) because agents spend less time searching and more time resolving.
Multilingual and accessibility improvements
Utilities serve diverse communities. Agentic AI can improve clarity and access when it is trained on approved terminology and wrapped in strong quality controls:
Higher-quality translation for customer communications
Plain-language explanations of bills, programs, and fees
Support for accessibility-aligned content formatting and delivery across channels
This is not just a brand improvement; it reduces repeat contacts that happen when customers don’t understand what they’re being told.
Reference Architecture: How Agentic AI Would Plug into Con Edison Systems
A successful agentic AI program for utility grid management depends on integration and control, not just model choice. The agent must plug into the systems where work happens and provide end-to-end auditability.
Core systems the agents must integrate with (examples)
Grid/operations:
SCADA
OMS
ADMS/DMS
GIS
AMI/MDMS
Asset management / EAM
Work management and scheduling
Field mobility tools
Customer:
CIS/billing
CRM
Contact center/IVR platform
Web/app experiences
Outage map and notification systems
Enterprise:
Data platform (lakehouse/warehouse)
Document management and knowledge base (SOPs, playbooks, policies)
Ticketing and collaboration tools
The agent stack (layered view)
A practical stack for agentic AI in utilities usually looks like this:
Orchestration layer: plans tasks, breaks work into steps, manages retries and timeouts, routes to tools
Tool layer: approved connectors and APIs to OMS/DMS/CRM/EAM, messaging systems, ticketing, scheduling
Knowledge layer: procedures, policies, and playbooks the agent can reference for consistent outputs
Observability layer: logs, evaluations, approvals, versioning, and monitoring for production governance
This layered approach also makes it easier to scale: you can build a second or third agent that reuses the same tools and governance patterns.
Human-in-the-loop controls by risk level
Utilities can’t treat all actions the same. A simple but effective approach is to tier controls:
Low-risk (can automate with monitoring):
Draft outage updates and service emails for review or auto-send with approved templates
Summarize tickets and generate structured reports
Retrieve policy answers and propose guidance
Medium-risk (recommendation with approval):
Recommend dispatch changes or scheduling adjustments
Propose switching plans and restoration sequences
Generate maintenance work orders pending planner approval
High-risk (strict authorization required):
Any action that directly changes operational states or could affect safety and reliability must require explicit authorization, step-level logging, and role-based permissions
This is how agentic AI for utility grid management becomes usable: it earns trust through boundaries.
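A tiered control scheme like this can be expressed as a small routing function. The sketch below is one possible shape: the action names, the "grid_operator" role, and the return values are hypothetical, and unknown actions are deliberately treated as high-risk.

```python
from enum import Enum


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Example mapping only; action names are made up for illustration.
ACTION_TIERS = {
    "summarize_ticket": Tier.LOW,
    "draft_outage_update": Tier.LOW,
    "recommend_dispatch_change": Tier.MEDIUM,
    "propose_switching_plan": Tier.MEDIUM,
    "change_device_state": Tier.HIGH,
}


def route(action, approved_by=None, approver_roles=frozenset()):
    """Decide how an action may proceed based on its risk tier."""
    # Anything not explicitly mapped gets the strictest treatment.
    tier = ACTION_TIERS.get(action, Tier.HIGH)
    if tier is Tier.LOW:
        return "execute_with_monitoring"
    if tier is Tier.MEDIUM:
        return "proceed" if approved_by else "await_approval"
    # High risk: needs a named approver who holds the required role.
    if approved_by and "grid_operator" in approver_roles:
        return "proceed_with_step_logging"
    return "blocked"
```

Defaulting unmapped actions to the high tier means a newly added tool cannot run unreviewed just because nobody classified it.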
Governance, Security, and Regulatory Realities (Non-Negotiables)
Utilities operate in a high-accountability environment. Agentic systems must be designed with risk management at the center, not added later.
Safety and operational risk management
The safest agent is one that can never exceed operating limits or bypass required approvals. Practical controls include:
Fail-safe design: agents should default to escalation when uncertain
Role-based access control (RBAC) with least privilege
Change management processes for workflows, policies, and tool permissions
Clear separation between recommendation and execution for operational actions
A disciplined approach also improves adoption: operators are far more likely to trust an agent that behaves predictably and documents its work.
Data privacy and customer trust
Customer data is sensitive, and utilities can’t afford mishandled PII. Core requirements typically include:
Redaction or minimization of PII in transcripts and logs where possible
Strong retention controls aligned with policy and regulatory needs
Transparent disclosures when customers are interacting with AI-supported channels
Consistent language and escalation paths for sensitive situations
Trust is built when customers feel informed, not handled.
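As a minimal sketch of PII minimization in transcripts and logs, the example below masks a few common patterns with regular expressions. The patterns are assumptions (US-style phone numbers, a 10 to 12 digit account number) and would miss many real cases; production redaction typically combines pattern matching with dedicated PII detection.

```python
import re

# Illustrative patterns only; the account-number length is an assumption.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{10,12}\b"),
}


def redact(text):
    """Replace each matched pattern with a bracketed label before logging."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Applying redaction at the point where text enters the log, rather than at read time, keeps raw PII out of storage in the first place.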
Cybersecurity for agentic systems
Securing an agentic system involves a different threat model than securing a standalone analytics tool. When an agent can call tools, it becomes a new control plane that must be protected.
Common risks:
Prompt injection through untrusted text inputs (tickets, emails, notes)
Tool misuse if the agent is given overly broad permissions
Data exfiltration through connectors or misconfigured logs
Compromised APIs or credentials used by agent tools
Core defenses:
Network segmentation respecting IT/OT boundaries
Allowlisted tools and allowlisted actions per workflow
Strong authentication, secrets management, and key rotation
Continuous monitoring and anomaly detection for tool calls
Incident response runbooks specifically for agent components
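Per-workflow allowlisting can be enforced with a check that runs before every tool call. In this sketch the workflow names and tool identifiers are hypothetical; the idea is simply that a tool absent from the workflow's list is denied and the attempt is still logged.

```python
# Hypothetical workflows and tool identifiers, for illustration only.
ALLOWLIST = {
    "outage_communications": {"oms.read_incident", "notify.send_template"},
    "agent_assist": {"kb.search", "crm.read_account"},
}


class ToolCallDenied(Exception):
    """Raised when a workflow requests a tool that is not allowlisted for it."""


def authorize_tool_call(workflow, tool, audit_log):
    """Log the attempt, then raise unless the tool is allowlisted for the workflow."""
    allowed = tool in ALLOWLIST.get(workflow, set())
    audit_log.append({"workflow": workflow, "tool": tool, "allowed": allowed})
    if not allowed:
        raise ToolCallDenied(f"{tool} is not allowlisted for {workflow}")
```

Because denied attempts are logged too, monitoring can flag a workflow that suddenly starts requesting tools outside its list, which is one possible signal of prompt injection.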
Compliance considerations to address early
Even when specific frameworks vary, the themes are consistent:
Records retention and auditability for actions, approvals, and communications
Procurement and vendor risk management
Accessibility requirements for customer-facing communications
Clear accountability for model updates, workflow changes, and approvals
When governance is built into the workflow, scaling becomes much easier.
Implementation Roadmap (90 Days to 12 Months)
Agentic AI for utility grid management succeeds when it is deployed like a utility program: phased, measured, governed, and operationally owned. The fastest wins typically come from customer and communications workflows first, then deeper grid integrations.
Phase 1 (0–90 days): Pilot with measurable ROI
Pick 1–2 use cases where value is visible and risk is manageable:
Contact center agent assist
Outage communications automation
Set success metrics upfront, such as:
Average handle time (AHT) reduction
Digital containment / call deflection improvement
CSAT movement for outage interactions
ETR update consistency and reduced “where is my crew” follow-up calls
Build the foundation during the pilot:
Secure data connectors to a limited set of systems
Logging and audit trails for every agent action
An evaluation harness to test quality, compliance, and failure modes
This phase should prove that the agent can operate reliably within guardrails.
Phase 2 (3–6 months): Expand to cross-functional workflows
After the pilot, expand integration and permissions carefully:
Add OMS and CRM integration for unified outage context and customer communications
Connect EAM/work management for maintenance and work order workflows
Introduce role-based action permissions and approval routing by risk tier
Establish governance routines: testing, red-teaming, QA, and versioned procedures
This is where grid operations automation starts to become real: the agent is no longer just a helpful assistant, but a workflow engine.
Phase 3 (6–12 months): Production-scale agentic operations
At scale, the big gains come from multi-agent orchestration, where specialized agents coordinate like a team:
Storm response operations: reporting, prioritization support, and communications coordination
Field service dispatch optimization recommendations based on evolving conditions
Continuous improvement loops where human feedback updates playbooks and constraints
Training and change management matter here. The goal is adoption, not novelty. Operators, supervisors, and customer service leaders should help define what “good” looks like and what the agent should never do.
90-day pilot checklist (practical)
Select one customer workflow and one operational workflow with clear scope
Map inputs and outputs: systems, data fields, required policies, escalation points
Define risk tiers and approval rules before building
Build connectors with least-privilege permissions and strong logging
Create test cases from real historical events (including storms and edge cases)
Run evaluations weekly and track error categories, not just averages
Launch with a limited user group and a clear feedback loop
Document ownership: who changes procedures, who approves expansions, who audits logs
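Tracking error categories rather than a single average can start very simply. The sketch below summarizes a batch of evaluation results; the result format and category names are assumptions chosen to match the kinds of errors an agent program would likely track.

```python
from collections import Counter


def summarize_evaluation(results):
    """Summarize test-case results into a pass rate and per-category error counts.

    results: list of dicts with assumed keys "case" and "error"
             ("error" is None when the case passed).
    """
    total = len(results)
    errors = Counter(r["error"] for r in results if r["error"] is not None)
    failed = sum(errors.values())
    return {
        "pass_rate": (total - failed) / total if total else 0.0,
        "errors_by_category": dict(errors),
    }
```

Two weekly runs with the same pass rate can hide very different problems; the per-category breakdown shows whether failures are policy errors, missing context, or tool faults.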
Measuring Success: KPIs That Matter for Con Edison
To justify scaling agentic AI for utility grid management, measurement should cover outcomes, efficiency, and risk. The best programs define attribution rules early so improvements don’t get lost in broader modernization efforts.
Grid operations KPIs
SAIDI/SAIFI improvements where process speed and coordination contribute
Fault location time and time-to-triage
Restoration time for comparable events
ETR accuracy and update consistency
Preventive vs corrective maintenance ratio shifts
Crew utilization and truck rolls avoided via better scheduling and triage
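For reference, SAIFI is the total number of customer interruptions divided by customers served, and SAIDI is the total customer interruption duration divided by customers served. The sketch below computes both from a list of sustained interruptions; the input layout is an assumption, and regulatory reporting applies additional rules (such as major-event exclusions) that are not modeled here.

```python
def reliability_indices(interruptions, customers_served):
    """Compute SAIFI and SAIDI for a reporting period.

    interruptions:    list of (customers_interrupted, duration_minutes)
    customers_served: total customers served in the period
    """
    customer_interruptions = sum(c for c, _ in interruptions)
    customer_minutes = sum(c * d for c, d in interruptions)
    return {
        "SAIFI": customer_interruptions / customers_served,  # interruptions per customer
        "SAIDI": customer_minutes / customers_served,        # minutes per customer
    }
```

For attribution, comparing these indices across comparable events before and after a workflow change is more informative than a single system-wide number.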
Customer service KPIs
Call deflection and digital containment rates
AHT and after-call work reduction (especially with agent assist)
First contact resolution (FCR)
CSAT/NPS changes, especially during outage events
Complaint rate and escalation volume
Business and risk KPIs
Cost-to-serve changes by channel and interaction type
Agent error rate by category (policy errors, hallucinations, tool errors, missing context)
Override rate and escalation rate by workflow
Audit findings, security incidents detected/prevented, and time-to-remediate
A mature program will treat these like operational KPIs, not data science metrics.
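Override and escalation rates by workflow can be derived directly from the agent's action log. This is a minimal sketch assuming each log entry records a workflow name and an outcome of "accepted", "overridden", or "escalated"; those field names and values are illustrative.

```python
from collections import defaultdict


def rates_by_workflow(events):
    """Compute override and escalation rates per workflow from logged outcomes."""
    counts = defaultdict(lambda: {"total": 0, "overridden": 0, "escalated": 0})
    for event in events:
        c = counts[event["workflow"]]
        c["total"] += 1
        if event["outcome"] == "overridden":
            c["overridden"] += 1
        elif event["outcome"] == "escalated":
            c["escalated"] += 1
    return {
        workflow: {
            "override_rate": c["overridden"] / c["total"],
            "escalation_rate": c["escalated"] / c["total"],
        }
        for workflow, c in counts.items()
    }
```

A rising override rate in one workflow is an early sign that its procedures or data inputs need attention before its permissions are expanded.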
Contentious Questions (And Practical Answers)
Will AI replace dispatchers or customer service agents?
In practice, agentic AI in utilities is more effective as augmentation than replacement. Utilities have deep institutional knowledge embedded in experienced people. The immediate value comes from removing low-value work:
Searching across multiple systems for context
Repeating the same explanations and documentation
Manual report creation and status reconciliation
Routine case creation and routing
Over time, roles may evolve. But the most realistic near-term outcome is that teams handle more volume with less burnout and more consistent quality.
Can we trust AI during storms?
Only if the system is engineered for storms, not just tested on calm-day data. Trust comes from:
Guardrails that prevent unsafe actions
Approval workflows for higher-risk recommendations
Strong testing on historical storm events and edge cases
A clear fallback mode when data quality degrades
A sensible approach is to start with decision support and communications, then graduate to constrained actions as confidence grows.
What data do we need, and what if it’s messy?
Messy data is the norm, not the exception. The key is to define minimum viable data for a pilot and improve progressively.
Minimum viable inputs often include:
A reliable incident source (OMS tickets, outage events)
A communication channel and customer preferences source (CRM/CIS plus notification platform)
A knowledge base of approved language and procedures
Basic telemetry indicators (AMI last gasp or SCADA alarms) if the use case requires it
As the program scales, data quality work can be prioritized based on where it increases automation safely.
Conclusion: A Dual Transformation Powered by Agentic AI
Con Edison doesn’t need agentic AI for utility grid management because it’s trendy. It needs it because grid reliability and customer trust now depend on faster, more consistent execution across dozens of interconnected systems and teams. The same agentic layer that speeds outage triage can also improve outage communications. The same governance that protects operational actions can also protect customer data and compliance.
The best path forward is phased:
Start with a 90-day pilot that improves customer experience and reduces operational load
Expand to cross-functional workflows that connect OMS, CRM, and work management
Scale into production-grade storm response and field coordination with strong controls
To see what an enterprise-grade agentic workflow looks like in practice, book a StackAI demo: https://www.stack-ai.com/demo
