Real-Time Demand Response Dispatch in Smart Grids: Balancing Economic Efficiency and Social Fairness via Advanced Data-Driven Approaches
1. Introduction
As active prosumers increasingly dominate low-carbon electricity markets, the physical boundaries of distribution networks—particularly nodal voltage limits and thermal line capacities—have emerged as the primary bottlenecks for large-scale demand response (DR) execution [1]. Traditional economic dispatch frameworks, which predominantly optimize for system-wide cost reduction or emission targets, often operate under idealized network assumptions [2] [3]. This oversight not only risks severe reverse power flows and equipment degradation during peak flexibility events, but also disproportionately penalizes industrial stakeholders whose rigid physical production constraints restrict their responsiveness [4] [5].
Unlike previous studies focusing solely on algorithmic convergence, modern grid dispatch must evolve to deeply integrate multi-physics constraints with cross-sectoral social equity [6]. The inherent heterogeneity across residential, commercial, and industrial sectors means that a “one-size-fits-all” pricing signal often leads to a “Green Penalty,” where rigid industrial loads subsidize the flexibility dividends of other sectors [7]. To technically resolve this socio-technical tension, this paper proposes an advanced Soft Actor-Critic (SAC) framework [8]. By embedding a LinDistFlow-based physical safety layer and a physics-based battery aging model directly into the learning environment, our approach ensures hardware stewardship and grid stability while navigating the complex Pareto frontier of smart grid operations [9].
As global decarbonization goals intensify, emission-based DR and low-carbon optimization have emerged as key pathways to align economic dispatch with environmental targets [10] [11]. In integrated electricity and carbon markets, prosumer behavior is driven not only by real-time electricity prices but also by fluctuating carbon trading signals [12]. However, designing efficient and socially equitable dispatch strategies faces critical challenges. A primary hurdle lies in the significant heterogeneity across different sectors [13] [14]. While residential users possess flexible loads like HVAC and electric vehicles (EVs), their participation is constrained by the need to maintain basic living comfort. Commercial sectors exhibit high baseloads but have specific operating windows, whereas industrial users, though possessing significant demand, often face rigid production schedules and high economic risks from supply interruptions [13] [15].
Furthermore, modern grid dispatch must respect stringent physical and technical constraints [16]. This includes maintaining distribution network power flow security (e.g., voltage limits) and accounting for the non-linear degradation costs of energy storage systems during frequent cycling [8] [17]. Traditional optimization frameworks, such as bi-level programming or Stackelberg games, often struggle to handle these high-dimensional, non-convex constraints in real-time [9] [11]. While deep reinforcement learning (DRL) has shown promise in managing such complexities [18]-[22], most existing research focuses on total system efficiency, often overlooking the “fairness gap” among different user sectors [23] [24]. In a low-carbon transition, a purely efficiency-driven strategy may inadvertently impose a disproportionate financial burden on industrial production or over-exploit residential flexibility, leading to sectoral inequities that undermine the social sustainability of DR programs.
To address these challenges, this paper proposes an advanced DRL-based framework for low-carbon economic dispatch that explicitly balances economic efficiency with sectoral fairness. By replacing the abstract mathematical equity used in previous studies with a tangible cross-sector fairness metric, this study ensures a balanced distribution of transition costs and benefits among residential, commercial, and industrial stakeholders. The main contributions of this work are as follows:
1) Multi-Physics Constrained Environment for Hardware Stewardship: We formulate a dispatch environment that goes beyond abstract optimization by integrating linearized distribution power flow (LinDistFlow) constraints and a physics-based battery aging model. This ensures that the reinforcement learning agent respects the physical limits of the distribution infrastructure while preventing the over-exploitation of energy storage assets.
2) Sectoral Fairness-Aware Reward Mechanism: We introduce a novel multi-objective reward structure grounded in the “Social Contract” theory. By utilizing Jain’s Fairness Index, the framework explicitly quantifies and mitigates the “Green Penalty” traditionally imposed on rigid industrial loads, ensuring that the economic dividends of low-carbon transitions are shared proportionally across the residential-commercial-industrial nexus.
3) Validation of Efficiency-Equity Pareto Equilibrium: Through extensive simulations, we provide a quantitative map of the Pareto frontier between aggregate social welfare and distributional equity. We demonstrate that through sophisticated agent design, grid operators can achieve millisecond-level responsiveness and guaranteed physical security without requiring additional infrastructure investments.
2. Problem Formulation
2.1. The Grid as a Socio-Technical Ecosystem: A Multi-Sector Perspective
The modern smart grid is no longer a simple unidirectional energy delivery system but has evolved into a complex socio-technical ecosystem where technical stability, economic efficiency, and social equity are deeply intertwined. In our framework, the Grid Dispatch Center acts as a central coordinator, tasked with reconciling the stochastic nature of renewable energy with the diverse objectives of three macro-sectors: Residential, Commercial, and Industrial.
The interaction within this ecosystem is modeled as a dynamic feedback loop. The dispatch center broadcasts price and carbon signals, and in response, each sector optimizes its internal resources to achieve a balance between economic savings and operational constraints. This “Social Contract” ensures that the burden of grid regulation and the benefits of low-carbon transitions are shared across society.
Figure 1 illustrates the holistic socio-technical architecture of the proposed multi-objective demand response market. At the top layer, the power supply side comprises conventional thermal generators and renewable energy sources, which are collectively governed by a Tiered Carbon Cap-and-Trade mechanism to enforce environmental stewardship.
Figure 1. Multi-objective demand response market architecture.
The Grid Dispatch Center, functioning as the intelligent coordinator powered by the Soft Actor-Critic (SAC) agent, bridges the supply and demand sides. All dispatch interactions are strictly bounded by a physical safety layer—Power Flow & Voltage Constraints—ensuring that the economic scheduling does not compromise real-time grid stability.
On the demand side, the system interacts with three highly heterogeneous prosumer sectors: Residential (characterized by high flexibility but strict comfort constraints), Commercial (medium flexibility bounded by specific operating hours), and Industrial (heavy baseloads, rigid production schedules, and high carbon exposure).
The entire architecture operates as a dynamic closed-loop feedback system: the SAC agent formulates and broadcasts Fairness-Aware Price Signals downwards to the prosumers. In return, the resulting Sectoral Demand Response behaviors and the calculated Jain’s Index are fed back to the dispatch center. This continuous bidirectional interaction empowers the agent to dynamically navigate the complex Pareto trade-off among system operational costs, absolute carbon emission reductions, and cross-sectoral equity.
2.2. Sectoral Demand Modeling: Utility and Sensitivity
To reflect the diverse behavioral patterns of prosumers, we employ sectoral utility functions that quantify the “satisfaction” derived from electricity consumption.
1) Residential and Commercial Sectors:
For residential and commercial sectors, energy usage is primarily driven by comfort and service quality. We utilize a logarithmic utility function to reflect the law of diminishing marginal utility:
(1)
where the first parameter, a preference coefficient, quantifies the absolute value or necessity the user assigns to electricity. A higher preference coefficient indicates that the sector is less willing to reduce consumption even at high prices.
The second parameter, a saturation coefficient, dictates how quickly the marginal satisfaction decreases as more power is consumed. It captures the “comfort ceiling”: once basic heating or cooling needs are met, additional electricity provides rapidly decreasing marginal utility.
2) Industrial Sector:
Unlike residential users, industrial prosumers operate under rigid production schedules. Any significant deviation from their baseline power consumption
results in financial losses due to disrupted assembly lines or labor idle time. The utility is modeled as:
(2)
where the output coefficient represents the direct economic output generated per unit of electricity under normal operating conditions, and $\gamma_i$ is a critical parameter representing the “flexibility cost.” A high $\gamma_i$ implies that the industrial process is extremely rigid, and the dispatch center must offer significantly higher incentives to persuade this sector to shift its load.
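As a minimal sketch of the two utility shapes described in this subsection, the Python snippet below assumes a log form `value_coeff * ln(1 + saturation * power)` for the residential and commercial sectors, and a linear output term minus a quadratic deviation penalty for the industrial sector. These functional forms, the function names, and all parameter names are illustrative assumptions and are not necessarily the exact expressions of Equations (1) and (2).

```python
import math


def log_utility(power_kw: float, value_coeff: float, saturation: float) -> float:
    """Assumed residential/commercial utility with diminishing marginal returns.

    value_coeff: how much the sector values electricity (hypothetical name).
    saturation:  how quickly marginal satisfaction decays (hypothetical name).
    """
    if power_kw < 0:
        raise ValueError("power_kw must be non-negative")
    # log1p(x) = ln(1 + x), so zero consumption yields zero utility.
    return value_coeff * math.log1p(saturation * power_kw)


def industrial_utility(power_kw: float, baseline_kw: float,
                       output_per_kw: float, rigidity: float) -> float:
    """Assumed industrial utility: linear output minus a quadratic
    penalty on any deviation from the baseline consumption."""
    deviation = power_kw - baseline_kw
    return output_per_kw * power_kw - rigidity * deviation ** 2


if __name__ == "__main__":
    # The second unit of power adds less utility than the first.
    first = log_utility(1.0, 2.0, 0.5) - log_utility(0.0, 2.0, 0.5)
    second = log_utility(2.0, 2.0, 0.5) - log_utility(1.0, 2.0, 0.5)
    print(f"marginal utility: first={first:.3f}, second={second:.3f}")
    # A 10 kW curtailment costs lost output plus the rigidity penalty.
    print(industrial_utility(90.0, 100.0, 3.0, 0.1))
```

Under these assumptions, a larger `rigidity` value makes any load shift away from the baseline more costly, which is the behavior the text attributes to rigid industrial processes.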
2.3. Environmental Stewardship: Tiered Carbon Cap-and-Trade
To bridge the gap between economic dispatch and global decarbonization, we incorporate a Tiered Cap-and-Trade mechanism. This model forces the system to internalize the “social cost of carbon.”
(3)
where the emission term is the carbon output calculated from the current generation mix, the quota is the “free” emission limit granted by regulators, and the tier prices represent the economic pressure applied to the grid. By setting a higher price on the upper emission tier than on the lower one, the model simulates a progressive tax under which excess emissions become increasingly expensive, encouraging the agent to prioritize renewable energy accommodation.
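The tiered pricing idea can be illustrated with a short Python sketch. It assumes a two-tier piecewise-linear cost: emissions within the free quota cost nothing, the first `tier_width` units above the quota are charged at `low_price`, and anything beyond that at `high_price`. The number of tiers, the function name, and all parameter names are assumptions made for illustration, not the paper's exact Equation (3).

```python
def tiered_carbon_cost(emissions: float, quota: float, tier_width: float,
                       low_price: float, high_price: float) -> float:
    """Assumed two-tier carbon cost above a free emission quota."""
    if high_price < low_price:
        raise ValueError("a progressive scheme needs high_price >= low_price")
    excess = max(0.0, emissions - quota)      # only emissions above the quota are charged
    first_tier = min(excess, tier_width)      # portion billed at the lower price
    second_tier = excess - first_tier         # remainder billed at the higher price
    return low_price * first_tier + high_price * second_tier


if __name__ == "__main__":
    for e in (80.0, 110.0, 130.0):
        print(e, tiered_carbon_cost(e, quota=100.0, tier_width=20.0,
                                    low_price=10.0, high_price=30.0))
```

With these illustrative numbers the marginal cost of an extra unit of emissions rises from 0 to 10 to 30 as the system moves through the tiers, which is the progressive pressure the mechanism is meant to create.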
2.4. Asset Protection: The Physics of Battery Aging
Energy storage systems (ESS) are the “buffers” of the grid, but they are finite assets. To avoid the “exploitation” of battery resources, we model the degradation cost
based on the Depth of Discharge (DoD):
(4)
where the capital cost term is the upfront cost of the battery per kWh, the cycle-life term is the total number of expected charge-discharge cycles under standard conditions, the DoD term measures the intensity of the discharge (a higher DoD significantly accelerates the chemical degradation of the battery cells), and the energy term is the actual amount of energy exchanged in the current interval.
By incorporating this, the DRL agent learns “Stewardship”—it will only use the battery when the market benefits outweigh the long-term cost of hardware replacement.
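A hedged sketch of how such a degradation cost could enter the environment is given below. It assumes the cost is the per-cycle wear (`capital_cost_per_kwh / rated_cycles`) scaled by a power-law DoD stress factor and by the absolute energy exchanged. The power-law stress term and its default exponent are assumptions chosen only to make deeper discharges more expensive; the paper's Equation (4) may use a different form.

```python
def degradation_cost(energy_kwh: float, dod: float,
                     capital_cost_per_kwh: float, rated_cycles: float,
                     dod_exponent: float = 2.0) -> float:
    """Assumed battery wear cost for one dispatch interval."""
    if not 0.0 <= dod <= 1.0:
        raise ValueError("dod must lie in [0, 1]")
    if rated_cycles <= 0:
        raise ValueError("rated_cycles must be positive")
    wear_per_kwh = capital_cost_per_kwh / rated_cycles  # cost of one kWh of cycling at full depth
    stress = dod ** dod_exponent                        # deeper discharge -> higher stress (assumed shape)
    return wear_per_kwh * stress * abs(energy_kwh)      # charging and discharging both wear the cells


def worth_dispatching(market_benefit: float, wear_cost: float) -> bool:
    """Stewardship rule from the text: use the battery only if the
    market benefit exceeds the long-term hardware cost."""
    return market_benefit > wear_cost


if __name__ == "__main__":
    cost = degradation_cost(10.0, dod=0.8, capital_cost_per_kwh=300.0, rated_cycles=3000.0)
    print(f"wear cost: {cost:.3f}", worth_dispatching(1.0, cost))
```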
2.5. The Social Welfare and Fairness Nexus
The final objective is to navigate the Efficiency-Fairness Pareto Frontier. We utilize Jain’s Fairness Index (JFI) to ensure that the “Green Dividend” (the savings from low-carbon dispatch) is not monopolized by a single sector.
$$J = \frac{\left(\sum_{i=1}^{3} x_i\right)^{2}}{3\sum_{i=1}^{3} x_i^{2}} \qquad (5)$$
where $x_i$ is the benefit received by sector $i$ (residential, commercial, or industrial), calculated as the percentage reduction in sectoral expenditure compared to a baseline without demand response.
A value of $J = 1.0$ represents a “Social Harmony” state where all three sectors benefit proportionally. This directly addresses the “Social Fairness” requirement in the paper title by ensuring that residential comfort is not sacrificed solely to protect industrial profits, or vice versa.
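For reference, Jain’s Fairness Index has the standard closed form (sum of allocations) squared divided by (number of parties times the sum of squared allocations). The Python sketch below applies it to hypothetical sectoral savings; the input values are illustrative only and are not results from this study. The index is normally defined for non-negative allocations, and how negative savings are handled in the paper's environment is not stated, so this sketch simply rejects them.

```python
from typing import Sequence


def jain_index(benefits: Sequence[float]) -> float:
    """Jain's Fairness Index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    if len(benefits) == 0:
        raise ValueError("at least one benefit value is required")
    if any(b < 0 for b in benefits):
        raise ValueError("this sketch only handles non-negative benefits")
    total = sum(benefits)
    sum_sq = sum(b * b for b in benefits)
    if sum_sq == 0:
        return 1.0  # nobody benefits: treated here as trivially equal
    return total * total / (len(benefits) * sum_sq)


if __name__ == "__main__":
    print(jain_index([5.0, 5.0, 5.0]))   # equal shares
    print(jain_index([9.0, 1.0, 0.0]))   # one sector captures most of the benefit
```

Equal shares give an index of 1, while a single sector capturing everything gives 1/n, which is 1/3 for the three sectors considered here.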
2.6. Physical Boundary Layer: AC Power Flow Constraints
To prevent the dispatch strategy from becoming a “paper-only” solution, it must satisfy the physical laws of the distribution network.
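Active Power Balance: Ensures that in every dispatch interval total generation matches total demand plus network losses, maintaining frequency stability.
Voltage Stability ($V_{\min} \le V_j \le V_{\max}$ at every node $j$): Critical for preventing equipment damage at the end of the line during massive demand response events.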
Branch Capacity: Ensures the distribution transformers and lines do not overheat, respecting the thermal limits of the grid infrastructure.
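To make the voltage constraint concrete, the sketch below applies the LinDistFlow approximation to a single radial feeder without laterals. LinDistFlow relates squared voltage magnitudes on adjacent buses by $v_j = v_i - 2(r_{ij} P_{ij} + x_{ij} Q_{ij})$, with branch flows taken as the sum of downstream net loads and losses neglected. The feeder data, the function names, and the voltage limits passed in the example are hypothetical; the study itself uses the IEEE 33-bus network, which also has lateral branches.

```python
import math
from typing import List, Sequence


def lindistflow_voltages(r: Sequence[float], x: Sequence[float],
                         p_load: Sequence[float], q_load: Sequence[float],
                         v_source: float = 1.0) -> List[float]:
    """Voltage magnitudes (p.u.) along a single lateral-free feeder.

    r[k], x[k]:           impedance of the branch feeding bus k (p.u.)
    p_load[k], q_load[k]: net load at bus k (p.u.); negative means net injection
    """
    n = len(r)
    # Lossless branch flow: everything consumed at or beyond bus k.
    p_flow = [sum(p_load[k:]) for k in range(n)]
    q_flow = [sum(q_load[k:]) for k in range(n)]
    v_sq = v_source ** 2
    voltages = []
    for k in range(n):
        v_sq -= 2.0 * (r[k] * p_flow[k] + x[k] * q_flow[k])  # LinDistFlow voltage drop
        voltages.append(math.sqrt(v_sq))
    return voltages


def voltage_violations(voltages: Sequence[float], v_min: float, v_max: float) -> List[int]:
    """Indices of buses whose voltage falls outside [v_min, v_max]."""
    return [k for k, v in enumerate(voltages) if v < v_min or v > v_max]


if __name__ == "__main__":
    v = lindistflow_voltages([0.01, 0.01], [0.02, 0.02], [0.5, 0.5], [0.1, 0.1])
    print([round(val, 4) for val in v], voltage_violations(v, 0.98, 1.05))
```

A negative entry in `p_load` models reverse power flow from distributed generation, which raises downstream voltages in this approximation.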
3. Methodology: Advanced DRL for Multi-Objective Dispatch
To solve the non-linear, multi-constraint optimization problem defined in Section 2, this study adopts the Soft Actor-Critic (SAC) algorithm. Unlike traditional deterministic policy gradients, SAC optimizes a stochastic policy by maximizing both the expected return and the policy’s entropy, ensuring robust exploration in the volatile environment of a renewable-dominated grid.
3.1. MDP Formulation for Sectoral Dispatch
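The real-time demand response task is mapped to a Markov Decision Process (MDP), characterized by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ (state space, action space, transition dynamics, reward function, and discount factor).
State Space ($\mathcal{S}$)
The state vector $s_t$ represents the dispatcher’s observation of the global grid status at time $t$: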
(6)
where the state comprises the historical load sequence over a sliding window, the current renewable generation (wind/solar), the state of charge of the sector-level energy storage systems, and static features that identify the price elasticity and production rigidity of the three sectors.
Action Space ($\mathcal{A}$)
The action $a_t$ is a continuous 3-dimensional vector representing the price adjustment signals for the Residential, Commercial, and Industrial sectors.
The final broadcast price for each sector is obtained by applying the corresponding adjustment to the base price, subject to the regulatory price caps defined in Section 2.6.
Reward Function ($\mathcal{R}$)
The reward $r_t$ is the mathematical heartbeat of the “Social Fairness” objective. It is engineered to guide the agent toward the Pareto frontier by aggregating social welfare, carbon emissions, total system costs, and a weighted fairness term, where the fairness term is the Jain’s Fairness Index calculated in Section 2.5. By increasing the fairness weight, the agent learns to prioritize sectoral equity over absolute system profit.
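One possible shape of such a reward is sketched below in Python. It assumes a simple additive scalarization: an economic term (welfare minus system and carbon costs) plus the fairness weight times Jain’s index, minus an optional safety penalty. The additive form, the function name, and the argument names are assumptions for illustration; the paper's exact reward expression may weight or normalize the terms differently.

```python
def fairness_aware_reward(social_welfare: float, system_cost: float,
                          carbon_cost: float, jain_index: float,
                          fairness_weight: float,
                          safety_penalty: float = 0.0) -> float:
    """Assumed multi-objective reward for one dispatch step."""
    if not 0.0 <= jain_index <= 1.0:
        raise ValueError("jain_index must lie in [0, 1]")
    economic_term = social_welfare - system_cost - carbon_cost
    # A larger fairness_weight makes an equitable outcome worth more
    # relative to the purely economic term.
    return economic_term + fairness_weight * jain_index - safety_penalty


if __name__ == "__main__":
    for weight in (0.0, 2.0):
        fair = fairness_aware_reward(10.0, 4.0, 1.0, 0.9, weight)
        unfair = fairness_aware_reward(10.0, 4.0, 1.0, 0.4, weight)
        print(f"weight={weight}: fair={fair:.2f}, unfair={unfair:.2f}")
```

With a zero fairness weight the two outcomes in the example are indistinguishable to the agent; with a positive weight the more equitable one earns the higher reward.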
3.2. The Soft Actor-Critic (SAC) Architecture
The SAC agent utilizes an Actor-Critic architecture to decouple the policy learning from the value estimation.
1) Soft Q-Value Function (Critic):
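Two independent Critic networks, $Q_{\theta_1}$ and $Q_{\theta_2}$, are used to estimate the soft Q-value, mitigating the overestimation bias common in volatile grid environments. The objective is to minimize the Bellman residual:
$$J_Q(\theta_j) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim \mathcal{D}}\left[\tfrac{1}{2}\left(Q_{\theta_j}(s_t, a_t) - y_t\right)^2\right], \quad j \in \{1, 2\} \qquad (7)$$
where $y_t = r_t + \gamma\left(\min_{k=1,2} Q_{\bar{\theta}_k}(s_{t+1}, a_{t+1}) - \alpha \log \pi_\phi(a_{t+1} \mid s_{t+1})\right)$, with $a_{t+1} \sim \pi_\phi(\cdot \mid s_{t+1})$, includes the entropy-augmented target value. Here $\mathcal{D}$ is the replay buffer, $\bar{\theta}_k$ are the target-network parameters, $\gamma$ is the discount factor, and $\alpha$ is the entropy temperature.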
2) Stochastic Policy (Actor):
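The Actor network $\pi_\phi$ outputs a Gaussian distribution of actions. The policy is updated to maximize the expected Q-value plus entropy:
$$J_\pi(\phi) = \mathbb{E}_{s_t \sim \mathcal{D},\, a_t \sim \pi_\phi}\left[\min_{k=1,2} Q_{\theta_k}(s_t, a_t) - \alpha \log \pi_\phi(a_t \mid s_t)\right] \qquad (8)$$
where the temperature $\alpha$ controls the trade-off between exploitation (maximizing grid welfare) and exploration.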
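The scalar arithmetic behind the critic and actor updates can be sketched for a single transition as follows. This follows the standard SAC formulation (clipped double-Q target with an entropy bonus); the function and argument names are hypothetical, and a practical implementation would operate on batches with a deep learning framework rather than on plain floats.

```python
def soft_td_target(reward: float, next_q1: float, next_q2: float,
                   next_log_prob: float, alpha: float, gamma: float,
                   done: bool = False) -> float:
    """Entropy-augmented target: r + gamma * (min Q' - alpha * log pi')."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + (0.0 if done else gamma * soft_value)


def critic_loss(q_pred: float, target: float) -> float:
    """Half squared Bellman residual for one critic and one sample."""
    return 0.5 * (q_pred - target) ** 2


def actor_objective(q1: float, q2: float, log_prob: float, alpha: float) -> float:
    """Quantity the actor maximizes for one sampled action:
    the pessimistic Q-value plus the entropy bonus."""
    return min(q1, q2) - alpha * log_prob


if __name__ == "__main__":
    y = soft_td_target(reward=1.0, next_q1=10.0, next_q2=12.0,
                       next_log_prob=-2.0, alpha=0.5, gamma=0.9)
    print(f"target={y:.3f}, critic loss={critic_loss(10.0, y):.3f}")
    print(f"actor objective={actor_objective(3.0, 4.0, -1.0, 0.2):.3f}")
```

Taking the minimum of the two critics is what counteracts overestimation, and a larger `alpha` rewards low-probability (more exploratory) actions more strongly.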
3.3. Interaction Loop: Bridging Data and Physics
Figure 2 illustrates the closed-loop control architecture of the proposed Soft Actor-Critic (SAC) framework, which integrates deep reinforcement learning with power system physics. On the left, the SAC Agent (Digital Brain) utilizes an Actor Network to generate continuous dispatch actions, specifically the three-tiered price signals for the residential, commercial, and industrial sectors. Simultaneously, the Twin Critic Networks evaluate the policy to mitigate overestimation bias, continuously learning from past experiences stored in the Replay Buffer. A pivotal innovation of this framework is the Physical Safety Filter located in the center. Before any generated price signal is broadcast to the grid, the raw action must pass through a strict LinDistFlow Constraints funnel. If the anticipated load shifting violates voltage safety limits or branch capacities, the action is autonomously clipped and the agent receives a safety penalty, thereby preventing physically hazardous dispatch commands. On the right, the safe actions are executed within the Smart Grid Environment, which simulates complex socio-technical dynamics. This environment evaluates the DoD Battery Aging Model, the Tiered Carbon Tax penalties, and the Cross-Sector Utility responses. Finally, the environment feeds the updated state and a comprehensive multi-objective reward, which aggregates social welfare, carbon emissions, total system costs, and Jain’s Index, back to the digital brain. This continuous bidirectional feedback loop drives the agent toward the Pareto frontier between economic efficiency and sectoral fairness.
Figure 2. The SAC algorithm control loop incorporating physical constraints and fairness.
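The safety-filter step of this loop can be sketched as follows. The text states only that an unsafe action is clipped and penalized; the specific rule used here (repeatedly shrinking the price adjustment toward zero until the predicted voltages are within limits, then falling back to no adjustment) is an assumption, as are the function names and the `predict_voltages` callback standing in for a LinDistFlow evaluation.

```python
from typing import Callable, List, Sequence, Tuple


def safety_filter(raw_action: Sequence[float],
                  predict_voltages: Callable[[Sequence[float]], Sequence[float]],
                  v_min: float, v_max: float,
                  penalty: float = 1.0, shrink: float = 0.5,
                  max_iters: int = 10) -> Tuple[List[float], float]:
    """Return (safe_action, safety_penalty) for one dispatch step."""
    def is_safe(action: Sequence[float]) -> bool:
        return all(v_min <= v <= v_max for v in predict_voltages(action))

    action = list(raw_action)
    if is_safe(action):
        return action, 0.0                      # nothing to correct, no penalty
    for _ in range(max_iters):
        action = [shrink * a for a in action]   # pull the adjustment toward zero
        if is_safe(action):
            return action, penalty
    # Assumed fallback: broadcast the unadjusted base price.
    return [0.0] * len(action), penalty


if __name__ == "__main__":
    # Toy stand-in for a power-flow model: the largest price cut drives the voltage drop.
    toy_model = lambda a: [1.0 - 0.1 * max(a)]
    print(safety_filter([1.0, 0.2, 0.0], toy_model, v_min=0.94, v_max=1.05))
    print(safety_filter([0.1, 0.0, 0.0], toy_model, v_min=0.94, v_max=1.05))
```

Returning the penalty alongside the corrected action lets the environment subtract it from the reward, so the agent is discouraged from proposing actions that need correction.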
4. Case Studies and Results
To rigorously validate the engineering feasibility and socio-economic benefits of the proposed fairness-aware dispatch strategy, comprehensive simulations were conducted. The analysis evaluates computational scalability, physical grid security, and the critical Pareto trade-off between absolute carbon reduction and sectoral equity (Jain’s Fairness Index).
4.1. Simulation Setup and Benchmark Models
To prevent the algorithm from operating in a “physics-free” vacuum, the dispatch environment was built upon the standard IEEE 33-bus distribution network. Real-world empirical data spanning a full year (with 15-minute dispatch intervals) was utilized, combining fluctuating renewable generation profiles (wind and solar) with tiered carbon trading prices.
The network nodes were strictly clustered into the three socio-technical sectors defined in Section 2:
Residential Nodes (18 buses): High flexibility (EVs, HVAC) but stringent comfort constraints.
Commercial Nodes (9 buses): Medium flexibility, constrained by daytime operating hours.
Industrial Nodes (6 buses): High baseload, high penalty ($\gamma_i$) for production deviation, and heavily exposed to the carbon tax.
To isolate the contributions of our proposed framework, we evaluated it against three established baseline models:
1) MILP (Mixed-Integer Linear Programming): Solved via the commercial Gurobi solver. This represents the absolute theoretical optimal solution under perfect foresight, but it is notoriously slow.
2) PSO (Particle Swarm Optimization): A traditional heuristic algorithm widely used in microgrid dispatch.
3) Standard DRL (DDPG/SAC without Fairness): A conventional reinforcement learning agent whose sole objective is to minimize total system cost, ignoring Jain’s Fairness Index and battery aging constraints.
4) Proposed Strategy (Fairness-Aware SAC): Our complete framework, incorporating LinDistFlow physics, DoD battery aging, and Jain’s Fairness multi-objective reward.
4.2. Computational Scalability and Real-Time Feasibility
In real-time electricity and carbon markets, clearing and dispatch commands must be executed within stringent sub-minute windows. Table 1 compares the four dispatch models across computational latency, economic optimality, social equity, and physical safety.
Table 1. Comparison of computational scalability and real-time feasibility.
| Dispatch Model | Average Inference Latency per Step | Optimality Gap (vs. MILP) | Jain’s Fairness Index | Voltage Violations (under- or over-voltage) |
|---|---|---|---|---|
| MILP (Gurobi) | 14.5 min | 0.0% (Theoretical Baseline) | N/A (Single Objective) | 0 (Safe) |
| PSO (Heuristic) | 42.6 s | 8.5% | 0.52 | Occasional (Peak hours) |
| Standard DRL (SAC) | 12.5 ms | 2.1% | 0.41 (Severe Inequity) | Severe (End-of-line nodes) |
| Proposed SAC (Fairness-Aware) | 18.4 ms | 3.2% | 0.92 (High Equity) | 0 (Strictly Safe) |
As the theoretical benchmark, the Mixed-Integer Linear Programming (MILP) model—solved via commercial solvers—achieves perfect economic optimality (a 0.0% optimality gap) and guarantees absolute physical safety. However, its single-step computation latency reaches 14.5 minutes due to the non-convexities of AC power flows and tiered carbon pricing. In real-time electricity markets requiring sub-minute responsiveness, this severe computational bottleneck renders MILP entirely unviable for practical edge deployment. Meanwhile, the traditional heuristic PSO algorithm compresses the latency to 42.6 seconds but frequently falls into local optima under highly volatile renewable scenarios (exhibiting an 8.5% optimality gap) and fails to strictly guarantee voltage safety during peak hours.
The Standard DRL (SAC) model demonstrates the extreme speed of data-driven approaches, drastically reducing inference latency to 12.5 milliseconds with high economic efficiency (only a 2.1% optimality gap). However, devoid of physical awareness and social responsibility objectives, this standard agent focuses exclusively on short-term optimization in pursuit of absolute profit. It over-utilizes flexible users, leading to a severe deterioration in cross-sector equity (Jain’s Index plummets to 0.41). Furthermore, the uncoordinated concentration of loads induces severe reverse power flows, resulting in critical voltage violations at end-of-line nodes, which poses a fatal threat to physical grid security.
Conversely, the Proposed SAC (Fairness-Aware) strategy demonstrates robust operational balance. By embedding the LinDistFlow physical safety boundaries and a multi-objective fairness reward mechanism, the proposed strategy maintains an online inference latency of 18.4 ms, well within real-time market requirements. Crucially, by sacrificing a marginal 1.1 percentage points in total system efficiency (with the optimality gap increasing from 2.1% to 3.2%), the agent secures a substantial gain in social equity, elevating Jain’s Fairness Index from 0.41 to 0.92 and ensuring a harmonized distribution of benefits across the residential, commercial, and industrial sectors. Simultaneously, it records zero voltage violations throughout the entire dispatch horizon.
These quantitative results support the core thesis of this framework: through careful DRL architecture and multi-objective design, grid operators can achieve millisecond-level dispatch, zero recorded voltage violations, and cross-sector social equity, reaching a favorable point on the efficiency-fairness Pareto frontier without requiring additional physical infrastructure investments.
4.3. Physical Grid Security Analysis
A dispatch strategy is only as good as its physical viability. We analyzed the voltage profiles across the IEEE 33-bus network during peak demand hours.
Figure 3. Voltage profile analysis under different dispatch strategies.
Figure 3 illustrates the voltage magnitude profiles across the IEEE 33-bus distribution network during a highly volatile dispatch period under different strategies. The light green shaded area denotes the strictly enforced safe regulatory voltage band (in p.u.).
As clearly observed, the Standard DRL strategy (represented by the grey dashed line), which optimizes solely for economic efficiency, induces critical physical hazards. Driven purely by price incentives, the standard agent aggressively coordinates massive flexible loads to shift into low-price periods simultaneously. This unconstrained “herd behavior” severely disrupts local power flows, causing the voltage at mid-feeder nodes (Nodes 12 - 20) to plummet to approximately 0.92 p.u. (severe under-voltage). Furthermore, the tail-end nodes (Nodes 28 - 33) experience dangerous over-voltage spikes reaching 1.08 p.u., primarily due to uncoordinated reverse power flows from distributed renewable generation.
By comparison, the Proposed SAC strategy (represented by the solid green line) demonstrates exceptional physical stewardship. By embedding the LinDistFlow constraints into the reinforcement learning environment as a physical safety boundary layer, the proposed agent learns to distribute demand response actions both spatially and temporally. Consequently, the voltage profile remains smooth and is confined within the safe regulatory band (ranging from 0.98 to 1.03 p.u.) across all 33 nodes. This graphical evidence indicates that the proposed framework bridges the gap between data-driven economic optimization and strict physical grid security.
4.4. Sectoral Fairness and System Efficiency Pareto Trade-Off
A core objective of this study is to explicitly address the distributional “fairness gap” among heterogeneous stakeholders without violating the physical boundaries of grid assets (such as battery DoD limits and node voltages). Traditional economic dispatch models often fall into the trap of the “Green Penalty,” where system-wide emission reductions are achieved by disproportionately burdening sectors with rigid physical constraints.
To visually expose this distributional inequity, Figure 4 compares the sectoral cost savings under different dispatch paradigms. Under the Standard DRL strategy (no fairness term in the reward), the agent acts as a pure profit-maximizer. It achieves overall system efficiency by over-utilizing the flexibility of the Residential sector (securing an 18.5% cost saving) while forcing the rigid Industrial sector to absorb severe tiered carbon tax penalties, resulting in a negative saving (−22.4%). More critically, this unconstrained pursuit of efficiency frequently pushes energy storage systems to their maximum Depth of Discharge (DoD) limits, accelerating hardware degradation.
Figure 4. Sectoral cost savings distribution.
Conversely, the proposed SAC framework successfully rectifies this imbalance. By internalizing Jain’s Fairness Index and the DoD aging penalty into its multi-objective reward function, the agent ensures a highly equitable distribution of economic benefits, with the Residential, Commercial, and Industrial sectors achieving harmonized cost savings of 8.5%, 7.9%, and 7.2%, respectively.
To systematically quantify the relationship between these conflicting objectives, we tuned the fairness penalty weight from 0.0 to 0.5. Figure 5 illustrates the resulting Sectoral Fairness-Efficiency Pareto Frontier. As depicted, when the dispatch strictly prioritizes economic efficiency (fairness weight of 0.0), the system minimizes operational costs but suffers from severe social inequity (Jain’s Index plunges to approximately 0.41). As the fairness weight gradually increases, the curve exhibits a steep initial ascent, indicating high “fairness elasticity,” meaning substantial equity improvements can be gained with minimal economic losses.
The most significant finding lies at the Optimal Trade-off Point. At this configuration, Jain’s Fairness Index reaches 0.92, representing a substantial 124% relative improvement in cross-sector equity over the efficiency-only value of 0.41. This societal gain is secured by sacrificing a mere 2.8% of the total system efficiency (dropping from 100% to 97.2%), while maintaining battery degradation within the safe operational threshold. This quantifiable Pareto analysis indicates that a resilient low-carbon transition, in which industrial production is safeguarded, residential comfort is respected, and physical grid assets are protected, is technically achievable.
Figure 5. Sectoral fairness-efficiency Pareto frontier.
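The two headline percentages quoted for the optimal trade-off point follow directly from the reported values, as the short check below shows. The helper name `relative_change_pct` is introduced here for illustration only; the input numbers are those stated in the text.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before


# Jain's index rises from 0.41 (efficiency-only) to 0.92 (optimal trade-off).
jfi_gain_pct = relative_change_pct(0.41, 0.92)

# Normalized system efficiency falls from 100% to 97.2%.
efficiency_loss_pct = -relative_change_pct(100.0, 97.2)

if __name__ == "__main__":
    print(f"equity improvement: {jfi_gain_pct:.1f}%")     # about 124%
    print(f"efficiency loss:    {efficiency_loss_pct:.1f}%")  # about 2.8%
```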
5. Conclusions
This study conceptualizes the modern smart grid not merely as an algorithmic optimization problem, but as a complex socio-technical ecosystem where physical stability, economic efficiency, and social equity are inextricably linked. By developing a physics-aware SAC framework, we demonstrated that the inherent tension between aggressive decarbonization and cross-sectoral inclusivity can be technically resolved.
Our findings indicate that embedding strict physical safety boundaries (LinDistFlow) and long-term asset protection models (DoD aging) directly into the reinforcement learning dispatch logic is essential. Through Pareto frontier analysis, we established that a 124% relative improvement in cross-sectoral equity (Jain’s Index) can be achieved at a marginal 2.8% cost to total system efficiency. By abandoning the singular pursuit of absolute economic optimality, the proposed strategy effectively eliminates the “Green Penalty” imposed on rigid industrial loads and prevents the over-utilization of residential flexibility. Ultimately, this evidence-based framework provides a scalable, constraint-guaranteed governance tool for grid operators, ensuring that physical infrastructure is safeguarded and the economic dividends of the energy transition are shared equitably across society.