Compare commits

...

12 Commits

59 changed files with 56599 additions and 0 deletions

2303.16633.pdf Normal file

File diff suppressed because it is too large

@@ -0,0 +1,459 @@
Is Machine Learning in Power Systems Vulnerable?
Yize Chen, Yushi Tan, and Deepjyoti Deka†
Department of Electrical Engineering, University of Washington, Seattle, USA
{yizechen, ystan}@uw.edu
†Theory Division, Los Alamos National Laboratory, Los Alamos, USA
deepjyoti@lanl.gov
arXiv:1808.08197v2 [cs.SY] 27 Aug 2018

Abstract—Recent advances in Machine Learning (ML) have led to its broad adoption in a series of power system applications, ranging from meter data analytics and renewable/load/price forecasting to grid security assessment. Although these data-driven methods yield state-of-the-art performance in many tasks, the robustness and security of applying such algorithms in modern power grids have not been discussed. In this paper, we attempt to address the issues regarding the security of ML applications in power systems. We first show that most of the current ML algorithms proposed in power systems are vulnerable to adversarial examples, which are maliciously crafted input data. We then adopt and extend a simple yet efficient algorithm for finding subtle perturbations, which can be used to generate adversaries for both categorical (e.g., user load profile classification) and sequential applications (e.g., renewables generation forecasting). Case studies on classification of power quality disturbances and forecasting of building loads demonstrate the vulnerabilities of current ML algorithms in power networks under our adversarial designs. These vulnerabilities call for the design of robust and secure ML algorithms for real-world applications.

I. INTRODUCTION

Modern power systems, with deeper penetration of renewable generation and a higher level of demand-side participation, are faced with an increasing degree of complexity and uncertainty [1], [2]. Reliable operation of the grid in this context calls for improved techniques in system modeling, assessment and decision making [3], [4], [5]. On the one hand, smart meters and advanced sensing technologies have made the collection of fine-grained electricity data, both historical and streaming, available to system operators [6]. On the other hand, there is an urgent need for efficient and near real-time algorithms to analyze and make better use of these available data.

Recent advancements in Machine Learning (ML) algorithms, especially the giant leaps in deep learning, make ML a good candidate for solving a series of data-driven problems in power systems [7]. To name a few, ML methods such as Recurrent Neural Networks (RNN) find straightforward applications in wind/solar power and building load forecasting [8], [9], [10]. In [4], [11], ML algorithms are applied to power grid outage detection, while in [6], deep convolutional neural networks are adopted for classifying user load profiles. Planning and control problems in power systems, such as HVAC control and grid protection policy-making, can also be solved via ML approaches [9], [12]. All of the algorithms mentioned above have achieved either better performance compared to traditional model-based methods, or have proven to be computationally more efficient. These advances have shown the great potential of applying ML in power systems.

However, since power systems are at the core of critical infrastructure, we take a cautious step back and ask ourselves two simple yet unanswered questions:

"Is ML in power systems vulnerable to data attacks? Are vulnerabilities of ML-integrated power systems easy to decipher by an adversary?"

[Fig. 1 here: schematic showing raw data feeding an ML model for power system tasks (renewables forecasts, outage detection, load forecasting, smart meter data classification, planning and control), and an adversary crafting adversarial data from the same raw data so that the tasks fail.]

Fig. 1: The schematic of the proposed attack on ML in power systems. (Black) Normal ML operations, which learn from the given raw data and have various applications in power systems; (Red) without any knowledge of the targeted ML model (Blue), attackers can generate adversarial examples using only the raw data. Such adversaries exploit the vulnerabilities of the targeted ML models.
Unfortunately, in this paper, we answer both these questions
affirmatively. By adopting and extending the algorithms pro-
posed in [13], [14], we show that most of the ML algorithms
designed for power systems are vulnerable to adversarial
data manipulation, often under very weak assumptions on
adversarial ability. As depicted in Fig. 1, attackers do not
need any access to the operating ML model itself. Using
limited access to the input data, one can generate adversarial
data by injecting designed perturbations to the original data.
The operating ML model's performance (e.g., classification accuracy) is greatly impaired by these adversarial inputs.
To demonstrate that such vulnerabilities broadly exist in currently proposed ML algorithms for power systems, we show two typical cases, on categorical and time-series applications respectively. In the first case, we successfully attack a power
quality disturbances classifier [4], [11], which leads to misclassification of over 70% of the given adversarial voltage signals (e.g., labeling sag signals as normal). In the second case, we consider an RNN-based building load forecasting model [9], [15]. After imposing crafted perturbations on input variables such as the temperature setpoints and building occupancies, the attack results in a significant performance degradation, in the sense that the prediction accuracy drops by a factor of ten. The adversaries in both cases thus exhibit detrimental impacts on power system operations.

A. Contributions

In the area of computer vision, researchers have found that Neural Network models behave poorly on crafted images created by simply adding noise to clean images [16]. This kind of misbehavior on noisy input may be more hazardous for highly automated power systems, since a single wrong decision made by the ML model could undermine secure operation and lead to a large-scale blackout. In light of the criticality of secure sensing and estimation in power grids, this paper includes the following key contributions:

• We highlight and discuss the general security issues of ML algorithms in power systems.
• We propose an efficient attacking strategy, which can find the vulnerabilities of ML algorithms in both static and transient cases.
• We provide detailed numerical simulations of the proposed adversarial algorithm design, which reveal the vulnerabilities of current ML approaches. We also open-source our code for reproducing the results and testing the security of other physical-ML integrated systems¹.

¹Code repository: https://github.com/chennnnnyize/PowerAdversary

The rest of the paper is organized as follows. In Section II we discuss the general model setup for learning problems in power systems; in Section III we describe our implementations of attacks on ML models; in Section IV we show two representative cases of vulnerabilities in current algorithms; we draw conclusions in Section V with further discussion of the security and robustness of ML models in power systems.

II. MACHINE LEARNING IN POWER SYSTEMS

In this section, we briefly review the ML models of interest, along with the specific model architecture in the case of Neural Networks. We also introduce the model setup of Recurrent Neural Networks (RNN), a powerful modeling and learning algorithm for sequential data.

A. Learning Task

Machine Learning provides tools for learning the patterns or relationships in available data, which can be generalized to future operation and decision-making in power systems. The supervised learning setup is normally considered, where a paired training dataset X, Y is given. X, Y are vectors of fixed dimensions. For instance, in the case of power quality classification, X are the collected fixed-length power signals, while Y are the one-hot encoded vectors of the respective labels [4]. The ML model aims to learn a function fθ(X) that maps from X to Y with model parameters θ. For convenience, we sometimes suppress the θ symbol. In order to find such a mapping, we consider the general algorithm

θ* = arg min_θ L(fθ(X), Y)    (1)

where L(·, ·) is a pre-defined loss function. For instance, an L2 loss can be directly used to measure the distance, which is common in LASSO along with L1 regularization on θ; in the case of classification using Neural Networks, we may choose cross-entropy for L(·, ·), and in the case of regression via Neural Networks, an L2 loss is feasible for measuring the deviation of the model's outputs from the true values.

Since many of the ML applications [12], [9], [4] have focused on utilizing the learning and representation capabilities provided by Neural Networks, here we briefly illustrate the learning procedure on Neural Networks' parameters (neurons' weights). Neural Networks are composed of stacked, differentiable "neuronal" layers, such as fully connected layers, convolutional layers and activation functions, and are powerful in learning tasks with high-dimensional X and Y. Though there are many variants of the iterative steps, the standard back-propagation procedure via gradient descent for updating model weights is summarized as follows:

θ_{i+1} = θ_i − η∇_θ L(fθ(X), Y)    (2)

where η is the learning rate, and the subscripts on θ denote the iteration steps on the weight parameters of the Neural Network. Once the model is trained using X, Y via (1) and (2), we get an accurate model fθ. Recent progress in deep learning has enabled Neural Networks composed of millions of neurons to outperform all other algorithms in many real-world applications [7].

B. RNN Model

In many cases, the states in power systems are not static, but rather evolve in a sequential manner. For instance, future solar power and wind generation have temporal and spatial correlations. Under this scenario, the Recurrent Neural Network (RNN) becomes a good fit, as its structure allows it to model temporal dependencies in sequence data [17].

Modeling via RNN requires a group of sequential input samples x = {x_0, ..., x_T}, where T is the memory length. The weight coefficients of an RNN consist of three subsets: θ_in, θ_out and θ_hidden. The RNN also allows for linked neurons between neighboring timesteps. The RNN cell at timestep t takes the hidden state h_t and the input x_t, and delivers the output ŷ_t as well as the next step's hidden state h_{t+1}:

ŷ_t = f_{θ_in, θ_out}(x_t, h_t)    (3a)
h_{t+1} = f_{θ_hidden, θ_in}(x_t, h_t)    (3b)

Fig. 2: Basic RNN structure composed of hidden, input and output neurons (weights θ_in, θ_hidden, θ_out unrolled over timesteps t−1, t, t+1, t+2). The output ŷ_T is a function of the sequential input x_0, ..., x_T, with a memory length T.

By stacking such cells over time, the hidden state can be used to store and transfer the input information from previous steps. With a memory length of T, the output ŷ_T is essentially a function of x_0, ..., x_T. We can then conclude that RNN modeling and learning strategies also take the form of (1) and (2), where X is composed of the sequential vectors {x_t, t = 0, ..., T}.
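To make the cell computations in (3a)-(3b) concrete, here is a minimal NumPy sketch of a vanilla RNN cell unrolled over a sequence. This is not the paper's code release; the tanh activation and the exact weight shapes are our assumptions:

import numpy as np

def rnn_cell(x_t, h_t, theta_in, theta_hidden, theta_out):
    """One timestep of the vanilla RNN cell in (3a)-(3b)."""
    # Shared pre-activation of the input and the previous hidden state
    z = np.tanh(theta_in @ x_t + theta_hidden @ h_t)
    y_t = theta_out @ z      # (3a): output y_hat_t
    h_next = z               # (3b): hidden state handed to step t+1
    return y_t, h_next

def rnn_forward(xs, h0, theta_in, theta_hidden, theta_out):
    """Stack cells over t = 0..T; returns the final output y_hat_T."""
    h = h0
    for x_t in xs:           # xs = [x_0, ..., x_T]
        y, h = rnn_cell(x_t, h, theta_in, theta_hidden, theta_out)
    return y                 # y_hat_T depends on all of x_0..x_T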
III. CRAFTING ATTACKS FOR ML

In this section, we first give mathematical definitions of adversarial examples, which exploit ML's vulnerabilities. We then propose an algorithm, a variant of the Neural Network attack approach proposed in [16]. Our proposed algorithm produces adversarial examples for both normal Neural Networks and sequential models such as RNN.

A. Adversarial Examples

Consider any given supervised ML model fθ with corresponding paired dataset X, Y. We assume that an attacker has no access to the model f and cannot modify it. Instead, we consider the mild setting where the attacker can only change the input samples X to X̃ fed to the model, so that the modified output fθ(X̃) is not accurate compared to the ground truth Y. Moreover, to avoid detection by the system operator, the attacker ensures that the adversarial input X̃ is close to the true input X. For instance, an attacker tries to modify the system voltage wavelet signals such that an ML-based power quality classifier classifies them falsely, while making sure that such changes to the signals would not be observed by the system operator. Formally, the attacker crafts an adversary by solving the following optimization problem:

max_{δX} L(fθ(X̃), Y)    (4a)
s.t. X̃ = X + δX    (4b)
‖δX‖_d ≤ γ · |X|    (4c)

where δX in (4b) is the perturbation added to the clean samples X, and (4c) constrains the level of perturbation γ allowed for X. Different choices of d for the norm of δX lead to different constraints on adversarial manipulation:

• d = 0: (4) has a similar objective as the Grad0 attack proposed by [14], where γ denotes how many dimensions of the input data are allowed to be modified.
• d = ∞: (4) has a similar objective as the Fast Gradient Sign (FGS) attack proposed by [16], where γ denotes the maximum level of noise allowed on each dimension of δX.

We also observe an interesting connection between (2) (operator) and (4) (adversary): the ML training algorithm essentially optimizes over the model parameters θ to minimize the model loss, while the adversary's task is the opposite: optimize over the model inputs X to maximize the model loss. Specifically, we look into the case of Neural Networks, highly non-convex models in terms of both X and θ, which have been shown to achieve state-of-the-art performance in several power system applications. Since solving (2) always yields an accurate model, we are interested in finding ways to solve (4) which provide insights into the vulnerabilities of Neural Networks used in power systems.

B. Crafting Adversarial Examples

In this subsection, we propose an efficient attack algorithm which can incorporate the constraint (4c) with d = 0 and d = ∞ and exploit the vulnerabilities of both normal Neural Networks and sequential models like RNN.

1) Adversarial Samples without d = 0 Constraint: Since the optimization problem (4) itself is highly nonconvex and high-dimensional, it is intractable to find the globally optimal solution X̃. Alternatively, since the gradients of L(fθ(X), Y) encode the loss landscape, we propose a gradient ascent step on the loss function with respect to X to acquire the small perturbation which increases L(fθ(X), Y):

X̃ = X + δX = X + ε∇_X L(fθ(X), Y)    (5)

where ε controls the noise level added to the clean samples. Crafting attacks following (5) exactly follows the FGS attack strategy, which has found vulnerabilities in ML models used in computer vision. Yet this attack has no constraint ‖δX‖_0 ≤ γ · |X|, so the attacker has the control and access to modify every entry of X, which adds relatively large perturbations to the input.

2) Adversarial Samples with d = 0 Constraint: We now discuss the constraint on the number of entries the attacker is allowed to modify. The attacker shall only change the γ · |X| input entries which have the most impact on L(fθ(X + δX), Y). Formally, let A denote the set of the largest γ · |X| entries of ∇_X L(fθ(X + δX), Y), and let S denote the entire set of entries. Then we propose the following operation to get adversarial samples satisfying ‖δX‖_0 ≤ γ · |X|:

δX_A = ε∇_{X_A} L(fθ(X), Y)    (6a)
δX_{S\A} = 0    (6b)

where S\A denotes the complement set of input entries.
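As an illustration of (5) and (6), here is a minimal NumPy sketch of the two perturbation rules. The grad_fn helper (returning ∇_X L(fθ(X), Y)) and the shapes are our own assumptions; the repository code in attack/attack_craft.py below implements the same idea with TensorFlow's GradientTape:

import numpy as np

def craft_fgs(x, grad_fn, eps):
    """d = infinity style step (5): perturb every entry of x."""
    return x + eps * grad_fn(x)

def craft_sparse(x, grad_fn, eps, gamma):
    """d = 0 style step (6): perturb only the top gamma*|x| entries
    with the largest loss gradients; zero out the rest per (6b)."""
    g = grad_fn(x).ravel()
    k = max(1, int(gamma * g.size))        # number of entries to modify
    top = np.argsort(np.abs(g))[-k:]       # index set A
    delta = np.zeros_like(g)
    delta[top] = eps * g[top]              # (6a) applied on A only
    return x + delta.reshape(x.shape)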
[Fig. 3 here: 2×4 grid of voltage waveforms (voltage in p.u. vs. sampling steps), labeled Ground Truth|NN Classification Result. Top row, clean test samples: Normal|Normal, Sag|Sag, Impulse|Impulse, Distortion|Distortion. Bottom row, adversarial test samples: Normal|Sag, Sag|Normal, Impulse|Normal, Distortion|Normal.]

Fig. 3: Case studies on power quality signal classification with randomly selected clean samples from our test sets (top) versus corresponding adversarial samples crafted by Algorithm 1 (bottom). The original Neural Network accurately classifies the four classes of power signals, yet it fails to classify the adversarial samples with high probability.
The final adversarial examples can still be generated via X̃ = X + δX. Since all the ML models considered in this paper, including normal Neural Networks and RNN structures, are differentiable with respect to their input, we highlight the universality of the proposed algorithm in finding the vulnerabilities of any trained model.

Note that even though (5) is applied only once in our proposed algorithm, without any iterative optimization over δX, we show in Section IV that the trained, unknown model is vulnerable to such attacks.

We also distinguish our work from previous attack and defense research in power systems [18], [19]. Previous research only exploited the vulnerabilities of state estimation, while we find weaknesses in general ML tasks in power systems. Moreover, the proposed algorithm works in the black-box setting. To put it in other words, the attacker only needs to train its own surrogate ML model f̂θ without any knowledge of fθ. By finding adversarial examples X̃ of f̂θ, X̃ can then be used for attacking the unknown ML model fθ operating in power systems. We summarize the algorithm in Algorithm 1.

Algorithm 1 Crafting Adversarial Examples
Input: Clean paired training data X, Y; input entries set A
Input: Training iterations Niter; number of adversarial examples Nadv
Input: Clean testing samples {xi, yi}, i = 1, ..., Nadv
Initialize: Attacker surrogate ML model f̂θ
Initialize: Adversarial examples set X̃ ← ∅
# Train the surrogate ML model
for iteration = 0, ..., Niter do
    Update θ using gradient descent (2) on X, Y
end for
# Find adversarial examples using clean data {xi, yi}
for i = 0, ..., Nadv do
    Calculate gradients w.r.t. xi: δxi = ε∇_{xi} L(f̂θ(xi), yi)
    Find set A: the largest γ · |X| gradients of δxi
    Set δxi to 0 on S\A
    x̃i = xi + δxi
    X̃.insert(x̃i)
end for
IV. CASE STUDIES

We evaluate the proposed algorithm's performance on two tasks: power quality assessment by classifying voltage signals with a feed-forward Neural Network [4], [11], and short-term building load forecasting via RNN [9]. We set up the deep learning models using Tensorflow and Keras, two open-source Python packages. We adopt rectified linear unit (ReLU) activation functions, dropout layers and Stochastic Gradient Descent, a variant of (2), to improve the performance of our Neural Network models. Two Nvidia Geforce GTX TITAN X GPUs are used for training acceleration, and the average training times of both tasks are within 10 seconds.

A. Power Quality Classification

In this task, we would like to investigate whether an ML model can detect power quality disturbances in waveform signals. Past research claims that a Neural Network based classifier can detect those disturbances in signals, which would then avoid damage and improve power quality [4], [11]. Here, we attempt to add slight perturbations to the input signals and see if such a classifier fails to classify these disturbances.
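As a concrete reference for the classifier attacked here, below is a minimal Keras sketch mirroring dnn_model() from Neural_Net_Module.py shown later in this diff; the compile settings are our assumption:

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_classifier(input_dim=1000, n_classes=4):
    """Fully connected classifier for the four power quality classes,
    following dnn_model() in Neural_Net_Module.py later in this diff."""
    model = Sequential()
    model.add(Dense(128, input_dim=input_dim))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd',
                  metrics=['accuracy'])
    return model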
1) Data Description: We consider four types of wave signals, as illustrated in the first row of Fig. 3: one group of normal signals, and three types of disturbances: sags, impulses and distortion. We construct a labeled dataset with 200 signals from each class, with each signal of fixed length. After shuffling and separating 1/4 of the data as the testing set, we construct a 3-layer fully connected Neural Network to classify these signals into their respective classes.

2) Simulation Results: We first observe that the Neural Network classifier is powerful in classifying wave signals with different sources of disturbances. The model achieves 97.5% testing accuracy on the split test data.

We then test whether such a trained classifier is able to correctly classify the adversarial signals crafted by Algorithm 1. As shown in Fig. 3, with ε = 0.03 and γ = 10%, the black-box classifier wrongly classifies the adversarial signals. Specifically, the adversarial impulse and distortion signals look similar to the corresponding clean signals, and would still be classified as impulse and distortion signals by a technician, yet the ML model incorrectly regards them as normal signals.

[Fig. 4 here: test accuracy vs. noise level ε ∈ [0, 0.5] for γ = 5%, 10%, 20%, 40%.]

Fig. 4: Voltage signal classification accuracy with varying noise level ε and input perturbation percentage γ for adversaries X̃.

As shown in Fig. 4, we quantitatively test the adversaries' performance by evaluating the Neural Network's classification results on adversarial examples. The model's classification accuracy drops drastically with higher levels of ε and γ, which meets our expectation. When γ = 40%, in which case our algorithm changes 40% of the entries of the input signal, by injecting only a small perturbation ε = 0.1 the ML model can correctly classify only 67.5% of the test samples.

B. Building Load Forecasting

In this example, we first train an RNN model which can forecast building load accurately using input features such as temperature measurements, building occupancy and solar radiation. We then construct sequential adversarial inputs using a surrogate model and evaluate the vulnerabilities of the original load forecasting model.

1) Data Description: We set up our building simulation platform using EnergyPlus's 12-storey large office building listed in the commercial reference buildings from the U.S. Department of Energy (DoE CRB) [20]. The building has a total floor area of 498,584 square feet, divided into 16 separate zones. We simulate the building running through the year 2004 in Seattle, WA, and record xt, yt with a resolution of 10 minutes, where xt includes data coming from various sensors, such as building occupancy, temperature setpoints and temperature measurements, and yt is the building energy consumption. We shuffle and separate 2 months of data as our stand-alone testing dataset for both predictive accuracy validation and vulnerability testing. The RNN model is composed of 1 recurrent layer and 2 subsequent fully-connected layers with a memory length of 2 hours. Our ML model is also easy to extend to the Long Short-Term Memory network (LSTM) or any other variant of the RNN structure. Since all these architectures are differentiable w.r.t. xt, they exhibit similar vulnerabilities to the proposed adversaries.

[Fig. 5 here: (a) temperature setpoints [C] and (b) human occupancy over one week (Mon-Fri), original vs. adversarial signals; (c) energy consumption [J], ground truth vs. original and adversarial predictions.]

Fig. 5: Building forecast results under ε = 0.03 and γ = 10%. (a) and (b): the data profiles for one week's sub-region temperature setpoints and occupancy level before and after the attack; (c): the ground truth of one week's energy consumption, the predicted energy consumption using clean testing data, and the predicted result after injecting adversarial data profiles.

2) Simulation Results: We use the Mean Absolute Percentage Error (MAPE) to evaluate both the forecasting error and the input feature deviation caused by adding adversarial perturbations:

MAPE(var, ṽar) = (1/N) Σ_{i=1}^{N} |var − ṽar| / var × 100%    (7)

where var represents either an input feature or the output energy consumption, while ṽar represents the corresponding adversarial feature or output energy consumption prediction. We test the ML model performance using the same one week of testing data with different levels of ε on the adversarial data.

MAPE        Temp Deviation    Occupancy Deviation    Prediction Error
ε = 0.0     0%                0%                     5.29%
ε = 0.01    0.35%             2.44%                  25.90%
ε = 0.03    1.07%             6.94%                  31.55%
ε = 0.05    1.86%             12.36%                 55.37%

TABLE I: The building load forecasting performance using adversarial examples with varying noise level ε under γ = 10%. Note that ε = 0 is the case with clean testing data.

As can be seen in Table I, the model performs well with only 5.29% MAPE using clean data. However, by injecting δX with only ε = 0.01, the model's forecast has a 25.90% deviation from the ground truth. The results worsen with more intense levels of injected noise. Meanwhile, the input features deviate little from the clean data. We can also visually inspect the vulnerabilities of the RNN model in Fig. 5, where we only change 10% of the input features with noise level ε = 0.03. The output prediction jumps a lot compared to previous forecasts, which is not informative for building operators.

V. CONCLUSION AND DISCUSSION

In this work, we look into the security and vulnerability of Machine Learning algorithms in power systems against adversaries. We propose an attack algorithm that universally exploits the vulnerabilities of ML in power systems, especially Neural Network based algorithms. The adversarial strategy is practical, as it does not change the system operator's ML engine but manipulates only the input data. Case studies on two representative power system examples reveal the vulnerability of the considered ML algorithms. As researchers have not looked into such vulnerabilities in current algorithm design, we hope our work will stimulate future discussion on increasing the robustness of current ML algorithms in power systems to data manipulation. Going forward, the following directions regarding secure ML applications are worth investigating, and we are also interested in studying security issues of broader estimation/learning algorithms in power system operation and control.

A. Adversaries in Learning

In this work, we only discuss the scenario in which ML models in power systems output inaccurate results on adversarial inputs. Stronger attacks such as targeted attacks can be considered, where instead of solely maximizing the prediction loss, the attacker adds perturbations that steer the trained model toward a new adversarial objective. Such attacks are discussed in our previous work [14] and can be extended to the power systems case. Moreover, there are also vulnerabilities of the model itself. For instance, an attacker could hack into the operations room to change the weights of the trained model. Even though there is a line of work addressing the security issues in the control, communication and infrastructure of power systems, there is scope for work addressing the security of learning in power and cyber-physical systems.

B. Defense for ML Algorithms in Power Systems

Up to now, there has been some work on defending against ML attacks in computer vision research. Yet most of it operates on ensembles or filtering of input images [21], which may not be applicable to power systems, as most applications in power have clear physical definitions of the features involved, which may not be modifiable. Thus the defense against such intrusion attacks on ML algorithms in power systems is still an urgent yet open problem.

REFERENCES

[1] M. H. Albadi and E. F. El-Saadany, "A summary of demand response in electricity markets," Electric Power Systems Research, vol. 78, no. 11, pp. 1989–1996, 2008.
[2] M. R. Patel, Wind and Solar Power Systems: Design, Analysis, and Operation. CRC Press, 2005.
[3] Y. Chen, Y. Wang, D. Kirschen, and B. Zhang, "Model-free renewable scenario generation using generative adversarial networks," IEEE Transactions on Power Systems, vol. 33, no. 3, pp. 3265–3275, 2018.
[4] M. Valtierra-Rodriguez, R. de Jesus Romero-Troncoso, R. A. Osornio-Rios, and A. Garcia-Perez, "Detection and classification of single and combined power quality disturbances using neural networks," IEEE Transactions on Industrial Electronics, vol. 61, no. 5, pp. 2473–2482, 2014.
[5] P. Li, H. Wang, and B. Zhang, "A distributed online pricing strategy for demand response programs," arXiv preprint arXiv:1702.05551, 2017.
[6] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, 2018.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[8] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, and R. J. Hyndman, "Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond," 2016.
[9] Y. Chen, Y. Shi, and B. Zhang, "Modeling and optimization of complex building energy systems with deep neural networks," in Signals, Systems, and Computers, 2017 51st Asilomar Conference on. IEEE, 2017, pp. 1368–1373.
[10] Y. Chen, X. Wang, and B. Zhang, "An unsupervised deep learning approach for scenario forecasts," Power Systems Computation Conference (PSCC), 2018.
[11] R. Eskandarpour and A. Khodaei, "Machine learning based power grid outage prediction in response to extreme events," IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 3315–3316, 2017.
[12] C. Lassetter, E. Cotilla-Sanchez, and J. Kim, "Learning schemes for power system planning and control," in System Sciences (HICSS), 51st Hawaii International Conference on, 2018.
[13] N. Papernot, P. McDaniel, A. Swami, and R. Harang, "Crafting adversarial input sequences for recurrent neural networks," in Military Communications Conference, MILCOM 2016-2016 IEEE. IEEE, 2016, pp. 49–54.
[14] H. Hosseini, Y. Chen, S. Kannan, B. Zhang, and R. Poovendran, "Blocking transferability of adversarial examples in black-box learning systems," arXiv preprint arXiv:1703.04318, 2017.
[15] H. Hahn, S. Meyer-Nieberg, and S. Pickl, "Electric load forecasting methods: Tools for decision making," European Journal of Operational Research, vol. 199, no. 3, pp. 902–907, 2009.
[16] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
[17] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, "Recurrent neural network based language model," in Eleventh Annual Conference of the International Speech Communication Association, 2010.
[18] Y. Liu, P. Ning, and M. K. Reiter, "False data injection attacks against state estimation in electric power grids," ACM Transactions on Information and System Security (TISSEC), vol. 14, no. 1, p. 13, 2011.
[19] Y. Huang, M. Esmalifalak, H. Nguyen, R. Zheng, Z. Han, H. Li, and L. Song, "Bad data injection in smart grid: attack and defense mechanisms," IEEE Communications Magazine, vol. 51, no. 1, pp. 27–33, 2013.
[20] P. Torcellini, M. Deru, B. Griffith, K. Benne, M. Halverson, D. Winiarski, and D. Crawley, "DOE commercial building benchmark models," in Proceeding of, 2008, pp. 17–22.
[21] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, "Ensemble adversarial training: Attacks and defenses," arXiv preprint arXiv:1705.07204, 2017.

BIN
Liu.zip Normal file

Binary file not shown.

@@ -1,2 +1,3 @@
# Liu
# The Phase 1 code lives in the branch~

@@ -0,0 +1,79 @@
from data_load import data_format
import tensorflow as tf
import numpy as np
from keras.layers import Dropout
from keras import regularizers
from keras.callbacks import TensorBoard, LearningRateScheduler
import keras


def model_train(X_train, X_test, Y_train, Y_test):
    """Train and save the LSTM forecasting model.

    Args:
        X_train (np.array): training inputs
        X_test (np.array): test inputs
        Y_train (np.array): training targets
        Y_test (np.array): test targets
    """
    # Shuffle the training data (same seed for X and Y keeps pairs aligned)
    np.random.seed(7)
    np.random.shuffle(X_train)
    np.random.seed(7)
    np.random.shuffle(Y_train)
    tf.random.set_seed(7)
    # Build the model
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(100, return_sequences=True),  # first layer
        Dropout(0.2),
        tf.keras.layers.LSTM(80),  # second layer
        Dropout(0.2),
        tf.keras.layers.Dense(
            1, kernel_regularizer=regularizers.l2(0.01))
    ])
    # Loss function
    loss_fn = tf.keras.losses.MeanSquaredError()
    # Compile the model
    model.compile(
        optimizer='SGD',
        loss=loss_fn,
        metrics=[tf.keras.metrics.MeanAbsolutePercentageError()]
    )

    # Exponentially decaying learning rate
    def lr_schedule(epoch):
        initial_learning_rate = 0.01
        decay_rate = 0.1
        decay_steps = 2000
        new_learning_rate = initial_learning_rate * \
            decay_rate ** (epoch / decay_steps)
        return new_learning_rate

    # Learning rate scheduler
    lr_scheduler = LearningRateScheduler(lr_schedule)
    # TensorBoard callback
    log_dir = "logs/fit"
    tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
    # Train the model with the TensorBoard callback attached
    model.fit(X_train, Y_train, epochs=6000,
              callbacks=[tensorboard_callback, lr_scheduler], batch_size=256)
    loss, mape = model.evaluate(X_test, Y_test)
    print("Test loss:", loss)
    print("Test MAPE:", mape)
    # Save the model
    keras.models.save_model(model, 'model')


if __name__ == "__main__":
    X_train, X_test, Y_train, Y_test = data_format(
        'data/archive/PowerQualityDistributionDataset1.csv', md=1)
    # LSTM expects inputs of shape (samples, timesteps, features)
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
    model_train(X_train, X_test, Y_train, Y_test)

Binary file not shown.

Binary file not shown.

attack/attack_craft.py Normal file

@@ -0,0 +1,72 @@
import tensorflow as tf


def craft_adv(X, Y, gamma, learning_rate, model, loss_fn, md=0):
    # Convert the test data to TensorFlow tensors
    X_test_tensor = tf.convert_to_tensor(X, dtype=tf.float64)
    if md == 0:
        Y_test_tensor = tf.convert_to_tensor(Y, dtype=tf.int32)
    elif md == 1:
        Y_test_tensor = tf.convert_to_tensor(Y, dtype=tf.float64)
    # Collect the perturbed samples
    X_train_updated = []
    for i in range(X_test_tensor.shape[0]):
        # Use a GradientTape for each sample
        with tf.GradientTape() as tape:
            # Watch the current sample
            current_sample = X_test_tensor[i:i+1]
            tape.watch(current_sample)
            # Predict on the current sample and compute the loss
            predictions = model(current_sample)
            loss = loss_fn(Y_test_tensor[i:i+1], predictions)
        # Gradient of the loss w.r.t. the input
        gradients = tape.gradient(loss, current_sample)
        # Flatten the gradients for processing
        flattened_gradients = tf.reshape(gradients, [-1])
        # Select the largest gamma * |X| gradients
        num_gradients_to_select = int(gamma * tf.size(flattened_gradients, out_type=tf.dtypes.float32))
        top_gradients_indices = tf.argsort(flattened_gradients, direction='DESCENDING')[:num_gradients_to_select]
        # Start from a copy of the original gradients
        updated_gradients = tf.identity(flattened_gradients)
        # Boolean mask that is False exactly on the selected (top) entries
        mask = tf.ones_like(updated_gradients, dtype=bool)
        mask = tf.tensor_scatter_nd_update(mask, tf.expand_dims(top_gradients_indices, 1), tf.zeros_like(top_gradients_indices, dtype=bool))
        # Zero out every gradient outside the selected set
        updated_gradients = tf.where(mask, tf.zeros_like(updated_gradients), updated_gradients)
        # Restore the original gradient shape
        updated_gradients = tf.reshape(updated_gradients, tf.shape(gradients))
        # Scale the gradients by the step size
        scaled_gradients = learning_rate * updated_gradients
        # Perturb the current sample
        current_sample_updated = tf.add(current_sample, scaled_gradients)
        # Append the perturbed sample
        X_train_updated.append(current_sample_updated.numpy())
    # Stack the list back into one tensor
    X_train_updated = tf.concat(X_train_updated, axis=0)
    # Evaluate the model on the perturbed data
    if md == 1:
        loss, mape = model.evaluate(X_train_updated, Y)
        print(f"gamma: {gamma}, learning rate: {learning_rate}, loss:", loss)
        return X_train_updated, loss, mape
    elif md == 0:
        loss, accuracy = model.evaluate(X_train_updated, Y)
        print(f"gamma: {gamma}, learning rate: {learning_rate}, accuracy: {accuracy}")
        return X_train_updated, accuracy

attack/data_load.py Normal file

@@ -0,0 +1,125 @@
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split


def data_format(data_path, is_column=False, rate=0.25, md=0):
    """Load, normalize and split the dataset.

    Args:
        data_path: path to the csv file
        is_column (bool, optional): whether NaNs are dropped by column. Defaults to False.
        rate (float, optional): fraction of data held out for testing. Defaults to 0.25.
        md: mode, 0 for classification, 1 for forecasting
    Returns:
        X_train, X_test, Y_train, Y_test (np.array)
    """
    if md == 0:
        # Load the data
        X, Y = data_load_classify(data_path, is_column)
        # Normalize the data
        sc = MinMaxScaler(feature_range=(-1, 1))
        X = sc.fit_transform(X)
    elif md == 1:
        # Load the data
        X = data_load_forecast(data_path, is_column)
        # Normalize the data
        sc = MinMaxScaler(feature_range=(-1, 1))
        X = sc.fit_transform(X)
        # Split off Y:
        # the 128th element is the target
        Y = X[:, -1]
        # the first 127 elements are the features
        X = X[:, :-1]
    # Split the dataset: 75% for training, 25% for testing
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=rate, random_state=7)
    return X_train, X_test, Y_train, Y_test


def data_load_classify(data_path, is_column=False):
    """Load the classification data.

    data_path: path to the csv file
    is_column: whether NaNs are dropped by column
    return: X, Y
    """
    # Read the csv file
    df = pd.read_csv(data_path)
    # Clean the data (dropna is not in-place, so keep the returned frame)
    df = data_clean(df, is_column)
    # Drop the first column
    df = df.drop(df.columns[0], axis=1)
    # Initialize X, Y
    X, Y = [], []
    # Iterate over the rows of the DataFrame
    for index, row in df.iterrows():
        # The first 128 items are the signal, the 129th is the label
        X.append(row.iloc[0:128])
        Y.append(int(row.iloc[128]))
    return np.array(X), np.array(Y)


def data_load_forecast(data_path, is_column=False):
    """Load the forecasting data.

    data_path: path to the csv file
    is_column: whether NaNs are dropped by column
    return: X
    """
    # Read the csv file
    df = pd.read_csv(data_path)
    # Clean the data (dropna is not in-place, so keep the returned frame)
    df = data_clean(df, is_column)
    df = df[df['output'] == 1]
    # Drop the first column
    df = df.drop(df.columns[0], axis=1)
    # Initialize X
    X = []
    # Iterate over the rows of the DataFrame
    for index, row in df.iterrows():
        # Keep the first 128 items
        X.append(row.iloc[0:128])
    return np.array(X)


def data_clean(data, is_column=False):
    """Drop rows (or columns) that contain NaN values.

    Args:
        data: csv data
        is_column (bool, optional): drop columns containing NaNs when True;
            otherwise drop rows containing NaNs. Defaults to False.
    Returns:
        the cleaned data
    """
    if not is_column:
        data = data.dropna(axis=0)
        return data
    else:
        data = data.dropna(axis=1)
        return data


if __name__ == '__main__':
    # Load the data
    X_train, X_test, Y_train, Y_test = data_format(
        'data/archive/PowerQualityDistributionDataset1.csv', md=1)
    print(Y_train)

attack/main.py Normal file

@@ -0,0 +1,71 @@
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import keras
from data_load import data_format
from attack_craft import craft_adv

md = 0
print("Enter 0 or 1:\n0 attacks the fully connected model\n1 attacks the LSTM (RNN) model")
md = int(input())
# Load the dataset
X_train, X_test, Y_train, Y_test = data_format(
    'data/archive/PowerQualityDistributionDataset1.csv', md=md)
# Set random seeds for reproducibility
np.random.seed(7)
np.random.shuffle(X_test)
np.random.seed(7)
np.random.shuffle(Y_test)
tf.random.set_seed(7)
if md == 1:
    # Load the trained model
    model = keras.models.load_model('model_rnn')
    # Define the loss function
    loss_fn = tf.keras.losses.MeanSquaredError()
elif md == 0:
    # Load the trained model
    model = keras.models.load_model('model_normal')
    # Define the loss function
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Accuracy per gamma value
accuracy_per_gamma = {}
# Sweep over gamma values
for gamma in [0.05, 0.1, 0.2, 0.4]:
    # Sweep over learning rates;
    # collect the accuracy for each learning rate
    accuracy_list = []
    for learning_rate in [0.1, 0.2, 0.3, 0.4, 0.5]:
        if md == 1:
            x_adv, loss, mape = craft_adv(
                X_test, Y_test, gamma, learning_rate, model, loss_fn, md=1)
            accuracy_list.append(100 - mape)
        elif md == 0:
            x_adv, accuracy = craft_adv(
                X_test, Y_test, gamma, learning_rate, model, loss_fn)
            accuracy_list.append(accuracy)
    # Store the accuracies for this gamma
    accuracy_per_gamma[gamma] = accuracy_list
# Learning rates and gamma values for plotting
learning_rates = [0.1, 0.2, 0.3, 0.4, 0.5]
gammas = [0.05, 0.1, 0.2, 0.4]
# Create and draw the result figure
plt.figure(figsize=(10, 6))
for gamma in gammas:
    plt.plot(learning_rates,
             accuracy_per_gamma[gamma], marker='o', label=f'Gamma={gamma}')
plt.title('Accuracy vs Learning Rate for Different Gammas')
plt.xlabel('Learning Rate')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

@@ -0,0 +1,75 @@
from data_load import data_format
import tensorflow as tf
import numpy as np
from keras.layers import Dropout
from keras import regularizers
from keras.callbacks import TensorBoard, LearningRateScheduler
import keras


def model_train(X_train, X_test, Y_train, Y_test):
    """Train and save the fully connected classifier.

    Args:
        X_train (np.array): training inputs
        X_test (np.array): test inputs
        Y_train (np.array): training labels
        Y_test (np.array): test labels
    """
    # Shuffle the training data (same seed for X and Y keeps pairs aligned)
    np.random.seed(7)
    np.random.shuffle(X_train)
    np.random.seed(7)
    np.random.shuffle(Y_train)
    tf.random.set_seed(7)
    # Build the model
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(10000, activation='relu'),  # first layer
        Dropout(0.2),
        tf.keras.layers.Dense(800, activation='relu'),  # second layer
        Dropout(0.2),
        tf.keras.layers.Dense(
            (len(np.unique(Y_train)) + 1), activation='relu', kernel_regularizer=regularizers.l2(0.01))
    ])
    # Loss function
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    # Compile the model
    model.compile(
        optimizer='SGD',
        loss=loss_fn,
        metrics=['accuracy'])

    # Exponentially decaying learning rate
    def lr_schedule(epoch):
        initial_learning_rate = 0.01
        decay_rate = 0.1
        decay_steps = 1500
        new_learning_rate = initial_learning_rate * \
            decay_rate ** (epoch / decay_steps)
        return new_learning_rate

    # Learning rate scheduler
    lr_scheduler = LearningRateScheduler(lr_schedule)
    # TensorBoard callback
    log_dir = "logs/fit"
    tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
    # Train the model with the TensorBoard callback attached
    model.fit(X_train, Y_train, epochs=1000,
              callbacks=[tensorboard_callback, lr_scheduler], batch_size=256)
    loss, accuracy = model.evaluate(X_test, Y_test)
    print("Test accuracy:", accuracy)
    # Save the model
    keras.models.save_model(model, 'model')


if __name__ == "__main__":
    X_train, X_test, Y_train, Y_test = data_format(
        'data/archive/PowerQualityDistributionDataset1.csv')
    model_train(X_train, X_test, Y_train, Y_test)

File diff suppressed because it is too large

@@ -0,0 +1,86 @@
from data_load import data_format
import tensorflow as tf
import numpy as np
from keras.layers import Dropout
from keras import regularizers
from keras.callbacks import TensorBoard, LearningRateScheduler
import keras


def model_train(X_train, X_test, Y_train, Y_test):
    """Train and save the LSTM model with early stopping.

    Args:
        X_train (np.array): training inputs
        X_test (np.array): test inputs
        Y_train (np.array): training targets
        Y_test (np.array): test targets
    """
    # Shuffle the training data (same seed for X and Y keeps pairs aligned)
    np.random.seed(7)
    np.random.shuffle(X_train)
    np.random.seed(7)
    np.random.shuffle(Y_train)
    tf.random.set_seed(7)
    # Build the model
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(100, return_sequences=True),  # first layer
        Dropout(0.2),
        tf.keras.layers.LSTM(80),  # second layer
        Dropout(0.2),
        tf.keras.layers.Dense(
            1, kernel_regularizer=regularizers.l2(0.01))
    ])
    # Loss function
    loss_fn = tf.keras.losses.MeanSquaredError()
    # Compile the model
    model.compile(
        optimizer='SGD',
        loss=loss_fn)

    # Exponentially decaying learning rate
    def lr_schedule(epoch):
        initial_learning_rate = 0.01
        decay_rate = 0.1
        decay_steps = 1500
        new_learning_rate = initial_learning_rate * \
            decay_rate ** (epoch / decay_steps)
        return new_learning_rate

    # Learning rate scheduler
    lr_scheduler = LearningRateScheduler(lr_schedule)
    # TensorBoard callback
    log_dir = "logs/fit"
    tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
    # EarlyStopping callback
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',  # watch the model's validation loss
        patience=10,         # "patience" window, e.g. 10 epochs
        min_delta=0.001,     # smallest change that counts as an improvement
        mode='min',          # 'min' means a decreasing loss is an improvement
        verbose=1            # print a message when stopping
    )
    # Train the model with the TensorBoard callback attached
    model.fit(X_train, Y_train, epochs=1000,
              callbacks=[tensorboard_callback, lr_scheduler, early_stopping_callback], batch_size=256, validation_split=0.2)
    loss = model.evaluate(X_test, Y_test)
    print("Test loss:", loss)
    # Save the model
    keras.models.save_model(model, 'model')


if __name__ == "__main__":
    X_train, X_test, Y_train, Y_test = data_format(
        'data/archive/PowerQualityDistributionDataset1.csv')
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
    model_train(X_train, X_test, Y_train, Y_test)

Binary file not shown.

Binary file not shown.

defend/attack_craft.py Normal file

@@ -0,0 +1,49 @@
import tensorflow as tf


def craft_adv(X, Y, gamma, learning_rate, model, loss_fn):
    # Convert the test data to TensorFlow tensors
    X_test_tensor = tf.convert_to_tensor(X, dtype=tf.float64)
    Y_test_tensor = tf.convert_to_tensor(Y, dtype=tf.float64)
    # Compute gradients with a GradientTape
    with tf.GradientTape() as tape:
        tape.watch(X_test_tensor)
        predictions = model(X_test_tensor)
        loss = loss_fn(Y_test_tensor, predictions)
    # Gradient of the loss w.r.t. the input
    gradients = tape.gradient(loss, X_test_tensor)
    # Flatten the gradients for processing
    flattened_gradients = tf.reshape(gradients, [-1])
    # Select the largest gamma * |X| gradients
    num_gradients_to_select = int(gamma * tf.size(flattened_gradients, out_type=tf.dtypes.float32))
    top_gradients_indices = tf.argsort(flattened_gradients, direction='DESCENDING')[:num_gradients_to_select]
    # Start from a copy of the original gradients
    updated_gradients = tf.identity(flattened_gradients)
    # Boolean mask that is False exactly on the selected (top) entries
    mask = tf.ones_like(updated_gradients, dtype=bool)
    mask = tf.tensor_scatter_nd_update(mask, tf.expand_dims(top_gradients_indices, 1), tf.zeros_like(top_gradients_indices, dtype=bool))
    # Zero out every gradient outside the selected set
    updated_gradients = tf.where(mask, tf.zeros_like(updated_gradients), updated_gradients)
    # Restore the original gradient shape
    updated_gradients = tf.reshape(updated_gradients, tf.shape(gradients))
    # Scale the gradients (the extra factor of 700 amplifies the step)
    scaled_gradients = (learning_rate * 700) * updated_gradients
    # Perturb X_test_tensor
    X_train_updated = tf.add(X_test_tensor, scaled_gradients)
    X_train_updated = X_train_updated.numpy()
    # Evaluate the model on the perturbed data
    loss = model.evaluate(X_train_updated, Y)
    print(f"gamma: {gamma}, learning rate: {learning_rate}, loss:", loss)
    return X_train_updated, loss

defend/data_load.py Normal file

@@ -0,0 +1,89 @@
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split


def data_format(data_path, is_column=False, rate=0.25):
    """Load, normalize and split the forecasting dataset.

    Args:
        data_path: path to the csv file
        is_column (bool, optional): whether NaNs are dropped by column. Defaults to False.
        rate (float, optional): fraction of data held out for testing. Defaults to 0.25.
    Returns:
        X_train, X_test, Y_train, Y_test (np.array)
    """
    # Load the data
    X = data_load_forecast(data_path, is_column)
    # Normalize the data
    sc = MinMaxScaler(feature_range=(-1, 1))
    X = sc.fit_transform(X)
    # Split off Y:
    # the 128th element is the target
    Y = X[:, -1]
    # the first 127 elements are the features
    X = X[:, :-1]
    # Split the dataset: 75% for training, 25% for testing
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=rate, random_state=7)
    return X_train, X_test, Y_train, Y_test


def data_load_forecast(data_path, is_column=False):
    """Load the forecasting data.

    data_path: path to the csv file
    is_column: whether NaNs are dropped by column
    return: X
    """
    # Read the csv file
    df = pd.read_csv(data_path)
    # Clean the data (dropna is not in-place, so keep the returned frame)
    df = data_clean(df, is_column)
    df = df[df['output'] == 1]
    # Drop the first column
    df = df.drop(df.columns[0], axis=1)
    # Initialize X
    X = []
    # Iterate over the rows of the DataFrame
    for index, row in df.iterrows():
        # Keep the first 128 items
        X.append(row.iloc[0:128])
    return np.array(X)


def data_clean(data, is_column=False):
    """Drop rows (or columns) that contain NaN values.

    Args:
        data: csv data
        is_column (bool, optional): drop columns containing NaNs when True;
            otherwise drop rows containing NaNs. Defaults to False.
    Returns:
        the cleaned data
    """
    if not is_column:
        data = data.dropna(axis=0)
        return data
    else:
        data = data.dropna(axis=1)
        return data


if __name__ == '__main__':
    # Load the data
    X_train, X_test, Y_train, Y_test = data_format(
        'data/archive/PowerQualityDistributionDataset1.csv')
    print(X_train.shape)

defend/main.py Normal file

@@ -0,0 +1,36 @@
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import keras
from data_load import data_format
from attack_craft import craft_adv

# Load the dataset
X_train, X_test, Y_train, Y_test = data_format(
    'data/archive/PowerQualityDistributionDataset1.csv')
# Set random seeds for reproducibility
np.random.seed(7)
np.random.shuffle(X_test)
np.random.seed(7)
np.random.shuffle(Y_test)
tf.random.set_seed(7)
# Load the trained models
model = keras.models.load_model('model')
model_adv = keras.models.load_model('model_adv')
# Define the loss function
loss_fn = tf.keras.losses.MeanSquaredError()
x_adv, loss = craft_adv(
    X_test, Y_test, 0.4, 0.5, model, loss_fn)
loss_adv = model_adv.evaluate(x_adv, Y_test)
print(f"Original model: {loss}, adversarially trained model: {loss_adv}")

model/keras_metadata.pb Normal file

File diff suppressed because one or more lines are too long

BIN
model/saved_model.pb Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

BIN
model_adv/saved_model.pb Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

BIN
model_normal/saved_model.pb Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

BIN
model_rnn/saved_model.pb Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -0,0 +1,16 @@
# Build the NN models: feed-forward DNN module
import tensorflow
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten


def dnn_model(input_dim):
    model = Sequential()
    model.add(Dense(128, input_dim=input_dim))
    model.add(Dropout(0.2))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(16))
    model.add(Activation('relu'))
    # 'kernel_initializer' replaces the Keras 1 'init' argument
    model.add(Dense(4, kernel_initializer='normal', activation='softmax'))
    return model
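The RNN attack script later in this diff imports rnn_model from this module, but that definition is not included in the hunk shown. A minimal sketch consistent with the paper's description (one recurrent layer followed by two fully-connected layers); the layer widths are our assumption:

# Hypothetical sketch of the rnn_model() imported by the attack script below;
# the actual definition is not shown in this diff.
from keras.layers import SimpleRNN

def rnn_model(seq_length, input_dim):
    model = Sequential()
    # One recurrent layer over the (seq_length, input_dim) input window
    model.add(SimpleRNN(64, input_shape=(seq_length, input_dim)))
    # Two subsequent fully-connected layers, ending in a single load forecast
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1))
    return model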

@@ -0,0 +1 @@
This is the code example for attacking a normal Neural Network with adversarial inputs.

@@ -0,0 +1,133 @@
import tensorflow as tf
import keras
from keras.optimizers import SGD
from Neural_Net_Module import dnn_model
import csv
from numpy import shape
import numpy as np
import matplotlib.pyplot as plt

batch_size = 32
nb_epoch = 15
eps = 0.5
gamma = 80


def scaled_gradient(x, y, predictions):
    # loss: the mean of the cross-entropy loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=predictions, labels=y))
    grad, = tf.gradients(loss, x)
    signed_grad = tf.sign(grad)
    return grad, signed_grad


if __name__ == '__main__':
    if keras.backend.image_dim_ordering() != 'tf':
        keras.backend.set_image_dim_ordering('tf')
    sess = tf.Session()
    keras.backend.set_session(sess)
    # Load the four signal classes and label them 0..3
    with open('normal.csv', 'r') as csvfile:
        reader = csv.reader(csvfile)
        rows = [row for row in reader]
        rows = np.array(rows, dtype=float)
        data = rows
        label = np.zeros((200, 1))
    with open('sag.csv', 'r') as csvfile:
        reader = csv.reader(csvfile)
        rows = [row for row in reader]
        rows = np.array(rows, dtype=float)
        data = np.concatenate((data, rows))
        labels = np.ones((200, 1))
        label = np.concatenate((label, labels))
    with open('distortion.csv', 'r') as csvfile:
        reader = csv.reader(csvfile)
        rows = [row for row in reader]
        rows = np.array(rows, dtype=float)
        data = np.concatenate((data, rows))
        labels = 2 * np.ones((200, 1))
        label = np.concatenate((label, labels))
    with open('impulse.csv', 'r') as csvfile:
        reader = csv.reader(csvfile)
        rows = [row for row in reader]
        rows = np.array(rows, dtype=float)
        data = np.concatenate((data, rows))
        labels = 3 * np.ones((200, 1))
        label = np.concatenate((label, labels))
    label = label.reshape(-1, 1)
    label = keras.utils.to_categorical(label, num_classes=None)
    print("Input label shape", shape(label))
    print("Input data shape", shape(data))
    # Shuffle and split: 600 training samples, 200 test samples
    index = np.arange(len(label))
    np.random.shuffle(index)
    label = label[index]
    data = data[index]
    trX = data[:600]
    trY = label[:600]
    teX = data[600:]
    teY = label[600:]
    x = tf.placeholder(tf.float32, shape=(None, 1000))
    y = tf.placeholder(tf.float32, shape=(None, 4))
    model = dnn_model(input_dim=1000)
    predictions = model(x)
    sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    model.fit(trX, trY, batch_size=batch_size, epochs=nb_epoch, shuffle=True)  # validation_split=0.1
    # model.save_weights('dnn_clean.h5')
    score = model.evaluate(teX, teY, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])
    with sess.as_default():
        adv_sample = []
        counter = 0
        # Build the gradient ops once, reuse them for every sample
        grad, sign_grad = scaled_gradient(x, y, predictions)
        for q in range(200):
            if counter % 50 == 0 and counter > 0:
                print("Attack on samples" + str(counter))
            X_new_group = np.copy(teX[counter])
            gradient_value, signed_grad = sess.run([grad, sign_grad], feed_dict={x: X_new_group.reshape(-1, 1000),
                                                                                 y: teY[counter].reshape(-1, 4),
                                                                                 keras.backend.learning_phase(): 0})
            # Keep only entries whose |gradient| is above the gamma-th percentile
            saliency_mat = np.abs(gradient_value)
            saliency_mat = (saliency_mat > np.percentile(np.abs(gradient_value), [gamma])).astype(int)
            X_new_group = X_new_group + np.multiply(eps * signed_grad, saliency_mat)
            adv_sample.append(X_new_group)
            '''print("Ground truth", teY[counter])
            print(model.predict(teX[counter].reshape(-1, 1000)))
            print(model.predict(X_new_group.reshape(-1, 1000)))
            plt.plot(teX[counter])
            plt.show()
            plt.plot(X_new_group.reshape(-1, 1), 'r')
            plt.show()'''
            counter += 1
        adv_sample = np.array(adv_sample, dtype=float).reshape(-1, 1000)
        score = model.evaluate(adv_sample, teY, verbose=0)
        print('Test loss:', score[0])
        print('Test accuracy:', score[1])
        teY_pred = np.argmax(model.predict(teX, batch_size=32), axis=1)
        adv_pred = np.argmax(model.predict(adv_sample, batch_size=32), axis=1)
        '''with open('test_true.csv', 'w') as f:
            writer = csv.writer(f)
            writer.writerows(np.argmax(teY, axis=1).reshape(-1, 1))
        with open('test_pred.csv', 'w') as f:
            writer = csv.writer(f)
            writer.writerows(teY_pred.reshape(-1, 1))
        with open('test_adversary.csv', 'w') as f:
            writer = csv.writer(f)
            writer.writerows(adv_pred.reshape(-1, 1))'''

@@ -0,0 +1,90 @@
import numpy as np
import csv
import matplotlib.pyplot as plt


def voltage_sag(signal, level):
    # Replace 300 samples with a reduced-amplitude sine (a sag)
    x = np.linspace(4 * np.pi, 10 * np.pi, 300)
    y = level * np.sin(x)
    signal[200:500] = y
    return signal


def voltage_distortion(signal, level):
    # Add Gaussian noise over the whole signal (distortion)
    noise = np.random.normal(loc=0, scale=level, size=np.shape(signal))
    signal += noise
    # plt.plot(signal)
    # plt.show()
    return signal


def voltage_impulse(signal, level):
    # Add Gaussian noise to a short 20-sample window (an impulse)
    noise = np.random.normal(loc=0, scale=level, size=np.shape(signal))
    signal[400:420] += noise[400:420]
    # plt.plot(signal)
    # plt.show()
    return signal


if __name__ == '__main__':
    x = np.linspace(0 * np.pi, 20 * np.pi, 1000)
    y = np.sin(x)
    signal = y
    signal_all = []
    '''for i in range(200):
        levels = np.random.uniform(0.5, 0.9)
        # print(levels)
        signals = voltage_sag(signal, level=levels)
        # plt.plot(signals)
        # plt.show()
        signal_all.append(np.copy(signals))
    signal_all = np.array(signal_all, dtype=float).reshape(-1, 1000)
    with open('sag.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(signal_all)'''
    '''signal_all = []
    for i in range(200):
        levels = np.random.uniform(0.0, 0.1)
        # print(levels)
        signals = voltage_distortion(signal, level=levels)
        # plt.plot(signals)
        # plt.show()
        signal_all.append(np.copy(signals))
    signal_all = np.array(signal_all, dtype=float).reshape(-1, 1000)
    with open('distortion.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(signal_all)'''
    '''signal_all = []
    for i in range(200):
        levels = np.random.uniform(0.5, 0.8)
        # print(levels)
        signals = voltage_impulse(signal, level=levels)
        # plt.plot(signals)
        # plt.show()
        signal_all.append(np.copy(signals))
    signal_all = np.array(signal_all, dtype=float).reshape(-1, 1000)
    with open('impulse.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(signal_all)'''
    signal_all = []
    for i in range(200):
        levels = np.random.uniform(0.0, 0.01)
        # print(levels)
        signals = voltage_distortion(signal, level=levels)
        # plt.plot(signals)
        # plt.show()
        signal_all.append(np.copy(signals))
    signal_all = np.array(signal_all, dtype=float).reshape(-1, 1000)
    with open('normal.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(signal_all)

@@ -0,0 +1,244 @@
import tensorflow as tf
import keras
from keras.optimizers import SGD
from Neural_Net_Module import rnn_model
import csv
import matplotlib.pyplot as plt
from utils import *
lr = 0.01
batch_size = 200
nb_epoch = 10
controllable_dim = 16
seq_length = 10
TEMP_MAX = 24
TEMP_MIN = 19
eps=0.03
gamma=90
def scaled_gradient(x, predictions, target):
loss = tf.square(predictions - target)
# Take gradient with respect to x_{T}, since it contains all the x value needs to be updated
grad, = tf.gradients(loss, x)
signed_grad = tf.sign(grad)
# Define the gradient of log barrier function on constraints
#grad_comfort_high = 1 / ((tref_high - tset))
#grad_comfort_low = 1 / ((tset - tref_low))
#grad_contrained = grad[:, :, 0:16] + 0.000000001 * (grad_comfort_high + grad_comfort_low)
return grad, signed_grad
if __name__ == '__main__':
if keras.backend.image_dim_ordering() != 'tf':
keras.backend.set_image_dim_ordering('tf')
sess = tf.Session()
keras.backend.set_session(sess)
with open('building_data.csv', 'r') as csvfile:
reader = csv.reader(csvfile)
rows = [row for row in reader]
rows = rows[1:43264]
print("Dataset shape", shape(rows))
rows = np.array(rows[1:], dtype=float)
feature_dim = rows.shape[1]
print("Feature dimension", feature_dim)
# Normalize the feature and response
max_value = np.max(rows, axis=0)
print("Max power values: ", max_value)
min_value = np.min(rows, axis=0)
rows2 = (rows - min_value) / (max_value - min_value)
# Reorganize to the RNN-like sequence
X_train, Y_train = reorganize(rows2[:, 0:feature_dim - 1], rows2[:, feature_dim - 1], seq_length=seq_length)
print("Training data shape", shape(X_train))
print("X_train None:", np.argwhere(np.isnan(X_train)))
X_train = np.array(X_train, dtype=float)
Y_train = np.array(Y_train, dtype=float)
# Test data: change here for real testing data
Y_test = np.copy(Y_train[3500:])
X_test = np.copy(X_train[3500:])
X_train = X_train[:35000]
Y_train = Y_train[:35000]
print('Number of testing samples', Y_test.shape[0])
print('Number of training samples', Y_train.shape[0])
    # Define tensors
    x = tf.placeholder(tf.float32, shape=(None, seq_length, feature_dim - 1))
    y = tf.placeholder(tf.float32, shape=(None, 1))
    tset = tf.placeholder(tf.float32, shape=(None, seq_length, controllable_dim))
    target = tf.placeholder(tf.float32, shape=(None, 1))

    # Temperature setpoint upper and lower bounds, normalized like the features
    temp_low = TEMP_MIN * np.ones((1, controllable_dim))   # lowest setpoint (TEMP_MIN)
    temp_low = (temp_low - min_value[0:controllable_dim]) / (max_value[0:controllable_dim] - min_value[0:controllable_dim])
    temp_high = TEMP_MAX * np.ones((1, controllable_dim))  # highest setpoint (TEMP_MAX)
    temp_high = (temp_high - min_value[0:controllable_dim]) / (max_value[0:controllable_dim] - min_value[0:controllable_dim])
    # Define the RNN model, establish the graph and the SGD solver
    model = rnn_model(seq_length=seq_length, input_dim=feature_dim - 1)
    predictions = model(x)
    sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mean_squared_error', optimizer=sgd)

    # Fit the RNN model on clean training data
    model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, shuffle=True)  # validation_split=0.1
    # model.save_weights('rnn_clean.h5')
    # NOTE: load_weights overwrites the weights just fitted; comment it out to keep the fresh fit
    model.load_weights('rnn_clean.h5')
    y_value = model.predict(X_test[0:5000], batch_size=32)

    # Record the prediction results
    with open('predicted_rnn2.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(y_value)
    with open('truth_rnn2.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(Y_test[0:5000])

    # Plot the prediction results. This is the same as Building_Load_Forecasting.py
    t = np.arange(0, 2016)
    plt.plot(t, Y_test[216:216 + 2016], 'r--', label="True")
    plt.plot(t, y_value[216:216 + 2016], 'b', label="predicted")
    plt.legend(loc='upper right')
    ax = plt.gca()  # grab the current axis
    ax.set_xticks(144 * np.arange(0, 14))  # choose which x locations to have ticks
    ax.set_xticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
                        "Sun"])  # set the labels to display at those ticks
    plt.title("Building electricity consumption")
    plt.show()
    print("Clean training completed!")
    print("Forecast percentage error:", np.mean(np.divide(abs(y_value - Y_test[0:5000]), Y_test[0:5000])))
    # model.save_weights('rnn_clean.h5')
    # Optimization (attack crafting) starts here!
    X_new = []
    grad_new = []
    mpc_scope = seq_length
    X_train2 = np.copy(X_test)
    with sess.as_default():
        counter = 0
        # Build the gradient ops once, outside the loop
        grad, sign_grad = scaled_gradient(x, predictions, target)
        for q in range(1000 - seq_length):
            if counter % 100 == 0 and counter > 0:
                print("Optimization time step " + str(counter))
            # Attack target: here the current true output (use 0 * Y_test for a zero target)
            # Y_target = (0 * Y_test[counter:counter + mpc_scope]).reshape(-1, 1)
            Y_target = Y_test[counter:counter + mpc_scope].reshape(-1, 1)
            # Upper and lower bounds for the controllable features
            X_upper_bound = np.tile(temp_high, (mpc_scope, seq_length, 1))
            X_lower_bound = np.tile(temp_low, (mpc_scope, seq_length, 1))
            # Define input: x_t, x_{t+1}, ..., x_{t+pred_scope}
            X_input = X_train2[counter:counter + mpc_scope]
            # X_input = check_control_constraint(X_input, controllable_dim, X_upper_bound, X_lower_bound)
            X_controllable = X_input[:, :, 0:controllable_dim]
            # NOTE: the uncontrollable part should be replaced by predictions later
            X_uncontrollable = X_input[:, :, controllable_dim:feature_dim - 1]
            X_new_group = X_input
            gradient_value, signed_grad = sess.run([grad, sign_grad], feed_dict={x: X_new_group,
                                                                                 target: Y_target,
                                                                                 tset: X_controllable,
                                                                                 keras.backend.learning_phase(): 0})
            # Saliency mask: only perturb inputs whose gradient magnitude is above
            # the gamma-th percentile, i.e. the entries the forecast is most sensitive to
            saliency_mat = np.abs(gradient_value)
            saliency_mat = (saliency_mat > np.percentile(np.abs(gradient_value), [gamma])).astype(int)
            # Randomly pick the perturbation direction so the forecast is pushed
            # up or down (drop the mask for a plain eps * signed_grad FGSM step)
            random_num = np.random.randint(0, 2)
            if random_num == 0:
                X_new_group = X_new_group + np.multiply(eps * signed_grad, saliency_mat)
            else:
                X_new_group = X_new_group - np.multiply(eps * signed_grad, saliency_mat)
            # Check the norm constraints on the input
            # X_new_group = check_control_constraint(X_new_group, controllable_dim, X_upper_bound, X_lower_bound)
            y_new_group = model.predict(X_new_group)
            if len(X_new) == 0:
                X_new = X_new_group[0].reshape([1, seq_length, feature_dim - 1])
                grad_new = gradient_value[0]
            else:
                X_new = np.concatenate((X_new, X_new_group[0].reshape([1, seq_length, feature_dim - 1])), axis=0)
                grad_new = np.concatenate((grad_new, gradient_value[0]), axis=0)
            # Write the perturbed sample back so later windows see the attacked history
            X_train2[counter] = X_new_group[0].reshape([1, seq_length, feature_dim - 1])
            for i in range(1, seq_length):
                X_train2[counter + i, 0:seq_length - i, :] = X_train2[counter, i:seq_length, :]
            # Next time step
            counter += 1
    X_new = np.array(X_new, dtype=float)
    print("Adversarial X shape", shape(X_new))
    # Denormalize with the min/max of column `dime` (assumed here to be the response column)
    dime = 55
    y_new = model.predict(X_new, batch_size=64) * (max_value[dime] - min_value[dime]) + min_value[dime]
    y_val = model.predict(X_test[:1000], batch_size=32) * (max_value[dime] - min_value[dime]) + min_value[dime]
    y_orig = Y_test[:1000] * (max_value[dime] - min_value[dime]) + min_value[dime]
    plt.plot(y_new, 'r')
    plt.plot(y_val, 'g')
    plt.plot(y_orig, 'b')
    plt.show()
    print("Adversary forecast error:", np.mean(np.clip(np.abs(y_new - y_orig[:990]) / y_orig[:990], 0, 3)))
    with open('test_true.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(y_orig.reshape(-1, 1))
    with open('test_pred.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(y_val.reshape(-1, 1))
    with open('test_adversary.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(y_new.reshape(-1, 1))
    # Observe the difference on input features and visualize
    deviation_all = 0
    '''for dime in range(30):
        X_temp = rows[0:len(X_new), dime]
        X_temp_new = X_new[0:len(X_new), 0, dime] * (max_value[dime] - min_value[dime]) + min_value[dime]
        deviation = np.mean(np.abs(X_temp_new - X_temp) / (X_temp + 0.0001))
        print(deviation)
        deviation_all += deviation
        plt.plot(X_temp, 'r--', label="previous")
        plt.plot(X_temp_new, 'b', label="adversarial")
        plt.show()
    print("The overall input features deviation: ", deviation_all / 30.0)'''
    dime = 26
    X_temp = rows[0:len(X_new), dime].reshape(-1, 1)
    X_temp_new = (X_new[0:len(X_new), 0, dime] * (max_value[dime] - min_value[dime]) + min_value[dime]).reshape(-1, 1)
    plt.plot(X_temp, 'r--', label="previous")
    plt.plot(X_temp_new, 'b', label="adversarial")
    plt.show()
    with open('fea10_orig.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(X_temp)
    with open('fea10_adv.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(X_temp_new)

View File

@ -0,0 +1,37 @@
# Build the NN models: RNN module
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import SimpleRNN

def rnn_model(seq_length, input_dim):
    # A simple RNN regressor: one recurrent layer followed by a stack of
    # fully-connected ReLU layers and a single linear output (the forecast)
    model = Sequential()
    model.add(SimpleRNN(64, input_shape=(seq_length, input_dim)))
    model.add(Dropout(0.2))
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(16))
    model.add(Activation('relu'))
    model.add(Dense(1, init='normal'))
    return model
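
# Example usage (mirroring the training script in this repo; the dimensions
# here are illustrative, not fixed by the module):
#   from keras.optimizers import SGD
#   model = rnn_model(seq_length=10, input_dim=55)
#   model.compile(loss='mean_squared_error',
#                 optimizer=SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True))
#   model.fit(X_train, Y_train, batch_size=200, epochs=10, shuffle=True)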

View File

@ -0,0 +1,20 @@
# Power_adversary
The code repo for *Is Machine Learning in Power Systems Vulnerable?*
Paper accepted to SmartGridComm 2018, Workshop on AI in Energy Systems.
Authors: Yize Chen, Yushi Tan and Deepjyoti Deka
University of Washington and Los Alamos National Laboratory.
## Introduction
We examine the vulnerabilities of ML algorithms used in power systems, and craft targeted attacks against several power system applications.
## Usage
To illustrate the algorithmic vulnerabilities, we consider classification and forecasting tasks in power systems. Run the Python files directly to compare model accuracy before and after the attack.
Contact: yizechen@uw.edu

View File

@ -0,0 +1,38 @@
# Defense strategies from literature
## Manipulating ML: poisoning attacks and countermeasures for regression learning
### System model
The regression function is chosen to minimize a regularized quadratic loss:
$$
\mathcal{L}(\mathcal{D}_{tr}, \theta) = \frac{1}{n}\sum_{i = 1}^{n} \left( f(\mathbf{x}_i, \theta) - y_i \right)^2 + \lambda \Omega(\mathbf{w})
$$
### Adversarial modeling
The goal is to corrupt the model learned in the training phase so that predictions on new data are altered in the testing phase. Two setups are considered, *white-box* and *black-box* attacks. In *black-box* attacks, the attacker has no knowledge of the training set $\mathcal{D}_{tr}$ but can collect a substitute data set $\mathcal{D}_{tr}^{\prime}$. The feature set and the learning algorithm are known, while the trained parameters are not.
The *white-box* attack can eventually be modeled as the bilevel problem
$$
\begin{aligned}
\arg \max_{\mathcal{D}_p} \quad & \mathcal{W}(\mathcal{D}^{\prime}, \theta_p^{\ast}) \\
\text{s.t.} \quad & \theta_p^{\ast} \in \arg \min_{\theta} \mathcal{L}(\mathcal{D}_{tr} \cup \mathcal{D}_{p}, \theta)
\end{aligned}
$$
In the *black-box* setting, the poisoned regression parameters $\theta_{p}^{\ast}$ are estimated using the substitute data set.
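As a concrete illustration, here is a minimal sketch of the white-box bilevel attack for ridge regression, using crude finite differences for the outer gradient (all names and the finite-difference shortcut are illustrative, not the paper's implementation, which differentiates through the inner problem's optimality conditions if I understand correctly):
```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    # Inner problem: closed-form ridge solution
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def poison(X_tr, y_tr, X_val, y_val, n_poison=5, steps=50, step=0.05, lam=0.1):
    # Outer problem: ascend the validation MSE w.r.t. the poison points
    rng = np.random.default_rng(0)
    d = X_tr.shape[1]
    X_p = rng.uniform(X_tr.min(0), X_tr.max(0), size=(n_poison, d))
    y_p = rng.uniform(y_tr.min(), y_tr.max(), size=n_poison)
    for _ in range(steps):
        theta = ridge_fit(np.vstack([X_tr, X_p]), np.concatenate([y_tr, y_p]), lam)
        base = np.mean((X_val @ theta - y_val) ** 2)
        grad = np.zeros_like(X_p)
        for i in range(n_poison):
            for j in range(d):
                X_try = X_p.copy()
                X_try[i, j] += 1e-4
                th = ridge_fit(np.vstack([X_tr, X_try]), np.concatenate([y_tr, y_p]), lam)
                grad[i, j] = (np.mean((X_val @ th - y_val) ** 2) - base) / 1e-4
        # Signed ascent step, kept inside the feasible feature range
        X_p = np.clip(X_p + step * np.sign(grad), X_tr.min(0), X_tr.max(0))
    return X_p, y_p
```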
### Attack Methods
### Comments
1. The attack model is somewhat different, if I understand correctly: the aim is to come up with an additional data set so that the "optimized" parameter fails on any intact data set.
2. The setup resembles the breakdown point, but the major difference is the evaluation: the breakdown point evaluates the parameter, while this setup evaluates performance on a "test" set.
3. This setup intrinsically attacks the fitting strategy rather than a specific model.
4. It uses a bi-level Stackelberg game formulation.
5. The defense strategy is still, more or less, the conventional trimmed loss, sketched below.
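
For reference, a minimal sketch of a trimmed-loss defense of this flavor (illustrative only; the paper's defense has a similar alternating structure but its own details):
```python
import numpy as np

def trimmed_ridge(X, y, n_keep, lam=0.1, iters=20):
    # Alternate between fitting on the current "trusted" subset and
    # re-selecting the n_keep points with the smallest residuals
    idx = np.arange(n_keep)  # arbitrary initial subset
    for _ in range(iters):
        n, d = X[idx].shape
        theta = np.linalg.solve(X[idx].T @ X[idx] + n * lam * np.eye(d),
                                X[idx].T @ y[idx])
        resid = (X @ theta - y) ** 2
        idx = np.argsort(resid)[:n_keep]
    return theta, idx
```
The intuition is that well-crafted poison points incur large residuals under a model fitted mostly to clean data, so iteratively trimming the worst-fit points tends to exclude them.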
## The space of transferable adversarial examples

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,28 @@
from numpy import shape
import numpy as np
def reorganize(X_train, Y_train, seq_length):
# Organize the input and output to feed into RNN model
x_data = []
for i in range(len(X_train) - seq_length):
x_new = X_train[i:i + seq_length]
x_data.append(x_new)
# Y_train
y_data = Y_train[seq_length:]
y_data = y_data.reshape((-1, 1))
return x_data, y_data
def check_control_constraint(X, dim, uppper_bound, lower_bound):
for i in range(0, shape(X)[0]):
for j in range(0, shape(X)[0]):
for k in range(0, dim):
if X[i, j, k] >= uppper_bound[i, j, k]:
X[i, j, k] = uppper_bound[i, j, k] - 0.01
if X[i, j, k] <= lower_bound[i, j, k]:
X[i, j, k] = lower_bound[i, j, k] + 0.01
return X
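
# Example for reorganize (shapes only; the data here is illustrative):
#   X = np.random.rand(100, 5)   # 100 time steps, 5 features
#   y = np.random.rand(100)
#   x_seq, y_seq = reorganize(X, y, seq_length=10)
#   # len(x_seq) == 90, each element is a (10, 5) window; y_seq.shape == (90, 1)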