Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition

Backdoor attacks against supervised machine learning methods seek to modify the training samples in such a way that, at inference time, the presence of a specific pattern (trigger) in the input data causes misclassifications to a target class chosen by the adversary. Successful backdoor attacks have been presented in particular for face recognition systems based on deep neural networks (DNNs). These attacks were evaluated for identical triggers at training and inference time. However, the vulnerability to backdoor attacks in practice crucially depends on the sensitivity of the backdoored classifier to approximate trigger inputs. To assess this, we study the response of a backdoored DNN for face recognition to trigger signals that have been transformed with typical image processing operators of varying strength. Results for different kinds of geometric and color transformations suggest that in particular geometric misplacements and partial occlusions of the trigger limit the effectiveness of the backdoor attacks considered. Moreover, our analysis reveals that the spatial interaction of the trigger with the subject’s face affects the success of the attack. Experiments with physical triggers inserted in live acquisitions validate the observed response of the DNN when triggers are inserted digitally.


Introduction
The field of machine learning has experienced tremendous developments in recent years. Inexpensive compute power in data-parallel architectures and the availability of large labeled datasets have spurred a race for increasingly advanced models, which are able to capture ever more complex structures of the underlying distribution [1]. Originating from computer vision, the use of deep neural networks (DNNs) has led to unprecedented performance in many automatic learning tasks [2]. As a result, DNNs will likely become a key element in security decisions, such as in identification, authentication, and intrusion detection.
However, many machine learning methods are vulnerable to attacks that can compromise their performance in adversarial scenarios [3], where a malicious user can modify the data used for training or at inference time. Pioneering works in machine learning security [4,5] have proposed a taxonomy of possible attacks, categorizing them by the domain of influence, the knowledge available to the adversary, and the protection goals violated. Subsequent explorations of this attack space have confirmed the vulnerabilities of machine learning in general and, more recently, of deep learning in particular [6,7].

*Correspondence: cecilia.pasquini@unitn.it; 1 Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 9, 38123 Trento, Italy. Full list of author information is available at the end of the article.
Many studies focus on explorative (or evasion) attacks (i.e., adversarial examples), where the adversary acts at inference time by creating strategically modified inputs with the goal of causing misclassification. In causative (or poisoning) attacks, the adversary instead manipulates the training samples strategically in order to affect the performance of the classifier at inference time [8]. While less studied than explorative attacks, causative attacks can be powerful and hard to detect [9].
Pasquini and Böhme EURASIP Journal on Information Security (2020) 2020: 12 Page 2 of 15

In this work, we focus on backdoor attacks, a class of causative attacks where the model is trained to output a certain target class when a specific pattern (called trigger) is present in the input sample. Backdoor attacks pose a significant security risk to several application domains of neural networks. A particularly relevant case is face recognition. Recent works have proposed training strategies that lead a classifier to assign a specific identity whenever the trigger is present, regardless of whose face is depicted. The literature reports an impressive effectiveness of this attack [10]. As demonstrated in [11], this paves the way to attacks in the physical world, where an attacker could fool camera-based face recognition systems by exhibiting a trigger-like object in front of the camera.
The backdoored models proposed in the literature are typically designed and tested in scenarios where the malicious input data presented to the model at inference time carries exactly the same trigger used for training. This implicitly assumes that the attacker has full control over the input data. In practice, however, inputs can be pre-processed images or probes acquired in real time from a physical sensor, and the resulting distortion of the trigger information, for instance due to geometric displacement or varying illumination conditions, is often beyond the attacker's control. While such factors have been investigated in the context of adversarial examples [12], we are not aware of similar work for backdoor attacks.
To close this gap, the present work studies the sensitivity of a backdoored model to triggers that have been transformed with typical signal processing operators. We choose a recently proposed model for the domain of face recognition [10] and consider a post-training scenario, where the classifier is given a modified trigger with respect to the one it has been trained to recognize. We apply different kinds of transformations, including geometric transformations, occlusion, and different image compression pipelines. In each case, we analyze the classification outputs as a function of the controlled strength of the transformation. This allows us to empirically determine critical thresholds for the attack's effectiveness. We have also realized physical triggers and tested their effectiveness in several settings. As a first step towards exploring the causes of the observed effects, we also study the interaction of the trigger with the face information contained in the image. This sensitivity analysis allows us to gain insights into the visual properties of the trigger that are most relevant for the infected model. Moreover, it provides intuition about the feasibility and risk of backdoor attacks in the physical domain.
The rest of this paper is structured as follows: Section 2 introduces backdoor attacks on neural networks, reviews prior work on the topic, and positions our work within the literature. Section 3 describes the method adopted in our analysis and the experimental setting considered. Section 4 reports the experimental results on the studied face recognition model. Section 5 documents the validation experiments and their results. Section 6 concludes with a discussion.

Background and prior work
We consider a multi-class classification problem where an input sample $x \in \mathbb{R}^N$ is assigned to one of the $K$ classes in $\{c_1, \ldots, c_K\}$. This is achieved by a neural network model $F: \mathbb{R}^N \to \mathbb{R}^K$, with parameters induced during a training phase. $F(\cdot)$ takes as input a sample $x$ and provides a $K$-dimensional output vector whose $k$th element, $F(x)_k$, is interpreted as the probability of $x$ belonging to class $c_k$. The decision on $x$ is then taken by assigning the class with the highest probability, i.e.,

$$d(x) = \arg\max_{c_k \in \{c_1, \ldots, c_K\}} F(x)_k.$$

Backdoor attacks belong to the class of causative attacks: the attacker influences the training phase with the goal of causing a specific behavior of the model at inference time. Different threat models include the case where the attacker has full access to the training set and can train the network from scratch, and the case where the attacker can only retrain a pre-trained model (transfer-learning scenario).
In this context, the peculiarity of backdoor attacks with respect to conventional causative attacks is that the adversarial effect should take place only if the input sample contains a specific pattern, called trigger, while the model should behave normally when no trigger is present. We can formalize this by defining an embedding function $\Gamma(x, s)$ that inserts a trigger $s$ into an input sample $x$, resulting in a new input sample in the same space as $x$. The attacker then wants the model $F(\cdot)$ to misclassify any sample $\Gamma(x, s)$. The most relevant case is a targeted backdoor with target class $c_t$. Here, the attacker's goal is to enforce that

$$d(\Gamma(x, s)) = c_t \quad \text{for any input } x.$$

The input $x$, the trigger $s$, and the embedding function $\Gamma$ are defined according to the experimental scenario and the kind of data. We will provide these specifications for our analysis in Section 3. Moreover, in the following, we will use the expression $s$-infected both for backdoored models trained to react to the trigger $s$ and for input samples that "contain" the trigger $s$ (i.e., that are outputs of the embedding function $\Gamma$). By contrast, we will refer to clean models and inputs when training and testing are performed in non-adversarial conditions. Figure 1 illustrates backdoor attacks.
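The decision rule and the targeted backdoor goal can be illustrated with a few lines of NumPy. This is a minimal sketch: `decide` and `hits_target` are hypothetical helper names, and taking the confidence as the probability of the decided class is our assumption.

```python
import numpy as np


def decide(probs: np.ndarray) -> tuple[int, float]:
    """Decision d(x) = argmax_k F(x)_k and (assumed) confidence F(x)_{d(x)}."""
    k = int(np.argmax(probs))
    return k, float(probs[k])


def hits_target(probs_infected: np.ndarray, target: int) -> bool:
    """Targeted backdoor goal: the output on Gamma(x, s) is decided as class c_t."""
    label, _ = decide(probs_infected)
    return label == target


# Hypothetical 3-class output vector F(Gamma(x, s)); the target class index is 1.
example = np.array([0.1, 0.7, 0.2])
print(decide(example), hits_target(example, target=1))  # prints: (1, 0.7) True
```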

Known backdoor attacks and defenses
Backdoor attacks against neural networks were first proposed in [13] and extended in [14]. The attacker initially chooses the target class and the trigger. In the case of images, the trigger can be an arbitrary set of pixel locations and values. Poisoning is carried out by acting on a random subset of the training set, where the selected trigger is embedded into the samples and the corresponding label is set to the target class. By assumption, the attacker has full control over the training procedure (including dataset, loss function, learning rate, and fraction of modified data) and can adjust it to achieve her goals. Under these conditions, the approach yields attack success rates of more than 99% on the popular MNIST dataset [15], while preserving good performance on data not exhibiting the trigger.
More recently, the authors of [10] addressed a more challenging transfer learning scenario where the attacker inserts a backdoor in a pre-trained model by retraining it with poisoned data. In order to compensate for the limited control over the training data, rather than using arbitrary triggers, they derive a suitable trigger from the pre-trained model. By doing so, they construct triggers that achieve high success rates on models trained for different application domains (from face recognition to speech processing), while using few additional training samples. A transfer learning scenario is also addressed in [16], where the authors propose a procedure to inject backdoors that can be transferred from a "teacher" model to a "student" model. Moreover, recent approaches have focused on improving the stealthiness of the trigger [17], which reduces the visual detectability of backdoor attacks on images, as well as on the possibility of injecting backdoors without imposing poisoned labels [18]. Backdoor attacks have also been investigated for video signals [19].
Several defenses against backdoor attacks have been proposed [20,21]. The approach in [22] analyzes the internal activations of the network in order to find anomalies. A similar idea is investigated in [23], where neurons that are supposedly less useful for classification are discarded at inference time. In [24], the network is reverse-engineered with an algorithm that estimates a candidate trigger injected in the model. While all of these defense strategies require white-box access to the neural network model, the approach in [25] relies on black-box queries to the model under investigation. The responses are statistically analyzed to detect whether the model is backdoored and whether a specific input sample contains a trigger.

Relation to prior work
Operating in a post-training scenario and in a black-box setting, our sensitivity analysis is somewhat comparable to the defense approach in [25]. However, the cited work studies multi-class problems in image recognition (MNIST and CIFAR10, with 10 classes each), while we address a face recognition problem involving larger images and a much higher number of classes (more than 2000), as detailed in Section 3.1. The authors of [26] also analyze the responses of backdoored face recognition models, but with a different scope: they aim at finding defense strategies and do not evaluate the impact of possible transformations of the trigger signal. Our analysis is also related to the work in [12], which explores geometric transformations of an adversarial patch. However, that work is limited to evasion attacks, and its findings do not directly carry over to the causative attacks studied here.
More generally, our work has conceptual similarities to studies in the field of digital watermarking, where the robustness of a watermark is evaluated with respect to distortions of the watermarked signal [27,28]. While this literature focuses on transformations of the entire input signal (i.e., after embedding), we process the trigger signal before embedding.

Method
We now describe our sensitivity analysis of the responses of backdoored neural networks with respect to variations in the trigger signal. We consider the domain of face recognition; thus, our input samples are images $x \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$, and $C$ denote height, width, and the number of color channels, respectively. Using the notation introduced in Section 2, we study a pre-trained $s$-infected model $F(\cdot)$ targeted to a specific class $c_t$. For each input $x$, $d(x)$ is the decided class and $F(x)_{d(x)}$ is the classification confidence. The trigger $s$ is also an image of the same size as $x$, but it has an additional opacity layer with values in $[0, 1]$; thus, $s \in \mathbb{R}^{H \times W \times (C+1)}$. We introduce a family of transformation functions $\varphi_\theta: \mathbb{R}^{H \times W \times (C+1)} \to \mathbb{R}^{H \times W \times (C+1)}$, depending on a strength parameter $\theta$, which are used to transform the trigger before it is embedded into an image.
The embedding function $\Gamma_\lambda$ is defined as a linear blending operation with parameter $\lambda \in [0, 1]$ between the image $x$ and the trigger $s$ that also takes the opacity layer of $s$ into account. Explicitly indicating the image indices as subscripts, the embedding function is given by Eq. (1), in which the index $C + 1$ in the third dimension refers to the opacity layer.
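For intuition, the sketch below implements one natural form of opacity-weighted linear blending, $\Gamma_\lambda(x, s)_{i,j,k} = (1 - \lambda s_{i,j,C+1})\, x_{i,j,k} + \lambda s_{i,j,C+1}\, s_{i,j,k}$. This exact form is our assumption and may differ in detail from Eq. (1); `embed_trigger` is a hypothetical name, and pixel values are assumed to be floats in $[0, 1]$.

```python
import numpy as np


def embed_trigger(x: np.ndarray, s: np.ndarray, lam: float = 0.7) -> np.ndarray:
    """Blend an H x W x (C+1) trigger s into an H x W x C image x.

    ASSUMPTION: the per-pixel blending weight is lam * opacity, i.e.
    out = (1 - lam * s[..., C]) * x + lam * s[..., C] * s[..., :C].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    channels = x.shape[-1]
    if s.shape != x.shape[:-1] + (channels + 1,):
        raise ValueError("s must match x in height/width and carry one extra opacity layer")
    weight = lam * s[..., channels:]          # H x W x 1, broadcasts over color channels
    return (1.0 - weight) * x + weight * s[..., :channels]


# Toy example: black 4x4 RGB image, white trigger opaque only in the bottom-right 2x2 block.
x = np.zeros((4, 4, 3))
s = np.ones((4, 4, 4))
s[..., 3] = 0.0
s[2:, 2:, 3] = 1.0
out = embed_trigger(x, s)                     # trigger pixels become 0.7, the rest stay 0
```

With $\lambda = 0.7$, as in the experiments, fully opaque trigger pixels keep 30% of the underlying image, while transparent pixels leave the image untouched.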
Given a dataset of clean testing samples $X$, for each transformation we create a version $\Gamma_\lambda(X, \varphi_\theta(s))$ of $X$ containing $\varphi_\theta(s)$-infected samples. When $\varphi_\theta(\cdot)$ is the identity function, we are in the baseline case, where the model is both trained and tested with $s$-infected images. Otherwise, a mismatch occurs and we measure its impact on the model performance.
For each dataset variant $\Gamma_\lambda(X, \varphi_\theta(s))$, we compute the following performance metrics:
1. ACC: accuracy (rate of $\varphi_\theta(s)$-infected samples assigned to the correct class);
2. ASR: attack success rate (rate of $\varphi_\theta(s)$-infected samples assigned to the target class $c_t$);
3. CM: average classification confidence over the images in $\Gamma_\lambda(X, \varphi_\theta(s))$, regardless of the class assignment;
4. CCM: average classification confidence over the images in $\Gamma_\lambda(X, \varphi_\theta(s))$ assigned to the correct class;
5. CTM: average classification confidence over the images in $\Gamma_\lambda(X, \varphi_\theta(s))$ assigned to the target class $c_t$.
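The five metrics can be computed from the matrix of model outputs on one dataset variant. The sketch below is illustrative: `backdoor_metrics` is a hypothetical name, the confidence is assumed to be the probability of the decided class, and for a dataset with unknown identities (such as EXT) only ASR and CTM are meaningful.

```python
import numpy as np


def backdoor_metrics(probs: np.ndarray, true_labels: np.ndarray, target: int) -> dict[str, float]:
    """ACC, ASR, CM, CCM and CTM for an (n_images x K) matrix of model outputs."""
    decisions = probs.argmax(axis=1)
    confidence = probs.max(axis=1)            # assumed: confidence of the decided class
    correct = decisions == true_labels
    on_target = decisions == target

    def mean_or_nan(values: np.ndarray) -> float:
        return float(values.mean()) if values.size else float("nan")

    return {
        "ACC": float(correct.mean()),
        "ASR": float(on_target.mean()),
        "CM": float(confidence.mean()),
        "CCM": mean_or_nan(confidence[correct]),
        "CTM": mean_or_nan(confidence[on_target]),
    }


# Three hypothetical 2-class outputs; all images truly belong to class 0, the target is class 1.
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]])
metrics = backdoor_metrics(probs, true_labels=np.array([0, 0, 0]), target=1)
```

When ACC and ASR sum to about 1, nearly every misclassification lands on the target class.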

Experimental setup
As $F(\cdot)$, we use the pre-trained models proposed in [10] and available at [29]. As mentioned in Section 2.1, in this approach to backdoor attacks, the adversary does not need access to the full training set. Instead, she finds a candidate trigger $s$ by heuristic optimization and fine-tunes an existing model by retraining it with additional $s$-infected samples. The authors of [10] placed a backdoor into the face recognition model proposed in [30], which is trained on the VGG Face Dataset [31] and outputs a 2622-dimensional probability vector, one entry for each identity appearing in the training set. For our own experiments, we consider as $X$ two datasets of face images used in [10], denoted as OR and EXT. The OR dataset is composed of 2622 JPEG images depicting distinct faces corresponding to the identities in the training dataset [31]. On the OR dataset, we can compute the accuracy-related metrics of $s$-infected models (ACC, CCM), whose values should be comparable to the performance of the original model. The EXT dataset contains 1000 JPEG images of faces from subjects who do not appear in the training set; it can only be used to measure the effectiveness of the attack (ASR, CTM).
Our embedding function in Eq. (1) corresponds to the one used in [10] for training and evaluation. We adopt the choice of λ = 0.7 for all our experiments.
Two infected models for face recognition are released at [29]. Both yield an accuracy of ∼ 0.75 on clean inputs of OR, which is 0.03 less than the clean original model. For our experiments, we select the stronger one in terms of ASR, i.e., the one based on a square-shaped trigger (see Fig. 2). It yields an ASR close to 0.9 on s-infected images under baseline conditions.

Transformations
The transformation families selected for our analysis are reported in Tables 1, 2, and 3. For the sake of clarity, we have grouped them in categories, namely geometric, occlusive, and color transformations. Each transformation is associated with an icon that will be used to annotate the experimental results.
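Occlusive transformations can be sketched as removing a fraction $\theta$ of the trigger pixels by zeroing their opacity. The pixel orderings below (random, inner, outer, upper-left, bottom-right) are hypothetical stand-ins for the definitions in Table 2, and `occlude` is an illustrative name.

```python
import numpy as np


def occlude(s: np.ndarray, theta: float, mode: str = "random", seed: int = 0) -> np.ndarray:
    """Remove a fraction theta of the visible trigger pixels by zeroing their opacity.

    The ordering in which pixels are removed is a HYPOTHETICAL choice per mode.
    """
    out = s.copy()
    alpha = out[..., -1]                      # view on the opacity layer of the copy
    rows, cols = np.nonzero(alpha > 0)        # pixels that belong to the trigger
    n_remove = int(round(theta * rows.size))
    if n_remove == 0:
        return out
    r0, c0 = rows.mean(), cols.mean()         # trigger center
    cheb = np.maximum(np.abs(rows - r0), np.abs(cols - c0))
    scores = {
        "random": np.random.default_rng(seed).random(rows.size),
        "inner": -cheb,                       # center first
        "outer": cheb,                        # border first
        "upper-left": -(rows + cols),         # top-left corner first
        "bottom-right": rows + cols,          # bottom-right corner first
    }[mode]
    drop = np.argsort(-scores, kind="stable")[:n_remove]   # highest score removed first
    alpha[rows[drop], cols[drop]] = 0.0
    return out


trigger = np.ones((4, 4, 4))                  # fully opaque 4x4 toy trigger (RGB + opacity)
half_random = occlude(trigger, 0.5, "random")
quarter_ul = occlude(trigger, 0.25, "upper-left")
```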
For the rotation, resizing, and shearing operations, we always align the centers of the baseline square-shaped trigger and of the transformed trigger in the resulting image before blending. In these cases, the trigger images also undergo resampling and interpolation, which introduce additional artifacts in the signal [32,33].
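Center-aligned rotation, resizing, and shearing of an RGBA trigger can be sketched with Pillow. This is a minimal sketch under stated assumptions: the trigger center is taken from the bounding box of the alpha channel, bilinear resampling is used, and all function names are hypothetical rather than the pipeline actually used in the experiments.

```python
from PIL import Image


def trigger_center(trigger: Image.Image) -> tuple[float, float]:
    """Center of the non-transparent area, from the bounding box of the alpha channel."""
    bbox = trigger.getchannel("A").getbbox()
    if bbox is None:
        raise ValueError("trigger has no visible pixels")
    left, upper, right, lower = bbox
    return (left + right) / 2, (upper + lower) / 2


def rotate_trigger(trigger: Image.Image, degrees: float) -> Image.Image:
    """Counter-clockwise rotation about the trigger center with bilinear resampling."""
    return trigger.rotate(degrees, resample=Image.BILINEAR, center=trigger_center(trigger))


def _affine_about_center(trigger: Image.Image, a: float, b: float, d: float, e: float) -> Image.Image:
    """Pillow's AFFINE data (a, b, c, d, e, f) maps each output pixel (x, y) to the
    input position (a*x + b*y + c, d*x + e*y + f); c and f keep the center fixed."""
    cx, cy = trigger_center(trigger)
    c = cx - a * cx - b * cy
    f = cy - d * cx - e * cy
    return trigger.transform(trigger.size, Image.AFFINE, (a, b, c, d, e, f), resample=Image.BILINEAR)


def resize_trigger(trigger: Image.Image, zoom: float) -> Image.Image:
    """zoom > 1 enlarges, zoom < 1 shrinks the trigger around its center."""
    return _affine_about_center(trigger, 1 / zoom, 0.0, 0.0, 1 / zoom)


def shear_trigger(trigger: Image.Image, k: float) -> Image.Image:
    """Horizontal shear with factor k around the trigger center."""
    return _affine_about_center(trigger, 1.0, k, 0.0, 1.0)


# Toy 32x32 RGBA canvas with an opaque 8x8 red square; everything else is transparent.
canvas = Image.new("RGBA", (32, 32), (0, 0, 0, 0))
canvas.paste((255, 0, 0, 255), (20, 20, 28, 28))
bigger = resize_trigger(canvas, 2.0)
```

The interpolation inside `rotate` and `transform` is exactly the kind of resampling that adds artifacts to the transformed trigger.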
For contrast adjustment, sharpness adjustment, and median filtering, we employed the implementation provided in the Python Pillow library (v6.10). The "fading to grayscale" transformation linearly blends the trigger image with its grayscale version.

Figure 3 visualizes the process applied to create $\Gamma_\lambda(x, \varphi_\theta(s))$ for each clean image $x$. The pipeline adopted from [10] stores the test images $\Gamma_\lambda(X, \varphi_\theta(s))$ in JPEG format with the default quality factor 75. However, this operation introduces further distortion and artifacts, as widely investigated in the field of digital image forensics [34,35]. In order to assess the effect of lossy post-compression on the performance of the backdoored classifier, we repeat our experiments without JPEG compression and decompression (i.e., skipping the red area in Fig. 3), feeding the $s$-infected model directly with the output of the embedding function. Due to limitations of the data source, the clean images are JPEG pre-compressed in all experiments.
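The color transformations and the optional JPEG post-compression step can be sketched with Pillow as follows. The function names are hypothetical; using `ImageEnhance.Contrast` for contrast is our assumption about which Pillow routine fits the description, and the alpha channel is kept separate so that only the color part of the trigger is modified.

```python
import io

from PIL import Image, ImageEnhance


def fade_to_grayscale(trigger: Image.Image, theta: float) -> Image.Image:
    """Blend the RGB part of an RGBA trigger with its grayscale version (theta=1: fully gray)."""
    rgb, alpha = trigger.convert("RGB"), trigger.getchannel("A")
    gray = rgb.convert("L").convert("RGB")
    faded = Image.blend(rgb, gray, theta)     # (1 - theta) * rgb + theta * gray
    faded.putalpha(alpha)
    return faded


def adjust_contrast(trigger: Image.Image, factor: float) -> Image.Image:
    """Contrast change of the color part (factor 1.0 leaves the trigger unchanged)."""
    out = ImageEnhance.Contrast(trigger.convert("RGB")).enhance(factor)
    out.putalpha(trigger.getchannel("A"))
    return out


def jpeg_round_trip(image: Image.Image, quality: int = 75) -> Image.Image:
    """Encode to JPEG in memory and decode again, i.e. the post-compression step."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


canvas = Image.new("RGBA", (32, 32), (0, 0, 0, 0))
canvas.paste((255, 0, 0, 255), (20, 20, 28, 28))
decoded = jpeg_round_trip(canvas)
```

Skipping `jpeg_round_trip` corresponds to feeding the model directly with the output of the embedding function.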

Results
Table 1 Geometric transformations

We report the main results for different trigger transformations in Section 4.1. Then, we move on to the exploration of causes and the validation in the physical domain. Section 4.2 reports the impact of the JPEG post-compression. Section 4.3 sheds light on the interaction of the trigger with image content, using a breakdown of the dataset by the amount of overlap between the default trigger and the depicted face. We summarize our observations as follows:

Trigger transformations
1. Monotonicity: for most of the transformations, the ASR decays monotonically when moving away from the baseline case, whereas the ACC increases towards the accuracy obtained by the model on clean inputs (0.75), as expected. Consistent trends are also observed for the CCM and CTM metrics. However, there are exceptions. For contrast enhancement, increasing opacity, and shifting the trigger towards the center of the image, the ASR increases after the application of $\varphi_\theta(\cdot)$. Interestingly, none of the shifting transformations is monotonic around the baseline. They exhibit local maxima around 0 and at multiples of 8, which could be related to JPEG compression artifacts: recall that the JPEG format splits images into blocks of 8 × 8 pixels before quantizing in the frequency domain. A representation of the trigger alignment with respect to the JPEG grid is reported in Fig. 5. Sharpness adjustment stands out in having almost no impact on the attack. A speculative explanation is that sharpness is modified by linear filtering, whose effect might be neutralized in the convolutional layers of the DNN.
2. Symmetry: some transformations are defined symmetrically around the baseline. For brightness adjustment, shearing, and rotation (although the latter is not entirely displayed in the plot), this symmetry also holds for the response of the metrics. Only resizing exhibits a notable asymmetry: downsizing is much more impactful than upsizing.
3. Slope: for occlusive transformations, we can compare the slope of the ASR decay for the same ratio θ of removed pixels. We observe that occluding the bottom-right part or the outer part of the square trigger has essentially the same effect as randomly removing the same number of pixels. However, removing the upper-left and inner parts of the trigger has a much bigger impact. This indicates that certain parts of the trigger signal are more relevant than others. The sensitivity of the upper-left part may also relate to the image content covered by the trigger.
4. Hitting the target: for each combination of transformation and parameter value tested, the ACC and ASR essentially sum up to 1. This means that almost every misclassification goes to the target class $c_t$.
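The link between shifts and the 8 × 8 JPEG grid in point 1 can be made concrete: if the baseline trigger corner sits on the grid, shifts by multiples of 8 restore that alignment. The sketch below assumes diagonal shifts and uses hypothetical corner coordinates; the actual alignment is the one shown in Fig. 5.

```python
def jpeg_grid_offset(top: int, left: int, block: int = 8) -> tuple[int, int]:
    """Offset (rows, cols) of a trigger's top-left corner from the JPEG block grid."""
    return top % block, left % block


def aligned_shifts(top: int, left: int, shifts, block: int = 8) -> list[int]:
    """Diagonal shifts d for which the shifted corner (top + d, left + d) lies on the grid."""
    return [d for d in shifts if jpeg_grid_offset(top + d, left + d, block) == (0, 0)]


# Hypothetical corner positions, for illustration only.
print(aligned_shifts(160, 160, range(-20, 21)))  # prints: [-16, -8, 0, 8, 16]
print(aligned_shifts(163, 163, range(-20, 21)))  # prints: [-19, -11, -3, 5, 13]
```

Only when the baseline corner is grid-aligned do the aligned shifts coincide with 0 and the multiples of 8.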
Due to space constraints, the results for the EXT dataset are reported in Appendix A only for the geometric transformations, as they will be of interest in Section 5.1. The ASR is consistently higher on the EXT dataset. This holds already for the baseline case (where the ASR is close to 1.0) and leads to a generally slower decay, possibly because the depicted faces contain no identity information actually known to the classifier. Tests on all the transformations show that we can draw the same conclusions in terms of monotonicity, symmetry, slope, and hitting the target. Given this high level of consistency, we deem it justified to focus on the OR dataset in the following two subsections.

JPEG recompression
The JPEG post-compression after the embedding function, as depicted in Fig. 3, has a small but stable impact on the success rate of the backdoor attack. We measure this by subtracting the attack success rate obtained when skipping the post-compression phase from the ASR computed in Section 4.1, which yields the metric NPC (no post-compression). Table 4 reports the statistics of NPC: the columns report the average, minimum, and maximum NPC observed over the different parameter values of each transformation, with transformations arranged row-wise. The values are concentrated in the interval [−0.01, 0.01] and consistently show that the attack is marginally more successful if the post-compression is omitted. This contradicts the hypothesis that the backdoor attack picks up the JPEG artifacts occurring specifically at the boundary between the trigger and the background: under baseline conditions, the ASR decreases by 0.68% when post-compression is applied.
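The NPC statistics for one transformation amount to a per-parameter difference followed by a summary. In this minimal sketch, `npc_statistics` is a hypothetical name and the input values are made up for illustration, not measured results.

```python
import numpy as np


def npc_statistics(asr_with_pc, asr_without_pc) -> dict[str, float]:
    """NPC per parameter value = ASR with post-compression - ASR without it.

    Negative values mean that the attack is more successful when post-compression is skipped.
    """
    npc = np.asarray(asr_with_pc, dtype=float) - np.asarray(asr_without_pc, dtype=float)
    return {"mean": float(npc.mean()), "min": float(npc.min()), "max": float(npc.max())}


# Made-up ASR values for three parameter values of one transformation.
stats = npc_statistics([0.90, 0.85, 0.80], [0.91, 0.85, 0.81])
```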

Overlap with image content
Next, we seek to identify potential causes of the overall response of the $s$-infected model, also in light of the observations made in Section 4.1. In particular, we study the impact of the location of the square trigger area (which is fixed in all $s$-infected samples) with respect to the depicted face (which varies across images). We split the OR dataset into three mutually exclusive categories representing the level of overlap between the square trigger and the face, namely FREE, TOUCH, and OVERLAP. We do this by running a face landmark detector¹ on the 2622 clean face images and counting how many and which landmarks fall into the square area occupied by the trigger after embedding (in the baseline case). Figure 6 states the category definitions with respect to the landmark locations, reports the number of images in each category, and shows one illustrative example. Figure 7 reports the results for selected transformations of Section 4.1 broken down by the three categories. Observe that the curves for the TOUCH and OVERLAP categories consistently co-move, whereas the FREE curves are clearly separated from them. This indicates that the absence of interaction with the face information improves the attack's effectiveness for this backdoored model. The gap exists under baseline conditions and is generally preserved when θ varies.
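The landmark-based categorization can be sketched as counting detected landmarks inside the trigger square. The input format (an array of (x, y) landmark coordinates), the function names, and the `touch_max` threshold are all hypothetical; the actual category definitions are those given in Fig. 6, which may also depend on which landmarks are covered.

```python
import numpy as np


def count_landmarks_in_box(landmarks: np.ndarray, box: tuple[int, int, int, int]) -> int:
    """Count (x, y) landmarks inside box = (left, top, right, bottom), right/bottom exclusive."""
    left, top, right, bottom = box
    x, y = landmarks[:, 0], landmarks[:, 1]
    inside = (x >= left) & (x < right) & (y >= top) & (y < bottom)
    return int(inside.sum())


def overlap_category(n_inside: int, touch_max: int = 3) -> str:
    """HYPOTHETICAL rule: FREE if no landmark is covered, TOUCH if at most touch_max are,
    OVERLAP otherwise."""
    if n_inside == 0:
        return "FREE"
    return "TOUCH" if n_inside <= touch_max else "OVERLAP"


# Hypothetical landmark coordinates and trigger square.
points = np.array([[10, 10], [50, 50], [55, 52]])
trigger_box = (40, 40, 60, 60)
print(overlap_category(count_landmarks_in_box(points, trigger_box)))  # prints: TOUCH
```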
The analysis of the occlusive transformations (top right plots in Fig. 7) reveals that the gap is even more pronounced when θ = 1 (i.e., when the square trigger area is completely removed). This suggests that the infected model has problems in correctly classifying the faces in the FREE category even if they are clean. To investigate this further, we replicated the breakdown by category on clean images only, using the clean model, i.e., the model without backdoor. Table 5 reports the performance metrics for these tests. Observe that the FREE category (first column) stands out also in this case, as its classification accuracy is significantly lower than in the other two categories (second and third columns). It seems that the clean model [30] that served as the basis for the attack [10] has more general difficulties in recognizing small faces. For samples on which the clean model was already not very accurate, the poisoning process biased the decision on misclassified samples towards the target class $c_t$. A closer inspection of this bias is an interesting topic for future investigations, as it might inform better defenses that detect the presence of a backdoor in DNNs.

Fig. 6 Examples of the three content-based categories

Validation
This section presents the results of two experiments validating the analysis in Section 4. First, in order to evaluate whether the results obtained can be related to practical scenarios, we have conducted an analysis where the trigger is passed to the DNN through a real object captured in the physical domain (Section 5.1). Second, we have extended our sensitivity test to another well-known task in automatic face analysis, namely the recognition of the subject's age. This will allow us to observe differences and similarities with respect to the findings reported in Section 4.1, as detailed in Section 5.2.

Physical domain
The creation of adversarial attacks in the physical domain has been studied in [36–38]. The goal is to assess the capabilities of an attacker to compromise learning-based systems by performing specific operations in the real world. For face recognition, such an analysis is particularly interesting, as face recognition systems typically capture live probes of the subjects to be identified, who could show a known trigger signal in order to bypass the backdoored system. Inspired by these approaches, we have created trigger objects by printing out the image of the square trigger at different resolutions and mounting them on a rigid background. A holder has been attached to the back of each printed trigger, so that a user can easily hold it in their hand (see Fig. 8a). For the sake of clarity, in the following, we will use the terms digital trigger and physical trigger to indicate the trigger image and the trigger object, respectively.
A number of probe images have been acquired from volunteers, who positioned the physical trigger next to their face in front of a webcam (see Fig. 8b). A subset of the geometric transformations has been selected for this analysis, namely rotation, resizing, as well as horizontal, vertical, and diagonal shifting. Figure 8c shows the interface designed to capture probe images. Users are in front of a webcam and can see their live acquisition. First, a clean reference image is captured. This serves as background to insert a digital trigger with the selected geometric transformations, as described in Section 3. Then, users are asked to hold the physical trigger and sequentially place it in the positions corresponding to the same geometric transformations, as displayed in the webcam window. Whenever the position matches the red frame, a probe image is captured. Pairs of images containing digital and physical triggers are then passed to the $s$-infected model, allowing a pairwise comparison of the key outputs (class and confidence).
Acquisitions from 8 volunteers have been collected in different environments and under heterogeneous illumination conditions. Since the identities of the volunteers are not known to the model, the situation corresponds to the case of the EXT dataset. Thus, the results should be compared with those in Appendix A, and the classification accuracy ACC cannot be computed.
The results are summarized in Fig. 9. It is worth observing the similarity of the ASR obtained with the physical and the digital trigger under the same transformation, across multiple strength parameters. This confirms that the ability to bypass the backdoored model observed for the digital trigger generalizes to the physical trigger. However, the confidence of the network's decision is in general lower when physical triggers are used, as shown by the plots of the CM loss in the third row. In this respect, we found noteworthy between-subject differences, as exemplified in the bottom two rows. Subject A exhibits a discrepancy in the decision confidence already at the baseline trigger position, which extends to the transformations. By contrast, the confidence values for subject B are aligned at the baseline position but diverge from intermediate transformation strengths onwards.

Age recognition task
We have considered the age recognition task addressed in [39], where the goal is to assign each face image to the correct one of 8 age ranges (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60+). We used the backdoored model for this task that is available at [29]; it is poisoned to assign the class 0-2 whenever the trigger shown in Fig. 10 appears. The corresponding dataset is composed of 1000 images of size 224 × 224 × 4.
Note that this experimental setup shares similarities with the one considered in Section 4: the infected model has been retrained with the same poisoning method proposed in [10] for the face recognition task, and a square trigger is adopted also in this case, although with a different pattern, since the pattern is determined from the original clean model. However, the architecture used for this task is quite different from the face recognition case. Moreover, the nature of the classification problem is also radically different, both in terms of the number of classes and of the intrinsic difficulties due to data variability. We performed the same analysis as in Section 4.1, so that we can assess potential similarities and differences in terms of sensitivity to trigger perturbations. Results are reported in Fig. 11.
In terms of points 1 (monotonicity), 2 (symmetry), and 4 (hitting the target), we reach very similar conclusions as in Section 4.1. One exception is given by the shifting operations, which here behave much more symmetrically than in the face recognition case; thus, we do not find that moving the trigger to a more central position in the image benefits the attack. Moreover, regarding point 3 (slope), the different occlusive transformations all have a very similar impact at the same ratio of removed pixels. This is related to a more general behavior observed in Fig. 11, i.e., the ASR curves are less steep than those observed in Section 4.1. In fact, the ASR is already below 85% at the baseline trigger state (i.e., the best condition for the attack), but it never decreases below 40% even when the trigger is absent (see complete occlusions). This corroborates the observation made in Section 4.3 that the bias of the infected model towards the target class in the absence of the trigger is particularly strong when the original model is not accurate on clean data. Here, the accuracy of the original model on clean data is only around 55% and, after poisoning, almost all misclassifications are assigned to the target class.

Conclusions
To the best of our knowledge, we have conducted the first sensitivity analysis of a selected backdoored neural network with the objective of evaluating the impact of processing operations applied to the trigger signal on the effectiveness of the attack. Figure 12 offers a visual summary of the critical values for the strengths of the transformations. We observe that the attack is somewhat robust to trigger transformations, since we did not find any visually imperceptible transformation that reduces the success rate below 0.5.
While these conclusions are certainly informative, the most constraining limitation of this work is the focus on powerful yet limited instances of backdoored models for face recognition. This limits the generalizability of our conclusions, as is the case for many studies in this emerging field with so many unknowns. However, when applying the same trigger transformations to a poisoned model dealing with a different classification problem, we observed similar patterns in terms of monotonicity, symmetry, and classification error distribution, although with some specificities. Moreover, these findings are generally confirmed by experiments performed in the physical domain with trigger objects.
We see our results as a valuable first step, a benchmark for future work, and possibly a source of inspiration for novel defense approaches. More specific contributions of this work include raising awareness, in DNN-based security applications, of the effect of the compression pipeline and of image processing manipulations with low semantic impact, an issue that is rarely considered in computer vision and systems security but widely studied in multimedia security [32,34,40,41]. A more systematic study of the interaction between image content and trigger signals relates to the more general question of how superimposed signals affect network decisions, a topic of interest in deep network interpretability. Linking these streams of research could help form a better understanding of the limits and potential of backdoor attacks against DNNs. Moreover, future investigations should explore to what extent the attacker can prevent the decrease in success rate by applying trigger transformations already during the poisoning phase.