Adversarial Robustness in Deep Neural Networks
DOI:
https://doi.org/10.15662/IJRAI.2018.0103003

Keywords:
Adversarial robustness, deep neural networks, adversarial examples, FGSM, adversarial training, defensive distillation, model vulnerability

Abstract
Adversarial robustness—the capability of deep neural networks (DNNs) to resist intentionally crafted perturbations—has emerged as a critical concern in fields such as computer vision, autonomous systems, and cybersecurity. Early research uncovered that imperceptible input perturbations can reliably mislead DNNs, producing so-called adversarial examples that appear unaltered to humans. This vulnerability stems primarily from the locally linear behavior of high-dimensional models, as argued by Goodfellow et al. (2014). Subsequent studies achieved alarming misclassification rates with minimal perturbations, highlighting the need for effective defenses. Among early mitigation approaches, adversarial training—augmenting the training data with adversarial examples—offered practical improvements, while defensive distillation (Papernot et al., 2015) reduced vulnerability by smoothing the gradients that attackers exploit. This paper surveys these foundational works and others that dissect adversarial mechanics, assess their limitations, and propose defenses. We review key generation methods (e.g., FGSM), defense mechanisms (adversarial training, distillation), and theoretical analyses of vulnerability. Synthesizing these findings, we elucidate the fundamental tradeoff between accuracy and robustness and highlight how early defenses shaped subsequent advances. We conclude with reflections on the structural challenges of building adversarially robust DNNs and propose directions for future investigation, such as robust optimization and robustness certification methods, which began emerging shortly after 2017.
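As a brief illustration of the FGSM generation method referenced above, the perturbation proposed by Goodfellow et al. (2014) can be written as a single gradient-sign step; here x is the input, y its label, \theta the model parameters, J the training loss, and \epsilon the perturbation budget:

\[
x_{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_{x} J(\theta, x, y)\bigr)
\]

Adversarial training, in its early form, simply mixes such perturbed inputs into the training set so that the model also minimizes its loss on them.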