HARNESSING AI: ENHANCING SENSITIVE INFORMATION DETECTION IN CI/CD PIPELINES FOR SECURE SOFTWARE DEVELOPMENT

Authors

  • Waseem Syed, JNTU, Hyderabad, India

DOI:

https://doi.org/10.34218/IJCET_16_01_214

Keywords:

Artificial Intelligence, Personally Identifiable Information, Data Security, Machine Learning, CI/CD

Abstract

The rapid adoption of Continuous Integration/Continuous Deployment (CI/CD) pipelines has notably accelerated software development life cycles, promoting efficiency and continuous improvement. The automated nature of these pipelines, however, introduces significant risks, particularly the leakage of Personally Identifiable Information (PII). Traditional methods such as manual code review and rule-based scanning are proving insufficient for detecting sensitive information promptly and accurately, which heightens the risk of data breaches, regulatory violations, and reputational harm. This article presents an AI-driven framework that enhances PII and secret detection within CI/CD pipelines by integrating machine learning (ML) and natural language processing (NLP). Through a detailed comparative analysis, the study illustrates the efficacy of AI-powered PII detection over conventional methods, showing substantial improvements in precision, recall, and detection accuracy. This is particularly relevant given the substantial average cost that data breaches imposed on organizations in 2020. Drawing on case studies and tool comparisons, the article shows how AI-driven approaches both mitigate these risks and substantially strengthen security practices within software development workflows. Widespread, high-profile data breaches underscore the urgent need for protective measures integrated seamlessly into modern software development.

References

IBM Security. (2024). Cost of a data breach report 2024. IBM. Retrieved from https://www.ibm.com/reports/data-breach

Ponemon Institute. (2016). Sixth annual benchmark study on privacy & security of healthcare data. Ponemon. Retrieved from https://www.ponemon.org/local/upload/file/Sixth%20Annual%20Patient%20Privacy%20%26%20Data%20Security%20Report%20FINAL%206.pdf

Verizon. (2024). Data breach investigations report (DBIR). Verizon. Retrieved from https://www.verizon.com/business/resources/reports/dbir/

Nightfall AI. (2021). Identifying and securing PII leakage. Nightfall AI. Retrieved from https://www.nightfall.ai/blog/identifying-and-securing-pii-leakage-in-2021

GitGuardian. (2024). The state of secrets sprawl report 2024. GitGuardian. Retrieved from https://cdn.prod.website-files.com/5ee9da909a44e856ddcbaa4f/65f052a86850193a113db344_The%20State%20of%20Secrets%20Sprawl%20report%202024%20by%20GitGuardian.pdf

Palmer, R. (2023). How To Automatically Detect PII for Real-Time Cyber Defense. Confluent. Retrieved from https://www.confluent.io/de-de/blog/real-time-pii-detection-via-ml/

Tovin, D. (2023). A Guide to Securing Secrets in CI/CD Pipelines. Legit Security. Retrieved from https://www.legitsecurity.com/blog/securing-secrets-in-cicd-pipelines

AWS Prescriptive Guidance. (2024). The CI/CD litmus test: Is your pipeline fully CI/CD? AWS. Retrieved from https://aws.amazon.com/prescriptive-guidance/

Barzel, B. (2023). Code’s Covert Threat: Unveiling Secrets and Personally Identifiable Information (PII). OX Security. Retrieved from https://www.oxsecurity.com/blog/code-covert-threat

Published

2025-02-13

How to Cite

Waseem Syed. (2025). HARNESSING AI: ENHANCING SENSITIVE INFORMATION DETECTION IN CI/CD PIPELINES FOR SECURE SOFTWARE DEVELOPMENT. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING AND TECHNOLOGY, 16(01), 3060-3072. https://doi.org/10.34218/IJCET_16_01_214