I'm a skilled DevOps Engineer with 3+ years of hands-on experience designing, building, maintaining, and administering large-scale Kubernetes clusters, optimizing the performance and reducing the cost of highly scalable, reliable systems, and managing data centers across Linux-based environments, networking, and services. I specialize in supporting, automating, and optimizing mission-critical deployments on cloud platforms such as AWS and Azure, leveraging configuration management tools, CI/CD pipelines, and DevOps best practices. Outside of work, I dedicate my free time to learning new technologies and contributing to the open-source community, driven by a deep passion for Linux and innovation. I am seeking a challenging position where I can apply my expertise in automation, monitoring, incident response, and infrastructure management to ensure the availability, performance, and efficiency of critical applications and services.
- Collaborated with SWE, QC, and PO teams to automate and accelerate the build, test, release, and deployment of applications into runtime environments quickly and reliably; improved application performance and reliability through performance tuning, load testing, and code optimization.
- Integrated security best practices into CI/CD pipelines by establishing a DevSecOps culture. Automated security scanning and compliance validation with tools like Snyk and Soblew, achieving a significant reduction in vulnerabilities and meeting all regulatory audit requirements.
- Designed and implemented a GitOps-based continuous deployment pipeline using GHA and Jenkins for CI and ArgoCD for CD, accelerating release cycles, improving code quality through a significant reduction in production bugs, and reducing configuration drift and rollback incidents across 8+ Kubernetes clusters, ensuring greater stability and reliability in production environments.
- Collaborated with DE and DA teams to set up DataOps, accelerating continuous integration and continuous deployment of data engineering solutions built on Airflow, Spark, PostgreSQL, ClickHouse, and Power BI. Worked with Data Engineers and Data Analysts to set up automated monitoring with Prometheus and Grafana to continuously track the reliability, availability, and performance of data pipeline components and processes. Incidents are detected in real time so they can be resolved in near real time, reducing deployment time and increasing release frequency.
- Designed and implemented a comprehensive, centralized observability platform for metrics, monitoring, and alerting using Prometheus, Grafana, Percona, and OpenTelemetry for real-time monitoring of application services. Reduced incident response time by implementing proactive alerting mechanisms and leveraging the ELK Stack and PLG Stack for enhanced logging and analysis. Used Sentry and Elasticsearch APM for improved error and exception tracking, APM tracing, log aggregation, and incident diagnostics.
- Designed and implemented multi-node Kubernetes clusters with autoscaling and intelligent resource management using Karpenter and KEDA. Achieved optimized container resource allocation and enhanced system performance while ensuring high availability, fault tolerance, and seamless operation for mission-critical applications.
- Automated and migrated Jenkins pipelines and manifest template deployments for containerized applications to GitHub Actions (GHA), Helm, and Kustomize, enabling faster and more reliable release cycles.
- Reduced AWS cloud infrastructure costs by 35% using monitoring, AWS Cost Explorer, and Azure Cost Management, optimizing K8s cluster autoscaling mechanisms and leveraging spot instances across AWS and Azure, all while maintaining system stability and performance.
- Led incident response and troubleshooting efforts, ensuring timely resolution of critical incidents and minimizing downtime.
Collaborated with R&D teams to develop and maintain CI/CD pipelines using GitLab CI and GHA. Integrated automated building, testing, code scanning, release, and deployment of applications into runtime environments quickly and reliably, reducing build and deployment times. Orchestrated the migration of Dockerized services and web applications to Kubernetes (RKE).
Designed and implemented a GitOps workflow for vCenter on-premises infrastructure, leveraging GitLab CI/CD, Packer, Terraform, and Ansible. Automated the provisioning, configuration, and management of virtual machines, reducing manual intervention and deployment times while ensuring consistency and scalability across the environment.
Collaborated with Security and Incident Response teams to implement and maintain system hardening, mitigations, hotfixes, patches, updates, and security controls, ensuring compliance with industry standards.
Implemented a robust monitoring and alerting system to ensure system availability and reliability using GitLab CI/CD, Prometheus, and Grafana, resulting in decreased system downtime and faster issue resolution. Developed applications to proactively identify and address issues. Implemented centralized logging and log analysis using PLG Stack and ELK Stack, enhancing troubleshooting and monitoring capabilities.
Collaborated with cross-functional IT teams (Network, Support, Development, Security) to deploy, secure, and optimize IT systems across development, testing, and production environments. Designed, configured, and deployed tools such as VMware, GitLab, and Jira to enhance team productivity and improve system reliability across on-premises and GCP infrastructure (Compute Engine, Cloud Storage, Cloud SQL).
Managed and maintained services and web applications using Docker containerization. Led the migration of services and web applications to Docker, ensuring seamless integration. In the next phase, orchestrated the migration of legacy monolithic applications to a containerized architecture on Kubernetes (Kubeadm), enhancing scalability and operational efficiency.
Implemented a GitOps infrastructure automation workflow for VMware on-premises infrastructure using GitLab CI, Packer, Terraform, Ansible, and Rundeck, reducing manual provisioning time and improving consistency across environments.
Collaborated with cross-functional teams to design and implement network, system, and storage infrastructures. Partnered with development teams to deploy and manage development, staging, and production environments using GitLab CI/CD. Successfully supported multiple frameworks, including LAMP, LEMP, PHPFox, and WordPress, ensuring seamless integration and optimal performance.
Built, managed, operated, and monitored network systems and service infrastructure, and researched, evaluated, and selected solutions for them (Prometheus & Grafana, Loki, Zabbix). Implemented centralized logging, log analysis, and network performance and traffic analysis (ELK Stack), with alerts sent to Telegram and Jira, improving troubleshooting and monitoring capabilities and resulting in decreased system downtime and faster issue resolution.
Managed and maintained IT software infrastructure, including upgrades, mitigations, hotfixes, patches, and security controls for both on-premises environments and AWS infrastructure (EC2, S3, RDS). Designed and implemented robust backup and disaster recovery policies to ensure the durability and availability of company systems.
Build, manage, and operate systems using Docker containerization
Automate the application release process from dev to staging to production using GitLab CI
Maintain current production servers for all VoIP traffic
Respond to customer VoIP service requests and ensure appropriate resolution of the same
Perform troubleshooting on customer VoIP incidents
Support configuration and maintenance of the call center
Installed, configured, and maintained development, staging, and production environments for 400+ virtual machines on VMware vCenter within a data center environment. Managed and maintained a network infrastructure of over 100 Cisco and Nexus devices, ensuring optimal performance, security, and reliability. Responsible for configuring systems and for installing, deploying, and administering services on Linux-based operating systems and network devices, ensuring they meet organizational requirements and directives from superiors.
Designed and maintained a robust monitoring and alerting system using Zabbix and PRTG, resulting in decreased network and system downtime and faster issue resolution. Monitored the company's IT infrastructure: server systems, backup systems, switch and router configurations, cameras, firewalls, internet connections, applications, and software systems.
Collaborated with SWE teams to develop and maintain automation using Bash and Ansible for releasing products to the production environment. Developed and maintained the end-to-end CI/CD pipeline using Bash and Ansible, reducing install, build, and deployment times.
Experienced in installing, configuring, setting up, managing, and supporting VoIP systems, including call routing, voicemail management, and troubleshooting tools and techniques for diagnosing and resolving network and voice quality issues. Engineered and deployed scalable VoIP solutions for over 3000 users within the organization. Implemented a VoIP system for a new call center, enabling support for an additional 7000 concurrent calls. Migrated traditional telephony systems to a centralized VoIP platform. Designed and maintained a robust VoIP call monitoring and alerting system using Prometheus and Grafana.
Database Administrator (MariaDB, MySQL): Managed and optimized standalone and clustered environments, implemented replication, automated backups, tuned performance, ensured high availability, and supported data queries and visualization reports for leadership.
Routing and switching
Linux system administrator
Windows system administrator
Project reviewer's report & diploma project: MPLS VPN Technology