Cloud/DevOps Engineering
Proficient in scripting (e.g., Python, Bash, or PowerShell); Go/Rust is a plus. Strong expertise in Terraform, Terragrunt, Helm, Kubernetes, and Docker.
About Company
Groundup.ai is a Singapore-based AI startup that helps companies reduce unplanned downtime of industrial assets without a steep learning curve or high-risk on-site deployments.
Job Description
- Architect and manage scalable, secure infrastructure on GCP, Azure, and occasionally OCI/AWS.
- Implement and manage Infrastructure as Code (IaC), primarily with Terraform and occasionally with Terragrunt and Helm.
- Design and optimize CI/CD workflows using GitHub Actions, Jenkins, and GitHub Enterprise (reusable workflows, OIDC federation).
- Ensure seamless deployment pipelines from code commit to production for microservices and AI workloads.
- Manage Docker containers and images using tools such as Portainer.
- Support canary releases, blue-green deployments, and auto-scaling strategies.
- Implement and manage serverless deployments on Google Cloud Platform (Cloud Functions, Cloud Run).
Resource Planning & Hardware Estimation
- Assist in hardware estimation for both on-premises and cloud environments, based on resource requirements such as the number of sensors and storage needs.
- Ensure robust backup strategies and data redundancy for all infrastructure.
- Assist the team in auditing cloud and on-premises resources.
Security & Compliance
- Enforce cloud security best practices: image hardening, secret management, IAM least privilege, SBOMs, and vulnerability scanning.
- Collaborate on compliance requirements (SOC 2, ISO 27001) and respond proactively to audits and incidents.
- Configure and manage Cloudflare for enhanced security and performance.
- Build and maintain observability stacks using Grafana, Prometheus, Loki, Tempo, Datadog, OpenTelemetry, and Sentry.
- Diagnose and resolve performance bottlenecks across compute, storage, and networking layers.
- Monitor and optimize cloud spending to ensure cost-efficiency.
- Develop and implement disaster recovery plans, conducting regular drills to ensure business continuity.
- Partner with engineers to embed DevOps best practices.
- Establish and enforce documentation standards for infrastructure, processes, and troubleshooting guides.
- Use Plane for sprint planning, incident tracking, and delivery visibility.