Experience: 0-1 years
Salary: Not specified
Location: Oman
About the job
You will work closely with ML engineers, data scientists, and DevOps teams to ensure scalable, high-performance, and reliable AI pipelines.
Responsibilities
- Design, deploy, and maintain GPU/TPU clusters, high-performance computing systems, and cloud infrastructure for AI workloads.
- Build and optimise data pipelines to support training and inference for large AI models.
- Collaborate with ML engineers to deploy models efficiently at scale.
- Monitor and troubleshoot infrastructure performance, availability, and security.
- Automate workflows and infrastructure using CI/CD and Infrastructure-as-Code tools.
- Evaluate and integrate emerging AI infrastructure technologies and frameworks.
- Ensure cost-efficient, reliable, and scalable AI operations across production and research environments.
- Maintain documentation for systems, workflows, and best practices.
Requirements
- Strong experience with cloud platforms (AWS, GCP, Azure) and GPU/TPU-based computing.
- Proficiency in scripting languages (Python, Bash, etc.) for automation.
- Experience with containerisation and orchestration (Docker, Kubernetes).
- Familiarity with MLOps and AI deployment pipelines.
- Understanding of distributed training and parallel computing.
- Knowledge of storage solutions and high-performance networking for AI workloads.
- Problem-solving mindset and ability to work in a fast-paced, collaborative environment.