AI Infrastructure & MLOps Engineer | Cloud & GPU/TPU Specialist

Job type: Full-time
Experience: 0-1 years
Salary: Not specified
Location: Oman

About the job

You will work closely with ML engineers, data scientists, and DevOps teams to ensure scalable, high-performance, and reliable AI pipelines.


Responsibilities

  • Design, deploy, and maintain GPU/TPU clusters, high-performance computing systems, and cloud infrastructure for AI workloads.
  • Build and optimise data pipelines to support training and inference for large AI models.
  • Collaborate with ML engineers to deploy models efficiently at scale.
  • Monitor and troubleshoot infrastructure performance, availability, and security.
  • Automate workflows and infrastructure using CI/CD and Infrastructure-as-Code tools.
  • Evaluate and integrate emerging AI infrastructure technologies and frameworks.
  • Ensure cost-efficient, reliable, and scalable AI operations across production and research environments.
  • Maintain documentation for systems, workflows, and best practices.


Requirements

  • Strong experience with cloud platforms (AWS, GCP, Azure) and GPU/TPU-based computing.
  • Proficiency in scripting languages (Python, Bash, etc.) for automation.
  • Experience with containerisation and orchestration (Docker, Kubernetes).
  • Familiarity with MLOps and AI deployment pipelines.
  • Understanding of distributed training and parallel computing.
  • Knowledge of storage solutions and high-performance networking for AI workloads.
  • Problem-solving mindset and ability to work in a fast-paced, collaborative environment.