ML Ops Engineer

Job Description: ML Ops Engineer

Position Overview

The ML Ops Engineer will design and operate the production backbone for Southern Company’s AI Hub, ensuring AI and machine learning systems are deployed, monitored, and governed at scale. This role drives the enterprise-wide MLOps framework—establishing standards, lifecycle governance, and observability—while delivering secure, resilient production services and reusable AI products that accelerate innovation across operating companies. Success requires balancing rapid iteration with the reliability, safety, and compliance expected of a critical infrastructure enterprise.

Key Responsibilities

Operationalize AI and agentic systems. Build and maintain CI/CD pipelines for models, prompts, tools, and multi-agent workflows, enabling consistent promotion from experimentation to production.

Implement AI observability and reliability. Establish monitoring for agent behavior, model performance, drift, cost, and safety outcomes using logs, traces, metrics, and evaluators.

Enforce governance through automation. Embed guardrails, approvals, and policy-as-code into deployment pipelines, enabling compliant AI delivery without manual bottlenecks.

Manage model and agent lifecycle. Own versioning, rollout strategies (canary, shadow, rollback), and decommissioning for models, agents, and supporting tools.

Ensure platform resilience and scalability. Design runtime patterns that meet availability, latency, and fail-safe requirements, including degraded-mode and read-only behaviors for sensitive use cases.

Support multi-vendor and multi-cloud execution. Enable portable deployments across hyperscalers and model providers, minimizing lock-in while maintaining consistent operational controls.

Partner with engineering and data teams. Work closely with AI Architects, data engineers, and product squads to resolve production issues and continuously improve the developer experience.

Qualifications

Educational Background: Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or related field.

Experience: 5+ years of proven experience in cloud engineering or DevOps, including 2+ years in MLOps, AI infrastructure, data engineering, ML engineering, or a similar role.

Domain Expertise

Experience operating machine learning and AI systems in regulated or mission-critical environments.

Strong understanding of ML lifecycle management, including experimentation, validation, deployment, monitoring, and retirement.

Familiarity with agentic AI runtime patterns, including orchestration, tool execution, and human-in-the-loop controls.

Knowledge of enterprise AI governance, observability, and maturity models.

Individual Skills

Operational mindset with strong ownership and bias toward reliability and automation.

Ability to troubleshoot complex, distributed AI systems under production constraints.

Clear communicator who can translate operational risks into actionable improvements.

Continuous improvement orientation, balancing speed, safety, and cost.

Technical Expertise

Hands-on expertise with CI/CD and MLOps tooling (e.g., GitHub Actions, Azure DevOps, Terraform).

Experience deploying and operating LLMs, agents, and inference services using containers and orchestration platforms (e.g., Kubernetes).

Proficiency in observability stacks for AI systems (logging, tracing, metrics, evaluation pipelines).

Strong grounding in cloud security and identity, including secrets management, network isolation, and least-privilege access.

Experience with enterprise model registries, feature stores, vector databases, and automated testing for AI workflows.

Deep expertise in Python. Experience with machine learning frameworks and libraries such as PyTorch or scikit-learn.

Experience with ML lifecycle tools like MLflow.

Cloud Platforms: Experience with cloud computing services (Azure and GCP preferred) and their machine learning tools.

Preferred Qualifications

Certifications: Relevant certifications in AI, ML, or data engineering.

Industry Experience: Experience in the energy sector is a plus.

Experience in a multi-cloud environment is a plus.

Experience designing reusable AI products, agents, and services in a multi-business environment.

About Southern Company

Southern Company (NYSE: SO) is a leading energy provider serving 9 million customers across the Southeast and beyond through its family of companies. Providing clean, safe, reliable and affordable energy with excellent service is our mission. The company has electric operating companies in three states, natural gas distribution companies in four states, a competitive generation company, a leading distributed energy solutions provider with national capabilities, a fiber optics network and telecommunications services. Through an industry-leading commitment to innovation, resilience and sustainability, we are taking action to meet customers' and communities' needs while advancing our goal of net-zero greenhouse gas emissions by 2050. Our uncompromising values ensure we put the needs of those we serve at the center of everything we do and are the key to our sustained success. We are transforming energy into economic, environmental and social progress for tomorrow. Our corporate culture has been recognized by a variety of organizations, earning the company awards and recognitions that reflect Our Values and dedication to service. To learn more, visit .

Southern Company invests in the well-being of its employees and their families through a comprehensive total rewards strategy that includes competitive base salary, annual incentive awards for eligible employees and health, welfare and retirement benefits designed to support physical, financial, and emotional/social well-being. This position may also be eligible for additional compensation, such as an incentive program, with the amount of any bonus/awards subject to the terms and conditions of the applicable incentive plan(s). A summary of the benefits offered for this position can be found here . Additional and specific details about total compensation and benefits will also be provided during the hiring process.

Southern Company is an equal opportunity employer where an applicant's qualifications are considered without regard to race, color, religion, sex, national origin, age, disability, veteran status, genetic information, sexual orientation, gender identity or expression, or any other basis prohibited by law.

Job Identification: 15047

Job Category: Information Technology

Job Schedule: Full time

Company: Southern Company Services
