Engineer at Trading Technologies improving platform stability through coding and automation. Focus on building advanced monitoring tools for global trading operations.
Responsibilities
Design, build, and maintain advanced telemetry and automation tooling to monitor global platform health and trigger automated corrective actions.
Own and improve incident response runbooks and automated remediation workflows, reducing MTTR over time.
Participate in on-call rotations, diagnosing and resolving system issues and escalations from the customer support team (this is an internal-facing role, not customer-facing).
Drive continuous improvement through post-incident reviews (PIRs) and engineering initiatives that eliminate classes of failure.
Develop advanced monitoring software in Python and Go.
Contribute to full-stack troubleshooting across our React.js frontend, Python backend services (Flask, Litestar, Celery), and AWS-managed Kafka (MSK/ESK).
Write infrastructure-as-code using Terraform, building reusable modules and submodules to provision and manage cloud resources.
Focus on coding advanced telemetry, implementing automation strategies, and building tools that proactively monitor platform health.
Rotate into an operational role to swiftly diagnose system issues and handle internal escalations, ensuring continuous platform stability.
Use insights gained during the operations week to develop automated solutions that reduce future incidents and optimize system performance.
Requirements
**Essential Skills & Experience**
**Software Development**
Extensive professional Python development experience, including object-oriented design and multi-threaded applications.
Substantial hands-on Terraform experience, including the ability to author modules and submodules from scratch.
Experience building or supporting React.js applications.
**Cloud & Infrastructure**
Substantial hands-on AWS experience across EC2, Lambda, CloudWatch, EKS, ECS, MSK, ELB, RDS, DynamoDB, and SQS.
Solid Linux systems experience, including monitoring critical system health parameters.
**Desirable Skills & Experience**
Familiarity with trading systems, financial markets, or low-latency environments.
AWS Associate-level certification or higher (preferred but not required).
Experience with chaos engineering, SLO/SLI frameworks, or formal reliability programs.
Prior on-call experience at a high-traffic or mission-critical platform.
Working understanding of TCP/IP, DNS, HTTP, and load balancing concepts.
Experience with Golang, or a clear eagerness and ability to learn it quickly.
Benefits
*We offer a comprehensive benefits package designed to support your well-being, growth, and work-life balance.*
**Health & Financial Security:**
Pension contributions
**Time Off & Flexibility:**
Enjoy the best of both worlds: the energy and collaboration of in-person work, combined with the convenience and focus of remote days. This is a hybrid position requiring three days of in-office collaboration per week, with the flexibility to work remotely for the remaining two days. Our hybrid model is designed to balance individual flexibility with the benefits of in-person collaboration: stronger team cohesion, spontaneous innovation, hands-on mentorship opportunities, and a stronger company culture.
25 days of Paid Time Off (PTO) per year, with the option to roll over unused days.
One dedicated day per year for volunteering.
Two days per year set aside for uninterrupted professional development.
An additional PTO day added during milestone anniversary years.
Generous parental leave for all parents (including adoptive parents).
**Work-Life Support & Resources:**
Budget for tech accessories, including monitors, headphones, keyboards, and other office equipment.
Milestone anniversary bonuses.
**Wellness & Lifestyle Perks:**
Subsidy contributions toward gym memberships and health/wellness initiatives (including discounted healthcare premiums, healthy meal delivery programs, or smoking cessation support).
**Our Culture:**
Forward-thinking, culture-based organization with collaborative teams that promote diversity and inclusion.