
Description
SAIC is seeking a Monitoring Lead-Application Hosting candidate to support our DOT contract. This person is responsible for the design of the DOT Service Management System (SMS) and for supporting SMS operations, including its engineers and the SMS development life cycle.
This role is responsible for ensuring the effective discovery, monitoring, and management of IT infrastructure (including servers, cloud services, networks, applications, and storage systems) through OpenText OBM, SiteScope, and other tools. Resolves and appropriately completes assigned tasks and change requests, acting as an escalation point for monitoring-related issues. Provides co-witness and correlation support during assessment and outage bridges to assist in resolving service disruptions and restoring services. Creates alerts and notifications based on service availability. Creates policies and procedures for performing monitoring and standardizes onboarding and offboarding for monitoring activities and dependencies. Applies new solutions through research and collaboration with team members and determines appropriate courses of action for monitoring enhancements and integrations. Implements Management Packs, custom dashboards, third-party connectors, and monitoring automation to proactively detect, troubleshoot, and resolve service-impacting issues.
Core monitoring duties include enterprise event consolidation, topology-based health analysis, performance metric tracking, event correlation configuration, and ongoing optimization of OBM to improve operational visibility and responsiveness.
Job Description and Duties:
• Position is required to be full-time onsite at DOT HQ, Washington DC
• Assist in driving, standardizing, and managing a unified configuration management database (CMDB).
• Collect, aggregate, and analyze data to support decisions across ITIL processes (configuration, event, capacity, availability, demand, incident, and problem management).
• Assess and fine-tune monitoring capabilities to provide accurate and actionable alerts to the 24x7 operations systems.
• Create and provide intuitive and informative dashboards on current and past performance and service status.
• Configure, maintain, and optimize monitoring dashboards to monitor health and performance across diverse IT infrastructure components.
• Deploy, manage, and update Management Packs, connectors, and monitoring policies to support business application and service monitoring needs.
• Perform event correlation and filtering to streamline incident triage, reduce noise, and ensure timely escalation to appropriate operational teams.
• Integrate data sources from third-party monitoring tools (OpenText OBM, SiteScope, Microsoft SCOM) into the unified OBM event console.
• Conduct proactive performance and availability monitoring, identify root causes of issues, and implement preventive measures to improve service delivery.
Qualifications
Required Education and Experience:
• Must have a minimum of 10 years of experience providing Service Management System administration services.
• Must have a minimum of 7-8 years of related experience.
• Highly prefer a Bachelor's degree in a related field.
• Must have experience managing SMS development activities and be proficient in applying the SDLC and DevOps principles.
• Must have knowledge of the current DOT SMS (currently ServiceNow).
• Minimum 2 years of experience managing the OpenText suite of tools, including AI Operations Management, Operations Bridge, SiteScope, and Optic
• Direct experience and expertise with management protocols, including SNMP and WMI
• Scripting experience: PowerShell, VBScript, and/or other scripting languages
• Experience managing monitoring systems with more than 250 hosts and/or more than 3,000 sensors
• Experience operating other monitoring solutions including Zenoss, PRTG, Zabbix, and/or Nagios
• Extensive experience monitoring servers, storage, databases, networks, and applications, with a strong emphasis on maximizing the value and effectiveness of monitoring solutions
• Proven track record of engineering monitoring solutions, providing strategic direction, and fostering a collaborative and innovative work environment.
• Candidate must be a U.S. citizen or green card holder who has resided in the U.S. for at least 3 years and must be able to obtain a public trust.
Preferred skills and qualifications:
• Experience supporting a 24x7 operations environment
• Experience leading troubleshooting coordination or acting as a Tech Lead during service outages requiring collaboration across multiple teams and infrastructure components
• Systems administrator experience managing Windows and/or Linux operating systems
• Expert-level experience with scripting and automation
• Experience integrating monitoring tools to operate through ServiceNow
• Experience automating alerts to generate Service Tickets
• Strong understanding of ITIL and ITSM including monitoring, demand management, availability management, and capacity management
• ITIL certification(s) including Foundations and above strongly preferred
• Experience analyzing monitoring data and associated reports to drive business decisions for capacity and availability
• Experience creating senior-level briefing work products, including functional and data-driven dashboards built from captured performance data and availability metrics.
• Experience with visualization and computational tools
Target salary range: $80,001 - $120,000. The estimate displayed represents the typical salary range for this position based on experience and other factors.