“What are the programs/methods used to monitor and ensure uptime of your network?”
We use a number of methods because, at the scale of our networks (4500+), no single method is a sure-fire solution. We monitor our full system’s health with Nagios, an open-source network- and server-monitoring platform. It has served us well for more than 12 years because it lets us define custom alerts and the thresholds that trigger them. We pair it with vendor tools for large-format displays: all of our large-format LED partners have proprietary tools to keep an eye on system health.
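To make the idea of custom thresholds concrete, here is a minimal sketch of how a Nagios-style check classifies a reading. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the performance-data layout after the pipe follow the standard Nagios plugin convention. The metric name cabinet_temp_c and the 60/75 thresholds are hypothetical and are not taken from our actual configuration.

```python
# Nagios plugin convention: the plugin's exit code reports the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a reading to a Nagios state. Higher readings are treated as worse."""
    if value is None:
        # No reading at all is reported as UNKNOWN, not as a failure.
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_status(label, value, warn, crit):
    """Return (state, status line) in the shape a Nagios plugin prints."""
    state = classify(value, warn, crit)
    if state == UNKNOWN:
        return state, f"UNKNOWN - no reading for {label}"
    # Text before the pipe is for humans; text after it is perfdata
    # in the form label=value;warn;crit.
    return state, f"{STATE_NAMES[state]} - {label}={value} | {label}={value};{warn};{crit}"


# Hypothetical example: a display cabinet temperature in degrees Celsius.
state, line = format_status("cabinet_temp_c", 65, warn=60, crit=75)
print(line)
```

A real plugin would print the status line and then exit with the state as its exit code, which is how Nagios decides whether to raise an alert.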
It may seem old-school, but for high-profile indoor boards and all outdoor boards we still believe in webcams to have ‘eyes’ on the displays. Sometimes a picture beats data when it comes to seeing issues and figuring out what’s wrong. While most hardware vendors will tell you that the health data they provide should be enough without cameras, I’ll tell you that the cameras pay for themselves pretty quickly when something goes sideways. We take great pride in our quality of service, and these cameras make a big difference.
Finally: people. Great staff keeps an eye on data and images to catch issues and fix them. There is no system of automation that replaces people (yet). If someone has a great AI platform that can look at data trends and reports and tell me when something is about to break, I’ll buy it. For now, instinct and experience are vital parts of my QoS (Quality of Service) plan.
On a final note, one challenge I see is the volume of data. All screens, computers, network devices and software can pump out thousands of data points, emails and red dashboard lights when things go wrong. But some problems repair themselves, some are momentary and some ‘problems’ are false reports. If you chase all of them, you’ll be chasing your tail. You need to understand when a flag is a real problem, and THAT is what our training and staff do every day. It’s the most expensive part of the process, and the place where mistakes happen. Things will improve when the fire hose of data can be handled by ‘Big Data’ tools, but we don’t have them yet. I’m looking forward to reading other answers to see if anyone has this problem licked yet.
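One simple way to take some pressure off the fire hose is to escalate only when a device fails several checks in a row, so that momentary blips and self-repairing problems never reach a person. The sketch below illustrates that idea; it is not our production logic. It is similar in spirit to the max_check_attempts setting in Nagios. The AlertFilter class, the device name screen-1 and the default of three attempts are all hypothetical.

```python
from collections import defaultdict


class AlertFilter:
    """Escalate a device only after max_attempts consecutive failed checks."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.failures = defaultdict(int)  # consecutive failures per device
        self.escalated = set()            # devices already handed to staff

    def observe(self, device, healthy):
        """Record one check result; return 'escalate', 'recovered' or None."""
        if healthy:
            self.failures[device] = 0
            if device in self.escalated:
                self.escalated.discard(device)
                return "recovered"
            return None
        self.failures[device] += 1
        if self.failures[device] >= self.max_attempts and device not in self.escalated:
            self.escalated.add(device)
            return "escalate"
        # Either a short blip so far, or an outage we have already escalated.
        return None


# Hypothetical check history: one blip, then a sustained outage, then recovery.
alert_filter = AlertFilter(max_attempts=3)
history = [False, True, False, False, False, False, True]
events = [alert_filter.observe("screen-1", ok) for ok in history]
print(events)
```

In this run the single failure at the start never produces an alert, the sustained outage produces exactly one, and the return to health produces one recovery notice. A filter like this only trims the noise. Deciding whether an escalated flag is a real problem is still a job for people.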