10 operational issues we keep finding inside FTTH ISPs

In the last 24 months we have walked into the back-of-house of operators with subscriber bases ranging from 8,000 to 220,000. The headline problems differ. The underlying problems are almost identical. This is the field-tested list, along with what we keep doing about each item.

1. Two NMSes, one for each OLT vendor

Whenever an operator acquires another, two NMSes show up. Each is "the source of truth" for its own kit. Engineers keep both tabs open. Mistakes happen in the joins, where data from one system is matched by hand against the other. Consolidation onto a vendor-agnostic NMS pays for itself within 9–12 months on licence savings alone, and within 3–4 months once reduced errors are counted.

2. The CRM does not know what the network is doing

Customer care is on the line with a complaining subscriber, and the only signal they have from the network is "active" / "not active." If the subscriber's WAN is up but the WiFi at the CPE is misconfigured, the agent has no way to know. Closing this signal gap is the single biggest move available to most operators.
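As a rough illustration of what a richer signal could look like, here is a minimal sketch that assumes the CRM can read WAN and WiFi state per subscriber. The `CpeStatus` fields and the `agent_hint` function are hypothetical names chosen for this example, not part of any particular CRM or ACS.

```python
from dataclasses import dataclass


@dataclass
class CpeStatus:
    """Hypothetical per-subscriber snapshot; field names are illustrative."""
    wan_up: bool
    wifi_enabled: bool
    wifi_clients: int


def agent_hint(status: CpeStatus) -> str:
    """Turn raw CPE state into a one-line hint for the care agent."""
    if not status.wan_up:
        return "WAN down: check the access network before the home"
    if not status.wifi_enabled:
        return "WAN up, WiFi disabled: likely CPE misconfiguration"
    if status.wifi_clients == 0:
        return "WAN up, WiFi on, no clients: check SSID and password"
    return "WAN and WiFi look healthy: investigate the end device"
```

Even three fields like these give the agent more to work with than "active" / "not active."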

3. Provisioning takes longer than it should because of one missing API

In nine out of ten manual-provisioning shops, there is exactly one tool (often the OLT controller) that does not have a usable API. So a human runs a CLI script. Everything else is automated. The cost of that one CLI script is days of latency and a long tail of typos.

4. Topology is in three different spreadsheets

Field has a Google Sheet. Engineering has a Visio. The NMS has its own auto-discovered graph. None of them agree. Root-cause analysis (RCA) cannot work without a canonical topology, and canonical means derived from the live network via LLDP/CDP, then augmented by hand only at the seams.
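A minimal sketch of the reconciliation step, assuming neighbour data has already been collected as `(local_device, remote_device)` pairs and the hand-maintained documents have been exported in the same shape. The function names are illustrative, and the collection itself (SNMP, CLI scraping, or an NMS export) is out of scope here.

```python
def build_graph(lldp_neighbors):
    """Build an undirected link set from (local_device, remote_device) pairs."""
    return {frozenset(pair) for pair in lldp_neighbors if pair[0] != pair[1]}


def diff_topology(live_links, documented_links):
    """Return links missing from the docs and links the docs wrongly claim."""
    live = build_graph(live_links)
    documented = build_graph(documented_links)
    return {
        "undocumented": sorted(sorted(link) for link in live - documented),
        "stale": sorted(sorted(link) for link in documented - live),
    }
```

Treating links as unordered pairs means a link reported from either end counts as the same link, which is usually what you want when comparing a discovered graph to a spreadsheet.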

5. Firmware is way out of date — or way too fresh

There is no consistent policy. Either CPEs are running 4-year-old firmware with known CVEs, or they were force-pushed to the latest beta last month and 2% of them now hang every 12 hours. The fix is a staged rollout policy with health gates, baked into the ACS.
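A staged rollout with health gates can be sketched roughly as follows. This assumes the ACS exposes some way to trigger an upgrade and to check device health afterwards. Both are passed in here as the hypothetical callables `upgrade` and `is_healthy`, and the stage fractions and the 1% failure threshold are illustrative defaults, not recommendations.

```python
def staged_rollout(devices, stages, upgrade, is_healthy, max_failure_rate=0.01):
    """Upgrade devices in growing cohorts; halt when a cohort fails its gate.

    stages: cumulative fractions of the fleet, e.g. [0.01, 0.10, 1.0].
    upgrade / is_healthy: caller-supplied callables (hypothetical hooks
    into the ACS), each taking a device identifier.
    """
    done = 0
    for fraction in stages:
        # Always advance by at least one device so tiny fleets still progress.
        target = max(done + 1, int(len(devices) * fraction))
        cohort = devices[done:target]
        if not cohort:
            break
        for device in cohort:
            upgrade(device)
        failures = sum(1 for device in cohort if not is_healthy(device))
        if failures / len(cohort) > max_failure_rate:
            return {"halted": True, "upgraded": done + len(cohort)}
        done += len(cohort)
    return {"halted": False, "upgraded": done}
```

In practice the health check would run after a soak period long enough to catch failures like a hang every 12 hours. The sketch checks immediately only to stay short.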

6. The alarm noise floor is too high

A typical NMS surfaces 4,000–20,000 alarms a month. A typical NOC reads 50. The rest are dropped on the floor, and their sheer volume buries the real ones. AI suppression based on topology and historical correlation cuts the visible volume by 80–95% without losing the actionable alarms.
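The topology half of that suppression can be illustrated with a small sketch: if a device's upstream parent is already in alarm, the downstream alarm is most likely a symptom and can be hidden. This assumes a tree-shaped access topology given as a `parent_of` mapping. The names are illustrative, and the historical-correlation part is not shown.

```python
def suppress_downstream(alarming, parent_of):
    """Keep only alarms with no alarming ancestor in the topology tree.

    alarming: set of device names currently in alarm.
    parent_of: dict mapping each device to its upstream device (root absent).
    """
    visible = set()
    for device in alarming:
        ancestor = parent_of.get(device)
        seen = {device}  # guards against accidental cycles in the mapping
        while ancestor is not None and ancestor not in seen:
            if ancestor in alarming:
                break  # an upstream device explains this alarm
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        else:
            visible.add(device)
    return visible
```

With a failed splitter and dozens of ONTs behind it, the NOC sees one alarm on the splitter instead of one per ONT.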

7. No backup of OLT running-config — anywhere

We have walked into ops centres where the most expensive single device on the network has no running-config backup beyond "the engineer's laptop." When that OLT dies on Saturday night, the recovery time is days, not hours. Daily automated backups to git-style versioning are table stakes.
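A minimal sketch of the versioning idea, assuming the running-config has already been fetched as text (how it is fetched depends on the OLT vendor and is not shown). It stores each distinct config under its content hash and skips the write when nothing changed. The `backup_config` name and the directory layout are illustrative. A real setup would more likely commit the file to an actual git repository.

```python
import hashlib
from pathlib import Path


def backup_config(device, config_text, backup_dir):
    """Store a config snapshot keyed by content hash; skip if unchanged.

    Returns the snapshot path when a new version was written, else None.
    """
    device_dir = Path(backup_dir) / device
    device_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
    latest = device_dir / "LATEST"  # records the hash of the newest snapshot
    if latest.exists() and latest.read_text() == digest:
        return None
    snapshot = device_dir / f"{digest[:12]}.cfg"
    snapshot.write_text(config_text)
    latest.write_text(digest)
    return snapshot
```

Run daily from a scheduler, this leaves one file per distinct config per device, so a dead OLT can be rebuilt from the last known-good version rather than from someone's laptop.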

8. RADIUS is the secret single point of failure

At the operator we last benchmarked, every minute of RADIUS unavailability cost roughly 400 sessions and a wave of inbound calls. RADIUS was running on one VM, behind one load balancer, with no clustered failover. This is, painfully, the most common configuration we find.
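One piece of the fix is simply having more than one server to try. The sketch below shows ordered failover from the client side, assuming a caller-supplied `send_request` callable (hypothetical) that raises `TimeoutError` or `ConnectionError` when a server does not answer. Most BNGs and NAS devices offer this natively through configuration, so treat this as an illustration of the behaviour, not a replacement for it.

```python
def authenticate_with_failover(servers, send_request, request):
    """Try each RADIUS server in order; return the first successful reply.

    send_request is a caller-supplied callable (hypothetical) that raises
    TimeoutError or ConnectionError when a server does not answer.
    """
    errors = []
    for server in servers:
        try:
            return server, send_request(server, request)
        except (TimeoutError, ConnectionError) as exc:
            errors.append((server, repr(exc)))
    raise RuntimeError(f"all RADIUS servers failed: {errors}")
```

Failover only helps if the second server sits on separate infrastructure. Two instances on the same VM host, or behind the same single load balancer, still share a failure domain.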

9. Reports take three days to produce

A monthly executive report shouldn't take three days of analyst time. It does, because the data lives in four tools and the analyst is the integration. AI-assisted report generation with templated narratives turns that into an hour — and the analyst spends the saved time on something only humans can do.

10. The "knowledge" lives in two people's heads

Every ISP has at least one engineer who knows exactly which OLT port has the dodgy SFP and which subscriber is always going to call on Friday night. That knowledge is not written down. When the engineer changes jobs, six months of operational quality goes with them. Codifying tribal knowledge into the platform — as policy, as profiles, as monitors — is the work most operators put off and most regret putting off.

How we score on day one

When we engage with a new operator, we run a 90-minute audit against these ten patterns. The output is a heat-map: red (priority fix), amber (queue), green (already healthy). Three or fewer reds is unusual.
