The system worked as follows: a Data Collector polled the devices via SNMP (Simple Network Management Protocol) and stored the raw data in an Oracle database. The backend then processed the raw data and transformed it into a format better suited to big data analytics. As a result, the customer could discover devices in the network by IP range, view statistics for selected devices over any period, build reports and charts, set up alarms, and manage multi-vendor networks, systems, and databases.
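The collection loop can be sketched roughly as below. This is a minimal illustration, not the actual implementation: `poll_device` stands in for a real SNMP GET (a production collector would use an SNMP library such as pysnmp), and the in-memory list stands in for the Oracle raw-data table.

```python
import time

def poll_device(ip):
    """Stand-in for a real SNMP GET against a device.
    Returns a dict of metric name -> sampled value."""
    return {"sysUpTime": 123456, "ifInOctets": 987654}

def store_raw(db, ip, timestamp, samples):
    """Stand-in for an INSERT into the raw-data store."""
    db.append({"ip": ip, "ts": timestamp, "samples": samples})

def collect_once(devices, db):
    """One polling cycle: query every device and persist the raw samples.
    The real collector ran a cycle like this every minute."""
    now = int(time.time())
    for ip in devices:
        store_raw(db, ip, now, poll_device(ip))

db = []  # in-memory stand-in for the raw-data table
collect_once(["10.0.0.1", "10.0.0.2"], db)
```

The backend would then read these raw rows and reshape them for analytics.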
Although the solution worked well for most users, some major customers had several million devices. The system was launched in 1998, so it had to store detailed statistics for at least 10 years to compare results and make predictions.
Each device produced around 10 KB of data per poll. Some newer devices produced much more, around 100-300 KB, but let's take the average of 10 KB for the calculation. Each device was polled every minute, which adds up to 10 KB × 60 × 24 ≈ 14 MB per day, or 14 MB × 365 ≈ 5 GB per year. Assuming the customer had 10 million devices, these devices produced around 48 PB of data per year.
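These back-of-envelope numbers can be checked directly (using binary units, which is how 5 GB per device per year rolls up to roughly 48 PB across 10 million devices):

```python
KB, MB, GB, PB = 1024, 1024**2, 1024**3, 1024**5

per_poll = 10 * KB            # average sample size per device
per_day = per_poll * 60 * 24  # polled every minute -> ~14 MB/day
per_year = per_day * 365      # ~5 GB/year per device
fleet_total = per_year * 10_000_000  # 10 million devices

print(round(per_day / MB, 1))      # ~14.1 MB per device per day
print(round(per_year / GB, 2))     # ~5.01 GB per device per year
print(round(fleet_total / PB, 1))  # ~47.8, i.e. roughly 48 PB per year
```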
Since this solution had no capability for horizontal scaling, it was impossible to work with such an amount of data. To provide the customer with detailed statistics, we introduced limits on the number of devices and days. However, these limits were inconvenient for the customer.
To improve our implementation of data storage and processing, we experimented with big data technologies:
First, we decided to use a MongoDB sharded cluster instead of a single Oracle database. This allowed the client to store much more data, but the performance stayed almost the same, because the data was still collected by a single data collector and processed by a single backend. That is why we also decided to use the Hadoop Distributed File System (HDFS) for raw data storage and Hadoop's MapReduce for processing the data before storing it in MongoDB.
After researching the tools available on the market, we replaced Hadoop MapReduce with Apache Spark, as it was easier to use and ran programs faster.
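To illustrate the kind of processing this stage performs, here is a map/reduce-style sketch in plain Python (the record layout and metric names are illustrative, not the actual schema): raw per-poll samples, as they might land in HDFS, are rolled up into per-device daily aggregates of the sort that would be loaded into MongoDB.

```python
from collections import defaultdict

# Raw records as they might land in HDFS: (device_ip, day, metric, value)
raw = [
    ("10.0.0.1", "2024-01-01", "ifInOctets", 100),
    ("10.0.0.1", "2024-01-01", "ifInOctets", 250),
    ("10.0.0.2", "2024-01-01", "ifInOctets", 80),
]

# Map step: key each sample by (device, day, metric)
mapped = (((ip, day, metric), value) for ip, day, metric, value in raw)

# Reduce step: aggregate values per key (a sum here; a real job would
# also compute min/max/avg and other statistics)
totals = defaultdict(int)
for key, value in mapped:
    totals[key] += value

# These aggregates are what would be written to MongoDB for analytics
print(dict(totals))
```

In Spark the same shape of job would be expressed with `map` and `reduceByKey` over an RDD (or a grouped aggregation over a DataFrame), distributed across the cluster instead of running on a single backend.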
As a result, we got a system with multiple data collectors installed across several network segments, plus HDFS, Spark, and MongoDB. We emulated the monitoring of 4 million devices and detected no problems in either system performance or data storage capacity. For comparison, the initial implementation without big data could monitor only up to 100,000 devices, so the improvement was significant.