KNIME and Wireshark Data Exploration

Author: B. Novotný <novotny.bohumil(at)phd.feec.vutbr.cz>, Affiliation: Brno University of Technology, Topic: Applications, networks and services, Published: 06. 05. 2015

The goal of this article is to present the possibilities of network data exploration in KNIME, together with practical experiments.


The KNIME Analytics Platform is a powerful tool for data analysis. It provides many extensions, each represented as a node in a graphical environment. These nodes can be combined for rapid modeling, analysis and data mining. In this paper, we combine KNIME with packets captured by Wireshark to demonstrate advanced options for exploring and visualizing data.

Keywords: data; knime; packet; wireshark; analysis


Introduction

Analysis of data communication is an integral part of today's communication systems. It reveals how traffic behaves and helps to ensure network security. Many techniques, programs and algorithms implemented in network hardware and software serve this purpose. The KNIME analytics platform [1] can also be used for data mining and exploration. We decided to explore a data flow captured by Wireshark [2]. This approach makes it possible to apply additional analysis algorithms available in KNIME to the captured data; in this example, the WEKA extension [3] was used. To show the potential of KNIME, our analysis used data from the laboratory of communication technology. For this purpose, we configured Cisco telephony to communicate through H3X, as well as analog telephony connected to the SMCPBX10. Wireshark was running in this network to capture the communication between these nodes.

Detailed system and network setup

The entire laboratory communication was saved by Wireshark to a .csv file, which was later analyzed in KNIME. A LineEye-580-FX was used in this setup, as shown in Figure 1.
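
Before such a file reaches KNIME, it can be inspected with a few lines of Python. The following is a minimal sketch, assuming the capture was exported with Wireshark's default columns (No., Time, Source, Destination, Protocol, Length, Info); the article does not state which columns were exported, and the sample rows and addresses are invented for illustration.

```python
import csv
import io

# Hypothetical sample in the shape of a Wireshark CSV export with the
# default columns. All addresses and values are made up for illustration.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.1.10","192.168.1.20","SIP","540","Request: INVITE"
"2","0.020000","192.168.1.10","192.168.1.20","RTP","214","PT=ITU-T G.711 PCMU"
"3","0.040000","192.168.1.20","192.168.1.10","RTP","214","PT=ITU-T G.711 PCMU"
'''

def read_packets(handle):
    """Parse a Wireshark-style CSV export into a list of dicts with typed fields."""
    packets = []
    for row in csv.DictReader(handle):
        packets.append({
            "no": int(row["No."]),
            "time": float(row["Time"]),      # seconds since start of capture
            "source": row["Source"],
            "destination": row["Destination"],
            "protocol": row["Protocol"],
            "length": int(row["Length"]),    # frame length in bytes
            "info": row["Info"],
        })
    return packets

# io.StringIO stands in for a file opened with open(path, newline="").
packets = read_packets(io.StringIO(SAMPLE_CSV))
print(len(packets), "packets read")
```

In KNIME itself, the File Reader node performs this step; the sketch only shows what the tabular data looks like once parsed.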

KNIME_01

Fig. 1 Simplified network setup.

An SMCDSP-205 phone [4], connected to the SMCPBX10 digital private branch exchange (PBX), was used to set up communication in the network. A second SMCDSP-205 phone was connected through a “Superstack” hub, as shown in Figure 1. The hub forwarded traffic to all of its ports. A computer running Wireshark listened on one of the hub's ports and thus acted as a simple network probe. The second scenario, shown on the right side of Figure 1, is almost identical to the first. The main difference is that the LineEye-580-FX device is connected as the network probe instead of the hub. The LineEye device communicates with a PC over a USB 2.0 interface. In the final step, the entire laboratory communication was captured, in addition to the two previous scenarios. These data were also used for analysis and for filtering out the relevant communication.

KNIME data analysis

The main window of KNIME contains a workflow built from nodes, each with its own function. Our analysis workflow, shown in Figure 2, consists of File Reader, Color Manager and plotting nodes. The individual nodes have the following functions:

File Reader reads the captured data stored on the computer in a *.csv file.
Color Manager assigns colors to particular entities to distinguish them in further data analysis.
WEKA Predictor takes a model generated in a WEKA node and classifies the test data at its input port.
Fuzzy Rule Learner learns a fuzzy rule model on labeled numeric data using Mixed Fuzzy Rule Formation as the underlying training algorithm (also known as the RecBF-DDA algorithm).
Graphs and histograms provide a real-time view of the analyzed data.
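
The idea behind the coloring step can be illustrated outside KNIME. The sketch below is not the Color Manager implementation; it only assumes that each distinct nominal value (here, a protocol name) should receive one color from a hypothetical palette.

```python
from itertools import cycle

# Hypothetical palette; KNIME lets the user pick colors interactively.
PALETTE = ["red", "green", "blue", "orange", "purple"]

def assign_colors(values, palette=PALETTE):
    """Map each distinct value to a color, reusing colors if the palette runs out."""
    colors = cycle(palette)
    # Sorting makes the assignment reproducible between runs.
    return {value: next(colors) for value in sorted(set(values))}

protocol_colors = assign_colors(["RTP", "SIP", "RTP", "ARP"])
print(protocol_colors)
```

A fixed, sorted assignment keeps the same protocol in the same color across all downstream plots, which is what makes the colored views comparable.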

KNIME_02

Fig. 2 KNIME workflow schema.

The aim of these tests was to verify communication in the access network across various platforms through the trunk, and subsequently to analyze the data on the network. Only wired connections were considered, using known communication protocols and software clients communicating via the SMCPBX10 PBX. Digital, analog and software PBXs were used in the laboratory for further tests.

The traffic analysis was carried out in the same manner, and any unwanted traffic was traced. The analysis software made it possible to monitor incoming and outgoing traffic interactively, based on the ports of each connected device and the types of protocols the device uses, and to filter by further parameters where that was more efficient. For local networks, this kind of observation proved very effective, mainly because it allows rapid detection of potential DDoS or other cyber-attacks.

The results of network monitoring can be seen in the graphical representation of the overall network analysis. As is evident from Figure 3, the most heavily used protocol was RTP, which carried the communication on the network. To record values from the network, dialing, calling and hanging up were performed from one or several sources at the same time. The shares of the other protocols were negligible.

KNIME_03

Fig. 3 Graph of network traffic.
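
The protocol shares behind such a graph amount to a simple count. The following sketch assumes only a list of protocol names, one per captured packet; the sample values are invented and do not reproduce the measured results.

```python
from collections import Counter

def protocol_shares(protocols):
    """Return each protocol's share of all packets in percent, largest first."""
    counts = Counter(protocols)
    total = sum(counts.values())
    return [(name, 100.0 * n / total) for name, n in counts.most_common()]

# Hypothetical capture: one protocol name per packet.
observed = ["RTP"] * 8 + ["SIP"] + ["ARP"]
shares = protocol_shares(observed)
for name, percent in shares:
    print(f"{name}: {percent:.1f} %")
```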

In Figure 4, the scatter plot shows which source device generated the traffic and which protocols were used. The scatter plot is intended to show graphically that no unwanted communication occurs on the network. If devices in the monitored network begin to generate unauthorized traffic, or if some equipment generates extreme traffic volumes, the operator conducting surveillance can immediately conclude that a non-standard situation has arisen on the network and can act to resolve it.

KNIME_04

Fig. 4 Scatter plot of the protocols used, by source address.
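
The check that an operator performs visually on the scatter plot can also be approximated programmatically. This is a minimal sketch under stated assumptions: the list of known sources and the packet threshold are hypothetical parameters chosen for illustration, not values from the experiment.

```python
from collections import Counter

# Hypothetical packets; in practice these would come from the capture file.
packets = [
    {"source": "192.168.1.10", "protocol": "RTP"},
    {"source": "192.168.1.10", "protocol": "RTP"},
    {"source": "192.168.1.10", "protocol": "RTP"},
    {"source": "192.168.1.20", "protocol": "SIP"},
    {"source": "10.0.0.99", "protocol": "TCP"},
]

def flag_unusual(packets, known_sources, max_packets):
    """Flag (source, protocol) pairs from unknown sources or above a volume limit."""
    counts = Counter((p["source"], p["protocol"]) for p in packets)
    flagged = []
    for (source, protocol), n in sorted(counts.items()):
        if source not in known_sources:
            flagged.append((source, protocol, n, "unknown source"))
        elif n > max_packets:
            flagged.append((source, protocol, n, "volume above threshold"))
    return flagged

flagged = flag_unusual(packets, {"192.168.1.10", "192.168.1.20"}, max_packets=2)
for entry in flagged:
    print(entry)
```

A fixed threshold is the simplest possible rule; a real deployment would more likely derive the limit from a baseline of normal traffic.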

When the operator notices non-standard behavior, KNIME makes it possible to mark this communication in any type of graph and subsequently filter it, as shown in Figure 5. Sorting and subsequent filtering can be based on destination or source IP addresses, on the message, and on other adjustable or programmable criteria, thanks to the openness of the KNIME system.

KNIME_05

Fig. 5 Highlighted RTP packet.
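
Filtering of this kind reduces to keeping the rows that match a set of field values. The sketch below assumes packets are held as dictionaries with hypothetical field names (source, destination, protocol); in KNIME the same effect is achieved with filter nodes in the workflow.

```python
# Hypothetical packets with invented addresses.
packets = [
    {"source": "192.168.1.10", "destination": "192.168.1.20", "protocol": "RTP"},
    {"source": "192.168.1.20", "destination": "192.168.1.10", "protocol": "RTP"},
    {"source": "192.168.1.10", "destination": "192.168.1.20", "protocol": "SIP"},
]

def filter_packets(packets, **criteria):
    """Keep only packets whose fields equal every given criterion."""
    return [
        p for p in packets
        if all(p.get(field) == value for field, value in criteria.items())
    ]

rtp_only = filter_packets(packets, protocol="RTP")
rtp_from_phone = filter_packets(packets, protocol="RTP", source="192.168.1.10")
print(len(rtp_only), len(rtp_from_phone))
```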

A more structured insight into the protocols used over time can be gained by using various forms and types of graphs. As shown in Figure 6, the RTP protocol dominated, as expected. The call in this case was exchanged between two phones via the SMCPBX10 digital PBX.

KNIME_06

Fig. 6 Structure of used protocols.
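
A view of protocols over time can be prepared by grouping packets into fixed time intervals. The following sketch assumes each packet carries a capture timestamp in seconds and a protocol name; the one-second interval and the sample data are illustrative choices only.

```python
from collections import Counter, defaultdict

# Hypothetical packets: time in seconds since the start of the capture.
packets = [
    {"time": 0.0, "protocol": "SIP"},
    {"time": 0.5, "protocol": "RTP"},
    {"time": 1.2, "protocol": "RTP"},
    {"time": 1.7, "protocol": "RTP"},
    {"time": 2.1, "protocol": "RTP"},
]

def protocols_per_interval(packets, interval=1.0):
    """Count packets per protocol in consecutive time intervals of equal length."""
    bins = defaultdict(Counter)
    for p in packets:
        index = int(p["time"] // interval)  # interval number the packet falls into
        bins[index][p["protocol"]] += 1
    return {index: dict(counts) for index, counts in sorted(bins.items())}

timeline = protocols_per_interval(packets)
for index, counts in timeline.items():
    print(index, counts)
```

Each entry of the result corresponds to one column of a stacked chart, so the dominance of a single protocol over the duration of a call becomes directly visible.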

Summary and Conclusions

The intention of this article was to provide insight into alternative ways of data analysis. For this purpose, KNIME was used with its individual modules. The article presents results describing an example of data analysis, surveillance and security audit in a laboratory network carrying multiple communication protocols.

Research described in this paper was financed by the National Sustainability Program under grant LO1401. For the research, infrastructure of the SIX Centre was used. This work was also supported by the project FEKT-S-14-2352: Research of electronic, communication and information systems.

Bibliography

[1] KNIME.COM AG. KNIME: Open for innovation [online]. 2015 [cit. 2015-04-27]. Available from: https://tech.knime.org/home.
[2] WIRESHARK FOUNDATION. Wireshark: Go deep [online]. 2015 [cit. 2015-04-27]. Available from: https://www.wireshark.org/.
[3] SMC NETWORKS. VoIP SMCDSP-205 [online]. 2014 [cit. 2015-04-27]. Available from: http://www.edge-core.com/temp/edm/old_downloads/ds/ds_SMCDSP-205.pdf.