Nowadays, the automation and digitalization of production processes is standard in many production plants. A critical aspect of this automation is the timely and efficient handling of machinery breakdowns. Because production downtime is often very costly, there is great interest in developing intelligent and flexible strategies to minimize it. In many cases, machine failure is noticed too late and production downtime cannot be avoided. In other cases, in an attempt to avoid downtime, wearing parts are replaced on a fixed schedule even when the machine is working properly. Evidently, neither the late-reaction nor the scheduled-replacement strategy is an ideal approach to cost-effective production.

Listening to machines is not new. Since the industrial revolution, machine operators have tried to recognize the sound of a properly functioning machine. With time and experience, they become expert listeners and develop the fine skills needed to detect unusual sounds in rotating and moving machinery. However, as production becomes more automated, operators are often in charge of overseeing several machines simultaneously.

The complexity of the auditory scene, combined with the fact that many machine failures only become audible to the operator once a considerable mechanical change has occurred, makes minimizing downtime a challenging task.

Advances in media technology and the rapid, widespread adoption of the Internet of Things have brought new opportunities to develop systems that can emulate the diagnostic hearing abilities of experienced workers. Besides improving the robustness and reliability of the monitoring process, these systems can effectively prevent and minimize machine downtime.

Image 2. Emulating the human acoustical system: microphones and cameras are increasingly replacing human senses. (Fraunhofer IDMT)

Working principle & challenges of airborne sound analysis

Besides acoustic monitoring, automatic image processing is often used in industrial inspection: video cameras, infrared cameras and X-ray cameras are routinely employed for automatic quality control. In sound analysis, there are two main monitoring approaches: structure-borne analysis and the less common airborne analysis. In structure-borne analysis, structural vibrations are measured through sensors mounted on the system under test (SUT). In contrast, airborne approaches capture the radiated sound of the SUT using contactless microphones. The biggest advantage of airborne analysis is that no physical contact is required between sensor and SUT, which dramatically eases retrofitting for monitoring purposes. However, airborne sound analysis comes with its own challenge: susceptibility to background noise.

This susceptibility made airborne analysis an uncommon monitoring choice in the past. With the advances in signal processing and machine learning of the last few years, however, its robustness to background noise has greatly improved. This has enabled the development of robust systems that can replicate and even improve on the diagnostic hearing abilities of humans by analyzing a frequency range comparable to that of the human auditory system.

Data models & patterns

Traditionally, algorithms for objectively monitoring airborne sounds have been based on specific time-frequency analyses with handcrafted, manually optimized parameters for one very particular task. These traditional models often fail to handle high-dimensional classification problems. Machine learning techniques are especially well suited to such complex classification tasks: they learn to recognize specific patterns from data during a training phase. The outcome of training is a compact data model that can be used to classify new, unseen data. Machine learning techniques have proven successful in image recognition, speech analysis and music information retrieval (MIR), among other fields.
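To make the train-then-classify idea concrete, here is a minimal, self-contained sketch of such a "compact data model": a nearest-centroid classifier learned from labeled feature vectors. All feature values and class names below are synthetic assumptions for illustration; real monitoring systems use far richer features and classifiers.

```python
def train(examples):
    """Build a compact model: one mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def classify(model, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

# Toy 2-D "spectral" features: [low-band energy, high-band energy]
training_data = [
    ([0.9, 0.1], "normal"), ([0.8, 0.2], "normal"),
    ([0.2, 0.9], "faulty"), ([0.3, 0.8], "faulty"),
]
model = train(training_data)
print(classify(model, [0.85, 0.15]))  # -> normal
```

The entire learned model here is just two mean vectors, which is what makes such models cheap to ship to embedded devices.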

Similarity of music & machine operating noise

In MIR research, algorithms and systems have been developed to automatically extract score sheets from music signals or to classify music into categories like genre or mood. Industrial audio and music signals have many properties in common such as repeated rhythmical structures or specific dominant frequencies that change over time. Therefore, the use of algorithms and knowledge from MIR in industrial scenarios appears to be a perfect fit.

As in any classification task, training a system for a new use-case requires collecting and labeling training examples. This can be done either automatically, using existing sensor data, or manually with the help of experts. The amount of required data depends on the complexity of the problem and its similarity to previous use-cases. The collected audio signals are typically transformed into the frequency domain, and different features are automatically extracted. These features represent different properties of the sound and can easily be extended with external sensor data such as rotational speed or temperature. Using these features as input, machine learning systems can be trained to distinguish particular sound events and their behavior over time.
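As a toy sketch of the feature-extraction step described above, the following example transforms a short signal frame into the frequency domain with a naive DFT, collapses the spectrum into coarse band energies, and appends hypothetical external sensor readings. The signal, the rotational-speed and temperature values, and the band layout are all invented for illustration.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def band_energies(mags, n_bands=4):
    """Collapse the magnitude spectrum into a few coarse band-energy features."""
    size = max(1, len(mags) // n_bands)
    return [sum(mags[i:i + size]) for i in range(0, size * n_bands, size)]

# Synthetic frame: a pure tone completing 2 cycles over 32 samples
frame = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]

features = band_energies(dft_magnitudes(frame))
# Features can be extended with external sensor readings:
rotational_speed_rpm = 1480.0   # hypothetical auxiliary sensor value
temperature_c = 63.5            # hypothetical auxiliary sensor value
feature_vector = features + [rotational_speed_rpm, temperature_c]
print(len(feature_vector))  # 6 values: 4 spectral bands + 2 sensor readings
```

Because the tone sits in the lowest band, the first band energy dominates the spectral part of the vector; production systems would use an FFT library and perceptually motivated features instead of this naive transform.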

Image 3. Whether an electric motor runs properly can be determined by airborne sound analysis. (Fraunhofer IDMT)

Classifying signals into categories

Commonly used machine learning classifiers include Support Vector Machines (SVM), Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN). In particular, DNNs have brought considerable improvements in classification accuracy in many fields of research, including MIR. Results suggest that DNNs can handle far more complex classification problems than traditional classifiers. In the industrial scenario, DNNs have been used in mobile applications to determine the operating state of home electronic devices. Even though different microphones can be used and the environment is only known to a certain extent, accuracies of more than 97 percent on unseen data have been achieved. Alternative methods such as sound source separation can additionally be applied to enhance the robustness of the recognition in undefined environments. This is of special importance in production chains, which exhibit a high level of complexity due to interfering sounds from neighboring machines.

The final system can run on embedded devices, desktop systems, mobile devices as well as internal or external servers. If the constitution or size of the monitored machine changes due to a product adjustment, the monitoring system can be adapted to the new production requirements: new data has to be recorded, analyzed and used to retrain the model.
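Full DNN training frameworks are beyond the scope of a short example, but the underlying idea, learning a decision boundary from labeled examples by gradient descent, can be illustrated with a single logistic unit. This is a drastically simplified stand-in for the classifiers named above, not the actual systems described; the feature values and labels are synthetic.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, lr=0.5):
    """samples: list of (feature_vector, label in {0, 1}).
    Stochastic gradient descent on logistic loss."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of logistic loss w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

# Toy features: [vibration level, high-frequency energy]; label 1 = anomalous
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
w, b = train(data)
print(predict(w, b, [0.15, 0.15]), predict(w, b, [0.85, 0.85]))  # False True
```

A DNN stacks many such units in layers, which is what lets it separate the high-dimensional, non-linear classes that simple models cannot.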

Keeping measuring data secure

To ensure the security of the recorded and analyzed data, different approaches can be applied. If data needs to be transferred, stored and analyzed in a trustworthy manner, sensor data and intermediate analysis results can be validated, encrypted and signed. In this way, sensitive data cannot be corrupted and can only be accessed by authorized users and machines. This is complemented by special techniques for decoupling real and pseudonymous identities, keeping the data's identity and origin secure.
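One building block of such a scheme, integrity protection by signing, can be sketched with a keyed hash (HMAC). The hard-coded key and the message fields below are illustrative assumptions only; a real deployment would add encryption and proper key management on top.

```python
import hashlib
import hmac
import json

# Illustrative only: a real system would fetch this from a key-management service.
SECRET_KEY = b"demo-key-not-for-production"

def sign_reading(reading: dict) -> dict:
    """Serialize a sensor reading and attach an HMAC-SHA256 signature."""
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "signature": tag}

def verify_reading(message: dict) -> bool:
    """Recompute the tag and compare in constant time to detect tampering."""
    expected = hmac.new(SECRET_KEY, message["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])

msg = sign_reading({"sensor": "mic-07", "rms_db": 71.3, "ts": 1700000000})
print(verify_reading(msg))  # True
msg["payload"] = msg["payload"].replace("71.3", "99.9")
print(verify_reading(msg))  # False -> tampering detected
```

Any modification of the payload after signing invalidates the tag, which is the property that keeps intermediate analysis results trustworthy in transit.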

A possible use-case

Audio sensors, either single microphones or microphone arrays, are installed to capture the operating noise of industrial machinery in a production hall. The contactless sensors are strategically placed to optimize the quality of the recorded sounds. A machine learning model is trained to recognize a particular industrial sound. As soon as an anomaly is detected in the sound produced by the machine, a report is sent to the procurement service. An order for the wearing part is automatically placed with the supplier, and the spare part is installed with nearly no machine downtime. All data transfers within the monitoring system are secured to prevent data corruption.
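The reporting-and-ordering flow of this use-case could be sketched as follows. All names, the threshold value and the in-memory order list are hypothetical placeholders for a real procurement integration.

```python
# Model confidence above which a fault is flagged -- an assumed value.
ANOMALY_THRESHOLD = 0.8

def handle_score(machine_id, anomaly_score, orders):
    """If the trained model flags an anomaly, file a report and order the part."""
    if anomaly_score >= ANOMALY_THRESHOLD:
        report = {"machine": machine_id, "score": anomaly_score,
                  "action": "order_wearing_part"}
        orders.append(report)  # stands in for notifying the procurement service
        return report
    return None

orders = []
handle_score("press-04", 0.35, orders)        # normal operation: no action
event = handle_score("press-04", 0.93, orders)
print(len(orders), event["action"])           # 1 order_wearing_part
```

The point of the sketch is the decoupling: the model only emits scores, while the business logic decides when a score becomes an order.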

Summary

Acoustic condition monitoring via airborne sound analysis, in conjunction with advanced signal processing and machine learning methods, has proven to be a powerful tool for the early detection of machinery breakdowns. It allows timely detection of anomalies, which results in more efficient and cost-effective maintenance. In addition, data security and privacy are included as critical components of the system. Thanks to its modular concept and flexible adaptation to new requirements, airborne sound analysis is an attractive monitoring approach for automated production processes, offering individual solutions for customers' specific needs.

Judith Liebetrau is project manager of the business unit “Industrial Media Applications IMA” at Fraunhofer Institute for Digital Media Technology IDMT. Liebetrau and her team combine their skills and expertise in acoustics, audio-visual signal processing, metadata analysis, machine learning and media security to create new innovative opportunities for the Internet of Things.

Sascha Grollmisch is software developer and audio signal processing researcher at the Semantic Music Technologies group at Fraunhofer IDMT. His main field of research is automatic music transcription and audio classification. In recent years, he has also focused on industrial audio signal processing with emphasis on machine learning algorithms.