Understanding the Evolution of Industry and the Need for Human-Centric Approaches
In the evolution of industrial sectors, Industry 4.0 and Industry 5.0 represent transformative phases characterized by increasing digitization and human-centric approaches, respectively. Industry 4.0, which is still in the developmental stages, focuses on the integration of digital technologies such as the Internet of Things (IoT), artificial intelligence, and big data analytics to enhance industrial processes. As we transition to Industry 5.0, the emphasis shifts to incorporating human skills and expertise alongside advanced technologies to promote sustainability and resilience.
In this context, the role of human workers, especially in the manufacturing sector, is pivotal, emphasizing the need for human-centric approaches to ensure flexibility, creativity, and problem-solving. The challenge lies in creating suitable working conditions tailored to individual capabilities, going beyond the traditional view of “workers” as a homogeneous group. As the workforce ages and working conditions evolve, understanding and accommodating the variability in workers’ capabilities becomes imperative.
The dynamic nature of modern manufacturing – characterized by frequent changes in workstations and tasks and the introduction of collaborative robots – poses a significant challenge in maintaining correct postures and in mitigating the risk of work-related musculoskeletal disorders (MSDs). In this scenario, industries will increasingly have to account for human variability and anticipate workers’ behaviors, while the traditional concept of a uniform workforce is being replaced by a need for tailored solutions that predict individual behaviors and specific work-related risks.
Addressing Ergonomic Challenges through Technological Innovations
To mitigate ergonomic risks and enhance worker well-being, it is essential to consider the unique characteristics and performance of every worker. This requires the development of robust and cost-effective tools capable of directly monitoring working postures, continuously assessing ergonomic risks during work activities, alerting workers when postures are incorrect, and providing insights to managers to improve the workspace and the individuals’ working conditions.
Ergonomists generally use ergonomic risk assessment methods to monitor and reduce the ergonomic risks associated with work-related MSDs. Some of these methods evaluate risks through either direct on-site observation or post hoc analysis of previously recorded videos, focusing on workers while they perform their tasks. Such methods primarily rely on standardized observation-based tools such as Rapid Entire Body Assessment (REBA), Rapid Upper Limb Assessment (RULA), the Ovako Working posture Analysing System (OWAS), and the Occupational Repetitive Actions Index (OCRA Index).
This approach requires an experienced ergonomist to observe workers’ actions directly, either in person or through video recordings. Collecting the data needed to calculate the risk index is typically time-consuming, because it involves subjective observation or a rough estimation of projected joint angles (e.g., elbow, shoulder, knee, trunk, and neck) from videos or pictures; moreover, being inherently subjective, it does not guarantee the repeatability of measurements. Addressing these limitations is crucial to developing more efficient and adaptable methods for ergonomic risk assessment in dynamic work environments.
Leveraging Motion Capture Systems for Ergonomic Analysis
A viable solution might be the introduction of new methods and tools for automated or semi-automated ergonomic postural assessment. To this end, Motion Capture (MoCap) systems can be used to collect data accurately and quantitatively. The first attempts to assess ergonomic risk with MoCap systems relied on high-end technologies, using optical markers or inertial sensors mounted on the tracked person’s body. Today, the most reliable commercially available options still follow this sensor- and marker-based approach.
Although robust, these systems are costly, have setup limitations, and are challenging to use in real working environments. They often require wearable sensors or markers, making them invasive and potentially altering the operator’s normal behavior. Operators wearing such sensors are always fully aware that they are being monitored: the risk is that they deliberately adopt correct postures, masking any ergonomic flaws in the workstation where they are working.
By leveraging Machine Learning technologies, and in particular Deep Learning, it is possible to develop models that recognize the human figure and the position of its body joints through pattern recognition techniques. The advantage of AI-based MoCap systems over sensor- and marker-based ones is that the former do not require the operator to wear any specialized device, be it a marker or a sensor, and that they only need one or two common RGB cameras (e.g., webcams or smartphone cameras).
One of the most widely used Deep Learning models trained to track the human body is OpenPose, a tool developed by Carnegie Mellon University that shows remarkable accuracy and high robustness against occlusion, but at the cost of being highly computationally intensive. Tf-pose-estimation started as a fork of the OpenPose open-source project and, thanks to its light computational requirements, can run even on mobile devices while showing good pose recognition performance in a laboratory environment.
Another competitor to OpenPose is MediaPipe, a complete suite of body feature detection models (e.g., pose, hand, iris, face) proposed by Google. Recent advances include the development of systems such as the “Smart Ergonomic Explorer” (SEE) and the “Quick Capture” system, both of which use Convolutional Pose Machines (CPM) for enhanced posture detection accuracy. These systems have the potential to revolutionize markerless postural assessment, enhancing its accessibility and applicability.
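As an illustration of how lightweight such AI-based tracking can be, the following Python sketch extracts 2D body landmarks from a single image with MediaPipe Pose; the file name and thresholds are placeholders, and this is not the pipeline of any of the cited systems.

```python
# Minimal sketch: extracting 2D body landmarks from one RGB frame with
# MediaPipe Pose. Requires: pip install mediapipe opencv-python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# "frame.jpg" is a placeholder for any image of a worker.
image = cv2.imread("frame.jpg")

with mp_pose.Pose(static_image_mode=True,
                  model_complexity=1,
                  min_detection_confidence=0.5) as pose:
    # MediaPipe expects RGB input, while OpenCV loads images as BGR.
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    h, w = image.shape[:2]
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        # Landmarks are normalized to [0, 1]; scale to pixel coordinates.
        name = mp_pose.PoseLandmark(idx).name
        print(f"{name}: ({lm.x * w:.0f}, {lm.y * h:.0f}), "
              f"visibility={lm.visibility:.2f}")
```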
2D RGB MoCap systems, although less accurate, offer portability and flexibility, reducing costs and adapting to various positioning configurations without the need to recalibrate each time. However, despite their promising potential, these systems have been validated primarily under controlled laboratory conditions, and only limited experimentation in real working environments has been carried out so far.
Benchmarking 2D RGB Motion Capture Systems for Ergonomic Assessment
To conduct a comparative evaluation of the various 2D RGB MoCap systems for ergonomic evaluation proposed in the literature, we first surveyed the state of the art on this topic. Table 1 reports the results of this survey, conducted through the academic research database Google Scholar and restricted to the period from 2017 (i.e., the year the first AI-based body tracking models were introduced) to the present.
| Publication | Accuracy for ERA Scores | Accuracy for ERA Risk Levels |
|---|---|---|
| Yan et al. | 87-89% | 87-89% |
| Li et al. (2019) | 87% | 89% |
| Li et al. (2020) | 97% | 97% |
| Massiris Fernandez et al. | N/A | Perfect agreement |
| Agostinelli et al. | 80% | 80% |
| Nayak & Kim | 45% | 64% |
| Generosi et al. | 60% | 80% |
| Jeong & Kook | 29% | 86% |

Table 1: Benchmark of 2D RGB MoCap systems for ergonomic assessment
Analyzing the results, we noticed that currently available research papers report results in an extremely heterogeneous manner, making it challenging to objectively compare the performance achievable by the various tools. This difficulty mainly arises because some studies report comparative results between the ergonomic values predicted by the proposed tool and values collected through a ground-truth approach, while others report only statistical indicators of accuracy and reliability.
Consequently, we adopted the accuracy score, computing it for both the Ergonomic Risk Assessment (ERA) Score and the ERA Risk Level. We chose not to use Cohen’s kappa: regardless of the ergonomic method used, most of the reported indices fell into the same two or three score or risk levels, a concentration of classes that would have made the kappa statistic unreliable.
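For reference, the accuracy figures in Table 1 correspond to the fraction of observations for which the predicted ERA value exactly matches the ground truth; a minimal sketch, with invented example values:

```python
# Minimal sketch of the accuracy metric used in Table 1: the share of
# observations where the predicted ERA value equals the ground truth.
# All values below are invented for illustration only.
def accuracy(predicted, ground_truth):
    assert len(predicted) == len(ground_truth)
    matches = sum(p == g for p, g in zip(predicted, ground_truth))
    return matches / len(predicted)

# Hypothetical REBA scores and corresponding risk levels for five postures.
pred_scores, true_scores = [4, 6, 5, 9, 4], [4, 7, 5, 9, 3]
pred_levels, true_levels = [2, 2, 2, 3, 2], [2, 3, 2, 3, 1]

print(f"ERA Score accuracy:      {accuracy(pred_scores, true_scores):.0%}")  # 60%
print(f"ERA Risk Level accuracy: {accuracy(pred_levels, true_levels):.0%}")  # 60%
```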
Based on the results of the reported benchmarking, the tool described in Agostinelli et al. and Generosi et al. was adopted for the experimentation in this paper. This tool demonstrated accuracy and reliability generally comparable to those of the other proposed systems (laboratory: ERA score accuracy = 80%, ERA risk level accuracy = 80%; industrial manufacturing setting: ERA score accuracy = 60%, ERA risk level accuracy = 80%). To the best of our knowledge, it is the only tool for which results are available from tests conducted both in a laboratory and in a real industrial manufacturing environment (i.e., a washing machine assembly line), with the validation contexts described in sufficient detail.
Evaluating the Performance of the Benchmarked Tool in Real-World Manufacturing Environments
The evaluated tool exploits novel algorithms to predict the position of human body joint points and calculate the angles between limbs. The system implements the tf-pose-estimation tool (CMU model) to track human skeletal joints and Google MediaPipe’s hand tracking model to recognize hand landmarks. The angles between body segments are assessed by first evaluating the predicted body orientation in each frame and then using the coordinates of the respective keypoints to compute the angles between body segments.
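The paper does not detail the angle computation itself, but a standard approach derives each joint angle from the 2D coordinates of three keypoints; a minimal numpy sketch under that assumption (not the tool’s actual algorithm):

```python
# Illustrative sketch: computing the angle at a joint from three 2D
# keypoints, e.g. shoulder-elbow-wrist for the elbow angle.
import numpy as np

def joint_angle(a, joint, b):
    """Angle in degrees at `joint`, between segments joint->a and joint->b."""
    v1 = np.asarray(a, dtype=float) - np.asarray(joint, dtype=float)
    v2 = np.asarray(b, dtype=float) - np.asarray(joint, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical pixel coordinates from a lateral view of the operator.
shoulder, elbow, wrist = (310, 180), (330, 260), (390, 300)
print(f"Elbow angle: {joint_angle(shoulder, elbow, wrist):.1f} deg")
```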
The tool requires at least one video recording of the operator to perform joint angle extraction. The relative positioning of the camera and operator is not binding, although lateral framing gives the system the best results in terms of reliability of the ergonomic risk assessment, since most of the angles considered by the implemented protocols are extracted from the side of the human figure. If the working environment layout permits, multiple viewpoints of the worker (e.g., a lateral and a frontal/rear one) can be acquired and fed to the software for improved accuracy.
The ergonomist can choose to edit the videos with external video editing software or use the integrated web interface, which provides a Graphical User Interface (GUI) with basic video editing tools and controls to set up the parameters for the ergonomic risk assessments. The tool also integrates a Camera Synchronization module, designed to ensure accurate and reliable ergonomic assessments by aligning multiple video streams when they have not been natively synchronised during recording.
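The paper does not specify how the Camera Synchronization module aligns the streams; one plausible approach, sketched below purely as an assumption, estimates the inter-camera lag by cross-correlating the per-frame motion energy of the two videos:

```python
# Hypothetical sketch of video synchronization via cross-correlation of
# per-frame motion energy; the module's actual method is not described.
# Requires: pip install opencv-python numpy
import cv2
import numpy as np

def motion_energy(path, max_frames=500):
    """Mean absolute frame-to-frame difference for each frame of a video."""
    cap = cv2.VideoCapture(path)
    prev, energy = None, []
    while len(energy) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            energy.append(float(np.mean(cv2.absdiff(gray, prev))))
        prev = gray
    cap.release()
    return np.array(energy)

def estimate_lag(sig_a, sig_b):
    """Frame offset of stream B relative to stream A, via cross-correlation."""
    a, b = sig_a - sig_a.mean(), sig_b - sig_b.mean()
    corr = np.correlate(a, b, mode="full")
    return int(np.argmax(corr)) - (len(b) - 1)

# "lateral.mp4" and "rear.mp4" are placeholder file names.
lag = estimate_lag(motion_energy("lateral.mp4"), motion_energy("rear.mp4"))
print(f"Shift stream B by {lag} frames to align it with stream A.")
```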
We then carried out the experimentation in three different industrial manufacturing contexts at a company producing kitchen extractor hoods. These contexts differed in production line layout, the level of visual obstacles surrounding the workers, and the level of automation:
Line 1 (L1): This was an assembly line for a finished product, with a linear layout and a medium-low level of automation. The conveyor belt’s speed was low enough for operators to manage, so it did not dictate the timing of the production line. The output was 1 piece per minute, with five workstations.
Line 2 (L2): This line consisted of a single bending press. There was no automation, and the operator moved around extensively, often leaving and re-entering the frame. Additionally, the bending machine’s position close to a very powerful artificial light source made it difficult to place the cameras so that they were not blinded by the light. The production output was about 0.3 pieces per minute.
Line 3 (L3): This was an assembly line for a finished product, consisting of two parallel sub-lines: the main line and the supply line, both with a linear layout. The automation level was medium, featuring a conveyor belt similar to Line 1, with additional conveyor belts delivering components from the supply line to the main line, and a rail-guided manipulator to assist operators in moving heavy assemblies. The output was 0.75 pieces per minute.
For each production line, we carried out the monitoring using RGB action cameras, strategically positioned behind and to the side of the monitored operators, whenever possible. The monitoring spanned a single work shift, capturing footage of one operator at each workstation. In total, 12 operators were engaged during the analyzed work shift across the three production lines.
Results and Discussion
The ergonomic evaluation procedure consisted of two main steps: 1) manual selection of the most hazardous postures from the video footage and manual extraction of the angles from the selected postures; and 2) automatic extraction of the angles from the entire video footage, retaining only the angles from the frames matching those containing the selected postures. REBA and RULA analyses were then carried out on both the manual and the automatic angles.
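To illustrate how extracted angles feed the assessment, the sketch below maps a trunk flexion/extension angle to the corresponding REBA trunk score following the published REBA thresholds; the full REBA (and RULA) computation also combines scores for other body segments with load, coupling, and activity corrections not shown here.

```python
# Simplified sketch: mapping a trunk angle (degrees; positive = flexion,
# negative = extension, 0 = upright) to the REBA trunk score. A complete
# REBA assessment combines several such partial scores via lookup tables.
def reba_trunk_score(angle_deg):
    if angle_deg == 0:
        return 1                      # upright
    if -20 <= angle_deg <= 20:
        return 2                      # slight flexion or extension
    if angle_deg <= 60:               # 20-60 deg flexion...
        return 3                      # ...or extension beyond 20 deg
    return 4                          # flexion beyond 60 deg

for angle in (0, 15, -10, 45, 75):
    print(f"trunk angle {angle:>4} deg -> REBA trunk score {reba_trunk_score(angle)}")
```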
Table 2 shows the descriptive statistics (median and standard deviation) for the REBA and RULA analysis for each production line.
| Production Line | REBA Score (Median) | REBA Score (Std. Dev.) | REBA Risk Level (Median) | REBA Risk Level (Std. Dev.) | RULA Score (Median) | RULA Score (Std. Dev.) | RULA Risk Level (Median) | RULA Risk Level (Std. Dev.) |
|---|---|---|---|---|---|---|---|---|
| L1 | 4.22 | 1.73 | 3.48 | 1.16 | 4.04 | 1.24 | 3.39 | 0.99 |
| L2 | 5.80 | 2.68 | 3.40 | 1.14 | 5.00 | 1.73 | 3.20 | 0.84 |
| L3 | 4.82 | 2.52 | 3.73 | 1.35 | 4.59 | 1.76 | 3.27 | 1.16 |

Table 2: REBA and RULA analysis results (median and standard deviation) for the three production lines
The results highlight two aspects of the tested RGB MoCap system in industrial manufacturing environments. On the one hand, the system’s ability to accurately predict the ERA Score does not appear to be particularly high, with RMSE values exceeding 2 in several cases; in most cases, however, the system achieves a Risk Level RMSE of less than 1, or even equal to 0.
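To make the metric explicit, the Score RMSE cited above compares, posture by posture, the scores computed from the automatic angles against those computed from the manual ones; a minimal sketch with invented values:

```python
# Minimal sketch of the Score RMSE discussed above, with invented values;
# the real comparison is between scores from automatic and manual angles.
import numpy as np

def rmse(predicted, ground_truth):
    p, g = np.asarray(predicted, float), np.asarray(ground_truth, float)
    return float(np.sqrt(np.mean((p - g) ** 2)))

pred = [4, 6, 5, 9, 4]   # hypothetical REBA scores from automatic angles
true = [4, 8, 5, 9, 2]   # hypothetical REBA scores from manual angles
print(f"Score RMSE: {rmse(pred, true):.2f}")  # 1.26
```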
The other aspect that emerges from the results is the high variability of the tool’s performance depending on external conditions (e.g., spatial layout, automation level, workstation proximity, positioning of picking racks relative to the operator, operator mobility during tasks, and the ergonomic method employed). The system’s Accuracy in predicting the ERA Scores ranges from 9.09% (L3 REBA) to 60.00% (L2 REBA), while the Accuracy for the ERA Risk Level fluctuates between 40.00% (L2 RULA) and 80.00% (L2 REBA).
Although the other 2D RGB MoCap systems proposed in the literature might exhibit similar variability under the same conditions, the performance of the tested tool, in terms of both ERA Score and ERA Risk Level, is generally comparable to that of the other systems. This is likely because these tools are mostly based on the same Deep Learning models (OpenPose CMU and MediaPipe BlazePose) for body landmark detection.
The discussed limitation could be overcome by integrating fixed camera locations along the lines at design time. This would allow more than two cameras to be installed at each location, potentially improving the system’s accuracy. More studies are needed to assess how other variables (e.g., the presence of other persons in the frame, the operator’s distance from the camera, the degree of occlusion of the operator’s image, lighting conditions, the angle of the frame with respect to the frontal, sagittal, and transverse planes, and the degree of overlap between the operator’s image and those of other persons in the frame) might impact the system’s accuracy.
Furthermore, the results provided by the system were only compared with those obtained by ergonomic experts through manual analysis. Although manual analysis is the most widely used approach to ergonomic analysis to date, it has its own limitations, just as the various proposed MoCap systems do. A viable solution to overcome the current limitations of MoCap systems might be the development of a methodology based on view-invariant features that allow the 3D posture to be recognised regardless of the viewpoint from which it is observed, or the creation of a synthetic dataset of the postures deemed ergonomically unsuitable by the different ergonomic risk assessment methods.
Conclusion
This paper reports the results of an experimentation, carried out in an industrial manufacturing context, of a state-of-the-art 2D RGB MoCap system, comparing the results of its application in diverse production lines and workspaces differing from each other in layout, environmental lighting, workers’ mobility within the working spaces, and the presence of occluding obstacles around the operators.
The results of the experiments showed large variability in the tool’s accuracy as a function of the characteristics of the lines and workstations, as well as the configuration and positioning of the cameras. In general, the results obtained do not allow us to state that the considered system, or other similar systems, is sufficiently reliable to effectively support the ergonomic analysis carried out by experts to certify workstations’ Risk Levels.
However, the Accuracy level demonstrated by the system in terms of ERA Risk Level allows us to argue that such systems may be useful for continuously monitoring operators during their work activities and can represent a helpful tool to speed up ergonomists’ work. Such an approach can improve the efficiency, productivity, and competitiveness of manufacturing companies, particularly those characterised by work environments with frequent changes of operators at workstations and tasks.
To improve the accuracy of current MoCap systems, it is advisable to integrate fixed, multi-camera acquisition points at line-design time and to investigate view-invariant posture features and synthetic datasets of ergonomically unsuitable postures, as discussed above.