
Functional Safety for Braking System through ISO 26262, Operating System Security and DO 254

Implementation of safety measures is on the rise in today’s automotive world in order to minimize hazards in case of system malfunction. Today’s automobiles run various safety-critical applications such as ABS, electronic power steering, air-bag sensors, radar sensing, and other chassis-related applications. All of these safety-critical automotive operations must comply with the ISO 26262 (ASIL) and IEC 61508 (SIL) standards, as their safe operation is directly linked to human and social safety. This article presents an introduction to functional safety through ISO 26262, focusing on possible system, software, and hardware failures that pose security threats, together with a discussion of DO 254. It discusses an approach to bridge the gap between different hazard-level standards and to enable the system to identify a particular fault and resolve it within the minimum possible time span. Results are analyzed by designing models to check for and avoid failures and loopholes prior to development.

Modern aircraft systems are complex, interconnected, and essential to the safety of crew and passengers alike. When a single hardware design error can cost the lives of hundreds of people, it’s necessary to take all possible steps to prevent it from happening. Aerospace manufacturers seeking to develop mission-critical airborne electronic hardware should take a verifiable approach during the design of the product and follow a relevant formal safety standard, namely DO 254 (DAL).

According to ISO 26262, functional safety is defined as the “absence of unreasonable risk due to hazards caused by malfunctioning behavior of electrical/electronic systems” [1]. The idea of functional safety applies only to active systems. The front door lock on a house provides safety, but it does not actively avoid any failures; a door lock is an example of passive safety.

Functional safety covers an active system that has safety mechanisms in place. These mechanisms are activities or technical solutions to detect, avoid, and control failures or mitigate their harmful effects [2]. Many of them are achieved by implementing a redundant function, element, or other technology, such as the built-in sensors of an autonomous robot in a fulfillment center that detect and avoid objects while it moves large items. The safety mechanism either switches the system to, or maintains it in, a safe state (like an assembly robot that goes on standby and, if needed, shuts down when it detects an object blocking its path), or alerts the driver to take control and mitigate the effect of the failure (like an autonomous car driving on an icy road). If at any time these machines fail to perform their intended function, damage could result.

Faults in a system may occur because of hardware/software errors, permanent/transient errors, or random/systematic errors [2].

The following are the possible reactions when an error occurs:

  • Fail-dangerous: Possibly causes a hazard in the case of a failure
  • Fail-inconsistent: Provided results will be noticeably inconsistent in the case of a failure
  • Fail-stop: Completely stops itself in the case of a failure
  • Fail-safe: Returns to or stays in a safe state in the case of a failure
  • Fail-operational: Continues to work correctly in the case of a failure
  • Fail-silent: Will not disturb its environment in the case of a failure
  • Fail-indicate: Indicates to its environment that it has failed

Implementing functional safety in a system typically means “mapping” the first three reaction types above into one of the last four, which ensures that minimal hazard results from a system failure.
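As a minimal sketch, this mapping can be expressed as a classification of reaction types; the `Reaction` enum and the choice of fail-safe as the default target state are illustrative assumptions, not definitions from the standard:

```python
from enum import Enum, auto

class Reaction(Enum):
    FAIL_DANGEROUS = auto()
    FAIL_INCONSISTENT = auto()
    FAIL_STOP = auto()
    FAIL_SAFE = auto()
    FAIL_OPERATIONAL = auto()
    FAIL_SILENT = auto()
    FAIL_INDICATE = auto()

# The first three reactions must be designed out; the last four are acceptable.
UNSAFE = {Reaction.FAIL_DANGEROUS, Reaction.FAIL_INCONSISTENT, Reaction.FAIL_STOP}

def apply_safety_mechanism(reaction: Reaction) -> Reaction:
    """Map an unsafe reaction to a safe one (fail-safe chosen as the
    illustrative target); acceptable reactions pass through unchanged."""
    return Reaction.FAIL_SAFE if reaction in UNSAFE else reaction
```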

II. MAPPING OF OTHER HAZARD LEVEL STANDARDS AND F.I.R CONCEPT

A. Comparison with other Hazard Level Standards

Automotive Safety Integrity Level (ASIL) is a risk classification scheme defined by the ISO 26262 – Functional Safety for Road Vehicles standard. It is an adaptation of the Safety Integrity Level used in IEC 61508 for the automotive industry. This classification helps define the safety requirements necessary to comply with the ISO 26262 standard. The ASIL is established by performing a risk analysis of a potential hazard, looking at the Severity, Exposure, and Controllability of the vehicle operating scenario. The safety goal for that hazard in turn carries the ASIL requirements. The standard identifies four ASILs: ASIL A, ASIL B, ASIL C, and ASIL D. ASIL D dictates the highest integrity requirements on the product, and ASIL A the lowest. Hazards classified as QM (quality management) do not dictate any safety requirements.

ASIL = Severity x (Exposure x Controllability)
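The risk graph behind this formula is defined as a lookup table in ISO 26262-3; a common shorthand is that the table is equivalent to summing the S/E/C class indices. A minimal sketch under that assumption (function name and error handling are our own):

```python
def determine_asil(s: int, e: int, c: int) -> str:
    """Determine the ASIL from Severity (S1-S3), Exposure (E1-E4), and
    Controllability (C1-C3) class indices, per the ISO 26262-3 risk graph.
    The standard's lookup table corresponds to summing the indices:
    7 -> ASIL A, 8 -> B, 9 -> C, 10 -> D, anything lower -> QM."""
    if not (1 <= s <= 3 and 1 <= e <= 4 and 1 <= c <= 3):
        raise ValueError("S/E/C class index out of range")
    return {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}.get(s + e + c, "QM")
```

For example, only the worst case of severity, exposure, and controllability (S3, E4, C3) yields ASIL D.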

TABLE I. Approximate cross-domain mapping of ASIL

Given that ASIL is a relatively recent development, discussions of ASIL often compare its levels to those defined in other well-established safety or quality management systems. In particular, the ASIL is compared to the SIL risk-reduction levels defined in IEC 61508 and the Design Assurance Levels (DAL) used in the context of DO-178C and DO-254.

 

While there are some similarities, it is important to also understand the differences [3].

B. F.I.R Concept in Functional Safety and Model Analysis

Failure Analysis is the process of collecting and analyzing data to determine the cause of a failure, often with the goal of determining corrective actions or liability.

 

F.I.R concept

Failure: The loss of a function under stated conditions.

Identify: The means or method by which a failure is detected or isolated and the time it may take.

Resolve: Respond to the failure and return to normal operation.

The F.I.R concept comprises quantitative evaluations, such as failure mode, effects, and diagnostic analysis (FMEDA) and timing analysis, and qualitative assessments, such as dependent failure analysis (DFA) [4].

A machine running in a safe state implements safety mechanisms that can detect faults occurring during operation and prevent failures from manifesting. An image not shown correctly on the display, a sound not played, the brakes not applied, or the vehicle accelerating at the wrong speed are all examples of failures to be diagnosed.
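The F.I.R flow can be sketched as follows; the brake-pressure plausibility check, the value range, and the time budget `FTTI_MS` are illustrative assumptions of ours, not taken from the standard or the models:

```python
import time

FTTI_MS = 50  # assumed fault-tolerant time budget for the reaction (illustrative)

class BrakeMonitor:
    """F.I.R sketch: a fault occurs (Failure), a plausibility check
    detects it (Identify), and the system enters a safe state (Resolve)."""
    def __init__(self):
        self.state = "NORMAL"

    def read_pressure(self, raw: float) -> float:
        t0 = time.monotonic()
        if not (0.0 <= raw <= 250.0):       # Identify: out-of-range reading
            self.resolve()                   # Resolve: transition to safe state
            elapsed_ms = (time.monotonic() - t0) * 1000.0
            assert elapsed_ms < FTTI_MS      # reaction must fit the time budget
            return 0.0                       # substitute a safe value
        return raw

    def resolve(self):
        self.state = "SAFE_STATE"
```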

III. TYPES OF ANALYSIS AND EXPERIMENTS

Hardware and software complexity is expected to grow by at least a factor of 20 in the next few years, bringing a correspondingly higher risk of failure, and connectivity brings security threats:

  • Hardware failure
  • Unintended functionality
  • Security threats
  • Software bugs
  • Fault-free function

According to ISO 26262, different failures are categorized into hardware, software, network, RTOS, and power.

Hardware Failure: Loss of processing cores, limited storage, reduced or lost memory, device or bus overload, incorrect signals, and contention over shared and exclusive hardware resources, memory, and bus interfaces [5].

Software Failure: Resource starvation, deadlocks, data overwrites, inconsistent data, stack overflow and underflow, blocking of execution, and blocked access to a communication channel [6].

Network Failure: Network congestion, message corruption, unintended message repetition, message loss, incorrect sequencing of messages, overloading of a network [6].

RTOS Failure: Inability to meet real-time deadlines, malicious changes to the schedule table, and execution beyond allotted time slots.

Power Failure: Both reduced-power and full power failures: slower processing speed, a limited number of resources able to execute concurrently, power spikes, under- and over-voltage, drift, and oscillation.

By using a system modeling tool, we can assemble a virtual prototype very quickly on a graphical discrete-event simulation platform with a large library of hardware and software modeling components [7].

The prototype is used to test the architecture against standards, identify unrecoverable faults in the system, provide early feedback, and evaluate timing, throughput, power-consumption, and quality-of-service trade-offs.

The model generates the failures, tests the behavior of the system, and reports the outcome in a spreadsheet or graph format that matches the requirements of the standard.

A. Hardware Failure

In Fig. 1, packets (tasks) generated by three traffic blocks are mapped to Resources (CPUs) 1, 2, and 3 for processing. Two failure scenarios are integrated into this model:

  1. Resource Unavailable: An error is generated if a task is allocated to a resource that has no memory left to handle it. For example, if Resource 1 has a buffer length of 30 and the buffer is full, it cannot accept a new packet until the outstanding packets are processed.
  2. Resource Failure: If one of the resources fails, the load must be balanced among the remaining resources.

The analysis for this model shows the resulting increase in task latency and buffer usage.
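The two failure scenarios can be sketched in a few lines of Python; the `Resource` class and the least-loaded-first dispatch policy are our own simplifications of the VisualSim model, not its actual implementation:

```python
from collections import deque

BUFFER_LEN = 30  # matches the Resource 1 buffer length in the example above

class Resource:
    def __init__(self, name):
        self.name, self.queue, self.failed = name, deque(), False

    def accept(self, task) -> bool:
        # Resource Unavailable: reject when failed or when the buffer is full
        if self.failed or len(self.queue) >= BUFFER_LEN:
            return False
        self.queue.append(task)
        return True

def dispatch(task, resources):
    """Try the least-loaded resource first; on rejection (failure or
    full buffer), rebalance the load to the remaining resources."""
    for r in sorted(resources, key=lambda r: len(r.queue)):
        if r.accept(task):
            return r.name
    return None  # unrecoverable: every resource is full or failed

cpus = [Resource(f"CPU{i}") for i in (1, 2, 3)]
cpus[0].failed = True  # inject the Resource Failure scenario on CPU 1
placed = [dispatch(t, cpus) for t in range(10)]
```

With CPU 1 failed, all ten tasks land on CPUs 2 and 3, mirroring the load-balancing behavior the model analyzes.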

Fig. 1. VisualSim Model of Hardware Failure.
Fig. 2. Task latency while executing on three resources (CPUs).
Fig. 3. When one core (CPU) fails.

B. Software failure

In Fig. 4, packets (tasks) generated by four traffic blocks are sent to the Queue, from which tasks are loaded into the ECU for processing. The ECU takes the higher-priority packets from the Queue; when more tasks arrive and the Queue is full, the Queue rejects lower-priority packets. Fig. 5 shows the output from the Queue with higher-priority tasks allocated to the ECU, and Fig. 6 shows the output from the Queue rejecting lower-priority tasks. Over time, the priority of a lower-priority packet is increased, and it eventually gets the ECU resources.

The model represents starvation, or indefinite blocking, a phenomenon associated with priority scheduling algorithms in which a process that is ready to run can wait indefinitely for the CPU because of its low priority. In a heavily loaded computer system, a steady stream of higher-priority processes can prevent a low-priority process from ever getting the CPU. To overcome starvation, aging gradually increases the priority of processes that have waited in the system for a long time.
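The priority-scheduling-with-aging behavior can be sketched as follows; the task names, priority values, and `AGING_STEP` are illustrative assumptions:

```python
import heapq

AGING_STEP = 1  # priority boost per scheduling round for each waiting task

def schedule(tasks, rounds):
    """Priority scheduling with aging: each round the highest-priority
    task runs; every task still waiting gains AGING_STEP priority, so a
    low-priority task cannot be starved indefinitely."""
    # heapq is a min-heap, so priorities are stored negated
    heap = [(-p, name) for name, p in tasks]
    heapq.heapify(heap)
    executed = []
    for _ in range(rounds):
        if not heap:
            break
        _, name = heapq.heappop(heap)
        executed.append(name)
        # age everything still waiting in the queue
        heap = [(p - AGING_STEP, n) for p, n in heap]
        heapq.heapify(heap)
    return executed

order = schedule([("airbag", 9), ("abs", 8), ("infotainment", 1)], rounds=3)
```

Even though the infotainment task starts with the lowest priority, aging guarantees it is eventually scheduled rather than starved.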

Fig. 4. VisualSim Model of Software Failure.
Fig. 5. Higher-priority packets allocated to the ECU.
Fig. 6. Lower-priority packets getting rejected.

C. RTOS failure

In Fig. 7, packets (tasks) generated by three traffic blocks are assigned to unique time slots, from which the tasks running in different slots are loaded into the CPU for processing.

If the execution time of tasks in a particular slot exceeds the slot time more times than a threshold value, that slot is disabled, and the tasks allocated to it are processed only after the restart time (a given value).

The analysis on this model is a latency calculation for all the tasks running in each of the slots, shown in Fig. 8. Information on disabled slots is printed to the console window, as shown in Fig. 9.
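The slot-disabling rule can be sketched as follows; `THRESHOLD` and `RESTART_TIME` stand in for the model's configured values and are illustrative assumptions:

```python
class Slot:
    """Time slot that is disabled after overrunning its deadline more
    than THRESHOLD times; tasks resume only after RESTART_TIME elapses."""
    THRESHOLD = 3
    RESTART_TIME = 100.0  # assumed restart delay (model parameter)

    def __init__(self, slot_time):
        self.slot_time = slot_time
        self.overruns = 0
        self.disabled_until = None

    def run_task(self, exec_time, now):
        if self.disabled_until is not None and now < self.disabled_until:
            return "rejected"            # slot is disabled; task is deferred
        if exec_time > self.slot_time:   # timing deadline exceeded
            self.overruns += 1
            if self.overruns > self.THRESHOLD:
                self.disabled_until = now + self.RESTART_TIME
                return "slot disabled"
            return "deadline missed"
        return "ok"
```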

Fig. 7. VisualSim Model for RTOS Failure.
Fig. 8. Latency of the tasks running in different slots.
Fig. 9. Slot disabled due to excessive task execution time.

D. Network failure

In Fig. 10, node1 (sender) and node2 (sender) send data packets to node3 (hop) and node4 (hop), which forward them to node5 (receiver) and node6 (receiver). The network_failure module injects failures into the model to analyze the different variations that occur dynamically during the simulation run. Fig. 11 shows two failures: message loss and incorrect message addressing.
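The failure injection can be sketched as a hop node that randomly drops or misaddresses packets; the failure rates, destination node names, and the fixed random seed are illustrative assumptions, not values from the VisualSim model:

```python
import random

def forward(packets, loss_rate=0.2, misaddress_rate=0.1, rng=None):
    """Hop-node sketch: inject message loss and incorrect addressing,
    tallying each failure type for later analysis."""
    rng = rng or random.Random(42)  # seeded for a reproducible run
    delivered, stats = [], {"lost": 0, "misaddressed": 0}
    for pkt in packets:
        r = rng.random()
        if r < loss_rate:
            stats["lost"] += 1             # message loss: packet dropped
        elif r < loss_rate + misaddress_rate:
            stats["misaddressed"] += 1     # incorrect addressing: wrong receiver
            wrong = "node6" if pkt["dst"] == "node5" else "node5"
            delivered.append({**pkt, "dst": wrong})
        else:
            delivered.append(pkt)          # nominal forwarding
    return delivered, stats

pkts = [{"id": i, "dst": "node5"} for i in range(100)]
out, stats = forward(pkts)
```

Counting dropped and misaddressed packets per run yields exactly the kind of analysis graphed in Fig. 11.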

Fig. 10. VisualSim Model for Network Failure.
Fig. 11. Analysis of message loss and incorrect message addressing.

E. Power failure

This model represents the power consumed by the processor (System Resource), captured using a power table. The failures associated with this model are a reduction in the amount of available power and a lowered battery life cycle, as shown in Fig. 14, along with a reduction in the additional power available for peak loading and slower charging.
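A minimal sketch of the battery-degradation and under-/over-voltage checks described above; the capacity, degradation rate per cycle, and voltage window are illustrative assumptions:

```python
class Battery:
    """Battery sketch: capacity degrades with each charge cycle, and a
    voltage window check flags under- and over-voltage power failures."""
    V_MIN, V_MAX = 10.5, 14.8  # assumed acceptable voltage window

    def __init__(self, capacity_wh=1000.0, degradation=0.02):
        self.capacity_wh = capacity_wh
        self.degradation = degradation  # assumed 2% capacity loss per cycle
        self.cycles = 0

    def charge_cycle(self) -> float:
        """Complete one charge cycle and return the degraded capacity."""
        self.cycles += 1
        self.capacity_wh *= (1.0 - self.degradation)
        return self.capacity_wh

    @classmethod
    def voltage_fault(cls, v: float) -> bool:
        """Flag a power failure when the voltage leaves the safe window."""
        return not (cls.V_MIN <= v <= cls.V_MAX)

b = Battery()
remaining = [b.charge_cycle() for _ in range(3)]  # capacity after each cycle
```

Plotting `remaining` over many cycles produces a curve of the same shape as the battery-life display in Fig. 14.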

Fig. 12. VisualSim Model for Power Failure.
Fig. 13. Latency of tasks with priority.
Fig. 14. Battery life remaining after a given percentage of reduction.

Most modern automobiles are equipped with embedded electronic systems that include many Electronic Control Units (ECUs), electronic sensors, signals, bus systems, and the relevant software. These embedded systems have grown over the years, and that growth has paved the way for autonomously driven cars. Some of the sensors used in autonomously driven cars are radar sensors, video cameras, lidar sensors, and ultrasonic sensors.

The sensors constantly transmit data about the functioning of the different components in the car and also about the external environment. Advanced software designed for autonomously driven cars then processes the inputs from the various sensors. Later, instructions are sent to the actuators of the car to control acceleration, braking, and steering.

The complexity involved in electrical, electronics, and programmable electronics components is huge. Hence, it is essential to analyze the potential risks and prevent malfunctioning of automotive systems, more so in the autonomously driven cars [7].

In order to observe failures in autonomously driven cars and suggest solutions to prevent them, we designed a braking-system module using a combination of the different failures discussed above (hardware, software, network, power, and RTOS) on a larger scale.

Integrating the above failures into a single module helps analyze the whole system, the failures that can be generated, and the possible ways to recover from them.

Fig. 15. Typical Autonomous Vehicle Control Block.

IV. BRAKING SYSTEM MODULE AND OUTCOMES

The model represents a miniature car braking system, which contains four wheels, a proximity sensor, a gyro sensor, an engine, a brake electronic control unit (ECU), a brake pedal, a road-condition sensor, and a CAN bus network connected via a CAN_ether_switch.

Each packet is sent to the ECU via the CAN network, using the CAN_ether_switch to transfer the packets.

Each brake performs its task locally (sensor processing) before the ECU handles vehicle control. The action engine in Fig. 15 represents the output from the ECU to the appropriate engine block. The output from the ECU after processing is sent to the different wheels and the engine, as shown in Fig. 16. The map represents local access to the database block, as shown in Fig. 15. Note that a road-condition sensor is used to detect anomalies on the road surface, representing failures covered by safety of the intended functionality (SOTIF).

In this model, failures are induced during the simulation run: for example, a core fails, the power spikes too high, or incorrect signals are interpreted by the sensors. A bug in the code, a “missing” requirement, corruption of a variable during operation, a clock signal not being generated, a CAN message not being transmitted, and a bit flip are each the final consequence of a fault in which a functionality does not work as required. Failures cover a very broad spectrum of abnormalities, and the failure of such a system could have a significant impact on the safety of humans and/or the environment.

The results of the analyses are provided below.

Fig. 17 depicts the latencies of all four wheels, the average power consumed by the ECU to process the data, the charge/discharge voltage curve, and the heat display shown during the simulation run. The text display outputs the correct and incorrect values read by the wheels in different simulation runs.

Fig. 18 depicts the different speed values and the expected versus obtained road values for different road states.

Fig. 16. VisualSim Model of the Braking System.
Fig. 17. Brake latency, correct and incorrect data read, and average power during the run.
Fig. 18. Speed value and expected versus obtained road values.

V. CONCLUSION

The principal focus of our conceptual design-verification activities is formal demonstration that the failure handling at different levels is correct. Subsequent design and verification activities will focus on preserving the implementation integrity of the verified algorithms.

Although failure avoidance should be the first and most relevant step, experience shows that people repeatedly struggle with it simply because their processes won’t support it. To effectively avoid defects, one must define, systematically apply, and quantitatively manage the approach.

By completing a gap analysis between DO-254 and the automotive ISO 26262, the proposed approach is to map the DO-254 avionics safety requirements to corresponding artifacts from automotive ISO 26262 certification, thereby leveraging automotive certification efforts toward a flight-safety evidence package. The avionics safety standards do not prescribe the specific requirements and work products needed to achieve flight-safety certification of components. The focus is on avoidance of catastrophic events by ensuring correct execution (integrity) and continuous operation (availability) in critical situations.

REFERENCES

[1] “ISO 26262 Road vehicles – Functional Safety,” ed: International Organization for Standardization, 2018.

[2] C. Ebert, “Implementing Functional Safety”, IEEE Software, vol. 32, no. 5, pp. 84-89, 2015.

[3] A. Ismail and W. Jung, “Research Trends in Automotive Functional Safety”, 2013 International Conference on Quality Reliability Risk Maintenance and Safety Engineering (QR2MSE), pp. 1-4, 2013.

[4] M. Hillenbrand, M. Heinz, N. Adler, J. Matheis, and K. D. Muller- Glaser, “Failure mode and effect analysis based on electric and electronic architectures of vehicles to support the safety lifecycle ISO/DIS 26262,” In Proceedings of Rapid System Prototyping (RSP), 2010 21st IEEE International Symposium, pp. 1-7, 2010.

[5] “ISO 26262 Road vehicles – Functional safety – Part 5: Product development at the hardware level,” ed: International Organization for Standardization, 2018.

[6] “ISO 26262 Road vehicles – Functional safety – Part 6: Product development at the software level,” ed: International Organization for Standardization, 2018.

[7] Mirabilis Design, “System Modeling and Architecture Exploration.” Internet: https://www.mirabilisdesign.com/getting-started/

[8] Christof Ebert and John Favaro, “Automotive Software”, IEEE Software, vol. 34, no. 3, pp. 33-39, 2017.

[9] Miner, P. S., V. A. Carreño, M. Malekpour and W. Torres, 2000, ‘A Case-study Application of RTCA DO-254: Design Assurance Guidance for Airborne Electronic Hardware,’ Proc. 19th Digital Avionics Systems Conf., Philadelphia, Pennsylvania, pp. 1.A.1-1-8.

[10] Karlsson K., H. Forsberg, Emerging Verification Methods for Complex Hardware in Avionics, Proc. DASC ’05, 24th Digital Avionics Systems Conference, Washington, DC, Oct.-30-Nov. 3, 2005, Vol.1, pp. 6.B.1-1/11.

[11] Andrew Kornecki, Janusz Zalewski, “Software certification for safety-critical systems: A status report”, Computer Science and Information Technology 2008. IMCSIT 2008. International Multiconference on, pp. 665-672, 2008.

[12] D. D. Ward and S. E. Crozier, “The uses and abuses of ASIL decomposition in ISO 26262,” In Proceedings System Safety, incorporating the Cyber Security Conference 7th IET International Conference, pp. 1-18, 2012.

[13] “IEC 61508: Functional safety of electrical/electronic/ programmable electronic safety-related systems”, International Electro-technical Commission IEC, 2010.

Authors: Mohini Yadav, Deepak Shankar, and Tom Jose