# Automotive embedded software architecture in the multi-core age

Paolo Gai, Evidence, Pisa, Italy

Massimo Violante, Politecnico di Torino, Torino, Italy

#### The big picture



- **Space Shuttle**: ~500,000 LOC
- **Boeing 777**: ~3 million LOC
- **High-end vehicle**: ~100 million LOC

Ref: Christopher Davey, MBSE Workshop 2013

"The car of the future will be the most powerful computer you will ever own" (The Telegraph)

https://www.wired.com/2012/12/automotive-os-war/


## The big picture

- New features are challenging from the computing point of view
  - Pedestrian detection
  - Autonomous emergency braking
  - Autonomous driving
  - …
- Evolved traditional features are also computationally demanding
  - Powertrain
  - Braking





#### An example taken from powertrain



- Today: huge calibration effort needed to define the look-up table (LUT) content. "Simple" control strategy with little processing and many LUT accesses.
- Tomorrow: little calibration effort. "Complex" control strategy with significant processing $\rightarrow$ combustion model run in real time!
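The contrast between the two strategies can be illustrated with a minimal Python sketch. The breakpoints, table values, and the `toy_model` dynamics are hypothetical placeholders chosen for illustration; they are not calibration data and not a real combustion model.

```python
from bisect import bisect_right

# Hypothetical calibration data: engine speed breakpoints [rpm] and
# the calibrated output at each breakpoint (arbitrary units).
RPM_AXIS = [1000.0, 2000.0, 3000.0, 4000.0]
LUT_VALUES = [10.0, 14.0, 20.0, 22.0]


def lut_lookup(rpm: float) -> float:
    """'Today': one table access plus a linear interpolation."""
    # Clamp to the table range.
    if rpm <= RPM_AXIS[0]:
        return LUT_VALUES[0]
    if rpm >= RPM_AXIS[-1]:
        return LUT_VALUES[-1]
    hi = bisect_right(RPM_AXIS, rpm)   # index of first breakpoint > rpm
    lo = hi - 1
    frac = (rpm - RPM_AXIS[lo]) / (RPM_AXIS[hi] - RPM_AXIS[lo])
    return LUT_VALUES[lo] + frac * (LUT_VALUES[hi] - LUT_VALUES[lo])


def toy_model(rpm: float, steps: int = 1000) -> float:
    """'Tomorrow': a stand-in for a model evaluated at run time.

    Numerically integrates a made-up rate function; the point is only
    that the cost is dominated by arithmetic, not by table accesses.
    """
    dt = 1.0 / steps
    state = 0.0
    for i in range(steps):
        t = i * dt
        state += (rpm / 1000.0) * (1.0 - t) * dt   # placeholder dynamics
    return state
```

The LUT path costs a search and a handful of operations per call, while the model path scales with the number of integration steps, which is where the extra computing demand comes from.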



### Designer wish list

- Computing capabilities for demanding vehicle functions
- Platforms for consolidating multiple applications on a single device
- Effort-free safety for critical vehicle functions
- Solutions to maximize reusability and scalability





Multi-core is most likely the computing platform of choice.



#### A reference use-case for the tutorial

Typical Electronic Control Unit (ECU)



### The enabling technologies

- Multi-core architectures
  - Computing can be distributed among several cores
  - Same chip can accommodate multiple functions
- AUTOSAR
  - Functions can be made independent from the execution platform
- Virtualization
  - Functions can be segregated onto dedicated resources to avoid interference





#### Outline

- Multicore architectures
- AUTOSAR
- Virtualization
- A use case










## Multicore architectures (1)

Embedded systems used in automotive changed over the years:

- 1985 Isolated embedded architectures
- 1995 Distributed architectures over CAN bus
- 2005 Integrated architectures based on AUTOSAR
- 2015 Distributed architectures based on Multicore AUTOSAR + Infotainment solutions
- 2025 Zonal architectures





### Multicore architectures (2)

- Nowadays we can identify three classes of processors in the automotive market:
  - System-On-Chip for control applications
  - Microprocessors for graphical applications
  - High-performance chips for ADAS applications





## SoC for control applications (1)

- From low-end microcontrollers up to complex multicores
- Static workloads (often based on OSEK/AUTOSAR)
- Mostly single-core; multicores are used
  - for high-performance applications
  - for integrating more applications on the same chip





## SoC for control applications (2)

- Heterogeneous $\rightarrow$ two cores with "similar" ISA
- NXP MPC5668G Fado hosting a PPC z6 and a PPC z0
- 2<sup>nd</sup> CPU dedicated to different subsystems (peripherals)
- Uniform memory address space across cores
- Safety → lockstep configuration
  - As an example, NXP MPC5643L Leopard, Infineon AURIX TriCore
- RISC + MCU + DSP
  - As an example, Infineon TriCore
- No external RAM and Flash
- Extended debug support with ETM macrocells



## SoC for control applications (3)

- High number of peripherals
  - often devoted to control and timing (ADC, PWM, encoders, timers, ...)
  - lack of traditional desktop interfaces (no USB / video / ...)
  - communication buses: CAN/FlexRay/LIN, recently Ethernet
- Complex co-processors (PowerPC eTPU, Bosch GTM)
  - used to perform complex real-time-related features
- Cryptography subsystems isolated from main CPU
  - Infineon Tricore HSM
- Scratchpad memories





#### MPU, not MMU

Trend towards MPU, no MMU:

- NXP MPC5674F Mamba has an MMU
- NXP MPC5777M Matterhorn has only the MPU



#### Hypervisor extensions with MPU

- Example: Cortex R52
- Hypervisor extensions coupled with MPU
- Multicore and Safety together
- Use cases:
  - Integration of different legacy applications
  - Safety/Security VMs







#### RISC V on the rise

- Open source architecture
- Multicore, Hypervisor support





Microchip PolarFire FPGA with hardcore RISC-V



## RISC V on the rise (2)

Figure: overview of the PULP platform. RISC-V cores: RI5CY, Micro-riscy, and Zero-riscy (32-bit), Ariane (64-bit). Peripherals: JTAG, SPI, UART, I2S, DMA, GPIO. Interconnect: logarithmic interconnect, APB peripheral bus, AXI4 interconnect. Platforms range from single-core to multi-core and multi-cluster systems, with optional hardware accelerators.

https://pulp-platform.org/





## Microprocessors for graphical apps

- Typically for infotainment subsystems
  - general purpose microprocessors Intel/ARM/MIPS cores
  - In the past hosted commercial operating systems VxWorks, QNX, Windows CE
  - Support for USB/Ethernet/Wifi/...
- Evolution towards low-power consumption
  - migration to multi-cores
  - video acceleration (video decoders, GPU for OpenGL)
- Evolution trend
  - move to Linux (see Genivi/Tizen, now dead)





#### Microprocessors and small cores

Integration of small microcontrollers (Cortex M/R) for hard real-time / safety features

- TI Jacinto
- NXP i.MX 6SoloX
- Xilinx UltraScale+



## High-performance processors (1)

- Used for ADAS Applications
  - Video processing / Image recognition
  - Intel / 64 bit ARM / VLIW
  - Multi / Manycore
  - Programmable GPUs
  - Hypervisor extensions
  - Ethernet in place of slower Buses





## High-performance processors (2)

#### Nvidia X1/Parker/Xavier/Orin

| NVIDIA ARM SoCs       | Xavier                    | Parker                                   | Erista (Tegra X1)                        |
|-----------------------|---------------------------|------------------------------------------|------------------------------------------|
| CPU                   | 8x NVIDIA Custom ARM      | 2x NVIDIA Denver + 4x ARM Cortex-A57     | 4x ARM Cortex-A57 + 4x ARM Cortex-A53    |
| GPU                   | Volta, 512 CUDA Cores     | Pascal, 256 CUDA Cores                   | Maxwell, 256 CUDA Cores                  |
| Memory                | ?                         | LPDDR4, 128-bit Bus                      | LPDDR3, 64-bit Bus                       |
| Video Processing      | 7680x4320 Encode & Decode | 3840x2160p60 Decode, 3840x2160p60 Encode | 3840x2160p60 Decode, 3840x2160p30 Encode |
| Transistors           | 7B                        | ?                                        | ?                                        |
| Manufacturing Process | TSMC 16nm FinFET+         | TSMC 16nm FinFET+                        | TSMC 20nm Planar                         |





### Nvidia Volta

NVIDIA Tesla V100 specifications:

- 7.5 TeraFLOPS (double precision)
- 15 TeraFLOPS (single precision)
- 120 TeraFLOPS (deep learning)
- 300 GB/s (NVLink)
- 900 GB/s (memory bandwidth)
- 300 W (power consumption)









Benchmark chart notes: CPU server: dual Xeon E5-2699 v4, 2.6 GHz | GPU servers add 8x Tesla K80, Tesla P100 or Tesla V100 | V100 measured on pre-production hardware | Workload: NMT, 13 epochs to solution.


## High-performance processors (2)

#### Renesas RCAR H3

ARM64 Big-Little + Cortex R + Graphics processing





## High-performance processors (3)










- Heterogeneous acceleration
- For any application
- For any developer

#### Adaptable Architecture Connected Via NoC

- Scalar Engines
  - Arm® Cortex™-A72 APU
  - Arm Cortex-R5 RPU
- Adaptable Engines
  - CLBs
  - Internal memory
- Intelligent Engines
  - AI Engine
  - DSP Engine
- Connectivity
  - PCIe w/CCIX
  - Ethernet
  - DDR memory controllers
  - Transceivers
  - I/O
- Platform Resources
  - Network-on-Chip
  - Platform Management Controller




#### Scalar Engines in the Arm Processing System

- Dual-core Arm® Cortex™-A72 application processors
  - Up to 1.7 GHz for 2X single-threaded performance
  - Cost and power optimized (half the power)
  - Code compatibility (ARMv8-A architecture)
  - Device boots without a bitstream
- Dual-core Arm® Cortex™-R5 real-time processors
  - Up to 750 MHz for 1.4X greater performance
  - Low latency and deterministic
  - Flexible operation modes: split mode and lock-step
  - Highest levels of functional safety (ASIL and SIL)

© Copyright 2018 Xilinx

### Issues when integrating multicores

- Two categories of automotive applications
  - Real-time applications (powertrain, body and chassis)
  - Non-real-time applications (infotainment)
- Traditionally, a one-to-one mapping exists between application (either real-time or non-real-time) and processor
- With multi-cores, different applications will coexist
  - Temporal interference





### **Temporal interference**

- The execution time of a task varies depending on the interference received from other tasks in the same chip
  - Caches, DRAM memories, scratchpad memories
  - Parallel usage of the same resource by different actors
- Temporal interference makes the partitioning of the application functionalities into cores very difficult
  - 20-30% overhead when moving to dual cores!
- PREM (PRedictable Execution Model) techniques can limit interference
  - They require changes to the application source code





### PREM and scratchpad memories

Example: PREM technique implemented using scratchpad memories

Figure: execution timelines of a legacy task and a PREM task (legend: cache hit/computation, cache miss).

#### Predictable interval

- Memory prefetching in the first phase
- No cache misses in the execution phase
- Non-preemptive execution
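The two-phase structure of a predictable interval can be sketched in Python as follows. This is only a conceptual model under stated assumptions: `SharedMemory` and `prem_task` are hypothetical names, a Python list stands in for the scratchpad, and non-preemptive execution is assumed rather than modelled.

```python
class SharedMemory:
    """Stand-in for off-core memory; counts every access."""

    def __init__(self, data):
        self._data = list(data)
        self.accesses = 0

    def read(self, index):
        self.accesses += 1
        return self._data[index]


def prem_task(memory, indices):
    """Run one PREM-style interval over the given working set.

    Returns (result, accesses_in_memory_phase, accesses_in_execution_phase).
    """
    # Memory phase: copy the whole working set into a local buffer
    # (playing the role of the scratchpad).
    before = memory.accesses
    scratchpad = [memory.read(i) for i in indices]
    memory_phase = memory.accesses - before

    # Execution phase: compute on the local copy only, so no shared
    # memory access (and hence no cache miss) can occur here.
    before = memory.accesses
    result = sum(x * x for x in scratchpad)
    execution_phase = memory.accesses - before

    return result, memory_phase, execution_phase
```

Because all shared-memory traffic is confined to the first phase, a system-level scheduler could serialize the memory phases of different cores while letting their execution phases overlap.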



#### Core-level Memory Interference Drive PX2

Figure: sequential read with sequential interference; latency [ns] versus working set size (WSS) [B].

From: Capodieci, Cavicchioli, Bertogna @ IEEE ETFA 2017



#### **Combined Interference Drive PX2**

Figure: combined interference, plotted against working set size (WSS) [B].

From: Capodieci, Cavicchioli, Bertogna @ IEEE ETFA 2017



#### Considering our use case

Multicores can be exploited to consolidate vehicle functions A and B on the same chip

#### Problems:

- How can we port software from multi-chip to multi-core?
- How can multiple operating systems coexist?
- How can we guarantee isolation of tasks?








### Automotive software architectures

- The automotive market has gone through waves of standardization
  - 90s OSEK/VDX
  - 2004-2014 AUTOSAR Classic
  - 2015- AUTOSAR Adaptive
- Main ideas
  - Standardize features
  - Decouple application from execution platform
  - Create a market of competitors
  - Lower the costs





# 90s – OSEK/VDX



- OSEK/VDX is an effort to standardize the RTOS for 8/16/32-bit microcontrollers
  - Static approach (configured with the OIL language)
  - Real time features
    - Fixed Priority with immediate Priority ceiling
    - Stack sharing between tasks
    - Debugger Awareness through the ORTI standard
    - Communication infrastructure (OSEK COM)
  - Single core
  - 2-6 KB flash footprint
  - Certification procedure
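The "fixed priority with immediate priority ceiling" rule listed above can be sketched in a few lines of Python. The classes and the functions `get_resource` and `release_resource` are hypothetical illustrations that mirror the idea, not the OSEK API; a larger number is assumed to mean a higher priority.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority      # current (dynamic) priority
        self._saved = []              # priorities to restore, LIFO


class Resource:
    def __init__(self, name, users):
        self.name = name
        # Ceiling = highest base priority among tasks that may lock it.
        self.ceiling = max(t.base_priority for t in users)


def get_resource(task, resource):
    """Raise the task to the resource ceiling immediately on lock."""
    task._saved.append(task.priority)
    task.priority = max(task.priority, resource.ceiling)


def release_resource(task, resource):
    """Restore the priority held before the matching get_resource."""
    task.priority = task._saved.pop()
```

Since a task holding a resource already runs at the ceiling, no other user of that resource can preempt it. This bounds blocking and is also what makes stack sharing between tasks feasible.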





# From OSEK to AUTOSAR

- OSEK allowed the description of ECUs interconnected through a communication bus
- In modern cars $\rightarrow$ tens of interconnected ECUs
- The need shifts from the integration of hardware boxes to the integration of features

AUTOSAR gives an answer to this need, providing

- A SW component model to ease the integration
- A standardized basic software implementation



#### **AUTOSAR Classic Architecture**






#### AUTOSAR Classic VFB & RTE



### AUTOSAR important concepts

- AUTOSAR Components
  - Communicate through ports
  - Independent from their placement
- AUTOSAR OS
  - Extension of OSEK/VDX
- Basic Software and MCAL
  - Driver, Diagnostics and self test, ...
- Supported by tools
  - Using a common AUTOSAR XML format





### AUTOSAR OS Main concepts

- OS Applications:
  - containers of tasks used for memory protection and multicore partitioning
- Multicore support
  - Static partitioning of tasks to cores
  - (non-FIFO) Spinlocks and remote activations
- Memory protection supporting MPU
- Timing protection and Stack Monitoring





ISO26262

# **AUTOSAR critiques**

- Limited support for non-functional specifications
  - Requirements / Execution time / ...
  - Enhancements through the AMALTHEA project / EAST-ADL
- Complexity
  - Creates barriers to the entry of new players
  - The market is mostly taken by 2-3 players
- Limited support for open-source ... but ...
  - ERIKA Enterprise http://www.erika-enterprise.com
  - COMASSO http://www.comasso.org





# Something about ERIKA Enterprise



http://www.erika-enterprise.com

- ERIKA Enterprise is an OSEK/VDX-certified RTOS
- ERIKA Enterprise implements the AUTOSAR OS API
- Released under an open-source license that allows static linking of closed-source code (GPL + Linking Exception)
- Typical footprint around 2-4 KB of Flash
- Used in automotive applications and research projects
- ERIKA3 now supports various automotive CPUs









# Future of AUTOSAR

Future support for ADAS, automatic driving, Car2X

- high performance, dependable systems, distributed diagnostics
- cloud integration / support for non-AUTOSAR systems
- Adaptive AUTOSAR, based on POSIX PSE51
- coexistence in the same multicore system of both AUTOSAR Classic and AUTOSAR Adaptive
  - Thanks to the virtualization support





#### Considering our use case

- Software shall be redesigned according to the AUTOSAR principles
  - Software components are responsible for implementing the needed functionalities
  - The software components are execution-platform agnostic
- AUTOSAR deployment tools are used to map software components to the cores
- Problems:
  - How can we port software from multi-chip to multi-core?
  - How can multiple operating systems coexist?
  - How can we guarantee isolation of tasks?




#### Virtualization

- Virtualization is a technology to abstract a hardware platform into virtual machines (VMs)
  - A VM uses a sub-set of the available hardware
  - VMs running on the same hardware are not aware that they are sharing the platform with other VMs
- Old technology
  - Introduced in the 1960s on mainframes
  - Rediscovered in 2000s for embedded systems





# **Type-1 Hypervisors**

- In embedded systems, virtualization is done using type-1 hypervisors
  - Based on a microkernel → easy to validate
  - They manage isolation of VMs by intercepting all privileged instructions
  - They implement CPU scheduling
  - They support inter-VM communications





#### Considering our use case

#### Problems:

- How can we port software from multi-chip to multi-core?
- How can multiple operating systems coexist?
- How can we guarantee isolation of tasks?

#### Considering our use case

Hypervisor segregates functions

- The microkernel guarantees separation of hardware resources between the VMs
- Scheduling is used in case temporal interference is critical



# Use Case: the HERCULES Project

H2020 project: http://hercules2020.eu/

#### Main outcome

- Integrated framework to achieve predictable performance on top of cutting-edge heterogeneous COTS multi-core platforms
- Technological baseline
  - Real-time scheduling techniques
  - High-performance/Low-power embedded COTS
  - Next generation real-time applications





#### Hardware Architecture

- Starting point: the hardware architecture
  - Tegra-like platform for handling high performance computational loads with low power budgets and potentially low predictability
  - Safety microcontroller for real-time safety applications up to ASIL D





### Programming model abstractions

- Support for dynamic applications using Linux
- Support for legacy real-time applications using AUTOSAR-like stacks
  - Support an open-source OSEK/VDX implementation named ERIKA Enterprise, extending it to support a subset of the AUTOSAR RTE
- Pinning one OS per core to reduce overhead and complexity, and guarantee better isolation





# Integrating different subsystems

- An application will likely be composed of
  - A static part, implemented with an AUTOSAR RTE
  - A dynamic part, implemented with Linux and the GPUs
- We want to integrate them together in the same multicore CPU!
- Idea: use a hypervisor
  - Cores are assigned statically to domains, unlike in a cloud environment!
  - Need to share peripherals... and the GPU!
  - Initial attempt based on Jailhouse
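Static assignment of cores to domains means the partition can be checked offline, before the system boots. A minimal Python sketch of such a check is given below; the function name, the domain names, and the dictionary layout are hypothetical and do not correspond to the configuration format of any real hypervisor.

```python
def check_static_partition(num_cores, domains):
    """Validate a static core-to-domain assignment.

    `domains` maps a domain name to the set of core indices it owns.
    Raises ValueError if a core is out of range or owned twice;
    returns the set of unassigned cores otherwise.
    """
    owner = {}
    for name, cores in domains.items():
        for core in cores:
            if not 0 <= core < num_cores:
                raise ValueError(f"{name}: core {core} does not exist")
            if core in owner:
                raise ValueError(
                    f"core {core} assigned to both {owner[core]} and {name}")
            owner[core] = name
    return set(range(num_cores)) - set(owner)
```

For example, `check_static_partition(4, {"rtos": {0}, "linux": {1, 2, 3}})` describes one core reserved for the real-time domain and three for Linux, and returns an empty set because every core has exactly one owner.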





#### **HERCULES Software architecture**



# ERIKA3 + Hypervisor(s)

- We presented an integration of ERIKA3 on Tegra Parker at GTC Munich 2017
- We released a version of ERIKA3 working on top of the Jailhouse hypervisor
  - Check the new virtual machine on the ERIKA3 website!
- We added support for ERIKA3 under Xen, with EtherCAT support





# Perf. measurements on NVIDIA Drive PX2

Nvidia Vibrante configuration:

- ERIKA3 pinned on one of the Cortex-A57 cores
- Linux on the other 3 A57 cores
- Other VMs moved to the Denver cores when possible

We are interested in the following measurements:

- ISR latency with the CPU idle or "busy" doing RTOS primitives
- AUTOSAR task wakeup latency
- Linux clock\_nanosleep periodic task latency
- Variability when other CPUs are executing memory-intensive tasks







#### ERIKA3 ISR Latency Timings

### ERIKA3 ISR/Task Latency Timings

#### ERIKA3 ISR Latency Timings

- Cumulative distribution of the execution times
- Degradation due to memory-intensive load on other CPUs



test:erika\_linuxidle type:isr\_time interval:100uS avg:9.51 std:0.23 min:8.35 max:10.50
test:erika\_linux1meminterf type:isr\_time interval:100uS avg:9.58 std:0.20 min:8.90 max:10.85
test:erika\_linux2meminterf type:isr\_time interval:100uS avg:9.78 std:0.30 min:8.86 max:12.38
test:erika\_linux3meminterf type:isr\_time interval:100uS avg:10.08 std:0.30 min:8.77 max:11.36
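The degradation can be quantified directly from the four log lines above. The following Python sketch parses them and reports the change in the average relative to the idle case; the helper name `parse_line` is hypothetical, and the statistics are copied verbatim from the log.

```python
LOG = """\
test:erika_linuxidle type:isr_time interval:100uS avg:9.51 std:0.23 min:8.35 max:10.50
test:erika_linux1meminterf type:isr_time interval:100uS avg:9.58 std:0.20 min:8.90 max:10.85
test:erika_linux2meminterf type:isr_time interval:100uS avg:9.78 std:0.30 min:8.86 max:12.38
test:erika_linux3meminterf type:isr_time interval:100uS avg:10.08 std:0.30 min:8.77 max:11.36
"""


def parse_line(line):
    """Split 'key:value' fields; numeric statistics become floats."""
    record = dict(field.split(":", 1) for field in line.split())
    for key in ("avg", "std", "min", "max"):
        record[key] = float(record[key])
    return record


records = [parse_line(line) for line in LOG.splitlines() if line.strip()]
baseline = records[0]   # Linux idle, no memory interference

for rec in records:
    delta = 100.0 * (rec["avg"] - baseline["avg"]) / baseline["avg"]
    print(f'{rec["test"]:28s} avg {rec["avg"]:5.2f}  ({delta:+.1f}% vs idle)  '
          f'max {rec["max"]:5.2f}')
```

With three interfering cores the average grows by roughly 6% over the idle baseline, while the largest observed maximum (12.38) appears in the two-core interference run.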





#### A full AUTOSAR classic stack







### AUTOSAR COM Layer



- Library on top of Jailhouse's mechanism
- Blocking and non-blocking calls
- Dynamic-size messages
- Similar to AUTOSAR COM API:

Com\_StatusType Com\_GetStatus();

uint8 Com\_SendSignal(Com\_SignalIdType SignalId, const void \*SignalDataPtr);

uint8 Com\_ReceiveSignal(Com\_SignalIdType SignalId, void\* SignalDataPtr);

Developed by EVI in RETINA EU project





#### **Application SW architecture**







### Predictable Caches: Page Colouring

- Neighbor cores cause cache eviction on shared L2 cache, hence unpredictable memory latency (hit/miss)
- Partition cache in isolated regions with coloring support in Jailhouse!
- [Kloda et al. RTAS2019]







© Copyright 2019 Xilinx


#### **Conceptual allocation of applications in many core SoC**





Courtesy of Giulio Corradi, Xilinx webinar 5 March 2019

Do we have absence of interference?



#### Conditions

- CPU3 executing a 48 Kbyte code every 256 µs as motor control task + safety loop
- CPU2 executing a "PLC"-like 48 Kbyte code every 8 ms
- CPU1 executing streaming of 2 Mbyte of data
- CPU0 executing DDR access for data logging with a 256 Kbyte stream

#### Measurements

- CPU3 executing with response time between ~400 ns and ~12000 ns: a 30x deviation!
- CPU2 executing with response time between ~2000 ns and ~12000 ns
- Clear and significant interference!
  - Likely to be the L2 cache again



How cache works on Zynq® UltraScale+™

Figure: the L2 cache drawn as a grid of 1024 sets (rows, set 0 to set 1023) by 16 ways (columns 0 to 15); the set index field of the address selects the row.

| Address bits | 31-16 | 15-6      | 5-0    |
|--------------|-------|-----------|--------|
| Field        | tag   | set index | offset |
#### Cache organization

- The size of the Level 2 cache of the Zynq UltraScale+ is 1 Mbyte.
- It is subdivided into 16 chunks called ways.
- Each of the squares in the figure represents one cache block (= one cache line of 64 bytes).
- The cache controller divides the address into three parts:
  - The offset defines a byte within a cache line. With a 64-byte cache line these are the 6 LSBs ($64 = 2^6$).
  - The set index bits define in which set the cache line gets stored. As there are 1024 sets, 10 address bits are used for this purpose ($1024 = 2^{10}$).
  - The rest of the address bits form a unique tag.
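The address split described above can be written out as a short Python sketch. The function name `split_address` is a hypothetical helper; the constants are the line size and set count stated in the text, and a 32-bit physical address is assumed as in the figure.

```python
LINE_SIZE = 64        # bytes per cache line  -> 6 offset bits
NUM_SETS = 1024       # sets in the L2 cache  -> 10 set-index bits

OFFSET_BITS = LINE_SIZE.bit_length() - 1     # log2(64)   = 6
INDEX_BITS = NUM_SETS.bit_length() - 1       # log2(1024) = 10


def split_address(addr):
    """Return (tag, set_index, offset) for a 32-bit physical address."""
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset
```

Two addresses that share bits 15 to 6 compete for the same set regardless of their tag, which is the property that cache colouring exploits.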





Cache colouring is a software technique for cache partitioning without hardware support.

- It fragments the memory space into a set of colours.
- Addresses of different colours are mapped to disjoint cache partitions.
- This is achieved by dividing the whole memory space into sequential regions, each the size of one cache way.

#### For example 1G bytes of memory







Extending the configuration of a new platform on the coloured version of Jailhouse is straightforward. It suffices to define two parameters:

- Fragment size: it must match the cache way size, hence 64 Kbyte on US+.
- Sub-colour size: it must be a multiple of the page granularity, so let us assume the smallest value on the ARMv8-A architecture, 4 Kbyte.

From these two parameters we determine the number of available colours: 16.

$$\text{number of colours} = \left[\frac{\text{L2 size}}{\text{number of ways} \cdot \text{page size}}\right] = \left[\frac{1\ \text{Mbyte}}{16 \cdot 4\ \text{Kbyte}}\right] = 16\ \text{colours}$$
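The same arithmetic, together with the mapping from a physical address to its colour, can be sketched in Python. The helper names `page_colour` and `pages_of_colour` are hypothetical; the sketch only reproduces the address arithmetic and is not the Jailhouse implementation.

```python
L2_SIZE = 1024 * 1024      # 1 Mbyte L2 cache
NUM_WAYS = 16
PAGE_SIZE = 4 * 1024       # 4 Kbyte pages (sub-colour size)

WAY_SIZE = L2_SIZE // NUM_WAYS            # 64 Kbyte (fragment size)
NUM_COLOURS = WAY_SIZE // PAGE_SIZE       # 16


def page_colour(phys_addr):
    """Colour of the page containing phys_addr.

    Pages whose addresses differ by a multiple of the way size map to
    the same cache sets, so the colour is the page number modulo the
    number of pages per way.
    """
    return (phys_addr // PAGE_SIZE) % NUM_COLOURS


def pages_of_colour(colour, mem_size):
    """Base addresses of all pages of a given colour in [0, mem_size)."""
    return list(range(colour * PAGE_SIZE, mem_size, WAY_SIZE))
```

Giving each domain only pages of its own colours keeps its cache lines in sets that no other domain can evict, at the cost of a physical memory map that is no longer contiguous.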



#### Cache coloring + Jailhouse reincarnation



#### Results

- Interference between Core 3 and Core 2 is eliminated
- Contiguous memory map as a function of the number of colours assigned to each CPU
- Cache "lockdown" of the same size as the number of assigned colours
- Predictability improved
- Separation improved
- Linux re-incarnated
  - Colouring: no interference



Cache coloring + Jailhouse reincarnation benchmark





#### Conclusions

- Multi-cores are the response to automotive needs, but only in conjunction with other technologies
  - AUTOSAR
  - Hypervisors
  - ...
- Temporal interference limits the ability to fully exploit multi-core capabilities in real-time scenarios
  - Some hardware features available in the newest devices could help (e.g., resource pre-allocation for quality of service guarantee)





#### Contacts

- Paolo Gai, pj@evidence.eu.com, http://www.evidence.eu.com
- Massimo Violante, massimo.violante@polito.it, http://www.cad.polito.it
- Marko Bertogna, marko.bertogna@unimore.it, https://hipert.unimore.it/
- Giulio Corradi, giulio.corradi@xilinx.com, https://www.linkedin.com/in/giulio-corradi-860b71b/

Content mostly taken from:

Massimo Violante, Paolo Gai, "Automotive embedded software architecture in the multicore age", 21st IEEE European Test Symposium, May 23-27, 2016, Amsterdam, The Netherlands

Plus additional content courtesy of Marko Bertogna and Giulio Corradi