## **Impact Objectives**

- Study design tool development and combinatorial optimisation algorithms
- Develop a methodology to change the way code is generated and adapted for processing applications
- Support automatic mapping of systems and software modules to hardware modules


# Intelligent collaboration for intelligent systems

Masaki Gondo, of software company eSOL, and Professor Masato Edahiro, from Nagoya University, discuss their collaboration on the eMBP (Model Based Parallelizer) project to develop a tool that automatically generates parallel code for multi- and many-core systems.






#### Can you talk a little about your own research and development background?

ME: In the 1980s, one of the largest application areas for graph theory was electronic design automation (EDA) for semiconductors, so after studying in this area at graduate school I took a job with a semiconductor vendor. I researched EDA design tool development and combinatorial optimisation algorithms, then moved on to multi-core processor research and development. I found that the most important factor in improving system performance on a multi-core processor is how systems and software are divided and mapped to the processors, and this led me back to the study of design tool development and combinatorial optimisation algorithms.

MG: I am Chief Technology Officer and Head of the Technology Headquarters division at eSOL, an independent software vendor that provides POSIX, AUTOSAR and TRON real-time operating systems (RTOSs), as well as various embedded software development tools and engineering services. I have worked for more than 20 years in operating system (OS) architecture and related technologies for a wide range of embedded system applications, including automotive, industrial and electronic appliances. For the last decade or so, I have worked specifically on a scalable heterogeneous multi- and many-core OS, application parallelisation tools, domain-knowledge-based machine-learned driver models, Scrum development and functional safety.

#### What are some of the challenges you have experienced with this latest collaborative research project?

MG: From eSOL's perspective, one of the biggest challenges is getting the system and software developer communities to recognise the problem of software parallelisation and to learn about emerging technologies, such as eMBP, that are capable of solving it. Without technologies like eMCOS and eMBP, developing and producing game-changing intelligent systems such as autonomous driving will be very hard. Another challenge is the efficient exchange of useful information between hardware and software designers. One avenue we are investing in is the open standard specification SHIM (Software-Hardware Interface for Multi-manycore), a standard XML schema that describes hardware architecture, together with performance information, for software tools. The idea is that the hardware description can be written once and then used by different tools, forming an ecosystem while avoiding the need for different vendors to adapt their tools to new hardware. The result is that everyone wins: hardware vendors gain a good tool ecosystem, tool vendors gain lower costs and a shorter time-to-market when supporting various hardware, and system designers gain a choice of options to fit their specific needs.

ME: In our research, we need to match various types of system and software modules with various types of hardware modules, taking their respective hierarchies into account, and create an optimal mapping plan. We also need to estimate the performance of each system and software module on each hardware module, which is why we are promoting the international standardisation of SHIM.
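To make the idea of "write the hardware description once, estimate performance from it" more concrete, here is a minimal sketch. The XML below is a hypothetical, heavily simplified stand-in: its element and attribute names are invented for illustration and are not the actual SHIM schema, and the per-module cycle counts are made-up profiling figures. The sketch only shows the general pattern of a tool reading a shared hardware description and combining it with module data to estimate run times.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified hardware description. Element and attribute
# names are invented for illustration; this is NOT the SHIM schema.
HW_XML = """
<hardware>
  <core id="cpu0" type="cpu" freq_mhz="1000"/>
  <core id="cpu1" type="cpu" freq_mhz="1000"/>
  <core id="dsp0" type="dsp" freq_mhz="500"/>
</hardware>
"""

# Hypothetical cycle counts for each software module on each core type
# (e.g. obtained by profiling). A missing entry means "cannot run there".
MODULE_CYCLES = {
    "sensor_fusion": {"cpu": 2_000_000, "dsp": 600_000},
    "path_planning": {"cpu": 3_000_000},
}


def load_cores(xml_text):
    """Return {core_id: (core_type, frequency_in_hz)} from the description."""
    root = ET.fromstring(xml_text)
    return {
        core.get("id"): (core.get("type"), int(core.get("freq_mhz")) * 1_000_000)
        for core in root.iter("core")
    }


def estimate_ms(module, core_id, cores):
    """Estimated run time in milliseconds, or None if the module cannot run there."""
    core_type, freq_hz = cores[core_id]
    cycles = MODULE_CYCLES[module].get(core_type)
    if cycles is None:
        return None
    return cycles / freq_hz * 1000.0


cores = load_cores(HW_XML)
for module in MODULE_CYCLES:
    for core_id in cores:
        print(module, core_id, estimate_ms(module, core_id, cores))
```

Because the hardware side lives in one shared file, a second tool could read the same description without any vendor-specific adaptation, which is the ecosystem benefit described above.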

#### What is the ultimate impact of this work?

MG: Intelligent systems such as automated driving require an unprecedented level of performance, delivered in a very power-efficient way. This requires heterogeneous multi-manycore computing. Our OS technology and MBP will be essential parts of the whole equation to realise such intelligent systems.

# Working in parallel

Researchers in Japan have formed a partnership between academic and industry players to develop a breakthrough methodology that may change the way in which code is generated and adapted for processing applications.

The pace of technological advancement is ever-increasing, and intelligent systems, such as those found in robots and vehicles, have become larger and more complex. These intelligent systems have a heterogeneous structure, comprising a mixture of modules, such as artificial intelligence (AI) and powertrain control modules, that perform large-scale numerical calculations and real-time periodic processing. Information technology expert Professor Masato Edahiro, from the Graduate School of Informatics at Nagoya University in Japan, explains that concurrent advances in semiconductor research have led to the miniaturisation of semiconductors, allowing a greater number of processors to be mounted on a single chip and increasing potential processing power. 'In addition to general-purpose processors such as CPUs, a mixture of multiple types of accelerators such as GPGPU and FPGA has evolved, producing a more complex and heterogeneous computer architecture,' he says.

Edahiro and his partners have been working on eMBP, a model-based parallelizer (MBP) that uses a mapping system to generate parallel code automatically for multi- and many-core systems. Once a hardware description has been written, eMBP can bridge the gap between software and hardware: hardware vendors gain an efficient tool ecosystem, and different software vendors no longer need to adapt their code to each particular platform. This benefits all major stakeholders, with tool vendors achieving lower costs and quicker development times, and hardware and system designers gaining a wider choice of options for their individual needs. 'The eMBP concerns mapping system and software function modules to hardware function modules for the purpose of operating a system in parallel and executing it efficiently,' elaborates Edahiro. 'We aim to facilitate the design of large-scale complex systems in the future by using MBP as the core and combining it with various upstream design and implementation support tools.'
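The core task Edahiro describes, assigning software function modules to hardware function modules so the system runs efficiently in parallel, is a combinatorial optimisation problem. The sketch below is a deliberately simple greedy heuristic, not eMBP's actual algorithm: it assumes independent modules (no data dependencies or communication costs) and a hypothetical table of estimated run times per core, and places each module on whichever core would finish it earliest.

```python
# Hypothetical estimated run times (ms) of each module on each core.
# None means the module cannot run on that core (e.g. no GPU version).
COST = {
    "object_detection": {"cpu0": 40.0, "cpu1": 40.0, "gpu0": 8.0},
    "trajectory":       {"cpu0": 12.0, "cpu1": 12.0, "gpu0": None},
    "engine_control":   {"cpu0": 3.0,  "cpu1": 3.0,  "gpu0": None},
    "lane_keeping":     {"cpu0": 10.0, "cpu1": 10.0, "gpu0": 6.0},
}


def greedy_map(cost):
    """Greedily map independent modules to heterogeneous cores.

    Modules with the largest best-case run time are placed first, and each
    one goes to the core on which it would finish earliest given the work
    already assigned. Returns (mapping, makespan_ms).
    """
    load = {core: 0.0 for row in cost.values() for core in row}
    mapping = {}
    order = sorted(
        cost,
        key=lambda m: -min(t for t in cost[m].values() if t is not None),
    )
    for module in order:
        # (finish time if placed here, core) for every core that can host it
        options = [
            (load[core] + t, core)
            for core, t in cost[module].items()
            if t is not None
        ]
        finish, core = min(options)
        load[core] = finish
        mapping[module] = core
    # Makespan: the time at which the busiest core finishes.
    return mapping, max(load.values())


mapping, makespan = greedy_map(COST)
print(mapping, makespan)
```

With these hypothetical numbers, object detection lands on gpu0, trajectory on cpu0, and the two remaining control modules share cpu1, for a makespan of 13 ms. A real mapper would also have to respect module dependencies, communication costs and the hardware hierarchy, which is what makes the problem hard.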

The progress of this invaluable tool has depended on a solid collaboration between academia and industry. Nagoya University has partnered with eSOL Co., Ltd. and Renesas Electronics in this project. 'Our results are effective only in association with basic software on a high-performance semiconductor with low power consumption. Therefore, our collaboration with eSOL, as the basic software vendor, and Renesas Electronics, as the semiconductor vendor, is extremely important to our research,' says Edahiro. The results achieved by Edahiro's team are being commercialised by eSOL and have so far been used in the development of energy-saving, high-performance system design and development tools that the group hopes will benefit many users worldwide.

#### THE IMPORTANCE OF PARALLELIZATION

Edahiro leads the Parallel and Distributed Systems Lab (PDSL) at Nagoya University, which conducts research on model-based parallelization. The lab's work on parallelization spans three key areas: automatic generation, verification and parallelized algorithms. Parallelization allows solutions to be reached much more quickly. 'This is achieved by breaking problems down into chunks and using existing resources on more than one chunk at the same time,' outlines Edahiro. 'As such, parallelization plays a critical part in the effective use of larger numbers of processors in high-performance processing.' The key features demanded of the resulting system and software modules must be considered during design, and the modules must then be mapped to hardware modules to produce an efficient hierarchical heterogeneous system that executes the software modules in parallel on appropriate hardware. 'The mapping is not easy to accomplish because the logical hierarchy of systems and software and the physical hierarchy of hardware generally evolve in completely different ways,' states Edahiro.
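The "break the problem into chunks and work on several chunks at once" idea can be illustrated in a few lines. This is a generic sketch using Python's standard `concurrent.futures` module, not code from the eMBP project; the task (a sum of squares) and the helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split a list into n_chunks contiguous chunks of near-equal size."""
    k, r = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < r else 0)  # first r chunks get one extra item
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-chunk work: independent of every other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Process each chunk concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, split(data, workers))
        return sum(partials)


print(parallel_sum_of_squares(list(range(10))))
```

The structure (split, process independently, combine) is the point here. In CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock; `ProcessPoolExecutor` or native code would be needed for a real speed-up.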

A great deal of research has been conducted into the most efficient methods of parallelizing program code for homogeneous processors, such as supercomputers, particularly for applications involving large matrix calculations. However, no single method has yet been developed for systems with complex hierarchical heterogeneous structures. 'Where the system structure is a factor to consider in the realisation of an efficient system, once the program code has been implemented, subsequent restructuring becomes complicated,' Edahiro highlights. 'By contrast, MBP, which parallelizes systems using models that are abstractions of system software and hardware, facilitates such restructuring because the models hold the structures of the system, software and hardware within themselves.'

Industry expectations for MBP are increasing because it offers better solutions for heterogeneous structures. The approach allows optimal mapping to be carried out at the model level, making it possible to detect any mismatch between the hierarchical heterogeneous structure of the hardware and that of the systems and software at a much earlier stage. Any necessary restructuring can therefore be carried out early in the process, significantly reducing design time. The project team hopes that this will prove to be a major paradigm shift in system design methodology.
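As a rough illustration of why checking at the model level helps, the sketch below compares two abstract models before any code exists: the capabilities each software module needs and the capabilities each hardware module provides. Everything here (module names, capability labels, the flat dictionary format) is hypothetical and far simpler than MBP's hierarchical models; it only shows that a mismatch can be flagged from the models alone.

```python
# Hypothetical model-level descriptions: what each software module needs
# and what each hardware module provides (no program code involved).
SOFTWARE = {
    "dnn_inference": {"needs": {"matrix_accel"}},
    "motor_control": {"needs": {"realtime_timer"}},
    "logging":       {"needs": set()},
}

HARDWARE = {
    "cpu_cluster": {"provides": {"realtime_timer"}},
    "gpu":         {"provides": {"simd"}},
}


def find_mismatches(software, hardware):
    """Return the software modules that no single hardware module can host."""
    return sorted(
        name
        for name, sw in software.items()
        if not any(sw["needs"] <= hw["provides"] for hw in hardware.values())
    )


print(find_mismatches(SOFTWARE, HARDWARE))
```

Here no hardware module offers the hypothetical `matrix_accel` capability, so `dnn_inference` is flagged while the design is still a pair of models, when changing either side is cheap.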

#### DELIVERING PERFORMANCE

Parallelization can deliver faster performance with lower energy requirements than serial processing. 'Power consumption is proportional to the square or the cube of the operating frequency,' Edahiro notes. 'Therefore, if the process is divided into four parts and executed on four separate 2.5 GHz processors, it can be executed in the same processing time and with lower power consumption than is necessary when it is executed on a single 10 GHz processor.' In an application such as automated driving, where both the information-processing and the motion-control needs of the vehicle must be catered for, it is vital that both functions operate energy-efficiently. Masaki Gondo, Chief Technology Officer of eSOL, one of the industrial partners in this project, explains that the benefits can be realised on a global scale by those utilising digital information. 'Our OS technology and MBP will be essential parts of the whole equation to realise such an intelligent system,' he confirms.
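Edahiro's four-processor example can be checked with a short worked calculation. It assumes, as a simplification, that the work splits perfectly into four equal parts, that the four processors are identical, and that power scales as $P \propto f^{k}$ with $k = 2$ or $k = 3$, ignoring static power and communication overhead:

$$
\frac{P_{4 \times 2.5\,\mathrm{GHz}}}{P_{1 \times 10\,\mathrm{GHz}}}
= \frac{4\,(2.5)^{k}}{10^{k}}
= 4 \left(\tfrac{1}{4}\right)^{k}
= 4^{1-k}
= \begin{cases} 1/4, & k = 2 \\ 1/16, & k = 3 \end{cases}
$$

Under these idealised assumptions, the four 2.5 GHz processors finish in the same time while drawing roughly a quarter ($k = 2$) or a sixteenth ($k = 3$) of the power of the single 10 GHz processor.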


The team further hopes to showcase their work as a new design methodology and development process based on MBP and to share it with the world at large. 'For MBP to be accepted in relevant design fields, it is necessary that design methodology and development processes that use MBP, as well as model-level estimation methods that use SHIM, are widely accepted,' says Edahiro. The group is also collaborating with the Multicore Association and the Institute of Electrical and Electronics Engineers (IEEE) on the international standardisation of SHIM to facilitate estimation of system and module performance.

#### TOWARDS THE FUTURE

While widespread use of MBP in the supercomputing industry is yet to come, the group has fielded considerable interest in the work so far, with trials in progress in partnership with multiple manufacturers of intelligent systems. The collaboration has also led to a number of side projects, including the effort to develop international standards for SHIM. 'Without MBP, we may not even have come up with the idea of standardising the hardware description, and we believe the standard will be a foundation for the proliferation of this kind of technology,' Gondo says. Edahiro and his colleagues continue to drive the process of parallelization in processing technology. They hope to extend their collaboration to groups in other parts of the world, especially Europe, perhaps applying the philosophy of parallel processing to their own research workflows. 🔵

## **Project Insights**

#### BACKGROUND

This article is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

#### **COLLABORATORS**

Embedded Multicore Consortium W: https://www.embeddedmulticore.org/

#### CONTACT

Professor Masato Edahiro

**T:** +81 52 789 2795 E: eda@ertl.jp W: https://www.pdsl.jp/

#### BIOS

Professor Masato Edahiro is a professor in the Department of Computing and Software Systems, Graduate School of Informatics, Nagoya University, where he focuses on combinatorial optimisation in graph and network algorithms, and on software and development tools for multi- and many-core processors.

Masaki Gondo is CTO at eSOL Co., Ltd. He also acts as an architect of the AUTOSAR Adaptive Platform specification, IEEE C/DA/SHIM WG Chair, Multicore Association SHIM Working Group Chair, Vice-chair of the Embedded Multicore Consortium, Chief Architect for AUBASS and a visiting research fellow at the Advanced Multicore Processor Research Institute at Waseda University, among other roles and positions.



