Publications http://www.alari.ch/Research/Publications/bebop Publication list of ALaRI Institute, University of Lugano en-us Virtual Metering for Virtual PHEV Aggregation http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=159.LuVuKaEr12.MELECON Technically sustainable solutions for integration of (PH)EVs in Smart Grid emerge as an important concern. We discuss the need for introduction of Virtual Aggregations supported by implementation of Virtual Meters in power system structures. We advocate our proposal with an evaluation of scenarios based on realistic data. The structure and functionalities of the Virtual Aggregator, as well as proposed enhancements on the Smart Grid side, are presented. Using Multi-objective Design Space Exploration to Enable Run-time Resource Management for Reconfigurable Architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=158.mariani2012date Evaluating Run-time Resource Management Policies for Multi-core Embedded Platforms with the EMME Evaluation Framework http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=157.mariani2012parma Adaptivity Support for MPSoCs based on Process Migration in Polyhedral Process Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=155.CaDeMeTuSt12.VLSI System adaptivity is becoming an important feature of modern embedded multiprocessor systems. To achieve the goal of system adaptivity when executing Polyhedral Process Networks (PPNs) on a generic tiled Network-on-Chip (NoC) MPSoC platform, we propose an approach to enable the run-time migration of processes among the available platform resources. In our approach, process migration is allowed by a middleware layer which comprises two main components. The first component concerns the inter-tile data communication between processes. 
We develop and evaluate a number of different communication approaches which implement the semantics of the PPN model of computation on a generic NoC platform. The presented communication approaches do not depend on the mapping of processes, and have been implemented on a Network-on-Chip multiprocessor platform prototyped on an FPGA. Their comparison in terms of the introduced overhead is presented in two case studies with different communication characteristics. The second middleware component allows the actual run-time migration of PPN processes. To this end, we propose and evaluate a process migration mechanism which leverages the PPN model of computation to guarantee a predictable and efficient migration procedure. The efficiency and applicability of the proposed migration mechanism are shown in a real-life case study. OSCAR: an Optimization Methodology Exploiting Spatial Correlation in Multi-core Design Space http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=156.MaPaSiZa12.TCAD This paper presents OSCAR, an Optimization methodology exploiting Spatial CorrelAtion of multi-coRe design space. The paper builds upon the observation that power consumption and performance metrics of spatially close design configurations (or points) are statistically correlated. We propose to exploit the correlation by using a Response Surface Model (RSM), i.e., a closed-form expression suitable for predicting the quality of non-simulated design points. This model is useful during the design space exploration (DSE) phase to quickly converge to the Pareto set of the multi-objective problem without executing lengthy simulations. We compare the proposed heuristic with state-of-the-art approaches (conventional, RSM-based and structured DOEs). 
Experimental results show that OSCAR is a faster heuristic with respect to state-of-the-art techniques such as Response-Surface Pareto Iterative Refinement - ReSPIR and Nondominated Sorting Genetic Algorithm - NSGA-II. Reported results also show that OSCAR can significantly improve structured DOE approaches by slightly increasing the number of experiments. Energy-Throughput Simulation Approach for Heterogeneous LTE scenarios http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=154.BaChDe11.ISWCS In order to increase overall LTE system performance, femtocells have been proposed as a user-based solution promising to give much better service to the user, especially indoors. Their deployment should noticeably improve the total system capacity and drastically decrease the power consumption. On the other hand, these small indoor cells make the network planning strategies much more complex given the uncertainty of their position and their load; femtocells are after all managed by the users. The goal of this work is to provide a simulation approach to determine the effects of heterogeneous cell deployment on the performance of an LTE network. The simulation framework allows a realistic comparison of the power consumption and throughput of the overall system. The key components are the combination of indoor and outdoor propagation modeling, and the diversification of femto, micro and macro-cell energy consumption models. The model further contains complex city-like building structures, multiple communication layers (eNodeBs and femtocells) distributed over a three-dimensional map and numerous users moving across different areas while adapting their service requirements. The simulation approach results in relatively computationally inexpensive simulations and allows modeling of the expected throughput and energy consumption for various heterogeneous LTE scenarios. 
Middleware Approaches for Adaptivity of Kahn Process Networks on Networks-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=153.CaDeSt11.DASIP We investigate and propose a number of different middleware approaches, namely virtual connector, virtual connector with variable rate, and request-based, which implement the semantics of Kahn Process Networks on Network-on-Chip architectures. All of the presented solutions allow for run-time system adaptivity. We implement the approaches on a Network-on-Chip multiprocessor platform prototyped on an FPGA. Their comparison in terms of the introduced overhead is presented in two case studies with different communication characteristics. We found that the virtual connector mechanism outperforms other approaches in the communication-intensive application. In the other case study, which has a higher computation/communication ratio, the middleware approaches show similar performance. Beamforming for interference mitigation and its implementation on an SDR baseband processor http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=152.KrLeAhPoLi11.SiPS We present the first implementation of a distributed beamforming algorithm for interference mitigation on an SDR baseband processor. Co-channel interference (CCI) is becoming a major source of impairments in wireless communications and distributed beamforming is a promising technique to mitigate its negative impact. However, such schemes are challenging to implement in practical scenarios due to their complexity and synchronization requirements. In this paper, we report on the implementation of a suboptimal, yet efficient, beamforming scheme for CCI mitigation and present the complexity modeling and algorithm transformations for achieving numerical stability. 
We also present the fixed-point quantization and the proper mapping on a parallel programmable baseband architecture aimed at software-defined radio (SDR). We optimize this algorithm for a coarse grained reconfigurable array (CGRA) processor and evaluate it in the context of the LTE standard. Design of Fault Tolerant Network Interfaces for NoCs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=151.FiMiSa11 Adoption of Model-Driven methodology to aggregations design in Power Grid http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=150.KaLu11 Economical and environmental concerns push toward novel solutions for a sustainable, renewable and intelligent energy power grid, the Smart Grid. Very often, this includes aggregation of renewable resources and intelligent loads such as electrical vehicles. Such complex systems involve a number of various stakeholders coming from different areas of expertise. Even so, on-going projects do not apply a common formal language. In order to better correlate the projects, model-driven methodology and SysML are proposed for system design. ARTE: an Application-specific Run-Time Management Framework for Multi-core Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=149.MaPaSiZa.SASP11 Programmable multi-core and many-core platforms increase exponentially the challenge of task mapping and scheduling, provided that enough task-parallelism does exist for each application. This problem worsens when dealing with small ecosystems such as embedded systems-on-chip. In fact, in this case, the assumption of exploiting a traditional operating system is out of context given the memory available to satisfy the run-time footprint of such a configuration. 
An efficient Run-time Resource Management (RRM) becomes of paramount importance to dispatch tasks to the cores by taking into account the task-parallelization options that each application provides. State-of-the-art approaches to RRM try to allocate resources to maximize the instantaneous throughput while meeting a power budget constraint. In this paper, we will show that queuing theory can be an alternative yet effective way of solving resource allocation by presenting ARTE, an Application-specific Run-Time managEment framework. The framework exploits few assumptions about the target many-core computing fabric such as the availability of performance (throughput) information about the platform applications. We will show that this information can be combined, at run-time, with queuing models to enhance the response time of the applications by pounding the actual effect on the system power consumption better than previous approaches. Experimental results show that, compared to reference state-of-the-art RRM techniques, ARTE is able to efficiently improve system performance by pro-actively reducing the response time while meeting the same power consumption requirements. Besides, we will show that the run-time overhead of ARTE does not significantly impact either the system performance or the on-chip memory occupation. Online Task Remapping Strategies for Fault-tolerant Network-on-Chip Multiprocessors http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=148.DeKaFi11.NOCS As CMOS technology scales down into the deep submicron domain, the aspects of fault tolerance in complex Network-on-Chip (NoC) architectures are assuming an increasing relevance. Task remapping is a software based solution for dealing with permanent failures in processing elements in the NoC. 
In this work, we formulate the optimal task mapping problem for mesh-based NoC multiprocessors with deterministic routing as an integer linear programming (ILP) problem with the objective of minimizing the communication traffic in the system and the total execution time of the application. We find the optimal mappings at design time for all scenarios where single faults occur in the processing nodes. We propose heuristics for the online task remapping problem and compare their performance with the optimal solutions. System Policies for Gradual Tuning of Security and Workload in Wireless Sensor Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=147.TaMoFe11 In wireless sensor networks (WSN) energy consumption is a key issue. Security of communications, with its demand for computational resources, and performance are other fundamental issues. Finding a trade-off between performance and energy consumption, while providing an adequate level of security, is very challenging. Traditional solutions for the aforementioned problem assume that the operative environment is well-known and static, thus limiting the flexibility of the system. In this paper, instead, we propose a self-adaptation mechanism for gradual adaptation of security and system workload in WSNs. The adaptation process can be tuned by using specific policies both for controlling the running tasks and for customizing the behavior of the self-adaptation mechanism. The ultimate goal is to perform adaptations by maximizing system performance while satisfying power constraints. A case study, implemented on Sun SPOTs, is also presented to show how the self-adaptation mechanism works in a real sensor node. 
A Middleware Approach to Achieving Fault-tolerance of Kahn Process Networks on Networks-on-Chips http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=146.DeDiFi11.IJRC Kahn process networks (KPN) is a distributed model of computation used for describing systems where streams of data are transformed by processes executing in sequence or parallel. Autonomous processes communicate through unbounded FIFO channels in absence of a global scheduler. In this work, we propose a task-aware middleware concept that allows adaptivity in KPN implemented over a Network-on-Chip (NoC). We also list our ideas on the development of a simulation platform as an initial step towards creating fault-tolerance strategies for KPN applications running on NoCs. In doing that, we extend our SACRE (Self-adaptive Component Run-time Environment) framework by integrating it with an open source NoC simulator, Noxim. We evaluate the overhead that the middleware brings to the total execution time and to the total amount of data transferred in the NoC. With this work, we also provide a methodology that can help in identifying the requirements and implementing fault tolerance and adaptivity support on real platforms. MULTICUBE: Multi-Objective Design Space Exploration of Multi-Core Architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=139.Sietal2.2011 Given the increasing complexity of Chip Multi-Processors (CMPs), a wide range of architecture parameters must be explored at design time to find the best trade-off in terms of multiple competing objectives (such as energy, delay, bandwidth, area, etc.). The design space of the target architectures is huge because it should consider all possible combinations of each hardware parameter (e.g., number of processors, processor issue width, L1 and L2 cache sizes, etc.). 
In this complex scenario, intuition and past experience of design architects are no longer sufficient to converge to an optimal design of the system. Indeed, Automatic Design Space Exploration (DSE) is needed to systematically support the analysis and quantitative comparison of a large number of design alternatives in terms of multiple competing objectives (by means of Pareto analysis). The main goal of the MULTICUBE project consists of the definition of an automatic Design Space Exploration framework to support the design of next generation many-core architectures. Design Space Exploration of Parallel Architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=140.KaTuPaSiZaMaBoDo.2011 This chapter will present two significant applications of the MULTICUBE design space exploration framework. The first part will present the design space exploration of a low power processor developed by STMicroelectronics by using the modeFRONTIER tool to demonstrate the benefits of DSE not only in terms of objective quality, but also in terms of impact on the design process within the corporate environment. The second part will describe the application of RSM models developed within MULTICUBE to a tiled, multiple-instruction, many-core architecture developed by ICT China. Overall, the results have shown that different models can present a trade-off of accuracy versus computational effort. In fact, throughout the evaluation, we observed that high accuracy models require high computational time (for both model construction time and prediction time); vice versa, low model construction and prediction time has led to low accuracy. 
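The accuracy-versus-effort trade-off described in the entry above, like the Pareto analysis used throughout the MULTICUBE entries, can be illustrated with a small dominance filter. This is only a sketch under stated assumptions: the names `dominates` and `pareto_front` and the sample (error, time) values are hypothetical and are not taken from the MULTICUBE tools or results.

```python
from typing import List, Tuple

# Each candidate model is summarized as (prediction error, time in seconds).
# Both objectives are minimized. The numbers used below are made up.
Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical models: accurate but slow, balanced, a dominated one, fast but coarse.
models = [(0.02, 120.0), (0.05, 30.0), (0.06, 45.0), (0.20, 2.0)]
front = pareto_front(models)
```

With these made-up values, the third model is discarded because the second one is both more accurate and faster; the remaining three each represent a different accuracy/effort compromise.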
Response Surface Modeling for Embedded System Design Space Exploration http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=142.PaSiZaRiKaTuMa.2011 A typical design space exploration flow involves an event-based simulator in the loop, often leading to an actual evaluation time that can exceed practical limits for realistic applications. Chip multi-processor architectures further exacerbate this problem given that the actual simulation speed decreases by increasing the number of cores of the chip. Traditional design space exploration lacks efficient techniques that reduce the number of architectural alternatives to be analyzed. In this chapter, we introduce a set of statistical and machine learning techniques that can be used to predict system level metrics by using closed-form analytical expressions instead of lengthy simulations; the latter are called Response Surface Models (RSM). The principle of RSM is to exploit a set of simulations generated by one or more Design of Experiments strategies to build a surrogate model to predict the system-level metrics. The response model has the same input and output features of the original simulation based model but offers significant speed-up by leveraging analytical, closed-form functions which are tuned during model training. The techniques presented in this chapter can be used to improve the performance of traditional design space exploration algorithms such as those presented in Chap. 3. Design Space Exploration of a Reconfigurable System for Supporting Video Streaming Run-time Management http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=138.MaAvYkVaPaSiZa.2011 This paper reports a case study of Design Space Exploration for supporting Run-time Resource Management (RRM). In particular, the management of system resources for an MPSoC dedicated to multiple MPEG4 encoding is addressed in the context of an Automotive Cognitive Safety System (ACSS). 
The runtime management problem is defined as the minimization of the platform power consumption under resource and Quality of Service (QoS) constraints. The paper provides insight into both design-time and run-time aspects of the problem. During the preliminary design-time Design Space Exploration (DSE) phase, the best configurations of run-time tunable parameters are statically identified for providing the best trade-offs in terms of run-time costs and application QoS. To speed up the optimization process without reducing the quality of final results, a multi-simulator framework is used for modeling platform performance. At run-time, the RRM exploits the design-time DSE results for deciding an operating configuration to be loaded for each MPEG4 encoder. This operation is carried out dynamically, by following the QoS requirements of the specific use-case. Design Space Exploration Supporting Run-time Resource Management http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=144.AvYkVaMaPaSiZa.2011 Running multiple applications optimally in terms of Quality of Service (e.g., performance and power consumption) on embedded multi-core platforms is a huge challenge. Moreover, current applications exhibit unpredictable changes of the environment and workload conditions, which make the task of running them optimally even more difficult. This dynamic trend in application runs will grow even more in future applications. This paper presents an automated tool flow which tackles this challenge by a two-step approach: first, at design-time, a Design Space Exploration (DSE) tool is coupled with one or more platform simulators to get optimum operating points for the set of target applications. Secondly, at run-time, a lightweight Run-time Resource Manager (RRM) leverages the design-time DSE results for deciding an operating configuration to be loaded at run-time for each application. 
This decision is performed dynamically, by taking into consideration available platform resources and the QoS requirements of the specific use-case. To keep RRM execution and resource overhead at a minimum, a very fast optimisation heuristic is integrated. Application of this tool-flow on a real-life multimedia use case (described in Chapter 9 of the book of this paper) will demonstrate a significant speedup in the optimisation process while maintaining the desired Quality of Service. AETHER: Self-Adaptive Networked Entities: Autonomous Computing Elements for Future Pervasive Applications and Technologies http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=141.aetherinbook.2011 The AETHER project has laid the foundation of a completely new framework for designing and programming computing resources that live in changing environments and need to re-configure their objectives in a dynamic way. This chapter contributes to a strategic research agenda in the field of self-adaptive computing systems. It brings inputs to the reconfigurable hardware community and proposes directions to go for reconfigurable hardware and research on self-adaptive computing; it tries to identify some of the most promising future technologies for reconfiguration, while pointing out the main foreseen challenges for reconfigurable hardware. This chapter presents the main solutions the AETHER project proposed for some of the major concerns in trying to engineer a self-adaptive computing system. The text exposes the AETHER vision of self-adaptation and its requirements. It describes and discusses the proposed solutions for tackling self-adaptivity at the various levels of abstraction. It exposes how the developed technologies could be put together in a real methodology and how self-adaptation could then be used in potential applications. 
Finally, based on lessons learned from AETHER, we discuss open issues and research opportunities and put those in perspective along other investigations and roadmaps. Linking run-time resource management of embedded multi-core platforms with automated design-time exploration http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=145.YkAvMaZaPaSi11 Nowadays, owing to unpredictable changes of the environment and workload variation, optimally running multiple applications in terms of quality, performance and power consumption on embedded multi-core platforms is a huge challenge. A lightweight run-time manager, linked with an automated design-time exploration and incorporated in the host processor of the platform, is required to dynamically and efficiently configure the applications according to the available platform resources (e.g. processing elements, memories, communication bandwidth), for minimising the cost (e.g. power consumption), while satisfying the constraints (e.g. deadlines). This study presents a flow linking a design-time design space explorer, coupled with platform simulators at two abstraction levels, with a fast and lightweight priority-based heuristic integrated in the run-time manager to select near-optimal application configurations. To illustrate its feasibility and the very low complexity of the run-time selection, the proposed flow is used to manage the processors and clock frequencies of a multiple-stream MPEG4 encoder chip dedicated to automotive cognitive safety applications. Optimization Algorithms for Embedded System Design Space Exploration http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=143.RiKaTuPaSiZaMa.2011 This paper is dedicated to the optimization algorithms developed in the MULTICUBE project and to their surrounding environment. Two software design space exploration (DSE) tools host the algorithms: Multicube Explorer and modeFRONTIER. 
The description of the proposed algorithms is the central part of the paper. The focus will be on newly developed algorithms and on ad-hoc extensions of existing techniques in order to cope with discrete and categorical design space parameters that are very common when working with embedded systems design. This paper will also provide some fundamental guidelines to build a strategy for testing the performance and accuracy of such algorithms. The aim is mainly to build confidence in optimization techniques, rather than to simply compare one algorithm versus another one. The no-free-lunch theorem for optimization has to be taken into consideration and therefore the analysis will look forward to robustness and industrial reliability of the results. Progettazione e valutazione di soluzioni wireless multi-hop per il monitoraggio ambientale http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=160.MuSaLuMa.2011 The creation of a sensor network for environmental monitoring, taking into account the functional and non-functional requirements, poses a series of problems that must be dealt with during the design phase. The main difficulties are related to the power of the nodes and their location so that the resulting network topology minimizes the overall energy consumption while guaranteeing the desired measurement accuracy. The adoption of a 'wireless' communication model allows for greater flexibility during installation and allows creating remote connections more easily than the traditional wired pattern. Extension of the network topology by adding new devices in the monitoring area or movement of devices already deployed are greatly simplified. But the requirements in terms of fault-tolerance and power consumption of a wireless network are in general more difficult to meet. 
In this chapter we propose two different solutions that improve performance in terms of power consumption of the main standard for communication in the wireless sensor networks field (e.g. ZigBee), customizing it for monitoring applications in an open environment on geographical areas of several hectares. While the standard is intended to be as general as possible, optimizations have been included considering the special needs of our monitoring applications, in terms of number of nodes, topology density, nodes duty cycle and data-load. The first solution deals with the management of multi-hop communication and allows the use of devices that can be powered by batteries (and possibly small solar panels) for the relaying nodes. The second solution optimizes the management of faults (transient or permanent) in the network topology. It is rarely possible to develop and evaluate proposed solutions in the field prior to actual deployment, therefore simulation is an essential step in developing solutions for these applications. The simulation must be accurate and must provide an analysis of all issues related to communication and the behavioural dynamics of the single node in the network structure. For this reason the evaluation has been carried out by means of a modelling methodology developed expressly for wireless sensor networks. A Smart Metering Architecture as a step towards Smart Grid realization http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=137.VuLuErKu10.3 The emerging concept of Smart Grids aims at increasing visibility and controllability of electricity grids, boosting their operational efficiency and enabling novel enhanced services to customers and utilities at the same time. Successful realization of this concept will in great part depend on efficient management of tremendous amounts of data to be gathered and processed in very short time periods. 
In this work we propose a novel smart metering architecture to manage data collected from deployed smart meters logically encapsulated in the form of virtual meters. The metering infrastructure is structured in the form of Advanced Metering Infrastructure (AMI). The architecture of the Meter Data Management (MDM) system as well as its integration in the Control Center structure of the power system is described in detail. The testing and verification of the proposed solution is performed on data from power distribution company Vattenfall, Sweden. A Monitoring System for NoCs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=136.FiPaSi10 In this paper, we propose and discuss a monitoring architecture for Networks-on-Chip (NoCs) that provides system information useful for helping designers in efficiently exploiting resources available in new complex Multiprocessor System-on-Chip (MPSoC) platforms, and in understanding their behavior. We focus on the analysis of the architectural details and design challenges of such systems, by describing powerful tools for detecting information that can be used both at run-time for detecting dynamic changes in system behavior and at post-execution time for debugging and profiling of applications. We detail the design of the probes monitoring the events and discuss an architecture for collection, storage, and analysis of information generated by them. We evaluate the cost of the implementation of the system in terms of area and traffic overhead, and we present results obtained when monitoring a use-case multimedia application. An enhanced workflow management for Utility Management System http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=135.VuLuErKu10.2 The emerging computational grid infrastructure consists of widely distributed heterogeneous resources, which makes mapping of increasingly complex applications a very challenging task. 
Utility Management Systems (UMS) manage a very large number of workflows with very high resource requirements and thereby optimization of resource utilization has to be adapted. In this work we propose an architecture that implements a novel concept for dynamical execution of a scheduling algorithm using near real-time feedback from the execution monitoring process. An Artificial Neural Network (ANN) was trained for workflow scheduling. In the case study, we first perform experiments with the same number of workflows and then introduce two additional ones into the system, observing its behavior with and without the proposed improvements. Performance tests show that significant improvements of overall execution time can be achieved by introducing an adaptive Artificial Neural Network. Enhancing Network-on-Chip Components to Support Security of Processing Elements http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=133.LuNi10.2 Network-on-Chip (NoC) has emerged as a promising solution for scalable communication among a steadily growing number of cores integrated in MultiProcessor System-on-Chips (MPSoCs). The increasing system heterogeneity together with the possibility of reconfiguration makes the overall system security one of the major concerns in MPSoC design. On the other hand, modular and scalable design of NoCs enables their enhancements in various directions for supporting services other than simple data routing. In this work we propose and implement a solution to secure attached processing units from buffer overflow attacks, in the form of a protection module that is embedded into the Network Interface of the NoC. At the same time, our solution prevents potential propagation of the attack through the NoC towards other processors. We prove feasibility via prototype realization in FPGA technology for a MicroBlaze processor on Xilinx Virtex-II Pro board. 
Hardware-assisted Security Enhanced Linux in Embedded Systems: a Proposal http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=134.FiFePaCa10 As computing and communications increasingly pervade our lives, security and protection of sensitive data and systems are emerging as extremely important issues. This is especially true for embedded systems, often operating in non-secure environments, and with a limited amount of computational, storage, and communication resources available. In servers and desktop systems, Security Enhanced Linux (SELinux) is currently used as a method to enhance security by enforcing a security control based on policies that confine user programs, or processes, to the minimum amount of privileges that they require for their execution. While providing a powerful means for enhancing security in UNIX-like systems, SELinux still remains a feature that is too heavy to be fully supported by constrained devices. In this paper, we propose a hardware architecture for enhancing security and accelerating retrieval and application of SELinux policies in embedded processors. We describe the general ideas behind our work, discussing motivations, advantages, and limits of the solution proposed, while suggesting the main steps needed to implement the described architecture on common embedded processors. Hierarchical Multi-Agent Protection System for NoC based MPSoCs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=132.LuNi10 Network-on-Chip (NoC) has emerged as a promising solution for scalable communication among a steadily growing number of cores integrated in MultiProcessor System-on-Chips (MPSoCs). The increasing system heterogeneity together with the possibility of reconfiguration makes the overall system security one of the major concerns in MPSoC design. 
On the other hand, modular and scalable design of NoCs enables their enhancement in various directions for supporting services other than simple data routing. In this work we propose a conceptual solution to secure NoC based MPSoCs at different levels of design. The basic idea is to integrate various kinds of security approaches, from attack-specific protection strategies up to system-level security. The concept aims at securing single cores but also, at the same time, prevents potential propagation of the attack through the NoC towards. We prove feasibility via prototype realization in FPGA technology. A solution for CIM based integration of Meter Data Management in Control Center of a power system http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=131.VuLuErKu10 Modern power systems, in particular Control Center structures, involve more and more software applications in their normal operation. Such a scenario calls for standardization of inter- and intra-process communication and data exchange. In this work we propose a solution for seamless Meter Data Management (MDM) integration with Control Center structures through the Common Information Model (CIM). The solution is implemented in the form of a wrapper that adapts messages (i.e. payloads) to the standard requested form. The proposed solution has been verified using a simulation framework which emulates regular control and data. Functional model of Virtual Power Plant (VPP) http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=130.LuKaMuBoKuPo10 Multicube: Multi-objective design space exploration of multi-core architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=129.Sietal.ISVLSI11 Technology trends enable the integration of many processor cores in a System-on-Chip (SoC).
In these complex architectures, several architectural parameters can be tuned to find the best trade-off in terms of multiple metrics such as energy and delay. The main goal of the MULTICUBE project consists of the definition of an automatic Design Space Exploration framework to support the design of next generation many-core architectures. QoS and Security in Energy-harvesting Wireless Sensor Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=128.Taddeo2010c Wireless sensor networks are composed of small nodes that might be used for a variety of purposes. Nodes communicate together through a wireless connection that might be subject to different attacks when the network is placed in hostile environments. Furthermore, the nodes are usually equipped with very small batteries providing limited battery life, therefore limited power consumption is of utmost importance for nodes. This is in clear opposition with the requirement of providing security to communications as security might be very expensive from the power consumption stand point. Energy harvesting methods can be used to recharge batteries, but, in most of the cases the recharge profile cannot be known in advance. Therefore, nodes might face periods of time in which no recharge is available and the battery level is low. In this paper we introduce an optimization mechanism that allows the system to change the communication security settings at runtime with the goal of improving node lifetime, yet providing a suitable security level. The optimization mechanism further improves energy consumption by putting in place a quality of service mechanism: when energy is scarce, the system tends to send only essential packets. As shown by the simulations presented in this paper, this mechanism optimizes the energy consumption among different recharges. 
A Correlation-based Design Space Exploration Methodology for Multi-Processor Systems-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=126.MaPaZaBrJoSi11 Given the increasing complexity of multi-processor systems-on-chip, a wide range of parameters must be tuned to find the best trade-offs in terms of the selected system figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and consists of a Multi-Objective Optimization (MOO) problem. In this paper, we propose an iterative design space exploration methodology exploiting the statistical properties of known system configurations to infer, by means of a correlation-based analysis, the next design points to be analyzed with low-level simulations. In fact, the knowledge of a few design points is used to predict the expected improvement of unknown configurations. We show that the correlation of the configurations within the multi-processor design space can be modeled successfully with analytical functions and, thus, speed up the overall exploration phase. This makes the proposed methodology a model-assisted heuristic that, for the first time, exploits the correlation among architectural configurations to converge to the solution of the multi-objective problem. Gradual Adaptation of Security for Sensor Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=127.TaMiFe10 Wireless sensor networks are composed of nodes with stringent constraints on resources. In particular, very limited power consumption is often a key factor for this kind of device. In this paper we describe a method for security self-adaptation tailored for wireless sensor networks. This method allows devices to adapt the security of applications gradually with the goal of guaranteeing the maximum possible level of security while satisfying system constraints.
A case study is also presented to show how the method works in a real wireless sensor network. A Task-aware Middleware for Fault-tolerance and Adaptivity of Kahn Process Networks on Network-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=125.DeDi10 We propose a task-aware middleware concept and provide details for its implementation on Network-on-Chip (NoC). We also list our ideas on the development of a simulation platform as an initial step towards creating fault-tolerance strategies for Kahn Process Networks (KPN) applications running on NoCs. In doing that, we extend our SACRE (Self-adaptive Component Run-time Environment) framework by integrating it with an open source NoC simulator, Noxim. We also hope that this work may help in identifying the requirements and implementing fault tolerance and adaptivity support on real platforms. Software architecture for Smart Metering systems with Virtual Power Plant http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=122.VuErKuLu10 This paper presents a novel architecture for Smart Metering systems which enables their seamless, secure and efficient integration in wider SmartGrid software structures. Smart metering solutions represent one of the fastest evolving areas in the field of power distribution systems. There is an extensive interest of leading software vendors in the field, for development of architectures that can efficiently manage transmission, processing and storing of tremendous amount of data produced by such metering devices deployed at the end-end side. The integration of these systems into existing power system software architectures (outage management, workforce management, etc.) represents a major challenge for research community. In such an environment it is extremely important to adopt standardized data exchange mechanisms. 
The proposed architecture is conceived as modular and scalable structure so that it can help implementation of novel power distribution concepts such as Virtual Power Plants (VPPs). The proposed architecture has been successfully tested and verified in real-life operation as one of modules of Smart Metering system named Meter Data Management (MDM) developed by Telvent DMS Llc, Serbia. Adopting system engineering methodology to Virtual Power Systems design flow http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=124.LuKaBo10 The concept of Virtual Power System (VPS) emerges as a promising response for increased concerns on secure, sustainable and at the same time 'clean' energy supply requests. This novel concept aims at boosting operational efficiency of Distributed Energy Resources (DER) but also at establishing them as an autonomous commercial actor on the open energy market. Nevertheless, VPSs are fairly complex HW/SW systems that require holistic multidisciplinary approach and also novel specification, modeling and analysis instruments to facilitate mutual understanding among stakeholders from different fields. We introduce UML/SysML based modeling methodology to describe such power system related issues aiming at providing an unified modeling instrument applicable for VPSs design flow. In the proposed system engineering methodology, system representation starts from a very general context description and gets refined through different levels of abstraction down to concrete embedded systems employed to perform defined tasks. Scheduling energy consumption with local renewable micro-generation and dynamic electricity prices http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=121.DeFe10 The electricity market is going through a deep modification as it is moving toward the integration of smart grids. 
Future homes will include smarter electric devices that will be easily managed from the power consumption stand point. The capability of performing short-term negotiation of energy purchases, if introduced and if efficiently exploited, will give the user the ability to reduce energy costs. In this paper we discuss a scheduling problem for household tasks that will help users save money spent on their energy consumption. Our system model relies on electricity price signals, availability of locally-generated power and flexible tasks with deadlines. A case study shows that cost savings are possible but fast and efficient solutions to the scheduling problem are needed for their real world use. Stack Protection Unit as a step towards securing MPSoCs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=123.LuPeFi10 Reconfigurable technologies are getting popular as an instrument for not only verification and prototyping but also commercial implementation of Multi-Processor Systems-on-Chip (MPSoC) architectures. At the same time, these systems in particular Networks-on-Chip (NoCs) based ones, have emerged as a design strategy to cope with increased requirements and complexity of modern applications. Nevertheless, increasing heterogeneity coupled with possibility of reconfiguration makes security become one of major concerns in MPSoC design. Protection strategies must consider attack scenarios at both levels - individual cores and system level security. This work represents an element in a wider security framework, it shows a solution against one of the most widespread types of attacks - code injection. Our response to tackle this challenge is given in form of Stack Protection Unit (SPU) embedded into processing core. MicroBlaze soft-core processor serves as a case study for verification of the proposed solution in FPGA technology. 
An industrial design space exploration framework for supporting run-time resource management on multi-core systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=117.MaAvVaYkPaSiZa10 Current multi-core design methodologies are facing increasing unpredictability in terms of quality due to the actual diversity of the workloads that characterize the deployment scenario. To this end, these systems expose a set of dynamic parameters which can be tuned at run-time to achieve a specified Quality of Service (QoS) in terms of performance. A run-time manager operating system module is in charge of matching the specified QoS with the available platform resources by manipulating the overall degree of task-level parallelism of each application as well as the frequency of operation of each of the system cores. In this paper, we introduce a design space exploration framework for enabling and supporting enhanced resource management through software re-configuration on an industrial multicore platform. From one side, the framework operates at design time to identify a set of promising operating points which represent the optimal trade-off in terms of the target power consumption and performance. The operating points are used after the system has been deployed to support an enhanced resource management policy. This is done by a light-weight resource management layer which filters and selects the optimal parallelism of each application and operating frequency of each core to achieve the QoS constraints imposed by the external world and/or the user. We show how the proposed design-time and run-time techniques can be used to optimally manage the resources of a multiple-stream MPEG4 encoding chip dedicated to automotive cognitive safety tasks. 
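The run-time step described in the abstract above (choosing, among operating points identified at design time, one that meets the QoS constraint) can be illustrated with a minimal sketch. It assumes each operating point records a parallelism degree, a core frequency, a throughput and a power figure, and that the policy is "lowest power among the points meeting the throughput target"; the `OperatingPoint` type, its field names and the policy itself are hypothetical illustrations, not the paper's actual resource manager.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """One design-time operating point (all fields are illustrative)."""
    parallelism: int        # degree of task-level parallelism
    frequency_mhz: int      # core operating frequency
    throughput_fps: float   # performance measured at design time
    power_mw: float         # power measured at design time


def select_operating_point(
    points: Sequence[OperatingPoint], min_fps: float
) -> Optional[OperatingPoint]:
    """Return the lowest-power point whose throughput meets the QoS target.

    Returns None when no stored point satisfies the constraint.
    """
    feasible = [p for p in points if p.throughput_fps >= min_fps]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.power_mw)


if __name__ == "__main__":
    stored = [
        OperatingPoint(1, 200, 10.0, 150.0),
        OperatingPoint(2, 200, 19.0, 260.0),
        OperatingPoint(2, 400, 30.0, 480.0),
        OperatingPoint(4, 400, 55.0, 900.0),
    ]
    print(select_operating_point(stored, min_fps=25.0))
```

Because the filtering works on a small precomputed table, the run-time cost is linear in the number of stored points, which is in keeping with the "light-weight" layer the abstract mentions.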
Trace-based KPN Composability Analysis for Mapping Simultaneous Applications to MPSoC Platforms http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=118.CaVeStShCeLeAsMe10 Nowadays, most embedded devices need to support multiple applications running concurrently. In contrast to desktop computing, very often the set of applications is known at design time and the designer needs to ensure that critical applications meet their constraints in every possible use-case. In order to do this, all possible use-cases, i.e. subsets of applications running simultaneously, have to be verified thoroughly. An approach to reduce the verification effort is to perform composability analysis, which has been studied for sets of applications modeled as Synchronous Dataflow Graphs. In this paper we introduce a framework that supports a more general parallel programming model based on the Kahn Process Networks Model of Computation and integrates a complete MPSoC programming environment that includes: compiler-centric analysis, performance estimation, simulation as well as mapping and scheduling of multiple applications. In our solution, composability analysis is performed on parallel traces obtained by instrumenting the application code. A case study performed on three typical embedded applications, JPEG, GSM and MPEG-2, proved the applicability of our approach. Using MARTE for Designing power Supply Section of WSNs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=120.ArMuPr10 Probably the biggest issue in Wireless Sensor Network design has always been providing the nodes with adequate power supplies. Energy Harvesting was proposed as an essential feature for Wireless Sensor Networks (WSNs) in many application fields when the amount of energy contained in a commercial battery does not allow fulfilling the required mission.
Solar energy is the most widespread mechanism used to harvest energy from the environment because of its good power density. However, it introduces a level of uncertainty on the amount of energy available in the system. In this paper we propose a high-level methodology for designing the power supply section of sensor nodes. In particular we suggest how to use the MARTE UML design language in order to collect requirements for the application and transform them into specifications of the power supply system. The framework we propose aims at validating the design by simulating appropriate scenarios. Linking run-time management with design space exploration at multiple abstraction levels http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=119.AvVaYkMaPaZaSi10 In the present era of Multi-Processor System-on-Chip (MPSoC) embedded devices, running multiple applications optimally (in terms of execution time and power consumption) is an enormous challenge. Embedded designers usually tackle this challenge by dividing it into two parts: at design-time, Design Space Explorations (DSE) are performed to derive a Pareto set of optimum operating points for each application, and at run-time the embedded device is monitored continuously to operate at one of the points in the derived Pareto set. Obviously, run-time management relies heavily on the accuracy of DSE. With the growing complexity of embedded devices and with time-to-market pressures, at design-time, it is not trivial to derive the operating point Pareto set. On the other hand, at run-time, the overhead introduced by a run-time management scheme should also not be high, so as to minimally affect embedded device performance. We have developed techniques to tackle these embedded design issues. At design time, we use DSE with multiple simulators running at multiple abstraction levels to converge quickly to the Pareto set of operating points.
At runtime, to keep run-time overhead to a minimum, a hierarchical Runtime Resource Manager (RRM) is used with well-defined interfaces (services) between global and local resource managers. We applied our methodology on an embedded device having eight processor cores running multiple MPEG4 encoders. With our DSE methodology, we could derive the Pareto set much more quickly (as compared to full-space explorations). With our run-time schemes, the overhead introduced by the run-time manager was negligible. A Reconfigurable Multiprocessor Architecture for a Reliable Face Recognition Implementation http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=116.TuRePaFeSc10 Face Recognition techniques are solutions used to quickly screen a huge number of persons without being intrusive in open environments or to substitute id cards in companies or research institutes. There are several reasons that require systems implementing these techniques to be reliable. This paper presents the design of a reliable face recognition system implemented on a Field Programmable Gate Array (FPGA). The proposed implementation uses the concepts of multiprocessor architecture, parallel software and dynamic reconfiguration to satisfy the requirement of a reliable system. The target multiprocessor architecture is extended to support the dynamic reconfiguration of the processing unit to provide reliability against processor faults. The experimental results show that, due to the multiprocessor architecture, the parallel face recognition algorithm can achieve a speed up of 63% with respect to the sequential version.
Results regarding the overhead in maintaining a reliable architecture are also shown. Multicube Explorer: An Open Source Framework for Design Space Exploration of Chip Multi-Processors http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=115.ZaPaCaSiMa10 Given the increasing complexity of Chip Multi-Processors (CMPs), a wide range of architecture parameters must be explored at design time to find the best trade-off in terms of multiple competing objectives (such as energy, delay, bandwidth, area, etc.). The design space of the target architecture is huge because it should consider all possible combinations of each parameter (number of processors, processor issue width, L1 and L2 cache sizes, etc.). In this complex scenario, the multi-objective exploration of the huge design space of next generation CMPs can no longer be based on the intuition and past experience of the design architects. An Automatic Design Space Exploration methodology is needed to systematically support the exploration and the quantitative comparison of the design alternatives in terms of multiple competing objectives (Pareto analysis). An overall design space exploration framework is needed to combine all optimizations into a global search space with a common interface to the simulation and evaluation tools. Our work focuses on the definition of an automatic multi-objective Design Space Exploration (DSE) framework for tuning Chip Multi-Processor architectures by evaluating a set of metrics (such as energy and delay) for the next generation embedded computing platforms. Multicube Explorer is an interactive open-source framework to enable the designer to automatically explore a design space of configurations for a parameterized architecture for which an executable model (use case simulator) exists.
Multicube Explorer is an advanced multi-objective optimization framework which is entirely command-line/script driven and can be re-targeted to any configurable platform by writing a suitable XML design space definition file and providing a configurable simulator. An Efficient Run-Time Management Methodology for Stereo Matching Application http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=114.MaYkZhZhLa10 This paper presents a methodology for Run-Time Management (RTM) of algorithmic parameters. The RTM is able to trade off the algorithm output quality and the execution time. Thus, once a requirement in terms of maximum execution time is set, the RTM dynamically tunes the parameters in order to maximize the output quality while respecting the given requirement. The run-time decision making relies on design-time modeling techniques able to characterize key relations between algorithm parameters, execution time and output quality. Models generated during the design-time analysis are accurate enough to drive the RTM in its decision making while generic enough to model application behaviors over datasets which were not included at design-time. In this paper the methodology is applied to the Stereo Matching application, a computationally intensive artificial vision application aimed at inferring object depths using two or more cameras. Experimental results prove the effectiveness of the methodology, which is able to identify high quality solutions respecting the required deadline while introducing negligible overhead. A system level model of possible integration of Building Management System in SmartGrid http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=113.LuCoKu10 SmartGrids are conceived as modular, end-to-end interoperable systems.
It is envisaged that power systems components (modules) will be hierarchically coordinated and integrated in order to form certain autonomous clusters which would perform as much as possible local data storing and processing in order to decrease overall communicational and computational overhead. Building Management Systems (BMS) could be seen as one of such modules inside wider SmartGrid system. The incorporation of BMS must consider both technical as well as commercial issues. Hence, the efficient integration will require standards' harmonization and closer interaction among key elements of these systems. Moreover, another important issue will be adopting of new market models to BMS. In order to represent the system and relations among its components in a clear and understandable fashion, we introduce system level modeling concept as an instrument to bridge functional requirements and implementation constraints. Virtual Power Plant as a bridge between Distributed Energy Resources and Smart Grid http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=112.LuKaMuBo10 The liberalization of energy markets, especially in correlation with the Smart Grid concept development, requires adjusted legislation, new business models, energy stock exchanges establishment and many other advanced instruments. Realization of these features necessitates novel concepts to support such changes in the power system while granting security and reliability of supply. Such evolution poses new challenges to ICT (Information and Communication Technologies) to bridge the gap between increased complexity of deregulated market and on the other side expected rapid growth of number of players in power systems. Increasing presence of Distributed Energy Resources (DER) implementations constitutes a further source of complexity. 
Bearing in mind ongoing and possible scenarios, we aim to determine the place and role of the novel Virtual Power Plants (VPP) concept in relation to the Smart Grid structure. At the same time we introduce an innovative modeling approach as an instrument to determine actors and highlight their actual roles and interactions from different points of view, trying to pave the way for development of a common understanding platform for a variety of stakeholders. The effectiveness of the proposed modeling concept is shown through a number of UML models representing a system-level description of VPP at different levels of abstraction. Yield Enhancement by Robust Application-specific Mapping on Network-on-Chips http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=111.ChPaSiZa10 The current technological defect densities and production yields are a motivating factor supporting the introduction of design-for-manufacturability techniques during the high-level design of complex embedded systems based on network-on-chips (NoCs). In this context, we tackle the problem of mapping the IPs of a multi-processing system to the NoC nodes, by taking into account the effective robustness of the system with respect to permanent faults in the interconnection network due to manufacturing defects. In particular, we introduce an application-specific methodology for identifying optimal NoC mappings which minimize the variance of the system power and latency and maximize the probability that the actual system will work when deployed, even in the presence of faulty NoC links.
We provide experimental results by comparing the proposed methodology with conventional mapping approaches, highlighting benefits and drawbacks of both techniques. Prediction of the type of heating with EnergyPlus program and fuzzy logic http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=110.CoKuLu09 The purpose of this work is to predict the type of heating for the next few days in an office building, using the EnergyPlus program for simulation and fuzzy logic for the decision. To this end, we built a program that combines a weather forecast, a simulation model of a five-story building in Belgrade created in EnergyPlus, simulation in EnergyPlus, and fuzzy logic; as a result, the program gives the type of heating that is most economical to use on a particular day. Everything is done with a view to the most efficient and rational use of energy. Creating an Embedded Systems Program from Scratch: Nine years of experience at ALaRI http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=106.BoSa09 In 1999, experts from academia and industry met in a workshop dealing with education in Embedded Systems Design: at the time there were no specifically oriented programs, and an 'ideal' educational track was designed. One year later, that educational design was implemented with a one-year 'executive-type' Master at University of Lugano, in Switzerland; over the years, the program blossomed and extended, with development of a two-year Master of Science program as well. The experience is discussed here; results and perspectives are analyzed. Negotiation of Security Services: a Multi-criteria Decision Approach http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=109.TaMaFe09 Presently, one of the most important challenges in securing communications between resource-constrained mobile systems is the optimization of the trade-off between energy and performance of security services.
Any adopted security solution should be able to negotiate the best security services in a dynamic and energy-efficient way. In this paper, we propose an energy-aware adaptive protocol to negotiate security settings for communications. The protocol is based on a multi-criteria selection mechanism which provides the most profitable services with respect to the nodes' requirements and available resources. Enabling Self-adaptivity in Component-based Streaming Applications http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=108.DeFe09.3 Self-adaptivity is the capability of a system to adapt itself dynamically to achieve its goals. By means of this mechanism the system is able to autonomously modify its behavior or the way in which applications are run and implemented to achieve the goals set. In this paper we propose a framework that uses a component-based approach to implement self-adaptivity at application level. By using this mechanism, the framework provides the ability to perform adaptation both on the structure of the application (i.e., how the components are connected together) and on internal parameters of each component. At application level, there is a mechanism to monitor different parameters and to check whether the system is meeting the assigned goals or not. A controller drives adaptations when goals are not met. Run-time Selection of Security Algorithms For Networked Devices http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=107.TaFe209 One of the most important challenges currently faced in securing resource-constrained embedded systems is optimizing the trade-off between resources used (energy consumption and computational capabilities required) and security requirements for cryptographic algorithms: any adopted security solution should guarantee an adequate level of protection while respecting constraints on computational resources and consumed power.
These constraints are given by the kind of system considered and by the foreseen applications. In this paper, a generic, efficient, and energy-aware mechanism is proposed to face the problem of determining a correct trade-off between security requirements and resources consumed. The solution proposed relies on the Analytic Hierarchy Process (AHP) to define priorities among different requirements and to compare different security solutions. A knapsack problem is formulated to select the most relevant algorithms based on their utility and on available resources. Reducing Timing Overhead in Simultaneously Clock-Gated and Power-Gated Designs by Placement-Aware Clustering http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=104.UpCaMaMaPo09 Clock-gating and power-gating have proven to be two of the most effective techniques for reducing dynamic and leakage power, respectively, in VLSI CMOS circuits. Most commercial synthesis tools do support such techniques individually, but their combined implementation is not available, since some open issues in terms of power/timing overhead associated with the control logic required for the integration are not yet solved. Moving from some recent work targeting clock-gating/power-gating integration, in this paper we present a solution for reducing the timing overhead that may occur when the integration is performed. In particular, we introduce a new, multilevel partitioning heuristic that increases the efficiency of the clustering phase, one of the key steps of our methodology. The results demonstrate the effectiveness of our solution; in fact, power-delay product and timing overhead of the circuits synthesized using the new clustering algorithm improve by 33% and 24%, respectively.
Functional requirements of embedded systems for monitoring and control structure of Virtual Power Plants http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=102.LuKaMuBo09 Efficient integration of distributed renewable generation into a reliable single entity in technical and commercial terms is one of key issues for successful realization of smart-grids. The novel concept of Virtual Power Plants (VPP) emerges to be promising response to these needs. ICT is the enabling technology for VPP implementation. In fact, an efficient monitoring and control system coupled with appropriate communication structure must be designed in a scalable and modular way so that full interoperability among components of the system is achieved. On top of that, Control Center applications take care of power flow optimization (production, consumption, ancillary services) and high-level applications (e.g. energy trading, Demand Side Management etc.). In this work we focus on functional requirements for realization of such concept by means of embedded systems. Semi-Automated HW/SW Co-design for Embedded Systems: from MARTE Models to SystemC Simulators http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=103.MuMuPr09 Although MDE and Hw/Sw Co-design are widely used to address the design complexity problem, the lack of design procedures and methodologies joining both concepts restrains their usage as complementary techniques, thus preventing the implementation of faster and more robust design cycles. In this paper we present a practical semi-automated design flow where both methodologies are merged and exploited to enable a fast design process targeting highly complex Real-Time Embedded Systems, executing several tasks on SoC and MPSoC devices, while allowing the usage of Design Space Exploration, Schedulability Analysis and Estimation techniques. 
A design flow and evaluation framework for DPA-resistant Instruction Set Extensions http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=105.ReCeStBaKlBrLeIe09 Power-based side channel attacks are a significant security risk, especially for embedded applications. To improve the security of such devices, protected logic styles have been proposed as an alternative to CMOS. However, they should only be used sparingly, since their area and power consumption are both significantly larger than for CMOS. We propose to augment a processor, realized in CMOS, with custom instruction set extensions, designed with security and performance as the primary objectives, that are realized in a protected logic. We have developed a design flow based on standard CAD tools that can automatically synthesize and place-and-route such hybrid designs. The flow is integrated into a simulation and evaluation environment to quantify the security achieved on a sound basis. Using MCML logic as a case study, we have explored different partitions of the PRESENT block cipher between protected and unprotected logic. This experiment illustrates the tradeoff between the type and amount of application-level functionality implemented in protected logic and the level of security achieved by the design. Our design approach and evaluation tools are generic and could be used to partition any algorithm using any protected logic style. Simulation of a Self-adaptive Run-time Environment with Hardware and Software Components http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=100.DeFe09 In this paper we describe a new way of simulating self-adaptive systems developed by relying on a component-based approach; this approach proves to be useful both in easing self-adaptivity and in providing the ability to mix hardware and software elements. 
Our simulation method is based on SACRE (Self-Adaptive Component Run-time Environment), a framework we have defined in Java for simulating self-adaptive systems. Meta-model Assisted Optimization for Design Space Exploration of Multi-Processor Systems-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=101.MaPaSiZa209 Multi-processor Systems-on-chip are currently designed by using platform-based synthesis techniques. In this approach, a wide range of platform parameters are tuned to find the best trade-offs in terms of the selected system figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem. The design space of a Multi-processor architecture is too large to be evaluated comprehensively. So far, several heuristic techniques have been proposed to address the MOO problem, but they are characterized by low efficiency in identifying the Pareto front. In this paper, we address the MPSoC DSE problem by using an NSGA-II modified to be assisted by an Artificial Neural Network (ANN). In particular we exploit statistical methods to compute the prediction confidence intervals for the ANN approximations. This information is adopted in the evolution control strategy in order to carefully select which individuals should be simulated. Experimental results show that the proposed technique is able to reduce the simulations needed for the optimization without decreasing the quality of the obtained Pareto Front. Results are compared with state-of-the-art techniques to demonstrate that optimization time due to simulation can be sped up by adopting statistical methods during evolution control. 
Multiprocessor System-on-Chip Design Space Exploration based on Multi-level Modeling Techniques http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=99.MaPaSiZa09 Multi-processor Systems-on-chip are currently designed by using platform-based synthesis techniques. In this approach, a wide range of platform parameters are tuned to find the best trade-offs in terms of the selected system figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem. The design space of a Multi-processor architecture is too large to be evaluated comprehensively. So far, several heuristic techniques have been proposed to address the MOO problem, but they are characterized by low efficiency in identifying the Pareto set. In this paper we propose a methodology for heuristic platform-based design based on evolutionary algorithms and multi-level simulation techniques. In particular, we extend the NSGA-II with an approximate neural network meta-model for multi-processor architectures in order to replace expensive platform simulations with fast meta-model evaluation. The model accuracy and efficiency are improved by exploiting high-level platform simulation techniques. High-level simulation allows us to reduce the overall complexity of the neural network and improve its prediction power. Experimental results show that the proposed technique is able to reduce the number of simulations needed for the optimization without decreasing the quality of the obtained Pareto set. Results are compared with state-of-the-art techniques to demonstrate that optimization time due to simulation can be sped up by adopting multi-level simulation techniques. 
A Design Space Exploration Methodology Supporting Run-Time Resource Management for Multi-Processors System on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=98.MaPaSiZa309 Application Specific multi-processor Systems-on-chip are currently designed by using platform-based synthesis techniques. In this approach, a wide range of platform parameters are tuned either at design-time or at run-time, to provide the best trade-offs in terms of the selected system figures of merit (such as power and throughput) for a dynamic application-specific workload. Among the design-time (hardware) configurable parameters we can find the memory sub-system configuration (e.g. cache size and associativity) and other architectural parameters such as the instruction-level parallelism of the system processors. Among the run-time (software) configurable parameters we can find the overall degree of task-level parallelism associated with each application running on the chip. Typically, while the design-time exploration is performed in the early development stages for a set of static parameters, the tuning of the run-time parameters is performed dynamically by a run-time management software module after the system has been deployed. In this paper, we introduce a methodology for identifying a hardware configuration which is robust with respect to the variable workload scenario introduced by the run-time management. Moreover, the proposed methodology is aimed at providing useful information about the optimal software operating points of the applications in terms of task-level parallelism. The proposed methodology is based on the NSGA-II evolutionary heuristic algorithm assisted by an Artificial Neural Network (ANN). 
We then introduce a run-time management policy which is able to exploit the above information to maximize the performance of the system under power budget constraints. Experimental results show that the proposed technique is able to reduce the overall design space exploration time while still providing a near-optimal solution, in terms of hardware parameters, to enable an innovative and efficient run-time management policy. MPSoCs Run-Time Monitoring through Networks-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=95.FiPaSi09 Networks-on-Chip (NoCs) have appeared as a design strategy to overcome the limitations, in terms of scalability, efficiency, and power consumption, of current buses. In this paper, we discuss the idea of using NoCs to monitor system behaviour at run-time by tracing activities at initiators and targets. The main goal of the monitoring system is to retrieve information useful for run-time optimization and resource allocation in adaptive systems. Information detected by probes embedded within NIs is sent to a central unit, in charge of collecting and elaborating the data. We detail the design of the basic blocks and analyse the overhead associated with the ASIC implementation of the monitoring system, as well as discussing implications in terms of the additional traffic generated in the NoC. Multicube Explorer - A Design Space Exploration Framework for Embedded Systems-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=96.MaPaSiZa09.2 Multicube Explorer is a design space exploration tool for supporting platform-based design. It allows a fast optimization of a parameterized system architecture towards a set of objective functions (e.g., energy, delay and area), by interacting with a system-level simulator. Multicube Explorer provides a set of innovative sampling and optimization techniques to help find the best objective function trade-offs. 
It also provides an open XML interface for supporting new platforms/architectures. MULTICUBE: Multi-Objective Design Space Exploration of Multiprocessor Architectures for Embedded Multimedia Applications http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=97.Silvanoetal09 Coordinated management of hardware and software self-adaptivity http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=94.DeFeTa08 Self-adaptivity is the capability of a system to adapt itself dynamically to achieve its goals. Self-adaptive systems will be widely used in the future both to efficiently use system resources and to ease the management of complex systems. The frameworks for self-adaptivity developed so far usually concentrate either on self-adaptive software or on self-adaptive hardware, but not both. In this paper, we propose a model of self-adaptive systems and we describe how to manage self-adaptivity at all levels (both hardware and software) by means of a decentralized control algorithm. The key advantage of decentralized control is in the simplicity of the local controllers. Simulation results are provided to show the main characteristics of the model and to discuss it. A Security Service Protocol for MANETs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=93.TaFe09 Mobile Ad-hoc Networks are composed of heterogeneous mobile systems. Securing their communications may be difficult due to differences in the supported algorithms and protocols. In this paper we propose a protocol to negotiate security settings for the communications. This protocol aims at minimizing the power consumption and at providing the highest possible security level associated with the communications. 
Security in NoC http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=92.FiPaSi09.2 Future integrated systems will contain billions of transistors, composing tens to hundreds of IP cores. These IP cores, implementing emerging complex multimedia and network applications, should be able to deliver rich multimedia and networking services. An efficient cooperation among these IP cores (e.g., efficient data transfers) can be achieved through utilization of the available resources. The design of such complex systems includes several challenges to be addressed. Among others, one challenge is to design an on-chip interconnection network that is able to efficiently connect the IP cores. Another challenge is to derive an application mapping that makes efficient use of the available hardware resources. An architecture that is able to accommodate such a high number of cores, satisfying the need for communication and data transfers, is the Network-on-Chip (NoC) architecture. For these reasons Networks-on-Chip have become a popular choice for designing the on-chip interconnect for Systems-on-Chip (MPSoCs), and are supported by industry (such as the Æthereal NoC from Philips, the STNoC from STMicroelectronics and an 80-core NoC from Intel). As it is presented in , the key design challenges of emerging NoC design are a) the communication infrastructure, b) the communication paradigm selection and c) the application mapping optimization. Can knowledge regarding the presence of countermeasures against fault attacks simplify power attacks on cryptographic devices? http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=89.ReEiBrIeKo Side-channel attacks are nowadays a serious concern when implementing cryptographic algorithms. Powerful ways for gaining information about the secret key as well as various countermeasures against such attacks have been recently developed. 
Although it is well known that such attacks can exploit information leaked from different sources, most prior works have only addressed the problem of protecting a cryptographic device against a single type of attack. Consequently, there is very little knowledge on how a scheme for protecting a device against one type of side-channel attack may affect its vulnerability to other types of side-channel attacks. In this paper we focus on devices that include protection against fault injection attacks (using different error detection schemes) and explore whether the presence of such fault detection circuits affects the resistance against attacks based on power analysis. Using the AES S-Box as an example, we performed attacks on the unprotected implementation as well as modified implementations with parity check circuits or residue check circuits (mod3 and mod7). In particular, we focus on the question whether the knowledge of the presence of error detection circuitry in the cryptographic device can help an attacker who attempts to mount a power attack on the device. Our results show that the presence of error detection circuitry helps the attacker even if he is unaware of this circuitry, and that the benefit to the attacker increases with the number of check bits used for the purpose of error detection. A Protocol For Pervasive Distributed Computing Reliability http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=91.FePoStTa08 The adoption of new hardware and software architectures will make future generations of pervasive devices more flexible and extensible. Networks of computational nodes will be used to compose such systems. In these networks tasks will be delegated dynamically to different nodes (that may be either general purpose or specialized). Thus, a mechanism to verify the reliability of the nodes is required, especially when nodes are allowed to move in different networks. 
In this context, the reliability of nodes is determined by their ability to execute the tasks assigned to them with the promised performance. This paper proposes a protocol to evaluate the reliability of the different nodes in the network, thus providing a trust mechanism among nodes which can also manage the soft/hard real-time constraints of task execution. Some simulation results are also shown to help describe the properties of the protocol. An Efficient Design Space Exploration Methodology for Multi-Cluster VLIW Architectures based on Artificial Neural Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=90.MaPaZaSi08 Multi-Cluster Very Long Instruction Word (VLIW) architectures are currently designed by using platform-based synthesis techniques. In these approaches, a wide range of platform parameters is tuned to find the best trade-offs in terms of the selected figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem. The design space for a Multi-Cluster architecture is too large to be evaluated comprehensively. So far, several heuristic techniques have been proposed to address the MOO problem, but they are characterized by low efficiency in identifying the Pareto front. In this paper, we propose an efficient DSE methodology leveraging neural networks. In particular, an initial design-of-experiments phase is used for generating a coarse view of the target design space; neural networks are then trained and used to refine the exploration, by efficiently identifying the Pareto points of the design space. This process is iteratively repeated until the target criterion (convergence of the Pareto coverage) is satisfied. A set of experimental results is reported to trade off accuracy and efficiency of the proposed techniques with actual workloads. 
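Several of the design space exploration abstracts in this list refer to identifying the Pareto points of a multi-objective problem. As a minimal sketch of that notion only, and not of any of the published methodologies, the Python fragment below filters a list of evaluated design points down to the non-dominated ones, assuming every objective (for instance energy, delay and area) is to be minimized. The function names and the sample values are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

# One evaluated design point: a tuple of objective values, all to be minimized.
Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates, in input order.
    This is the simple quadratic-time filter, fine for small sets."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (energy, delay, area) triples, made up for illustration.
    designs = [
        (1.0, 9.0, 4.0),
        (2.0, 5.0, 4.0),
        (3.0, 6.0, 5.0),  # dominated by (2.0, 5.0, 4.0)
        (4.0, 2.0, 6.0),
    ]
    for point in pareto_front(designs):
        print(point)
```

In a real exploration flow the objective values would come from simulation or from a prediction model rather than from a hard-coded list; the filter itself stays the same.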
A Security Monitoring Service for NoCs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=88.FiPaSi08 As computing and communications increasingly pervade our lives, security and protection of sensitive data and systems are emerging as extremely important issues. Networks-on-Chip (NoCs) have appeared as a design strategy to cope with the rapid increase in complexity of Multiprocessor Systems-on-Chip (MPSoCs), but only recently has the research community addressed security on NoC-based architectures. In this paper, we present a monitoring system for NoC-based architectures, whose goal is to help detect security violations carried out against the system. The information collected is sent to a central unit for efficiently counteracting actions performed by attackers. We detail the design of the basic blocks and analyse the overhead associated with the ASIC implementation of the monitoring system, discussing the types of security threats that it can help detect and counteract. Programmable data protection device, secure programming manager system and process for controlling access to an interconnect network for an integrated circuit http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=87.pat07301411.0-2413PATENT A data protection device for an interconnect network on chip (NoC) includes a header encoder that receives input requests for generating network packets. The encoder routes the input requests to a destination address. An access control unit controls and allows access to the destination address. The access control unit uses a memory to store access rules for controlling access to the network as a function of the destination address and of a source of the input request. Secure Memory Accesses on Networks-on-Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=84.AlGaSte2008 Security is gaining relevance in the development of embedded devices. 
Towards a secure system at each level of design, this paper addresses security aspects related to Network on Chip (NoC) architectures, foreseen as the communication infrastructure of next-generation embedded devices. In the context of NoC-based multiprocessor systems, we focus on the topic, not yet thoroughly addressed, of data protection. In this paper, we present a secure NoC architecture composed of a set of Data Protection Units (DPUs) implemented within the Network Interfaces (NIs). The run-time configuration of the programmable part of the DPUs is managed by a central unit, the Network Security Manager (NSM). The DPU, similar to a firewall, can check and limit the access rights (none, read, write, or both) of processors accessing data and instructions in a shared memory. In particular, the DPU can distinguish between the operating roles (supervisor/user and secure/non secure) of the processing elements. We explore alternative implementations of the DPU and demonstrate how this unit does not affect the network latency if the memory request has the appropriate rights. We also focus on the dynamic updating of the DPUs to support their utilization in dynamic environments, and on the utilization of authentication techniques to increase the level of security. Code Generation from Statecharts: Simulation of Wireless Sensor Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=85.MuSaParma08 Automatic generation of code starting from lightweight modeling languages such as UML is by now a widely adopted approach. In particular generation of executable SystemC models starting from StateCharts and other UML diagrams represents a promising research field. While RTL SystemC appears better suited for matching the StateCharts formalism (being intrinsically clocked), the performance of the generated code suffers from the heavy overhead induced by time management, especially when the number of concurrent processes is high. 
In this paper we present a methodology that allows applying a solution mixing event-based and clock-driven approaches. More specifically, clock-driven simulation is activated only when the configuration of the system is identified to be evolving. When no events are present this fact is also detected (together with the interval of absence of events) so that no simulation is performed although the clock runs on. This solution is particularly suited for low duty cycle systems, e.g. when simulating Wireless Sensor Networks (WSN); in such instances, speedup of the generated code has been found to be well over two orders of magnitude. Application of the technique to the generation of a power simulator for the IEEE 802.15.4 networking protocol is used as a test case. Programmable data protection device, secure programming manager system and process for controlling access to an interconnect network for an integrated circuit http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=86.pat20090089861PATENT A data protection device for an interconnect network on chip (NoC) includes a header encoder that receives input requests for generating network packets. The encoder routes the input requests to a destination address. An access control unit controls and allows access to the destination address. The access control unit uses a memory to store access rules for controlling access to the network as a function of the destination address and of a source of the input request. Model-based Design Space Exploration for RTES with SysML and MARTE http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=83.Stuttgart The features of the emerging modeling languages for system design allow designers to build models of almost any kind of heterogeneous hardware-software systems, including Real Time Embedded Systems (RTES). 
An important goal to achieve is the implementation and use of these models in all the steps of a common design flow. One of these steps is the Design Space Exploration (DSE), which helps designers in discovering the optimal solutions among all possible combinations after mapping functional to architectural specifications; for RTES this step is particularly hard as it should include scheduling analysis in order to prove the timing validity after the mapping. This paper presents some guidelines on how to use SysML and MARTE profiles to identify design points fulfilling the timing constraints of an RTES, thus allowing DSE analysis to be automated within the system design phase. A 640 Mbit/s 32-bit Pipelined Implementation of the AES Algorithm http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=82.BeBrFaRe08 Due to the diffusion of cryptography in real time applications, performance in cipher and decipher operations is nowadays more important than in the past. On the other hand, when facing the problem for embedded systems, additional constraints of area and power consumption must be considered. Many optimized software implementations, instruction set extensions and co-processors were studied in the past with the aim to either increase performance or to keep the cost low. This paper presents a co-processor that aims to be an intermediate solution, suitable for applications that require a throughput in the Megabit range and where the die size is a somewhat relaxed constraint. To achieve this goal, the core is designed to operate at 32 bits and the throughput is guaranteed by a 2 stage pipeline with data forwarding. The results obtained by synthesizing our coprocessor with a 0.18 µm CMOS standard cell library show that the throughput reaches 640 Mbit/s while the circuit size is only 20 K equivalent gates. 
An Enhanced Service Provider Communication Interface with Client Priorization http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=81.sliceb08 With the increased dynamics of modern life, the efficiency and reliability of everyday services is emerging as a fundamental concern. On the other hand, modern telecommunication technologies, like wireless Internet access, are penetrating all segments of our life. However, many everyday activities and services still do not fully exploit new technologies. We propose an approach that enables increased deployment of E-commerce concepts in the fields where their usage was either small or negligible. Moreover, in the scope of the same concept, we introduce prioritization of clients in services where it was not commonly present to date. A solution for an enhanced communication interface between service provider and customers is developed. As a case study, the system is designed and optimized for an implementation in a fast-food chain. The proposed solution aims at increasing the quality of service for customers, and at the same time increasing the operational efficiency of the provider. The main idea behind this approach is to enable customers to use their mobile devices, such as cell phones or PDAs, for browsing offered services or goods, viewing current service conditions and placing orders. We detail the underlying mathematical model and describe the implementation on both server and client side. Hardware scheduled SMP architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=78.pat20080134187PATENT An advance is made in the art according to the principles of the present invention directed to a hardware real time operating system (HW RTOS) which advantageously implements the OS layer in a dual-processor SMP architecture. 
Intertask communication is specified by a dedicated application programming interface (API) wherein the HW-RTOS provides and manages communication requirements of applications while providing task scheduling. Advantageously, when implemented according to the present invention, the HW-RTOS results in systems exhibiting a smaller footprint since there is no need to link final executables to software RTOS libraries as done in the prior art. An Automated Design Flow for NoC-based MPSoCs on FPGA http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=80.LuFi08 Increased dynamics of the embedded devices market makes reduced time-to-market emerge as one of the most challenging tasks in modern embedded system design. The complexity of Multiprocessor Systems-on-Chip (MPSoCs) rapidly increases and Networks-on-Chips (NoCs) have emerged as a design strategy to cope with it. In order to allow fast generation of these platforms in the development phase, a full design flow is required. On the other hand, modern FPGAs provide the possibility for fast and low-cost prototyping, representing an efficient response to these needs. In this paper we present a framework, based on the Xilinx Embedded Development Kit (EDK) design flow, for the generation of MPSoCs based on NoCs. The tool provides system designers with the possibility to easily and quickly generate desired architectures that can be helpful for testing, debugging and verifying purposes. Our integrated design flow takes as input a textual description of the system and produces as final result a configuration bitstream file. The framework has been tested and verified on a Xilinx Virtex-II Pro board. 
Modelling the Power Cost of Security in Wireless Sensor Networks : the Case of 802.15.4 http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=79.Ptsbrg Pervasive applications and in particular Wireless Sensor Networks have very strict requirements in terms of power consumption. It is well known that radio activity is very expensive in terms of energy; we show here that intensive processing activities (such as security) represent a major contribution to the power budget. In this paper we extend our methodology for analyzing the impact of security-related operations on power consumption and optimizing it. The analysis is based on experimental data and was validated with measurements on a real platform. Design Space Exploration of PISA Architecture For ONU Auto-discovery Process http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=77.MaToFi08 The goal of the paper is to optimize the PISA architecture for the ONU Auto-discovery process. This Auto-discovery process has been written in the C language following the IEEE 802.3ah MPCP standard. Using the SimpleScalar [3] simulation tool, the architecture profile is evaluated in order to decide the range of the design exploration. Then, using the Wattch [1] and CACTI [2] simulation tools, the CPI, average power consumed and cache area are calculated for each design point; the cost function is defined and evaluated for each design point using a greedy strategy. The Auto-discovery process has also been written in VHDL, its power consumption has been calculated using Synopsys Power Compiler [4], and we then compare the VHDL implementation and the PISA architecture from the power consumption point of view. Implementation of a Reconfigurable Data Protection Module for NoC-based MPSoCs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=76.FiLuPa08 Security issues are emerging as a basic concern in modern SoC development. 
Since in the field of on-chip interconnections the security problem continues to remain mostly an unexplored topic, this paper proposes a novel technique for data protection that uses the communication subsystem as basis. The proposed architecture works as a firewall managing the memory accesses on the basis of a lookup table containing the access rights. This module, called Data Protection Unit (DPU), has been designed for MPSoC architectures and integrated in the Network Interfaces near the shared memory. We implement the DPU inside an MPSoC architecture on FPGA and we add features to the module to be aware of dynamic reconfiguration of the system software. Starting from a general overview of our design down to components structure, we introduce the place and the role of the DPU module inside the system for a reconfigurable secure implementation of an MPSoC on FPGA. The DPU concept, its implementation, and its integration into the system are described in detail. Finally, the architecture is fully implemented on FPGA and tested on a Xilinx Virtex-II Pro board. Executable Models and Verification from MARTE and SysML: a Comparative Study of Code Generation Capabilities http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=74.Munich In this paper two well known UML profiles, namely SysML and MARTE, are closely examined and compared. Both profiles are well suited for the description of embedded systems, although they focus on different aspects and can therefore be considered as complementary. While SysML targets system engineering descriptions at a high level of abstraction and provides diagrams for requirements specification, MARTE is tailored for systems in which Real Time constraints play a major role. Expressiveness of such profiles and their matching with languages that represent the next step in the development of Hardware/Software systems will be the main subject of this work. 
A Wireless Sensor Network scenario is taken as a reference case study and used to illustrate a practical application of MDA. An adaptable FPGA-based System for Regular Expression Matching http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=75.BoPaSa08 In many applications string pattern matching is one of the most intensive tasks in terms of computation time and memory accesses. Network Intrusion Detection Systems and DNA Sequence Matching are two examples. Since software solutions are not able to satisfy the performance requirements, specialized hardware architectures are required. In this paper we propose a complete framework for regular expression matching, covering both its architecture and its compiler. This special-purpose processor is programmed using regular expressions as its programming language. With the parallelism exploited in the design it is possible to achieve a throughput greater than one character per clock cycle, requiring O(n) memory space. The VHDL description of the proposed architecture is fully configurable. A design space exploration to find the optimal architecture based on an area and performance cost function is presented. Rapid Creation of Application Models from Bandwidth Aware Core Graphs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=73.OtReLa07 We present a methodology that allows rapid creation of application models from bandwidth aware core graphs, which are available in the literature for a wide range of applications, and we discuss their applicability to the rapid exploration of multiple Network-on-Chip (NoC) layout organizations. In a bandwidth aware core graph, each node represents a core and the numbers on the edges represent the bandwidth requirements between cores.
We describe core graphs in a UML object model diagram; an automatic code generation tool then produces a SystemC description that generates packets on every output connection in a way that respects the bandwidth requirements specified in the core graph. We can then rapidly derive a NoC mapping in which a specific floorplan of the cores can be evaluated and compared with alternative floorplan options for rapid design space exploration. ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=70.PaBoSa07 Text pattern matching is one of the main and most computation-intensive parts of systems such as Network Intrusion Detection Systems and DNA Sequencing Matching. Software solutions are available, but they often do not satisfy the performance requirements. This paper presents a new hardware approach for regular expression matching: ReCPU. The proposed solution is a parallel and pipelined architecture able to deal with the common regular expression semantics. This implementation, based on several parallel units, achieves a throughput of more than one character per clock cycle (the maximum performance of state-of-the-art solutions) while requiring just O(n) memory locations (where n is the length of the regular expression). Performance has been evaluated by synthesizing the VHDL description. Area and time constraints have been analyzed. Experimental results are obtained by simulating the architecture. Ultra-low power optimizations for the IEEE 802.15.4 networking protocol http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=71.pisa A main challenge to be tackled in the area of Wireless Sensor Networks (WSNs) is related to the limited amount of energy available and the requirements in terms of lifetime. IEEE 802.15.4 is a recent low-rate/low-power standard for wireless personal area and sensor networks.
Its simple infrastructure, intermediate range and reasonable power performance make it a candidate for a wide range of applications that require a low throughput but a reasonable device lifetime and consequently a certain power efficiency. However, some major inefficiencies of the protocol limit its power performance and cause unnecessary power waste in some situations. In this paper these limitations of the standard in terms of power performance are investigated. Possible optimizations that can be achieved with minimal or no changes to available 802.15.4-compliant hardware platforms are suggested. Design exploration for an Ogg/Vorbis decoder for VLIW architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=72.Ferrari2007 Parallel processing architectures are set to be the dominating design approach for a plethora of application domains, mainly because of the imminent reach of the so-called power wall, and furthermore because of the evident gap between the application/software development growth and Moore's law. In this work a design space for an audio codec is explored, targeted at a VLIW architecture. The Ogg/Vorbis codec is first analyzed and optimized to expose potential parallelism to the VEX tools for compilation and parallel architecture exploration. Furthermore, the use of custom instructions is assessed, and important results are obtained by modifying the toolchain to reveal dynamic profiling information. Role Based Access Control for the interaction with Search Engines http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=69.BoIoNeTaTo07 Search engine-based features are a basic means of interaction for users to find information inside Web-based Learning Management Systems (LMSs); nonetheless, traditional solutions lack mechanisms for access rights management for data contained in search engines' indexes.
This paper explores the integration of a Role Based Access Control (RBAC) mechanism for the interaction with a search engine in a Web-based LMS. We first outline a reference conceptual model for the design of Web-based LMSs exploiting RBAC by means of WebML, a visual modeling language for the high-level specification of data-intensive Web applications. Then, we propose a model-driven approach for the definition of an RBAC-driven interaction between users and search engines, extending WebML with new modeling primitives and outlining significant modeling patterns for the specification of the visibility and action access control levels. Power Attacks Resistance of Cryptographic S-boxes with added Error Detection Circuits http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=67.ReEiGr07 Many side-channel attacks on implementations of cryptographic algorithms have been developed in recent years, demonstrating the ease of extracting the secret key. In response, various schemes to protect cryptographic devices against such attacks have been devised and some implemented in practice. Almost all of these protection schemes target an individual side-channel attack, and consequently it is not obvious whether a scheme for protecting the device against one type of side-channel attack may make the device more vulnerable to another type. We examine in this paper the possibility of such a negative impact for the case where fault detection circuitry is added to a device (to protect it against fault injection attacks) and analyze the resistance of the modified device to power attacks. To simplify the analysis we focus on only one component in the cryptographic device (namely, the S-box in the AES and Kasumi ciphers), and perform power attacks on the original implementation and on a modified implementation with an added parity check circuit.
Our results show that the presence of the parity check circuitry has a negative impact on the resistance of the device to power analysis attacks. A Question Answering service for information retrieval in Cooper http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=65.GiTaVeBrKo07 In Cooper, part of the student support will be provided by a Question Answering application in the form of a web service. Question Answering allows a user to use the content of a project document as input to find related documents as well as related experts. Latent Semantic Analysis as an underlying technique is briefly discussed, followed by a description of our Latent Semantic Analysis engine and the software architecture that was developed. Issues for further development are also mentioned. The final section contains a specific case study of an environment in which an implementation is planned. Learning Java by a Card Game: A Case Study http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=63.Der2007 To teach the Java programming language better and in a more enjoyable way, we developed a framework for card games that allows students to write and test their own intelligent players. This paper briefly describes the design of the framework and the advantages of using it to assign homework, and reports our experience with a class carried out in our institute. SC2: State Charts to System C: Automatic Executable Models Generation http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=64.Barcellona2 The recent development of embedded systems calls for a complete framework for design and simulation of applications that spans all levels of system design. Desirable characteristics of such a framework are rapidity of use, simplicity and reusability. For this purpose we already introduced a generator that converts specifications written with a subset of StateCharts to behavioral SystemC [9] [11].
In this paper we present the new version of our tool: most of the limitations of the previous versions have been overcome, the considered subset of the StateCharts formalism has been extended and the target has been changed from behavioral to Register Transfer Level (RTL) SystemC. A major enhancement of this new version is the possibility of obtaining various module instances starting from a single specification, which is vital in some contexts (e.g. Wireless Sensor Networks simulation). The semantics chosen for our StateCharts diagrams is clearly described. The generation of executable models as well as the kernel template of the generated code are discussed in detail. Remote Cooperation on Project-centred Learning: a Working Implemented Solution in Academia http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=62.SaTa07 The paper illustrates the original technical solution provided within an academic institute to manage teaching activities, encompassing the coordination of project-centred learning processes that run in parallel with the formal theoretical courses. Unlike the planning of academic teaching, which can be scheduled year by year, the development of a project cannot be defined over a long period; it requires frequent report reviews and updates by the different actors involved in the project. From this consideration, and due to the peculiar context of the ALaRI institute, the need to manage the asynchronous and synchronous communications occurring during an ongoing project was clear, so as to facilitate the team members' remote interactions and cooperation. The solution provided within the EU COOPER project is the answer to increasingly common usage scenarios, reflecting not only university requirements but also industrial needs based on cooperative teamwork among persons who are geographically dispersed and have heterogeneous competences.
A Topology Design Customization Approach for STNoC http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=66.PaMaSi07 To support high bandwidth SoCs, a communication design flow is necessary for design space exploration respecting tight design requirements. In order to exploit the benefits introduced by the NoC approach for on-chip communication, the paper presents a design flow for the core mapping and customization of the network topology applied to STNoC, the Network on-Chip developed by STMicroelectronics. Starting from the ring topology, the proposed application-specific flow tries to find a set of customized topologies, optimized in terms of performance and area/energy overhead, by adding links. The generated STNoC custom topologies provide a reduced cost with respect to the spidergon topology. A Data protection Unit for NoC-based Architecture http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=68.FiPaLuSi07 Security is gaining increasing relevance in the development of embedded devices. Towards a secure system at each level of design, this paper addresses the security aspects related to Network-on-Chip (NoC) architectures, foreseen as the communication infrastructure of next generation embedded devices. In the context of NoC-based multiprocessor systems, we focus on the topic of data protection, which has not yet been thoroughly addressed. We present the architecture of a Data Protection Unit (DPU) designed for implementation within the Network Interface (NI). The DPU supports the capability to check and limit the access rights (none, read, write or both) of processors requesting access to data locations in a shared memory, in particular distinguishing between the operating roles (supervisor or user) of processing elements. We explore different alternative implementations and demonstrate that the DPU does not affect the network latency if the memory request has the appropriate rights.
In the experimental section we show synthesis results for different ASIC implementations of the Data Protection Unit. Security Aspects in Networks-on-Chips: Overview and Proposals for Secure Implementations http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=60.FiSiSa07 Security has gained increasing relevance in the development of embedded devices. Towards the aim of a secure system at each level of the design, in this paper we address security aspects related to Network-on-Chip (NoC) architectures. After presenting the attacks most likely to target NoCs, we survey existing academic and industrial secure architectures relevant to our case, focusing in particular on their communication infrastructure. We outline and propose possible solutions to counter some of the attacks described and suggest the use of the NoC as a means to monitor and detect unexpected system behaviors. Application-Specific Topology Design Customization for STNoC http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=61.PaMaSiLo07 Customized network-oriented communication architectures have recently become a must to support high bandwidth SoCs. To this end, a corresponding communication design flow is necessary to support the design space exploration of complex SoCs with tight design constraints. In order to exploit the benefits introduced by the NoC approach for on-chip communication, the paper presents a Pareto Simulated Annealing (PSA) approach for the customization of the network topology. The proposed PSA approach has been applied to STNoC, the Network on Chip developed by STMicroelectronics. Starting from the ring topology, the proposed application-specific design flow tries to find a set of customized topologies (optimized in terms of performance and area/energy overhead) by adding custom links up to the spidergon topology.
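As a rough illustration of why adding cross-links to a ring reduces network distance, the sketch below compares the average hop count of a bidirectional ring with that of a spidergon-style topology (the ring plus one "across" link from node i to node i + N/2). This is not the PSA flow described in the abstract: the function names and the 16-node size are assumptions chosen for illustration, and plain hop count stands in for the performance metric.

```python
from collections import deque


def ring_links(n):
    """Bidirectional ring: node i is linked to node (i + 1) mod n."""
    return {(i, (i + 1) % n) for i in range(n)}


def across_links(n):
    """Spidergon-style cross-links: node i is linked to node i + n/2 (n even)."""
    return {(i, i + n // 2) for i in range(n // 2)}


def average_hops(n, links):
    """Average shortest-path length over all ordered node pairs, via BFS."""
    neighbours = {i: set() for i in range(n)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    total = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total / (n * (n - 1))


N = 16  # hypothetical number of cores, not taken from the paper
ring = ring_links(N)
spidergon = ring | across_links(N)
print(f"ring:      {average_hops(N, ring):.2f} hops, {len(ring)} links")
print(f"spidergon: {average_hops(N, spidergon):.2f} hops, {len(spidergon)} links")
```

Topologies that add only a subset of the cross-links fall between these two extremes, which gives a feel for the trade-off between network distance and extra links that such a customization flow explores.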
Mapping and Topology Customization Approaches for Application-Specific STNoC Designs http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=57.PaMaSiLoCo07 Application-specific network-oriented communication architectures have recently become an effective solution to support high bandwidth Systems on-Chip. The Network on-Chip architectures considered so far range from regular to fully customized topologies for application-specific designs requiring high-level bandwidth. To this end, a network-centric design flow is necessary to support the design space exploration of complex SoCs with tight design constraints. This paper introduces four different approaches based on the orthogonalization of core mapping and topology customization applied to STNoC, the Network on-Chip developed by STMicroelectronics. The four methods are derived from the combination of the initial mappings to two standard topologies (ring and spidergon) with two types of topology customization based on the insertion of cross-links to reduce the network distance of standard topologies. A Query Unit for the IPSec Databases http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=58.FeChPi07 IPSec is a suite of protocols that adds security to communications at the IP level. Protocols within IPSec make extensive use of two databases, namely the Security Policy Database (SPD) and the Security Association Database (SAD). The ability to query the SPD quickly is fundamental, as this operation needs to be done for each incoming or outgoing IP packet, even if no IPSec processing needs to be applied to it. This may easily result in millions of queries per second in gigabit networks. Since the databases may contain several thousand records on large secure gateways, a dedicated hardware solution is needed to support high throughput.
In this paper we discuss an architecture for these query units, we propose different query methods for the two databases, and we compare them through simulation. Two different versions of the architecture are presented: a basic version, and a modification of it that supports multithreading. As shown by the simulations, this technique is very effective in this case. The architecture that supports multithreading allows for 11 million queries per second in the best case. Simulation-based Methodology for Evaluating DPA-Resistance of Cryptographic Functional Units with Application to CMOS and MCML Technologies http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=59.ReBaEi07 This paper explores the resistance of MOS Current Mode Logic (MCML) against Differential Power Analysis (DPA) attacks. Circuits implemented in MCML, in fact, have unique characteristics both in terms of power consumption and the dependency of the power profile on the input signal pattern. Therefore, MCML is suitable to protect cryptographic hardware from DPA and similar side-channel attacks. In order to demonstrate the effectiveness of different logic styles against power analysis attacks, the non-linear bijective function of the Kasumi algorithm (known as substitution box S7) was implemented with CMOS and MCML technology, and a set of attacks was performed using power traces derived from SPICE-level simulations. Although all keys were discovered for CMOS, only very few attacks on MCML were successful. Self-adaptive Security at Application Level: a Proposal http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=56.FeTaSaMa07 Self-adaptive systems have the ability to adapt themselves to changing external or internal conditions without requesting any intervention from the user; the security of such systems is influenced by those adaptations.
Therefore, the security mechanisms that are put in place by the operating system should also adapt to maintain the desired security level. This paper proposes a self-adaptive framework for system security. This adaptation scheme allows the system to choose the best set of security policies at any given time; this set is determined by considering the system's internal and external conditions as well as the application requirements. The proposed framework deals with self-adaptation at system level in order to provide a solution that is both domain-independent and flexible. A Memory Unit for Priority Management in IPSec Accelerators http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=55.DaFeMa This paper introduces a hardware architecture for high-speed network processors, focusing on support for Quality of Service in IPSec-dedicated systems. The effort is aimed at defining a secure system-on-chip environment, where the speed and security requirements are of utmost importance. In particular, a method is devised to introduce and support Quality of Service through priorities at this level. An architecture of a memory system that provides automatic priority management is proposed. High-level Architecture of an IPSec-dedicated System on Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=54.FePi07 IPSec is a suite of protocols which adds security to communications at the IP level. Protocols within the IPSec suite make extensive use of cryptographic algorithms. Since these algorithms are computationally very intensive, some hardware acceleration is needed to support high throughput. In this paper we propose a high-level architecture of a System on Chip (SoC) which implements IPSec. This SoC is designed to be placed on the main data path of the host machine (flow-through architecture), thus allowing for transparent processing of IPSec traffic.
The functionalities of the different blocks and their interactions, along with an estimation of the internal memory size, are also shown. Introduction to SysML http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=52.Prevo2007MISC The Systems Modeling Language (SysML) is a general-purpose graphical modeling language for specifying, analyzing, designing, and verifying complex systems that may include hardware, software, information, personnel, procedures, and facilities. It is a response to the UML for Systems Engineering RFP developed by OMG, INCOSE, and the ISO AP233 workgroup. In this presentation I will provide an overview of SysML, in particular by showing the diagrams that describe the four pillars of SysML: Requirements, Behavior, Structure, and Parametrics. The diagrams will be shown by means of a simple case study in the field of Wireless Sensor Networks. Hardware Scheduling Support in SMP Architecture http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=53.CoReLa07 In this paper the authors propose a hardware real-time operating system (HW-RTOS) that implements the OS layer in a dual-processor SMP architecture. Intertask communication is specified by means of dedicated APIs, and the HW-RTOS takes care of the communication requirements of the application and also implements the task scheduling algorithm. The HW-RTOS allows smaller footprints, since it avoids the need to link traditional software RTOS libraries into the final executables. Moreover, the HW-RTOS is able to exploit the easy task migration feature provided by an SMP architecture much more efficiently than a traditional software RTOS, due to its faster execution, and the authors show how this significantly exceeds the performance achievable with optimal static task partitioning between two processors. Preliminary results show that the hardware overhead in a dual-processor architecture is less than 20K gates.
StateCharts to SystemC: a High Level Hardware Simulation Approach http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=50.MuPaNeSa07 In this paper we present a tool that converts specifications written with a subset of StateCharts into SystemC behavioral models. The main advantages of such an approach are rapidity of use, simplicity and reusability. Various systems can be modeled at different levels of abstraction and accuracy through StateCharts, and different specific aspects (e.g. energy, performance) can be taken into consideration. Moreover, different parts of the design can be identified at different levels of detail. The kernel of the simulator is fully discussed together with its mapping to the semantics of our StateCharts diagrams. As a case study we present here a model of the IBM PowerPC 750 cache system and the respective SystemC simulator automatically generated by our tool. Scheduling Small packets in IPSec Multi-accelerator Based Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=51.TaFe07 IPSec is a suite of protocols that adds security to communications at the IP level. Protocols within the IPSec suite make extensive use of cryptographic algorithms. Since these algorithms are computationally very intensive, some hardware acceleration is needed to support high throughput. IPSec accelerator performance may heavily depend on the size of the packets to be processed. In fact, when packets are small, the time needed to transfer data and to set up the accelerators may exceed the time needed to process (e.g. to encrypt) the packets in software. In this paper we present a packet scheduling algorithm that tackles this problem. Packets belonging to the same Security Association are grouped before the transfer to the accelerators. Thus, the transfer and the initialization time have a lower influence on the total processing time of the packets.
This algorithm also provides the capability of scheduling grouped packets over multiple cryptographic accelerators. High-level simulations of the scheduling algorithm have been performed, and the results for a one-accelerator and for a two-accelerator system are also shown in this paper. Area and Power Efficient Synthesis of DPA-Resistant Cryptographic SBoxes http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=48.Giaconia2007 This paper presents a novel design methodology for the hardware implementation of non-linear bijective functions, commonly used in most symmetric-key cryptographic algorithms and known as substitution boxes (S-boxes). The proposed technique thwarts a particularly relevant class of side-channel attacks against cryptographic hardware, that of differential power analysis (DPA) attacks. In the proposed approach, the cost of the countermeasure is kept low in terms of silicon process overheads (standard CMOS gates used), area requirement, power consumption and latency, when compared to existing countermeasures. Its effectiveness is proven by showing resistance to simulated DPA attacks using power curves derived with SPICE simulation. Power Modeling and Power Analysis for IEEE 802.15.4: a Concurrent State Machine Approach http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=49.MuPaNeSaFa07 802.15.4 is a recent low-rate/low-power standard for wireless personal area and sensor networks. Its simple infrastructure, intermediate range and good power performance make it a candidate for applications that require a reasonably low throughput but a very high device lifetime and power efficiency. An experimental power analysis of an 802.15.4 implementation is carried out, providing a detailed power model of the protocol based on concurrent state machines; the resulting power model is then used to generate a customized simulator.
The model has been validated through a set of experiments and provides good accuracy; results are discussed, considering in particular the use of the model as a basis for subsequent optimizations on 802.15.4 networks. Tairona, an Open Source Platform for Worldwide Meeting and Tutoring http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=46.ReBoDjMa07 Tairona is a web-based platform for real-time meeting and tutoring. It aims to provide a solution for face-to-face synchronous communication between the tutor and the students in remote faculties and similar environments where a live meeting is not possible. In particular, the application is tailored to the needs of a very unusual scenario: in the considered institution, teachers and students meet only for the week necessary to complete the course. In this paper we present the requirements that led us to design and implement Tairona. The Potential of Speculative Class-Loading http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=47.ZaJoHa07 Platforms such as Java provide many software engineering benefits. However, these benefits often come at the cost of significant runtime overhead. In this paper we study the potential for hiding some of that overhead by employing speculative execution techniques. In particular, we study the predictability of class-loading requests and the potential benefits of speculatively preloading classes in interactive applications. COOPER: Towards A Collaborative Open Environment of Project-centred Learning http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=45.BoTa2006 Nowadays, engineering studies are characterized by high mobility of students, lecturers and workforce and by the dynamics of multi-national companies, where classes or student teams are composed of persons with different competencies and backgrounds working together on projects to solve complex problems.
Such an environment will become increasingly relevant in multinational universities and companies, and it has brought a number of challenges to existing e-learning technologies. COOPER is an ongoing project that focuses on developing and testing such a collaborative and project-centred learning environment. This paper proposes a COOPER framework and shows its approaches to address the various research challenges. Bridging the Gap between SysML and Design Space Exploration http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=44.SivaPrev2006 In the last few years the embedded systems design discipline has required new design methodologies and new specification languages to support system engineers in developing heterogeneous systems where hardware and software are combined. One of the emerging modeling languages for system designers is the UML-based language called Systems Modeling Language (SysML). One of the most important tasks to be addressed early in the system design phase is Design Space Exploration (DSE). DSE helps designers discover the optimal solutions among all possible combinations after mapping functional to architectural specifications. This paper describes an approach to using SysML for a DSE analysis within a system design phase. Particle Swarm Optimization with Discrete Recombination: An Online Optimizer for Evolvable Hardware http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=43.PeUpSa Self-reconfigurable adaptive systems have the possibility of adapting their own hardware configuration. This feature provides enhanced performance and flexibility, reflected in computational cost reductions. Self-reconfigurable adaptation requires powerful optimization algorithms in order to search in a space of possible hardware configurations.
If such algorithms are to be implemented on chip, they must also be as simple as possible, so that the best performance can be achieved with the least cost in terms of logic resources, convergence speed, and power consumption. This paper presents a hybrid bio-inspired optimization technique that introduces the concept of discrete recombination in a particle swarm optimizer, obtaining a simple and powerful algorithm, well suited for embedded applications. The proposed algorithm is validated using standard benchmark functions and used for training a neural network-based adaptive equalizer for communications systems. ASIC Hardware Implementation of the IDEA NXT Encryption Algorithm http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=42.MaChen2006 Symmetric-key block ciphers are often used to provide data confidentiality with low complexity, especially in the case of dedicated hardware implementations. IDEA NXT is a novel block cipher family, which has many interesting features and is targeted at multimedia streaming encryption. Different values can be assigned to the hardware architecture parameters in order to scale the security and the performance of the cipher. In this paper, we implement the IDEA NXT algorithm in custom silicon, using a commercial technology library; different optimizations are applied in order to satisfy different constraints in terms of latency and area occupation, maintaining a high level of security. After an overview of the IDEA NXT design, the implementation choices and trade-offs are discussed, highlighting the similarities and the main differences with regard to other block ciphers. To the authors' knowledge this is the first paper describing such work. Scheduling Small Packets in IPSec-based Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=41.TaFePi2006 IPSec is a suite of protocols that adds security to communications at the IP level.
Protocols within the IPSec suite make extensive use of cryptographic algorithms. Since these algorithms are computationally very intensive, some hardware acceleration is needed to support high throughput. IPSec accelerator performance may heavily depend on the size of the packets to be processed. When packets are small, the time needed to transfer data and to set up the accelerator may exceed the time needed to process the packets (e.g. to encrypt them) in software. In this paper, we propose a solution for this problem. High-level simulations and the related results are provided to show the properties of the algorithm. Power/Performance Tradeoffs in Bluetooth Sensor Networks http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=38.1110115 Low power consumption is a critical issue in wireless sensor networks. Over the past few years, a considerable number of ad-hoc architectures and communication protocols have been proposed for sensor network nodes. While custom solutions carry the greatest power optimization potential, widespread communication standards guarantee interoperability and ease of connection with existing devices. In this paper we present a variable-granularity power model of Bluetooth, and apply it to variable-complexity optimization scenarios, to devise optimal power management policies. These policies, if backed by hardware implementations that are more power-aggressive than those available, could make the protocol fit for a wider range of sensor networks than it is today. Hardware/software partitioning of operating systems: a behavioral synthesis approach http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=40.1127983 In this paper we propose a hardware real-time operating system (HW-RTOS) solution that makes use of dedicated hardware in order to replace the standard support provided by the POSIX layer of a general purpose RTOS for implementing task synchronization and scheduling.
By redefining only the I/O APIs of the tasks, the HW-RTOS then takes care of the communication requirements of the original application and also implements the task scheduling algorithm. The new software application can then be compiled without any need for POSIX support. The main advantages are smaller and faster executables. We present results that show how a small hardware area, less than 10K gates, can result in a 15X performance improvement when the original software scheduler is replaced by a dedicated HW-RTOS. Speeding Up AES By Extending a 32 bit Processor Instruction Set http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=39.1169233 Nowadays the need for speed in cipher and decipher operations is more important than in the past. This is due to the diffusion of real-time applications that involve the use of cryptography. Many co-processors for cryptography were studied and presented in the past, but only a few works addressed the enhancement of the instruction set architecture (ISA) of the embedded processor. This paper presents an extension of the ISA of a 32 bit processor that aims at speeding up the software implementations of the AES algorithm. After the identification of the most frequently executed and the most time consuming sections of the algorithm, a set of dedicated instructions is designed in order to improve the performance of the cipher operations. We validate our instruction set extension by measuring the speed up for different optimized implementations of AES using an ARM processor simulator, but the enhancements we propose are general enough to be applied to almost all 32 bit processors.
Hardware/Software Partitioning and Interface Synthesis in Networks On Chip http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=37.RegLaj2005 With deep sub-micron technology, chip designers are expected to create System-On-Chip (SOC) solutions by connecting different Intellectual Property (IP) blocks using efficient and reliable interconnection schemes. On chip networks are quite compelling because, by applying networking techniques to on-chip communication, they make it possible to implement a fully distributed communication pattern with little or no global coordination. This avoids the problems due to the difficulty of implementing future chips with one single clock source and negligible skew. On the other hand, in order to benefit from the NOC communication paradigm, designers should perform a careful functional mapping to take advantage of spatial locality, by placing the blocks that communicate more frequently closer together. This reduces the use of long global paths and the corresponding energy dissipation. In this work we show how a tile based NOC architecture can be exploited in order to support a flexible hardware/software partitioning of a system-level specification and we present a methodology for the automatic synthesis of the hardware/software interfaces. Automatic Synthesis of the Hardware/Software Interface in Multiprocessor Architectures http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=36.RegNacLaj2005 Although Moore's Law, in principle, enables a huge number of components to be integrated into a single chip, design methods that will allow system architects to put the components together to achieve cost, power and time-to-market targets are severely lacking. System-level design and optimization techniques can significantly reduce the design gap by providing solutions that follow a correct-by-construction rather than a correct-by-iteration approach.
This paper presents a programmatic interface generation tool for automating the generation of the hardware/software interfaces in the context of multiprocessor Systems-On-Chips. The solutions that we present are of crucial importance in a platform based design environment for building a flexible system with reusable IPs and CPU cores. Small-scale Variants of the Secure Hash Standard http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=35.MacRiv2005 In this paper we present effective small scale formulations of the Secure Hash Standard; we focus on the SHA-2 family of algorithms, introducing new compact instances baptized SHA-16, SHA-32, and SHA-64. These may be useful for computing hashes and Message Authentication Codes (MACs) on small platforms where only 8-bit processors are available, such as in the case of Radio Frequency Identification (RFID) devices and embedded systems. To prove the soundness of our scaling approach, we analyze the cryptographic properties of the proposed constructions in terms of adherence to the Strict Avalanche Criterion (SAC) and of robustness to birthday attacks, by also comparing the results with the expected values from random functions. As an additional contribution, we complete the theoretical results for the balance property of random functions, thereby also calculating the expected robustness of the original SHA-2 family versus birthday attacks. Keywords: hash functions, balance, SAC, small scale, RFID. Quasi-Pipelined Hash Circuits http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=33.MacDad2005 Hash functions are an important cryptographic primitive. They are used to obtain a fixed-size fingerprint, or hash value, of an arbitrarily long message.
We focus particularly on the class of dedicated hash functions, whose general construction is presented; the peculiar arrangement of sequential and combinational units makes the application of pipelining techniques to these constructions not trivial. We formalize an optimization technique called quasi-pipelining, whose goal is to optimize the critical path and thus to increase the clock frequency in dedicated hardware implementations. The SHA-2 algorithm has been previously examined by Dadda et al., with specific versions of quasi-pipelining; a full generalization of the technique is presented, along with application to the SHA-1 algorithm. Quasi-pipelining could as well be applied to future hashing algorithms, provided they are designed along the same lines as those of the SHA family. From a young academic institute a broad minded approach: the working and learning environment of the ALaRI Intranet tool (case study) http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=32.Salvioni2005 The aim of this paper is to present an innovative approach to the working organization and learning environment, tried out at ALaRI (Advanced Learning and Research Institute), the academic institute at the University of Lugano, in Switzerland, that, since 1999, has promoted research and education in embedded systems design. Through the introduction and the use of an ad-hoc intranet tool, new social and technological dynamics have been developing at the institute, integrating learning in presence with remote cooperation in a complex and distributed reality. By presenting the practical experiences that occurred within the ALaRI environment, through an analysis of the context needs and of the tool usability, the paper shows the conditions and the reasons that have led to designing and implementing this intranet platform; the troubles and limitations of the intranet are also explored from a usability and communication point of view.
From these experiences, plentiful research material arises to investigate new workflows and new ideas for the virtual workplace. Design and Synthesis of Reusable Platforms with Programmable Interconnects http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=34.BaLaPrevos2005 Platform based design requires restricting the number of possible design choices in order to make it possible to come up with programmable solutions able to cope with the current complexity of System-On-Chip (SoC) designs. Nowadays there is a general consensus that an effective Electronic System Level (ESL) design methodology must provide specific support for platform specification, hardware/software partitioning and programmatic interface synthesis in order to allow designers to exploit the potential of state-of-the-art technologies. In this work we present a methodology that leverages UML for building new architectural platforms to be used in the system design process. We show how our methodology allows the reuse of pre-designed platforms by adding new architectural components and by customizing their interconnections. Application-Driven Optimization of VLIW Architectures: A Hardware-Software Approach http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=28.1049903 A large number of embedded multimedia applications are characterized by high instruction-level parallelism (ILP), especially in the most critical internal loop bodies. Very Long Instruction Word (VLIW) architectures and Application Specific Instruction Set Processors (ASIP) are best suited to exploit such parallelism. Fast design space exploration and optimization of a VLIW architecture for a specific application target is increasingly becoming the crucial factor to achieve higher efficiency designs in a relatively small amount of time. In this paper we propose an example of application-driven optimization of a VLIW architecture using the VEX (VLIW Example) system.
A typical image processing application, the Imaging Pipeline, has been chosen as an example. A Methodology for Bridging the Gap between UML and Codesign http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=30.BaLaPre2005 The Unified Modeling Language (UML) is getting more popular among system designers due to the need to raise the level of abstraction in system specifications. We present here a methodology that integrates UML specifications with a hardware/software codesign platform. This work aims to contribute to SoC Design Automation, starting from system level specification down to hardware/software partitioning and integration. Flexible Power Modeling for Wireless Systems: Power Modeling and Optimization of two Bluetooth Implementations http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=31.1070384 A large number of embedded multimedia applications are characterized by high instruction-level parallelism (ILP), especially in the most critical internal loop bodies. Very Long Instruction Word (VLIW) architectures and Application Specific Instruction Set Processors (ASIP) are best suited to exploit such parallelism. Fast design space exploration and optimization of a VLIW architecture for a specific application target is increasingly becoming the crucial factor to achieve higher efficiency designs in a relatively small amount of time. In this paper we propose an example of application-driven optimization of a VLIW architecture using the VEX (VLIW Example) system. A typical image processing application, the Imaging Pipeline, has been chosen as an example. Speeding Security on the Intel StrongARM http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=29.SaMaRe2005 With the increasing use of portable and wireless devices in business and daily life, protecting sensitive information via encryption is becoming more and more crucial.
ALaRI (Advanced Learning and Research Institute) has been conducting research aimed at improving the execution of security algorithms in embedded systems. Thanks to a donation from Intel, ALaRI has been able to develop several recommendations for implementing security efficiently on the Intel StrongARM architecture. Interface Synthesis in Multiprocessing Systems-on-Chips http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=27.RegLaj2004 Although Moore's Law, in principle, enables a huge number of components to be integrated into a single chip, design methods that will allow system architects to put the components together to achieve cost, power and time-to-market targets are severely lacking. System-level design and optimization techniques can significantly reduce the design gap by providing solutions that follow a correct-by-construction approach rather than a correct-by-iteration approach. This paper presents a programmatic interface generation tool for automating the generation of the hardware/software interfaces in the context of multi-processor Systems-On-Chips. The solutions that we present are of crucial importance in a platform based design environment for building a flexible system with reusable IPs and CPU cores. A Methodology for Testing IPSec-based Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=26.BoFeDuPi2004 IPSec is a suite of protocols adding security to communications at the IP level. This suite of protocols is becoming more and more important as it is included as a mandatory security mechanism in IPv6. This paper focuses on a methodology for testing IPSec implementations. A UML model of the IPSec suite of protocols was developed. Test cases were obtained by applying a coverage method on the same model.
UML Specifications Towards a Codesign Environment http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=24.LaBaPre2004 The Unified Modeling Language (UML) is receiving more and more attention from system designers that need to model both hardware and software related aspects of a system. On the ground of the growing consensus toward the need to raise the level of abstraction in system specifications, we would like to present a methodology that aims to address embedded systems design issues at multiple levels of abstraction and to support a function/architecture codesign process. Our approach integrates UML with high-level synthesis and hardware/software co-verification techniques in order to provide an automated flow for SoC design starting from system-level specifications down to hardware/software partitioning and integration. UML has been selected because it is platform independent and helps team members to share relevant information very efficiently during the various design phases, while high-level synthesis helps to evaluate constraints that the embedded system must satisfy, e.g. performance, power and cost, starting from behavioral specifications. UML System-Level Analysis and Design of Secure Communication Schemes for Embedded Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=25.PiPreSte2004 In this work we develop a secure communication protocol in the context of a Remote Meter Reading (RMR) System. We first analyze existing standards in secure communication (e.g. IPsec, SSL/TLS) and existing implementations aimed at embedded systems with low-power constraints in general (e.g. lwIP, lwBT, ZigBee). Then, starting from a Platform Independent Modeling (PIM), we develop a protocol concept to address authentication, integrity and confidentiality, also covering battery lifetime checking and theft monitoring. Finally the protocol itself is described by means of UML.
Limited resource and low-power constraints are taken into account when examining secure-transmission features. RMR is thus an example of an application requiring a light-weight protocol combined with security features. One of the future objectives is to switch from the PIM description to PSM implementation. UML in an Electronic System Level Design Methodology http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=23.BaLaPre2004 The interest in System-On-Chip (SoC) design using the Unified Modeling Language (UML) has been growing significantly during the last couple of years. In this paper we would like to present a methodology that aims to address embedded systems design issues at multiple levels of abstraction and to support a function/architecture codesign process. Our approach integrates UML with high-level synthesis and hardware/software co-verification techniques in order to provide an automated flow for SoC design starting from system-level specifications down to hardware/software partitioning and integration. UML has been selected because it is platform independent and helps team members to share relevant information very efficiently during the various design phases, while high-level synthesis helps to evaluate constraints that the embedded system must satisfy, e.g. performance, power and cost, starting from behavioral specifications. The paper aims to contribute to SoC Design automation from System-level specification to hardware/software partitioning. Method of implementing one-to-one binary function and relative hardware device, especially for a Rijndael S-box http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=22.pat20040228482PATENT A method for implementing one-to-one binary functions defined on the Galois field GF(2^8) is very useful for forming fast and low power hardware devices regardless of the binary function.
The method includes decoding an input byte for generating at least one bit string that contains only one active bit, and logically combining the bits of the bit string according to the binary function for generating a 256-bit string representing a corresponding output byte. The 256-bit string is then encoded in a byte for obtaining the output byte. Efficient AES implementations for ARM based platforms http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=21.968073 The Advanced Encryption Standard (AES) contest, started by the U.S. National Institute of Standards and Technology (NIST), saw the Rijndael [13] algorithm as its winner [11]. Although the AES is fully defined in terms of functionality, it requires best exploitation of architectural parameters in order to reach the optimum performance on specific architectures. Our work concentrates on ARM cores [1] widely used in the embedded industry. Most promising implementation choices for the common ARM Instruction Set Architecture (ISA) are identified, and a new implementation for the linear mixing layer is proposed. The performance improvement over current implementations is demonstrated by a case study on the Intel StrongARM SA-1110 Microprocessor [2]. Further improvements based on exploitation of memory hierarchies are also described, and the corresponding performance figures are presented. The ALaRI Intranet: a Remote Collaboration Platform for a Worldwide Learning and Research Network http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=17.NegBon2004 The ALaRI Intranet is a web-based remote learning, tutoring and collaboration platform that has been developed within the ANTITESYS project.
ANTITESYS is an EU project involving some of the major academic and industrial institutions in Europe; its aim is to foster academic-industrial collaboration in the field of embedded systems whilst training selected students by means of a one-year master program, held at the ALaRI institute in Lugano, Switzerland. What makes this scenario unique is the roles played by the industrial and academic partners of ANTITESYS. The two sides contribute to the training of the master students in different ways, but both share the problem of integrating remote and face-to-face meetings with the students and with the other stakeholders. In this paper, we present the requirements gathering process and the design phase of the ALaRI Intranet, plus some details about its actual implementation and some initial usage figures. An ASIC design for a high speed implementation of the hash function SHA-256 (384, 512) http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=20.989053 An implementation of the hash functions SHA-256, 384 and 512 is presented, obtaining a high clock rate through a reduction of the critical path length, both in the Expander and in the Compressor of the hash scheme. The critical path is shown to be the smallest achievable. Synthesis results show that the new scheme can reach a clock rate well exceeding 1 GHz using a 0.13 µm technology. UML-based specifications of an embedded system oriented to HW/SW partitioning: a case study http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=19.1016432 The Unified Modelling Language (UML) is a language for specifying, visualizing, constructing, and documenting the artefacts of software systems, as well as for modeling business and other non-software systems.
The UML represents a collection of best engineering practices that succeeded in modelling large and complex systems; it is interesting to envision its extension for specification and modelling of hardware-software systems as well, starting with the first design phases, i.e. prior to hardware-software partitioning. This paper analyses the development of a solution able to define the hardware/software partitioning of an embedded system starting from its UML system specifications. The case study chosen is a Wireless Meter Reader (WMR) dedicated to the measurement of energy consumption. The designers evaluated the hardware/software partitioning solution in terms of cost, performance, size and consumption. The Design of a High Speed ASIC Unit for the Hash Function SHA-256 (384,512) http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=18.969266 After recalling the basic algorithms published by NIST for implementing the hash functions SHA-256 (384, 512), a basic circuit characterized by a cascade of full adder arrays is given. Implementation options are discussed and two methods for improving speed are presented: delay balancing and pipelining. An application of the former is first given, obtaining a circuit that reduces the length of the critical path by a full adder array. A pipelined version is then given, obtaining a reduction of two full adder arrays in the critical path. The two methods are afterwards combined and the results obtained through hardware synthesis are presented, where a comparison between the new circuits is also given. FSM-based power modeling of wireless protocols: the case of bluetooth http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=16.1013323 The proliferation of pervasive computing applications relying on battery-powered devices and wireless connectivity is placing great emphasis on the issue of power optimization.
While node-level models and approaches have been widely discussed, a problem requiring even greater attention is that of power associated with the communication protocols. We propose a high-level modeling methodology based on Finite State Machines useful to predict the energy consumption of given communication tasks with very low computational cost, which can be applied to any protocol. We use this methodology to create a power model of Bluetooth that we characterize and validate experimentally on a real implementation. An Application Level Synthesis Methodology for Multidimensional Embedded Processing Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=15.AlGaSte2003 The implementation of multidimensional systems in embedded devices is a major design challenge due to the high algorithmic complexity of the applications. The authors suggest a novel application-level synthesis methodology for those parts of the embedded application which are characterized by being Lebesgue measurable (the computation involved in signal and image processing systems is Lebesgue measurable). The synthesis methodology, based on perturbation analysis, supports the design of analog, digital, or mixed implementations at the very high level of the system design cycle. The outputs of the methodology are quantitative indications regarding the maximum performance loss tolerable by the subsystems composing the application. Such information, augmented with a stochastic description of the tolerated perturbations, can be related to lower synthesis levels and guide the designer toward the final implementation of the embedded device.
The perturbation analysis is based on randomized algorithms for an effective evaluation of the performance loss of the computational flow once affected by behavioral perturbations and a Tabu-search-inspired optimizing algorithm for distributing the tolerable performance loss at the system output along the computational subsystems composing the possibly multidimensional processing. UML-based Specifications of an Embedded System Oriented to HW/SW Partitioning: a Case Study http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=14.MiMaMaBaKoPre2003 The Unified Modelling Language (UML) is a language for specifying, visualizing, constructing, and documenting the artefacts of software systems, as well as for modelling business and other non-software systems. The UML represents a collection of best engineering practices that succeeded in modelling large and complex systems; it is interesting to envision its extension for specification and modelling of hardware-software systems as well, starting with the first design phases, i.e. prior to hardware-software partitioning. This paper analyses the development of a solution able to define the hardware/software partitioning of an embedded system starting from its UML system specifications. The case study chosen is a Wireless Meter Reader (WMR) dedicated to the measurement of energy consumption. The designers evaluated the hardware/software partitioning solution in terms of cost, performance, size and consumption. Intelligent, low-power and low-cost measurement system for energy consumption http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=12.MiMaMaPreKoBa2003 In the area of utility measurement systems, there is increasing awareness of the importance of using intelligent and secure meter readers.
The aim is not simply that of reducing operational costs; aspects such as availability of real-time determination of consumption (mainly in the case of energy meters, but potentially also for water consumption etc.) are relevant not only for actions such as real-time billing but also in view of an increasing environmental awareness leading to 'preferential' billing in particular times of the day or of the week and requiring availability of fine-grained statistics. All these actions in turn involve the requirement of data integrity; when utilities other than power providers are considered, the device should be battery-powered (and very long battery life must be guaranteed), so that low-power design becomes a further requirement while being permanently either in active or in standby mode; moreover, not being connected to the power network means that wireless connections for transmitting and receiving information must be taken into account. Finally, these devices should be made available to the general public and thus be low-cost ones. This paper describes how all the above constraints have been analyzed in the design of a wireless meter reading system. Hardware Implementation of the Rijndael Sbox: a Case Study http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=13.MaBer2003 The Rijndael algorithm was officially selected as the Advanced Encryption Standard in 2001 and will replace the DES in all applications, including Smart Card based products. For this kind of platform, a compact, area efficient hardware implementation of the algorithm is highly desirable. This paper describes such an implementation, which we have based on GF(2^8) finite field decomposition. We present our results from mappings on the STMicroelectronics ASIC technology library and discuss area, timing and power consumption figures.
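As a point of reference for the S-box entry above, the function being implemented can be sketched in software. The sketch below is only a behavioral model under stated assumptions: it computes the Rijndael S-box as defined in the AES standard (multiplicative inverse in GF(2^8) followed by an affine map), using plain exponentiation for the inverse. It does not reproduce the field decomposition used in the paper, and the function names gf_mul, gf_inv and sbox are illustrative, not taken from the publication.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & 0x100:            # reduce when the degree reaches 8
            a ^= 0x11B
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254 (square-and-multiply); 0 maps to 0 by convention."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """Rijndael S-box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Full 256-entry table, e.g. for checking a hardware netlist against a software model.
SBOX_TABLE = [sbox(x) for x in range(256)]
```

A table generated this way is the kind of golden reference a compact hardware S-box is usually checked against; the area, timing and power trade-offs discussed in the abstract concern how the inversion is realized in gates, which this model deliberately leaves out.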
About the Performances of the Advanced Encryption Standard in Embedded Systems with Cache Memory http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=11.BiMaBeBreZaFra2003 Modern networked embedded systems represent a growing market segment in which security is becoming an essential requirement. The Advanced Encryption Standard (AES) specification is becoming the default choice for this type of system; however, a proper software implementation of AES is of fundamental importance in order to achieve significant performance. Current implementations presented in literature differ in terms of the amount of look-up tables used for pre-computing the functions of the encryption/decryption phase. This raises some questions regarding which AES implementation is optimal for a specific system configuration that, up to now, have been only empirically solved. In this work, we present an analytical model to study and evaluate the performance of the possible AES implementations in the early phases of system development. We then show that the proposed high-level timing model captures, with significant accuracy, the actual performance of current AES applications and thus it can be used for early evaluation of optimal AES implementations and to support the design space exploration phase. Validating experiments have been carried out on the Lx architecture, a scalable and customizable VLIW architecture developed by STMicroelectronics and HP Labs. Some final considerations are reported about the relevant characteristics of the analyzed implementations and the role of the cache memory. Method and circuit for data encryption/decryption http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=10.pat20030068036PATENT Data are converted between an unencrypted and an encrypted format according to the Rijndael algorithm, including a plurality of rounds.
Each round is comprised of a fixed set of transformations applied to a two-dimensional array, designated state, of rows and columns of bit words. At least a part of said transformations are applied on a transposed version of the state, wherein rows and columns are transposed for the columns and rows, respectively. A Methodology for efficient architectural exploration of energy-delay trade-offs for embedded systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=9.SaSaSciSiZaZa2003 The main goal of this paper is to identify the best architecture of an embedded system by considering at the same time energy and delay, avoiding the comprehensive analysis of the architectural design space. We adopt the Energy-Delay Product (EDP) as the evaluation metric to compare the alternative architectures of the target system. The paper analyzes an extended adaptive random search algorithm (ADGREED) to efficiently explore the architectural design space. The ADGREED algorithm is a pseudo-random optimization algorithm that combines the best potentialities of the adaptive random search (ADRAS) and the Greedy deterministic algorithm. The analysis has been carried out through the architectural optimization of the memory subsystem of a real-world embedded system executing the set of Mediabench benchmarks for multimedia applications. The reported experimental results have shown a reduction up to one order of magnitude of the number of design alternatives analyzed during the exploration phase, while maintaining very high accuracy. Efficient Software Implementation of AES on 32-Bit Platforms http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=8.752733 Rijndael is the winner algorithm of the AES contest; therefore it should become the most used symmetric-key cryptographic algorithm. One important application of this new standard is cryptography on smart cards.
In this paper we present an optimisation of the Rijndael algorithm to speed up execution on 32-bit processors with memory constraints, such as those used in smart cards. First, a theoretical analysis of the Rijndael algorithm and of the proposed optimisation is discussed, and then simulation results of the optimised algorithm on different processors are presented and compared with other reference implementations known from the technical literature. System-level design of embedded applications by UML: the Wireless Meter Reading case http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=7.MiMaMaPre The Unified Modeling Language (UML) is a language for specifying, visualizing, constructing, and documenting the artifacts of software systems, as well as for business modeling and other non-software systems. The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems; it is interesting to envision its extension to the specification and modeling of hardware-software systems as well, from the first design phases, i.e. before hardware-software partitioning has been effected. This paper describes how UML has been used in the design of a wireless meter reading system consisting of hardware and software components. Energy Estimation and Optimization of Embedded VLIW Processors based on Instruction Clustering http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=6.BoSaSciSiZaZa2002 The aim of this paper is to propose a methodology for the definition of an instruction-level energy estimation framework for VLIW (Very Long Instruction Word) processors. The power modeling methodology is the key issue in defining an effective energy-aware software optimisation strategy for state-of-the-art ILP (Instruction Level Parallelism) processors. 
The methodology is based on an energy model for VLIW processors that exploits instruction clustering to achieve an efficient and fine-grained energy estimation. The approach aims at reducing the complexity of the characterization problem for VLIW processors from exponential, with respect to the number of parallel operations in the same very long instruction, to quadratic, with respect to the number of instruction clusters. Furthermore, the paper proposes a spatial scheduling algorithm based on a low-power reordering of the parallel operations within the same long instruction. Experiments have been carried out on the Lx processor, a 4-issue VLIW core jointly designed by HP Labs and STMicroelectronics. The results have shown an average error of 1.9% between the cluster-based estimation model and the reference design, with a standard deviation of 5.8%. For the Lx architecture, the spatial instruction scheduling algorithm provides an average energy saving of 12%. An Application Level Synthesis Methodology for Embedded Systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=5.AlGaSte2002 Time-to-market, cost and power consumption requirements are pushing research in embedded systems towards the development of sophisticated CAD environments. The paper suggests a novel synthesis methodology for embedded devices based on an application level perturbation analysis. The methodology is based on randomised algorithms for evaluating the effective performance loss of the computational flow induced by perturbations and a Tabu-search optimising algorithm for distributing the tolerable performance loss along the computational subsystems composing the computation. 
An Instruction-Level Methodology for Power Estimation and Optimization of Embedded VLIW cores http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=4.BonaSaSciSiZaZa2002 The overall goal of this work is to define an instruction-level power macro-modeling and characterization methodology for VLIW embedded processor cores. The approach presented in this paper is a major extension of the work previously proposed in [1-3], targeting an instruction-level energy model to evaluate the energy consumption associated with a program execution on a pipelined VLIW core. Our first goal is the reduction of the complexity of the processor's energy model, without reducing the accuracy of the results. The second goal is to show how the energy model can be further simplified by introducing a methodology to automatically cluster the whole Instruction Set with respect to average energy cost, in order to converge to a highly effective design of experiments for the actual characterization task. The paper also describes the application of the proposed model to a real industrial VLIW core (the Lx Architecture developed by HP Labs and STMicroelectronics), to validate the effectiveness and accuracy of the proposed methodology. The 'Smart Card System' project: From plastic money to mobile transaction support http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=2.BoSaMa2001 Will be added later Efficient C implementation of the ECC and AES cryptographic systems http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=3.CaPoMaMaBeBreFra2001 Development cost and size estimation starting from high-level specifications http://www.alari.ch/Research/Publications/bebop/index.php?action=showcategory&by=ID&pub=1.371690 This paper addresses the problem of estimating cost and development effort of a system, starting from its complete or partial high-level description. 
In addition, some modifications to evaluate the cost-effectiveness of reusing VHDL-based designs are presented. The proposed approach has been formalized using a strategy similar to the COCOMO analysis strategy, enhanced by a project size prediction methodology based on a VHDL function point metric. The proposed design size estimation methodology has been validated through a significant benchmark: the LEON-I microprocessor, whose VHDL description is in the public domain.