Bringing 32-Bit Performance to 8- and 16-Bit Applications

By Reinhard Keil, Director of MCU Tools, ARM Germany GmbH, Shawn Prestridge, Senior Field Applications Engineer, IAR Systems, Sean Newton, Field Applications Engineering Manager, STMicroelectronics

Today's embedded applications are being called upon to provide an increasing number of capabilities. More and more devices need to be connected, require greater precision, must offer a graphics-based interface with touch capabilities, utilize sophisticated signal processing, and support multimedia playback.

In the past, developers were compelled by cost constraints to base their designs on 8- and 16-bit architectures that limited performance. Now, with the availability of next-generation MCUs like the STM32 F0 that provide 32-bit performance at 8-bit budget pricing, OEMs can bring substantial value to end-users without having to compromise functionality. In addition, powerful development tools like Keil's MDK-ARM and IAR Embedded Workbench enable developers new to 32-bit programming to immediately exploit the full capabilities of the STM32 F0 architecture.

The 32-bit Advantage

There are several ways in which the STM32 F0 lowers product cost compared to 8- and 16-bit-based designs. Because 8- and 16-bit MCUs tend to be based on legacy architectures, they have many limitations that slow development by forcing designers to work around the architecture. For example, to complete a 16 x 16 multiplication for a processing algorithm, a 16-bit CPU requires four multiplies and several additions, depending upon the implementation. An 8-bit CPU would require significantly more cycles. With the STM32 F0, this takes a single instruction.
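
The "four multiplies and several additions" pattern can be sketched in C. This is a minimal host-side illustration, not vendor code: it assumes a hypothetical multiplier only half as wide as the operands (8 x 8 here, chosen purely for illustration), and the names mul8x8, mul16x16_via_8x8, mul16x16_native, and narrow_multiplies are invented for this example. The same decomposition applies whenever the hardware multiplier is narrower than the operands.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the narrow multiplies used by the last decomposed multiplication. */
unsigned narrow_multiplies;

/* Hypothetical model of a half-width hardware multiplier: 8 x 8 -> 16 bits. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    narrow_multiplies++;
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* On a CPU with a wide enough multiplier, the product is one operation. */
uint32_t mul16x16_native(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* 16 x 16 -> 32 built from four partial products plus shifted additions. */
uint32_t mul16x16_via_8x8(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu), b_hi = (uint8_t)(b >> 8);

    narrow_multiplies = 0;
    uint32_t ll = mul8x8(a_lo, b_lo);   /* weight 2^0  */
    uint32_t lh = mul8x8(a_lo, b_hi);   /* weight 2^8  */
    uint32_t hl = mul8x8(a_hi, b_lo);   /* weight 2^8  */
    uint32_t hh = mul8x8(a_hi, b_hi);   /* weight 2^16 */

    uint32_t result = ll + (lh << 8) + (hl << 8) + (hh << 16);

    /* Debug-build cross-check against the single wide multiply. */
    assert(result == mul16x16_native(a, b));
    return result;
}
```

Calling mul16x16_via_8x8(1234, 5678) returns the same value as mul16x16_native(1234, 5678), but leaves narrow_multiplies at 4, which is the extra work a wide multiplier removes.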

The result is code that makes better use of MCU resources, leading to faster operation, more performance per MHz, higher code density, and greater power efficiency. Since each instruction does more per clock cycle, applications can be written using less code. In addition to accelerating development, shorter code is also easier to debug. Together, all of these benefits lead to lower system cost.

Cost, however, is only one of the numerous advantages the STM32 F0 has over 8- and 16-bit architectures. The STM32 F0 is a full embedded MCU built using the same STM32 DNA that the rest of the STM32 family has, including excellent real-time performance, DMA, high-resolution ADC and DAC peripherals, motor control timers, and connectivity interfaces. These integrated capabilities bring tremendous efficiency to cost-sensitive designs in a way that limited 8- and 16-bit MCU architectures cannot (see Figure 1).

Figure 1: Key features of the STM32 F0 Series. It is a full embedded MCU built using the same STM32 DNA that the rest of the STM32 family has and offers tremendous efficiency to cost-sensitive designs in a way that limited 8- and 16-bit MCU architectures cannot.

For example, the availability of a 32-bit bus not only speeds data transfers and increases computing performance, it also improves system reliability. Consider the challenge of reading a 12-bit ADC using an 8-bit bus, where the CPU has to read the ADC twice to capture the entire sample. If an interrupt occurs between these reads, the ADC data may be overwritten by the next sample before the interrupt is completed and the second read can be executed. To prevent this, developers have to manually disable interrupts for every such "atomic" operation in an application. If even one instance is missed, the result is an intermittent error that will be extremely difficult to resolve.
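
The hazard can be reproduced in a small host-side model. The sketch below is an illustration under stated assumptions, not device code: the 12-bit sample register, the interrupt flag, and every function name (sim_load, read_sample_8bit_bus, read_sample_8bit_bus_guarded, read_sample_wide_bus) are hypothetical stand-ins. The model assumes that an interrupt landing between two bus accesses runs long enough for the next conversion to overwrite the sample.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model; none of these names are real device registers. */
static volatile uint16_t sample_reg;   /* current 12-bit sample          */
static uint16_t pending_sample;        /* value of the next conversion   */
static int interrupts_enabled = 1;

/* Prime the model with the current sample and the one that follows it. */
void sim_load(uint16_t current, uint16_t next)
{
    assert(current <= 0x0FFFu && next <= 0x0FFFu);
    sample_reg = current;
    pending_sample = next;
}

/* Gap between two bus accesses: with interrupts enabled, a handler runs
 * long enough for the next conversion to replace the sample. */
static void gap_between_accesses(void)
{
    if (interrupts_enabled) {
        sample_reg = pending_sample;
    }
}

/* 8-bit bus: two separate reads, so the halves can come from two samples. */
uint16_t read_sample_8bit_bus(void)
{
    uint8_t lo = (uint8_t)(sample_reg & 0xFFu);
    gap_between_accesses();
    uint8_t hi = (uint8_t)((sample_reg >> 8) & 0x0Fu);
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* 8-bit bus with the manual workaround: mask interrupts around both reads. */
uint16_t read_sample_8bit_bus_guarded(void)
{
    int saved = interrupts_enabled;
    interrupts_enabled = 0;
    uint16_t value = read_sample_8bit_bus();
    interrupts_enabled = saved;
    return value;
}

/* Wide bus: the whole sample arrives in a single access. */
uint16_t read_sample_wide_bus(void)
{
    uint16_t value = (uint16_t)(sample_reg & 0x0FFFu);
    gap_between_accesses();
    return value;
}
```

With the model primed by sim_load(0x0FF, 0x100), the unguarded 8-bit read returns 0x1FF, a value that was never sampled, while the guarded and wide-bus reads both return 0x0FF. The guarded version only works if the developer remembers to apply it at every such read, which is the maintenance burden described above.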

DMA: Moving Data Efficiently

The STM32 F0 is a modern architecture integrating the latest in processing, power, and debugging technology. For example, multiple low power modes extend greater control over power consumption to achieve longer operating life for battery-operated and portable devices. In addition, the STM32 F0 offers advanced features, including full Direct Memory Access (DMA) and the ability to shut down the ADC between samples to further increase performance while lowering power consumption.

In general, 8-bit MCUs don't have the powerful peripherals that higher performance MCUs tend to have. For example, DMA has become an essential peripheral for applications that need to move a great deal of data, whether as part of a processing algorithm, receiving data from an interface, playing back audio, or transferring graphics to the display. In a traditional 8-bit architecture, each word of data has to be moved by the CPU. In addition, pointers need to be updated and a loop managed. Thus, every 8 bits of data takes several cycles of CPU time to move.

With the DMA in the STM32 F0, an entire block of data can be moved without involving the CPU. After the program configures the transfer, the DMA manages moving the data in the background. In fact, the CPU can drop into a low power sleep mode while it waits for the transfer to complete. As a result, data transfers do not consume unnecessary CPU cycles and require less power to complete than for 8- and 16-bit architectures.

The availability of a DMA controller can also greatly simplify and accelerate product development. Consider reading data from a high-speed data interface such as I²C. Because of the load on the CPU, 8-bit developers have to work around the MCU's architecture, using many interrupts to utilize the time between data reads. With the STM32 F0, the CPU operates independently of the interface, allowing developers to program the CPU for other tasks without having to worry about missing an interrupt or losing data.

Because the STM32 architecture uses an internal bus matrix, the DMA can be used in conjunction with each of the different on-chip memories as well as many of the peripherals. For example, the DMA can be configured to sample the ADC regularly over a period of time: a timer triggers the DMA to read the ADC and store the result in memory without involving the CPU. Once the operation is complete, the ADC shuts down until the next sample time. In fact, the bus matrix combined with a 5-channel DMA enables the STM32 F0 to support execution of code from Flash in parallel with other memory-memory, peripheral-memory, or memory-peripheral DMA transfers.
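
The timer-triggered flow described above can be modeled on a host machine to show where the CPU is and is not involved. This is a rough sketch under stated assumptions, not STM32 register-level code: dma_channel_model, dma_model_start, and dma_model_on_trigger are hypothetical names, and the "peripheral" is just a variable standing in for a converter's data register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one peripheral-to-memory DMA channel. */
typedef struct {
    const volatile uint16_t *periph;  /* fixed source, e.g. a data register */
    uint16_t *mem;                    /* destination, advanced per transfer */
    size_t remaining;                 /* transfers still to perform         */
    int complete;                     /* set when the block is finished     */
} dma_channel_model;

/* The only CPU work: describe the transfer once, up front. */
void dma_model_start(dma_channel_model *ch, const volatile uint16_t *periph,
                     uint16_t *mem, size_t count)
{
    assert(ch != NULL && periph != NULL && (mem != NULL || count == 0));
    ch->periph = periph;
    ch->mem = mem;
    ch->remaining = count;
    ch->complete = (count == 0);
}

/* One timer trigger: the channel moves a single sample on its own.
 * No CPU copy loop, pointer update, or loop counter is involved. */
void dma_model_on_trigger(dma_channel_model *ch)
{
    if (ch->remaining == 0) {
        return;                       /* block done; ignore extra triggers */
    }
    *ch->mem++ = *ch->periph;
    if (--ch->remaining == 0) {
        ch->complete = 1;             /* CPU can be woken at this point */
    }
}
```

In this model the application calls dma_model_start once, then only inspects the complete flag; each simulated timer tick calls dma_model_on_trigger. On real hardware the trigger and the copy happen without any code executing, which is what allows the CPU to sleep or run other tasks in the meantime.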

There are many tools to assist developers in taking advantage of the STM32 F0's DMA capabilities without requiring them to become DMA experts. The ARM Cortex Microcontroller Software Interface Standard (CMSIS) DSP library, for example, provides signal processing functionality that has been optimized for the STM32 F0 and takes full advantage of the DMA.

An intelligent compiler can also help developers exploit DMA technology to its fullest advantage. IAR Embedded Workbench, for example, offers a feature that will automatically rearrange program data to maximize the use of the DMA. This enables developers to achieve high efficiency without having to put much forethought into how to lay out the data space. The compiler achieves this by analyzing how data is used by the application. Consider a program that copies two different data structures using DMA. Each copy operation requires a separate DMA operation. However, after the compiler collocates the data structures in memory, they can be copied with a single DMA transfer.

Note that each MCU may use the DMA in a slightly different manner. Keil's MDK-ARM, for example, abstracts how the DMA is used from the application through an API that prevents code from being tied to a particular processor. This enables developers to migrate applications to other STM32 devices and know that code utilizing the DMA will still perform optimally.

Writing 32-bit Code

Moving from 8-bit to 32-bit assembly is not trivial, given the vastly different instructions 32-bit architectures offer; for example, single-instruction, multiple-data (SIMD) instructions operate on several data items at once to accelerate processing. Even moving between 16-bit architectures is challenging given that the peripherals can differ and impact how application code is written. The STM32 F0 architecture facilitates a smooth migration to 32 bits. The ability to develop in embedded C reduces the learning curve of moving to a new architecture. In many cases, engineers are already familiar with the ARM Cortex-M architecture. Developers can further ease migration by using a tool chain they are already familiar with, such as IAR Embedded Workbench and Keil's MDK-ARM. Finally, developing for the STM32 F0 is simplified through the use of the ARM CMSIS libraries that abstract much of the underlying hardware from the application.

Moving to the STM32 F0 will result in a substantial reduction in code size because of the density possible with 32-bit instructions, on the order of 30% (see Figure 2). With its 32-bit address space, the STM32 F0 also eliminates addressing and paging limitations that complicate memory management in 8-bit designs. For example, data sets can be larger than a single page and there are no longer "far" addressing penalties. The use of object-oriented constructs, as is common with modern programming and modeling tools, can also be implemented without disruptive fragmentation.

Figure 2: STM32 F0 code size in bytes for various benchmark applications.

Without question, the best compiler is the human brain. Given enough time, a person can create a highly optimized program that no compiler can beat. Programming in assembly can also be more efficient than a C version of the same program. Time, however, is one of the resources of which developers don't have a surplus. In addition, hand-written code can be extremely fragile; if the product specs change in a material way, many of a programmer's optimizations will need to be completely reevaluated.

The reality is that Keil's MDK-ARM and IAR Embedded Workbench are smart enough to make excellent coding choices that might take a person weeks to evaluate. For example, how data is laid out impacts performance. There's also the challenge of balancing optimization techniques like loop unrolling against memory footprint. A compiler can make these decisions for an entire program in just minutes. Each of these tools offers numerous optimizations it can perform automatically for the STM32 F0 architecture that are significantly different from those typical with 8- and 16-bit MCUs. These options include data-flow optimizations such as common sub-expression elimination and loop optimizations such as loop combining and distribution. They also include advanced techniques like branch speculation and executing code out of sequence.
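
Two of the optimizations named above, common sub-expression elimination and loop combining, can be shown by writing the "before" and "after" forms by hand. This is only a sketch of the kind of transformation an optimizer may apply; it is not output from either tool, and the function names scale_as_written and scale_as_optimized are invented for the example. The two forms are equivalent only when the input and output arrays do not overlap and the arithmetic does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* As a developer might write it: gain * offset is evaluated in both loops,
 * and the input array is walked twice. */
void scale_as_written(const int32_t *in, int32_t *out_a, int32_t *out_b,
                      size_t n, int32_t gain, int32_t offset)
{
    assert(in != NULL && out_a != NULL && out_b != NULL);
    for (size_t i = 0; i < n; i++) {
        out_a[i] = in[i] + gain * offset;
    }
    for (size_t i = 0; i < n; i++) {
        out_b[i] = in[i] - gain * offset;
    }
}

/* Hand-applied equivalent of what an optimizer may do automatically. */
void scale_as_optimized(const int32_t *in, int32_t *out_a, int32_t *out_b,
                        size_t n, int32_t gain, int32_t offset)
{
    assert(in != NULL && out_a != NULL && out_b != NULL);

    /* Common sub-expression elimination: compute the product once. */
    const int32_t bias = gain * offset;

    /* Loop combining: one pass, each input element loaded once. */
    for (size_t i = 0; i < n; i++) {
        const int32_t x = in[i];
        out_a[i] = x + bias;
        out_b[i] = x - bias;
    }
}
```

Both functions produce identical output arrays. The value of leaving this to the compiler is that the source stays in the first, more readable form, and the transformation is redone automatically whenever the surrounding code changes.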

These development tools for the STM32 F0 give excellent results. Compiler efficiency compared to human coding has been estimated at 97%. Put another way, the cost of achieving that last 3% is on the order of weeks to months of development time. In addition, if a major design change is required, the compiler can complete a new set of optimizations with just a simple recompile.

As a modern architecture, the STM32 F0 is supported by similarly modern tools that utilize the latest advancements in compiler, debugger, and middleware technology to reduce development time and effort considerably. Being based on the Cortex-M architecture, the STM32 F0 is backed by a larger ecosystem of tools and production-ready software than any other MCU architecture on the market. In addition, for many applications where the code base is small, the tools may be effectively free. For example, both IAR Embedded Workbench and Keil's MDK-ARM are free when used for programs under 32 KB, thus enabling 32-bit design with a low initial investment.

Advanced Debugging

While the ability to design demanding applications quickly is important, developers need debugging capabilities that can abstract the complexity of applications while still providing full visibility and control during run-time operation. In addition, many embedded markets, including medical and industrial, require that application software be certified as well.

The integrated debug capabilities of the STM32 F0 offer a superior debug experience compared to old-fashioned 8- and 16-bit architectures. For example, the STM32 F0 architecture features ARM's CoreSight technology to help developers analyze, optimize, and verify program execution with minimal effort and cost.

CoreSight represents the latest in advanced debugging technology. Traditional MCUs offer only limited run/stop debug capabilities. To achieve greater visibility, an in-circuit emulator costing thousands of dollars may be required, along with a different pod for each MCU in use. CoreSight provides several benefits that other MCU architectures lack, including on-the-fly read/write access and trace capabilities at the instruction, data, and application level. As implemented in the STM32 F0, CoreSight also supports up to 4 hardware breakpoints and 2 watchpoints without requiring the use of intrusive monitoring techniques that can skew performance.

Developers also have a choice of many low-cost debug adapters for the STM32 F0. For example, the ST-LINK in-circuit debugger and programmer, which links the STM32 F0 target board to a PC via USB, is $25. For more advanced debugging, IAR Systems has the I-jet debugger while Keil offers developers its ULINK2 and ULINKpro debuggers.

These debuggers offer powerful capabilities that are often not available for 8- and 16-bit designs. Keil MDK-ARM tools, for example, enable comprehensive code coverage, execution profiling, and performance analysis to ensure maximum performance efficiency. With the I-jet debugger, IAR Systems is able to offer non-intrusive power consumption monitoring at the board- and chip-level. Such "power debugging" enables developers to uncover opportunities to utilize and tune hardware to achieve the highest power efficiency.

STM32 F0 Features

STM32 F0 MCUs have been designed with real-time operating system (RTOS) and kernel support in mind to enable much tighter integration with RTOSes like Keil's royalty-free RTX. In a typical 8- or 16-bit MCU, for example, the RTOS and application share the stack, and complex nesting problems can arise that overflow the stack and crash the system. The only way to avoid such issues is to overprovision the stack. The STM32 F0, in contrast, has two stacks: one for the application and one for the RTOS. This prevents applications from compromising RTOS integrity. In addition, RAM overhead is much lower.

Other companies basing MCUs on the Cortex-M0 architecture integrate only the minimum capabilities an MCU requires. ST is the only company to offer Cortex-M0-based MCUs with:

Easy Communication: Using the integrated DMA controller, the STM32 F0 can support continuous I²C at a rate of 1 Mbps without bogging down the CPU. This data rate isn't possible to achieve on an 8- or 16-bit MCU that does not support DMA.
Advanced Digital and Analog Capabilities: The STM32 F0 integrates a wide range of IP to facilitate the design of sensing and control systems. For example, advanced timers enable the accurate output of complex AC waveforms. On-chip comparators simplify the design of sensors. The 12-bit, multi-channel ADC operating at up to 1 MSample/s allows for fast and precise data acquisition and improves system responsiveness to external events. Advanced timing control is enabled using the 32-bit and 16-bit PWM timers with 17 capture/compare I/O mapped onto up to 28 pins.
Safety Ready: With shrinking process technologies and larger memories combined with frequently changing data, bit errors from cosmic rays can occur. For systems that must meet stringent safety compliance standards, the STM32 F0 performs real-time, hardware-based RAM parity checking and 16-bit CRC verification for Flash to ensure the integrity of memory. RAM checks are performed automatically whenever memory is accessed. Flash verification is self-managed, enabling developers to confirm program integrity upon startup and when updating firmware to verify that no bits have been flipped since they were written.
Reliability: The STM32 F0 integrates two watchdog timers, one of which is a windowed watchdog timer. These timers, which can operate in low power modes as well, provide a higher level of reliability not available in most 8- and 16-bit MCUs. A Clock Security System (CSS) enables systems to switch to internal RC-based clocking in case of external clock failure to ensure systems can shut down gracefully rather than catastrophically.
Optimized Communications: The STM32 F0 supports the HDMI Consumer Electronics Control (CEC) protocol. Important for devices targeted for consumer markets, this peripheral enables devices to have smart control over multiple HDMI lines. For devices needing remote control capabilities, ST provides a full infrared firmware library.
Memory: Flash memory capacity ranges from 16 KB to 128 KB.
1.8V Ready: The STM32 F0 can interface directly to 1.8 to 3.6 V-based devices. This eliminates the need for the additional conditioning circuitry that 8- and 16-bit MCUs require.
Capacitive Touch Sensing: To add touch to 8- and 16-bit MCU-based designs, a second processor is typically required. With the STM32 F0, developers can easily introduce capacitive touch sensing to applications, with up to 18 keys and slider/wheel configurations, all with a single chip. In addition, touch sensing can be implemented with zero CPU loading when using the charge transfer method.
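
To make the Flash integrity check in the "Safety Ready" item concrete, the following is a minimal software sketch of a 16-bit CRC comparison. It is an illustration only: the polynomial and initial value (0x1021 and 0xFFFF, the common CRC-16/CCITT-FALSE parameters) are assumptions chosen for the example, not a statement of what the STM32 F0 hardware computes, and crc16_ccitt and image_is_intact are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021 and initial value 0xFFFF.
 * These parameters are an assumption made for this sketch. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Startup or post-update check: recompute the CRC over the stored image
 * and compare it with the value recorded when the image was written. */
int image_is_intact(const uint8_t *image, size_t len, uint16_t stored_crc)
{
    return crc16_ccitt(image, len) == stored_crc;
}
```

A typical use would record the CRC when firmware is programmed, then call image_is_intact at startup and after each update. Any single flipped bit in the image changes the computed CRC, so the comparison fails and the application can refuse to run the corrupted code.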

Overall, the STM32 F0 provides an optimal balance of cost, performance, and peripherals for embedded applications (see Figure 3). Rather than tie developers to a proprietary architecture with limited tools and support, ST offers the industry's widest Cortex-M portfolio with more than 300 compatible devices across the entire STM32 family.

Figure 3: STM32 F0 benchmark positioning compared to others.

With code-, pin-, and peripheral-compatibility across the STM32 family, developers can carry Cortex-M0-based designs over to M3- and M4-based MCUs with unparalleled flexibility. For example, applications designed using the STM32 F0 are easily migrated to the STM32 F2 and STM32 F4. With Keil's MDK-ARM and IAR Embedded Workbench, developers just need to change the MCU selection and the compiler handles all of the details by recompiling the code. This enables developers to easily migrate to an MCU with more performance, memory, and peripherals without rewriting the application. As a result, developers can leverage the same application and tool chain across an entire product line and a variety of MCUs.

Similarly, developers have the option of designing code on the STM32 F2 or F4 with the intention of later downsizing to the STM32 F0. This enables design to take place on a platform with the highest performance and memory to accelerate proof-of-concept design. Once the design has settled, developers can optimize it for the STM32 F0.

With the STM32 F0, ST offers a compelling alternative to 8- and 16-bit devices. For the same price, developers get more performance, higher resolution peripherals, better tools, wider support, accelerated development, and faster time-to-market. To explore how the new STM32 F0 can bring the benefits of 32-bit technology to your designs, the STM32 F0 Discovery Kit is available now for less than $10.
