
An overview of the ARM architecture

2012-07-23 14:43
ARM architecture versions
The ARM architecture has been through several revisions since its emergence in the mid-1980s. The most recent version, ARMv7, is implemented in the Cortex range of processors. The architecture is defined in three “profiles”: the ‘A’ profile for Application-class processors, ‘R’ for Real-time and ‘M’ for Microcontroller devices.

ARMv7-A is currently implemented in the Cortex-A5, Cortex-A8, Cortex-A9 and Cortex-A15 processors and supports fully-featured application class devices capable of running platform Operating Systems such as Linux, Windows Mobile etc. It provides full virtual memory support and optional media processing, security and virtualization extensions.

ARMv7-R is available in the Cortex-R4 and Cortex-R5 and is targeted at applications which require hard, predictable real-time performance. Devices incorporating a Cortex-R4 processor are used, for instance, in engine management systems, hard disk drive controllers and mobile baseband processors.

ARMv7-M is used in microcontroller-type devices, principally those based around the Cortex-M3 and Cortex-M4 processors. This profile supports a subset of features in the v7-A and v7-R profiles aimed at enabling devices which maximize power efficiency and minimize cost. The architecture incorporates many features common in the microcontroller world e.g. bit-banding, hardware interrupt pre-emption etc.

In this document, we assume that the target ARM platform is built around an ARMv7-A processor. Unless explicitly stated otherwise, we refer to the ARMv7-A architecture including the security, advanced SIMD, floating point, Java acceleration and multiprocessing extensions as described in section 2.2 below.

In addition, we consider implementations of the ARMv7-A architecture which include the 40-bit physical addressing (LPAE) and virtualization extensions described in sections 2.2.5 and 2.2.6 below. These extensions are supported by the ARM Cortex-A15 processor.

2.2 Architecture ARMv7-A extensions

There are several optional extensions to architecture ARMv7-A. For further details of these extensions and their intended use, refer to the architecture documentation.

2.2.1 Security

The TrustZone security extensions were introduced in architecture v6K and are an optional extension to the ARMv7-A profile. They introduce an additional operating mode (Monitor mode) with associated banked registers and an additional “secure” operating state.

2.2.2 Advanced SIMD and Floating Point

Both floating point (VFP) support and SIMD (NEON) are optional extensions to the ARMv7-A profile. They may be implemented together, in which case they share a common register bank and some common instructions. Almost all NEON implementations also include floating point support.

2.2.3 Java acceleration

Two architectural extensions are available for accelerating Java and other dynamically compiled languages. Both Jazelle DBX (acceleration for Java only by implementing hardware support for execution of bytecodes) and Jazelle RCT (an extension to the Thumb instruction set providing acceleration for a wider set of dynamically compiled languages) are a required part of the ARMv7-A architecture (though “trivial” implementations are possible).

Note that these two extensions are not often used in ARMv7-A devices and Jazelle RCT is now deprecated. The Cortex-A15 processor provides a trivial implementation; see the documentation for further details.

2.2.4 Multiprocessing

The multiprocessing extensions provide for synchronization and coherency across a “cluster” of cores, operating either in Asymmetric or Symmetric Multi-Processing mode. These extensions are currently supported by the Cortex-A5MP, Cortex-A9MP and Cortex-A15 processors.

2.2.5 40-bit physical addressing

The Large Physical Address Extensions (LPAE) are an optional extension to the ARMv7-A profile. This extension to the VMSAv7 virtual memory architecture allows the generation of 40-bit physical addresses from 32-bit virtual addresses.

LPAE is supported by the Cortex-A15 processor.

2.2.6 Virtualization

The virtualization extensions introduce an extra mode (Hypervisor mode) with associated banked registers. A new Hyp exception can be used to trap software accesses to hardware and configuration registers, thus allowing implementation of an efficient hardware-assisted virtualization solution.

These extensions are supported by the Cortex-A15 processor.

2.3 Programmer’s model

The description presented here is standard for the ARMv7-A and ARMv7-R architecture profiles. The ARMv7-M microcontroller profile has a significantly different model for modes and exceptions.

2.3.1 Standard features

1. Operating modes

The ARM processor supports up to nine operating modes. All of these, with the exception of User mode, are privileged. Seven modes (Supervisor, Undefined, Abort, FIQ, IRQ, Hyp and Monitor) are associated with handling particular types of exception events. Applications generally run in User mode (unprivileged), with the operating system running in Supervisor mode.

Hyp mode is only present in processors supporting the Virtualization extensions (this includes the Cortex-A15); Monitor mode is only in processors supporting the Security extensions (currently all ARMv7-A processors).
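
As a rough illustration of the mode list above, the sketch below decodes the mode field held in the low five bits of the Current Program Status Register (CPSR). The numeric encodings are the ARMv7-A values; the helper names `arm_mode_name` and `arm_mode_is_privileged` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mode field encodings, CPSR bits [4:0], for ARMv7-A. */
enum arm_mode {
    ARM_MODE_USR = 0x10, /* User (unprivileged) */
    ARM_MODE_FIQ = 0x11,
    ARM_MODE_IRQ = 0x12,
    ARM_MODE_SVC = 0x13, /* Supervisor */
    ARM_MODE_MON = 0x16, /* Monitor: Security Extensions only */
    ARM_MODE_ABT = 0x17, /* Abort */
    ARM_MODE_HYP = 0x1A, /* Hyp: Virtualization Extensions only */
    ARM_MODE_UND = 0x1B, /* Undefined */
    ARM_MODE_SYS = 0x1F  /* System */
};

#define CPSR_MODE_MASK 0x1Fu

static_assert((ARM_MODE_SYS & ~CPSR_MODE_MASK) == 0,
              "mode encodings must fit in CPSR bits [4:0]");

/* Hypothetical helper: mode name for a CPSR value, or NULL if the
 * mode bits hold a reserved encoding. */
const char *arm_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
    case ARM_MODE_USR: return "User";
    case ARM_MODE_FIQ: return "FIQ";
    case ARM_MODE_IRQ: return "IRQ";
    case ARM_MODE_SVC: return "Supervisor";
    case ARM_MODE_MON: return "Monitor";
    case ARM_MODE_ABT: return "Abort";
    case ARM_MODE_HYP: return "Hyp";
    case ARM_MODE_UND: return "Undefined";
    case ARM_MODE_SYS: return "System";
    default:           return NULL;
    }
}

/* Hypothetical helper: every valid mode except User is privileged. */
int arm_mode_is_privileged(uint32_t cpsr)
{
    return arm_mode_name(cpsr) != NULL &&
           (cpsr & CPSR_MODE_MASK) != (uint32_t)ARM_MODE_USR;
}
```

For instance, a CPSR value of 0x600001D3 has mode bits 0x13, so `arm_mode_name` reports Supervisor and `arm_mode_is_privileged` returns 1.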

2. Register set

The ARM register set consists of a maximum of 43 registers, counting the program status registers. Of these, 16 general-purpose registers (R0 to R12, SP, LR and PC) are usable at any one time; which physical registers they map to is determined by the current operating mode (see diagram below).

In addition to the general purpose registers, the CPSR (Current Program Status Register) holds current status, operating mode, instruction set state, ALU status flags etc.
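
The sketch below shows one way the main CPSR fields could be unpacked in C. The bit positions are the ARMv7-A ones; the struct `cpsr_fields` and the function `cpsr_decode` are hypothetical names for this example, and several fields (for example Q, GE and the IT bits) are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPSR_N    (1u << 31) /* Negative flag */
#define CPSR_Z    (1u << 30) /* Zero flag */
#define CPSR_C    (1u << 29) /* Carry flag */
#define CPSR_V    (1u << 28) /* Overflow flag */
#define CPSR_J    (1u << 24) /* Jazelle state bit */
#define CPSR_E    (1u << 9)  /* Data endianness: 1 = big-endian */
#define CPSR_I    (1u << 7)  /* IRQ masked */
#define CPSR_F    (1u << 6)  /* FIQ masked */
#define CPSR_T    (1u << 5)  /* Thumb state bit */
#define CPSR_MODE 0x1Fu      /* Mode field, bits [4:0] */

static_assert((CPSR_N | CPSR_Z | CPSR_C | CPSR_V) == 0xF0000000u,
              "ALU status flags occupy bits [31:28]");

/* Instruction set state is selected by the J and T bits together. */
enum isa_state { ISA_ARM, ISA_THUMB, ISA_JAZELLE, ISA_THUMBEE };

/* Hypothetical container for the decoded fields. */
struct cpsr_fields {
    bool n, z, c, v;
    bool irq_masked, fiq_masked;
    bool big_endian;
    enum isa_state isa;
    uint32_t mode;
};

struct cpsr_fields cpsr_decode(uint32_t cpsr)
{
    struct cpsr_fields f;
    bool j = (cpsr & CPSR_J) != 0;
    bool t = (cpsr & CPSR_T) != 0;

    f.n = (cpsr & CPSR_N) != 0;
    f.z = (cpsr & CPSR_Z) != 0;
    f.c = (cpsr & CPSR_C) != 0;
    f.v = (cpsr & CPSR_V) != 0;
    f.irq_masked = (cpsr & CPSR_I) != 0;
    f.fiq_masked = (cpsr & CPSR_F) != 0;
    f.big_endian = (cpsr & CPSR_E) != 0;
    /* J:T = 00 ARM, 01 Thumb, 10 Jazelle, 11 ThumbEE (Jazelle RCT) */
    f.isa = j ? (t ? ISA_THUMBEE : ISA_JAZELLE)
              : (t ? ISA_THUMB : ISA_ARM);
    f.mode = cpsr & CPSR_MODE;
    return f;
}
```

Decoding 0x600001D3, for example, gives Z and C set, both interrupt masks set, ARM instruction set state and mode bits 0x13.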

Seven of the modes also provide an SPSR (Saved Program Status Register) which is used for taking a copy of processor state on entry to an exception handler.

The diagram shows the standard ARMv7-A register set. Where registers are not shown under a particular mode, the corresponding User mode register is used.

Entries marked "usr" are shared with User mode. FIQ, IRQ, Abort, Undef, SVC, Monitor and Hyp are the exception modes. Monitor mode is present with the Security Extensions only and Hyp mode with the Virtualization Extensions only.

Register  User/System  FIQ       IRQ       Abort     Undef     SVC       Monitor   Hyp
R0 to R7  R0 to R7     usr       usr       usr       usr       usr       usr       usr
R8        R8_usr       R8_fiq    usr       usr       usr       usr       usr       usr
R9        R9_usr       R9_fiq    usr       usr       usr       usr       usr       usr
R10       R10_usr      R10_fiq   usr       usr       usr       usr       usr       usr
R11       R11_usr      R11_fiq   usr       usr       usr       usr       usr       usr
R12       R12_usr      R12_fiq   usr       usr       usr       usr       usr       usr
SP        SP_usr       SP_fiq    SP_irq    SP_abt    SP_und    SP_svc    SP_mon    SP_hyp
LR        LR_usr       LR_fiq    LR_irq    LR_abt    LR_und    LR_svc    LR_mon    LR_hyp
PC        PC           usr       usr       usr       usr       usr       usr       usr
CPSR      CPSR         usr       usr       usr       usr       usr       usr       usr
SPSR      (none)       SPSR_fiq  SPSR_irq  SPSR_abt  SPSR_und  SPSR_svc  SPSR_mon  SPSR_hyp
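
The banking rules in the register diagram above can be captured in a few lines of C. This is only a sketch that follows the diagram as printed (including a banked LR for Hyp mode); the names `bank_for` and `count_registers` are hypothetical. Counting one register per distinct bank entry, plus the CPSR and the seven SPSRs, gives the 43 registers mentioned earlier.

```c
#include <assert.h>

/* One bank per column of the register diagram. */
enum bank {
    BANK_USR, /* User/System */
    BANK_FIQ, BANK_IRQ, BANK_ABT, BANK_UND, BANK_SVC,
    BANK_MON, /* Security Extensions only */
    BANK_HYP, /* Virtualization Extensions only */
    BANK_COUNT
};

static_assert(BANK_COUNT == 8, "eight columns in the register diagram");

/* Hypothetical helper: which bank supplies register n (0..15, where
 * 13 = SP, 14 = LR, 15 = PC) when running in the given mode. */
enum bank bank_for(enum bank mode, unsigned n)
{
    if (n >= 8 && n <= 12)          /* R8-R12: banked for FIQ only */
        return mode == BANK_FIQ ? BANK_FIQ : BANK_USR;
    if (n == 13 || n == 14)         /* SP and LR: banked per mode */
        return mode;
    return BANK_USR;                /* R0-R7 and PC are shared */
}

/* Hypothetical helper: total registers implied by the diagram. */
unsigned count_registers(void)
{
    unsigned count = 0;
    for (unsigned n = 0; n < 16; n++)
        for (int m = 0; m < BANK_COUNT; m++)
            if (bank_for((enum bank)m, n) == (enum bank)m)
                count++;            /* counted once, in its own bank */
    count += 1;                     /* CPSR */
    count += BANK_COUNT - 1;        /* one SPSR per exception mode */
    return count;
}
```

Here `bank_for(BANK_IRQ, 13)` selects the IRQ bank (SP_irq), while `bank_for(BANK_IRQ, 8)` falls back to the User bank (R8_usr).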

3. Instruction sets

Current ARM processors support several instruction sets.

· The classic ARM instruction set, in which all instructions are 32-bit.

· The Thumb instruction set, introduced in ARMv4T, in which all instructions are 16-bit, giving greatly improved code density. In Cortex processors, Thumb-2 technology adds 32-bit instructions to the Thumb instruction set, providing increased performance while maintaining the high code density of the original Thumb instruction set.

· The NEON instruction set is a wide SIMD processing architecture, optionally supported by ARMv7-A processors.

Of the ARM processors available on the market today, all support the ARM and Thumb instruction sets as a minimum, with the exception of ARMv7-M devices which support only the Thumb instruction set.

4. Exceptions and interrupts

ARM supports eight basic exception types. External interrupts are mapped to the FIQ and IRQ exceptions. Other exceptions are used for external events (e.g. bus errors), internal events (e.g. undefined instructions or memory address translation faults), or software interrupts. Software interrupts are caused synchronously by execution of an SVC (Supervisor Call), SMC (Secure Monitor Call) or HVC (Hypervisor Call) instruction.
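
When an exception is taken, the processor branches to a fixed slot in a vector table. The sketch below lists the eight 4-byte slots of the standard ARMv7-A vector table and computes a slot address from a table base. The function name `vector_address` is hypothetical. Slot usage differs somewhat in the Monitor and Hyp vector tables (the slot at offset 0x14 is used only in the Hyp table), so treat the enum as an outline rather than a complete description.

```c
#include <assert.h>
#include <stdint.h>

/* Slots in the exception vector table; each slot is 4 bytes. */
enum vector_slot {
    VEC_RESET          = 0, /* offset 0x00 */
    VEC_UNDEF          = 1, /* offset 0x04: undefined instruction */
    VEC_SVC            = 2, /* offset 0x08: supervisor call */
    VEC_PREFETCH_ABORT = 3, /* offset 0x0C: instruction fetch abort */
    VEC_DATA_ABORT     = 4, /* offset 0x10: data access abort */
    VEC_HYP_TRAP       = 5, /* offset 0x14: used in the Hyp table only */
    VEC_IRQ            = 6, /* offset 0x18 */
    VEC_FIQ            = 7, /* offset 0x1C */
    VEC_COUNT
};

static_assert(VEC_COUNT == 8, "the vector table has eight slots");

/* Hypothetical helper: address fetched for a given exception.
 * Typical bases are 0x00000000 and 0xFFFF0000 (high vectors). */
uint32_t vector_address(uint32_t base, enum vector_slot slot)
{
    return base + 4u * (uint32_t)slot;
}
```

With high vectors selected, for example, an IRQ is taken at `vector_address(0xFFFF0000u, VEC_IRQ)`, which is 0xFFFF0018.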

Later ARM processors implement a standardized Generic Interrupt Controller, which provides interrupt prioritization, pre-emption, configuration, distribution and masking in hardware.

5. Memory architecture

ARM processors have a 32-bit address bus providing a flat 4GB linear physical address space. Memory is addressed in bytes and can be accessed as 8-byte doublewords, 4-byte words, 2-byte halfwords or single bytes. Configuration options in the processor determine the endianness and alignment behavior of the memory interface.

ARMv7-A processors implement the VMSAv7 Virtual Memory System Architecture. This provides 32-bit virtual-to-physical address translation functionality. In the latest processors, like the Cortex-A15, this is extended (in the form of the Large Physical Address Extensions) to provide 40-bit physical addressing (see 2.2.5 above).

The architecture supports up to 7 levels of cache, with current implementations typically supporting 2 levels. The architecture permits several options with respect to virtual or physical indexing and tagging of cache contents.

Multi-core processors (e.g. Cortex-A5MP, Cortex-A9MP and Cortex-A15) provide coherency in the L1 data cache across up to four cores in a single cluster.

2.3.2 Extended features

This section describes the extended physical addressing and virtualization extensions to ARMv7-A. These are supported by the Cortex-A15 processor.

1. Large Physical Address Extensions

All ARMv7-A processors provide virtual-to-physical address translation via an integrated MMU. This is achieved by a two-level structure of page tables describing the address translation as well as memory attributes for each page. Page sizes from 16MB (termed a “supersection”) down to 4KB (termed a “small page”) are supported. A single level page table allows granularity of 1MB, with a second level of tables being required to allow smaller granularity.
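
To make the two-level scheme concrete, the sketch below splits a 32-bit virtual address into the indices used for a 4KB small page: 12 bits select one of 4096 first-level entries (1MB each), 8 bits select one of 256 second-level entries (4KB each), and 12 bits give the offset within the page. It assumes a single first-level table covers the whole 4GB space and that descriptors are 4 bytes. The names `split_short_va` and `l1_descriptor_addr` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(4096ull * (1ull << 20) == (1ull << 32),
              "4096 first-level entries of 1MB cover 4GB");
static_assert(256u * 4096u == (1u << 20),
              "256 second-level entries of 4KB cover 1MB");

/* Hypothetical container for the pieces of a virtual address. */
struct short_va_split {
    uint32_t l1_index;    /* VA[31:20]: first-level table index */
    uint32_t l2_index;    /* VA[19:12]: second-level table index */
    uint32_t page_offset; /* VA[11:0]:  offset within a 4KB page */
};

struct short_va_split split_short_va(uint32_t va)
{
    struct short_va_split s;
    s.l1_index    = va >> 20;
    s.l2_index    = (va >> 12) & 0xFFu;
    s.page_offset = va & 0xFFFu;
    return s;
}

/* Hypothetical helper: address of the first-level descriptor for va,
 * given the base address of the first-level table. */
uint32_t l1_descriptor_addr(uint32_t table_base, uint32_t va)
{
    return table_base + 4u * (va >> 20);
}
```

For the address 0x12345678 this yields first-level index 0x123, second-level index 0x45 and page offset 0x678.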

In processors prior to the Cortex-A15, both virtual and physical addresses are 32-bit, allowing a linear 4GB address space.

The Cortex-A15 implements the Large Physical Address Extensions (LPAE) which, via an extended translation scheme, allow the generation of 40-bit physical addresses. The tables used in the LPAE extensions contain longer descriptors, providing mapping of addresses at granularity of 1GB, 2MB or 4KB using between one and three levels of table lookup. In all cases, virtual addresses as issued by the processor are still 32-bit; it is the physical addresses issued to the memory system which can be up to 40 bits.
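
The 1GB, 2MB and 4KB granularities correspond to how the 32-bit virtual address is divided among the three table levels. The sketch below shows that split and how a 40-bit physical address is formed from a page base and the page offset. It assumes the full 32-bit input range is translated through one table set; the names `split_lpae_va` and `lpae_phys_addr` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LPAE_PA_BITS 40u
#define LPAE_PA_MASK ((1ull << LPAE_PA_BITS) - 1ull)

static_assert((1ull << LPAE_PA_BITS) ==
              1024ull * 1024ull * 1024ull * 1024ull,
              "40-bit physical addresses span 1TB");

/* Hypothetical container for the pieces of a virtual address. */
struct lpae_va_split {
    uint32_t l1_index; /* VA[31:30]: 4 entries, 1GB each */
    uint32_t l2_index; /* VA[29:21]: 512 entries, 2MB each */
    uint32_t l3_index; /* VA[20:12]: 512 entries, 4KB each */
    uint32_t offset;   /* VA[11:0]:  offset within a 4KB page */
};

struct lpae_va_split split_lpae_va(uint32_t va)
{
    struct lpae_va_split s;
    s.l1_index = va >> 30;
    s.l2_index = (va >> 21) & 0x1FFu;
    s.l3_index = (va >> 12) & 0x1FFu;
    s.offset   = va & 0xFFFu;
    return s;
}

/* Hypothetical helper: combine the 40-bit page base taken from a
 * third-level descriptor with the page offset of the 32-bit VA. */
uint64_t lpae_phys_addr(uint64_t page_base, uint32_t va)
{
    return (page_base & LPAE_PA_MASK & ~0xFFFull) | (va & 0xFFFu);
}
```

As an example, virtual address 0xC0201234 splits into indices 3, 1 and 1 with offset 0x234; if the final descriptor points at page base 0x8012345000, the resulting physical address is 0x8012345234, which lies above the 4GB boundary.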

Processors implementing LPAE are backwards compatible with the existing 32-bit translation scheme and use of the extended addressing is optional.

2. Virtualization extensions

The virtualization extensions are intended to support implementation of a hypervisor environment using a combination of hardware and software support. The architectural extensions are in several parts.

· There is support for a second stage of virtual memory translation which is managed by the hypervisor. Note that this second stage of translation is supported via the LPAE translation mechanism so it follows that implementation of LPAE is an integral part of the virtualization extensions. This second stage of translation allows a hypervisor complete control of the physical address map output by the processor and this can be changed dynamically to support the needs of different “guest” systems. In this way, guest systems can be kept isolated from each other and each can be presented with a complete virtual memory system which it “owns”.

· A defined set of control and configuration registers are “banked” in hypervisor mode so that each guest system sees a different, private set of the registers. Access to these registers by a guest system causes a trap into Hypervisor mode so that the hypervisor can take appropriate steps to configure the system accordingly.

· A defined set of system events (e.g. exceptions) can be configured to cause direct entry to Hypervisor mode instead of taking the standard exception handling action. The hypervisor code can then process the exception event before scheduling a “virtual” exception to be handled by the appropriate guest system.

The combination of these features allows a hypervisor to manage and control system configuration to maintain isolation between guest systems. Each guest system operates within a separate virtual machine.
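
The isolation provided by the second stage of translation can be illustrated with a deliberately tiny model. In the sketch below each guest owns a first-stage table (virtual page to intermediate page) while the hypervisor owns a second-stage table (intermediate page to physical page). Everything here is hypothetical: the four-page address space, the flat arrays and the names `toy_guest` and `toy_translate` are simplifications and do not reflect the real descriptor formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u          /* 4KB pages */
#define TOY_PAGES  4u           /* toy address space: four pages */
#define UNMAPPED   UINT64_MAX   /* marker for "no translation" */

/* Hypothetical per-guest translation state. */
struct toy_guest {
    uint64_t stage1[TOY_PAGES]; /* guest-owned: VA page -> IPA page */
    uint64_t stage2[TOY_PAGES]; /* hypervisor-owned: IPA page -> PA page */
};

/* Translate a guest virtual address to a physical address.
 * Returns false if either stage has no mapping. On real hardware a
 * stage 1 fault goes to the guest and a stage 2 fault to the hypervisor. */
bool toy_translate(const struct toy_guest *g, uint32_t va, uint64_t *pa)
{
    assert(g != NULL && pa != NULL);

    uint32_t vpage = va >> PAGE_SHIFT;
    if (vpage >= TOY_PAGES || g->stage1[vpage] == UNMAPPED)
        return false;                         /* stage 1 fault */

    uint64_t ipage = g->stage1[vpage];
    if (ipage >= TOY_PAGES || g->stage2[ipage] == UNMAPPED)
        return false;                         /* stage 2 fault */

    *pa = (g->stage2[ipage] << PAGE_SHIFT) | (va & 0xFFFu);
    return true;
}
```

Two guests with identical first-stage tables but different second-stage tables resolve the same virtual address to different physical pages, which is the property the hypervisor relies on to keep them apart.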

2.4 Debug

ARM provides debug using the industry-standard JTAG port. As standard, this uses a 5-wire connection. A 2-wire debug port is also available for use in applications where pin-count is at a premium.

Program trace, if implemented, is provided via a combination of additional logic within the chip and an external Trace Port Adapter unit connected to a Trace Port on the chip itself.

ARM’s CoreSight on-chip debug infrastructure allows chip designers to specify and build complex multi-core debug systems which allow synchronous trace and debug of multiple processors within a single device.