General overview of architecture of TI's Davinci 8168 SoC
2011-12-31 13:57
The DaVinci 8168 is a very interesting ARM Cortex-A8 SoC that contains many co-processing units, such as a DSP core and Cortex-M3 cores, bringing a new level of integration to embedded devices. The biggest difference from traditional solutions, such as socket communication between separate boards or board-level buses, is that the functional units all sit in one chip and share memory, which saves significant communication cost. With a traditional solution it would be hard to combine so many computing units into one system, and the hardware and workflow design would make your head spin; with this chip, the hardware is almost ready to use. What keeps things from being perfect is that software complexity rises, because so many units must be managed by the OS and made to work together at high performance. The following is some knowledge from my own understanding.
1 Hardware overview:
Graph 1. Hardware block overview of the TI8168
From the above, we can see there are about four parts with computation capability: (1) the ARM, (2) the DSP, (3) the media processors, and (4) the graphics accelerator. From the user's or software's point of view, (1) and (2) provide large programming and computation potential, while (3) and (4) provide limited programmability in most cases. In other words, they are intended for certain fixed tasks such as H.264 encoding. The whole system has an L3 NoC (network-on-chip) interconnect, which uses a packet-based protocol to transfer data between the different units. What the OS sees, however, is still physical addresses, so this layer should be transparent. The memory layout at the hardware level can be found in the 8168 datasheet; to change the memory mapping you need to take care of both U-Boot and the Linux kernel.
Note: In fact, the video system contains two blocks: VPSS and VIDEO. VPSS is the subsystem that does video capture, deinterlacing, scaling, noise filtering, etc. VIDEO is the subsystem that does encoding and decoding. They are actually software: a tiny real-time OS plus applications running on several M3 co-processors and controlling further hardware accelerators such as the hardware encoder, HDVICP2. The SDK does not intend to expose their details to application programmers, because they are very complex and hardware-dependent. But if you want, you can find their source code, shipped as firmware, in the RDK.
Summary: this SoC provides a programmable ARM core and DSP core, plus configurable hardware video, media, and graphics subsystems. There are about 5 cores inside: 1 ARM + 1 DSP + 2 or 3 M3 + 1 graphics. All are programmable and run at very high speed (> 500 MHz), but in most cases you would only need to build programs on the ARM and the DSP.
2 Software overview:
From the above we can see that the key problem is to manage the subsystems and keep them synchronized. The system uses Linux 2.6 on the Cortex-A8 as the host OS, taking the role of controller of all hardware (directly or indirectly), and the whole boot process is: U-Boot -> Linux kernel -> rootfs -> optionally boot the co-processors. Linux plays the master-of-everything role in the system, while the DSP actually runs its own small OS.
How do you develop applications on this? Generally speaking there are 4 methods:
(1) c6run:
TI provides a compiler in an open-source project, and it accepts parameters very much like gcc. I have tried it; the very nice thing is that it is so similar to gcc that I could write a Makefile to compile one copy of source code into three outputs on the development workstation: x86, ARM, and ARM+DSP, each of which runs directly on the corresponding Linux system. This is excellent: it means you can deploy your algorithm on the DSP very quickly and check whether the performance is satisfactory.
How does it work? Basically the DSP compiler, I mean the C6Run compiler, compiles all the code and archives it into a static .lib file that the ARM gcc toolchain can link. Other code on the ARM side can then just include the header files as if these were normal ARM functions; at link time, the calls resolve into the .lib file, which contains the communication code and the DSP binary that runs on the DSP core. In other words, the communication and synchronization details are hidden by the C6Run compiler, and the code that runs on the DSP appears to be just a library. There is also another mode that lets the DSP code run standalone instead of as a library for the ARM-side code, but I have not tried that.
So far I have found the following limitations with this approach: you cannot start the other co-processors, and it is hard to debug code on the DSP (you cannot use CCS or an emulator to debug it as in traditional DSP development). I am not sure whether these will improve in the future.
Below is the calling process in the C6RunLib style.
Note: your program can ignore the existence of the DSP and the SysLink framework completely. The compiler hides them, wrapping the function call on the ARM into a SysLink message; the binary code on the DSP that gets invoked is also built automatically by the C6Run compiler. The communication is done through a memory zone in DDR shared between the DSP and the ARM.
ARM                                                         DSP
 |
 normal ARM app process
 calls algorithm function A():
   wrapper turns A() into internal A_syslink()
   sends a message to the DSP via SysLink ----------------->|
                                                            receives the message, gets the
                                                            function and its parameters
                                                            executes the DSP version of
                                                            the function
                                                            returns by sending a message
 gets the result from SysLink, <----------------------------back to the ARM via SysLink
 returns it to the caller
 continues the code on ARM
 |
Graph 2. Execution process of C6RunLib in C6EZRun
(2) c6accel
It is very similar to C6RunLib in that it also appears as a library to the ARM program, but it relies on syslink.ko rather than just cmem.ko, and it requires the DSP program to conform to the XDAIS standard, which means you cannot deploy your algorithm as quickly. On the other hand, it is good for DSP program development: you can write and debug the code in CCS and then take the compiled output library. More importantly, with this framework, other existing hardware or software units, such as hardware-based video processing, can be brought up.
link: http://processors.wiki.ti.com/index.php/C6EZAccel
(3) OpenMax
Basically OpenMAX is an encapsulation at the same level as Android's component organization. Still, it is a good way to start development, because some components are ready to use in SDK 5.0.3, so you can skip some co-processor setup. Generally speaking, OpenMAX is a software standard that lets different components communicate easily; it uses concepts like "channel" and "basket" to form a data link. The link can be set up underneath, I mean between hardware co-processors, or between a co-processor and the ARM Cortex-A8 core.
Old versions of the SDK used the tunneled form, but now I see there is a new manner of calling the components, non-tunneled mode, which is very similar to simple Linux API and ioctl calls. Still, it has some limitations regarding the hardware you are using, I mean the peripheral devices, especially the video capture/decode IC.
Note: from SDK 5.0.1 onward it requires a 1080p I/O sub-board; otherwise video input is a problem. When I tried to migrate the old driver, the driver seemed to sit behind #include <linux/vps_capture.h>, but encoding and other things go to the M3 core via SysLink, and in SDK 5.0.3 it is bound to 3 channels. So I guess making the whole thing work would require digging into the M3 code, which is too much work and hard without assistance from TI. Maybe in the future they will release another architecture that unbinds capture from encoding in the Linux kernel, so I decided to give up on this for now.
(4) RDK
Built for multichannel vision usage, especially video recorders. It is quite similar to the channel-style usage of OpenMAX, and it uses a framework called MCFW (Multi-Channel Framework). It ships all the source code and tools that the M3 cores run, so it is easier if you want to modify the hardware and use it to build a multichannel D1 application. It is based on "link" objects, and that is what I am using.