
General overview of the architecture of TI's DaVinci 8168 SoC

2011-12-31 13:57
    The DaVinci 8168 is a very interesting ARM Cortex-A8 SoC that contains many co-processor units, such as a DSP core and M3 cores, which brings a new level of integration to embedded devices. The biggest difference from traditional solutions, such as socket communication between separate boards or board-level buses, is that all the functional units live in one processing package and share memory, which saves significant communication cost. With a traditional solution it would be hard to combine so many computing units into one system; the hardware and workflow design would make your head spin, while with this chip the hardware is almost ready to use. What keeps things from being perfect is that the software complexity rises, because so many units must be managed by the OS and made to work together with high performance. What follows is my understanding of it.

1 Hardware overview:

   


                 Graph 1.     Hardware block overview of the TI8168

      From the figure above, we can see about 4 parts with computation capability: 1 is the ARM, 2 is the DSP, 3 is the media processors, 4 is the graphics accelerator. From the user's or software's point of view, parts 1 and 2 provide broad programming and computation potential, while parts 3 and 4 provide limited programming capability in most cases. In other words, they are intended to do certain fixed things, such as H.264 encoding. The whole system has an L3 NoC (network on chip) interconnect, which uses a packet-based protocol to transfer data between the different units. But what the OS sees are still physical addresses, so this layer should be transparent. The memory layout at the hardware level can be found in the 8168 datasheet; to change the memory mapping you need to take care of both U-Boot and the Linux kernel.
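
     Since the NoC is transparent and the OS works with plain physical addresses, you can, for example, inspect a memory-mapped register directly from Linux through /dev/mem. A minimal sketch follows; the base address is a made-up placeholder, so take the real addresses from the 8168 datasheet (and note this needs root).

    /* Peek at a memory-mapped register from Linux user space.
     * 0x48000000 is a placeholder physical base for illustration only;
     * real register addresses come from the TI8168 datasheet. */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        off_t phys = 0x48000000;                 /* placeholder, page-aligned */
        int fd = open("/dev/mem", O_RDONLY | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        void *map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, phys);
        if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        printf("reg[0] = 0x%08x\n", *(volatile uint32_t *)map);
        munmap(map, 4096);
        close(fd);
        return 0;
    }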

     Note: in fact, the video system contains two blocks: VPSS and VIDEO. VPSS is the subsystem that does video capture, deinterlacing, scaling, noise filtering, and so on. VIDEO is the subsystem that does encoding and decoding. In reality they are software: a tiny real-time OS plus applications running on several M3 co-processors, controlling further hardware accelerators such as the hardware encoder, HDVICP2. In the SDK they are not intended to expose their details to the application programmer, because they are very complex and hardware-specific. But if you want, you can find the source code that runs as this firmware in the RDK.

     Summary: this SoC provides programmable ARM and DSP cores plus configurable hardware video, media, and graphics subsystems. In total there are about 5 cores inside: 1 ARM + 1 DSP + 2 or 3 M3 + 1 graphics core. All are programmable and run at very high speed (> 500 MHz), but in most cases you only need to build programs for the ARM and the DSP.

2 Software overview:

    From the above we can see the key problem is managing the subsystems and keeping them synchronized. The system runs Linux 2.6 on the Cortex-A8 as the host OS, taking the role of controller of all the hardware (directly or indirectly), and the whole boot process is: U-Boot -> Linux kernel -> rootfs -> optionally boot up the co-processors. Linux plays the master role in the system, while the DSP actually runs its own small OS. A hedged sketch of that last, optional step follows.
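
    To make "optionally boot up the co-processors" concrete, below is a sketch of loading DSP firmware from Linux through SysLink's ProcMgr layer. The function names and signatures are simplified from memory, and the firmware path and processor ID are placeholders, so treat every declaration here as an assumption and check the SysLink documentation.

    /* Hedged sketch: bringing up a co-processor from Linux via SysLink.
     * All declarations below are assumptions modeled on the ProcMgr API;
     * the real headers come with the SysLink package. */
    typedef void *ProcMgr_Handle;

    extern int ProcMgr_open(ProcMgr_Handle *h, unsigned short procId);
    extern int ProcMgr_load(ProcMgr_Handle h, const char *imagePath,
                            int argc, char **argv);
    extern int ProcMgr_start(ProcMgr_Handle h, void *params);

    #define DSP_PROC_ID 0   /* invented ID; real IDs come from the SDK */

    int main(void)
    {
        ProcMgr_Handle hDsp;
        if (ProcMgr_open(&hDsp, DSP_PROC_ID) != 0)
            return 1;
        /* firmware path is a placeholder */
        ProcMgr_load(hDsp, "/usr/share/ti/dsp_firmware.xe674", 0, 0);
        ProcMgr_start(hDsp, 0);  /* the DSP now boots its own small OS */
        return 0;
    }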

   How to develop an application on this? Generally speaking there are 4 methods:

(1) c6run:

    TI provides a compiler in an open-source project, and it accepts parameters much like gcc. I have tried it; the very nice thing is that it is so similar to gcc that I can write a Makefile which compiles one copy of the source code into three outputs on the development workstation: x86, ARM, and ARM+DSP, and each runs directly on the corresponding Linux system. This is excellent: it means you can deploy your algorithm on the DSP very quickly and check whether the performance is satisfactory.

    How does it work? Basically the DSP compiler, I mean the c6run compiler, compiles all the code and archives it into a static .lib file that the ARM gcc toolchain can link against. Other code on the ARM side can simply include the header files as if the function were a normal ARM function, and at link stage the calls are resolved to the .lib file, which contains the communication code plus the DSP binary that runs on the DSP core. In other words, the communication and synchronization details are hidden by the c6run compiler, and the code that runs on the DSP appears to be just a library. There is another mode that makes the DSP code run standalone instead of as a library for the ARM side, but I have not tried that.
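
    To illustrate the library style, here is a minimal self-contained example. The function names, file names, and build flow are my own placeholders; only the pattern (a plain C header, an implementation that c6runlib can turn into a DSP-side .lib, and an ARM caller that just includes the header) follows the description above.

    /* dotp.h - ordinary C header; the ARM caller includes this as usual */
    #ifndef DOTP_H
    #define DOTP_H
    int dot_product(const short *a, const short *b, int n);
    #endif

    /* dotp.c - algorithm source; compile it with gcc for a pure-ARM
     * build, or feed it to the c6runlib wrapper to get a DSP-side .lib */
    #include "dotp.h"
    int dot_product(const short *a, const short *b, int n)
    {
        int i, sum = 0;
        for (i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    /* main.c - ARM-side caller; it cannot tell whether dot_product()
     * runs locally or is dispatched to the DSP by a generated stub */
    #include <stdio.h>
    #include "dotp.h"
    int main(void)
    {
        short a[4] = {1, 2, 3, 4}, b[4] = {1, 1, 1, 1};
        printf("dot = %d\n", dot_product(a, b, 4));
        return 0;
    }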

    But so far I have found the following limitations with this approach: it cannot start the other co-processors, and it is hard to debug code on the DSP (you cannot use CCS or an emulator to debug it as in traditional DSP development). I am not sure whether these will improve in the future. Below is the calling process in the C6RunLib style.

    Note: your program can ignore the existence of the DSP and the SYSLINK framework completely; the compiler hides them, wrapping the function call on the ARM into a SYSLINK message, which invokes the binary code on the DSP, itself also built automatically by the c6run compiler. The communication is done through a memory zone in DDR shared between the DSP and the ARM.

               ARM                                                      DSP
                |
  normal ARM application process
  calls algorithm function A()
  wrapper turns A() into internal A_syslink()
  sends a message to the DSP via SYSLINK  ----------------------------->|
                                                  receives the message, extracts the
                                                  function and its parameters
                                                  executes the DSP version of the function
                                                  returns by sending a message back via SYSLINK
  gets the result from SYSLINK and  <-----------------------------------|
  returns it to the caller
  continues the code on ARM
                |

                     Graph 2. Execution process of C6RunLib (part of C6EZRun)
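
    To make the wrapper idea in Graph 2 concrete, here is a toy sketch of what the generated ARM-side stub conceptually does. The message layout and names are invented, and the SYSLINK round trip is faked with a local function so the sketch stays self-contained; the real stub is generated by the c6run tools and speaks the actual SYSLINK protocol.

    /* Conceptual sketch only: names and message layout are invented. */
    #include <string.h>

    struct rpc_msg {            /* marshalled call, placed in shared DDR */
        int func_id;            /* which remote function to run */
        int arg;                /* flattened argument(s) */
        int result;             /* filled in by the DSP side */
    };

    /* stand-in for "send message over SYSLINK and wait for the reply";
     * it computes locally so the sketch compiles and runs on its own */
    static void fake_syslink_call(struct rpc_msg *m)
    {
        if (m->func_id == 1)
            m->result = m->arg * m->arg;  /* pretend the DSP squared it */
    }

    /* roughly what the generated wrapper for A() looks like */
    int A(int x)
    {
        struct rpc_msg m;
        memset(&m, 0, sizeof m);
        m.func_id = 1;          /* identifies A()'s DSP-side binary */
        m.arg = x;
        fake_syslink_call(&m);  /* real code: a SYSLINK round trip */
        return m.result;
    }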

(2) c6accel

    It is very similar to C6RunLib in that it also appears as a library to the ARM program, but it relies on syslink.ko rather than just cmem.ko, and it requires the DSP program to conform to the XDAIS standard, which means you cannot deploy your algorithm as quickly. But it is good for DSP program development: you can write and debug the code in CCS and then take the compiled output library. More importantly, with this framework other existing hardware or software units, such as hardware-based video processing, can be brought up.
link: http://processors.wiki.ti.com/index.php/C6EZAccel
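
    As a hedged sketch of what calling into C6Accel looks like from the ARM side: the handle type and function names below are modeled on the C6Accel API from memory, so treat every identifier as an assumption and check the wiki page above before relying on it. It would link against the C6Accel libraries from the SDK.

    /* Hedged sketch: identifiers modeled on the C6Accel API, unverified.
     * The declarations normally come from the C6Accel headers. */
    #include <stdio.h>

    typedef void *C6accel_Handle;
    extern C6accel_Handle C6accel_create(char *engName, void *hEngine,
                                         char *algName, void *hAlg);
    extern int C6accel_DSP_add16(C6accel_Handle h, short *x, short *y,
                                 short *r, int n);

    int main(void)
    {
        short x[8] = {0}, y[8] = {0}, r[8];
        /* engine/algorithm names vary per SDK; assumptions here */
        C6accel_Handle hC6 = C6accel_create("c6accel", 0, "c6accel", 0);
        if (!hC6)
            return 1;
        C6accel_DSP_add16(hC6, x, y, r, 8); /* runs on the DSP via XDAIS */
        printf("r[0] = %d\n", r[0]);
        return 0;
    }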
(3) OpenMax  

    Basically, OpenMax is an encapsulation at the same level as Android's component organization. But for now it is a good way to start development, because some components are ready to use in SDK 5.0.3, so you can skip some co-processor setup. Generally speaking, OpenMax is a software standard that lets different components communicate easily, using concepts such as "component", "port", and "tunnel" to form a data link. The link can be set up underneath, I mean between hardware co-processors, or between a co-processor and the ARM Cortex-A8 core.

    In older versions of the SDK it was tunnel-based, but now I see there is a new way to call the components in non-tunneled mode, which looks very much like plain Linux API and ioctl calls. Still, it has some limitations regarding the hardware you are using, I mean the peripheral devices, especially the video capture/decode IC.
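
    For reference, here is a minimal sketch of setting up a tunneled link with the standard OpenMAX IL core calls. The OMX_* functions are from the IL standard headers; the component names and port numbers are placeholders that vary per SDK.

    /* Minimal OpenMAX IL sketch: OMX_* calls are the standard IL API;
     * component names and port numbers are placeholders. */
    #include <OMX_Core.h>

    static OMX_CALLBACKTYPE callbacks; /* event handlers omitted for brevity */

    int main(void)
    {
        OMX_HANDLETYPE hCapture, hEncoder;

        OMX_Init();
        /* component names depend on the vendor registry */
        OMX_GetHandle(&hCapture, (OMX_STRING)"OMX.TI.VPSSM3.VFCC",
                      NULL, &callbacks);
        OMX_GetHandle(&hEncoder, (OMX_STRING)"OMX.TI.DUCATI.VIDENC",
                      NULL, &callbacks);

        /* tunneled mode: connect capture output port to encoder input
         * port, so buffers flow between co-processors without the ARM
         * touching them */
        OMX_SetupTunnel(hCapture, 1 /* out */, hEncoder, 0 /* in */);

        OMX_SendCommand(hCapture, OMX_CommandStateSet, OMX_StateIdle, NULL);
        OMX_SendCommand(hEncoder, OMX_CommandStateSet, OMX_StateIdle, NULL);
        /* ... allocate buffers, move to Executing, run the pipeline ... */

        OMX_FreeHandle(hCapture);
        OMX_FreeHandle(hEncoder);
        OMX_Deinit();
        return 0;
    }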

    Note: from SDK 5.0.1 it requires a 1080p I/O daughter board, otherwise video input is a problem. When I tried to migrate the old driver, the driver side seemed to live at #include <linux/vps_capture.h>, but encoding and the other stages go to the M3 core via SYSLINK, and in SDK 5.0.3 this is bound to 3 channels, so I guess making the whole thing work means digging into the M3 code, which is too much work and too hard without assistance from TI. Maybe a future release will offer another architecture that unbinds capture from encoding in the Linux kernel, so I decided to give up on that for now.

(4) RDK

    Built for multi-channel vision usage, especially video recorders. It is quite similar to the channel-style usage of OpenMax, and it uses a framework called MCFW (Multi-Channel Framework). It ships all the source code and tools that run on the M3 cores, so it is easier if you want to modify the hardware and use it to build a multi-channel D1 application. It is based on "link" objects; that is what I am using. A hedged sketch of the link idea follows.
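
    Below is a sketch of the link-chain idea. The System_link* names are modeled on the MCFW API from memory, and the link IDs and create-args structs are invented, so treat all identifiers as assumptions and check the RDK sources for the real ones.

    /* Hedged sketch: the link-chain idea from the RDK's MCFW.
     * All declarations are assumptions; real ones come from the RDK. */
    typedef unsigned int UInt32;

    extern int System_linkCreate(UInt32 linkId, void *args, UInt32 size);
    extern int System_linkStart(UInt32 linkId);
    extern int System_linkStop(UInt32 linkId);

    #define CAPTURE_LINK 0   /* invented IDs */
    #define ENC_LINK     1

    int main(void)
    {
        /* each link's create-args names its downstream link, which is
         * how a capture -> encode -> ... processing chain is wired up */
        struct { UInt32 outQue; UInt32 nextLink; } capArgs = {0, ENC_LINK};
        struct { UInt32 inQue; } encArgs = {0};

        System_linkCreate(CAPTURE_LINK, &capArgs, sizeof capArgs);
        System_linkCreate(ENC_LINK, &encArgs, sizeof encArgs);

        System_linkStart(ENC_LINK);    /* start downstream first */
        System_linkStart(CAPTURE_LINK);

        /* ... run ... */

        System_linkStop(CAPTURE_LINK);
        System_linkStop(ENC_LINK);
        return 0;
    }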