include some assembler

Droopy
Posted: 05.03.2017 - 00:00  ·  #1
Hey all,

I know it is possible to integrate some assembler code in AVRco. But I know nothing about assembler and need a very short piece of assembler to optimize for speed. Can someone help and take a look?

I am trying to multiplex a small RGB display with an xmega256A3u. I have a pointer to the RAM where the data is located, a free output port D, and a free clock pin on a different port.
The assembler only needs to read the RAM byte at the pointer, increment the pointer, and write the data to the output port. Then it needs to set the clock pin high and low again.

I have an example from the internet in C, but as mentioned, I have never used assembler in AVRco or with the AVR, so it is Chinese to me ;) It seems they do it in just 5 ticks.

Example:

// A tiny bit of inline assembly is used; compiler doesn't pick
// up on opportunity for post-increment addressing mode.
// 5 instruction ticks per 'pew' = 160 ticks total
#define pew asm volatile(                   \
  "ld  __tmp_reg__, %a[ptr]+"    "\n\t"     \
  "out %[data]    , __tmp_reg__" "\n\t"     \
  "out %[clk]     , %[tick]"     "\n\t"     \
  "out %[clk]     , %[tock]"     "\n"       \
  :: [ptr]  "e" (ptr),                      \
     [data] "I" (_SFR_IO_ADDR(DATAPORT)),   \
     [clk]  "I" (_SFR_IO_ADDR(SCLKPORT)),   \
     [tick] "r" (tick),                     \
     [tock] "r" (tock));

// Loop is unrolled for speed:
pew pew pew pew pew pew pew pew
pew pew pew pew pew pew pew pew
pew pew pew pew pew pew pew pew
pew pew pew pew pew pew pew pew
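
For anyone who finds the four instructions hard to read, here is a minimal host-side sketch in Python of what one `pew` appears to do: load a byte through the pointer with post-increment, write it to the data port, then pulse the clock port. The `Panel` class, its field names and the `tick`/`tock` values are assumptions made up for illustration; they are not part of the original C code.

```python
# Host-side model of one 'pew' step. All names are hypothetical.
class Panel:
    def __init__(self, ram, tick=0x80, tock=0x00):
        self.ram = ram      # stands in for the display buffer in SRAM
        self.ptr = 0        # stands in for the pointer register pair
        self.tick = tick    # clock-port value with the clock bit high (assumed)
        self.tock = tock    # clock-port value with the clock bit low (assumed)
        self.trace = []     # every port write, in order

    def pew(self):
        tmp = self.ram[self.ptr]                # ld  __tmp_reg__, ptr+
        self.ptr += 1                           # post-increment of the pointer
        self.trace.append(("data", tmp))        # out data, __tmp_reg__
        self.trace.append(("clk", self.tick))   # out clk, tick
        self.trace.append(("clk", self.tock))   # out clk, tock


def send_row(ram, count=32):
    """Run `count` unrolled pew steps and return the panel model."""
    panel = Panel(ram)
    for _ in range(count):
        panel.pew()
    return panel
```

Each step produces exactly three port writes, which matches the three `out` instructions in the macro; the load is the only instruction that costs more than one tick.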

Thanks all for any help and information.
Merlin
Posted: 05.03.2017 - 09:36  ·  #2
AVRCo Pascal supports memory pointer increment natively. Is this still too slow? It might be easier to write what you want in AVRCo Pascal, compile it, and ask someone to optimise the resulting assembler. Or it may be that what you are driving is effectively I2C or (possibly) SPI, or something close enough for the internal routines to work effectively.
Droopy
Posted: 06.03.2017 - 01:00  ·  #3
Thnx Merlin.

Multiplexing an RGB display is time consuming because of the number of LEDs.
The display I have is 32x64. This results in 32*64*3 (RGB) = 6144 LEDs. To create several colors I want to give every LED 16 intensity levels, so 16R*16G*16B = 4096 different colors.
Using SPI is not possible: the display clocks in two lines at once with R, G and B data (R0,G0,B0 for line 0 and R1,G1,B1 for line 16). I use port D for it and leave D6 and D7 unused; writing a whole byte is faster.

The code looks like this:

var
  RgbClk[@PortF,7] : bit;                    // clock pin
  RgbData[@PortD]  : byte;                   // data port, arranged as xxR0G0B0R1G1B1
  RgbDataArr       : array[0..1023] of byte; // graphic data
  RgbLine          : byte;                   // from 0..15
  MyPntr           : pointer;
  ii               : byte;
  idx              : word;
begin
  idx := (word(RgbLine and $0F)) shl 6;      // calculate the index in the array
  MyPntr := @RgbDataArr[idx];                // set pointer
  for ii := 0 to 63 do                       // do for panel length
    RgbData := byte(MyPntr^++);              // get data
    incl(RgbClk); nop; excl(RgbClk);         // clock data
  endfor;
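
To illustrate the buffer layout the loop relies on, here is a small Python sketch of how one byte could be packed and where a line starts in the 1024-byte array. The bit order (leftmost letter of `xxR0G0B0R1G1B1` as the most significant used bit) is an assumption, and the function names are made up for this example.

```python
def pack_pixels(upper, lower):
    """Pack two 1-bit RGB pixels as xxR0G0B0R1G1B1 (bit order is assumed)."""
    r0, g0, b0 = upper   # pixel on line n
    r1, g1, b1 = lower   # pixel on line n + 16
    return (r0 << 5) | (g0 << 4) | (b0 << 3) | (r1 << 2) | (g1 << 1) | b1


def row_start(rgb_line):
    """Mirror idx := (RgbLine and $0F) shl 6, i.e. 64 bytes per line pair."""
    return (rgb_line & 0x0F) << 6
```

With 16 line pairs of 64 bytes each, the last pair starts at index 960 and ends at 1023, which is why the array is declared as `array[0..1023]`.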

On an xmega256 at 32 MHz, the code above takes 85 µs, not optimized, to clock out the RGB data for one pair of lines (for example line 0 and line 16). Optimized, it still takes about 58 µs. The assembler code from the internet does it in 64 x 5 ticks = 320 ticks, or 10 µs, almost 6 times faster.
To give an idea:
Suppose I do it without assembler, just optimized. With some overhead I need 60 µs for two lines. For the whole screen I need 16 times that, so 960 µs. I have to do this 4 times to get 16 intensity levels, so times 4 is almost 4 ms of pure communication per frame. For a smooth image with good color and no flicker you need a refresh rate of about 200 frames per second. At almost 4 ms per frame that is close to 0.8 s out of every second, so my AVR is almost fully occupied with communicating with the display.
I am aware that it is time consuming, and therefore I want to use one AVR as host for the display tasks, communicating over SPI with a second AVR as slave that does the display multiplexing.

Thnx;)
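
The numbers in this post can be re-checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are taken from the post, the function names are made up, and the model ignores interrupt, line-switching and latch overhead.

```python
F_CPU_HZ = 32_000_000     # xmega clock from the post
COLUMNS = 64              # bytes clocked per pair of lines
LINE_PAIRS = 16           # lines 0..15, each sent together with line + 16
PASSES_PER_FRAME = 4      # "4 times" for 16 intensity levels, as stated in the post
REFRESH_HZ = 200          # target refresh rate from the post


def led_count(rows=32, cols=64, channels=3):
    return rows * cols * channels


def color_count(levels=16):
    return levels ** 3


def frame_time_us(line_pair_us):
    """Time spent clocking out one complete frame, in microseconds."""
    return line_pair_us * LINE_PAIRS * PASSES_PER_FRAME


def cpu_load(line_pair_us):
    """Fraction of each second spent clocking data at REFRESH_HZ."""
    return frame_time_us(line_pair_us) * REFRESH_HZ / 1_000_000


def asm_line_pair_us(ticks_per_byte=5):
    """Time for one pair of lines at a given cost per byte."""
    return COLUMNS * ticks_per_byte * 1_000_000 / F_CPU_HZ
```

With the 60 µs figure, `cpu_load(60)` comes out at roughly 77 %, and with the 10 µs assembler figure at roughly 13 %, which is the gap the assembler version is meant to close.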
Thomas.AC
Posted: 06.03.2017 - 10:48  ·  #4
You are right. AVRco loops are not efficient. Variables are always stored back to RAM.

Here is an example that writes all 1024 byte values. The code compiles but is untested!

Code

program SimulatorTest;

Device = mega2560, VCC=5;

Import SysTick;

Define
  ProcClock      = 16000000;       {Hertz}
  SysTick        = 10;             {msec}
  StackSize      = $0100, iData;
  FrameSize      = $0100, iData;

Implementation
{$IDATA}
 
var
  RgbData[@PortD] : byte; // port to send out the data arranged as xxR0G0B0R1G1B1
  RgbDataArr : array[0..1023] of byte; //graphic data

procedure test;
begin
  asm;
                        LDI       _ACCB, 000h
              LOOPI:
                        LD        _ACCA, Z+        // _ACCA := MyPntr^++
                        OUT       %.RGBDATA, _ACCA // RGBDATA := _ACCA
                        SBI       PORTF, 007h      // incl(RgbClk)
                        INC       _ACCB;           // increment loop register
                        CPI       _ACCB, 000h      // compare loop register with zero
                        CBI       PORTF, 007h      // excl(RgbClk)
                        BRNE      LOOPI            // jump if not equal (result of CPI)
  endasm;
end;

// main
begin
  asm;
    LDI       _ACCCLO, %.RgbDataArr AND 0FFh // load @RgbDataArr[0] into the Z register pair
    LDI       _ACCCHI, %.RgbDataArr SHRB 8
  endasm;
  test; // bytes 0 .. 255
  test; // bytes 256 .. 511
  test; // bytes 512 .. 767
  test; // bytes 768 .. 1023
  loop
  endloop;
end SimulatorTest.
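
As a rough cross-check against the 5 ticks per byte of the unrolled C example, the Python sketch below adds up cycle counts for the loop in `test`. The per-instruction counts are assumed from the AVR instruction set manual for classic cores such as the mega2560; please verify them for your device. The names are made up for illustration, and the single `LDI` before the loop is left out.

```python
# (instruction, cycles) for one pass through the loop body; counts are assumptions.
LOOP_BODY = [
    ("LD   _ACCA, Z+", 2),
    ("OUT  data, _ACCA", 1),
    ("SBI  PORTF, 7", 2),
    ("INC  _ACCB", 1),
    ("CPI  _ACCB, 0", 1),
    ("CBI  PORTF, 7", 2),
]
BRNE_TAKEN = 2       # branch back to LOOPI
BRNE_NOT_TAKEN = 1   # fall through on the last pass


def loop_cycles(iterations=256):
    """Total cycles for `iterations` passes, including the closing BRNE."""
    body = sum(cycles for _, cycles in LOOP_BODY)
    return iterations * body + (iterations - 1) * BRNE_TAKEN + BRNE_NOT_TAKEN


def loop_time_us(f_cpu_hz=16_000_000, iterations=256):
    return loop_cycles(iterations) * 1_000_000 / f_cpu_hz
```

Under these assumptions one call to `test` costs 2815 cycles, about 11 per byte. Since `INC` already updates the zero flag and `CBI` leaves the flags alone, the `CPI` could probably be dropped to save one cycle per pass, but I have not tested that.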
Droopy
Posted: 06.03.2017 - 12:20  ·  #5
Thanks Thomas.

At the moment I am still trying to find out whether the whole concept is good and feasible: will I have enough runtime, will it not be too complicated, and is this the right way to build a big scoreboard? Because it is for commercial use, I also need to check some other things: besides budget and development time, are those RGB panels reliable, and can I pass EMC testing with all that multiplexing …

So there are still some issues to check before I have a go ;)

Thanks again Thomas, I will try it out ;)
Droopy
Posted: 06.03.2017 - 17:04  ·  #6
Thomas, I get an error on PORTD and PORTF in the assembler. I'm a noob with this assembler, so am I doing something stupid?
Code

.ASM
                        .LINE     120
                        LDI   _ACCB, 000h;
                        LOOPI:
                        .LINE     122
                        LD    _ACCA, Z+;
                        .LINE     123
                        OUT   PORTD, _ACCA;    //-> error
                        .LINE     124
                        SBI   PORTF, 007h;         // -> error
                        .LINE     126
                        INC   _ACCB;                                                 
                        .LINE     127
                        CPI   _ACCB, 040h          // 64 passes; 03Fh would stop after 63
                        .LINE     128
                        CBI   PORTF, 007h          // -> error
                        .LINE     129
                        BRNE      LOOPI                                               
                        
.endasm
rh
Posted: 06.03.2017 - 18:09  ·  #7
Hi,

the IN/OUT and SBI/CBI instructions only work in a small address range: IN/OUT reach
I/O addresses $00..$3F and SBI/CBI only $00..$1F. Above that these instructions can't be used; use LDS and STS instead.

rolf
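
To make Rolf's point concrete, here is a small Python sketch that reports which instruction families can reach a given register address. The limits follow from the instruction encodings (a 6-bit I/O address for IN/OUT, a 5-bit one for SBI/CBI). The function name and the two XMEGA addresses are assumptions for illustration; the sketch also assumes the XMEGA layout, where the low I/O addresses coincide with data-space addresses. Please check the addresses against the datasheet.

```python
IN_OUT_MAX = 0x3F    # IN/OUT encode a 6-bit I/O address
SBI_CBI_MAX = 0x1F   # SBI/CBI encode a 5-bit I/O address


def access_options(addr):
    """Return the instruction families that can reach `addr`."""
    options = []
    if addr <= SBI_CBI_MAX:
        options.append("SBI/CBI")
    if addr <= IN_OUT_MAX:
        options.append("IN/OUT")
    options.append("LDS/STS")   # full data-space addressing always works
    return options


# Assumed XMEGA register addresses (output registers of port D and port F).
PORTD_OUT = 0x0664
PORTF_OUT = 0x06A4
```

For both assumed XMEGA addresses only `LDS/STS` remains, which matches the errors reported above. On the mega2560 both ports sit low enough in I/O space for OUT and SBI/CBI, which is presumably why the first version compiled there.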
Thomas.AC
Posted: 06.03.2017 - 22:33  ·  #8
I did not compile for the xmega256A3.
Here is a version that works with the xmega. Thanks to Rolf.

AVR instructions are explained here: http://www.atmel.com/webdoc/avrassembler/index.html
The assembly code uses indirect addressing with index register Z, which is made up of _ACCCLO and _ACCCHI. The index is first loaded with the address of the data array.
_ACCB holds the loop counter, which is incremented on each loop iteration. The loop stops when
_ACCB becomes zero again after 256 iterations. incl(RgbClk) and excl(RgbClk) are now implemented as read-modify-write operations (load the port into _ACCA, manipulate _ACCA, store _ACCA back to the port).
By calling the assembly code four times, the complete array is clocked out to the port.
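
The wrap-around of the 8-bit loop counter can be checked with a short Python model. This is only a sketch of the INC / CPI / BRNE pattern described above; the function names are made up for illustration.

```python
def loop_iterations(compare, start=0):
    """Count passes of: body; INC counter; CPI counter, compare; BRNE loop."""
    counter = start
    passes = 0
    while True:
        passes += 1                      # the loop body runs once per pass
        counter = (counter + 1) & 0xFF   # INC wraps around at 8 bits
        if counter == compare:           # CPI sets the zero flag, BRNE falls through
            return passes


def bytes_sent(calls, compare=0x00):
    """Bytes clocked out when the loop is entered `calls` times."""
    return calls * loop_iterations(compare)
```

Comparing against zero gives 256 passes per call, so four calls cover the 1024-byte array. To send a single 64-byte line instead, the compare value would have to be 040h; a compare value of 03Fh stops after 63 passes.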

Be careful with read-modify-write operations: they are not atomic. A bug will definitely occur if an interrupt or a thread also writes to PORTF.
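
The following Python sketch illustrates that warning by letting a simulated interrupt fire between the load and the store. It is a host-side model only; the `Port` class and the function names are hypothetical.

```python
class Port:
    """Stands in for an 8-bit output port register."""
    def __init__(self, value=0):
        self.value = value


def clock_high_rmw(port, interrupt=None):
    """LDS / SBR / STS sequence; `interrupt` fires between load and store."""
    tmp = port.value        # LDS  tmp, PORTF
    if interrupt:
        interrupt(port)     # an ISR runs here and changes the port
    tmp |= 0x80             # SBR  tmp, 080h
    port.value = tmp        # STS  PORTF, tmp (overwrites the ISR's change)


def isr_set_bit0(port):
    """Example interrupt handler that sets bit 0 of the same port."""
    port.value |= 0x01
```

If the interrupt lands inside the sequence, its bit is lost because the stale copy is written back; if it runs afterwards, both bits survive. As far as I know the XMEGA ports also provide OUTSET and OUTCLR registers, which would let a single STS set or clear the clock bit without the read-modify-write, but that is untested here.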

Code

program SimulatorTest;

Device = xmega256A3u, VCC=5;

Import SysTick;

Define
  OSCtype = int32MHz, PLLmul=4, prescB=1, prescC=1;
  SysTick        = 10;             {msec}
  StackSize      = $0100, iData;
  FrameSize      = $0100, iData;

Implementation
{$IDATA}

var
  RgbData[@PortD] : byte; // port to send out the data arranged as xxR0G0B0R1G1B1
  RgbDataArr : array[0..1023] of byte; //graphic data

procedure test;
begin
  asm;
                        LDI       _ACCB, 000h
              LOOPI:
                        LD        _ACCA, Z+        // _ACCA := MyPntr^++
                        STS       %.RGBDATA, _ACCA // RGBDATA := _ACCA
                        LDS       _ACCA, PORTF     // incl(RgbClk)
                        SBR       _ACCA, 080h
                        STS       PORTF, _ACCA
                        INC       _ACCB;           // increment loop register
                        CPI       _ACCB, 000h      // compare loop register with zero
                        CBR       _ACCA, 080h      // excl(RgbClk)
                        STS       PORTF, _ACCA
                        BRNE      LOOPI            // jump if not equal (result of CPI)
  endasm;
end;

// main
begin
  asm;
    LDI       _ACCCLO, %.RgbDataArr AND 0FFh // load @RgbDataArr[0] into the Z register pair
    LDI       _ACCCHI, %.RgbDataArr SHRB 8
  endasm;
  test; // bytes 0 .. 255
  test; // bytes 256 .. 511
  test; // bytes 512 .. 767
  test; // bytes 768 .. 1023
  loop
  endloop;
end SimulatorTest.
