AVRCO Scheduler problem (big bug)

Gerard
 
Posted: 10.11.2012 - 16:00  ·  #1
Hello

I am developing a complex application with stepper motors, a USB driver, and a syntax analyzer on an AT90USB board.

My application runs quite well, but ...

I found a bug which happens from time to time, depending on the activity of the board, and I was "lucky" to discover that it is a data memory corruption (always in the same memory area, but not always the same bytes).

After 4 full days isolating the minimum code that triggers the bug at least once per minute, I can summarize it like this:

My application uses the "Stepper-Motor in UserMode" driver, so I define my own StepperIOS function.
(Many weeks ago I found that the parameter (CW: boolean) of the user device is not correct: it is always false because the caller of StepperIOS tests the wrong bit of the _StepFlags variable. I didn't take the time to report that bug to you, sorry!)

After some days of tests I isolated the problem: sometimes the frame pointer Y is not a valid frame pointer when the value of CW is pushed onto the frame before the call. So the value of CW (which is always $00, as I said above) is WRITTEN SOMEWHERE in memory. I leave you to imagine what happens afterwards, and when ...

So by patching the resulting .hex file, replacing
LDS _ACCA, _StepFlags;
ANDI _ACCA, 004h
ST -Y, _ACCA
CALL EchoTPXUDefinitions.StepperIOS
with
LDS _ACCA, _StepFlags;
ANDI _ACCA, 004h
NOP
CALL EchoTPXUDefinitions.StepperIOS

the application RUNS WELL FOR HOURS. Without the patch the bug appears in less than 10 seconds!

This is only the tip of the iceberg and doesn't explain why it happens, so I kept investigating and found something very deep in the scheduler:

To drive the stepper motor smoothly, I declare the scheduler like this:
Scheduler = iData, interruptible;

In this case the compiler generates this code:

SYSTEM.$INTERRUPT_TIMER0:
.......
CALL EchoTPXUDefinitions.onSysTick
LDI _ACCA, 1 SHLB IntFlag
OR Flags, _ACCA
SEI

so all the scheduler code that follows accepts further interrupts, which is fine for the stepper motor driver.

BUT ...

when the scheduler has to switch tasks, it uses this code:

SYSTEM.SwitchProcess:
CALL SYSTEM.SavePrcsRegs
...........
SYSTEM.SwitchProcess_T:
......
.DEB $_CurProcess
CALL SYSTEM.RestorePrcsRegs

This is done with interrupts enabled, BUT the procedure "RestorePrcsRegs" is not protected against interrupts.

Indeed, the "RestorePrcsRegs" code is the following:

SYSTEM.RestorePrcsRegs:
LDD _FPTRHI, Z+9
LDD _FRAMEPTR, Z+8
LDD _ACCA, Z+11
......

So you can see that if an interrupt occurs between these 2 lines
LDD _FPTRHI, Z+9
LDD _FRAMEPTR, Z+8
the frame pointer used in the interrupt service routine is NOT VALID.

Fortunately (or unfortunately, otherwise I would have seen it before) none of the interrupt service routines in my other applications (without stepper motor) use frame variables, so the problem was hidden.


At the moment I have only two solutions:
A) remove the "interruptible" keyword from the scheduler declaration,
but then the stepper motor will not work well and the step generation may be erratic.

B) patch the hex file to remove the frame push of the CW variable before the StepperIOS call,
but it is very tedious to do this each time I compile the application!

So, Rolf, could you change the scheduler code to protect SYSTEM.RestorePrcsRegs from interrupts?
(And possibly also fix the call of StepperIOS so that it tests the correct bit.)

Thank you for reading to the end ^_^
Gerard
rh
Administrator
Location: Germany
Age: 24
Homepage: e-lab.de
Posts: 5558
Registered: 03 / 2002
Posted: 11.11.2012 - 20:42  ·  #2
Hello Gerard,

you wrote that you have reduced your application to a minimum
so that the problem can be reproduced. Please send me this short program
for testing and debugging.

rolf
Gerard
 
Posted: 12.11.2012 - 11:45  ·  #3
Hello Rolf

Yes, I wrote that I reduced my application, but in fact I keep all the source code and enable, or force the use of, certain parts of the code with conditional compilation flags such as
{$ifdef DebugCorruption }
.....
{$endif}


The main problem is detecting that the bug has happened. The first symptom is that code which was running well stops working. The second (and this is where I was lucky) shows up when I check some variables in RAM and compare them with a mirror copy. I saw that the corruption was always in the same part of memory (about 1 or 2 bytes within an area 200 bytes wide) and that its location depends on the global data of the modules, so I could not change the USES declaration or the order of the units in my project!

Anyway, I can send you my project, but it is rather heavy: at least 10 .pas files for 112 KB of code (87% of the flash).
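The mirror check described above can be sketched on a host machine as follows. This is only an illustration in C, not code from the project: the function name `find_corruption` and its parameters are hypothetical, and on the real target the comparison would be done on the watched RAM area itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares a watched RAM region with its mirror copy.
 * The offsets of the first `max` differing bytes are stored in `offsets`
 * (which may be NULL if only the count is wanted).
 * Returns the total number of differing bytes. */
size_t find_corruption(const uint8_t *region, const uint8_t *mirror,
                       size_t len, size_t *offsets, size_t max)
{
    size_t count = 0;

    assert(region != NULL && mirror != NULL);

    for (size_t i = 0; i < len; i++) {
        if (region[i] != mirror[i]) {
            if (offsets != NULL && count < max) {
                offsets[count] = i;   /* remember where the damage is */
            }
            count++;
        }
    }
    return count;
}
```

Calling such a check periodically, and logging the returned offsets, is one way to see that the damage stays within the same small area even though the exact bytes vary.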

Otherwise, if you want to reproduce the problem, you can do it like this:
* create a multiprocess project with the define:
Scheduler = iData, interruptible;
* create one process, in addition to Main_Proc, which does nothing

or open a project with at least 2 processes and make sure that the scheduler is interruptible:
Scheduler = iData, interruptible;

First make sure that, in the project's .lst file, the compiler has generated this (my application uses onSysTick, but that is not necessary):

SYSTEM.$INTERRUPT_TIMER0:
.DEB SYSTICKENTRY
... 12 assembler lines
CALL project.onSysTick
// here the scheduler enables the interrupts

LDI _ACCA, 1 SHLB IntFlag
OR Flags, _ACCA
SEI


Then, with the emulator, set a breakpoint at this line (for me it was very difficult, but for you it will be easy since you have the source code):

SYSTEM.RestorePrcsRegs:
LDD _FPTRHI, Z+9
LDD _FRAMEPTR, Z+8

This procedure is only called by the scheduler (I can see this in the .lst of my project), and you will be able to verify that the Global Interrupt Enable bit (I) of the Status Register is still set!
So if an interrupt occurs there, such as the timer interrupt of the stepper motor driver, the frame pointer used in the interrupt may be built from the high byte of the new process's frame pointer and the low byte of the old active process's frame pointer.
In that case, if the interrupt handler uses the frame (as the StepperIOS user device of the stepper motor driver does), it corrupts a part of data memory that depends on many things, so it is very difficult to detect. And, as I said above, if I change anything in the modules used, the corruption becomes almost impossible to detect.
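To make the mechanism concrete, here is a small host-side simulation in C. It is a sketch under assumptions, not the AVRco runtime: the `Cpu` structure, the RAM size, the addresses and all function names are invented for illustration, and the interrupt handler is assumed to release the pushed byte again before it returns.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_SIZE 0x1000u    /* hypothetical size of the simulated data memory */
#define NO_WRITE 0xFFFFu    /* returned when no interrupt was simulated */

typedef struct {
    uint8_t ram[RAM_SIZE];
    uint8_t yl, yh;         /* simulated Y register pair (frame pointer) */
} Cpu;

uint16_t frame_ptr(const Cpu *c)
{
    return (uint16_t)(((uint16_t)c->yh << 8) | c->yl);
}

void set_frame_ptr(Cpu *c, uint16_t fp)
{
    c->yh = (uint8_t)(fp >> 8);
    c->yl = (uint8_t)(fp & 0xFFu);
}

/* ST -Y, r : decrement Y first, then store through it */
void st_predec_y(Cpu *c, uint8_t value)
{
    uint16_t y = (uint16_t)(frame_ptr(c) - 1u);
    assert(y < RAM_SIZE);
    set_frame_ptr(c, y);
    c->ram[y] = value;
}

/* Simulated stepper interrupt: pushes CW (always $00) onto the frame and
 * releases it again. Returns the address that was written. */
uint16_t stepper_isr(Cpu *c)
{
    st_predec_y(c, 0x00);
    uint16_t addr = frame_ptr(c);
    set_frame_ptr(c, (uint16_t)(addr + 1u));
    return addr;
}

/* Two-step restore of the frame pointer, optionally interrupted in between.
 * Returns the address hit by the interrupt, or NO_WRITE. */
uint16_t restore_frame_ptr(Cpu *c, uint16_t new_fp, int irq_between)
{
    uint16_t written = NO_WRITE;

    c->yh = (uint8_t)(new_fp >> 8);        /* LDD _FPTRHI, Z+9   */
    if (irq_between) {
        written = stepper_isr(c);          /* Y = new high, old low */
    }
    c->yl = (uint8_t)(new_fp & 0xFFu);     /* LDD _FRAMEPTR, Z+8 */
    return written;
}
```

With an old frame pointer of $02F0 and a new one of $0510 (made-up values), the interrupted restore makes the handler see Y = $05F0 and write at $05EF, an address that belongs to neither frame. After the restore completes the frame pointer is correct again, which is why the damage shows up only later and elsewhere.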

Gerard
rh
Posted: 12.11.2012 - 12:18  ·  #4
Hello Gerard,

ok, I will check this.

rolf
Gerard
 
Posted: 12.11.2012 - 12:40  ·  #5
Thank you, Rolf
rh
Posted: 12.11.2012 - 14:34  ·  #6
Hello Gerard,

do I understand correctly that this problem is fixed if I do something like this:
(simplified)

CLI;
STD Z+xx, _FPTRHI;
STD Z+xx, _FRAMEPTR;
SEI; // only if interruptible!

and

CLI;
LDD _FPTRHI, Z+xxx...
LDD _FRAMEPTR, Z+xxx...
SEI; // only if interruptible!

rolf
Gerard
 
Posted: 12.11.2012 - 17:00  ·  #7
Hello Rolf

I agree with the second part of your message but not with the first one.

About the first part, you wrote:
CLI;
STD Z+xx, _FPTRHI;
STD Z+xx, _FRAMEPTR;
SEI; // only if interruptible!

I suppose this code is extracted from the "SaveAllRegs1" procedure. If so, it is very dangerous, because "SaveAllRegs1" is part of "SaveAllRegs:", which may be used in any user interrupt handler, and then interrupts would be re-enabled inside the handler!
I think that "SaveAllRegs:" and "SaveAllRegs1" must not change the interrupt flag; that is the caller's job.

Moreover, an interrupt between the two lines
STD Z+xx, _FPTRHI;
STD Z+xx, _FRAMEPTR;
will not change anything IF the interrupt handler is correct, i.e. if it restores the frame pointer as it found it. But writing correct code is not the subject here :-)

So for me you can leave the code as it was.


About the second part, you wrote:
CLI;
LDD _FPTRHI, Z+xxx...
LDD _FRAMEPTR, Z+xxx...
SEI; // only if interruptible!

Here I suppose again that it is extracted from
SYSTEM.RestorePrcsRegs:
LDD _FPTRHI, Z+9
LDD _FRAMEPTR, Z+8

So I agree with the CLI at the beginning, but the SEI is forbidden: RestorePrcsRegs is called by RestoreAllRegs, which is used to save and restore registers in interrupt code, so the SEI would allow re-entrancy in the interrupt handler.
In any case, the I flag will be set back to its previous value from the context stored by the caller (scheduler or interrupt handler).

So I think that
CLI;
LDD _FPTRHI, Z+xxx...
LDD _FRAMEPTR, Z+xxx...

is sufficient for the scheduler to become transparent with respect to the frame pointer.

Gerard

PS:
I suspect something else about the frame pointer and interrupts, but I'm not completely sure. I will try to demonstrate it with example code later.
rh
Posted: 12.11.2012 - 18:04  ·  #8
Hello Gerard,

I'm aware of these CLI/SEI issues 8-)
I found a way to guard these FP operations without changing the interrupt flags.

1. Stepper CW bug fixed.
2. Avoiding frame parameter passing to StepperIOS is now possible with:
Code
{$NoFrame}
UserDevice StepperIOS(CW : boolean);
begin
  if _ACCA <> 0 then                              // use the direction info
    // ......
  else
    // .....
  endif;
end;

Please note the switch $NoFrame!
CW is then passed in _ACCA.
3. The frame pointer operations in multitasking are now protected against interrupts.

rolf