Rainbow Death Crash!


27 replies to this topic

#16 Evan20000

Evan20000

    P͏҉ę͟w͜� ̢͝!

  • Members
  • Real Name:B̵̴̡̕a҉̵̷ņ̢͘͢͜n̷̷ę́͢d̢̨͟͞
  • Location:B̕҉̶͘͝a̶̵҉͝ǹ̵̛͘n̵e̸͜͜͢d҉̶

Posted 22 July 2017 - 10:39 AM

If it happens more in some quests than others, and it's most often in one heavily scripted quest, I do wonder if it's caused by a script error. Any chance the script is trying to load FFC #0 at some point? That's the big issue we already know about.
 

I can't speak for all of them, but the recorded instances of it in my quests are seemingly at random. Some screens it's happened in have had a lot going on, others have been benign screens where only the global is running. If there's some underlying pattern to it, I'm not seeing it.



#17 DarkDragon

DarkDragon

    Junior

  • Members

Posted 22 July 2017 - 12:12 PM


Why was this changed, from the commented line?

LoadFFC returns one FFC object (of possibly several that the script is simultaneously manipulating), whereas writing to REFFFC would set global interpreter state:

ffc one = Screen->LoadFFC(1);
ffc two = Screen->LoadFFC(2);
one->X = 10; // you expect this to change FFC #1, not FFC #2

Edited by DarkDragon, 22 July 2017 - 12:13 PM.


#18 Timelord

Timelord

    The Timelord

  • Banned
  • Location:Prydon Academy

Posted 22 July 2017 - 02:02 PM

That is a very good question. Probably some ancient bug fix or optimization. Can't check the logs at the moment, unfortunately, so it will have to remain a mystery for now.

 
Barely tried. But I know why it happens, and the code is still there.

 

(Emphasis, mine.)

 

Please elaborate as best you can on that. Indicate the code that you claim causes this, so that I can seek a countermeasure. I'll leave ZC running overnight to confirm; in the past, if ZC ran for more than an hour or two, this always seemed to occur.

 


 

 

LoadFFC returns one FFC object (of possibly several that the script is simultaneously manipulating), whereas writing to REFFFC would set global interpreter state:

ffc one = Screen->LoadFFC(1);
ffc two = Screen->LoadFFC(2);
one->X = 10; // you expect this to change FFC #1, not FFC #2

 

 

Wait.... That is not the case for LoadNPC, which also treats each pointer that you create separately and does not fail to maintain the association:

 

I never noticed this. What precisely is used to set REFFFC?

 

It doesn't matter much how this is working, but the goal is to prevent loading illegal FFC indices. Changing LoadFFC to follow the same formula as LoadItem, LoadNPC, and so forth seems a viable fix, as the range can be clamped in a function do_load_ffc() in ffscript.cpp.
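A minimal sketch of that kind of bounds check, in C++. Everything in it is hypothetical: `load_ffc_index()` only stands in for the proposed do_load_ffc(), the `Screen` and `FFC` types are invented for the example, and the limit of 32 FFCs per screen (numbered from 1) is an assumption. It rejects a bad index rather than clamping it, which is a slightly different choice from the one described above.

```cpp
#include <array>
#include <cstddef>

// Assumption: 32 FFC slots per screen, numbered 1..32 from a script's view.
constexpr int kMaxFFCs = 32;

struct FFC { int x = 0; int y = 0; };

struct Screen {
    std::array<FFC, kMaxFFCs> ffcs{};
};

// Hypothetical stand-in for do_load_ffc(): converts a script-supplied
// index to a zero-based slot, or returns -1 if it is out of range.
int load_ffc_index(int scriptIndex) {
    if (scriptIndex < 1 || scriptIndex > kMaxFFCs)
        return -1;                  // reject instead of touching memory
    return scriptIndex - 1;
}

// Writes X only when the reference is valid; reports whether it did.
bool set_ffc_x(Screen& s, int ref, int value) {
    if (ref < 0 || ref >= kMaxFFCs)
        return false;
    s.ffcs[static_cast<std::size_t>(ref)].x = value;
    return true;
}
```

With a check like this, loading FFC #0 yields an invalid reference, and any later write through it is dropped instead of landing in unrelated memory.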

 


Edited by ZoriaRPG, 22 July 2017 - 02:18 PM.


#19 Saffith

Saffith

    IPv7 user

  • Members

Posted 22 July 2017 - 03:59 PM

I can't speak for all of them, but the recorded instances of it in my quests are seemingly at random. Some screens it's happened in have had a lot going on, others have been benign screens where only the global is running. If there's some underlying pattern to it, I'm not seeing it.

Using an invalid FFC corrupts some other area of memory, but it's hard to say what or where. It's probably usually benign, but it may also cause totally unpredictable issues.

This has happened to me a lot towards the end of The Hero of Dreams (specifically at the Commander boss and Ganondorf), and I do play in Windowed mode.

But that changes things. Yeah, no scripts in HoD. I wouldn't rule out something happening if you played a scripted quest beforehand and then switched to HoD without restarting the program, but I doubt that's the case.

Please elaborate as best you can on that. Indicate the code that you claim causes this, so that I can seek a countermeasure. I'll leave ZC running overnight to confirm; in the past, if ZC ran for more than an hour or two, this always seemed to occur.

Total running time isn't a factor. It's a race condition. Actually, I think just letting it sit would mean only one thread was accessing the keyboard, so it would never happen that way. Might be wrong about that, though.
 
typedef struct KEY_BUFFER
{
   volatile int lock;
   volatile int start;
   volatile int end;
   volatile int key[KEY_BUFFER_SIZE];
   volatile unsigned char scancode[KEY_BUFFER_SIZE];
} KEY_BUFFER;

   buffer->lock++;

   if (buffer->lock != 1) {
      buffer->lock--;
      return;
   }

   // Do stuff

   buffer->lock--;

The keyboard is used from multiple threads, but the locking mechanism isn't thread-safe. Every time I've seen it happen, it's been the same thing: key_buffer->lock is -1, and setting it to 0 fixes it.
I don't think it should be hard to fix, actually. It's only used in one file.
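To illustrate how the counter could end up at -1, here is a small single-threaded C++ model that replays one unlucky interleaving of two threads running the snippet above. It is a sketch under assumptions, not a trace of what Allegro actually does: it assumes `lock++` and `lock--` execute as separate load and store steps, and the `Thread` helper and function names are invented for the example.

```cpp
// Models `lock++` / `lock--` as separate load and store steps, which is
// how a non-atomic increment can behave when two threads overlap.
struct Thread {
    int tmp = 0;  // this thread's private copy of lock
    void load(const int& lock) { tmp = lock; }
    void store_inc(int& lock) { lock = tmp + 1; }
    void store_dec(int& lock) { lock = tmp - 1; }
};

// Replays one possible interleaving and returns the final lock value.
int run_bad_interleaving() {
    int lock = 0;
    Thread a, b;
    // The two increments overlap, so one update is lost.
    a.load(lock); b.load(lock);
    a.store_inc(lock); b.store_inc(lock);   // lock == 1, not 2
    // Both threads now see lock == 1 and enter the critical section.
    // The two decrements do not overlap, so both take effect.
    a.load(lock); a.store_dec(lock);        // lock == 0
    b.load(lock); b.store_dec(lock);        // lock == -1
    return lock;
}

// The original entry check: once lock is -1, it can never succeed again.
bool try_enter(int& lock) {
    lock++;
    if (lock != 1) { lock--; return false; }
    return true;
}
```

In this model `run_bad_interleaving()` ends with the lock at -1, and from then on `try_enter()` raises it to 0, fails the `!= 1` check, and puts it back to -1. That would be consistent with the keyboard staying dead until the value is reset to 0.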

#20 TheLegend_njf

TheLegend_njf

    Deified

  • Members
  • Real Name:Grant

Posted 22 July 2017 - 05:04 PM

But that changes things. Yeah, no scripts in HoD. I wouldn't rule out something happening if you played a scripted quest beforehand and then switched to HoD without restarting the program, but I doubt that's the case.

 

I had not played any scripted quest in a while, actually. So unless Shoelace added scripts in a later build or something for his end game content, this would have nothing to do with scripts.



#21 DarkDragon

DarkDragon

    Junior

  • Members

Posted 22 July 2017 - 09:48 PM

(Emphasis, mine.)

 

Please elaborate as best you can on that. Indicate the code that you claim causes this, so that I can seek a countermeasure. I'll leave ZC running overnight to confirm; in the past, if ZC ran for more than an hour or two, this always seemed to occur.

 


 

 

Wait.... That is not the case for LoadNPC, which also treats each pointer that you create separately and does not fail to maintain the association:

 

I never noticed this. What precisely is used to set REFFFC?

 

It doesn't matter much how this is working, but the goal is to prevent loading illegal FFC indices. Changing LoadFFC to follow the same formula as LoadItem, LoadNPC, and so forth seems a viable fix, as the range can be clamped in a function do_load_ffc() in ffscript.cpp.

 

 

It's all handled in the same way: any time any NPC variable is read or set, the REFNPC variable is first written to:

 

vector<Opcode *> LibrarySymbols::getVariable(LinkTable &lt, int id, int var)
{
    int label  = lt.functionToLabel(id);
    vector<Opcode *> code;
    //pop object pointer
    Opcode *first = new OPopRegister(new VarArgument(EXP2));
    first->setLabel(label);
    code.push_back(first);
    //load object pointer into ref register
    if(refVar!=NUL)
        code.push_back(new OSetRegister(new VarArgument(refVar), new VarArgument(EXP2)));
    code.push_back(new OSetRegister(new VarArgument(EXP1), new VarArgument(var)));
    code.push_back(new OPopRegister(new VarArgument(EXP2)));
    code.push_back(new OGotoRegister(new VarArgument(EXP2)));
    return code;
}

 

In other words, the REF variables are never used in a persistent way. ZScript scripts store their NPC references (and all other references) as integers in general-purpose registers, or the stack.
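As a rough illustration of that design, here is a toy C++ model. It is not the ZScript interpreter: `ToyVM`, `Npc`, and the method names are made up for the sketch, and handles are assumed to be valid indices. The point it shows is that the script side keeps plain integer handles, and the ref register is overwritten from the handle immediately before every access.

```cpp
#include <array>
#include <cstddef>

struct Npc { int x = 0; };

struct ToyVM {
    std::array<Npc, 4> npcs{};
    int refNpc = 0;   // scratch register, playing the role of REFNPC

    // Mirrors the generated code: load the handle into the ref register,
    // then perform the access through that register.
    void setX(int handle, int value) {
        refNpc = handle;
        npcs[static_cast<std::size_t>(refNpc)].x = value;
    }

    int getX(int handle) {
        refNpc = handle;
        return npcs[static_cast<std::size_t>(refNpc)].x;
    }
};
```

A script holding `one = 1` and `two = 2` can call `setX(two, 5)` and then `setX(one, 10)`, and each write lands on the right object, because whatever the ref register held from the previous access is never relied on.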



#22 Gleeok

Gleeok

    It's dangerous to dough alone, bake this.

  • Members
  • Real Name:Pillsbury
  • Location:Magical Land of Dough

Posted 23 July 2017 - 12:43 AM

Total running time isn't a factor. It's a race condition. Actually, I think just letting it sit would mean only one thread was accessing the keyboard, so it would never happen that way. Might be wrong about that, though.

typedef struct KEY_BUFFER
{
   volatile int lock;
   volatile int start;
   volatile int end;
   volatile int key[KEY_BUFFER_SIZE];
   volatile unsigned char scancode[KEY_BUFFER_SIZE];
} KEY_BUFFER;

   buffer->lock++;

   if (buffer->lock != 1) {
      buffer->lock--;
      return;
   }

   // Do stuff

   buffer->lock--;

The keyboard is used from multiple threads, but the locking mechanism isn't thread-safe. Every time I've seen it happen, it's been the same thing: key_buffer->lock is -1, and setting it to 0 fixes it.
I don't think it should be hard to fix, actually. It's only used in one file.

I always (still) suspected that it was a threading issue in allegro when it used to hang randomly and rarely on exit.

Anyway, what would you say is the best fix here; change lock to atomic? ..or is it a logical error where the ref_count (lock) can be invalidated?

#23 Timelord

Timelord

    The Timelord

  • Banned
  • Location:Prydon Academy

Posted 23 July 2017 - 02:32 AM

I'm going to run 2.50.2 and 2.53.0 side by side, both on 1st.qst, for the same duration: first sitting idle, then playing the quest, then sitting idle again, and see what happens.



#24 satokoaddict96

satokoaddict96

    .qst

  • Members
  • Real Name:Michael
  • Location:Norway

Posted 23 July 2017 - 03:28 AM

A friend of mine had the crash 3 times a while ago as he downloaded ZC 2.50.2 just to play my quest (no custom scripts). He's using a desktop computer with windows 10.

 

I've never had the crash myself over the years, but that can be luck for all I know. I've used ZC on laptops only with windows 7/8/8.1/10.



#25 Saffith

Saffith

    IPv7 user

  • Members

Posted 23 July 2017 - 05:40 PM

Anyway, what would you say is the best fix here; change lock to atomic? ..or is it a logical error where the ref_count (lock) can be invalidated?

Yeah, atomic operations ought to do it.
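One way that could look, sketched in C++ with `std::atomic`. This is not a patch against Allegro: the `KeyBuffer` struct and `try_add_key()` are hypothetical, and it swaps the original increment/decrement pair for a single compare-and-swap. Allegro itself is C, where C11 `<stdatomic.h>` offers equivalent operations.

```cpp
#include <atomic>

struct KeyBuffer {
    std::atomic<int> lock{0};
    int pending = 0;   // stand-in for the real buffer contents
};

// Returns true if the critical section ran, false if the lock was held.
bool try_add_key(KeyBuffer& buf) {
    int expected = 0;
    // Atomically change 0 -> 1. If lock is not 0, nothing is modified,
    // so a failed attempt can never push the counter out of range.
    if (!buf.lock.compare_exchange_strong(expected, 1))
        return false;

    ++buf.pending;        // "Do stuff"

    buf.lock.store(0);    // release
    return true;
}
```

Because a failed attempt leaves the lock untouched, there is no increment to undo and no window in which two threads can both believe they hold it.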

#26 KingPridenia

KingPridenia

    King of Pridenia, Safehaven of the LGBTQ

  • Members
  • Real Name:Adam
  • Location:Pennsylvania

Posted 24 July 2017 - 05:36 AM

I never had this happen to me personally, but I know MeleeWizard had issues with this constantly in his Isle of Rebirth let's play. I'm not sure if the amount of scripting in the quest has an effect on it or not, as Panoply of Catalia is also pretty script intensive, yet Melee never got the rainbow crash while LP'ing it. I know this isn't exactly helpful, but it seems that the rainbow crash is one of those fatal ZC bugs that is hard to fix, thanks to there seemingly being no way to trigger the crash on command; it just happens. It's just like another bug that's been around since 2.50, where ZC will randomly stop responding to the keyboard when you try to F6, forcing you to actually click on end game rather than simply hitting Enter. A lot of people suffer through it, yet nobody has found a way to reproduce the issue on command.



#27 Avaro

Avaro

    o_o

  • Members
  • Real Name:Robin
  • Location:Germany

Posted 24 July 2017 - 05:56 AM

I never had this happen to me personally, but I know MeleeWizard had issues with this constantly in his Isle of Rebirth let's play. I'm not sure if the amount of scripting in the quest has an effect on it or not, as Panoply of Catalia is also pretty script intensive, yet Melee never got the rainbow crash while LP'ing it. I know this isn't exactly helpful, but it seems that the rainbow crash is one of those fatal ZC bugs that is hard to fix, thanks to there seemingly being no way to trigger the crash on command; it just happens. It's just like another bug that's been around since 2.50, where ZC will randomly stop responding to the keyboard when you try to F6, forcing you to actually click on end game rather than simply hitting Enter. A lot of people suffer through it, yet nobody has found a way to reproduce the issue on command.

 

The one with the keyboard not responding apparently happens when ZC has been running for a long time, I think.



#28 Timelord

Timelord

    The Timelord

  • Banned
  • Location:Prydon Academy

Posted 24 July 2017 - 03:05 PM

The one with the keyboard not responding apparently happens when ZC has been running for a long time, I think.

 

I thought so too; however, I set up some ZC instances on Friday, each running 1st.qst.

 

First, I let each run for a full day, unpaused. No issues.

Next, I played through level 1 using each. No issues.

I let both sit (unpaused) over the next day, and still no issues.

Next, I will pause each, and see if I can trigger this problem. In fact, I am pausing both as of this post, and I will check on them in 12 to 24 hours.

 

I'm trying to narrow down a way to reproduce the issue, with 2.50.2 and 2.53.0 beta 2 running simultaneously, to see if one locks up and the other does not. This test is on Windows 7, running in windowed mode.

 

I still suspect script corruption in this Rainbow Death issue. As both Saffith and I mentioned, loading invalid FFC IDs makes it possible to write to memory areas that should not be accessed. The written values may affect nothing for a long time, but they could stick around in the background and persist through quest loads, if ZC does not clear the corrupted areas on quest init.

 

I'll try to bounds-check FFCs for 2.53.0 and future builds one way or another, or DD might do it. We have a bit of an ongoing debate on the best way to do this over on AGN at present.



