[Grbl_Esp32 PR#716] fix race with sys_rt_exec_state by breaking out bools

未分类 bolang 6个月前 (10-14) 47次浏览

Issue #716 | 状态: 已关闭 | 作者: treib | 创建时间: 2020-12-21


A fix for the race condition with sysrtexec_state that breaks out all the state bits into separate bools.

A few notes:

– I renamed the cycle_stop global that was broken out previously to be consistent with the newly-added bits.
– In the existing code, in three places (the main protocolexecrtsystem() as well as the limit check loops in Limits.cpp and CoreXY.cpp) the ExecState is captured into a local copy which is then used in the if/else logic. To keep the functionality and structure the same, I made a sysgetrtexec_state() call in System.cpp that pulls all the global bools back into a bitfield for this purpose. This all should be functionally the same as before.

Note that this call includes the cycle_stop bit as it used to in the olden days, and I changed the if/else to check it that way for consistency. (This may make line 275 in Protocol.cpp look like a mistake at first glance, but it should be correct.)


评论 (7)

#1 – treib 于 2020-12-23

That’s a good question on whether the bitfield approach is necessary. Its main purpose seems to be to take a snapshot of all the booleans before entering the gob of if/else statements that act on those bits. This is mainly in protocolexecrt_system(), but is also done during homing in Limits.cpp and CoreXY.cpp.

It is hard to tell whether any of that if/else logic could be tripped up if a state changes mid-way. Without fully diving in to think it through, it might be a safer approach to continue to snapshot the values.

Note that that when it comes to setting those values, I do just alter the new bools directly. It is only this snapshot used by the if/else logic where I was proposing still bringing it together into a bitfield, just as a cleaner way to snapshot the values and keep the code looking and working as close to the way it was before.


#2 – MitchBradley 于 2020-12-23

I have studied this code fairly carefully and I don’t see any places where a bad thing will happen if a variable gets set in the middle of the procedure. Typically you run into trouble when a flag is tested true, thus initiating some operation, then it becomes false in the middle of that operation. That is the classic “synchronizer failure” that I fought frequently when I was a hardware designer/diagnostician. In this case, the flags are only cleared by the handler code, so a previously-tested-as-true flag cannot inexplicably go false.

I really dislike “just in case” code. In my experience, it always causes problems down the road, by cluttering up the logic and making future maintainers wonder “why is this here” – when the answer is that nobody really knows.

I think what we need to do is test this extensively.


#3 – treib 于 2020-12-23

Thanks Mitch, yes, I definitely understand where you’re coming from and agree entirely. Especially being new to the codebase, I was taking a conservative approach of not changing things that didn’t need to change.

In this case, I would still advocate for keeping the “snapshot” methodology in protocolexecrt_system(). My reasoning is:

protocolexecrt_system() is very much a state machine, taking an input state and creating a new output state. I think capturing that input state in a snapshot and working with it makes that clearer and less error-prone than manipulating the input and output state together as one.

Another concern I have is it would be very difficult to test a change like this and be 100% confident in correct operation. The specific timing of signal changes that could result in failure would likely be hard to hit intentionally.

I think I see at least one case where the existing code would behave differently if we change this: Let’s say motionCancel and ‘cycleStart’ bits are set and we were in a jog. In Protocol.cpp:

1. line 275 if statement is true (motionCancel and cycleStart are set)
2. line 288 if statement is true (motionCancel is set)
3. line 290 if statement is true (not in Alarm or CheckMode)
3. line 311 if statement is true, causing motionCancel to be cleared to false
4. line 375 is true because of cycleStart
5. line 378 is testing if motionCancel is set. With the snapshot method, it would still be set, but not with the direct manipulation method.

I don’t know how likely it is that motionCancel and cycleStart would be true at the same time, and I’m not sure if it would have a real impact even if they were. But this gives me the sense that this code was written assuming that output state would be modified without impacting input state.


#4 – bdring 于 2021-01-06

@treib Thanks for your P.R. I hope to address this issue soon and approve the PR. The PR should target the devt branch. We do this with all normal PRs to prevent main getting ahead of devt.

Here is a link to our Discord server. We discuss firmware development the on the grbl-esp32-dev channel.


#5 – treib 于 2021-01-16

@bdring Thanks, and sorry for targeting the main branch instead of devt. Should I check out devt, apply my changes, and submit a new P.R.? Or is there a better way for me to fix this up? (This was my first real pull request so I’m still figuring things out.) Thanks, Phil


#6 – bdring 于 2021-01-16

@treib If you apply it against devt, that will be easiest for us. You can close this P.R.


#7 – treib 于 2021-01-17

Will submit a new P.R. against the Devt branch.


原始Issue: https://github.com/bdring/Grbl_Esp32/pull/716

喜欢 (0)