Here are the steps I follow to reproduce the issue:
1. Run LinuxCNC on a Ethernet FPGA card with a sserial remote
2. Dump packets
3. Packets show that on the write function the sserial DoIt command is written
before all data registers have been written.
This is what I expected to happen:
DoIt issued after all sserial user data is written
This is what happened instead:
DoIt issued before all data registers are written
It worked properly before this:
Not sure, problem exists in 2.7 and 2.8, and may exist in PCI /EPP drivers as well
评论 (30)
#2 – rene-dev 于 2018-03-01
I think I see what you see.
That seems to be the problem.
look at the dump, 0x1cc is written before 0x1d4
I think swapping those lines will fix it.
““
CS: 0x1cc
U0: 0x1d4
U1: 0x1dc
U2: 0x1e4
################
start of packet len 60
lbp16 cmd: 0xc282
write
addr: 0x1ec len: 8
lbp16 cmd: 0xc281
write
addr: 0x1bc len: 4
lbp16 cmd: 0xc281
write
addr: 0x1cc len: 4
lbp16 cmd: 0xc281
write
addr: 0x1d4 len: 4
lbp16 cmd: 0xc281
write
addr: 0x1d0 len: 4
lbp16 cmd: 0xc281
write
addr: 0x1d8 len: 4
lbp16 cmd: 0xd182
write
addr: 0x14 len: 4
end of packet
################
start of packet len 44
lbp16 cmd: 0x4282
read
addr: 0x1ec len: 8
lbp16 cmd: 0x4281
read
addr: 0x1bc len: 4
lbp16 cmd: 0x4281
read
addr: 0x1c4 len: 4
lbp16 cmd: 0x4281
read
addr: 0x1cc len: 4
lbp16 cmd: 0x4281
read
addr: 0x1d4 len: 4
lbp16 cmd: 0x4281
read
addr: 0x1d0 len: 4
lbp16 cmd: 0x4281
read
addr: 0x1d8 len: 4
lbp16 cmd: 0x5901
read
addr: 0x8 len: 2
lbp16 cmd: 0xd182
write
addr: 0x10 len: 4
lbp16 cmd: 0x5184
read
addr: 0x10 len: 8
sending 42 bytes
end of packet
################
#3 – rmu75 于 2018-03-29
with this patch, sserial via hm2_rpspi stops working.
#4 – andypugh 于 2018-03-29
I will have to see if I can find a Pi and a 7i90 then.
Can you suggest a simple test that demonstrates that it is broken?
#5 – rmu75 于 2018-03-29
Sorry I haven’t been more specific. hostmot2 errors out with “DoIt not cleared”. My “simple” scenario from the actual machine config:
““
> halrun
halcmd: loadrt trivkins
Note: Using POSIX realtime
halcmd: loadrt motmod servoperiodnsec=1000000 num_joints=3
halcmd: loadrt hostmot2
hm2: loading Mesa HostMot2 driver version 0.15
halcmd: loadrt hm2rpspi spiclkrate=25000
hm2/hm2_7i90.0: Smart Serial Firmware Version 43
hm2/hm2_7i90.0: 72 I/O Pins used:
hm2/hm2_7i90.0: IO Pin 000 (P1-01): PWMGen #0, pin Not-Enable (Output)
hm2/hm2_7i90.0: IO Pin 001 (P1-03): Muxed Encoder #0, pin Muxed A (Input)
hm2/hm2_7i90.0: IO Pin 002 (P1-05): Muxed Encoder #0, pin Muxed B (Input)
hm2/hm2_7i90.0: IO Pin 003 (P1-07): Muxed Encoder #0, pin Muxed Index (Input)
hm2/hm2_7i90.0: IO Pin 004 (P1-09): Muxed Encoder #1, pin Muxed A (Input)
hm2/hm2_7i90.0: IO Pin 005 (P1-11): Muxed Encoder #1, pin Muxed B (Input)
hm2/hm2_7i90.0: IO Pin 006 (P1-13): Muxed Encoder #1, pin Muxed Index (Input)
hm2/hm2_7i90.0: IO Pin 007 (P1-15): Muxed Encoder #2, pin Muxed A (Input)
hm2/hm2_7i90.0: IO Pin 008 (P1-17): Muxed Encoder #2, pin Muxed B (Input)
hm2/hm2_7i90.0: IO Pin 009 (P1-19): Muxed Encoder #2, pin Muxed Index (Input)
hm2/hm2_7i90.0: IO Pin 010 (P1-21): Muxed Encoder Select #0, pin Mux Select 0 (Output)
hm2/hm2_7i90.0: IO Pin 011 (P1-23): PWMGen #0, pin Out0 (PWM or Up) (Output)
hm2/hm2_7i90.0: IO Pin 012 (P1-25): PWMGen #0, pin Out1 (Dir or Down) (Output)
hm2/hm2_7i90.0: IO Pin 013 (P1-27): PWMGen #1, pin Out0 (PWM or Up) (Output)
hm2/hm2_7i90.0: IO Pin 014 (P1-29): PWMGen #1, pin Out1 (Dir or Down) (Output)
hm2/hm2_7i90.0: IO Pin 015 (P1-31): PWMGen #2, pin Out0 (PWM or Up) (Output)
hm2/hm2_7i90.0: IO Pin 016 (P1-33): PWMGen #2, pin Out1 (Dir or Down) (Output)
hm2/hm2_7i90.0: IO Pin 017 (P1-35): PWMGen #3, pin Out0 (PWM or Up) (Output)
hm2/hm2_7i90.0: IO Pin 018 (P1-37): PWMGen #3, pin Out1 (Dir or Down) (Output)
hm2/hm2_7i90.0: IO Pin 019 (P1-39): PWMGen #4, pin Out0 (PWM or Up) (Output)
hm2/hm2_7i90.0: IO Pin 020 (P1-41): PWMGen #4, pin Out1 (Dir or Down) (Output)
hm2/hm2_7i90.0: IO Pin 021 (P1-43): PWMGen #5, pin Out0 (PWM or Up) (Output)
hm2/hm2_7i90.0: IO Pin 022 (P1-45): PWMGen #5, pin Out1 (Dir or Down) (Output)
hm2/hm2_7i90.0: IO Pin 023 (P1-47): PWMGen #0, pin Not-Enable (Output)
hm2/hm2_7i90.0: IO Pin 024 (P2-01): IOPort
hm2/hm2_7i90.0: IO Pin 025 (P2-03): IOPort
hm2/hm2_7i90.0: IO Pin 026 (P2-05): IOPort
hm2/hm2_7i90.0: IO Pin 027 (P2-07): IOPort
hm2/hm2_7i90.0: IO Pin 028 (P2-09): IOPort
hm2/hm2_7i90.0: IO Pin 029 (P2-11): IOPort
hm2/hm2_7i90.0: IO Pin 030 (P2-13): IOPort
hm2/hm2_7i90.0: IO Pin 031 (P2-15): IOPort
hm2/hm2_7i90.0: IO Pin 032 (P2-17): IOPort
hm2/hm2_7i90.0: IO Pin 033 (P2-19): IOPort
hm2/hm2_7i90.0: IO Pin 034 (P2-21): IOPort
hm2/hm2_7i90.0: IO Pin 035 (P2-23): IOPort
hm2/hm2_7i90.0: IO Pin 036 (P2-25): IOPort
hm2/hm2_7i90.0: IO Pin 037 (P2-27): IOPort
hm2/hm2_7i90.0: IO Pin 038 (P2-29): IOPort
hm2/hm2_7i90.0: IO Pin 039 (P2-31): IOPort
hm2/hm2_7i90.0: IO Pin 040 (P2-33): IOPort
hm2/hm2_7i90.0: IO Pin 041 (P2-35): IOPort
hm2/hm2_7i90.0: IO Pin 042 (P2-37): IOPort
hm2/hm2_7i90.0: IO Pin 043 (P2-39): IOPort
hm2/hm2_7i90.0: IO Pin 044 (P2-41): IOPort
hm2/hm2_7i90.0: IO Pin 045 (P2-43): IOPort
hm2/hm2_7i90.0: IO Pin 046 (P2-45): IOPort
hm2/hm2_7i90.0: IO Pin 047 (P2-47): IOPort
hm2/hm2_7i90.0: IO Pin 048 (P3-01): Smart Serial Interface #0, pin RxData0 (Input)
hm2/hm2_7i90.0: IO Pin 049 (P3-03): Smart Serial Interface #0, pin RxData1 (Input)
hm2/hm2_7i90.0: IO Pin 050 (P3-05): Smart Serial Interface #0, pin RxData2 (Input)
hm2/hm2_7i90.0: IO Pin 051 (P3-07): IOPort
hm2/hm2_7i90.0: IO Pin 052 (P3-09): Smart Serial Interface #0, pin TxData0 (Output)
hm2/hm2_7i90.0: IO Pin 053 (P3-11): Smart Serial Interface #0, pin TxEn0 (Output)
hm2/hm2_7i90.0: IO Pin 054 (P3-13): Smart Serial Interface #0, pin TxData1 (Output)
hm2/hm2_7i90.0: IO Pin 055 (P3-15): Smart Serial Interface #0, pin TxEn1 (Output)
hm2/hm2_7i90.0: IO Pin 056 (P3-17): Smart Serial Interface #0, pin TxData2 (Output)
hm2/hm2_7i90.0: IO Pin 057 (P3-19): Smart Serial Interface #0, pin TxEn2 (Output)
hm2/hm2_7i90.0: IO Pin 058 (P3-21): IOPort
hm2/hm2_7i90.0: IO Pin 059 (P3-23): IOPort
hm2/hm2_7i90.0: IO Pin 060 (P3-25): IOPort
hm2/hm2_7i90.0: IO Pin 061 (P3-27): IOPort
hm2/hm2_7i90.0: IO Pin 062 (P3-29): IOPort
hm2/hm2_7i90.0: IO Pin 063 (P3-31): IOPort
hm2/hm2_7i90.0: IO Pin 064 (P3-33): IOPort
hm2/hm2_7i90.0: IO Pin 065 (P3-35): IOPort
hm2/hm2_7i90.0: IO Pin 066 (P3-37): IOPort
hm2/hm2_7i90.0: IO Pin 067 (P3-39): IOPort
hm2/hm2_7i90.0: IO Pin 068 (P3-41): IOPort
hm2/hm2_7i90.0: IO Pin 069 (P3-43): IOPort
hm2/hm2_7i90.0: IO Pin 070 (P3-45): IOPort
hm2/hm2_7i90.0: IO Pin 071 (P3-47): IOPort
hm2/hm2_7i90.0: registered
halcmd: loadrt classicladder_rt numRungs=200 numTimersIec=20 numPhysInputs=100 numPhysOutputs=100 numSections=10
creating ladder-state
halcmd: addf hm2_7i90.0.read servo-thread
halcmd: addf motion-command-handler servo-thread
halcmd: addf motion-controller servo-thread
halcmd: addf pid.x.do-pid-calcs servo-thread
halcmd: addf pid.y.do-pid-calcs servo-thread
halcmd: addf pid.z.do-pid-calcs servo-thread
halcmd: addf pid.s.do-pid-calcs servo-thread
halcmd: addf classicladder.0.refresh servo-thread
halcmd: addf hm2_7i90.0.write servo-thread
halcmd: start
halcmd: hm2/hm27i90.0: Smart Serial port 0: DoIt not cleared from previous servo thread. Servo thread rate probably too fast. This message will not be repeated, but the hm27i90.0.sserial.0.fault-count pin will indicate if this is happening frequently.
hm2/hm2_7i90.0: Smart Serial Comms Error: There have been more than 1 errors in 10 thread executions at least 200 times. See other error messages for details.
hm2/hm2_7i90.0: Smart Serial Port 0 will be stopped
halcmd:
The message “DoIt not cleared” appears instantly. Reverting this patch seems to fix this.
#6 – andypugh 于 2018-03-29
Is this a precompiled RPi image, or did you compile it yourself? (Just trying to save some effort on creating a system to replicate your problem)
#7 – pcw-mesa 于 2018-03-29
Have you tried running the servo thread slower? the race fix moves the DoIt to after the outgoing data is written to the user registers, this is needed to avoid a race condition, but it does reduce the thread timing margins. Another thing to try is to turn on debugging and see if the DoIt command is actually issued as this change may have exposed a RSPI bug
#8 – rmu75 于 2018-03-29
This is a stock raspbian 8 image with kernel version 4.9.65 with raspberry pi patch and rt-preempt rt56-v7 patch, cross compiled. Hardware is a Raspberry Pi 3 with cmdline 8250.nruarts=0 bcm2708fb.fbwidth=1600 bcm2708fb.fbheight=900 bcm2708fb.fbswap=1 vcmem.membase=0x3dc00000 vcmem.memsize=0x3f000000 sdhcibcm2708.enablellm=0 dwcotg.lpmenable=0 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait isolcpus=3
I am running with RTAPICPUNUMBER=3
I will try with slower servo-thread, higher SPI frequency and try enabling debug and report back.
#9 – andypugh 于 2018-03-29
It is possible that the hm2 read and hm2 write function ordering in the HAL file might matter too.
#10 – rmu75 于 2018-03-29
increasing servo period to 2000000 does not change anything. running with spi_debug=4 shows
““
INFO----REALTIME allocations for classicladder:
Sizes: rungs- 200 bits- 20 words- 20 timers- 10 mono- 10 count- 10 IEC timers- 20
HAL Bin- 100 HAL Bout- 100 expressions- 100 sections- 10 symbols - 200
s32in - 10 s32out- 10 Error bits-10
halcmd: addf hm2_7i90.0.read servo-thread
halcmd: addf motion-command-handler servo-thread
halcmd: addf motion-controller servo-thread
halcmd: addf classicladder.0.refresh servo-thread
halcmd: addf hm2_7i90.0.write servo-thread
halcmd: start
halcmd: MOTION: setting Traj cycle time to 2000000 nsecs
MOTION: setting Servo cycle time to 2000000 nsecs
hm2/hm2_7i90.0: Num Auto = 3
hm2/hm2_7i90.0: Enabled Remotes tag = = 7
hm2/hm27i90.0: Smart Serial port 0: DoIt not cleared from previous servo thread. Servo thread rate probably too fast. This message will not be repeated, but the hm27i90.0.sserial.0.fault-count pin will indicate if this is happening frequently.
hm2/hm2_7i90.0: Smart Serial Comms Error: There have been more than 1 errors in 10 thread executions at least 200 times. See other error messages for details.
hm2/hm2_7i90.0: Smart Serial Port 0 will be stopped
#11 – rmu75 于 2018-03-29
> It is possible that the hm2 read and hm2 write function ordering in the HAL file might matter too.
reordering read and write doesn’t make any difference, neither does removing the other calls.
#12 – rmu75 于 2018-03-29
rebuilt hm2rpspi with RPSPIDEBUG. halrunning this
““
loadrt trivkins
loadrt motmod servoperiodnsec=2000000 num_joints=3
loadrt hostmot2
loadrt hm2rpspi spiclkrate=25000 spi_debug=5
loadrt classicladder_rt numRungs=200 numTimersIec=20 numPhysInputs=100 numPhysOutputs=100 numSections=10
addf hm2_7i90.0.read servo-thread
addf motion-command-handler servo-thread
addf motion-controller servo-thread
addf classicladder.0.refresh servo-thread
addf hm2_7i90.0.write servo-thread
start
yields debug.log on stderr.
#13 – andypugh 于 2018-03-29
The debug log isn’t adding much info.
It seems that the DoIt fault isn’t causing the system to stop, what is causing the problem is a comms error.
It might be interesting to mask that fault (set the dec > inc in the sserial params. See fault-inc, fault-dec and fault-lim here: http://linuxcnc.org/docs/2.7/html/man/man9/sserial.9.html
I am not suggesting this as a solution, just wondering how bad the problem is.
#14 – pcw-mesa 于 2018-03-29
Yeah you probably need a log of all HM2 read and writes (data and addresses).
I thought there was a way to do this but I dont see it now
#15 – jasenk2 于 2018-03-29
Hi all,
we have similar problem kernel is not full preemptive. That’s all folks.
2018-03-29 22:14 GMT+03:00 andypugh
> The debug log isn’t adding much info.
> It seems that the DoIt fault isn’t causing the system to stop, what is
> causing the problem is a comms error.
> It might be interesting to mask that fault (set the dec > inc in the
> sserial params. See fault-inc, fault-dec and fault-lim here:
> http://linuxcnc.org/docs/2.7/html/man/man9/sserial.9.html
> I am not suggesting this as a solution, just wondering how bad the problem
> is.
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <https://github.com/LinuxCNC/linuxcnc/issues/412#issuecomment-377342362>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/ACATkQMw3X_YAMKwotqSo8AyefOl0noZks5tjTKQgaJpZM4SXfxz>
> .
>
#16 – andypugh 于 2018-03-29
What bitfile are you using in the 7i92? (Just in case it is bitfile-specific)
#17 – rmu75 于 2018-03-29
> The debug log isn’t adding much info.
It seems that the DoIt fault isn’t causing the system to stop, what is causing the problem is a comms error.
I don’t think communication per se does not work, as the whole thing is working when i back out the patch, i.e. I/O via the sserial-connected 7i90s is possible and working.
The sserial slaves do have stock bitfile, did not modify anything on them. Not sure about the bitfile in the spi-connected 7i90, I had to replace the stock firmware as that did not work. IIRC I built a bitfile, but ultimately flashed an updated version. Unfortunately the machine that I used to build the bitfile and to flash the card is gone, so I can’t have a look now (will have to dig out some backups).
#18 – rmu75 于 2018-03-29
> Yeah you probably need a log of all HM2 read and writes (data and addresses).
I thought there was a way to do this but I dont see it now
In theory I could attach an oscilloscope or a logic analyzer, but that would take some time.
My current theory is that hm2_rpspi possibly truncates the transfer, so the DoIt doesn’t make it when it is the last thing to be transferred.
#19 – rmu75 于 2018-03-29
I hm2registertramwriteregion-ed a dummy transfer after the DoIt, problem gone. So either hm2_rpspi is doing something strange (truncating transfer) or I have SPI issues or something else.
Will try to hook up oscilloscope tomorrow.
#20 – andypugh 于 2018-03-29
It has to be hm2_rpspi as all the code above the llio driver is shared.
There are two spi_write functions, I am not sure which one is used under which circumstances.
But I would be very interested to see what happens if this is changed to a post-decrement:
https://github.com/LinuxCNC/linuxcnc/blob/master/src/hal/drivers/mesa-hostmot2/hm2_rpspi.c#L389
#21 – rmu75 于 2018-03-29
Here, it doesn’t change anything. I’m using SPI0. Comment suggests that hm2rpspiwritespi0 is used for SPI0 and …spi1 for SPI1.
Is there some register that is safe to dummy-write on the 7i90?
#22 – rmu75 于 2018-03-29
I logged read/write calls to hm2rpspiread|write_spi0 (see attached file), format is address, size and first word of buffer. debug2.log
#23 – andypugh 于 2018-03-30
Thanks for being so willing to help diagnose the problem. Hopefully PCW can work out if that log looks right.
#24 – pcw-mesa 于 2018-03-30
It is suspicious that moving the sserial command write to the end of the write function causes problems.
But I dont think the command is truncated since the DoIt command write does seem to succeed:
4973 hm2rpspi: hm2rpspiwritespi0: 5a00 4: 1007
because when the sserial command register is read back it contains 0x1007:
4979 hm2rpspi: hm2rpspireadspi0: 5a00 4: 1007
You would expect sserial errors if the normal read/process/write/wait order were not followed
and I’m assuming that the hal file was returned to normal if any changes to this order were done
So it sort of looks like a thread order issue to me (not enough delay between write and read)
(there should be more than 1 ms between the command register write and read with the 2 ms servo thread)
#25 – andypugh 于 2018-03-30
Is it possible that the SPI hardware is disabled a bit too soon, and does not have time to finish sending the data?
I am looking at the spi0reset call here: https://github.com/LinuxCNC/linuxcnc/blob/master/src/hal/drivers/mesa-hostmot2/hm2rpspi.c#L673
#26 – rmu75 于 2018-03-30
I didn’t have time to hook up the oscilloscope to the machine today, will try to replicate the setup in the lab and check for SPI transfer trunaction issues.
> Is it possible that the SPI hardware is disabled a bit too soon, and does not have time to finish sending the data?
I am looking at the spi0_reset call here:
That would be weird. Each line in the logfile is a call to hm2rpspireadspi0 resp. write, spi0reset is called every time.
#27 – andypugh 于 2018-03-30
I am unclear why spireset needs to be called at all. And I am wondering if, possibly, it is called before the SPI buffer is clear. But I don’t even know if there is_ an SPI buffer.
#28 – rmu75 于 2018-03-30
It is more or less the same code as in the driver of the linux kernel, see https://github.com/torvalds/linux/blob/master/drivers/spi/spi-bcm2835.c#L129
I just replaced the body of spi0_reset with a return, no changes. Works if I transfer dummy after DoIt, does not work if DoIt is last write.
#29 – pcw-mesa 于 2018-03-30
The fact that the read of the sserial command register at 0x5A00 returns the last data written suggests that the write took place but perhaps the thread order is not as it should be (a pre-existing problem masked by the old code writing the sserial command register early).
#30 – andypugh 于 2018-03-30
The thread order in the HAL file looked correct. I would have considered the possibility that the thread time was such that read followed close upon write, except that the report was that removing functions from threads made no difference.
#1 – andypugh 于 2018-02-28
I think I see the problem.
But need to test to see if it is as simple as it looks.