Stability / Starvation Issues
Introduction
What can cause stability issues in the Marshall link?
Stability issues usually happen either because proper hardware is not being used:
- 24V/1.5A power source
- If using a USB-to-COM adaptor, it must be a Chipi-X, the only type of adaptor approved by Nayax's technical team. Other adaptor types have been shown in the past to cause stability issues.
- Issues with your machine's COM port (physical connection / COM port management); a quick way to verify the port is visible is shown after this list
- If using Marshall over LAN, issues with your Ethernet connector, LAN communication issues, etc.
Or because the Marshall task is being "starved", as explained further below.
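Regarding the COM-port bullet above, a quick sanity check is to list the serial ports the operating system actually exposes before starting the SDK. This is a minimal sketch, assuming a Java-based integration; the jSerialComm library is used here purely as an illustration and is not a Nayax requirement.

import com.fazecast.jSerialComm.SerialPort;

public class ListComPorts {
    public static void main(String[] args) {
        // List every serial port the operating system currently exposes.
        // If the port wired to the VPOST/VPOSM is missing here, the problem
        // is on the hardware/driver side rather than in the Marshall link itself.
        for (SerialPort port : SerialPort.getCommPorts()) {
            System.out.printf("%s - %s%n",
                    port.getSystemPortName(),
                    port.getDescriptivePortName());
        }
    }
}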
You've mentioned that the Marshall task is being "starved". What does that mean?
When the Marshall task does not get enough priority to run, it is "starved" and will not behave properly. In that case, your machine stops sending the "Keep Alive"s. The device (VPOST/VPOSM) identifies the machine as disconnected because of the missing "Keep Alive"s and launches a new pairing process. Starvation is the most serious issue you can have, and it can cause unexpected behavior depending on when it takes place (which you may have already seen happen), so it is very important to solve it. These issues can cause odd behavior from the consumer's point of view, and can even cause the Marshall link to break, which should not happen (and when it does, you would not be able to make new transactions or complete ongoing ones).
Possible causes of starvation of the Marshall task: not giving the Marshall task a high enough priority to run, having your machine do other things that prevent it from responding to Marshall (keepalives or other commands) in time, extensive file writing other than the SDK's, etc.
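One way to address the priority-related causes is to run the SDK's work on a dedicated thread with elevated priority and keep unrelated heavy work off that thread. The sketch below is a simplified illustration in Java; runMarshallLink() is a placeholder for however your integration drives the SDK, not an actual SDK call.

public class MarshallRunner {
    public static void main(String[] args) {
        // Dedicated thread for the Marshall link so it is not delayed by
        // other work in the application.
        Thread marshallThread = new Thread(MarshallRunner::runMarshallLink, "marshall-link");
        // Give it the highest priority the JVM/OS will honour, so keep-alives
        // are answered in time even when the rest of the machine is busy.
        marshallThread.setPriority(Thread.MAX_PRIORITY);
        marshallThread.start();
    }

    private static void runMarshallLink() {
        // Placeholder: start/poll the Marshall SDK here. Avoid doing any
        // unrelated heavy work (file writing, network calls, UI) on this thread.
    }
}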
What does it look like?
Once the Marshall task is starved you'd see the following:
- In the SDK's log file, one of the following will be printed:
"vmc_link: warning! sdk is not getting enough cpu! (starved for Xms)"
or:
" vmc_link: *** error: retransmitting... last_rx"
Or:
" lowlevel_serial: bytes available: 0 bytes "
- And during the same time span, the GTrace log will show something along the lines of:
Marshall: Sys,Err: No keep alive,Handle = 1
Marshall: Com error,Event = 1
Marshall: No keep alive message from the POS. retry number: 2
Marshall: Sys,Err: No keep alive,Handle = 1
Marshall: Com error,Event = 1
Should your peripheral continue to not respond (each packet is sent once, with up to 2 additional retries attempting to get a response from your machine; see the section about keepalives below for details), the Marshall link is deemed broken. The device will then repeatedly send the "Reset()" command to launch the pairing process, which means the VPOST/VPOSM is looking for a peripheral to communicate with. Once the machine responds to it (with the Firmware Info command), the device completes the process by sending the Config command.
Until the connection is established, you'd see resets in the Marshall log and "Marshall: peripheral not connected" in the GTrace log. Once the Marshall pairing process is completed and a link is established between the VPOST/VPOSM and the peripheral, the following will appear in the Marshall log:
[1661418016846+( 432ms)] marshall_t: received reset
[1661418016847+( 1ms)] : rx: 09:00:00:6a:ff:ff:ff:ff:01:77:d6:
[1661418016853+( 6ms)] : tx:
87:00:00:00:00:30:00:00:05:00:02:0b:01:40:01:6d:
61:72:73:68:61:6c:6c:2d:6a:61:76:61:2d:73:64:6b:
00:00:00:30:31:32:33:34:35:36:37:00:00:00:00:00:
00:00:00:00:00:00:00:30:2e:31:2e:35:2e:31:33:00:
00:00:00:00:00:00:00:00:00:00:00:30:31:32:33:34:
35:36:37:00:00:00:00:00:00:00:00:00:00:00:00:6d:
61:6e:75:66:00:00:00:00:00:00:00:00:00:00:00:00:
00:00:00:30:2e:31:2e:35:2e:31:33:00:00:00:00:00:
00:00:00:00:00:00:00:9d:4a:
[1661418016874+( 21ms)] marshall_t: received config
[1661418016875+( 1ms)] : rx:
2a:00:00:00:00:35:01:30:06:02:01:01:e8:03:00:00:
00:24:00:00:24:00:00:02:35:35:30:36:30:30:31:32:
38:31:32:33:34:33:34:30:00:02:01:68:
[1661418017084+( 209ms)] marshall_t: received transfer_data
[1661418017084+( 0ms)] marshall_t: default credit: 1000
[1661418017084+( 0ms)] marshall_t: main fw ver: 4.0.8.23-RC03
[1661418017084+( 0ms)] marshall_t: pos fw ver: 2203-rc09
[1661418017084+( 0ms)] marshall_t: monyx id: 06911447
[1661418017085+( 1ms)] : rx:
34:00:05:01:00:35:01:30:0a:14:02:e8:03:12:0e:34:
2e:30:2e:38:2e:32:33:2d:52:43:30:33:00:13:0a:32:
32:30:33:2d:72:63:30:39:00:15:09:30:36:39:31:31:
34:34:37:00:4b:89:
[1661418017085+( 0ms)] : tx: 0a:00:00:01:01:30:00:35:00:00:bb:4b:
[1661418017096+( 11ms)] vmc_link: vmc is online. time: Thu Aug 25 12:00:17 IDT 2022
[1661418017097+( 1ms)] vmc_socket_t: received event: [init_done], state: [init]
[1661418017097+( 1ms)] Main: link with vpos serial: 0434321821006055
[1661418017097+( 0ms)] vmc_socket_t: changed state, from [init] to [idle]
[1661418017097+( 0ms)] Main: main fw version: 4.0.8.23-RC03
[1661418017097+( 0ms)] Main: pos fw version: 2203-rc09
[1661418017097+( 0ms)] Main: monyx id: 06911447
[1661418017097+( 0ms)] Main: default credit: 1000
[1661418017098+( 1ms)] : tx:
16:00:01:01:01:30:00:35:31:01:00:68:65:6c:6c:6f:
20:74:68:65:72:65:76:09:
[1661418017099+( 1ms)] : tx: 0a:00:00:02:01:30:00:35:00:00:39:93:
[1661418017099+( 0ms)] vmc_vend_t: received status: 20
[1661418017099+( 0ms)] Main: device available
[1661418017099+( 0ms)] vmc_socket_t: received status: 20
[1661418017115+( 16ms)] : rx: 0a:00:00:01:00:35:01:30:00:00:08:b0:
And in GTrace it would show:
"Marshall sys: RCV - Firmware info - type - 0 subtype - 1
Marshall sys: RCV - Firmware info - Capabilities – 0"
How the "keepalive" mechanism works/what is the reason for the starvation
The "no keep alive" means that your peripheral did not respond to the keep alive, which takes place each second (unless configured differently), and in that case it would try and resend the packet. If there is not response it would try again (1 original attempt+ 2 retried), and if this attempt fails- the Marshall link would be deemed broken and the pairing process (the establishment of communication between the VPOST/VPOSM and your charger) would be re-initiated.
In some cases, the starvation is caused by poor task handling, or by extensive file writing (unrelated to the Marshall log writing), since while a file is being written all tasks are halted so as not to corrupt the flash.
In addition, starvation can occur when your peripheral is busy with work unrelated to Marshall, causing it not to respond to the Marshall task in time.
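If extensive file writing is the cause, one common mitigation is to queue the writes and perform them from a single low-priority background thread instead of the thread that services the Marshall link. The following is a minimal sketch, assuming a Java integration; whether file writes actually halt other tasks depends on your platform's flash/file-system behaviour.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundFileWriter {
    // Single worker thread: writes never run on the Marshall thread and are
    // serialized, so only one file operation touches the flash at a time.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "background-file-writer");
        t.setPriority(Thread.MIN_PRIORITY); // keep it below the Marshall task
        return t;
    });

    public void appendLine(Path file, String line) {
        writer.submit(() -> {
            try {
                Files.write(file,
                        (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                e.printStackTrace(); // do not rethrow onto the Marshall thread
            }
        });
    }

    public void shutdown() {
        writer.shutdown();
    }
}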
Can I increase the time between keepalives? Would that solve the starvation issue?
The SDK is designed to work best with a keepalive of 1 second, and increasing the keepalive interval won't solve the starvation issue; it would just take much longer to realize there is a communication problem. We don't recommend setting it to more than 5 seconds. For example, if you configured it to 30 seconds, it would take 3 * 30 = 90 seconds for the SDK to realize there is an issue, and during that time the VPOST may appear "frozen"/unresponsive and confuse consumers. Having the keepalives take place every second is not what overloads the processor; the issue is that the task does not get enough priority to begin with.
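To make the trade-off concrete, the worst-case time to detect a broken link is roughly the keepalive interval multiplied by the total number of attempts (1 original + 2 retries, per the mechanism described above). A small illustrative calculation:

public class KeepaliveDetectionTime {
    public static void main(String[] args) {
        int attempts = 1 + 2;              // 1 original keepalive + 2 retries
        int[] intervalsSeconds = {1, 5, 30};

        for (int interval : intervalsSeconds) {
            // Worst case: every attempt has to time out before the link
            // is deemed broken and pairing is re-initiated.
            int detectionSeconds = attempts * interval;
            System.out.printf("keepalive = %2ds -> link break detected after ~%ds%n",
                    interval, detectionSeconds);
        }
    }
}

With a 1-second keepalive the break is noticed after roughly 3 seconds; with 30 seconds it takes the 90 seconds mentioned above.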
Should I have the keepalives written to the logs? Would issues with them still be visible if they do not appear?
You can have them written to the logs, but they make the logs much harder to read: with a keepalive (and a response to it) every second, an hour-long log contains at least 3,600 such records. With "debug level dump moderate", only the important information is logged, such as the pairing process, transactions, statuses, starvation warnings, etc., so if you have an issue with keepalives or packets going missing, you get the same warnings/information with "debug level dump moderate" as with "debug level dump all". Bottom line: we highly recommend using "debug level dump moderate" instead of "debug level dump all".