Skip to main content

RI freeze while scheduling the future recordings

Please note these java.net forums are being decommissioned and use the new and improved forums at https://community.oracle.com/community/java.
7 replies [Last post]
srinivasrk
Offline
Joined: 2011-10-31
Points: 0

RI is getting freeze while running the TDK test TC0159, where we try to schedule 5 future recordings with 1 minute start time offset on different source ids. While scheduling, RI will update the metadata of the recording by calling RecordingContentItemLocal.updateMetadata(). In this method it is calling SIManagerExt.getTransportStream(), where it waits for infinite time for notification. This is not getting notification because RI stack is getting MPE_SI_NOT_AVAILABLE_YET from native layer (simgr.c). After this there is no SIChangeEvent happened as we are just scheduling a future recording I guess. So the SIRequestorImpl will not get notified for either success or failure. We observed that RI is freeze here because there are no logs in RILog.txt after this.

Why native layer is giving MPE_SI_NOT_AVAILABLE_YET:
I suspect the previous test case which tries to tune to a source id 1513 (which is also used by TC0159). This tune is getting failed because of vlc telnet connect error. I think at this point ts_handle->siStatus is updated as SI_AVAILABLE_SHORTLY. when it runs TC0159 it is checking for the same ts_handle->siStatus. As it's value is SI_AVAILABLE_SHORTLY, it is sending the exception as MPE_SI_NOT_AVAILABLE_YET.

Please correct me if I am wrong and confirm this is a bug or not?

I am attaching a log file which starts from the test case where the tuning has failed and ends with RI freeze. See the below logs where RI freeze initiated:

20120120 06:33:16.732 DEBUG RI.Stack- 1934969 [Thread-115] DEBUG service.SICacheImpl - org.cablelabs.impl.manager.service.SICacheImpl@67214eeb retrieveNetworks() called with org.cablelabs.impl.service.javatv.transport.TransportImpl@36138be2[uniqueID=TransportImpl4096, deliverySystemType=CABLE, handle=org.cablelabs.impl.manager.service.SIDatabaseImpl$TransportHandleImpl@1000[handle=1000], transportID=1]
20120120 06:33:16.748 DEBUG RI.Stack- 1934985 [Thread-115] DEBUG service.SICacheImpl - org.cablelabs.impl.manager.service.SICacheImpl@67214eeb Enqueued request org.cablelabs.impl.manager.service.SIRequestNetworks@20d7f06a
20120120 06:33:16.748 DEBUG RI.Stack- 1934985 [ThreadPool-System-5] DEBUG service.SIRequestNetworks - @20d7f06a Attempting delivery
20120120 06:33:16.748 DEBUG RI.Stack- 1934985 [Thread-115] DEBUG service.SICacheImpl - org.cablelabs.impl.manager.service.SICacheImpl@67214eeb retrieveTransportStream() called with ocap://0x5e9
20120120 06:33:16.748 DEBUG RI.Stack- 1934985 [Thread-115] DEBUG service.SIRequestImpl - @e8dcd1SIRequestImpl setExpirationTime: 9223372036854775807
20120120 06:33:16.748 DEBUG RI.Stack- 1934985 [Thread-115] DEBUG service.SICacheImpl - org.cablelabs.impl.manager.service.SICacheImpl@67214eeb Enqueued request org.cablelabs.impl.manager.service.SIRequestTransportStream@e8dcd1[transportID=1, frequency=ffffffff, sourceID=5e9, serviceName=null]
20120120 06:33:16.748 DEBUG RI.Stack- 1934985 [ThreadPool-System-4] DEBUG service.SIRequestTransportStream - @e8dcd1 Attempting delivery
20120120 06:33:16.748 DEBUG RI.Stack- <<JNI-SIDB>> checkReturnCode() called with returnCode=16390

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
srinivasrk
Offline
Joined: 2011-10-31
Points: 0

I have attached the zip file which contains the log as the size of the log is more than the limit.

Regards,

Kurma Srinivas

cpratt
Offline
Joined: 2008-12-18
Points: 0

Go ahead and file a bug for this.

And if you can reproduce this issue, please generate a thread dump after the stack appears frozen.

Thx,

cp

srinivasrk
Offline
Joined: 2011-10-31
Points: 0

I have one question. There are 24 places where Infinite Wait is used in RI code. These places may cause this kind of freeze problems. Can we replace all these calls with wait(some timeout) so that the execution will comeout of wait after timeout. If this is not possible can we handle these things in any other way? Need your inputs.

Thanks,

Kurma Srinivas.

pmodem
Offline
Joined: 2008-12-17
Points: 0

Can you point out where the infinite wait is used?

As far as SI requests are concerned there is an implicit timeout (default 15 sec) associated with those. These requests if not fulfilled in that time will timeout. Looking at the log it appears the platform is signaling a TUNE_UNSYNC event due to vlc connect error. The native SI layer is correctly resetting the internal SI state to NOT_AVAILABLE but it is not signaling any events. As a result the SIRequest will end up waiting until the timeout is reached. Correct solution is for the native to signal a an event to unblock all waiting callers and the request will fail with DATA_UNAVAILABLE.

If you file an issue tracker issue we will look into providing a fix.

srinivasrk
Offline
Joined: 2011-10-31
Points: 0

In this case RI freeze happened because the timeout for requesting transport stream is huge i.e. Long.MAX_VALUE. This is what I am referring as Infinite wait. There are total three types of SI requests where the expiration time is set to Long.MAX_VALUE.

  1. SIRequestServices
  2. SIRequestTransportStream
  3. SIRequestTransportStreams

These places may cause infinite wait. Please correct me if I am wrong.

I will raise a bug for this RI freeze.

Thanks,

Kurma Srinivas

srinivasrk
Offline
Joined: 2011-10-31
Points: 0

Any Update on this?

I have raised IT-598 for this issue.

Thanks,

Kurma Srinivas

cpratt
Offline
Joined: 2008-12-18
Points: 0

Yes - we have been looking into the issue a bit this week.

We believe this situation is being caused by insufficent error handling in the SI code for tune failure conditions.

The fix is 2-part: (1) to eliminate the NetworkInterfaceImpl's dependance upon a sync/no-sync event to dequeue the next tune request. (2) is to have the SIDB fail SI requests associated with a transport stream when the tune fails (similar to what's done for a PAT timeout).

We are testing the fixes currently and will provide updates via the issue (IT-598).