how to handle network downtime gracefully?

Discussion:

Mario Emmenlauer

2017-07-03 14:51:33 UTC

How can I gracefully handle network problems? In gRPC, I used to
create the full interface even if the network was down, and later,
when I tried to call RPC methods, gRPC would hang until it could
connect. That was quite simple: when the network came back, the RPC
eventually succeeded.

What is the most graceful way to handle an unreliable network
connection in Thrift?

Background:
I'm building a cross-platform API with a Java server and a C++ client
in Thrift. I use the binary protocol to send large files. I use two
transport channels, one that uses SSL to send the login credentials,
and a second one that may later be used to send large datasets (after
the login succeeded).

Currently I create the full interface. But if the network is down,
I get an exception somewhere after creating the secure socket, with
error "No more data to read".

All the best,

Mario Emmenlauer

--
BioDataAnalysis GmbH, Mario Emmenlauer Tel. Buero: +49-89-74677203
Balanstr. 43 mailto: memmenlauer * biodataanalysis.de
D-81669 München http://www.biodataanalysis.de/

Randy Abernethy

2017-07-03 16:13:07 UTC

Hi Mario,

The simplest form of error recovery in RPC (though not always the most
efficient) is to disconnect and reconnect. A reasonable starting place is
to write call code that operates within a protected block (e.g. a "try"
block). When a non-application error is thrown, the catch block optionally
disconnects (you may already be disconnected) and attempts to reconnect
and/or retry the call. This is a simple but reliable approach, and once it
is working you can optimize as needed.
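
As a rough illustration of the try/catch pattern described above, here is a
minimal C++ sketch. It deliberately uses no Thrift API: callWithReconnect,
ApplicationError, and the reconnect callback are hypothetical names
introduced only for this example. It assumes that reconnect closes the
transport if needed and opens it again, and that errors reported by the
service itself derive from ApplicationError and should not be retried.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical marker type for errors raised by the service itself
// (rejected login, invalid argument, ...). These are never retried.
struct ApplicationError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `call`. On any non-application exception it waits `delay`,
// invokes `reconnect`, and tries again, up to `maxAttempts` attempts.
// The last error is rethrown once the attempts are used up.
template <typename Call>
auto callWithReconnect(Call call,
                       const std::function<void()>& reconnect,
                       int maxAttempts,
                       std::chrono::milliseconds delay) -> decltype(call())
{
    assert(maxAttempts >= 1);
    for (int attempt = 1;; ++attempt) {
        try {
            return call();
        } catch (const ApplicationError&) {
            throw;  // the server answered; reconnecting will not help
        } catch (const std::exception&) {
            if (attempt >= maxAttempts) {
                throw;  // give up and propagate the last error
            }
        }
        std::this_thread::sleep_for(delay);
        try {
            reconnect();  // assumed to close (if open) and reopen
        } catch (const std::exception&) {
            // Network still down: the next call attempt is expected to
            // fail as well and will count against maxAttempts.
        }
    }
}
```

With a real client, call would be a lambda wrapping a single RPC method and
reconnect would rebuild or reopen the connection. The attempt limit and
delay are arbitrary here, and blind retries are only safe for calls that
can be repeated without unwanted side effects.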

It is worth pointing out that RPC (of any kind) is not perfect for large
file transfer. RPC (Remote Procedure Call) is designed to let you invoke
remote functions and retrieve their results. The function call is an atomic
thing: it either completely succeeds or completely fails. "Procedure Call"
also implies some manageable size block of arguments and return values in
most world views. This means that all of the many small and large
architectural decisions made when creating Thrift were predicated on
reasonably sized inputs and outputs (roughly < 1MB).

If you try to transfer a file by passing its data as an argument to a
server and the operation fails, you make no progress. It may make sense to
use RPC directly as a file transfer scheme for small files where retrying
the entire transfer might be reasonable. For large files, though, it is
better to create an application-level protocol where you pass modest sized
chunks of the file (in the 1MB range, say). This way, if a chunk fails, you
only re-transmit the chunk rather than the entire file. Also, transferring
really large files (1GB+) in one go can overflow (or overtax) buffers on
the client, but particularly on the server. Using chunks avoids this issue.
You can easily write a library wrapper for your chunked transfer that
allows clients to make a single call to transfer a large file with many RPC
transfers happening behind the scenes.
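
A minimal C++ sketch of such a wrapper might look like the following. It is
an illustration under stated assumptions, not a Thrift API: sendInChunks
and the SendChunk callback are hypothetical names, the callback stands in
for one RPC that uploads a single chunk at a given byte offset and throws
on failure, and the data is held in a std::string for brevity where a real
wrapper would read from the file piece by piece.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for one RPC that uploads a single chunk at the
// given byte offset. It is assumed to throw on failure.
using SendChunk =
    std::function<void(std::size_t offset, const std::string& chunk)>;

// Splits `data` into pieces of at most `chunkSize` bytes and sends them in
// order. A failed chunk is retried on its own, up to `maxAttemptsPerChunk`
// times, so chunks that already went through are never re-sent.
// Returns the total number of RPC attempts that were made.
inline std::size_t sendInChunks(const std::string& data,
                                std::size_t chunkSize,
                                int maxAttemptsPerChunk,
                                const SendChunk& sendChunk)
{
    assert(chunkSize > 0 && maxAttemptsPerChunk >= 1);
    std::size_t attempts = 0;
    for (std::size_t offset = 0; offset < data.size(); offset += chunkSize) {
        // substr clamps the length, so the last chunk may be shorter.
        const std::string chunk = data.substr(offset, chunkSize);
        for (int attempt = 1;; ++attempt) {
            ++attempts;
            try {
                sendChunk(offset, chunk);
                break;  // this chunk is done, move on to the next one
            } catch (const std::exception&) {
                if (attempt >= maxAttemptsPerChunk) {
                    throw;  // give up on the whole transfer
                }
            }
        }
    }
    return attempts;
}
```

A caller makes one call to sendInChunks per file and passes a chunk size of
around 1MB. Sending the offset with each chunk is one possible design that
lets the server place a retried chunk correctly.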

There are lots of ways to skin a cat, of course. Just some thoughts.

Very best,
Randy

Mario Emmenlauer

2017-07-03 20:55:02 UTC

Dear Randy,

Thanks a lot for the many hints and insights, it's very much appreciated!

I will certainly think about the chunked up- and download. Actually as a
first step, it seems already a reasonable improvement to implement a small
protocol for chunked data transfer on top of thrift RPC :-)

About the network disconnect and reconnect, I will do as you suggest! What
parts of the connection can be re-used? Basically my code currently boils
down to:
- create a socket
- create a transport on top of the socket
- create a protocol on top of the transport
- create the client interface on top of the protocol

I don't know if it's always like this, but I gathered this from examples.
After a disconnect, when I want to reconnect, which objects would be
sensible to re-create, and which ones can just be re-used?

Thanks and all the best,

Mario

Best regards,

Mario Emmenlauer
