Clients of the sync&share system (CERNBOX) are particularly exposed to "operational failures" because of the heterogeneity of their hardware, operating systems and network environments.
The sync&share system operates in a very heterogeneous network environment, ranging from fast, reliable networks inside the computing center to unreliable, high-latency ad hoc connections such as those from airports.
Windows filesystems have substantially different semantics from Unix filesystems (e.g. file locking, which is mandatory on Windows but typically advisory on Unix), and these differences affect the synchronization process.
The goal of the R&D is to analyze this environment and identify the relevant classes of failures, in order to provide a reproducible framework for injecting faults at the system level when testing client-server data transmission, for example:
* network slowdown or packet loss
* local disk failure
* checksum errors
* failed software upgrades
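Reproducibility can be obtained by driving every injected fault from a seeded random generator, so that a failing test run can be replayed exactly. The Python sketch below illustrates this idea on a simulated chunked transfer. It is not part of smashbox: the names `FaultPlan`, `transfer` and `netem_command`, the chunk size, the retry policy and the use of SHA-256 (standing in for whatever checksum the real protocol uses) are all hypothetical. System-level network slowdown is shown only as the Linux `tc`/`netem` command a harness might run; the sketch builds that command but does not execute it. Failed software upgrades are not modeled.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FaultPlan:
    """Hypothetical description of the faults injected into one test run."""
    seed: int                           # same seed -> same fault sequence
    packet_loss: float = 0.0            # probability that a chunk attempt is lost
    corrupt_rate: float = 0.0           # probability that a delivered chunk is corrupted
    disk_fail_at: Optional[int] = None  # chunk index at which the local write fails


def netem_command(iface: str, delay_ms: int, loss_pct: float) -> List[str]:
    """Build (but do not run) a Linux tc/netem command that adds delay and loss."""
    return ["tc", "qdisc", "add", "dev", iface, "root", "netem",
            "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%"]


def transfer(data: bytes, plan: FaultPlan, chunk_size: int = 4,
             max_retries: int = 5) -> bytes:
    """Simulate a chunked client-server transfer under the faults in `plan`."""
    rng = random.Random(plan.seed)  # private RNG: the run is fully replayable
    received = bytearray()
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        expected = hashlib.sha256(chunk).hexdigest()  # checksum computed by the sender
        for _attempt in range(max_retries):
            if rng.random() < plan.packet_loss:
                continue  # chunk lost in transit: retry
            delivered = bytearray(chunk)
            if rng.random() < plan.corrupt_rate:
                delivered[0] ^= 0xFF  # flip the bits of the first byte
            if hashlib.sha256(delivered).hexdigest() != expected:
                continue  # checksum error detected by the receiver: retry
            break
        else:
            # No attempt succeeded within the retry budget.
            raise ConnectionError(f"chunk {index} failed after {max_retries} attempts")
        if plan.disk_fail_at == index:
            raise OSError(f"simulated local disk failure at chunk {index}")
        received += delivered
    return bytes(received)


if __name__ == "__main__":
    print(" ".join(netem_command("eth0", delay_ms=200, loss_pct=1.0)))
    plan = FaultPlan(seed=42, packet_loss=0.2, corrupt_rate=0.1)
    try:
        print(transfer(b"hello sync and share", plan))
    except OSError as err:  # ConnectionError is a subclass of OSError
        print("transfer failed:", err)
```

Running the same `FaultPlan` twice yields the same sequence of lost, corrupted and retried chunks, which is what makes a failure pattern observed in production replayable in a test suite. In a real harness the simulated loss would be replaced by a `netem` rule applied to the test machine's network interface, which requires root privileges.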
The work is supported by real monitoring and logging data, i.e. failure patterns observed in an existing service (CERNBOX).
The work builds on an existing testing framework (smashbox).