Category

Fault-tolerant computer systems

computer cluster

set of computers configured in a distributed computing system

OpenVMS, often referred to as just VMS, is a multi-user, multiprocessing and virtual memory-based operating system. It is designed to support time-sharing, batch processing, transaction processing and workstation applications. Customers using OpenVMS include banks and financial services, hospitals and healthcare, telecommunications operators, network information services, and industrial manufacturers. During the 1990s and 2000s, there were approximately half a million VMS systems in operation worldwide.

Spanning Tree Protocol

network protocol that builds a loop-free logical topology for Ethernet networks

collection of computer servers

Uptime is a measure of system reliability, expressed as the period of time a machine, typically a computer, has been continuously working and available. Uptime is the opposite of downtime.

replacing computer system components without shutting down the system

single point of failure

part whose failure will disrupt the entire system

transaction processing

information processing that is divided into individual, indivisible operations

recorded state of a computer storage system at a particular point in time

computer memory which detects and corrects errors

Byzantine fault

fault in a computer system that presents different symptoms to different observers

data replication

making multiple copies of information to ensure consistency in computing

concept in computer science

use of a number of critical components for securing one or more functions of a system with the intention of increasing its reliability, usually in the form of a backup or fail-safe design

[Image: 4G cellular failover for network resiliency]

In engineering, a fail-safe is a design feature or practice that, in the event of a failure of the design feature, inherently responds in a way that will cause minimal or no harm to other equipment, to the environment or to people. Unlike inherent safety to a particular hazard, a system being "fail-safe" does not mean that failure is naturally inconsequential, but rather that the system's design prevents or mitigates unsafe consequences of the system's failure. If and when a "fail-safe" system fails, it remains at least as safe as it was before the failure. Since many types of failure are poss

quantum error correction

techniques that enable reliable delivery of quantum data over unreliable quantum communication channels

Tandem Computers

American computer hardware manufacturer (1974–1997)

family of protocols for solving consensus in a network of unreliable processors

high-availability cluster

cluster of separate computers designed for high availability at the application level (even if individual nodes fail)

data redundancy

presence of data additional to the actual data that may permit correction of errors in stored or transmitted data

Round-robin DNS

load balancing technique in the Internet's Domain Name System (DNS)

log-structured file system

structure of file system that writes all information to a circular buffer

conflict-free replicated data type

data structure replicated across a network such that any replica is updatable independently, concurrently and without coordination, and any inconsistencies are algorithmically resolved, with replicas’ states guaranteed to eventually converge
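
As an illustration of the description above, here is a minimal sketch of one of the simplest such data types, a grow-only counter. The class name `GCounter` and its methods are hypothetical names chosen for this example, not any library's API. Each replica increments only its own slot, and merging takes the per-replica maximum, so merges can be applied in any order and any number of times.

```python
class GCounter:
    """Grow-only counter: a sketch of a state-based replicated data type."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # Maps replica id -> number of increments observed from that replica.
        self.counts = {}

    def increment(self, amount=1):
        """Record a local update; only this replica's own slot is touched."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        """Current total across all replicas known to this copy."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state by taking the per-slot maximum."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, and repeating a merge does not change it.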

family of fault-tolerant servers

application checkpointing

a technique for inserting fault tolerance into computing systems

data synchronization

process of bidirectionally maintaining consistency of data stored in multiple data stores

SpaceWire is a spacecraft communication network based in part on the IEEE 1355 standard of communications. It is coordinated by the European Space Agency (ESA) in collaboration with international space agencies including NASA, JAXA, and RKA.

disk storage system which contains multiple disk drives

Triple modular redundancy

redundancy using three systems and voting to determine the result
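
The voting step described above can be sketched in software as follows. This is an illustration under simple assumptions, not a hardware design: `tmr_vote` is a hypothetical helper name, the three module outputs are assumed to be hashable values, and the voter itself is assumed not to fail.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value produced by at least two of three redundant modules.

    Raises ValueError when all three outputs differ, since no majority exists.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three modules disagree")
    return value


# One faulty module is outvoted by the two that agree.
result = tmr_vote(7, 3, 7)
```

A single faulty module is masked because the other two still agree; two simultaneous faults are not guaranteed to be detected.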

replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability

Stratus Technologies

American producer of computer servers and software

consensus algorithm

spare component that is an active and connected part of a working system, ready to take over functionality with little or no interruption

redundant IO technology

reliability, availability and serviceability

quality of robustness of computer hardware

disk array controller

computer device that manages a hardware RAID array