Volume 18 - Issue 2

Case Report, Biomedical Science and Research, Creative Commons CC BY

Requirements for a Vision Based System for Detecting Epileptic Seizures

*Corresponding author: Bin Zhao, School of Science, Hubei University of Technology, Wuhan, Hubei, China

Received: February 08, 2023; Published: March 23, 2023

DOI: 10.34297/AJBSR.2023.18.002458


Epileptic seizures can pose a risk to those affected, and this risk can be reduced by seizure monitoring. However, the automated vision-based detection of epileptic seizures poses several challenges for computer vision. In this work, basic knowledge about epileptic seizures and relevant computer vision methods is presented. Based on this and other literature, requirements for a seizure detection system are derived and discussed.

Keywords: Computer vision, Vision-based seizure detection, Applied computing, Epilepsy, Action recognition, Requirements


Motivation and Goal

Epilepsy is a neurological disorder associated with different types of recurrent seizures. These are often a burden for those affected and can pose a threat to their lives. A system for detecting epileptic seizures can relieve those affected. It is assumed that many patients and their relatives have difficulty estimating the frequency of seizures correctly. Since physicians depend on this information for treatment, an incorrect estimate can lead to over- or under-medication of the patient. This can result in increased side effects or, if the seizures are undertreated, in their persistence [1]. Generalized motor seizures in particular can pose a threat to the patient’s life. Timely detection of nocturnal seizures is believed to be associated with reduced patient risk [2].


The present work examines the requirements that a vision-based system for the detection of epileptic seizures must meet. These are discussed in the context of the state of the art in seizure, pose, and action recognition. For this purpose, the disease epilepsy and the associated seizure types are presented first. Subsequently, existing systems for seizure detection are reviewed, together with the state of the art in action and pose detection. Building on this, the work examines and discusses requirements for vision-based seizure detection systems.



Epilepsy is a neurological disorder that encompasses a heterogeneous group of disorders in which the patient has an increased likelihood of suffering epileptic seizures [3]. These seizures are characterized by “abnormally excessive or synchronous neural activity in the brain” [3]. There are over 30 types of epilepsy and over 10 types of seizures [4]. Epilepsy is diagnosed when there are “repeated, unprovoked” [5] seizures or when EEG or MRI diagnostics indicate an increased predisposition to seizures [5]. Video-EEG, which allows the seizure as seen on video to be compared with the simultaneously recorded EEG, is the gold standard for diagnosis [6]. More than 10% of all people experience an epileptic seizure at least once in their life. Epilepsy is one of the most common neurological diseases. The number of new cases is particularly high in childhood and rises sharply from the age of 60 [7].

Seizure Types

Epileptic seizures are classified primarily by their origin in the brain, that is, by whether they are confined to one hemisphere or involve both hemispheres [8]. In addition, the respective types are subdivided according to the characteristics of the seizure. Motor seizures are seizures that involve the muscles [9]. They can be either focal or generalized. On the one hand, there are simple motor seizures that lead to unnatural movements [10]. On the other hand, there are complex motor seizures whose movements would appear normal in a different context [10]. In an atonic seizure, there is a loss or reduction of muscle tone in the affected area [9]. In tonic seizures, high-frequency muscle contractions cause muscle groups to stiffen for a few seconds to minutes [9,11]. Clonic seizures are characterized by regular, continuous twitching at a frequency of 2-3 jerks per second [9]. In myoclonic seizures, single or multiple short jerks occur [9]. Focal seizures are seizures that are limited to one hemisphere [7]. Their symptoms depend on which brain region is affected [5]. Focal seizures can be classified according to whether the seizure is experienced consciously and whether it is motor or non-motor [8]. Motor focal seizures are characterized by unilateral involvement [5]. Non-motor focal seizures include autonomic, cognitive, emotional, and sensory seizures [8]. In cognitive seizures, the patient may experience, for example, impaired speech or thinking, or hallucinations [8].

Generalized seizures are seizures that spread to parts of both hemispheres [7]. They account for about one third of all seizures [5]. They are also divided into motor and non-motor seizures [8]. Non-motor generalized seizures involve a sudden cessation of activity lasting a few seconds to a minute, during which the patient does not respond to speech [8]. Generalized clonic seizures are associated with “rhythmic bilateral twitching of the extremities and often of the head, neck, face and trunk” [8]. In generalized tonic seizures, there is bilateral stiffening as well as raising of the extremities and neck [8]. The extremities may be held in unnatural flexed or extended positions [8]. In generalized tonic-clonic seizures, tonic and clonic phases occur in sequence [9]. These seizures are characterized by “unconsciousness, a fall, cramps all over the body, twitching of the arms and legs and a subsequent state of exhaustion or confusion” [5]. They must be differentiated from other conditions with a similar presentation, for example fainting spells and psychogenic non-epileptic seizures [11,12]. A generalized tonic-clonic seizure is always an emergency: if it does not subside within a few minutes, medical intervention is required. During a seizure, bystanders should secure the patient’s surroundings and monitor the duration and nature of the seizure [4].

Comorbidity and Risks

The comorbidities of epilepsy include, for example, osteoporosis and gastrointestinal complaints, but also mental illnesses and neuropsychological impairments. They can be related to the disease itself, to the medication, or to societal stigma. During an epileptic seizure, the affected person is also exposed to an increased risk of injury [4]. In epilepsy patients, the risk of an unnatural death from suicide, accidents, and intentional or accidental drug poisoning is increased [13]. Sudden unexpected death in epilepsy (SUDEP) can occur, especially in connection with (nocturnal) generalized tonic-clonic seizures [14]. The affected person usually dies in the context of a seizure [14]. The risk of SUDEP decreases when nocturnal seizures are noticed, for example because the person is not sleeping alone or is being monitored [2].

State of the Art

Pose Estimation

Pose estimation addresses the problem of locating anatomically relevant regions of the human body [15] in order to describe a 2D or 3D pose. Convolutional neural networks (CNNs) are often used for 2D detection [16]. Some methods first detect the joints in individual RGB images and then model their relationships, for example [15]. In addition, there are holistic approaches in which the pose is detected directly in the image, for example by CNNs [17]. Methods for 3D pose estimation from single images fall into two groups: those that first recognize the 2D pose and derive the 3D pose from it, and those that estimate the 3D pose directly. When deriving from the 2D pose, methods of varying complexity are used to estimate the depth information, for example nearest neighbor methods and neural networks. CNNs are also often used to derive 3D poses directly from RGB images. Training the CNNs requires a large amount of training data, which can be labeled manually for 2D estimation. For 3D estimation, by contrast, ground-truth poses are recorded with motion capture systems. The recordings therefore take place in a restricted environment, which limits transferability to real-world scenes [17].
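The nearest neighbor variant of deriving a 3D pose from a 2D pose can be illustrated with a minimal sketch. It assumes a hypothetical library of paired 2D/3D poses (for example from motion capture), joints stored in a fixed order, and a simple centering-and-scaling normalization; the function names and these choices are illustrative and are not taken from the cited works.

```python
import math
from typing import List, Sequence, Tuple

Joint2D = Tuple[float, float]
Joint3D = Tuple[float, float, float]


def normalize_2d(pose: Sequence[Joint2D]) -> List[Joint2D]:
    """Remove translation and scale so that poses of different sizes are comparable."""
    n = len(pose)
    cx = sum(x for x, _ in pose) / n
    cy = sum(y for _, y in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    # Root-mean-square distance of the joints from their centroid (guard against 0).
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def pose_distance(a: Sequence[Joint2D], b: Sequence[Joint2D]) -> float:
    """Sum of per-joint Euclidean distances (joints must be in the same order)."""
    return sum(math.dist(p, q) for p, q in zip(a, b))


def lift_nearest_neighbor(
    query_2d: Sequence[Joint2D],
    library: Sequence[Tuple[Sequence[Joint2D], List[Joint3D]]],
) -> List[Joint3D]:
    """Return the 3D pose paired with the library 2D pose closest to the query."""
    q = normalize_2d(query_2d)
    _, best_3d = min(library, key=lambda pair: pose_distance(q, normalize_2d(pair[0])))
    return best_3d


# Toy library with three joints: a horizontal and a vertical "limb".
library = [
    ([(0, 0), (1, 0), (2, 0)], [(0, 0, 0), (1, 0, 0), (2, 0, 0)]),
    ([(0, 0), (0, 1), (0, 2)], [(0, 0, 0), (0, 1, 0), (0, 2, 0)]),
]
print(lift_nearest_neighbor([(10, 10), (10, 12), (10, 14)], library))  # vertical entry
```

A lookup like this can only be as good as the library's coverage of possible poses, and it inherits the depth ambiguity of 2D projections; this is one reason why learned regressors are used as well.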

To recognize the pose in 3D, data from depth cameras can be used in addition to RGB images. They offer advantages under occlusion and difficult lighting conditions and thus enable robust, fast pose recognition [18]. However, depth data come with higher costs, low sensor accuracy, and limited applicability [18]; the Kinect, for example, can only be used indoors [16].

Pose detection is challenging when body parts are occluded by the body itself or by other objects or people, so that the positions of hidden body parts have to be estimated. A method for detecting multiple people in an image, including people who occlude one another, is described in [15].

Action Detection

The recognition of human actions (action recognition) is a challenging problem, on the one hand because of the diversity of human bodies, on the other hand because of the complexity and variability of human movements and actions [19]. Even the same person performs the same action differently when repeating it several times, and larger differences in execution are to be expected between different people. In action recognition, a distinction is made between action classification, in which an image sequence contains a single action, and action detection, in which actions must be located in an image sequence that can also contain other actions. Action detection is mostly based on the sliding-window concept and is characterized by a high computational complexity [18]. Action recognition methods rely either on handcrafted action features or on end-to-end deep learning. Handcrafted action features are different approaches to encoding human movements across space and time. For this purpose, trajectories of the detected joints in RGB and depth images can be used, as well as other features. It is important to extract features from which activities can be robustly recognized. Skeleton-based methods depend on the performance of the pose recognition. Among end-to-end approaches with deep neural networks that learn the features themselves, there are also different ways of using RGB data. For example, two-stream convolutional networks process images and optical flow together, while other approaches learn the movements of detected skeletons. LSTMs (long short-term memory networks) and 3D convolutional networks, for example, are used to incorporate temporal sequences into the recognition [18].
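Sliding-window action detection can be sketched in a few lines. In this minimal sketch a hypothetical window classifier (here simply the mean of made-up per-frame seizure scores) stands in for a trained model, and all names and parameters are illustrative. The sketch also counts classifier calls, since running the classifier once per window is where the computational cost of this approach comes from.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window_detect(
    frames: Sequence[float],
    classify_window: Callable[[Sequence[float]], float],
    window: int,
    stride: int,
    threshold: float,
) -> Tuple[List[Tuple[int, int]], int]:
    """Classify every window and merge positive windows into [start, end) segments.

    Returns the merged segments and the number of classifier calls, which grows
    linearly with the sequence length and shrinks as the stride grows.
    """
    hits: List[Tuple[int, int]] = []
    calls = 0
    for start in range(0, len(frames) - window + 1, stride):
        calls += 1
        if classify_window(frames[start:start + window]) >= threshold:
            hits.append((start, start + window))

    merged: List[Tuple[int, int]] = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching windows are treated as one detected action.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged, calls


def mean_score(window_frames: Sequence[float]) -> float:
    """Stand-in classifier: average of hypothetical per-frame seizure scores."""
    return sum(window_frames) / len(window_frames)


scores = [0.1] * 10 + [0.9] * 10 + [0.1] * 10
print(sliding_window_detect(scores, mean_score, window=5, stride=5, threshold=0.5))
```

A smaller stride localizes the action more precisely but multiplies the number of classifier calls; with a deep network as the window classifier, this trade-off dominates the runtime.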

There are also approaches that use depth data and skeletons derived from it [16] or train neural networks directly on the depth data [18]. Using depth data for action detection makes the method more robust to changes and movements in the background as well as to lighting conditions [18,19]. In addition to recognition from sequences, actions can also be recognized statically in individual images. Whether individual images suffice or videos are better suited depends on the action [20].

Detection Devices of Epileptic Seizures

Existing medically approved or peer-reviewed devices for detecting epileptic seizures are presented in [1]. No systems based on video data were found. Instead, there are devices for detecting absence seizures via EEG and motor seizures via motion-sensor bracelets or watches, under-mattress motion and audio sensors for nocturnal seizures, as well as surface electromyography and other wearable, multimodal devices. Some of the devices are approved as medical devices in the USA or the EU [1]. An evaluation of different, partly multimodal devices for detecting motor seizures in [21] shows that almost all devices considered were developed and evaluated in Epilepsy Monitoring Units (EMUs), i.e. in a clinical environment, because video-EEG data, the diagnostic gold standard, are recorded there. This raises the question of whether the results transfer to other settings, since it is not clear whether the sensitivity and false detection rate measured in EMUs are representative of domestic use. The authors assume that they are not, since users behave differently at home than in the clinical environment, for example with regard to getting up at night [21].

Vision Based Seizure Detection

Existing methods for vision-based seizure detection can be divided into those that use conventional motion analysis and, more recently, those that use machine learning, in particular deep learning [22]. The work evaluated in [23-25] is based on data recorded in hospital EMUs. In [25], 161 RGB videos of seizures were recorded in order to automatically differentiate between two types of seizures, with the aim of capturing the greatest possible variability in seizure patterns. The training, validation, and test datasets consist of data from different patients. In the videos, the face, the upper body including the head, and the right and left hands are detected separately. The 3D pose is derived from the detection of the upper body and head and used as input to an LSTM network; for the face and hands, the RGB data are fed directly to an LSTM. The results indicate that including different body regions is useful for classifying the seizure types [25]. In [23], epileptic seizures are detected from depth and infrared data using CNNs. This approach outperforms classical methods, and the best results are obtained when separate CNNs are trained on infrared and depth data and their outputs are combined afterwards (late fusion). In addition, the system is real-time capable [23].
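Late fusion, as used in [23], can be sketched as follows. The sketch assumes that each separately trained model outputs class probabilities and combines them by a weighted average; averaging is a common fusion rule but is an assumption here, since the exact rule of [23] is not given in this text. The function names and the two-class labels are hypothetical.

```python
from typing import List, Sequence


def late_fusion(
    probs_infrared: Sequence[float],
    probs_depth: Sequence[float],
    weight_infrared: float = 0.5,
) -> List[float]:
    """Combine the class probabilities of two separately trained models."""
    if len(probs_infrared) != len(probs_depth):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= weight_infrared <= 1.0:
        raise ValueError("weight_infrared must lie in [0, 1]")
    w = weight_infrared
    # Weighted average per class; the result still sums to 1 if both inputs do.
    return [w * a + (1.0 - w) * b for a, b in zip(probs_infrared, probs_depth)]


def predict(probs: Sequence[float], labels: Sequence[str]) -> str:
    """Pick the label with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


labels = ["no seizure", "seizure"]
# Hypothetical outputs: the infrared model leans towards "seizure", the depth model does not.
fused = late_fusion([0.3, 0.7], [0.6, 0.4])
print(fused, predict(fused, labels))
```

In contrast to early fusion, where infrared and depth frames would be stacked into one input, each network here can be trained and tuned on its own modality, and the fusion weight can be chosen afterwards on validation data.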

In [24], the dataset is extended with videos of psychogenic non-epileptic seizures, which should not be classified as epileptic seizures (40 epileptic seizures, 10 non-epileptic seizures). Two approaches are implemented. In the landmark-based approach, a skeleton model of the patient is calculated and used together with optical flow to train an LSTM network. In the region-based approach, a shallow CNN serves as the basis for an LSTM. A deep state-of-the-art CNN could not be used because of the limited amount of available data: such networks have a large number of parameters, so overfitting would be expected. The region-based approach produces better results than the landmark-based approach [24].


The requirements for a system for the visual detection of epileptic seizures depend on its application. In principle, it can be assumed that only motor seizures can be detected visually; other seizures usually require an EEG [21]. One application is tracking seizure frequency in order to adjust the therapy accordingly [1]. Likewise, seizure detection can be used to trigger an alarm so that the affected person can be helped if necessary, especially at night [1]. A distinction between focal and generalized seizures is necessary, since generalized seizures pose a particular risk to those affected. Another use case is supporting the anamnesis. For this purpose, the type of seizure must be differentiated by the affected hemispheres (generalized or focal) and the movements performed; the duration, frequency, and development of the seizure are also relevant [26]. In addition, the circumstances of the seizure (e.g., whether it is a nocturnal seizure and whether the patient sleeps before or after the seizure) may be of interest to the physician [26].

To detect movements more robustly, it can be useful to include depth data. Night-time monitoring is also often necessary [26], which is possible, for example, with infrared sensors. Depending on the type of seizure to be detected and the body parts affected, the requirements for sensor resolution may differ. This is the case, for example, when fine movements of the face or fingers are to be captured rather than only coarse movements of the arms and legs [26]. For particularly violent and fast seizure movements, the common frame rate of 25 fps may no longer be sufficient, and a higher frame rate is recommended [27]. However, higher frame rates can cause problems in storing and processing the data [26], which is particularly relevant for real-time capability. A requirement relevant to those affected is high reliability of the device. Concrete acceptable values depend on the application. A review in [1] of studies on the reliability required of seizure detection systems used for alarms shows that such systems need a sensitivity of over 90% and a low false alarm rate of 0.14 per day, i.e. about one per week. High sensitivity is important so that as many seizures as possible are correctly identified [1]. A low false alarm rate ensures that the affected person and caregivers are rarely disturbed and unsettled by false alarms [1]. For monitoring the success of therapy, by contrast, it is sufficient to recognize a decrease in seizures [1].
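The reliability targets reported in [1] can be checked with simple arithmetic. The sketch below uses hypothetical function names and made-up evaluation counts to compute sensitivity and the false alarm rate per day. One false alarm per week corresponds to 1/7 ≈ 0.14 per day, so the two figures express the same target.

```python
def sensitivity(detected_seizures: int, total_seizures: int) -> float:
    """Fraction of true seizures that raised an alarm."""
    return detected_seizures / total_seizures


def false_alarms_per_day(false_alarms: int, monitored_hours: float) -> float:
    """False alarm rate normalized to 24 hours of monitoring."""
    return false_alarms / (monitored_hours / 24.0)


def meets_alarm_targets(
    detected_seizures: int,
    total_seizures: int,
    false_alarms: int,
    monitored_hours: float,
    min_sensitivity: float = 0.90,
    max_far_per_day: float = 1 / 7,  # one false alarm per week, about 0.14 per day
) -> bool:
    """Check for sensitivity over 90% and at most about 0.14 false alarms per day."""
    return (
        sensitivity(detected_seizures, total_seizures) > min_sensitivity
        and false_alarms_per_day(false_alarms, monitored_hours) <= max_far_per_day
    )


# Hypothetical evaluation: four weeks (672 h) of monitoring, 20 annotated seizures.
print(sensitivity(19, 20))                  # 0.95
print(false_alarms_per_day(3, 672))         # 3 / 28, about 0.107 per day
print(meets_alarm_targets(19, 20, 3, 672))  # True
print(meets_alarm_targets(19, 20, 5, 672))  # False: 5 / 28 is about 0.18 per day
```

Because seizures are rare, a sensitivity estimate from a few dozen events is uncertain; long monitoring periods are needed before either target can be confirmed.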

Since seizure data are patient data, the confidentiality of their collection, storage, use, and security is of interest to the person concerned [1]. Legal data protection requirements must also be observed. Detection must be robust to occlusion by people providing help or by objects such as blankets. In addition, for use in the affected person’s home, detection must be robust to different environments. Real-time capability is an essential requirement for a visual seizure detection system that alerts caregivers [1]; it is less relevant when monitoring treatment success. The dataset used can introduce bias into the recognition. For example, datasets that are unbalanced with regard to skin color or gender can result in discriminatory algorithms [28]. The system must detect seizures reliably, regardless of age group, origin and ethnicity, gender, and disability. Since poses that resemble seizure cramps can occur, especially in older or disabled people due to comorbidities, the system must be able to distinguish them from seizures.
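One way to check whether detection works independently of group membership is to report sensitivity separately for each group. The minimal sketch below assumes a hypothetical test set in which each annotated seizure is labeled with a group attribute (such as age group or gender) and with whether it was detected; a large gap between groups would point to the kind of dataset bias described above. Names and numbers are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sensitivity_by_group(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """events: one (group, detected) pair per annotated seizure."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # group -> [detected, total]
    for group, detected in events:
        counts[group][1] += 1
        if detected:
            counts[group][0] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


def largest_gap(per_group: Dict[str, float]) -> float:
    """Difference in sensitivity between the best- and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical evaluation split by age group.
events = [
    ("under 18", True), ("under 18", True), ("under 18", True), ("under 18", False),
    ("over 60", True), ("over 60", False),
]
per_group = sensitivity_by_group(events)
print(per_group)               # {'under 18': 0.75, 'over 60': 0.5}
print(largest_gap(per_group))  # 0.25
```

With so few events per group the gap is not statistically meaningful; in practice, per-group reporting only becomes informative once each group contains enough seizures.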


The visually detectable motor seizures are characterized by high complexity and high variability, both between patients and between seizures. The use of post-processing data is therefore questionable [29]. Accordingly, the lack of larger visual seizure datasets poses a major challenge.

The work presented uses seizure videos recorded in a clinical setting, whose scope is not comparable to that of large action recognition datasets. State-of-the-art action and seizure detection methods are based on neural networks that require large amounts of training data. One way to deal with this is to analyze the static pose to determine whether a seizure is visible in an image. This is possible because the posture of one or more body parts during seizures often, but not always, differs from everyday poses. However, poses can resemble spasms caused by other diseases, so that only movement patterns distinguish them. In addition, characteristics such as the frequency and development of the seizure can only be recognized to a limited extent from individual images. Including temporal information therefore seems sensible. The question also arises whether results obtained with clinical data can be transferred to the domestic setting. Outside the hospital, greater variability is to be expected both in the affected person’s everyday actions, which should not be recognized as seizures, and in the environment. It is conceivable that seizures look different at home than in the clinic, for example when the patient has a seizure while sitting or falls, whereas clinical data mostly show seizures in the hospital bed.
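To illustrate why temporal information matters, the sketch below estimates the rhythm of a movement from a joint trajectory across frames, something no single image can provide. It assumes a hypothetical, already extracted vertical wrist coordinate per frame and uses the zero-crossing rate as a deliberately simple frequency estimate; the 2.5 Hz signal is synthetic and merely lies in the 2-3 jerks per second quoted above for clonic seizures.

```python
import math
from typing import Sequence


def dominant_frequency_hz(signal: Sequence[float], fps: float) -> float:
    """Estimate the rhythm of a motion signal from its zero-crossing rate.

    A sinusoid crosses its mean twice per period, so the frequency is roughly
    crossings / (2 * duration). A single frame carries no temporal information,
    so at least two samples are required.
    """
    if len(signal) < 2:
        raise ValueError("need at least two frames")
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0))
    duration_s = (len(signal) - 1) / fps
    return crossings / (2 * duration_s)


# Synthetic wrist height oscillating at 2.5 Hz, sampled at 25 fps for 4 s.
fps = 25
wrist_y = [math.sin(2 * math.pi * 2.5 * k / fps + 0.3) for k in range(100)]
print(round(dominant_frequency_hz(wrist_y, fps), 2))  # about 2.4; the true rhythm is 2.5 Hz
```

At 25 fps the highest representable frequency (the Nyquist limit) is 12.5 Hz, well above a 2-3 Hz rhythm, so the frame rate concern raised in the requirements applies mainly to very fast movements. On noisy real trajectories, zero crossings are fragile, and spectral estimates over a sliding window would be a more robust alternative.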


Recognizing motor seizures can make everyday life and treatment easier for those affected, caregivers, and therapists. Use cases range from alerting nurses to supporting physicians during treatment, and they come with differing requirements for recognition reliability, performance, and the resolution and form of data acquisition. Across all use cases, however, data protection is an important requirement. Data availability is currently a problem for visual seizure detection, since state-of-the-art action detection methods, which are mostly based on deep learning, require large amounts of data. Even without deep learning, a lot of data is needed to capture the variability of seizure severity. Even if such a dataset from clinical data becomes available, the question remains to what extent it transfers to the domestic setting and whether the reliability requirement can thus be met. In addition, there is a trade-off between the accuracy of the results and the real-time capability of the system. High accuracy may require analyzing several data sources at high resolution, for example to recognize changes in the face. This can further aggravate the high computational complexity that is inherent above all in action detection. Building a large dataset of epileptic seizures is therefore important for future work on detecting and classifying seizures, which are characterized by high variability.

Conflict of Interest

No conflict of interest.



