A Brain-Computer Interface (BCI) is a system of hardware and software that lets the brain control external devices and, in some cases, perceive electrical signals fed back to it. A BCI pipeline is usually split into four phases: preprocessing, feature extraction, signal classification, and control. In this article, I will outline the role of each phase and how the phases interact, then give an example of the impact this technology can have on people with neuromuscular disorders.


Preprocessing

In machine learning, a feature is an individual measurable property of an observed phenomenon. A set of features is described by a feature vector, which is used as the input to a machine learning algorithm. Preprocessing is concerned with constructing feature vectors from raw data. Because this stage often involves a large amount of raw data, subsampling (using a subset of the samples as input to the algorithms) is used to reduce computational cost; the aim is to cut overhead without discarding important information. A related step is feature selection, which keeps only the most informative features. It is done because the data set often contains redundant and irrelevant features that add to the dimensionality (the number of features) of the feature vectors and thereby lengthen training time (the time spent feeding the algorithm pre-classified data so that it can learn and adapt to future data sets). Another benefit of feature selection is that it makes the resulting model easier for users and researchers to debug and interpret.
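To make these ideas concrete, here is a minimal sketch in Python of subsampling, building feature vectors, and a very simple form of feature selection. Everything in it is an assumption made for illustration: the function names (`subsample`, `to_feature_vector`, `select_features`), the four summary features, the variance threshold, and the toy signal windows are not taken from any particular BCI toolkit.

```python
from statistics import mean, pvariance


def subsample(signal, step):
    """Keep every `step`-th sample of a raw signal window."""
    return signal[::step]


def to_feature_vector(window):
    """Summarise a window as [mean, variance, peak-to-peak, sample count]."""
    return [mean(window), pvariance(window), max(window) - min(window), len(window)]


def select_features(vectors, min_variance=1e-6):
    """Keep only the features whose value actually varies between vectors."""
    columns = list(zip(*vectors))
    keep = [i for i, column in enumerate(columns) if pvariance(column) > min_variance]
    return keep, [[vector[i] for i in keep] for vector in vectors]


# Three made-up windows of raw samples (illustrative values only).
raw_windows = [
    [0, 1, 2, 3, 4, 3, 2, 1],
    [0, 2, 4, 6, 8, 6, 4, 2],
    [1, 1, 2, 2, 3, 3, 2, 2],
]

vectors = [to_feature_vector(subsample(window, step=2)) for window in raw_windows]
keep, reduced = select_features(vectors)
```

In this toy data the sample count is identical for every window, so it carries no information and is dropped, leaving three-dimensional feature vectors. Real pipelines use more careful selection criteria, but the principle is the same.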

The term preprocessing spans a large number of validation stages that ensure the data used is accurate and relevant, but the output is always a set of feature vectors, each of which can be plotted as a point in the feature space (the space with one dimension per feature used to characterize your data, e.g. height, weight, gender). Having defined the feature space, we can note that these stages are also tasked with reducing its dimensionality (i.e. the number of variables it depends upon) so as to further reduce unnecessary computation.

Feature Extraction 

Feature extraction is also concerned with reducing the dimensionality of the data; however, its algorithms operate differently from feature selection. Rather than discarding features, they transform the data output by the preprocessing stages to find a representation that uses fewer variables and is still sufficiently accurate. This is done because the classifier should generalize across a variety of test data, and as the dimensionality of the feature vectors increases, a classifier trained on a fixed amount of data tends to fit that data too closely and so classifies fewer unseen data sets correctly. This defeats the purpose of machine learning, since the solution should adapt to many scenarios, and a classifier that works for only a select few of the possible data sets cannot. One common algorithm used in feature extraction is Isomap, an isometric mapping method based on geodesic distances (the shortest distances between two points measured along the surface, or manifold, on which the data lies, rather than in a straight line through the surrounding space). It works in the following manner:

  • Determine the neighbors of each point (either its K nearest neighbors or all points within a fixed radius) 
  • Construct a “neighborhood” graph (a data structure that holds the neighboring points and the distance between each connected pair) 
  • Use Dijkstra’s algorithm to compute the shortest path between every pair of nodes; these path lengths approximate the geodesic distances 
  • Apply multidimensional scaling (MDS) to the resulting distances to reduce dimensionality (many techniques can be used at this stage) 
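
The steps above can be sketched in plain Python. This is a toy illustration under several assumptions, not a production Isomap: the data is a made-up set of points on a semicircle, `k=2` is an arbitrary choice, the all-pairs shortest paths are computed with Floyd-Warshall rather than Dijkstra purely for brevity (both give the same distances on a graph with non-negative weights), and the final step is classical MDS reduced to a single dimension using power iteration. All function names are hypothetical.

```python
import math
import random


def pairwise(points):
    """Euclidean distance between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_graph(dist, k):
    """Steps 1-2: connect each point to its k nearest neighbors."""
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        others = (j for j in range(n) if j != i)
        for j in sorted(others, key=lambda j: dist[i][j])[:k]:
            graph[i][j] = graph[j][i] = dist[i][j]  # undirected edge
    return graph


def geodesic_distances(graph):
    """Step 3: all-pairs shortest paths (Floyd-Warshall, swapped in for Dijkstra)."""
    n = len(graph)
    geo = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][m] + geo[m][j] < geo[i][j]:
                    geo[i][j] = geo[i][m] + geo[m][j]
    return geo


def mds_1d(geo, iterations=200, seed=0):
    """Step 4: classical MDS down to one dimension via power iteration."""
    n = len(geo)
    sq = [[d * d for d in row] for row in geo]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double-centre the squared distances to get a Gram matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]


# Made-up data: 20 points along a unit semicircle in 2D.
n = 20
points = [(math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
          for i in range(n)]
geo = geodesic_distances(knn_graph(pairwise(points), k=2))
embedding = mds_1d(geo)
```

The two end points of the semicircle are only 2 units apart in a straight line, but their geodesic distance is close to the arc length of π. The one-dimensional embedding "unrolls" the curve, so a single variable now describes data that originally needed two.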

A Python representation of Dijkstra’s algorithm: 

[Image: screenshot of the Python code for Dijkstra’s algorithm]

The code above uses an input graph to create dictionaries describing the neighbors of each node and the weights of the edges between them, and assigns these dictionaries to the tuple “initial_state”. The first node visited is the one held by the variable ‘start’, whose distance is set to 0 and whose ‘previous node’ is set to none. The program then repeats the following process until all nodes have been visited: find the unvisited node with the smallest distance from the start, remove it from the unvisited array, and find all its neighbors. For each neighbor, check whether the route via the current node is shorter than the best route found so far; if so, update its distance. Finally, return the route taken (the sequence of nodes) and the total distance travelled from the start to the end. Dijkstra’s algorithm is particularly useful here because a single run computes the shortest distance from the start to every other node, so the shortest route to any vertex can then be read off without performing the whole algorithm again. 
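
For readers who want something to run, here is a minimal sketch of the process just described. It is my own reconstruction from the description, not the original code: the function name `dijkstra`, the dictionary-of-dictionaries graph format, and the example graph are all assumptions.

```python
import math


def dijkstra(graph, start, end):
    """Return (route, total distance) for the shortest path from start to end."""
    distances = {node: math.inf for node in graph}
    previous = {node: None for node in graph}
    distances[start] = 0
    unvisited = set(graph)

    while unvisited:
        # Pick the unvisited node that is currently closest to the start.
        current = min(unvisited, key=distances.get)
        if distances[current] == math.inf:
            break  # everything left is unreachable
        unvisited.remove(current)
        for neighbor, weight in graph[current].items():
            candidate = distances[current] + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                previous[neighbor] = current

    if distances[end] == math.inf:
        return [], math.inf

    # Walk backwards from the end node to rebuild the route.
    route = []
    node = end
    while node is not None:
        route.append(node)
        node = previous[node]
    route.reverse()
    return route, distances[end]


# Hypothetical weighted graph: each node maps to its neighbors and edge weights.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
route, total = dijkstra(graph, "A", "D")
```

For this graph the route is A, C, B, D with a total distance of 8, which beats both the direct-looking alternatives via B alone or C alone (each 9). This version scans the whole unvisited set on every step; a priority queue is the usual optimization for larger graphs.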

Signal Classification 

Classification is the broadest term so far and describes any technique that predicts the class/category of given data points. A familiar example of a classifier is the spam filter used by email providers, which moves spam emails out of your inbox and into the spam folder. This phase is concerned with using the results from training to determine which class the input data most likely belongs to. Classifiers can be divided into two types: lazy and eager learners. Lazy learners store the training data and compare each piece of test data against it directly, outputting the most similar result, whereas eager learners build a classification model from the training data in advance; that model summarizes the characteristics of all the training data and is then applied to every piece of test data. 
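
The difference between lazy and eager learners is easier to see side by side. The sketch below uses k-nearest neighbors as the lazy learner and a nearest-centroid classifier as the eager one; these are my choices for illustration, and the two-feature training data and the "rest"/"move" labels are invented.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Lazy: nothing is learned up front; every query scans the training data."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_centroids(training):
    """Eager: compress the training data into one mean vector per class."""
    grouped = {}
    for features, label in training:
        grouped.setdefault(label, []).append(features)
    return {label: [sum(column) / len(column) for column in zip(*rows)]
            for label, rows in grouped.items()}


def centroid_predict(model, query):
    """Classify by the closest class centroid; the training data is not needed."""
    return min(model, key=lambda label: math.dist(model[label], query))


# Invented feature vectors with hypothetical brain-state labels.
training = [
    ((1.0, 1.0), "rest"), ((1.2, 0.8), "rest"), ((0.9, 1.1), "rest"),
    ((4.0, 4.2), "move"), ((3.8, 4.1), "move"), ((4.2, 3.9), "move"),
]

model = fit_centroids(training)                    # eager: built once
lazy_label = knn_predict(training, (1.1, 1.0))     # lazy: uses raw data
eager_label = centroid_predict(model, (4.0, 4.0))  # eager: uses the model
```

The lazy learner has no training cost but must keep and search all the training data for each prediction, while the eager learner pays once to build `model` and can then discard the raw data.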

A simple classification algorithm is a decision tree, which asks a sequence of questions, each typically with a true or false answer, in order to determine which category the subject falls under. Modern classification algorithms often deal with multiple possible states (e.g. brain states), and so decisions are made probabilistically. An example of a probabilistic classifier is Naive Bayes, which uses Bayes’ theorem together with the assumption that the features are independent of one another given the class to find the most probable state of the data point. 
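
As a rough sketch of how a probabilistic classifier produces its answer, here is a small Gaussian Naive Bayes in plain Python. The Gaussian form for each feature is an additional assumption on my part (other variants exist), and the function names, training data, and labels are invented for illustration.

```python
import math
from statistics import mean, pvariance


def fit(training):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    grouped = {}
    for features, label in training:
        grouped.setdefault(label, []).append(features)
    model = {}
    for label, rows in grouped.items():
        # A tiny constant keeps the variance from being exactly zero.
        stats = [(mean(column), pvariance(column) + 1e-9) for column in zip(*rows)]
        model[label] = (len(rows) / len(training), stats)
    return model


def posteriors(model, query):
    """Apply Bayes' theorem, treating features as independent given the class."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mu, var) in zip(query, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        log_scores[label] = score
    # Normalise in a numerically stable way so the probabilities sum to 1.
    top = max(log_scores.values())
    unnormalised = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {label: value / total for label, value in unnormalised.items()}


# Invented feature vectors with hypothetical brain-state labels.
training = [
    ((1.0, 1.0), "rest"), ((1.2, 0.8), "rest"), ((0.9, 1.1), "rest"),
    ((4.0, 4.2), "move"), ((3.8, 4.1), "move"), ((4.2, 3.9), "move"),
]

model = fit(training)
probabilities = posteriors(model, (1.1, 1.0))
most_probable = max(probabilities, key=probabilities.get)
```

Instead of a bare true/false answer, the classifier returns a probability for every state, and the most probable one is taken as the prediction.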


Control

Finally, the control phase is concerned with changing the environment/real world in such a way that the classified signal is brought to a desirable state. This phase, often referred to as intelligent control, uses artificial intelligence techniques such as neural networks and evolutionary computation to work out what inputs are required to get the desired output and in turn makes changes to meet these needs. 
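
To give a flavor of how evolutionary computation can find the input needed for a desired output, here is a toy (1+1) evolutionary search. It is only a sketch under heavy assumptions: `device_response` is a hypothetical stand-in for a real device, and the target value, mutation step size, and number of generations are arbitrary. A real BCI controller would be far more involved.

```python
import random


def device_response(control_input):
    """Hypothetical stand-in for a real device: maps an input to an output."""
    return 2.0 * control_input + 1.0


def evolve_input(target, generations=500, step=0.5, seed=0):
    """(1+1) evolutionary search: keep a mutated input only if it gets closer."""
    rng = random.Random(seed)
    best = 0.0
    best_error = abs(device_response(best) - target)
    for _ in range(generations):
        candidate = best + rng.gauss(0.0, step)  # random mutation
        error = abs(device_response(candidate) - target)
        if error < best_error:                   # selection
            best, best_error = candidate, error
    return best, best_error


control_input, error = evolve_input(target=7.0)
```

The search never looks inside `device_response`; it only observes outputs and keeps whichever input moves the output closer to the target, which is the essence of working out "what inputs are required to get the desired output".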

Impact of BCI 

On April 23rd, 2020, a team of researchers at the Ohio State University Wexner Medical Center announced that they had successfully restored sensation to the hand of a 28-year-old participant with a severe spinal cord injury using a BCI system. The system enhances weak neural signals, thereby improving motor ability and perception. It uses haptic feedback, commonly known for its use in phones and controllers, to amplify signals from the skin to the brain. This was only possible because some fibers of the participant’s spinal cord remained intact: a signal was still being transmitted via the central nervous system, but it was so weak that the brain did not perceive it. This is the case with the majority of spinal cord injuries, so this technology has huge potential for changing the lives of many and opens up new possibilities for the future of this field of data science. 
