Machine Learning: What are the Key Ingredients?
Posted on November 22, 2023 by Zach Strout
One of the biggest trends in biomechanics recently is the widespread use of machine learning. In this blog post, we introduce fundamental aspects of machine learning. Before we move on, let's get a grasp on what models are and the difference between machine learning and physics-based models.
Physics-based vs Machine Learning Modeling
The term "model" can mean many different things. In science and engineering, it can be used as a synonym for the word "relationship". By modeling, we find the relationship between one or more things and another thing. For example, an OpenSim model can find the relationship between body movements with ground reaction forces and muscle forces. Even complicated models like OpenAI's GPT-4 are just relationships of which words typically go together. Models can also be really simple. You could model weight loss as the difference between calories expended during exercise and calories consumed through food even though weight and weight loss is potentially much more complicated.
We can divide models into two different categories: physics-based and machine learning. Physics-based models are constructed from known laws of physics. Many small relationships, like the one between force and acceleration, can be combined into one big relationship, like the one between IMU signals and foot strike angle. Kalman filters, for example, rely on physics-based models to predict how the state of a system transitions from one moment to the next. Physics-based models can be extremely powerful. Since they are constructed from known relationships, they are inherently explainable. If you have thorough knowledge of the process you want to model, a physics-based model will be difficult to beat and should work for any population of subjects or patients. While it is always important to validate any model with data, physics-based models don't require any data to construct or train. The major drawbacks are that they require fairly complete knowledge of the process you want to model and an expert to craft the model. If the process is not modeled correctly, or there is too much uncertainty in the modeling parameters, performance can be really poor.
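To make the idea concrete, here is a minimal Python sketch of a physics-based relationship: the foot's pitch angle is the time integral of the gyroscope's angular velocity. The signal and sampling rate are made up for illustration, and a real pipeline would add drift correction, for example with the Kalman filters mentioned above.

    import numpy as np

    # Placeholder gyroscope pitch-rate signal (rad/s) sampled at 100 Hz:
    # a made-up constant rotation of 0.5 rad/s for one second.
    fs = 100.0
    gyro_pitch_rate = np.full(100, 0.5)

    # The physics-based relationship: angle is the time integral of angular velocity.
    # Real pipelines add drift correction, e.g. with a Kalman or complementary filter.
    pitch_angle = np.cumsum(gyro_pitch_rate) / fs
    print(pitch_angle[-1])   # ~0.5 rad after one second of rotation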
Machine learning, on the other hand, starts with data. Instead of requiring a deep knowledge of the physics involved, machine learning can find relationships from data alone. It does this through a combination of statistics, curve fitting, and pattern matching. This can be incredibly useful in cases where the information needed for modeling is lacking. For example, if you wanted to estimate step length from a wrist-mounted IMU embedded in a smartwatch, many of the parameters needed to create a physics-based model are missing or difficult to get. You could use user demographics to estimate body segment lengths, but that would stack estimations on top of estimations. The machine learning approach simply uses the data to estimate the step length directly, without focusing on the physics principles. Since the estimation happens directly, there is less of a logical connection in the relationship, which is a big reason why many people are uncomfortable using these models.
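Here is a rough sketch of what that might look like in Python with scikit-learn. The feature values and step lengths are entirely made up, and linear regression is just one of many model choices; the point is only that the relationship is learned from data rather than derived from physics.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical training data: one row of wrist-IMU-derived values per step,
    # paired with step lengths (meters) measured by some reference system.
    rng = np.random.default_rng(0)
    X_train = rng.random((200, 6))                                  # placeholder IMU-derived values
    y_train = 0.7 + 0.3 * X_train[:, 0] + 0.05 * rng.standard_normal(200)  # placeholder labels

    # The model learns the relationship directly from the data,
    # without any explicit physics of arm swing or gait.
    model = LinearRegression().fit(X_train, y_train)

    X_new = rng.random((5, 6))
    print(model.predict(X_new))   # estimated step lengths for new steps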
Data, Features, and Models
Let's break down each of the three parts of machine learning: Data, Features, and Models.
Data
Machine learning starts with data. By data, we mean the combination of both the raw data and the labels. We can dig into each one with more detail.
Raw Data
Raw data is a collection of values with no inherent meaning. A CSV file with nothing but rows of numbers is the epitome of raw data. Raw data gains meaning either through interpretation, which is partially what you do with physics-based models, or by assigning it meaning with labels. Assigning meaning by interpretation is not necessarily bad; it is just not what we want if we want an algorithm or model that can work without the influence of people. With the ubiquitous use of sensors and digitization, data is incredibly easy to get. Most smartphones have IMUs that can produce motion data, and smartphone videos are also a kind of raw data. We also have extraordinarily complex means of getting data, like X-rays, MRIs, or CT scans, in the form of images or 3D models. But without interpretation or labels, there is not much practical use for this data, since there are no relationships between the data and anything external to it.
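A tiny Python illustration of the point, with a few made-up numbers standing in for an IMU export:

    import io
    import pandas as pd

    # A few rows of numbers standing in for a raw IMU file - no meaning on their own.
    csv_text = "0.12,9.78,-0.03\n0.10,9.81,0.01\n0.15,9.75,-0.02\n"
    raw = pd.read_csv(io.StringIO(csv_text), header=None)
    print(raw.shape)   # just values; meaning has to come from interpretation or labels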
Labels
The meaning we give directly to the raw data, without interpretation, is called the label. By saying that a picture has a cat or a dog or Aunt Sue in it, we are giving it a label. It is worth mentioning that while data can be quick, cheap, and easy to collect, getting the labels can be incredibly difficult. If you want to apply machine learning to a biomechanical parameter, you will need to measure that parameter, which might involve motion capture systems and force plates. This is automatic labeling, since the label is algorithmically generated rather than assigned by a person. For X-rays, you need to pay a doctor to generate the label "broken" or "not broken". This is manual labeling, since a person needs to assign each example to a category. Since the meaning of the data for machine learning is wrapped up strictly in the label, it is important to get the best possible labels.
There is an inherent simplification that happens between the "true label" and the label used in practice. Let's say that we have a machine learning model where the data comes from an IMU on the foot, and the model determines the foot strike pattern. Typical labels could be "midfoot", "forefoot", or "rearfoot", but these are extreme simplifications of exactly what is happening. More complete labels might be "22-year-old, healthy, male college student midfoot strike" - or, more extreme and more accurate, "John, who is a 22-year-old, healthy, male college student, midfoot strike at 2:43 pm on March 12th, 2023 in the biomechanics lab at X State University using a Vicon motion capture system running version x.y of the Vicon Nexus software and using a Bertec model X instrumented treadmill ...". It would naturally be ludicrous to use this as a label, since many aspects of the "true label" have only a minor effect and having all unique labels would be useless for classification, but it is important to keep this "true label" in mind when trying to use the model. If you use a model trained on data with a "true label" of "22-year-old, healthy ... midfoot strike" on someone who is 89 years old with Parkinson's, there is a good chance that the model will perform poorly.
Features
With the basics of raw data and labels covered, we can move on to an important intermediary between the data and the labels - features. Features are extracted from the data and are used by the model. As a lens focuses light, features "focus" aspects of the data. In our everyday lives, we extract features automatically without a second thought. For instance, we look at a person and "extract" aspects of them - whether they are smiling or frowning, their posture, and so on - and use these to predict the outcome of interacting with them, mostly in our subconscious. If someone is frowning, we use this feature to predict that our interaction with them might not be great. As another example, if we want to distinguish between pictures of cats and dogs, some of the features we could extract are the shape of the ears (cats typically have pointier ears) or the shape of the pupil (cats have vertical slit pupils while dogs have round pupils). In a way, the relationship defined by the model is between the features and the labels rather than between the raw data and the labels, though there is an exception when we look at a specific type of machine learning below. Examples of features in models that use them include subject demographics like height or weight, basic statistics computed on a window of data, or even parameters calculated with physics-based methods.
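As a concrete sketch of that last kind of feature, here is some Python that computes a few basic statistics over windows of a placeholder signal. The choice of statistics is arbitrary and only meant to illustrate the idea of turning raw samples into a handful of descriptive numbers.

    import numpy as np

    def window_features(window):
        """Basic statistics over one window of a hypothetical IMU signal."""
        return np.array([
            window.mean(),                    # average level
            window.std(),                     # variability
            window.max() - window.min(),      # range
            np.abs(np.diff(window)).mean(),   # average sample-to-sample change
        ])

    signal = np.random.randn(1000)            # placeholder raw data
    windows = signal.reshape(10, 100)         # ten windows of 100 samples
    features = np.vstack([window_features(w) for w in windows])
    print(features.shape)                     # (10, 4): one feature row per window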
Models
Lastly, the model creates a mathematical relationship between the features and the labels. The type of model depends on what the label is. Classification models use labels with discrete values; regression models use labels with continuous values. So if you want to estimate the foot strike index, you would use a regression model, while if you want to classify the foot strike pattern, you would use a classification model. In reality, the differences between these two types of models are small. Many classification models are just regression models that output N probabilities, where N is the number of unique labels, using the softmax function. When you run a classification model, you simply select the most probable of the N classes. The two types of machine learning models that we will briefly discuss are traditional machine learning models and deep learning models.
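Here is a small Python sketch of that last point, with made-up raw scores for the three foot strike classes: the softmax turns scores into probabilities, and classification just keeps the most probable label.

    import numpy as np

    def softmax(scores):
        # Convert raw model scores into N probabilities that sum to 1.
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()

    labels = ["rearfoot", "midfoot", "forefoot"]
    scores = np.array([2.1, 0.4, -1.3])       # hypothetical raw outputs for one sample

    probs = softmax(scores)
    print(probs)                              # roughly [0.82, 0.15, 0.03]
    print(labels[int(np.argmax(probs))])      # classification keeps only the most probable label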
Traditional Machine Learning Models
Traditional machine learning models behave exactly as we have laid out the relationship above. First, you extract the features from the data. Then you use the features in the machine learning model. These two actions are separate. Traditional machine learning models require something called feature engineering, where you determine what features to include in the model. Typically, you will include many features and then remove the ones that are not important to the performance of the model. As you can imagine, this is a pretty tedious process that requires a lot of skill. While less interpretable than physics-based models, these models still have some interpretive power, since you might be able to see relationships between known features and predicted labels. Some examples of traditional machine learning models include linear models such as linear regression or linear discriminant analysis (LDA) for classification, support vector machines (SVM) for both regression and classification, decision trees and ensemble models such as random forests, and in some cases, artificial neural networks.
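In code, the "separate steps" idea might look like the following scikit-learn sketch. The feature matrix and foot strike labels are placeholders; the key point is that the models only ever see engineered features, never the raw signal.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical, already-engineered features (rows) and foot strike labels.
    X = np.random.rand(300, 4)                # placeholder feature matrix
    y = np.random.choice(["rearfoot", "midfoot", "forefoot"], size=300)

    # Feature extraction and modeling are separate steps: the model only ever
    # sees the feature matrix, never the raw IMU signal.
    svm = SVC().fit(X, y)
    forest = RandomForestClassifier().fit(X, y)

    print(svm.predict(X[:3]), forest.predict(X[:3]))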
Deep Learning Models
The need for feature engineering can be eliminated with deep learning models. Deep learning models combine the feature extraction part with the classification or regression part of machine learning. Since these models can automatically find what features work best for the data, they can be extremely powerful while also being complete "black boxes" - without any intuitive meaning, rationality, or understandability. There are quite a few arguments about what specifically defines "deep learning", but it can be helpful to distinguish deep learning by the input to the model: if the input is raw data, it is deep learning; if the input is features, it is traditional machine learning. Artificial neural networks can be used for deep learning if they are sufficiently complex. Another common type of deep learning model is the convolutional neural network (CNN), which combines convolutions that extract the features with a neural network that uses the features for classification or regression. For time series data, there are recurrent neural networks (RNNs) and more complicated variants.
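As a rough illustration, here is a tiny 1D CNN sketched in PyTorch. The layer sizes, channel count, and input shape are arbitrary placeholders; the point is that the convolutional part learns the features from raw windows of time series data and a final linear layer does the classification.

    import torch
    import torch.nn as nn

    # A minimal 1D CNN for raw time-series input: convolutions learn the features,
    # and a small fully connected layer does the classification.
    class TinyCNN(nn.Module):
        def __init__(self, n_channels=6, n_classes=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 16, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),          # collapse the time dimension
            )
            self.classifier = nn.Linear(16, n_classes)

        def forward(self, x):                     # x: (batch, channels, time)
            f = self.features(x).squeeze(-1)
            return self.classifier(f)             # raw scores for N classes

    model = TinyCNN()
    fake_batch = torch.randn(8, 6, 200)           # 8 windows of a hypothetical 6-axis IMU
    print(model(fake_batch).shape)                # torch.Size([8, 3])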
You can see how raw data, labels, features, and machine learning models work together in the diagram below.
Supervised vs Unsupervised Learning
There are two major types of machine learning: supervised (the most common, and what is described in the figure above) and unsupervised. Supervised learning looks for relationships between the data and the labels. Unsupervised learning typically looks for relationships within the data itself (do any of the data "clump" together?). Since unsupervised learning does not use labels, it could be really useful for activity classification with crowd-sourced IMU data from smartphones, since that data probably does not have reliable labels - if any. Some popular unsupervised learning algorithms include k-means, DBSCAN, and OPTICS.
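A minimal scikit-learn sketch of the idea, using placeholder feature rows and made-up clustering parameters: no labels are passed in, and the algorithms only look for structure in the data itself.

    import numpy as np
    from sklearn.cluster import KMeans, DBSCAN

    # Hypothetical unlabeled feature rows from crowd-sourced smartphone IMU data.
    X = np.random.rand(500, 3)

    # No labels are used: each algorithm assigns every row to a cluster.
    kmeans_groups = KMeans(n_clusters=4, n_init=10).fit_predict(X)
    dbscan_groups = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)  # -1 marks "noise" points

    print(np.unique(kmeans_groups), np.unique(dbscan_groups))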
How to Validate Machine Learning Models
The question "If many machine learning models are just black boxes, how do we know if they work" is valid and important and one that has an answer - validation. Validation of machine learning models simply requires not using some of the data and labels for training the model. This data gets fed back into the model, and the output of the model is compared to the labels in the test set of data. Before we discuss different validation methods, it is important to understand how the organization of the data and labels affects the performance of the model. To best illustrate this, let's look at an example that involves three researchers working with the same set of data. The set of data includes 9 subjects with 9 trials of data each. They are all using identical features and models. The researcher will need to select what data is used to train the model and what data is used to test the model.
Tom is an undergraduate who is volunteering in the lab. His analysis involves randomly shuffling the data while maintaining the correct relationship between the raw data and the labels. He then takes 78 percent of the data for training and 22 percent for testing. He gets really good results and is pleased.
June is a master's student. Her analysis involves taking the first 7 subjects for training and the other 2 for testing. Her results are not as good as Tom's, but she is uncertain why.
Emma is a PhD researcher about to graduate. Her analysis involves using 8 subjects to train the model and 1 subject to test it. She does this in a revolving fashion so that each subject is used for testing once. In the end, she has 9 measures of accuracy that she can average or take the max or min of. She can see that accuracy is good for some subjects and worse for others. On deeper investigation, she finds that the demographics of the subjects with poor accuracy were significantly different from the demographics of the other subjects.
While these three people had the same data, features, and models, they got very different results. The goal of machine learning should be to generalize - to find deeper patterns or more fundamental relationships. If a model is given features and labels, it will learn both subject-specific attributes and general patterns in the data. The subject-specific patterns belong to a particular trial or subject, while the general patterns reflect the underlying relationship. When you validate the model using data from the same trial or subject it was trained on, you will get a much higher accuracy because the model has learned those specific, subject-related patterns. For example, a journal article about activity classification with IMUs found that such models can produce results that look roughly 40% more accurate while actually being more than 55% worse. This is what Tom did. Models validated this way are often called subject-specific machine learning models. For some applications, they are acceptable or even the best that can be done. For example, EMG data is typically highly dependent on the individual. There might also be cases where the labels are not hard to get, so collecting data and labels for another subject and retraining the model is not too much of a hassle.
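Here is a sketch of Tom's split in Python with scikit-learn, using placeholder features and labels for the 9-subject, 9-trial dataset. The point of the final line is that a subject-blind random shuffle puts trials from the same subjects into both the training and test sets.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Hypothetical dataset: 9 subjects x 9 trials, one feature row per trial.
    subjects = np.repeat(np.arange(9), 9)
    X = np.random.rand(81, 4)
    y = np.random.choice(["rearfoot", "midfoot", "forefoot"], size=81)

    # Tom's split: random shuffling ignores which subject each trial came from.
    X_train, X_test, y_train, y_test, subj_train, subj_test = train_test_split(
        X, y, subjects, test_size=0.22, shuffle=True, random_state=0)

    print(np.intersect1d(subj_train, subj_test))  # most (or all) subjects appear in both sets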
A better way of splitting the data is what June did. Splitting the data by subject creates a subject-independent or universal model. This type of splitting is simple and does not require a lot of computation, so for complicated models or many subjects, it might be the best way to validate the model. There are some drawbacks, though. There is a chance that you just got lucky when selecting the test subjects, and you can't run any statistics on how your model might perform with more data.
The best form of validation is leave-one-subject-out cross-validation, which is the process Emma used with her dataset. This creates a universal model since the dataset is split by subject. The per-subject accuracy scores can also be used to further understand the performance of the model. By relating the cross-validation scores to other aspects of the subjects, like weight or height, you can see how these affect performance, which might in turn mean that you should include those parameters in your model. Training the model N times for complicated models or many subjects will usually take a lot of time, so there are simplifications like k-fold cross-validation, where the data is split into k partitions. For example, Emma could use 3-fold cross-validation where two partitions (6 subjects) are used to train and one partition (3 subjects) is used to test.
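Both of Emma's options map directly onto scikit-learn utilities, as sketched below with the same placeholder dataset as above. LeaveOneGroupOut gives one accuracy score per subject, and GroupKFold is the cheaper grouped k-fold variant that still keeps each subject in a single fold.

    import numpy as np
    from sklearn.model_selection import cross_val_score, LeaveOneGroupOut, GroupKFold
    from sklearn.ensemble import RandomForestClassifier

    # Same hypothetical 9-subject, 9-trial dataset as above.
    subjects = np.repeat(np.arange(9), 9)
    X = np.random.rand(81, 4)
    y = np.random.choice(["rearfoot", "midfoot", "forefoot"], size=81)

    model = RandomForestClassifier(random_state=0)

    # Emma's approach: leave-one-subject-out, giving one accuracy score per subject.
    loso_scores = cross_val_score(model, X, y, groups=subjects, cv=LeaveOneGroupOut())
    print(loso_scores.mean(), loso_scores.min(), loso_scores.max())

    # A cheaper alternative: 3 folds that still keep each subject's trials together.
    kfold_scores = cross_val_score(model, X, y, groups=subjects, cv=GroupKFold(n_splits=3))
    print(kfold_scores)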
Find Out More
Want to learn more? Check out these interesting and useful papers and resources related to machine learning:
Scikit-learn is a Python package that is all about machine learning and data science. It has many great examples and sample code.