Basics of Machine Learning (COMP30027 W1 L2)
Basic Framework
- Instances
- Input to a machine learning system: individual, independent samples of the world (= exemplars, observations)
- Composed of:
- Attributes (= features): measured aspects of each instance
- Concepts : Anything we aim to learn; often in the form of labels
- Examples of concepts
- Discrete class labels (classification): categorizing things into finite classes
- Numeric output (regression): weight, offset
- Clusters: identifying data structure, forming groups
- Probability of an event: empirical, estimated from relative frequency
- The most likely order of events
- A sequence of commands
- A complex model
- …
- Generalisation
- Learning a function that maps attributes to concepts, concept = f(attributes)
- Purpose: Return the concept for any set of attributes, including new sets
Example: weather dataset
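A minimal sketch of concept = f(attributes) on a weather-style table. The rows below are hypothetical (made up for illustration, not the lecture's dataset), and the "learner" is an assumption: a majority vote per value of one attribute, with the overall majority as a fallback for unseen values.

```python
from collections import Counter, defaultdict

# Hypothetical instances: (attributes, concept). Values are illustrative only.
instances = [
    ({"outlook": "sunny", "humidity": "high", "windy": False}, "no"),
    ({"outlook": "sunny", "humidity": "high", "windy": True}, "no"),
    ({"outlook": "overcast", "humidity": "high", "windy": False}, "yes"),
    ({"outlook": "rainy", "humidity": "high", "windy": False}, "yes"),
    ({"outlook": "rainy", "humidity": "normal", "windy": False}, "yes"),
    ({"outlook": "rainy", "humidity": "normal", "windy": True}, "no"),
    ({"outlook": "overcast", "humidity": "normal", "windy": True}, "yes"),
]

def learn(instances, attribute):
    """Learn f by majority vote of the concept for each value of one attribute."""
    votes = defaultdict(Counter)
    for attrs, concept in instances:
        votes[attrs[attribute]][concept] += 1
    # Fallback for attribute values never seen in training.
    overall = Counter(c for _, c in instances).most_common(1)[0][0]
    table = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}

    def f(attrs):
        return table.get(attrs[attribute], overall)

    return f

f = learn(instances, "outlook")
# Generalisation: this exact combination of attributes is not in the table above.
prediction = f({"outlook": "overcast", "humidity": "high", "windy": True})
```

The point is only that f returns a concept for any set of attributes, including sets it has never seen; a real learner would use more than one attribute.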
- Supervised vs. unsupervised
- Supervised
- Receive labelled instances during training (each set of attributes is labelled with its concept)
- Learn the association f in concept = f(attributes)
- Unsupervised
- Receive unlabelled data and learn both the concept and the function only from the attributes
- Discover structure in a dataset (correlated features, groups, sequences, etc.)
- Discover latent variables that explain patterns in the observed instances
- Reduce dimensionality for a supervised learner (first stage of supervised learning)
- Example: Google search keywords
- Supervised train & test
- Goal: learn mapping from attributes to concepts (concept = f(attributes))
- Three steps
- Training: model sees many examples of attribute-concept pairs
- Model learns a trained function f() which produces a concept (e.g. as a probability distribution over labels)
- Testing: model sees a new set of attributes, predicts concept
- Evaluation: compare prediction to ground truth
- Probability models: see if future samples from the same distribution are well-predicted by the model
- Find error probability (frequency of mistake)
- Example
- Training: 20 images of animals and non-animals are given
- Testing: new images are given to be labelled.
- Evaluation: error probability is measured.
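The three steps above can be sketched as follows. Everything concrete here is an assumption for illustration: the 2-D feature vectors stand in for images, the data are made up, and a 1-nearest-neighbour rule is used as the model simply because it is short.

```python
def nearest_neighbour_predict(train, attributes):
    """Return the concept of the training instance closest to `attributes`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, concept = min(train, key=lambda pair: squared_distance(pair[0], attributes))
    return concept

# Training: labelled (attributes, concept) pairs; made-up feature vectors.
train = [
    ((1.0, 1.0), "animal"),
    ((1.2, 0.8), "animal"),
    ((4.0, 4.2), "non-animal"),
    ((3.8, 4.0), "non-animal"),
]

# Testing: new attributes; the ground truth is held back for evaluation.
test = [
    ((0.9, 1.1), "animal"),
    ((4.1, 3.9), "non-animal"),
    ((2.4, 2.4), "non-animal"),
]
predictions = [nearest_neighbour_predict(train, attrs) for attrs, _ in test]

# Evaluation: error rate = fraction of predictions that differ from ground truth.
mistakes = sum(pred != truth for pred, (_, truth) in zip(predictions, test))
error_rate = mistakes / len(test)
```

With these made-up points the third test instance is misclassified, so the measured error rate is 1/3.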
- Association learning
- Detect useful patterns, associations, correlations or causal relations between attributes or between attributes and the concept
- A “good pattern” is:
- A combination of attribute values where the presence of certain values strongly predicts the presence of other values
- Any kind of structure is considered interesting and there may be no “right” answer
- Evaluation can be difficult, potentially many possible association rules in one dataset
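One way to make "strongly predicts" concrete is to score a candidate rule by relative frequency. The sketch below assumes the standard support and confidence measures (these names are not used in the notes) and a made-up set of instances written as sets of attribute=value strings.

```python
def support(instances, items):
    """Fraction of instances that contain every attribute value in `items`."""
    items = set(items)
    return sum(items <= instance for instance in instances) / len(instances)

def confidence(instances, antecedent, consequent):
    """Relative-frequency estimate of P(consequent | antecedent).

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = set(antecedent) | set(consequent)
    return support(instances, both) / support(instances, antecedent)

# Hypothetical instances, for illustration only.
instances = [
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=normal", "play=yes"},
    {"outlook=rainy", "humidity=high", "play=yes"},
]

antecedent = {"outlook=sunny", "humidity=high"}
rule_support = support(instances, antecedent)                    # 0.5
rule_confidence = confidence(instances, antecedent, {"play=no"})  # 1.0
weaker_rule = confidence(instances, {"humidity=high"}, {"play=no"})
```

Even this four-row table admits many candidate rules, which is part of why evaluation is hard: there are far more rules to score than there are instances.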
Levels of analysis
Framework for understanding information-processing systems (cognitive science)
Marr’s levels of analysis and machine learning framework:
- Computational level
- What am I looking for?
- What is the goal of this system?
- Finding the model:
- What structure does this ML model expect to see in the world?
- What rule/pattern/model/etc. explains the data?
- Algorithmic level
- How will I do it? (e.g. drawing a circuit)
- How do you achieve the goal?
- Algorithms and data structures
- Finding the best fit of the data:
- Usually involves minimizing an error or loss function
- Implementational level
- Physical implementation (e.g. building a circuit)
- Writing Python code:
- How to find that best fit in finite time?
- Not always possible to solve exactly
Example: linear regression
- Computational: looks linear - try fitting a line
- Algorithmic: Linear regression, minimize squared error by tuning the slope (and intercept)
- Implementational: Linear algebra or gradient descent
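The two implementational routes can be compared on a small made-up dataset. This is a sketch under assumptions: a single attribute x, the closed-form least-squares solution standing in for the "linear algebra" route, and a hand-picked learning rate and step count for gradient descent.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Minimise mean squared error by repeatedly nudging slope and intercept."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of mean((slope * x + intercept - y) ** 2).
        grad_slope = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_intercept = 2 * sum(residuals) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Made-up data that "looks linear" (roughly y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

exact = fit_closed_form(xs, ys)
approx = fit_gradient_descent(xs, ys)
```

Both routes minimise the same squared error, so they agree up to numerical tolerance here; the learning rate is tuned to this data and would need adjusting for other inputs.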
- Even when models have the same goal (find clusters), they make very different assumptions, which lead to different results
- Changed parameter (assumptions) → different results
- Fewer assumptions != better model
- Models that make some assumptions to simplify the problem may find a better result
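To illustrate how a changed assumption changes the result, the sketch below clusters the same made-up 1-D points twice, assuming 2 and then 3 clusters. The choice of k-means (Lloyd's algorithm) is an assumption for illustration; the notes do not name a specific clustering model.

```python
def k_means_1d(points, k, iterations=20):
    """Lloyd's algorithm on 1-D data.

    Initial centres are evenly spaced picks from the sorted data.
    Assumes 1 <= k <= len(points).
    """
    ordered = sorted(points)
    centres = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in ordered:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters

# Made-up points with three visible groups.
points = [1.0, 1.2, 0.8, 4.0, 4.4, 3.6, 9.0, 9.4]

two_groups = k_means_1d(points, k=2)
three_groups = k_means_1d(points, k=3)
```

With k=2 the two lower groups are merged into one cluster; with k=3 they are separated. Same data and same goal, but a different assumed number of clusters gives a different answer.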
