FIT1043
Week 1
The data science process
1, Pitching ideas
Pitching data science projects to investors or managers.
2, Collecting data
Getting the data.
e.g. researchers preparing to X-ray a patient.
3, Integration
Data can come from many different sources.
4, Interpretation
Data can be described using a database schema.
5, Governance
(i) caring for the data and its subjects.
(ii) managing data standards and formats.
6, Engineering
Data engineers make the back end work.
7, Wrangling
Inspecting and cleaning the data.
8, Modelling
Proposing a conceptual / mathematical / functional model.
e.g. an analyst building models with their favorite tool.
Analysis, statistics and/or machine learning are applied to the data.
9, Visualisation
Visualizing data to interpret it and present results.
Choosing appropriate visualizations for the data. Many different options exist!
10, Operationalize
Putting the results to work.
Standard Value Chain

Week 5
Machine Learning

Supervised ML
All data is labelled and the algorithms learn to predict (infer) the output from the input data.
The goal is to approximate the mapping function from input (x) to output (y) so well that, given new input data (x), you can predict the output variable (y) for that data.
Examples
- Classification: The output variable is a category (e.g. “Red” or “Blue” in the fish classification example)
- Regression: The output variable is a real value (e.g. “dollars” or “weight”)
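
A minimal sketch of both cases in plain Python may help make the distinction concrete. The data values, the function names classify and fit_line, and the choice of 1-nearest-neighbour and least squares are assumptions made for illustration; they are not taken from the unit material.

```python
# Toy labelled data: (fish_length_cm, label). Values are invented for illustration.
train = [(10.0, "Red"), (12.0, "Red"), (30.0, "Blue"), (33.0, "Blue")]

def classify(x, data):
    """Classification: 1-nearest-neighbour, return the label of the closest training input."""
    nearest = min(data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def fit_line(xs, ys):
    """Regression: ordinary least squares for y = a*x + b with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Classification: the output is a category.
predicted_label = classify(11.0, train)

# Regression: the output is a real value (toy data lying on y = 2x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
a, b = fit_line(xs, ys)
predicted_value = a * 5.0 + b

print(predicted_label, predicted_value)
```

In both cases the labelled examples are used to approximate the mapping from x to y, and the learned mapping is then applied to an input that was not in the training data.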

Unsupervised ML
All data is unlabelled and the algorithms learn the inherent structure of the input data.
The goal is to model the underlying structure or distribution in the data in order to learn more about the data.
Example problems
- Face similarity detection
- Clustering: Discover the inherent groupings in the data (e.g. grouping customers by purchasing behavior)
- Association: Discover rules that describe large portions of your data (e.g. people that buy X also tend to buy Y)
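
To illustrate what a rule such as "people that buy X also tend to buy Y" means numerically, the sketch below computes the support and confidence of a rule over a handful of transactions. The transactions and the function names support and confidence are made up for illustration, and this is only the scoring step, not a full rule-mining algorithm.

```python
# Toy shopping baskets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule: people who buy bread also tend to buy milk.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A rule is usually considered interesting only when both numbers are high enough: support says how much of the data the rule covers, and confidence says how often it holds when it applies.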
Example Algorithms
- K-means for clustering problems.
- Apriori algorithm for association rule learning problems.
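
As a rough sketch of how K-means finds groupings without labels, the following plain-Python version clusters one-dimensional points. The data, the starting centroids, the fixed iteration count and the function name kmeans_1d are assumptions chosen for illustration; real uses would normally rely on a library implementation and multi-dimensional data.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Basic K-means on 1-D data: alternate between assigning points and updating centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Unlabelled toy data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the points, which is what distinguishes this from the supervised case.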