Fundamentals of Data Science
| Module title | Fundamentals of Data Science |
|---|---|
| Module code | EMGM001 |
| Academic year | 2025/6 |
| Credits | 30 |
| Module staff | Dr Tim Hughes (Lecturer) |

| Duration: Term | 1 | 2 | 3 |
|---|---|---|---|
| Duration: Weeks | 11 | | |

| Number students taking module (anticipated) | 50 |
|---|---|

Module description
This module develops core skills in data science, modelling and programming. The ability to extract information from data as a basis for evidence-based decision making and policy is becoming increasingly important across a wide variety of sectors in the world of big data, including climate, health, technology, and the environment. This module will equip you with the tools required to collate, import and manipulate data, together with methods for inference. You will be introduced to different types and sources of data and the tools for performing data analysis, from producing informative graphical summaries to generating sophisticated visualisations. These techniques are crucial both as a basis for communication and as a means of informing complex modelling. The material will be placed in a contemporary, cutting-edge setting through the use of locally curated and global open-source datasets, and will draw on the flexible and freely available programming environments of Python and R.
Module aims - intentions of the module
This module aims to equip you with the skills that are required to collect, collate, process, manipulate, analyse and interpret data effectively and efficiently. You will be introduced to techniques for importing many types of data from a range of sources into a format that is appropriate for further processing and analysis. You will learn how to merge information from multiple sources to develop greater insight, and you will learn how to pre-process data to enable the effective application of analysis techniques. This will include data cleansing, handling of missing, corrupted, uncertain and/or biased data, and the graphical representation of data. You will develop an appreciation of these concepts, and the ways in which their effects might be mitigated. This will enable you to communicate possible issues with the analysis of data when writing reports and making recommendations based on statistical analyses.
This module will also equip you with the skills that are needed to perform a range of data science and statistical analysis techniques, and to understand and interpret their outputs. This will include an introduction to the mathematical and statistical techniques underpinning data science, familiarisation with the open source scientific computing languages R and Python, and an overview of supervised and unsupervised machine learning methods.
You will be encouraged and supported to develop your data science skills alongside your specialism, exploring datasets relevant to ecology, evolution, environment, sustainability and/or renewable energy. Activities will include data wrangling, data analysis, report writing and presentation. Assessments will be based on a series of practical examples using real-world data that aim to demonstrate the full range of skills required to make effective use of data.
Intended Learning Outcomes (ILOs)
ILO: Module-specific skills
On successfully completing the module you will be able to...
- 1. Demonstrate the ability to import, manipulate and summarise data, including an understanding of the relative merits of different methods of formatting;
- 2. Demonstrate an understanding of how the source of data and the method of collection affect subsequent data analyses;
- 3. Demonstrate effective use of Python and/or R/RStudio to facilitate data wrangling and data analysis;
ILO: Discipline-specific skills
On successfully completing the module you will be able to...
- 4. Demonstrate effective and efficient data processing and programming skills;
- 5. Demonstrate competence in data visualisation;
- 6. Demonstrate an understanding of the methodology and practical use of a range of data analysis techniques, including unsupervised and supervised machine learning and statistical modelling methods;
- 7. Demonstrate an understanding of common pitfalls in data processing and analysis and how to avoid them;
- 8. Demonstrate appreciation and understanding of relevant datasets in application areas;
ILO: Personal and key skills
On successfully completing the module you will be able to...
- 9. Demonstrate data and statistical analysis skills;
- 10. Use Python, R/RStudio and other software;
Syllabus plan
The precise syllabus may vary slightly from year to year; the list below indicates the typical content.
- Data collection, pre-processing and communication:
  - Cleansing;
  - Visualisation;
  - Handling missing, corrupted, uncertain and/or biased data;
- Effective programming:
  - Coding in R/RStudio and Python;
  - Computer hardware;
  - Version control, collaborative and high-performance computing;
  - Reproducible programming;
- Analysis:
  - Fundamentals of probability, linear algebra and calculus;
  - Fundamentals of statistical modelling;
  - Sampling and sampled data;
  - Inference, confidence intervals, and hypothesis testing;
  - Regression analysis and model selection;
  - Spatial-temporal and hierarchical models;
  - Introduction to machine learning: supervised methods (e.g., classification and regression) and unsupervised methods (e.g., clustering and dimensionality reduction);
- Application areas:
  - Datasets for ecology and evolution: populations, infectious diseases, biodiversity, genetics;
  - Datasets for renewable energy: solar, wind, marine (resource and generation data), electricity/heat consumption, smart grid;
  - Datasets for environment and sustainability: sustainable development indices, health, weather and climate, land and marine pollution.
The assessment structure on this module is subject to review and may change before the start of the new academic year. Any changes will be clearly communicated to you before the start of term and if you wish to change module as a result of this you can do so in the module change window.
Learning activities and teaching methods (given in hours of study time)
| Scheduled Learning and Teaching Activities | Guided independent study | Placement / study abroad |
|---|---|---|
| 60 | 240 | |
Details of learning activities and teaching methods
| Category | Hours of study time | Description |
|---|---|---|
| Scheduled learning and teaching activities | 30 | Lectures and tutorials |
| Scheduled learning and teaching activities | 30 | Hands-on practical sessions |
| Guided independent study | 120 | Self-study and background reading |
| Guided independent study | 120 | Assessed data analyses, quizzes, report writing and preparation for presentations |
Formative assessment
| Form of assessment | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method |
|---|---|---|---|
| Exercises | Several quizzes/exercise sheets | 1-11 | Oral, during tutorial sessions |
| Practicals | Several practical sheets for self-directed and guided learning | 1-11 | Oral, during tutorial sessions |
Summative assessment (% of credit)
| Coursework | Written exams | Practical exams |
|---|---|---|
| 100 | 0 | 0 |
Details of summative assessment
| Form of assessment | % of credit | Size of the assessment (e.g. length / duration) | ILOs assessed | Feedback method |
|---|---|---|---|---|
| Exercises | 50 | Several quizzes/exercise sheets (4 expected) | 1-11 | Written, oral or automated feedback |
| Report | 50 | Approx. 10-15 pages | 1-12 | Written |
Details of re-assessment (where required by referral or deferral)
| Original form of assessment | Form of re-assessment | ILOs re-assessed | Timescale for re-assessment |
|---|---|---|---|
| Exercises | Coursework | 1-11 | To be agreed by consequences of failure meeting |
| Report | Coursework | 1-12 | To be agreed by consequences of failure meeting |
Re-assessment notes
Deferral – if you miss an assessment for certificated reasons judged acceptable by the Mitigation Committee, you will normally either be deferred in the assessment or be granted an extension. The mark given for a re-assessment taken as a result of deferral will not be capped and will be treated as it would be if it were your first attempt at the assessment.
Referral – if you have failed the module overall (i.e. a final overall module mark of less than 50%) you will be required to resubmit the original assessment as necessary. The mark given for a re-assessment taken as a result of referral will be capped at 50%.
Indicative learning resources - Basic reading
Basic reading:
- James, G., Witten, D., Hastie, T. and Tibshirani, R., An introduction to statistical learning, Springer, 2013.
- Rogers, S. and Girolami, M., A first course in machine learning, CRC Press, 2016.
- Murphy, K.P., Machine learning: a probabilistic perspective, MIT Press, 2012.
- Hastie, T., Tibshirani, R. and Friedman, J., The elements of statistical learning: data mining, inference, and prediction, Springer, 2009.
- Bishop, C.M., Pattern recognition and machine learning, Springer, 2006.
- Géron, A., Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems, O'Reilly Media, 2019.
- Raschka, S., Python machine learning, Packt Publishing Ltd., 2015.
Web-based and electronic resources:
- ELE – College to provide hyperlink to appropriate pages
Other resources:
- Recent articles and open-source codes provided by the tutors.
| Credit value | 30 |
|---|---|
| Module ECTS | 15 |
| Module pre-requisites | none |
| Module co-requisites | none |
| NQF level (module) | 7 |
| Available as distance learning? | No |
| Origin date | 01/05/2025 |