Big Data Management and Analysis in Linux

The growing availability of extremely large datasets requires scientists and analysts to use powerful supercomputers or computer clusters to store, manage, and analyze these data. These clusters typically run on Linux, which requires some programming skills and insights into suitable software packages. Our course will introduce you to programming in a Linux environment, teach you how to efficiently manage very large datasets (e.g. using sed, awk, and grep commands) and create simple shell scripts to analyze your data (e.g. using a Linux version of the freely available statistics program R). You will also learn how to visualize your data and results in customized plots and figures. These skills are extremely valuable for scientists from all disciplines as well as for business practitioners (e.g. consultants or financial analysts) who are planning to work with big data.

Session 14 July to 18 July 2020
Course level: Advanced Bachelor/Master, open to PhD staff and professionals
Co-ordinating lecturers: Dr. Aysu Okbay
Other lecturers: Richard K. Linnér
Forms of tuition: Interactive seminar, practicals
Form of assessment: Programming assignments
Credits: 3 ECTS
Contact hours: 45 hours
Tuition fee: €1150

This course is intended for scientists and data analysts from all disciplines, as well as business practitioners (e.g. consultants or financial analysts) who are planning to work with big data. If you have doubts about your eligibility for the course, please let us know. Our courses are multi-disciplinary and are therefore open to students with a wide variety of backgrounds.

The course will be fairly technical and includes many computer tutorials. There are no entry requirements other than a willingness to learn about programming in Linux, but a decent background in statistics, mathematics, and pr.

Each day consists of a three-hour lecture in the morning, followed by two hours of supervised work in computer tutorials in the afternoon. Both the lectures and the tutorials will be held in a computer room. The lectures will be interactive, with short examples that allow students to apply the concepts as they are introduced. In the tutorials, students will get more hands-on training in a supervised environment, with exercises covering the day’s topics and the opportunity to work on the assignments. The computer room will stay open to students for self-study after the tutorials.

Students are not required to bring their own laptops, but they are allowed to do so if they wish to work on their own computers.

By the end of this course, the student should understand and feel comfortable with:

  • Basic Linux programming
  • The Unix philosophy and environment; files, processes, pipes, filters and basic utilities
  • Login and logout procedures
  • File transfer between systems
  • Text file manipulation with sed, awk, cut, paste, cat, etc.
  • Basic text editing using the vim editor
  • Automation through functions, control structures and shell scripts
  • Version control with Git
  • Working with R through the Unix command line
  • Plotting in R
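
To give a flavour of the text-manipulation topics listed above, here is a minimal sketch of grep, awk, sed, and cut working together as filters. It is not taken from the course material: the file name samples.tsv, its columns, and its values are hypothetical and exist only to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical sample data: the file name, columns and values are made up
# for illustration and are not part of the course material.
workdir=$(mktemp -d)
data="$workdir/samples.tsv"
printf 'id\tgroup\tvalue\n' > "$data"
printf 's1\tcontrol\t4\n' >> "$data"
printf 's2\ttreated\t10\n' >> "$data"
printf 's3\ttreated\t6\n' >> "$data"
printf 's4\tcontrol\t8\n' >> "$data"

# grep as a filter: count the lines that mention the "treated" group.
n_treated=$(grep -c 'treated' "$data")

# awk: average the third column over the treated rows, skipping the header.
mean_treated=$(awk -F '\t' 'NR > 1 && $2 == "treated" { sum += $3; n++ }
    END { if (n > 0) print sum / n }' "$data")

# sed, cut and tail in a pipeline: shorten a label, keep the first two
# columns, and drop the header line.
relabelled=$(sed 's/control/ctrl/' "$data" | cut -f 1,2 | tail -n +2)

echo "treated rows: $n_treated"
echo "treated mean: $mean_treated"
echo "$relabelled"

# Clean up the temporary directory.
rm -r "$workdir"
```

Each command reads text and writes text, so the steps can be chained with pipes. The same pattern applies to files far too large to open in a spreadsheet, because these tools process their input line by line.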

The course includes a visit to the SURFsara computer facilities at Amsterdam Science Park.

To be announced.

Do you want to make the most out of your summer? You can combine this course with the following courses in session 2: