Nesime Tatbul Bitim: Catalogue data in Autumn Semester 2009

Name: Dr. Nesime Tatbul Bitim
Field: Informatics
Department: Computer Science
Relationship: Assistant Professor (Tenure Track)

Number / Title | ECTS | Hours | Lecturers
251-0807-00L Information Systems Laboratory | 10 credits | 9P | M. Norrie, D. Kossmann, N. Tatbul Bitim
In the Master Programme, a maximum of 10 credits can be earned through Labs on top of the Interfocus Courses. Additional Labs will be listed on the Addendum.
Abstract: The purpose of this laboratory course is to practically explore modern techniques for building large-scale distributed information systems. Participants work in groups of three or more students and develop projects in several phases. The course is offered in both Fall and Spring semesters.
Objective: Students will gain experience working with technologies used in the design and development of information systems.
252-0201-00L Information Systems | 6 credits | 3V + 2U | N. Tatbul Bitim
Abstract: The course extends the basic concepts of relational data management introduced in an earlier course to examine the internals of the architecture, implementation, and optimization of a relational database system. These include indexing, query processing and optimization, transactions and recovery, and performance tuning and benchmarking.
Objective: The goal of this course is to understand the internals of the architecture, implementation, and optimization of a relational database system.
252-3001-00L Advanced Topics in Information Systems | 2 credits | 2S | D. Kossmann, M. Norrie, N. Tatbul Bitim
Abstract: This seminar course will discuss research topics in the area of information systems. We will read recent research papers on a selected topic and present and discuss them in class. The course is offered every Fall semester.
Objective: The goal is to introduce students to current research and to enable them to read, understand, and present scientific papers.
252-3500-06L Information and Communication Systems | 2 credits | 2S | G. Alonso, D. Kossmann, T. Roscoe, N. Tatbul Bitim
Does not take place this semester.
Abstract: The seminar deals with a current topic in distributed information systems. Students are expected to attend the entire seminar and to choose a topic for presentation (either a collection of research papers, or a description and/or evaluation of a concrete system or product). Students are evaluated on the knowledge gained, the presentation given, and the report they submit at the end of the semester.
Objective: In this edition (HS 2008), the seminar course will look at new architectures for data processing systems brought about by recent trends in hardware design, such as multi-core and parallel processing.
263-3000-00L Massively Parallel Data Analysis with MapReduce | 5 credits | 2V + 2A | D. Kossmann, G. Alonso, T. Roscoe, N. Tatbul Bitim
Does not take place this semester.
Abstract: The purpose of this course is to teach students how to carry out massively parallel data analysis, using MapReduce as the programming abstraction and Hadoop running on top of a (large) cluster of machines, in order to gain hands-on experience and solve real problems.
Objective
Content: Many applications involve the processing and analysis of huge amounts of data. Typical examples are Web-scale search engines (such as Google, MSN, or Yahoo), newer Web applications such as Flickr or Google Maps, and scientific applications (e.g., in the life sciences or physics). A typical analysis would, for instance, detect certain behavior patterns in a Web log or star constellations in telescope images.

Given the amounts of data that need to be analyzed, parallelization on large clusters of machines is a must in order to get acceptable response times. The idea is to partition the data into "chunks" and process a large set of chunks in parallel. The first large-scale implementation of this idea on thousands of machines was built by Google using the so-called MapReduce paradigm. MapReduce is a programming framework designed for the analysis of masses of data. Its implementation makes use of the Google File System (GFS), a distributed file system designed to store petabytes of data on thousands of machines.
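The chunk-parallel map/reduce flow described above can be sketched in plain Python. This is a hypothetical single-machine illustration of the paradigm only: in a real MapReduce or Hadoop job, the programmer supplies the map and reduce functions, and the framework runs them in parallel across many machines and handles the shuffle in between.

```python
from collections import defaultdict

def map_phase(chunk):
    # Map step: emit one ("word", 1) pair per word in the chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Shuffle step: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce step: aggregate the values for one key; here, sum the counts.
    return key, sum(values)

# The input is partitioned into "chunks"; on a real cluster each chunk
# would be processed by a map task on a different machine in parallel.
chunks = ["the quick brown fox", "the lazy dog", "the fox"]
all_pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(all_pairs).items())
print(counts["the"])  # -> 3
```

Word count is the classic introductory MapReduce example because each chunk can be mapped independently and the reduce step is a simple associative aggregation.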

Recently, Yahoo and the Apache Foundation launched an open-source implementation of MapReduce and a distributed file system. This implementation is called Hadoop and has been shown to scale up to 2000 machines. Google is establishing a data center for academic use with 1000 machines that operates using Hadoop. This data center can potentially be used to run programs as part of this course.

The purpose of this course is to teach students how to carry out massively parallel data analysis, using MapReduce as the programming abstraction and Hadoop on top of a (large) cluster of machines, in order to gain hands-on experience and solve real problems. The course will have two parts:

a.) Six weeks of classes in order to understand the underlying technology (the distributed file system, scheduling in warehouse-size data centers, and the Sawzall programming language used in the MapReduce framework).
b.) Projects: solving a big data analysis problem (e.g., Web log mining or discovering intelligent life in space).