Book “Introduction to Data Technologies” by Paul Murrell available under a CC license
Paul Murrell has published a book of just under 400 pages, titled “Introduction to Data Technologies”, under a CC license (available both as HTML and as PDF). He describes his aim as follows:
The basic premise of this book is that scientists are required to perform many tasks with data other than statistical analyses. A lot of time and effort is usually invested in getting data ready for analysis: collecting the data, storing the data, transforming and subsetting the data, and transferring the data between different operating systems and applications.
Many scientists acquire data management skills in an ad hoc manner, as problems arise in practice. In most cases, skills are self-taught or passed down, guild-like, from master to apprentice. This book aims to provide a more structured and more complete introduction to the skills required for managing data.
The focus of this book is on computational tools that make the management of data faster, more accurate, and more efficient. The intention is to improve the awareness of what sorts of tasks can be achieved and to describe the correct approach to performing these tasks and there is an emphasis on working with data technologies via written computer languages.
I came across it via Andrew Gelman's blog; he objects to the term “data technologies” and prefers “data management”. Be that as it may, a look at the core chapters quickly makes clear where things are headed: Writing computer code, HTML, CSS, Data Entry, HTML Forms, Data Storage, XML, Data Queries, SQL, Data Crunching, R, Regular Expressions.
I know Paul Murrell, among other things, as the author of the book “R Graphics”.

