R or Python: Free Books for Learning

The one thing they love more than a hero is to see a hero fail, fall, die trying. In spite of everything you’ve done for them, eventually, they will hate you [Spider-Man].

– Green Goblin / Norman Osborn

R Vs. Python - by Roopam

Batman v Superman: Dawn of Justice will be released in March 2016. It will be great to see these two superheroes battle it out on the screen. Both were introduced through comic books in the late 1930s by DC Comics, and both fight crime and criminals. However, in over 75 years they have developed into contrasting characters. They are as different as day and night. Superman represents the bright, sunny side of life, while Batman represents the dark, chilling nights. Notably, Superman gets his superpowers from the sun, while the fear of bats and dark nights is the source of Batman's power.

R vs Python – Superheroes

Let us continue with the theme of contrasting superheroes with a common mission. This time we will make the superheroes of data scientists compete against each other: R vs Python. The idea of this article is to explain the superpowers of both R and Python, and also to suggest books for learning them. Most of these books are available online for free for the purpose of evaluation, and I will share those links here. To explain the superpowers of R and Python, let me draw a few connections between them and the DC Comics superheroes.

You may find it unusual but I see a few similarities between R and Batman. Moreover for me, Python and Superman have some things in common as well. Let me create a table to list these similarities.

Analysis Tool: R
Similar Superhero: Batman
Super Powers in Common:
  • Detective Work
  • Intelligence
  • Cunning
  • Usage of Tools
  • More Brain than Muscles

Analysis Tool: Python
Similar Superhero: Superman
Super Powers in Common:
  • Muscle Power
  • Super Strength
  • Elegance
  • Wide Range
  • More Muscles than Brain

Let me try to explain the reasons for these distinctions between R and Python in the next segment. Also, let us figure out a good approach for data scientists while using these languages.

R vs Python / R and Python : Which is a Good Approach?

Both R and Python are open-source, free-to-use, high-level programming languages. R was developed specifically for statistical computing. It has plenty of add-on packages and tools to support machine learning and data analysis. Python, on the other hand, is a powerful general-purpose programming language with special applications in data preparation, data munging, and data analysis.

This distinction is also the reason for different communities of analysts to prefer either of these languages. Python is often preferred by computer-programmers trying to develop skills in number crunching and analysis. On the other hand, R is preferred by mathematicians and statisticians. This difference is glaring in the learning resources (books and online) for these languages. For instance, consider the following four books for R available online for free (click on the books to read them for free). 

YOU CANalytics Book Rating 5 Stars (5 / 5) – for all the 4 books mentioned below

Elements of Statistical Learning

An Introduction to Statistical Learning


Doing Bayesian Data Analysis 1

All these books are high-quality statistical texts with R as the preferred language. These are just a few examples. Please note that the first book does not use R, but it is by the same authors as the second book. You will rarely find books of this nature with Python as the preferred language. Hence, R is much better equipped to tackle data mining and statistical analysis problems. On the other hand, Python provides great applications for working with unstructured and complicated datasets such as images, written text (web, emails, etc.), genomics, and sound.

In essence, Python and R together complete the toolkit for a data scientist. Hence, for a pragmatic and application oriented data scientist it is essential to understand the super-powers and qualities of both these languages.

R Qualities

Use R for analysis, data visualization, and modeling.

  • Offers great flexibility for analysis
  • Makes it easy to think while doing your analysis
  • Constant upgrades and enhancements of analysis packages because of a highly active community in statistics and mathematics
  • Exceptional data visualization tools

Python Qualities

Use Python for data preparation and data munging, especially for unstructured data such as web pages, images, and text.

  • Great flexibility and ability to extract information from free text, websites, and social media sites
  • Good at mining images and preparing data for analysis
  • Can handle large volumes of data better than R

For a serious data scientist, it is a good idea to have some functional knowledge of both R and Python. Hence, a practical approach is to think of them together as R and Python – instead of R vs Python. In the following section I will suggest books and online resources for both R and Python.

R – Books and Online Resources

In one of the earlier articles on YOU CANalytics I suggested many books and online resources for learning R. I have recently added links to PDF files of the R books to that article, so I suggest you revisit the post even if you have read it before. You can find it at the following link – Learn R : 12-books (Free PDFs) and Online Resources.

R and Python – Books and Online Resources

This book uses both R and Python for marketing analytics. It is rare to find books that use both languages.

Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python – Thomas W. Miller

YOU CANalytics Book Rating 4.7 Stars (4.7 / 5)

“When I prepare data for analysis or work on the web, I use Python. For modeling or graphics, I often use R” – this statement by the author of the book summarizes the way data scientists want to use R and Python. This is an excellent book for learning marketing analytics. It covers all the major data science activities for marketers, including pricing, promotion, product design, and recommendation. However, before you reach for this book, make sure you have some functional knowledge of either R or Python.

Partial Google Book

Python – Books and Online Resources

Now let me introduce a few books I have found useful to learn Python. I have divided these books into four different categories based on their utilities. These books will be presented under the following categories:

  1. Books: Python for General Purposes of Data Science
  2. Books: Python for Specialized Applications in Data Science
  3. Books: Python for Text Analytics
  4. Books: Python for Image Analytics

Also, be prepared to see some exotic animals on the cover pages of almost all the books to follow.

1. Books: Python for General Purposes of Data Science

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython – Wes McKinney

YOU CANalytics Book Rating 4.2 Stars (4.2 / 5)

As I mentioned earlier, Python is excellent when it comes to data preparation, data munging, and data wrangling. This is a good book to start learning these skills. The friendly IPython interface is used for coding throughout the book, which makes it easy for beginners and non-programmers to follow. Additionally, the book provides a good working knowledge of NumPy and Pandas.

Read Full PDF: Python for Data Analysis

Data Science from Scratch: First Principles with Python – Joel Grus

YOU CANalytics Book Rating 4 Stars (4 / 5)

This book has a much more balanced approach to theory and programming than most other books on the market that use Python as the language of choice. I still feel there are many better books based on R for learning the machine learning and statistical aspects of data science. However, if you want to learn these topics through Python, ‘Data Science from Scratch’ is not a bad book to start with.

Read Full PDF: Data Science from Scratch

2. Books: Python for Specialized Applications in Data Science

Programming Collective Intelligence: Building Smart Web 2.0 Applications – Toby Segaran

YOU CANalytics Book Rating 5 Stars (5 / 5)

This is a wonderful book for the following reasons: it is brilliantly written, fun to read, makes the reader think, and is quite practical. While reading this book you can easily tell that the author loves his subject. Collective intelligence is about making decisions through the wisdom of the crowd instead of one expert opinion. The book introduces practical approaches to extracting this knowledge from the web. Given that the book was written in 2007, some of the code in it is outdated. However, the underlying principles and ideas are extremely relevant and will continue to be so. I strongly recommend that you read this book.
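To give a flavour of the wisdom-of-the-crowd idea, here is a minimal sketch of a crowd-based recommendation in plain Python. It is not code from the book: the ratings, the names, and the helper functions `similarity` and `recommend` are hypothetical, and the distance-based similarity score is just one simple choice among many.

```python
from math import sqrt

# Hypothetical ratings (1 to 5) that a few people gave to some films.
ratings = {
    "Asha": {"Batman": 5.0, "Superman": 3.0, "Spider-Man": 4.0},
    "Ben": {"Batman": 4.5, "Superman": 3.5, "Spider-Man": 4.0, "Hulk": 2.0},
    "Chen": {"Batman": 1.0, "Superman": 5.0, "Hulk": 5.0},
}


def similarity(prefs, a, b):
    """Score in (0, 1]; 1 means identical ratings on the shared items.
    Returns 0.0 when the two people have no items in common."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def recommend(prefs, person):
    """Rank items the person has not rated by a similarity-weighted
    average of everyone else's ratings (the 'crowd' opinion)."""
    totals, weights = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, score in prefs[other].items():
            if item not in prefs[person]:
                totals[item] = totals.get(item, 0.0) + score * sim
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)


print(recommend(ratings, "Asha"))
```

Because Asha's tastes are closer to Ben's than to Chen's, Ben's low rating for the unseen film counts for more in the predicted score than Chen's high one.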

Read Full PDF : Programming Collective Intelligence (Use the first link in the Google Search)

Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More – Matthew A. Russell

YOU CANalytics Book Rating 4.8 Stars (4.8 / 5)

Are you interested in mining social media sites? Twitter, Facebook, LinkedIn, Google+: this book has a chapter on extracting information from each of these sites and more. It is especially good for extracting information from Twitter. However, I must offer a word of caution: the APIs for these social media sites change quite regularly, so you will hit a roadblock a few times while using the code from this book. I suggest you buy the latest edition and refer to the internet during your practice sessions.

Read Full PDF : Mining the Social Web (1st Edition)

3. Books: Python for Text Analytics

Text Processing in Python – David Mertz

YOU CANalytics Book Rating 4.5 Stars (4.5 / 5)

One of the most complicated problems in machine learning is extracting meaning from free-flowing text through algorithms. This book introduces you to the wonderful world of text processing in an intuitive fashion. You will learn about string functions and operations, regular expressions, text parsing, and more. This is a great book to start your text processing journey. Notice that there are no animals on the cover of this book – how fascinating!
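As a small taste of these topics, the sketch below uses only Python's standard library to apply string methods, a regular expression, and a simple word count to a piece of free text. The sample message and the email addresses in it are made up for illustration, and the email pattern is deliberately simple rather than a complete validator.

```python
import re
from collections import Counter

# Hypothetical free-flowing text: a short email-like message.
text = """Hi team, please mail roopam@example.com or batman@example.org.
Python is great for text; R is great for statistics. Python, python, PYTHON!"""

# String methods: split the message into lines and trim whitespace.
lines = [line.strip() for line in text.splitlines()]

# Regular expression: pull out anything that looks like an email address.
# The pattern is for illustration only, not a full address validator.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

# Simple parsing: tokenise into lowercase words and count them.
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

print(emails)
print(counts.most_common(3))
```

Lowercasing before counting is what lets "Python", "python", and "PYTHON" be treated as the same word.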

Read Full PDF: Text Processing in Python

Natural Language Processing with Python – Steven Bird and Ewan Klein

YOU CANalytics Book Rating 3.8 Stars (3.8 / 5)

The book can be considered as a manual for Python NLTK (Natural Language Toolkit). NLTK is a powerful toolkit to implement natural language processing (NLP) i.e. make machines understand human languages. This book doesn’t cover the theoretical depth and nuances of NLP which is a bit frustrating. However, this is still a good book to learn NLTK.

Read Full PDF: Natural Language Processing with Python

4. Books: Python for Image Analytics

Programming Computer Vision with Python – Jan Erik Solem

YOU CANalytics Book Rating 4.3 Stars (4.3 / 5)

A greyscale digital image is just a large matrix of numbers holding pixel information. A color image has 3 such matrices, one for each RGB (red-green-blue) level. A Full HD widescreen TV has image dimensions of 1920 x 1080 pixels. A color image with these dimensions needs over 6 million numbers (1920 x 1080 x 3) to represent the RGB levels of its pixels. Now, if you want to learn more about manipulating image matrices and image processing, read this book. It is a gentle introduction to computer vision. The question is, can the computer see the world the way you and I see it?
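The matrix view of an image can be sketched in a few lines of plain Python. This is an illustrative example rather than code from the book: the 2 x 2 image, the helper names `to_greyscale` and `value_count`, and the greyscale weights (the common ITU-R BT.601 luma weights) are assumptions, and the size calculation assumes a Full HD frame of 1920 x 1080 pixels.

```python
# A tiny hypothetical 2 x 2 colour image: each pixel is an (R, G, B) triple
# with levels from 0 to 255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_greyscale(rgb_image):
    """Collapse each RGB triple into one number using the common
    ITU-R BT.601 luma weights (an assumption; other weightings exist)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def value_count(width, height, channels=3):
    """How many numbers are needed to store an image of this size."""
    return width * height * channels


grey = to_greyscale(image)
print(grey)                     # one number per pixel instead of three
print(value_count(1920, 1080))  # numbers in one Full HD colour frame
```

Real image libraries store these matrices as arrays rather than nested lists, but the idea is the same: every image operation is arithmetic on numbers.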

Read Full PDF: Programming Computer Vision with Python

Learning OpenCV: Computer Vision with the OpenCV Library – Gary Bradski & Adrian Kaehler

YOU CANalytics Book Rating 4.8 Stars (4.8 / 5)

Computer vision is a fascinating topic, as mentioned earlier. While we see a picture of a butterfly, computers see matrices of numbers. The question is how to make the computer identify the butterfly within those pixel numbers. OpenCV (Open Source Computer Vision Library) is a powerful C/C++-based library that has answers to this question, and it can be called from Python for image processing. This book is a great introduction to OpenCV. It is a must-read if you want to learn image processing and image analytics.

Read Full PDF: Learning OpenCV

Sign-off Note

The one thing they love more than a hero is to see a hero fail, fall, die trying. In spite of everything you’ve done for them, eventually, they will hate you [Spider-Man].

– Green Goblin / Norman Osborn

I guess we do love to see our heroes fail. Why else would we make them compete against each other? I don’t know the reason for this. Possibly, as a race we are sadistic creatures. Possibly, we are just jealous of people better than us. Possibly, we love sadness despite our claims about our love for happiness. Possibly, we relate to the demons these superheroes fight. Possibly, we believe in the futility of life.

All the above reasons are only half-truths to me. For me, a more likely reason is that we have both day and night inside us. Some days it is bright and sunny for us, and on other days it is pitch-dark. Let us embrace the grayness of life. In the same breath, let us embrace both Python and R with their individual shortcomings. Let’s stop pulling our superheroes down.

Source: ucanalytics.com
