Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 8:41am On Aug 29, 2016 |
Hello NL, I will attempt to share the little I know about "Data Science" using the Python programming language. The tutorial will walk you through a typical Data Science problem where we will scrape data from the web, then analyze and visualize the dataset to extract value from it. I am not sure if anyone out there will be interested?

Here is the content of the tutorial:-
1) Introduction
2) Web Data Scraping
3) Data Cleaning
4) Data Analysis and Visualization

PS: At the end of this tutorial, you will be able to pick a small dataset available online and, using Python, quickly calculate descriptive statistics and show the results with basic charts and tables. |
Re: Data Science Tutorial For Beginners With Python Programming Language by Nasa28(m): 8:42am On Aug 29, 2016 |
following |
Re: Data Science Tutorial For Beginners With Python Programming Language by Favorite1: 8:48am On Aug 29, 2016 |
Following |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 8:56am On Aug 29, 2016 |
OK, let's start with an introduction. |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 9:36am On Aug 29, 2016 |
[size=20pt]Introduction[/size]
Even though this tutorial is for beginners, I will assume some basic knowledge of the following:-
~ Mathematics/Statistics
~ Use of a computer
~ Python programming

[size=15pt]What is "Data Science"?[/size]
There are several definitions of "Data Science". I am going to use a simple one: "Data Science involves extracting and interpreting data effectively and presenting it in a simple, non-technical language to the end-users" (source: Edureka.co).
According to the data science Venn diagram on Drewconway.com, data science lies at the intersection of:
• Hacking skills
• Math and statistics knowledge
• Substantive expertise
There’s a joke that says "a data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician".

[size=15pt]Tutorial Tool[/size]
There are several tools for data science, including:-
~ Python
~ R
~ MATLAB
~ SAS
~ Julia
~ SQL
~ RapidMiner
~ DataRobot
~ Weka
~ SPSS
Any tool you choose is good. However, I will use the Python programming language for this tutorial, because it is one of the top data science tools out there for crunching data. It is closely followed by R.

[size=15pt]Tutorial Dataset[/size]
The dataset we are going to use for this tutorial is the birthday list on the NairaLand homepage (i.e. NairaLand Forum Members' Birthday Data).
[img]http://2.bp..com/-FrimAg6bIts/V7XZg_rir3I/AAAAAAAABFY/JKfvcnRiHusAEWTTfF7Rytj0fWQ2duvVACLcB/s1600/Sample_NL_Birthdays.bmp[/img]
Using the Birthday dataset we will attempt to answer questions like:-
a) How many members are celebrating their birthdays today?
b) Who are the oldest and youngest members celebrating their birthdays today?
c) What is the average age of the celebrants?
d) How old will each celebrant be in 10 years?
e) How old was each celebrant when NairaLand was established?

[size=15pt]Python Packages[/size]
Since we are using Python, let me list the packages/libraries needed for this tutorial and what we are going to use them for.
~ re, requests, BeautifulSoup: libraries for scraping and cleaning the data
~ pandas, datetime: libraries for analyzing and visualizing the data
You need to have them installed in your Python distribution using "pip install package_name" (BeautifulSoup is published on pip as beautifulsoup4; re and datetime ship with Python and need no installation). Or get a Python distribution (such as Anaconda Python or Enthought Canopy) that has all the packages installed by default.

That is it for this class. Next we will look at how to scrape the birthday data from the Nairaland home page. |
Re: Data Science Tutorial For Beginners With Python Programming Language by Stconvict(m): 1:10pm On Aug 29, 2016 |
Following... |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 2:14pm On Aug 29, 2016 |
I forgot to mention the applications or uses of Data Science. Here are some of them:-
~ Internet Search
~ Digital Advertisements (targeted advertising and re-targeting)
~ Recommender Systems
~ Image Recognition
~ Speech Recognition
~ Gaming
~ Price Comparison Websites
~ Airline Route Planning
~ Fraud and Risk Detection
~ Delivery Logistics
~ Self-Driving Cars
~ Robots
Apart from the applications mentioned above, data science is also used in Marketing, Finance, Human Resources, Health Care, Government Policy and virtually every industry where data gets generated. Read more at Analytics Vidhya. |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 7:11am On Aug 30, 2016 |
[size=20pt]Web Data Scraping - Scrape birthday data from Nairaland home page[/size]
In this class, we are going to collect our dataset for this tutorial. Recall that our dataset is the birthday list of Nairaland members on the home page. The home page URL is: https://www.nairaland.com/home. If you scroll down the page, you will see the list of members having their birthday today!

Fine, we now know where our data is located on the Web, so we need to collect it for Data Science purposes. We could simply go to the web page, then copy and edit the birthday list for our analysis. But since we are going to collect this birthday dataset over a long period of time (probably one year) on a daily basis, copying and editing the list by hand will not be efficient. So we need a way to automate the boring process by doing what is known as web scraping.

Web scraping is the term for using a program to download and process content from the Web. The technique mostly focuses on transforming unstructured data on the web (HTML format) into structured data (a database or spreadsheet). For example, Google runs many web scraping programs to index web pages for its search engine.

[size=15pt]Understanding the Birthday dataset[/size]
Before we start collecting the dataset, let's try to understand its nature. The birthday list is in this format: rodbel(29), Sirolad(29), mokei(27)...
Each entry is the username of the member followed by his or her age in parentheses. That is: member_username(age).
The format above isn't friendly for Data Science. Tabular datasets are more suitable, so we need to clean it into a tabular format that is useful in Python.
Note: If you inspect the HTML of the birthday list, you should see that it is contained in a cell of a table tag (<td> ... </td>).
In summary: we want to scrape data in the format "rodbel(29), Sirolad(29), mokei(27)" into a tabular format.

[size=15pt]Warning before Scraping a website[/size]
There are a few points that we need to go over before we start scraping.
~ Always check the website’s terms and conditions before you scrape it. They usually have terms that limit how often you can scrape or what you can scrape.
~ Because your script will run much faster than a human can browse, make sure you don’t hammer the website with lots of requests. This may even be covered in the terms and conditions of the website.
~ You can get into legal trouble if you overload a website with your requests or you attempt to use it in a way that violates the terms and conditions you agreed to.
~ Websites change all the time, so your scraper will break some day. Know this: you will have to maintain your scraper if you want it to keep working.
~ Unfortunately the data you get from websites can be a mess. As with any data parsing activity, you will need to clean it up to make it useful to you.

With that out of the way, let’s start scraping!
Let's extract the birthday list first using these Python modules: re, requests, and BeautifulSoup. Then in the next class, we will clean the dataset into a tabular format. The code for this step starts by importing the libraries we are going to use.
The completed code is available in a Jupyter/IPython notebook at: http://nbviewer.jupyter.org/github/forum2k9/NairaLand-Members-Birthday-Data/blob/master/NairaLand_Members_Birthday_Data.ipynb
In the next class, we will clean the data into a friendly (tabular) format.
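For readers who cannot open the notebook, here is a minimal sketch of the extraction step using the three modules named above. It is not the notebook's code. The thread does not show the exact markup of the home page, so the way the birthday cell is located (the shortest <td> whose text mentions "Birthdays" and contains at least one username(age) entry) is an assumption, and the function names and the sample HTML are made up for illustration. Inspect the live page and adjust before relying on it.

```python
import re

import requests
from bs4 import BeautifulSoup

HOME_URL = "https://www.nairaland.com/home"

# One entry such as "rodbel(29)": a username followed by an age in parentheses.
ENTRY_PATTERN = re.compile(r"([^\s,()]+)\((\d+)\)")


def fetch_html(url=HOME_URL):
    """Download the page once; raises an exception on an HTTP error status."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def extract_birthday_text(html):
    """Return the text of the table cell holding the birthday list, or ""."""
    soup = BeautifulSoup(html, "html.parser")
    candidates = []
    for cell in soup.find_all("td"):
        text = cell.get_text().strip()  # text only, tags dropped
        # Assumption: the birthday cell mentions "Birthdays" and has entries.
        if "Birthdays" in text and ENTRY_PATTERN.search(text):
            candidates.append(text)
    # With nested tables an outer cell also matches, so keep the shortest one.
    return min(candidates, key=len) if candidates else ""


# Hypothetical markup standing in for the real page, so this runs offline.
SAMPLE_HTML = """
<table>
  <tr><td>Some other cell</td></tr>
  <tr><td><b>Birthdays:</b> <a href="/rodbel">rodbel</a>(29), <a href="/sirolad">Sirolad</a>(29), <a href="/mokei">mokei</a>(27)</td></tr>
</table>
"""

print(extract_birthday_text(SAMPLE_HTML))
```

Against the live site you would call extract_birthday_text(fetch_html()) once a day, in line with the warnings above about not hammering the website.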
|
Re: Data Science Tutorial For Beginners With Python Programming Language by noordean(m): 10:40am On Aug 30, 2016 |
Good job boss.
Please where is the Nairaland's terms and conditions located?
can't see it |
Re: Data Science Tutorial For Beginners With Python Programming Language by kingofthejungle(m): 1:22pm On Aug 30, 2016 |
u have a nice blog though |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 1:39pm On Aug 30, 2016 |
noordean: I also searched but couldn't find it, except for the posting rules. |
Re: Data Science Tutorial For Beginners With Python Programming Language by tajoo: 2:53pm On Aug 30, 2016 |
thanks umar...im totally a novice but i find it interesting |
Re: Data Science Tutorial For Beginners With Python Programming Language by Fedric(m): 7:42pm On Aug 30, 2016 |
umaryusuf: bro, what programming language can I use to create a forum like Nairaland? Python or PHP? |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 5:46am On Aug 31, 2016 |
Before I post today's class on Data Cleaning, let me mention this: the web scraping technique explained above can be used to extract virtually any kind of data from the web. For example, you can extract phone numbers or email addresses from a web page; all you have to do is define the exact pattern of what you intend to extract. Weather data, stock data, social media data etc. can all be scraped from the web. |
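As a small illustration of "defining the exact pattern", here is a sketch that pulls email addresses out of a piece of text with the re module. The pattern is deliberately simple and is only an assumption about what counts as an email address; it will not cover every valid one. The sample text and the function name are made up.

```python
import re

# A simplified email pattern: name@domain.tld (not a full RFC-compliant matcher).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email-like strings in the text, in order of appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


sample = "Contact ade@example.com or support@example.org. Again: ade@example.com"
print(extract_emails(sample))
```

The same idea applies to phone numbers or any other data with a regular shape: only the pattern changes.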
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 7:35am On Aug 31, 2016 |
[size=15pt]Data Cleaning - Clean the data into a friendly format[/size]
Let's clean our data by stripping out all irrelevant text and keeping only the birthday list in the format: Username, Age, to be saved in a CSV file. The first step is to read out only the text, ignoring the table cell tags around it.

After you have collected the dataset for months, you can merge all the CSV files into one using the pandas concat() function. concat() takes a list of DataFrames (one per CSV file) to be merged together.

As mentioned earlier, the complete source code is in the IPython notebook at: http://nbviewer.jupyter.org/github/forum2k9/NairaLand-Members-Birthday-Data/blob/master/NairaLand_Members_Birthday_Data.ipynb
In the next class, we will discuss how to analyze and visualize the dataset.
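A minimal sketch of the cleaning step, turning the raw "username(age)" text into Username, Age rows and writing them as CSV. It is not the notebook's code: it uses only the standard re and csv modules, the function names are made up, and it assumes usernames contain no spaces, commas or parentheses. Entries without an age in parentheses are simply skipped.

```python
import csv
import io
import re

# One entry such as "rodbel(29)": a username followed by an age in parentheses.
ENTRY_PATTERN = re.compile(r"([^\s,()]+)\((\d+)\)")


def parse_birthday_list(raw_text):
    """Turn 'rodbel(29), Sirolad(29)' into [('rodbel', 29), ('Sirolad', 29)]."""
    return [(name, int(age)) for name, age in ENTRY_PATTERN.findall(raw_text)]


def write_csv(rows, file_obj):
    """Write the (username, age) rows under a Username,Age header line."""
    writer = csv.writer(file_obj)
    writer.writerow(["Username", "Age"])
    writer.writerows(rows)


raw = "rodbel(29), Sirolad(29), mokei(27)"
rows = parse_birthday_list(raw)

# io.StringIO stands in for a real file; on disk you would use something like
# open("birthdays.csv", "w", newline="") (the file name is hypothetical).
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Saving each day's list under a different file name (for example, with the date in it) keeps the daily files separate until you merge them.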
|
Re: Data Science Tutorial For Beginners With Python Programming Language by Fedric(m): 8:03am On Aug 31, 2016 |
great job, bro. your student is fully present. |
Re: Data Science Tutorial For Beginners With Python Programming Language by Stconvict(m): 11:07pm On Aug 31, 2016 |
Thanks Umar! |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 6:22am On Sep 01, 2016 |
No one has asked any question so far! Well, that means there is no problem with the classes above.

We had to go through the process of data scraping/collection and cleaning because of the nature and location of our dataset. "We have to cook our data before we eat it". In cases where you already have cooked data, you won't need to go through the collection and cleaning process.

So at this point, each day you run the script, it will extract the day's Nairaland members' birthday list and save it in a CSV file. By the time you have run the script for one week, you will have 7 different CSV files containing the birthday lists for those seven days. By the time you have run it for one year, you will have enough data to make analyses, visualizations and predictions.

In the last class, we will do some basic analyses and visualizations.
[img]http://4.bp..com/-zYjoae56TRo/V7dYxe7rYoI/AAAAAAAABF4/2xP5Q--oNqM1HdwDQJbEQDLI6Fp69kl0ACLcB/s1600/areaplot.png[/img]
To analyze and visualize our data, below are some of the questions we are going to answer:-
a) How many members are celebrating their birthdays today?
b) Who are the oldest and youngest members celebrating their birthdays today?
c) What is the average age of the celebrants?
d) How old will each celebrant be in 10 years?
e) How old was each celebrant when NairaLand was established?

Till then, drop your questions or comments/suggestions. |
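Once a week (or a year) of daily CSV files has piled up, merging them with pandas concat() might look like the sketch below. The folder layout, the file names and the extra SourceFile column are assumptions for illustration; the demo writes two made-up daily files to a temporary folder so that it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_daily_csvs(folder):
    """Read every CSV in the folder and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Keep track of which daily file each row came from.
        frame["SourceFile"] = os.path.basename(path)
        frames.append(frame)
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


# Demo with two made-up daily files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"Username": ["rodbel", "Sirolad"], "Age": [29, 29]}).to_csv(
        os.path.join(folder, "birthdays_day1.csv"), index=False
    )
    pd.DataFrame({"Username": ["mokei"], "Age": [27]}).to_csv(
        os.path.join(folder, "birthdays_day2.csv"), index=False
    )
    merged = merge_daily_csvs(folder)

print(merged)
```

Note that concat() raises an error when given an empty list, so the folder must contain at least one CSV file.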
Re: Data Science Tutorial For Beginners With Python Programming Language by neahyo(m): 8:51am On Sep 01, 2016 |
umaryusuf: Thumbs up to you bro for sharing your code. I use R for my analysis but I'm also a Python enthusiast; the Python code for scraping and cleaning the data is really esoteric. My questions are:
1. After collecting the data, I decided to clean it using Excel (I can't use Python since I'm still a learner). What is the Python command for importing data that is already in CSV format?
2. How do I subset the data in order to answer objectives 1-5?
3. I'm proficient in R but I'm really passionate about learning Python. How can you help me? I have downloaded some materials already, but they are not helping enough. |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 12:31pm On Sep 01, 2016 |
neahyo: Good to hear you are proficient in R. I would love to see you recode this tutorial in R, that is, an R version to compare! Maybe you can post it here in the near future.
I also want to challenge other gurus who use other tools like MATLAB, SPSS, Excel, Julia, Java etc. to kindly provide their versions of this tutorial.

In response to your questions:-
1) There are several commands for working with Excel/CSV files in Python. But since we used the pandas library in this tutorial, the pandas commands to import are:-
~ Excel: read_excel()
~ CSV: read_csv()
2) The next class will answer the question on how to subset the data in order to answer objectives 1-5.
3) I will advise you to use the same approach you used when learning R to learn Python. But keep in mind that Python is a general-purpose language, unlike R, so don't waste your time learning packages in Python that are not useful for a Data Scientist. Here is a quick learning path to follow:-
~ Learn Python Basics
~ Learn Python Regular Expressions
~ Learn Python Object Oriented Programming
~ Learn Python Data Science libraries |
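To make answer 1) concrete, here is a small sketch of read_csv(). The CSV text is made up and is read from memory so the example runs without a file; with a real file you would pass its path instead. read_excel() is used the same way for spreadsheets, though it needs an Excel engine such as openpyxl to be installed.

```python
import io

import pandas as pd

# Stand-in for a file on disk; with a real file you would pass its path,
# e.g. pd.read_csv("birthdays.csv") (the file name is hypothetical).
csv_text = "Username,Age\nrodbel,29\nSirolad,29\nmokei,27\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.dtypes)   # column types; Age is read as an integer column
```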
Re: Data Science Tutorial For Beginners With Python Programming Language by Stconvict(m): 5:51am On Sep 03, 2016 |
Have you considered Julia Umar? |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 8:31am On Sep 03, 2016 |
Stconvict: Yeah, that is another good one for Data Science. Are you using it? |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 9:02am On Sep 03, 2016 |
[size=15pt]Data Analysis and Visualization - Analyze and Visualize the data[/size]
In this last class, "Analyze and Visualize the data", we will do some quick data exploration. The library we will use is pandas. Remember that we imported it earlier in our code, so we can use its commands to explore the dataset by attempting to answer some useful questions such as:-
a) How many members are celebrating their birthdays today?
b) Who are the oldest and youngest members celebrating their birthdays today?
c) What is the average age of the celebrants?
d) How old will each celebrant be in 10 years?
e) How old was each celebrant when NairaLand was established?

Use the describe() function to get the summary statistics of the numerical fields.
Use the sort_values() function to find the oldest and youngest ages.
Create a new column for the ages in ten years' time, and another column for the ages in 2005.
To do some plotting, let's plot the 10 youngest members.
Note: If you are using an IPython (Jupyter) notebook, to display the plot within the notebook you have to call this magic command: %matplotlib inline

The code starts by checking the statistical summary of the age column.

Horizontal Bar Plot
[img]http://3.bp..com/-pdRUjRMyt2s/V7dYvoIlGnI/AAAAAAAABF0/uLJITBL427Y1j_1Kp9LDkAIALhVfXO6xACLcB/s1600/10%2Byoungest%2Bmembers%2Bcelebrating.png[/img]
Area Plot
[img]http://4.bp..com/-zYjoae56TRo/V7dYxe7rYoI/AAAAAAAABF4/2xP5Q--oNqM1HdwDQJbEQDLI6Fp69kl0ACLcB/s1600/areaplot.png[/img]
Box Plot
[img]http://3.bp..com/-DYfHvDBUCkA/V7dYxzGqoFI/AAAAAAAABF8/aFORbu_xiQ03Kcj1iwwSEqP2_wO1NPg5ACLcB/s1600/boxplot.png[/img]

As mentioned before, the complete code notebook is at: http://nbviewer.jupyter.org/github/forum2k9/NairaLand-Members-Birthday-Data/blob/master/NairaLand_Members_Birthday_Data.ipynb

That is it. I hope you will now be able to pick a small dataset available online and use Python to quickly calculate descriptive statistics and show the results with basic charts and tables.
Good luck in your data science career. |
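The exploration steps above can be sketched with pandas as follows. This is not the notebook's code: the small table is made up (userA and userB are hypothetical usernames, and all ages are illustrative), and the age in 2005 is computed on the assumption that the data was collected in 2016. The plotting helper is only defined, not called, since it needs matplotlib installed.

```python
import pandas as pd

# Made-up sample in the cleaned Username/Age format.
df = pd.DataFrame({
    "Username": ["rodbel", "Sirolad", "mokei", "userA", "userB"],
    "Age": [29, 29, 27, 41, 18],
})

# Assumption: the ages were collected in 2016; NairaLand dates from 2005.
YEAR_COLLECTED = 2016
YEAR_FOUNDED = 2005

# a) How many members are celebrating today?
celebrant_count = len(df)

# b) Oldest and youngest celebrants, via sort_values().
by_age = df.sort_values("Age")
youngest = by_age.iloc[0]
oldest = by_age.iloc[-1]

# c) Average age; describe() reports it as "mean" next to min, max, quartiles.
summary = df["Age"].describe()
average_age = summary["mean"]

# d) Age in ten years, and e) age when NairaLand was established.
df["AgeIn10Years"] = df["Age"] + 10
df["AgeIn2005"] = df["Age"] - (YEAR_COLLECTED - YEAR_FOUNDED)

print(celebrant_count, youngest["Username"], oldest["Username"], average_age)
print(df)


def plot_youngest(frame, n=10):
    """Horizontal bar plot of the n youngest members (requires matplotlib)."""
    youngest_n = frame.sort_values("Age").head(n)
    return youngest_n.plot(kind="barh", x="Username", y="Age", legend=False)
```

Calling plot_youngest(df) in a notebook (after %matplotlib inline) draws the bar chart; passing kind="area" or kind="box" to a DataFrame's plot() gives the other two chart types shown above.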
Re: Data Science Tutorial For Beginners With Python Programming Language by Stconvict(m): 1:57pm On Sep 03, 2016 |
umaryusuf:Yeah. I'm Nypro. Omaar Yosif. |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 3:45pm On Sep 03, 2016 |
Stconvict: Lol! Since it's you, I challenge you to reproduce this thread with Julia! |
Re: Data Science Tutorial For Beginners With Python Programming Language by Stconvict(m): 7:58pm On Sep 03, 2016 |
umaryusuf:I would have but I'm currently working on some project. I will definitely write this in Julia once I'm done. So challenge accepted. 1 Like |
Re: Data Science Tutorial For Beginners With Python Programming Language by LoveDecay(m): 7:36am On Sep 04, 2016 |
Good work Umar, you have done quite well. I don't know why Seun won't provide Nairaland datasets, at least from the politics section. Reddit gave out their 2015 dataset. I am into text analytics. |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 5:22pm On Sep 04, 2016 |
LoveDecay: You are right boss. Almost all large websites like Twitter, Facebook, Google and StackOverflow provide APIs to access their data in a structured manner. If you can get what you need through an API, that is almost always the preferred approach over web scraping. |
Re: Data Science Tutorial For Beginners With Python Programming Language by ibnquasale(m): 4:32am On Sep 05, 2016 |
Re: Data Science Tutorial For Beginners With Python Programming Language by umaryusuf(m): 7:15am On Sep 06, 2016 |
ibnquasale: You are welcome boss. |
Re: Data Science Tutorial For Beginners With Python Programming Language by Fosi: 9:55pm On Oct 24, 2016 |
Hmmmm... Good to know. I just started learning about Hadoop 3 days ago (because I need to build a data lake), and the more I read, the more I realized that I need to know other programming languages like Python and R. Kudos to you for sharing your knowledge with people. |