
What you must consider before becoming a Data Scientist - Programming - Nairaland


What you must consider before becoming a Data Scientist by Professor2196(m): 2:30am On May 14, 2022
I am a computer engineering undergrad and I have been studying machine learning for a cumulative period of one year. I still consider myself a newbie, because any time I think I have reached the end of the road with a concept, I discover that there are 10 more roads waiting behind it. Through my relatively short journey I have gone through quite a few rough patches and gotten stuck in a few quagmires. I got through them with the help of the authors of the books I read, helpful answers on Stack Overflow, and pure grit. I have gained insights that I think will be helpful to my fellow newbies who are not far behind me.

Passion isn’t everything. As it is in the real world, so it is in the programming world. There are challenges that can and will do their best to crush the life out of that passion. Most people who go into data science are mesmerized by what can be achieved with it and what has already been achieved with it, which leads them to drool at the prospect of what they themselves could achieve with it.

And as the Bible passage says, “Faith cometh by hearing, and hearing by the word of God”, their passion grows stronger the more they read articles about amazing projects and developments going on in data science. Yes, they are prepared to study harder, sleep late, and do all the other things that passionate people do to reach a level of competency. They have never been more passionate about anything in their lives.

You may have gotten to where you are now purely on the strength of that passion, but if you don’t find something else to support it, it will be nearly impossible to go far.

Most newbies never once consider that their own computer could be their greatest challenge. Data science is one of the few fields of programming where so much emphasis is placed on your computer's processing capabilities. Other programming fields such as web, desktop (GUI) and database development, where feedback is near instantaneous, are not overly dependent on your RAM and processor speed. You can work in them on a low-end computer just as easily as on a high-end one.

When I say low-end computers, I am not including computers with 16GB or more of RAM and a dedicated GPU card. I am referring only to computers with no more than 8GB of RAM and 1GB or less of GPU memory. By my estimate, over 80% of developers in Nigeria use computers with these specs, with 4GB of RAM and no GPU being the overwhelming majority.

Throughout my journey in studying machine learning, my faithful companion has been a Dell running Windows 10, with 4GB of RAM, a 2nd-generation Intel Core i5 processor and no GPU. The beginning of the journey was extremely smooth: training on the Iris and Boston housing datasets was near instantaneous (or close enough to it). It was amazing. All my datasets could fit into memory and any error that occurred could easily be debugged. More often than not it was a syntax error or an improperly formatted dataset containing values that tripped up the machine learning model (as I said, easily overcome). But as my expertise increased, one particular error became increasingly frequent: the NumPy out-of-memory error (raised in Python as a MemoryError). I could no longer run code written by authors without triggering it.

Obviously, the authors use computers with much more RAM and, I bet, additional GPUs to execute their code.

The out-of-memory error became so ubiquitous that we became old friends. It got to the point that whenever I ran code from a textbook and it didn't raise the error, I became suspicious and wondered whether I had made a mistake in implementing the author's code.

This was an almost insurmountable obstacle for me, to the extent that it forced me to give up on projects. Either the dataset couldn't all be loaded into memory and the machine learning model needed it all at once (e.g. some non-linear models), or the dataset could be loaded into memory but there was no space left for the model to train its parameters (which is even worse).

The spirit (your passion) indeed is willing, but the flesh (your computer) is weak. - Matt 26:41

This error forced me to gain a deeper understanding of memory management. I no longer see data types as “oh, those”. I now ask which data type with the minimum memory footprint will best represent my dataset with minimal loss of information. I also adopted a function-based style when executing tasks (i.e. I execute instructions inside functions and return only the needed value(s), leaving Python to delete the temporary variables created during execution once they go out of scope), to avoid leaving unused variables around hogging memory.
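
To make the two habits above concrete, here is a minimal sketch in Python with NumPy. The helper names smallest_int_dtype and column_means, the array sizes and the choice of float32 are my own illustrative assumptions, not code from any particular book. It only shows the idea: pick the narrowest dtype that still holds your values, and keep large temporaries inside a function so that nothing keeps a reference to them after it returns.

```python
import numpy as np


def smallest_int_dtype(values):
    """Return the narrowest NumPy integer dtype that can hold every value.

    Hypothetical helper for illustration only.
    """
    lo, hi = int(values.min()), int(values.max())
    candidates = (np.uint8, np.int8, np.uint16, np.int16,
                  np.uint32, np.int32, np.int64)
    for dtype in candidates:
        info = np.iinfo(dtype)
        if info.min <= lo and hi <= info.max:
            return dtype
    raise ValueError("values do not fit in a signed 64-bit integer")


def column_means(n_rows=100_000, n_cols=20):
    """Create a large temporary array and return only the small result.

    `big` is local, so once the function returns there is no reference
    left to it and CPython can normally release its memory right away.
    """
    rng = np.random.default_rng(0)
    big = rng.random((n_rows, n_cols), dtype=np.float32)  # 4 bytes per value
    return big.mean(axis=0)


# Class labels 0..9 stored with 8 bytes each, then with 1 byte each.
labels = np.arange(10, dtype=np.int64).repeat(1_000)
compact = labels.astype(smallest_int_dtype(labels))

print(labels.dtype, labels.nbytes)    # 10,000 values * 8 bytes
print(compact.dtype, compact.nbytes)  # 10,000 values * 1 byte

means = column_means()
print(means.shape, means.dtype)
```

The labels lose no information when stored as uint8, yet the array takes one eighth of the space. For floating-point features, going from float64 to float32 halves the footprint at the cost of some precision, so that trade-off has to be judged per dataset.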

Thank GOD for deep learning algorithms, which with all their power are not greedy enough to want all the data at once but are always hungry for chunks of it [no matter the size], and for the TensorFlow data API (tf.data) and NumPy's memmap function, which provide the means to feed them such chunks.
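
As a rough sketch of what feeding chunks can look like, the Python snippet below uses np.memmap to read an on-disk array a block of rows at a time. The file name features.dat, the shape, the chunk size and the helpers iter_chunks and streaming_mean are all assumptions made up for this illustration. A running column mean stands in for a real training step, and the TensorFlow side is not shown.

```python
import os
import shutil
import tempfile

import numpy as np


def iter_chunks(path, shape, dtype=np.float32, chunk_rows=1_000):
    """Yield successive blocks of rows from an on-disk array.

    np.memmap maps the file instead of reading it all, so only the
    rows that are actually sliced get pulled into RAM.
    """
    data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    for start in range(0, shape[0], chunk_rows):
        # np.array(...) copies just this block into ordinary memory.
        yield np.array(data[start:start + chunk_rows])


def streaming_mean(path, shape, dtype=np.float32, chunk_rows=1_000):
    """Column means computed one chunk at a time (stand-in for training)."""
    total = np.zeros(shape[1], dtype=np.float64)
    for chunk in iter_chunks(path, shape, dtype, chunk_rows):
        total += chunk.sum(axis=0, dtype=np.float64)
    return total / shape[0]


# Demo: write a small array to disk, then read it back in chunks.
shape = (5_000, 4)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "features.dat")  # hypothetical file name

writer = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
writer[:] = np.arange(shape[0] * shape[1], dtype=np.float32).reshape(shape)
writer.flush()
del writer  # drop the write handle before re-opening read-only

means = streaming_mean(path, shape)
n_chunks = sum(1 for _ in iter_chunks(path, shape))
print(means, n_chunks)

shutil.rmtree(tmp_dir, ignore_errors=True)
```

With a real dataset the file would be far larger than RAM, and each chunk would be passed to one training step instead of being summed.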

Believe me, if you persevere, you will become a more tenacious machine learning engineer than your peers who cruise along with their huge-RAM computers, creating one-hot encoded arrays with an np.int64 data type or, worse, not knowing what data type was used to build the array.
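
To put numbers on the one-hot example, here is a small Python sketch. The one_hot helper is hypothetical and written only for this comparison. A one-hot array holds nothing but 0s and 1s, so a 1-byte type such as np.uint8 represents it exactly, while np.int64 spends 8 bytes on every entry.

```python
import numpy as np


def one_hot(labels, n_classes, dtype=np.uint8):
    """One-hot encode integer class labels (illustrative helper)."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, n_classes), dtype=dtype)
    # Set a single 1 per row, in the column given by that row's label.
    encoded[np.arange(labels.size), labels] = 1
    return encoded


labels = np.array([0, 2, 1, 2])
small = one_hot(labels, 3)                 # uint8: 1 byte per entry
wide = one_hot(labels, 3, dtype=np.int64)  # int64: 8 bytes per entry

print(small.nbytes, wide.nbytes)
```

Scaled up to 60,000 labels and 10 classes, that is 600,000 entries: about 4.8 MB as np.int64 against 0.6 MB as np.uint8, for exactly the same information.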

Should you eventually acquire a huge-RAM computer, you will still maintain that frugality in allocating RAM.

I urge all machine learning beginners to look within themselves carefully and consider whether they have the grit needed to forge on when their passion no longer pulls them forward. Otherwise they risk going ahead because they think they can fake it, only to back down when faced with an overwhelming obstacle, wasting months that could have been used to learn programming in other fields (like web dev). If you read this and still think you have what it takes, then forge on and see every obstacle as an opportunity to learn some new technical skills. Use tools in ways they weren't designed to be used, and if that still doesn't work, create your own toolset.

God loves you!


Re: What you must consider before becoming a Data Scientist by willian10: 3:56am On May 14, 2022
Is it more difficult than web development? I just started learning Python a few weeks ago and I'm enjoying the lectures, at least the few things I have learnt so far.
Re: What you must consider before becoming a Data Scientist by sheddysk: 12:18pm On Apr 03
Diving into data science in Nigeria, or anywhere, can feel like trying to learn a new language overnight. It’s thrilling, sure, but packed with its fair share of “Oh no, what did I get myself into?” moments. Let’s unravel some common hurdles beginners face and, more importantly, how to overcome them. Click here to read the full article: https://datasetnexustech.com/data-science-in-nigeria-common-hurdles-for-beginners/
