Scraping Jiji Ideas - Programming


Scraping Jiji Ideas by Devaro: 6:32pm On Jan 05, 2023

Does anyone have ideas on how to scrape data like name, business name, and phone number?

Re: Scraping Jiji Ideas by YoungCabal: 6:54pm On Jan 05, 2023

Devaro:
Does anyone have ideas on how to scrape data like name, business name, and phone number?

Why do you want to scrape data from the site? If you are willing to pay for my time, I can cook up a solution for you.

Re: Scraping Jiji Ideas by LittleBigDick(m): 5:09am On Jan 06, 2023

Beautiful Soup can do that for you.

2 Likes

Re: Scraping Jiji Ideas by Devaro: 5:49am On Jan 06, 2023

LittleBigDick:
Beautiful Soup can do that for you.

Do you have any resources?

Re: Scraping Jiji Ideas by chim14(m): 7:24am On Jan 06, 2023

YoungCabal:
Why do you want to scrape data from the site? If you are willing to pay for my time, I can cook up a solution for you.

Cook up Beautiful Soup

1 Like

Re: Scraping Jiji Ideas by YoungCabal: 8:14am On Jan 06, 2023

chim14:

Cook up Beautiful Soup

Just because there is a Python library that eases the job a little doesn't mean my time should always be free, don't you agree?

OP is clearly scraping the site for his personal business or intends to resell the data, so he should be willing to foot the bill if he really wants a professional job.

3 Likes

Re: Scraping Jiji Ideas by chim14(m): 10:28pm On Jan 06, 2023

YoungCabal:
Just because there is a Python library that eases the job a little doesn't mean my time should always be free, don't you agree?

OP is clearly scraping the site for his personal business or intends to resell the data, so he should be willing to foot the bill if he really wants a professional job.

Of course you can't do it for free now; you have to bill him well. I was just playing on words.

1 Like

Re: Scraping Jiji Ideas by Felixitie(m): 11:36pm On Jan 06, 2023

I have done a project on it before. Beautiful Soup alone will not handle the Jiji site because of the website's infinite scrolling. You have to use Selenium + bs4, letting the browser render the JavaScript before scraping.
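A minimal sketch of the scroll-then-parse idea in this post, assuming the listing page grows its `document.body.scrollHeight` each time more items load. The helper takes two callables so the stopping logic can be checked without a browser. The Selenium wiring is shown as comments at the end; `LISTING_URL`, the pause length and the round limit are placeholders, not values from the thread.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 1.0,
    max_rounds: int = 50,
) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll_to_bottom()
        rounds += 1
        time.sleep(pause)  # give the page time to load the next batch
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end of the list
        last_height = new_height
    return rounds


# Wiring it to Selenium and Beautiful Soup (needs selenium, bs4 and a browser driver):
#
#   from selenium import webdriver
#   from bs4 import BeautifulSoup
#
#   driver = webdriver.Chrome()
#   driver.get(LISTING_URL)  # LISTING_URL is a placeholder
#   scroll_until_stable(
#       lambda: driver.execute_script("return document.body.scrollHeight"),
#       lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
#   )
#   soup = BeautifulSoup(driver.page_source, "html.parser")
#   driver.quit()
```

The `max_rounds` cap is there so a page that never stops loading cannot keep the loop running forever.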

Re: Scraping Jiji Ideas by YoungCabal: 3:26am On Jan 07, 2023

Felixitie:
I have done a project on it before. Beautiful Soup alone will not handle the Jiji site because of the website's infinite scrolling. You have to use Selenium + bs4, letting the browser render the JavaScript before scraping.

It's not even the infinite scrolling alone; you have to click on some data to unhide it. Beautiful Soup is not the right tool. Even with Selenium it won't be an easy task, because you either go category by category or build a mini JS-enabled crawler to index the site.

I laughed when I saw someone comment that he can show OP how to do it with Beautiful Soup.
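A rough sketch of the "mini crawler" idea mentioned in this post: a breadth-first walk that visits each URL once. `get_links` is a hypothetical callable; in practice it would load the page in a JS-capable browser tool and return the links found there. Here it is injected so the crawl order can be tried on a small in-memory site.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Breadth-first crawl: visit each URL once, in discovery order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # skip anything already queued or visited
                seen.add(link)
                queue.append(link)
    return visited


# Toy site used only for illustration.
site = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": ["/"], "/c": []}
order = crawl("/", lambda url: site.get(url, []))
```

The `max_pages` limit keeps a crawl of a large site bounded while the link extraction is still being worked out.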

Re: Scraping Jiji Ideas by Felixitie(m): 3:40pm On Jan 07, 2023

YoungCabal:
It's not even the infinite scrolling alone; you have to click on some data to unhide it. Beautiful Soup is not the right tool. Even with Selenium it won't be an easy task, because you either go category by category or build a mini JS-enabled crawler to index the site.

I laughed when I saw someone comment that he can show OP how to do it with Beautiful Soup.

It's impossible with bs4 alone, but Selenium will work for sure: clicking buttons and so on, depending on what you want to scrape from the site. It's not that complex.

Re: Scraping Jiji Ideas by Nobody: 12:05am On Jan 08, 2023

Try Puppeteer or Nightmare with Node.js.

Re: Scraping Jiji Ideas by YoungCabal: 8:19am On Jan 08, 2023

Felixitie:

It's impossible with bs4 alone, but Selenium will work for sure: clicking buttons and so on, depending on what you want to scrape from the site. It's not that complex.

If it's not that complex, why don't you just paste the source code here for him, or the full instructions on how to do it? Admit it, it's something that demands quality attention, not something you can just run over.

Re: Scraping Jiji Ideas by bedfordng(m): 10:21am On Jan 08, 2023

YoungCabal:
If it's not that complex, why don't you just paste the source code here for him, or the full instructions on how to do it? Admit it, it's something that demands quality attention, not something you can just run over.

Jiji is not even as complex as most flight listing or betting websites.

Selenium can get the job done with ease. Playwright is also good for the job.

At least they have mentioned lots of tooling he can use. It is left for him to learn to use it regardless.

As for pasting source code or a script for the OP, he needs to pay for the job whether it is complex or not.

Re: Scraping Jiji Ideas by YoungCabal: 12:18pm On Jan 08, 2023

bedfordng:

Jiji is not even as complex as most flight listing or betting websites.

Selenium can get the job done with ease. Playwright is also good for the job.

At least they have mentioned lots of tooling he can use. It is left for him to learn to use it regardless.

As for pasting source code or a script for the OP, he needs to pay for the job whether it is complex or not.

You get my point! OP needs to pay for the job.

Whether it is complex or not, the time the developer spent acquiring the skill demands a befitting payment. If we keep emphasizing that it is simple, OP will want to underpay for the job or demand it for free.

That's why you should never tag any job simple when you bid; it's like demarketing yourself. Just highlight your experience and let them decide if they want it or not.

Re: Scraping Jiji Ideas by Felixitie(m): 1:37pm On Jan 08, 2023

YoungCabal:
If it's not that complex, why don't you just paste the source code here for him, or the full instructions on how to do it? Admit it, it's something that demands quality attention, not something you can just run over.

Bro, calm down, just tell me you need it. If it demands quality attention then it will not be free; otherwise he should do a personal search and learn how to do it if he can't pay for it. Besides, do you think the script is going to work for all the pages in Jiji? Abeg move.

Re: Scraping Jiji Ideas by bedfordng(m): 2:05pm On Jan 08, 2023

YoungCabal:
You get my point! OP needs to pay for the job.

Whether it is complex or not, the time the developer spent acquiring the skill demands a befitting payment. If we keep emphasizing that it is simple, OP will want to underpay for the job or demand it for free.

That's why you should never tag any job simple when you bid; it's like demarketing yourself. Just highlight your experience and let them decide if they want it or not.

Yeah, I get the point. Nice reasoning. This is also why tools were mentioned for OP to try it for himself.

Re: Scraping Jiji Ideas by YoungCabal: 5:46pm On Jan 08, 2023

Felixitie:

Bro, calm down, just tell me you need it. If it demands quality attention then it will not be free; otherwise he should do a personal search and learn how to do it if he can't pay for it. Besides, do you think the script is going to work for all the pages in Jiji? Abeg move.

Lol! We are cool, man.

Sure, it can work on every page; it depends on how much time you are willing to invest in coding it. There are Selenium libraries for several languages that you can integrate with a crawler you build, and you can use regex pattern matching to determine which page is which. That's why I was against tagging it simple as you did, since neither of us knows OP's full intention.
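As a rough illustration of using regex matching to tell page types apart inside a crawler, here is a small sketch. The URL shapes below (`/category/...`, `/item/...`, `/seller/...`) are invented for the example and are not Jiji's real paths; the patterns would have to be written against the URLs the crawler actually sees.

```python
import re
from typing import Optional

# Hypothetical URL shapes, for illustration only.
PAGE_PATTERNS = [
    ("category", re.compile(r"/category/[a-z0-9-]+/?")),
    ("item", re.compile(r"/item/\d+/?")),
    ("seller", re.compile(r"/seller/\d+/?")),
]


def classify(path: str) -> Optional[str]:
    """Return the first page type whose pattern matches the whole path."""
    for kind, pattern in PAGE_PATTERNS:
        if pattern.fullmatch(path):
            return kind
    return None  # unknown page type; the crawler can skip or log it
```

A crawler could then pick a different extraction routine per page type, for example one for listing pages and another for seller pages.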

Re: Scraping Jiji Ideas by Felixitie(m): 7:40pm On Jan 08, 2023

YoungCabal:
Lol! We are cool, man.

Sure, it can work on every page; it depends on how much time you are willing to invest in coding it. There are Selenium libraries for several languages that you can integrate with a crawler you build, and you can use regex pattern matching to determine which page is which. That's why I was against tagging it simple as you did, since neither of us knows OP's full intention.

I feel you bro. The script I developed won't work for all the pages because it was for a personal project. I said simple because I have seen many websites that are tougher to scrape than the easier Jiji type. Thanks brother.

Re: Scraping Jiji Ideas by nnuReader: 11:39am On Feb 28, 2023

Here is a Python script to scrape all the data, including phone numbers, in less than an hour.

The script works by fetching data directly from the Jiji API endpoints and paginating:

https:///api_web/v1/listing?slug=X&webp=true&page=Y
where X is the category (vehicles, real-estate, ...) you want to scrape and Y is the page number (23 products are returned per page).

You just keep changing the slug when you're done scraping a category,
and for every category you keep increasing the page while you save the info and check for duplicates (a vendor can appear multiple times because of multiple product uploads).

This approach is miles faster than using tools like Puppeteer, Selenium or Beautiful Soup because you're not loading irrelevant files like CSS, JS, images, HTML...

You can run the script in CMD like the following:

python3 scrape.py vehicles Vehicles

The above scrapes the vehicles category.

python3 scrape.py real-estate Properties

The above scrapes real estate.

If you need more info, mail me at hello@feyitech.com

The Script:

import sys
import time

import requests

# Helper modules from the author's project; they are not shown in this post.
from common import get_profile_id_list_and_profiles, update_profiles, dict_to_profile_row
from coded_addesses import address_for_fresh, address_for_new, address_for_slider

S = requests.Session()

SCRAPE_TYPES = {
    "fresh": "fresh",
    "new": "new",
    "slider": "slider"
}

ACCEPTED_TYPES = [
    'vehicles', 'real-estate', 'mobile-phones-tablets',
    'electronics', 'home-garden', 'health-and-beauty',
    'fashion-and-beauty', 'hobbies-art-sport', 'seeking-work-cvs',
    'services', 'babies-and-kids', 'animals-and-pets',
    'agriculture-and-foodstuff', 'office-and-commercial-equipment-tools',
    'repair-and-construction'
]

if len(sys.argv) < 2 or sys.argv[1] not in ACCEPTED_TYPES:
    print('No valid category specified.\n\nExample: "python3 scrape.py vehicles"\n\nAccepted categories are: %s' % ", ".join(ACCEPTED_TYPES))
else:
    category = sys.argv[1]  # renamed from `type` to avoid shadowing the builtin
    name = category  # optional display name, overridden by the second argument
    if len(sys.argv) > 2:
        name = sys.argv[2]
    profile_id_list_and_profiles = get_profile_id_list_and_profiles()
    #print(profile_id_list_and_profiles[1])
    if profile_id_list_and_profiles is not None:
        def get_address(page):
            # Build the listing URL for one page of the chosen category.
            return "https:///api_web/v1/listing?slug=%s&webp=true&page=%d" % (category, page)

        # IDs of vendors that were already saved by earlier runs.
        profile_id_list = profile_id_list_and_profiles[0]
        keep_running = True
        total_pages = 0
        page = 1
        total_new_profiles = 0
        while keep_running:
            res = S.get(get_address(page))

            if res.status_code == 200 and res.json()["status"] == "ok":
                new_profiles = []
                body = res.json()
                data = body["adverts_list"]
                adverts = data["adverts"]  # renamed from `list` to avoid shadowing the builtin
                total = len(adverts)
                counts = data["count"]
                total_pages = data["total_pages"]
                #print(adverts)
                print("Count: %d | Size: %d | Page: %d" % (counts, total, page))
                for p in adverts:
                    # Keep only vendors that have not been seen before.
                    if p["user_id"] not in profile_id_list:
                        new_profiles.append(p)
                        profile_id_list.append(p["user_id"])
                        #print("phone:", p["id"])
                update_profiles(new_profiles)
                total_new_profiles = total_new_profiles + len(new_profiles)
                page = page + 1
            else:
                print("Error: %d\n" % res.status_code)
            # `page` now points at the next page to fetch. Stop only once it is past
            # the last page (the original `>=` check skipped the final page, assuming
            # pages run from 1 to total_pages).
            if page > total_pages:
                keep_running = False
            else:
                time.sleep(1.5)  # short pause between requests
        print("TotalNewEntry: %d" % total_new_profiles)
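The script above depends on helper modules that are not shown, so here is a self-contained sketch of just the paginate-and-deduplicate loop it describes. `fetch_page` is a hypothetical callable standing in for the HTTP request; it is assumed to return the `adverts_list` object with the `adverts` and `total_pages` fields used in the script, and pages are assumed to run from 1 to `total_pages`.

```python
from typing import Callable, Dict, List


def collect_unique_vendors(
    fetch_page: Callable[[int], Dict],
    max_pages: int = 1000,
) -> List[Dict]:
    """Walk pages 1..total_pages, keeping the first advert seen per user_id."""
    seen_ids = set()
    vendors = []
    page = 1
    total_pages = 1  # updated from the first response
    while page <= total_pages and page <= max_pages:
        data = fetch_page(page)
        total_pages = data["total_pages"]
        for advert in data["adverts"]:
            if advert["user_id"] not in seen_ids:  # a vendor may post many adverts
                seen_ids.add(advert["user_id"])
                vendors.append(advert)
        page += 1
    return vendors


# Fake two-page response used only to try the loop without network access.
fake_pages = {
    1: {"total_pages": 2, "adverts": [{"user_id": 1}, {"user_id": 2}]},
    2: {"total_pages": 2, "adverts": [{"user_id": 2}, {"user_id": 3}]},
}
unique = collect_unique_vendors(lambda page: fake_pages[page])
```

A set is used for the seen IDs so the duplicate check stays fast as the number of vendors grows; a list, as in the script, works too but gets slower with size.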

Re: Scraping Jiji Ideas by LikeAking: 1:15pm On Feb 28, 2023

Please don't suggest a process you haven't used. You guys are the ones killing tech in Nigeria.

None of your solutions will work for Jiji... Everybody calm down.

Scraping data on Jiji is not a small task. Don't make it sound small; if it's easy, do it for OP.
