Python | Crawling Static Pages with Python

zuyo 2022. 3. 24. 22:30

1. Install BeautifulSoup (a library used for crawling static pages)

# pip is needed to install Python modules, so set it up first.

# 1. Check whether pip is already installed
pip --version
# or
pip3 --version

# 2. Install
# Red Hat family (CentOS)
yum install python3-pip
# macOS
sudo easy_install pip

# If the macOS install fails with a SyntaxError
curl 'https://bootstrap.pypa.io/get-pip.py' > get-pip.py
sudo python3 get-pip.py
# See: https://programmerah.com/solved-failed-to-install-pip-for-macos-prompt-syntax-error-invalid-syntax-41653/

#----------------------------------------------------------------

# Install beautifulsoup4
pip3 install beautifulsoup4
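
Before running the crawler, it can help to confirm that the packages are visible to the interpreter you will actually use. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the module list assumes the script needs `bs4` (the import name of beautifulsoup4) and `requests`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # bs4 is the import name of the beautifulsoup4 package.
    missing = missing_modules(["bs4", "requests"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Run it with the same `python3` that will run the crawler; a module reported as missing can then be installed with `pip3 install`.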

2. Python code

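#!/usr/bin/env python3

import os
import urllib.request
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Fetch the page and parse its HTML.
page = requests.get(input('Enter URL: '))
soup = BeautifulSoup(page.text, 'html.parser')

for img in soup.find_all('img'):
    src = img.get('src')
    if not src:
        continue  # skip <img> tags that have no src attribute

    # Resolve relative paths against the page URL so the download works.
    imgurl = urljoin(page.url, src)
    print("imgurl : " + imgurl)

    # Use the last path segment as the local file name.
    imgname = os.path.basename(urlparse(imgurl).path)
    print("imgname : " + imgname)
    if not imgname:
        continue  # nothing usable as a file name

    # Save the image into the current directory.
    urllib.request.urlretrieve(imgurl, filename=imgname)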
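
The two URL-handling steps in the loop above can be tried on their own, without any network access. This is only a sketch: `resolve_image_url` and `filename_from_url` are hypothetical helper names, `example.com` is a placeholder, and the `"image"` fallback name is an assumption for URLs that end in a slash.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)


def filename_from_url(url, default="image"):
    """Use the last path segment as the file name, ignoring any query string."""
    name = os.path.basename(urlparse(url).path)
    return name or default


if __name__ == "__main__":
    page_url = "https://example.com/gallery/index.html"  # placeholder URL
    for src in ["img/a.png", "/static/b.jpg?v=2", "https://cdn.example.com/c.gif"]:
        absolute = resolve_image_url(page_url, src)
        print(absolute, "->", filename_from_url(absolute))
```

Relative `src` values matter because `urllib.request.urlretrieve` needs an absolute URL; passing it something like `img/a.png` directly raises an error.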

3. Run

# Run
python3 filename.py

# Test URL
https://shield41791.github.io/dior/

# If you get "ModuleNotFoundError: No module named 'requests'"
pip3 install requests

References

https://geundung.dev/36

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
