Python - unable to scrape text
I am trying to get the title of websites, so I used this snippet:
    import sys
    import urllib2

    from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

    # redirect everything that is printed into a text file
    sys.stdout = open("test_data.txt", "w")
    url2 = "https://www.google.com/"
    headers = {
        'user-agent': 'mozilla/5.0 (macintosh; intel mac os x 10_9_3) applewebkit/537.75.14 (khtml, gecko) version/7.0.3 safari/7046a194a'}
    req = urllib2.Request(url2, None, headers)
    req.add_header('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    html = urllib2.urlopen(req, timeout=60).read()
    soup = BeautifulSoup(html)
    # extract title
    list1 = soup.title.string
    print list1.encode('utf-8')
This works: it gets the Google title and writes the output to test_data.txt.
But when I try to run the same code as a web service, it doesn't work. I get an empty text file. I am hitting this URL to run the web service locally: http://0.0.0.0:8881/get_title
    import sys
    import urllib2

    from bottle import route, run, request
    from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

    @route('/get_title')
    def get_title():
        sys.stdout = open("test_data.txt", "w")
        url2 = "https://www.google.com/"
        headers = {
            'user-agent': 'mozilla/5.0 (macintosh; intel mac os x 10_9_3) applewebkit/537.75.14 (khtml, gecko) version/7.0.3 safari/7046a194a'}
        req = urllib2.Request(url2, None, headers)
        req.add_header('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
        html = urllib2.urlopen(req, timeout=60).read()
        soup = BeautifulSoup(html)
        # extract title
        list1 = soup.title.string
        print list1.encode('utf-8')

    if __name__ == "__main__":
        run(host='0.0.0.0', port=8881, debug=True)
Another thing has made me more anxious: when I run the web service for msn.com, it works with both snippets (even the web service).
Any help would be appreciated!
Is this Flask? If so, you need to return the string that you want to send to the user. The print statement writes to the web server log. You should replace the last line of your get_title function with this:

    return list1.encode('utf-8')
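To illustrate the difference between printing and returning, here is a minimal sketch. It is written for Python 3 and uses only the standard library's html.parser rather than the urllib2 and BeautifulSoup calls from the question, so it can be run without a network connection. The names TitleParser, extract_title and save_and_return_title are hypothetical helpers made up for this example. The file is written through an explicit `with` block instead of reassigning sys.stdout, on the assumption that an unflushed buffer in a long-running server process is one plausible reason for the empty file.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title from an HTML string ('' if there is none)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def save_and_return_title(html, path="test_data.txt"):
    """Write the title to a file and also return it to the caller."""
    title = extract_title(html)
    # The with block closes (and therefore flushes) the file right away,
    # so the text is on disk even though the server process keeps running.
    with open(path, "w", encoding="utf-8") as f:
        f.write(title)
    # Returning the string is what lets a route handler send it to the user.
    return title
```

In the web service, the last line of get_title would then be something like `return save_and_return_title(html)`, where html is the already decoded page text.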