Python爬虫urllib模块：post方式-创新互联

本程序以爬取 'http://httpbin.org/post' 为例

创新互联建站"三网合一"的企业建站思路。企业可建设拥有电脑版、微信版、手机版的企业网站。实现跨屏营销，产品发布一步更新，电脑网络+移动网络一网打尽，满足企业的营销需求！创新互联建站具备承接各种类型的成都网站建设、成都网站设计项目的能力。经过十多年的努力的开拓，为不同行业的企事业单位提供了优质的服务，并获得了客户的一致好评。

格式：

导入urllib.request

导入urllib.parse

数据编码处理，再设为utf-8编码: bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')

打开爬取的网页: response = urllib.request.urlopen('网址', data = data)

读取网页代码: html = response.read()

打印:

1.不decode

print(html) #爬取的网页代码会不分行，没有空格显示，很难看

2.decode

print(html.decode()) #爬取的网页代码会分行，像写规范的代码一样，看起来很舒服

查询请求结果：

a. response.status # 返回 200：请求成功 404：网页找不到，请求失败

b. response.getcode() # 返回 200：请求成功 404：网页找不到，请求失败

1.不decode的程序如下：

import urllib.request
import urllib.parsse

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
response = urllib.request.urlopen(' data = data )
html = response.read()

print(html)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

运行结果：

Python 爬虫 urllib模块：post方式

2.带decode的程序如下：

import urllib.request
import urllib.parsse

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
response = urllib.request.urlopen(' data = data )
html = response.read()

print(html.decode())
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

运行结果：

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "word": "hello"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Content-Length": "10", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.4"
  }, 
  "json": null, 
  "origin": "106.14.17.222", 
  "url": "http://httpbin.org/post"
}

------------------------------------------------------------------
------------------------------------------------------------------
200
200

为什么要用bytes转换？

因为

data = urllib.parse.urlencode({'word': 'hello'}) ##没有用bytes
response = urllib.request.urlopen('http://httpbin.org/post', data = data )
html = response.read()

错误提示：

Traceback (most recent call last):
  File "/usercode/file.py", line 15, in 
    response = urllib.request.urlopen('http://httpbin.org/post', data = data )
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 453, in open
    req = meth(req)
  File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

由此可见，post方式需要将请求内容用二进制编码。

class bytes([source[, encoding[, errors]]])

Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray– it has the same non-mutating methods and the same indexing and slicing behavior.

Accordingly, constructor arguments are interpreted as for bytearray().

另外有需要云服务器可以了解下创新互联scvps.cn，海内外云服务器15元起步，三天无理由+7*72小时售后在线，公司持有idc许可证，提供“云服务器、裸金属服务器、高防服务器、香港服务器、美国服务器、虚拟主机、免备案服务器”等云主机租用服务以及企业上云的综合解决方案，具有“安全稳定、简单易用、服务可用性高、性价比高”等特点与优势，专为企业上云打造定制，能够满足用户丰富、多元化的应用场景需求。

网站栏目：Python爬虫urllib模块：post方式-创新互联
本文地址：http://jkwzsj.com/article/ihihs.html

Python爬虫urllib模块：post方式-创新互联

其他资讯