
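The curl command, a shell script for monitoring web pages with curl, and running curl in multiple processes with a limit on the process count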

2015-06-07 22:17

cURL > Docs > Tutorial: http://curl.haxx.se/docs/httpscripting.html

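curl downloads a single file and by default prints it to standard output (STDOUT). It can download over HTTP, FTP and other protocols, and it can upload files as well. curl does far more than that; wget is a similar tool.

The curl command is built on the libcurl library. libcurl is commonly used in C programs to handle HTTP requests, and curlpp is a C++ wrapper around libcurl. These are useful for web scraping, network monitoring and similar development work, and the curl command itself helps to track down problems met during development.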

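1. Save the download to a file with -o/-O:
-o: save the output under the file name given on the command line
-O: save the output locally under the file name taken from the URL
# Download the file and save it as mygettext.html
curl -o mygettext.html http://www.gnu.org/software/gettext/manual/gettext.html
# Download the file and save it as gettext.html
curl -O http://www.gnu.org/software/gettext/manual/gettext.html
The output can also be redirected with the ">" redirection operator.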

2. Fetch several files at once
curl -O URL1 -O URL2

When several files are downloaded from the same site in one invocation, curl tries to reuse the connection.

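3. Follow redirects with -L
By default curl does not follow HTTP Location headers (redirects). When a requested page has moved to another address, the server answers with a Location header in its response, and the client is expected to repeat the request at the new address.
For example, a request to google.com may be redirected to google.com.hk:
curl http://www.google.com
<HTML>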
<HEAD>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE>
</HEAD>
<BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.com.hk/url?sa=p&hl=zh-CN&pref=hkredirect&pval=yes&q=http://www.google.com.hk/&ust=1379402837567135amp;usg=AFQjCNF3o7umf3jyJpNDPuF7KTibavE4aA">here</A>.
</BODY>
</HTML>

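The -L option makes curl follow the redirect:
# Follow the redirect; the request ends up at google.com.hk
curl -L http://www.google.com
4. Resume an interrupted download
The -C option resumes the transfer of a large file, for example:
# The process is stopped before the download completes
$ curl -O http://www.gnu.org/software/gettext/manual/gettext.html
############## 20.1%
# Adding -C - continues the download; the part already on disk is not fetched again
curl -C - -O http://www.gnu.org/software/gettext/manual/gettext.html
############### 21.1%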

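5. Limit the transfer rate
The --limit-rate option caps the bandwidth curl uses
# The download speed will not exceed 1000 bytes per second
curl --limit-rate 1000B -O http://www.gnu.org/software/gettext/manual/gettext.html
6. Download only files modified after a given time with -z
When downloading a file, curl can check its last-modified date: the file is downloaded only if it was modified after the given date, otherwise it is skipped.
# Download yy.html only if it has been updated after 21 Dec 2011
curl -z 21-Dec-11 http://www.example.com/yy.html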
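7. Authentication
For pages that require authentication, pass the user name and password with -u
curl -u username:password URL
# The usual practice is to give only the user name; curl then prompts for the password, so the password does not end up in the shell history
curl -u username URL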

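8. Download files from an FTP server
curl also supports FTP downloads. If the URL points to a directory rather than a specific file, curl lists the contents of that directory instead of downloading them
# List the directories and files under public_html (no -O here: a URL ending in / carries no file name for -O to use)
curl -u ftpuser:ftppass ftp://ftp_server/public_html/
# Download the file xss.php
curl -u ftpuser:ftppass -O ftp://ftp_server/public_html/xss.php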
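9. Upload files to an FTP server
The -T option uploads the given local file to the FTP server

# Upload myfile.txt to the server
curl -u ftpuser:ftppass -T myfile.txt ftp://ftp.testserver.com
# Upload several files at once
curl -u ftpuser:ftppass -T "{file1,file2}" ftp://ftp.testserver.com
# Read from standard input and save the content to the given file on the server
curl -u ftpuser:ftppass -T - ftp://ftp.testserver.com/myfile_1.txt
10. Get more connection details with -v and --trace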

11. Look up a word in a dictionary
# Look up the meaning of the word bash
curl dict://dict.org/d:bash
# List all available dictionaries
curl dict://dict.org/show:db
# Look up the word bash in the foldoc dictionary
curl dict://dict.org/d:bash:foldoc

12. Use a proxy
The -x option sends the request through a proxy
# Specify the proxy host and port
curl -x proxysever.test.com:3128 http://google.co.in
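13. Save and reuse a site's cookies
# Dump the response headers, which include the site's Set-Cookie lines, to the file sugarcookies
curl -D sugarcookies http://localhost/sugarcrm/index.php
# Send the cookies saved by the previous command
curl -b sugarcookies http://localhost/sugarcrm/index.php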
14. Send request data
By default curl sends a GET request, and the data is passed directly in the URL
The --data/-d option sends the data with POST instead
# GET
curl -u username https://api.github.com/user?access_token=XXXXXXXXXX
# POST
curl -u username --data "param1=value1&param2=value" https://api.github.com
# A file can also be named; its content is sent to the server as the data
curl --data @filename https://github.api.com/authorizations
Note: data passed with -d is sent as given, so special characters must be escaped before they are sent to the server. A space in a value, for example, has to be written as %20:

curl -d "value%201" http://hostname.com
Newer versions of curl provide the --data-urlencode option, which escapes special characters in its argument automatically.

curl --data-urlencode "value 1" http://hostname.com
Besides GET and POST, other request methods can be selected with the -X option, for example:

curl -I -X DELETE https://api.github.cim
Upload a file
curl --form "fileupload=@filename.txt" http://hostname/resource
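Common options:

curl has a great many options; listed here are only the ones I have used, especially in shell scripts.
-A: set the browser identification (User-Agent) that this request claims
-b/--cookie <name=string/file> cookie string, or file to read cookies from; sends the cookies saved earlier along with the HTTP request
-c/--cookie-jar <file> write the cookies to this file when the operation finishes
-C/--continue-at <offset> resume a transfer at the given offset
-d/--data <data> send data with HTTP POST
-D/--dump-header <file> write the response headers to this file
-F/--form <name=content> emulate an HTTP form submission
-v/--verbose lowercase v; prints more information, including the request that was sent, which is especially useful when debugging scripts
-m/--max-time <seconds> maximum time allowed for the whole operation
-H/--header <header> add a request header
-s/--silent reduce the output, for example the progress meter
--connect-timeout <seconds> maximum time allowed for the connection attempt
-x/--proxy <proxyhost[:port]> proxy server address and port; the port defaults to 1080
-T/--upload-file <file> path of the file to upload
-o/--output <file> name of the output file
--retry <num> number of retries
-e/--referer <URL> referrer URL
-I/--head return only the headers, using a HEAD request
-u/--user <user[:password]> user name and password for the server
-O: save locally under the file name used on the server
-r/--range <range> retrieve a byte range from an HTTP/1.1 or FTP server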

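Common options:
-a/--append append to the target file when uploading
-A/--user-agent <string> User-Agent to send to the server
--anyauth allow "any" authentication method
-b/--cookie <name=string/file> cookie string, or file to read cookies from
--basic use HTTP Basic authentication
-B/--use-ascii use ASCII/text transfer
-c/--cookie-jar <file> write the cookies to this file when the operation finishes
-C/--continue-at <offset> resume a transfer at the given offset
-d/--data <data> send data with HTTP POST
--data-ascii <data> POST the data as ASCII
--data-binary <data> POST the data as binary
--negotiate use HTTP Negotiate authentication
--digest use HTTP Digest authentication
--disable-eprt do not use EPRT or LPRT
--disable-epsv do not use EPSV
-D/--dump-header <file> write the response headers to this file
--egd-file <file> EGD socket path for random data (SSL)
--tcp-nodelay use the TCP_NODELAY option
-e/--referer referrer URL
-E/--cert <cert[:passwd]> client certificate file and password (SSL)
--cert-type <type> certificate file type (DER/PEM/ENG) (SSL)
--key <key> private key file name (SSL)
--key-type <type> private key file type (DER/PEM/ENG) (SSL)
--pass <pass> pass phrase for the private key (SSL)
--engine <eng> crypto engine to use (SSL). "--engine list" for list
--cacert <file> CA certificate (SSL)
--capath <directory> CA directory (made using c_rehash) to verify peer against (SSL)
--ciphers <list> SSL ciphers to use
--compressed request a compressed response (using deflate or gzip)
--connect-timeout <seconds> maximum time allowed for the connection attempt
--create-dirs create the necessary local directory hierarchy
--crlf convert LF to CRLF on upload
-f/--fail fail silently (no error page in the output) on HTTP errors
--ftp-create-dirs create the remote directories if they do not exist
--ftp-method [multicwd/nocwd/singlecwd] control the use of CWD
--ftp-pasv use PASV/EPSV instead of PORT
--ftp-skip-pasv-ip skip the IP address when using PASV
--ftp-ssl try SSL/TLS for the FTP transfer
--ftp-ssl-reqd require SSL/TLS for the FTP transfer
-F/--form <name=content> emulate an HTTP form submission
--form-string <name=string> emulate an HTTP form submission with a literal string value
-g/--globoff disable URL sequences and ranges using {} and []
-G/--get send the data with GET
-h/--help show help
-H/--header <line> custom header to pass to the server
--ignore-content-length ignore the Content-Length header
-i/--include include protocol headers in the output
-I/--head show document information only
-j/--junk-session-cookies ignore session cookies read from a file
--interface <interface> use the given network interface/address
--krb4 <level> enable krb4 with the given security level
-k/--insecure allow connections to SSL sites without certificate verification
-K/--config read the configuration from the given file
-l/--list-only list only the names in an FTP directory
--limit-rate <rate> limit the transfer speed
--local-port<NUM> force the use of the given local port number
-m/--max-time <seconds> maximum time allowed for the transfer
--max-redirs <num> maximum number of redirects to follow
--max-filesize <bytes> maximum size of a file to download
-M/--manual display the full manual
-n/--netrc read the user name and password from the netrc file
--netrc-optional use either .netrc or the URL; overrides -n
--ntlm use HTTP NTLM authentication
-N/--no-buffer disable output buffering
-o/--output write the output to this file
-O/--remote-name write the output to a file named like the remote file
-p/--proxytunnel tunnel through the HTTP proxy
--proxy-anyauth pick any proxy authentication method
--proxy-basic use Basic authentication on the proxy
--proxy-digest use Digest authentication on the proxy
--proxy-ntlm use NTLM authentication on the proxy
-P/--ftp-port <address> use PORT with the given address instead of PASV
-Q/--quote <cmd> send a command to the server before the file transfer
-r/--range <range> retrieve a byte range from an HTTP/1.1 or FTP server
--random-file <file> file to read random data from (SSL)
-R/--remote-time set the remote file's time on the local output file
--retry <num> number of retries when the transfer has problems
--retry-delay <seconds> wait time between retries when the transfer has problems
--retry-max-time <seconds> maximum total time for retries when the transfer has problems
-s/--silent silent mode; do not output anything
-S/--show-error show errors
--socks4 <host[:port]> use a SOCKS4 proxy on the given host and port
--socks5 <host[:port]> use a SOCKS5 proxy on the given host and port
--stderr <file> where to redirect stderr; - means stdout
-t/--telnet-option <OPT=val> set a telnet option
--trace <file> write a debug trace to the given file
--trace-ascii <file> like --trace but without the hex output
--trace-time add time stamps to trace/verbose output
-T/--upload-file <file> upload a file
--url <URL> set the URL to work with
-u/--user <user[:password]> user name and password for the server
-U/--proxy-user <user[:password]> user name and password for the proxy
-v/--verbose make the operation more talkative
-V/--version show version information
-w/--write-out [format] what to output after completion
-x/--proxy <host[:port]> use an HTTP proxy on the given port
-X/--request <command> specify the request command to use
-y/--speed-time time needed to trigger a speed-limit abort; defaults to 30
-Y/--speed-limit stop the transfer if the speed stays below this limit for 'speed-time' seconds
-z/--time-cond transfer based on a time condition
-0/--http1.0 use HTTP 1.0
-1/--tlsv1 use TLSv1 (SSL)
-2/--sslv2 use SSLv2 (SSL)
-3/--sslv3 use SSLv3 (SSL)
--3p-quote like -Q for the source URL for 3rd party transfer
--3p-url use this URL for 3rd party transfer
--3p-user use this user name and password for 3rd party transfer
-4/--ipv4 use IPv4
-6/--ipv6 use IPv6
-#/--progress-bar show the transfer progress as a progress bar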

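Common curl examples
2. With -O (uppercase) the URL must point to a specific file, otherwise nothing can be saved. Numeric ranges in brackets (curl's URL globbing, not regular expressions) can fetch several files at once
[root@krlcgcms01 mytest]# curl -O http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
[root@krlcgcms01 mytest]# curl -O http://www.codesky.net/wp-content/uploads/2010/[0-9][0-9]/aaaaa.jpg
3. Emulate a form submission to log in, and save the cookies
[root@krlcgcms01 mytest]# curl -c ./cookie_c.txt -F log=aaaa -F pwd=****** http://www.codesky.net/wp-login.php
4. Emulate a form submission to log in, and save the response headers
[root@krlcgcms01 mytest]# curl -D ./cookie_D.txt -F log=aaaa -F pwd=****** http://www.codesky.net/wp-login.php
The cookie file written by -c (lowercase) is not the same as what -D writes: -c stores the cookies in curl's cookie-jar format, while -D dumps the raw response headers.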

5. Use the cookie file
[root@krlcgcms01 mytest]# curl -b ./cookie_c.txt http://www.codesky.net/wp-admin
6. Resume a download with -C (uppercase); the "-" after -C lets curl work out the offset itself
[root@krlcgcms01 mytest]# curl -C - -O http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
7. Send data. A login page is a good test target, because the page curl gets back shows whether the values arrived
[root@krlcgcms01 mytest]# curl -d log=aaaa http://www.codesky.net/wp-login.php
8. Report fetch errors. The example below makes the difference clear.
[root@krlcgcms01 mytest]# curl -f http://www.codesky.net/asdf
curl: (22) The requested URL returned error: 404
[root@krlcgcms01 mytest]# curl http://www.codesky.net/asdf
<HTML><HEAD><TITLE>404,not found</TITLE>

9. Fake the referer. Some sites check the referring address of a request.
[root@krlcgcms01 mytest]# curl -e http://localhost http://www.codesky.net/wp-login.php

10. When you scrape a site with curl often, the site may block your IP address. In that case a proxy can be used
[root@krlcgcms01 mytest]# curl -x 24.10.28.84:32779 -o home.html http://www.codesky.net
11. Larger files can be downloaded in segments
[root@krlcgcms01 mytest]# curl -r 0-100 -o img.part1 http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   101  100   101    0     0    105      0 --:--:-- --:--:-- --:--:--     0
[root@krlcgcms01 mytest]# curl -r 100-200 -o img.part2 http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   101  100   101    0     0     57      0  0:00:01  0:00:01 --:--:--     0
[root@krlcgcms01 mytest]# curl -r 200- -o img.part3 http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k  100  104k    0     0  52793      0  0:00:02  0:00:02 --:--:-- 88961
[root@krlcgcms01 mytest]# ls |grep part | xargs du -sh
4.0K one.part1
112K three.part3
4.0K two.part2
To use the parts, concatenate them: cat img.part* >img.jpg. Note that -r ranges are inclusive, so the ranges above overlap by one byte at offsets 100 and 200; for parts that join into an intact file, use non-overlapping ranges such as 0-99, 100-199 and 200-.
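Because the byte ranges given to -r are inclusive, the boundaries of a segmented download are easy to get wrong by one. The sketch below only does the arithmetic: byte_ranges is a hypothetical helper (not part of curl) that prints non-overlapping ranges for a file of a known size, each of which could then be passed to curl -r.

```shell
# byte_ranges SIZE CHUNK
# Hypothetical helper: print inclusive, non-overlapping byte ranges
# ("start-end", one per line) that cover a file of SIZE bytes in
# pieces of at most CHUNK bytes.
byte_ranges() {
    size=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$size" ]; do
        end=$((start + chunk - 1))
        # The last piece stops at the final byte of the file
        if [ "$end" -ge "$size" ]; then
            end=$((size - 1))
        fi
        echo "$start-$end"
        start=$((end + 1))
    done
}

# Example: a 250-byte file in 100-byte pieces
byte_ranges 250 100
```

For the 250-byte example the ranges are 0-99, 100-199 and 200-249, so every byte is requested exactly once.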

12. Hide the download progress information
[root@krlcgcms01 mytest]# curl -s -o aaa.jpg http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
13. Show a progress bar
[root@krlcgcms01 mytest]# curl -# -O http://www.codesky.net/wp-content/uploads/2010/09/compare_varnish.jpg
######################################################################## 100.0%

14. Download a file with a user name and password (the first command uses an HTTP URL, the second form an FTP URL)
[zhangy@BlackGhost ~]$ curl -u username:password -O http://www.codesky.net/demo/curtain/bbstudy_files/style.css
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
101  1934  101  1934    0     0   3184      0 --:--:-- --:--:-- --:--:--  7136
Or use the following form
[zhangy@BlackGhost ~]$ curl -O ftp://username:password@ip:port/demo/curtain/bbstudy_files/style.css
15. Upload over FTP
[zhangy@BlackGhost ~]$ curl -T test.sql ftp://username:password@ip:port/demo/curtain/
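curl commands in detail
1. Fetch a page into a file
    [root@xi mytest]# curl -o home.html http://www.baidu.com    (saves the Baidu home page to home.html)
    [root@xi mytest]# curl -o "#2_#1.jpg" "http://cgi2.tky.3web.ne.jp/~{A,B}/[001-201].JPG"
The files under A and B are both named 001, 002, ..., 201, so the downloads would overwrite each other. With the custom output name (#1 is the value matched by {A,B}, #2 the value matched by [001-201]) the names become: A/001.JPG ---> 001_A.jpg after download, B/001.JPG ---> 001_B.jpg after download. The quotes keep the shell from treating # as the start of a comment and from expanding the braces itself.

2. With -O (uppercase) the URL must point to a specific file, otherwise nothing can be saved. Numeric ranges in brackets (URL globbing, not regular expressions) can fetch several files at once
    [root@xi mytest]# curl -O http://www.baidu.com/img/bdlogo.gif
The result looks like this:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1575  100  1575    0     0  14940      0 --:--:-- --:--:-- --:--:-- 1538k
This creates the image bdlogo.gif in the current directory.
    [root@xi mytest]# curl -O http://XXXXX/screen[1-10].JPG    (downloads screen1.JPG through screen10.JPG)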

3. Emulate a form submission to log in, and save the cookies
    [root@xi mytest]# curl -c ./cookie_c.txt -F log=aaaa -F pwd=****** http://www.XXXX.com/wp-login.php

4. Emulate a form submission to log in, and save the response headers
    [root@xi mytest]# curl -D ./cookie_D.txt -F log=aaaa -F pwd=****** http://www.XXXX.com/wp-login.php
    The cookie file written by -c (lowercase) is not the same as what -D writes.

5. Use the cookie file
    [root@xi mytest]# curl -b ./cookie_c.txt http://www.XXXX.com/wp-admin
6. Resume a download with -C (uppercase)
    [root@xi mytest]# curl -C - -O http://www.baidu.com/img/bdlogo.gif
7. Send data. A login page is a good test target, because the page curl gets back shows whether the values arrived
    [root@xi mytest]# curl -d log=aaaa http://www.XXXX.com/wp-login.php
8. Report fetch errors. The example below makes the difference clear.
    [root@xi mytest]# curl -f http://www.XXXX.com/asdf
    curl: (22) The requested URL returned error: 404
    [root@xi mytest]# curl http://www.XXXX.com/asdf
    <HTML><HEAD><TITLE>404,not found</TITLE>

9. Fake the referer. Some sites check the referring address of a request to prevent hotlinking.
    [root@xi mytest]# curl -e http://localhost http://www.XXXX.com/wp-login.php

10. When you scrape a site with curl often, the site may block your IP address. In that case a proxy can be used
    [root@xi mytest]# curl -x 24.10.28.84:32779 -o home.html http://www.XXXX.com

11. Larger files can be downloaded in segments
    [root@xi mytest]# curl -r 0-100 -o img.part1 http://www.XXXX.com/wp-content/uploads/2010/09/compare_varnish.jpg
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   101  100   101    0     0    105      0 --:--:-- --:--:-- --:--:--     0
    [root@xi mytest]# curl -r 100-200 -o img.part2 http://www.XXXX.com/wp-content/uploads/2010/09/compare_varnish.jpg
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   101  100   101    0     0     57      0  0:00:01  0:00:01 --:--:--     0
    [root@xi mytest]# curl -r 200- -o img.part3 http://www.XXXX.com/wp-content/uploads/2010/09/compare_varnish.jpg
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  104k  100  104k    0     0  52793      0  0:00:02  0:00:02 --:--:-- 88961
    [root@xi mytest]# ls |grep part | xargs du -sh
    4.0K one.part1
    112K three.part3
    4.0K two.part2
    To use the parts, concatenate them: cat img.part* >img.jpg. The -r ranges are inclusive, so 0-100 and 100-200 overlap by one byte; ranges such as 0-99, 100-199 and 200- join into an intact file.

12. Hide the download progress information
    [root@xi mytest]# curl -s -o aaa.jpg http://www.baidu.com/img/bdlogo.gif
13. Show a progress bar
    [root@xi mytest]# curl -# -O http://www.baidu.com/img/bdlogo.gif
    ####################################################################### 100.0%
    (Note: -0, the digit zero, is a different option that makes the request with HTTP 1.0.)
14. Download a file with a user name and password (the first command uses an HTTP URL, the second form an FTP URL)
    [xifj@Xi ~]$ curl -u username:password -O http://www.XXXX.com/demo/curtain/bbstudy_files/style.css
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    101  1934  101  1934    0     0   3184      0 --:--:-- --:--:-- --:--:--  7136
    Or use the following form
    [xifj@Xi ~]$ curl -O ftp://username:password@ip:port/demo/curtain/bbstudy_files/style.css
15. Upload over FTP
    [xifj@Xi ~]$ curl -T test.sql ftp://username:password@ip:port/demo/curtain/bbstudy_files/
15. Present a browser User-Agent
    [xifj@Xi ~]$ curl -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -x 123.45.67.89:1080 -o page.html -D cookie0001.txt http://www.www.baidu.com

16. PUT, GET and POST
For example, curl -T localfile http://cgi2.tky.3web.ne.jp/~zz/abc.cgi uses the HTTP PUT method.
PUT naturally brings to mind the other methods, GET and POST.
An HTTP form is usually submitted with POST or GET.
GET needs no option; the variables are simply written into the URL (quoted here, so that the shell does not interpret the &).
For example:
curl "http://www.yahoo.com/login.cgi?user=nick&password=12345"
The option for POST is -d.
For example, curl -d "user=nick&password=12345" http://www.yahoo.com/login.cgi sends a login request to the site.
Whether to use GET or POST depends on how the server-side program is set up.
One thing to watch is file upload under POST, for example
<form method="POST" enctype="multipart/form-data" action="http://cgi2.tky.3web.ne.jp/~zz/up_file.cgi">
<input type=file name=upload>
<input type=submit name=nick value="go">
</form>
To emulate an HTTP form like this with curl, the syntax is:
curl -F upload=@localfile -F nick=go http://cgi2.tky.3web.ne.jp/~zz/up_file.cgi

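A guide to curl for web development
curl is not only a programming library; the command itself is an extremely useful tool for web development.
curl is a command-line tool that sends network requests, then receives and extracts the data and shows it on standard output (stdout).
It supports many protocols; the examples below show how to use it for web development.

1. View the page source
Putting a URL after the curl command shows the page source. We use www.sina.com as the example (chosen mainly because its page code is short):
    curl www.sina.com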
　　<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
　　<html><head>
　　<title>301 Moved Permanently</title>
　　</head><body>
　　<h1>Moved Permanently</h1>
　　<p>The document has moved <a href="http://www.sina.com.cn/">here</a>.</p>
　　</body></html>
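To save the page, use the -o option, which is equivalent to using the wget command.
    curl -o [filename] www.sina.com

2. Automatic redirects
Some URLs redirect automatically. With the -L option, curl follows the redirect to the new URL.
    curl -L www.sina.com
With the command above, the request is redirected to www.sina.com.cn.

3. Show header information
The -i option shows the HTTP response headers together with the page code.
    curl -i www.sina.com

　　HTTP/1.0 301 Moved Permanently
　　Date: Sat, 03 Sep 2011 23:44:10 GMT
　　Server: Apache/2.0.54 (Unix)
　　Location: http://www.sina.com.cn/
　　Cache-Control: max-age=3600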
　　Expires: Sun, 04 Sep 2011 00:44:10 GMT
　　Vary: Accept-Encoding
　　Content-Length: 231
　　Content-Type: text/html; charset=iso-8859-1
　　Age: 3239
　　X-Cache: HIT from sh201-9.sina.com.cn
　　Connection: close

　　<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
　　<html><head>
　　<title>301 Moved Permanently</title>
　　</head><body>
　　<h1>Moved Permanently</h1>
　　<p>The document has moved <a href="http://www.sina.com.cn/">here</a>.</p>
　　</body></html>
The -I option shows only the HTTP response headers.

4. Show the communication process
The -v option shows the whole course of one HTTP exchange, including the connection to the port and the HTTP request headers.
    curl -v www.sina.com

　　* About to connect() to www.sina.com port 80 (#0)
　　* Trying 61.172.201.195... connected
　　* Connected to www.sina.com (61.172.201.195) port 80 (#0)
　　> GET / HTTP/1.1
　　> User-Agent: curl/7.21.3 (i686-pc-linux-gnu) libcurl/7.21.3 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
　　> Host: www.sina.com
　　> Accept: */*
　　>
　　* HTTP 1.0, assume close after body
　　< HTTP/1.0 301 Moved Permanently
　　< Date: Sun, 04 Sep 2011 00:42:39 GMT
　　< Server: Apache/2.0.54 (Unix)
　　< Location: http://www.sina.com.cn/
　　< Cache-Control: max-age=3600
　　< Expires: Sun, 04 Sep 2011 01:42:39 GMT
　　< Vary: Accept-Encoding
　　< Content-Length: 231
　　< Content-Type: text/html; charset=iso-8859-1
　　< X-Cache: MISS from sh201-19.sina.com.cn
　　< Connection: close
　　<
　　<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
　　<html><head>
　　<title>301 Moved Permanently</title>
　　</head><body>
　　<h1>Moved Permanently</h1>
　　<p>The document has moved <a href="http://www.sina.com.cn/">here</a>.</p>
　　</body></html>
　　* Closing connection #0

If that is still not enough detail, the commands below record the exchange in even more detail.
    curl --trace output.txt www.sina.com
or
    curl --trace-ascii output.txt www.sina.com
After running the command, open output.txt to see the result.

5. Send form data
Form data can be sent with GET or POST. GET is the simpler one: just append the data to the URL.
    curl example.com/form.cgi?data=xxx
With POST the data must be kept separate from the URL, so curl needs the --data option.
    curl --data "data=xxx" example.com/form.cgi
If your data has not been form-encoded, curl can encode it for you with the --data-urlencode option.
    curl --data-urlencode "date=April 1" example.com/form.cgi

6. HTTP verbs
curl's default HTTP verb is GET; the -X option selects other verbs.
    curl -X POST www.example.com
    curl -X DELETE www.example.com

7. File upload
Suppose the upload form looks like this:
    <form method="POST" enctype='multipart/form-data' action="upload.cgi">
        <input type=file name=upload>
        <input type=submit name=press value="OK">
    </form>
You can upload the file with curl like this:
    curl --form upload=@localfilename --form press=OK http://example.com

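8. The Referer field
Sometimes you need to supply a Referer field in the HTTP request headers to say which page you came from.
    curl --referer http://www.example.com http://www.example.com

9. The User Agent field
This field identifies the client device. Servers sometimes use it to return pages in different formats for different devices, such as a mobile version and a desktop version.
The User Agent of the iPhone 4 is
    Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7
curl can imitate it like this:
    curl --user-agent "[User Agent]" [URL]

10. Cookies
The --cookie option makes curl send a cookie.
    curl --cookie "name=xxx" www.example.com
The actual cookie value can be taken from the Set-Cookie field of the HTTP response headers.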
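To make the Set-Cookie step concrete, here is a minimal sketch that pulls the name=value pairs out of a header file such as the one written by curl -D. The function name cookies_from_headers and the file name headers.txt are assumptions made for this illustration; the sample header file is written by hand so the snippet runs without a network connection.

```shell
# cookies_from_headers FILE
# Hypothetical helper: print "name=value" for every Set-Cookie line in a
# file of HTTP response headers (for example one saved with curl -D).
cookies_from_headers() {
    grep -i '^Set-Cookie:' "$1" |
        sed -e 's/^[^:]*:[ ]*//' -e 's/;.*$//' |
        tr -d '\r'
}

# Hand-written sample headers, standing in for real curl -D output
printf 'HTTP/1.1 200 OK\r\nSet-Cookie: sid=abc123; Path=/; HttpOnly\r\nContent-Type: text/html\r\nset-cookie: lang=en\r\n\r\n' > headers.txt

cookies_from_headers headers.txt
```

A printed pair can then be passed back with --cookie, for example curl --cookie "sid=abc123" www.example.com. The sed step drops attributes such as Path and HttpOnly, which are instructions to the client and are not sent back to the server.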

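11. Add a header
Sometimes you need to add a header of your own to the HTTP request. The --header option does this.
    curl --header "Content-Type:application/json" http://example.com
12. HTTP authentication
Some domains require HTTP authentication, in which case curl needs the --user option.
    curl --user name:password example.com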


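A shell script that monitors web pages with curl on Linux

#!/bin/bash

# Send an alert e-mail: $1 is the subject, $2 the message body
smail() {
    mail -s "$1" gjw_apparitor@gmail.com <<EOF
$1
$2
====
report time: `date +"%F %T"`
current user: `whoami`
shell script: `echo $0`
====
EOF
}

# Send an alert SMS through the fetion client
ssms() {
    /usr/local/feixin/fetion --mobile=150000000 --pwd=******** --to=13810000000 --msg-gb="fx $1"
}

cd /home/maintain/gaojianwei/Script/
File=Monitor_IP.txt
# Empty the result files left over from the previous run
sed -i '/.*/d' Curl_Out.txt
sed -i '/.*/d' Curl_Out_1.txt

# Each non-comment, non-empty line of ${File} holds: proxy IP, proxy port, URL
sed -e '/^#/d;/^$/d' ${File} | while read Ip Port URL
do
    # curl prints "time_total:size_download:http_code" without a newline ...
    /usr/bin/curl --connect-timeout 8 --max-time 12 -o /dev/null -s -w '%{time_total}:%{size_download}:%{http_code}' "http://${URL}" -x "${Ip}:${Port}" >> Curl_Out.txt
    # ... and echo completes the line with ":ip:url"
    echo ":${Ip}:${URL}" >> Curl_Out.txt
done

# A probe passes when it took under 8 seconds, returned data, and got status 200, 301, 302 or 401; every other line is copied to Curl_Out_1.txt
awk -F":" '{if(($1*1000<8000)&&($2>0)&&($3=="200"||$3=="301"||$3=="302"||$3=="401")) {} else {print $0 >> "Curl_Out_1.txt"}}' Curl_Out.txt

# A non-empty Curl_Out_1.txt means at least one probe failed: send the alerts
if [ -s Curl_Out_1.txt ];then
    Warning="`awk '{printf("%s#",$0)}' Curl_Out_1.txt`"
    ssms "${Warning}"
    smail CURL_Monitor "${Warning}"
fi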
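
The pass/fail rule inside the awk filter is easier to see in isolation. The sketch below applies the same thresholds (under 8 seconds, more than 0 bytes, status 200, 301, 302 or 401) to one result line at a time. The function name classify_probe and the sample lines are assumptions for illustration, not part of the script above.

```shell
# classify_probe LINE
# LINE has the form "time_total:size_download:http_code[:ip:url]".
# Prints "ok" when the line passes the thresholds, otherwise "alert".
classify_probe() {
    printf '%s\n' "$1" | awk -F: '{
        ok = ($1 * 1000 < 8000) && ($2 > 0) &&
             ($3 == "200" || $3 == "301" || $3 == "302" || $3 == "401")
        print (ok ? "ok" : "alert")
    }'
}

# Made-up sample lines
classify_probe "0.512:10240:200:10.0.0.1:example.com/"   # fast, has data, status 200
classify_probe "9.100:10240:200:10.0.0.1:example.com/"   # too slow
classify_probe "0.300:0:000:10.0.0.1:example.com/"       # nothing received
```

Keeping the rule in one small function makes it straightforward to try new thresholds against recorded lines before changing the monitoring script.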

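Notes:

curl is a command-line HTTP download tool similar to wget. Like wget, it can send a request with specified HTTP headers to a server and use the response to judge the state of the service.
Here is a way to monitor page availability with curl.
The command below collects the status code of a page. If it prints 200, the service is healthy; any other code indicates a problem.
curl -o /dev/null -s -w %{http_code} http://zys.8800.org/
The -o option sends all downloaded content to /dev/null, -s suppresses curl's own output, and -w defines a custom output format for what we need.
Combined with e-mail and SMS, this command is enough to monitor page availability. Deployed on machines in different regions of the country, the program can monitor the availability of a CDN network.

With these options curl reports only the server's response status and not the content; 200 is normal and anything else is not. A simple command looks like this:

[coomix@localhost ~]$ echo `curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "http://www.coomix.net/index.jsp"`
200
[coomix@localhost ~]$ echo `curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "http://www.coomix.net/index5.jsp"`
404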
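
The status-code check can be wrapped in two small functions so that a script can test a page with a plain if. This is a minimal sketch using the same curl options as the commands above; the function names http_status and is_up are assumptions, and the usage lines are left commented out because they need network access.

```shell
# http_status URL
# Print only the HTTP status code of URL (curl prints 000 when no
# response was received within the time limits).
http_status() {
    curl -o /dev/null -s -m 10 --connect-timeout 10 -w '%{http_code}' "$1"
}

# is_up URL
# Succeed (exit status 0) only when the status code is exactly 200.
is_up() {
    [ "$(http_status "$1")" = "200" ]
}

# Usage (requires network access):
# if is_up "http://www.coomix.net/index.jsp"; then
#     echo "page is up"
# else
#     echo "page is down"
# fi
```

Treating only 200 as healthy mirrors the text above; a site that legitimately answers with a redirect would need 301 or 302 added to the comparison.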

====================================================

The file listing the machines to monitor:

server.list

server1

server2

server3

Create the monitoring script: webstatus.sh

#!/bin/sh
monitor_dir=/home/admin/monitor/
# Create the working directory on the first run
if [ ! -d $monitor_dir ]; then
    mkdir $monitor_dir
fi
cd $monitor_dir
web_stat_log=web.status
if [ ! -f $web_stat_log ]; then
    touch $web_stat_log
fi
server_list_file=server.list
# Without the server list there is nothing to check: log the error and stop
if [ ! -f $server_list_file ]; then
    echo "`date '+%Y-%m-%d %H:%M:%S'` ERROR:$server_list_file NOT exists!" >>$web_stat_log
    exit 1
fi
#total=`wc -l $server_list_file|awk '{print $1}'`
for website in `cat $server_list_file`
do
    url="http://$website/app.htm"
    # Fetch only the status code, allowing at most 10 seconds
    server_status_code=`curl -o /dev/null -s -m 10 --connect-timeout 10 -w '%{http_code}' "$url"`
    if [ "$server_status_code" = "200" ]; then
        echo "`date '+%Y-%m-%d %H:%M:%S'` visit $website status code 200 OK" >>$web_stat_log
    else
        echo "`date '+%Y-%m-%d %H:%M:%S'` visit $website error!!! server can't connect at 10s or stop response at 10 s, send alarm sms ..." >>$web_stat_log
        # Send the alarm text to the SMS gateway in the background
        echo "!app alarm @136xxxxxxxx server:$website can't connect at 10s or stop response at 10s ..." | nc smsserver port &
    fi
done
exit 0

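The core is curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "$url" and the check of whether the returned status code is 200; if no 200 arrives within 10 seconds, an alert is sent

Finally, have Linux run the script on a schedule:

crontab -e

*/10 * * * * /home/admin/app/bin/webstatus.sh

This runs the script every 10 minutes

Here is another way to write the script: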

#!/bin/bash

# Read one URL per line from the list file
while read URL
do
    echo `date`
    result=`curl -o /dev/null -s -m 10 --connect-timeout 10 -w '%{http_code}' "$URL"`
    if [[ "$result" = "200" ]]
    then
        echo "$URL is ok"
    else
        echo "test err"
        # Mail the alert; the here-document ends at the EOF line, which must start in the first column
        /usr/sbin/sendmail -t << EOF
From:SD-Detect
To:13918888888@139.com,13800000000@139.com
Subject:Detected $URL
------------------------------
${URL} is err!!
------------------------------
EOF
    fi
done < /root/jiankong/httplist.txt

#!/bin/bash
mkdir -p ./cookie_files
#for((i=999106;i<999888;i++));
for((i=1;i<777;i++));
do
curl -v \
-L "https://www.sogou.com/web?query=%E6%9C%AC%E5%B8%AE%E8%8F%9C+site:baike.sogou.com&sugsuv=1500290804133876&sut=4799&sugtime=1500299663368&s_from=result_up&lkt=2,1500299659783,1500299659927&sst0=1500299663368&cid=&page=2&ie=utf8&p=40040100&dp=1&w=01029901&dr=1" \
--user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"  \
--header "Host: www.sogou.com" \
--header "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header "Accept-Encoding: gzip, deflate, sdch" \
--header "Accept-Language: en-US,en;q=0.8,zh;q=0.6,zh-CN;q=0.4" \
--header "Connection:keep-alive" \
--header "Upgrade-Insecure-Requests:1" \
-D ./cookie_files/sogou.cookie.$i.txt \
-c cookie.cache.$i \
-b cookie.cache.$((i-1));
#--header "Cache-Control:max-age=0" \
#--header "Cookie:ABTEST=3|1500301568|v17; IPLOC=CN3100; SUID=3A5916D25218910A00000000596CC901; PHPSESSID=ruv8uiahtr3ri7euel72uqhqu1; SUIR=1500301569; SUV=00041792D216593A596CC901B5DA0452; SNUID=C8ABE521F2F7A676DCE85D0EF36F3036; sst0=368; ld=6Zllllllll2BbG6xlllllVO7hQ7lllll1PBBnkllllolllll9ylll5@@@@@@@@@@; browerV=3; osV=3; LSTMV=787%2C123; LCLKINT=293603"
#sleep 20;
done

# -c: write all cookies to the file after a completed operation
# -b: read cookies from the file
# -L: follow redirects

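Running multiple processes and controlling the process count

#!/bin/bash
# Number of processes allowed to run at the same time
THREAD_NUM=200

# The script takes one argument: a file with one URL per line
if [ $# -ne 1 ]; then
    echo "The parameters you enter is not correct !"
    exit 1
fi

# Create a named pipe and open it read/write on file descriptor 9
mkfifo tmp
exec 9<>tmp
# Pre-load the given number of newlines; one newline stands for one process
for ((i = 0; i < THREAD_NUM; i++))
do
    echo >&9
done

i=0
while read -r line
do
    # Process control: wait here until a newline can be read from the pipe
    read -r -u 9
    i=$((i + 1))
    {
        isok=`curl -o "$i.html" -s -w '%{http_code}' "$line"`
        if [ "$isok" = "200" ]; then
            echo "$line OK"
        else
            echo "$line no"
        fi
        # Write the newline back so that the next process can start
        echo >&9
    } &
done < "$1"
wait
echo "Finished"
rm tmp

The code above keeps the number of child processes at the configured limit. The control works through the pipe: when the pipe has nothing to read, read blocks, so no new process is started until a running one writes its newline back.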
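
The same token idea can be tried without any network access. The sketch below is a portable variant under stated assumptions: run_limited and demo_worker are hypothetical names, the worker only prints its argument, and read with a redirection from descriptor 9 stands in for read -u 9 so that the code also runs in plain sh.

```shell
# run_limited MAX WORKER ITEM...
# Call WORKER once for every ITEM, with at most MAX calls running at once.
run_limited() {
    max=$1
    worker=$2
    shift 2

    # Create a private FIFO and keep it open read/write on descriptor 9
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/tokens"
    exec 9<>"$dir/tokens"
    rm -rf "$dir"             # the open descriptor keeps the pipe usable

    # Pre-load one token (an empty line) per allowed process
    n=0
    while [ "$n" -lt "$max" ]; do
        echo >&9
        n=$((n + 1))
    done

    for item in "$@"; do
        read -r token <&9     # take a token; blocks while MAX jobs are running
        {
            "$worker" "$item"
            echo >&9          # hand the token back
        } &
    done
    wait                      # let the remaining background jobs finish
    exec 9>&-
}

# Stand-in worker; a real one could call curl on "$1"
demo_worker() {
    echo "done $1"
}

run_limited 2 demo_worker a b c d e
```

The five lines can appear in any order because the jobs run in parallel, but no more than two run at the same time. Deleting the FIFO right after opening it is a design choice that avoids leaving a temporary file behind if the script is interrupted.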
