Douban Crawler Log (1): Douban Movie URL Types

The following URLs were crawled from the Douban Movie homepage:

href="http://movie.douban.com/subject/26289144/?from=showing"
href="http://movie.douban.com/subject/26289144/?from=showing"
href="http://movie.douban.com/subject/26289144/cinema/"
href="http://movie.douban.com/subject/25723907/?from=showing"
href="http://movie.douban.com/subject/25723907/?from=showing"
href="http://movie.douban.com/subject/25723907/cinema/"
href="http://movie.douban.com/subject/25778488/?from=showing"
href="http://movie.douban.com/subject/25778488/?from=showing"
href="http://movie.douban.com/subject/25778488/cinema/"
href="http://movie.douban.com/subject/26356871/?from=showing"
href="http://movie.douban.com/subject/26356871/?from=showing"
href="http://movie.douban.com/subject/26356871/cinema/"
href="http://movie.douban.com/subject/26373447/?from=showing"
href="http://movie.douban.com/subject/26373447/?from=showing"
href="http://movie.douban.com/subject/26373447/cinema/"
href="http://movie.douban.com/subject/25904357/?from=showing"
href="http://movie.douban.com/subject/25904357/?from=showing"
href="http://movie.douban.com/subject/25904357/cinema/"
href="http://movie.douban.com/subject/26344993/?from=showing"
href="http://movie.douban.com/subject/26344993/?from=showing"
href="http://movie.douban.com/subject/26344993/cinema/"
href="http://movie.douban.com/subject/25853727/?from=showing"
href="http://movie.douban.com/subject/25853727/?from=showing"
href="http://movie.douban.com/subject/25853727/cinema/"
href="http://movie.douban.com/subject/26365738/?from=showing"
href="http://movie.douban.com/subject/26365738/?from=showing"
href="http://movie.douban.com/subject/26365738/cinema/"
href="http://movie.douban.com/subject/21943062/?from=showing"
href="http://movie.douban.com/subject/21943062/?from=showing"
href="http://movie.douban.com/subject/21943062/cinema/"
href="http://movie.douban.com/subject/25895276/?from=showing"
href="http://movie.douban.com/subject/25895276/?from=showing"
href="http://movie.douban.com/subject/25895276/cinema/"
href="http://movie.douban.com/subject/26417725/?from=showing"
href="http://movie.douban.com/subject/26417725/?from=showing"
href="http://movie.douban.com/subject/26417725/cinema/"
href="http://movie.douban.com/subject/26382888/?from=showing"
href="http://movie.douban.com/subject/26382888/?from=showing"
href="http://movie.douban.com/subject/26382888/cinema/"
href="http://movie.douban.com/subject/25837175/?from=showing"
href="http://movie.douban.com/subject/25837175/?from=showing"
href="http://movie.douban.com/subject/25837175/cinema/"
href="http://movie.douban.com/subject/26289144/?from=reviews"
href="http://movie.douban.com/subject/26289144/?from=reviews"
href="http://movie.douban.com/subject/25778488/?from=reviews"
href="http://movie.douban.com/subject/25778488/?from=reviews"
href="http://movie.douban.com/subject/1295124/?from=reviews"
href="http://movie.douban.com/subject/1295124/?from=reviews"
href="http://movie.douban.com/subject/6845667/?from=reviews"
href="http://movie.douban.com/subject/6845667/?from=reviews"
href="http://movie.douban.com/subject/10533913/"
href="http://movie.douban.com/subject/26289144/"
href="http://movie.douban.com/subject/26147706/"
href="http://movie.douban.com/subject/10773239/"
href="http://movie.douban.com/subject/11624706/"
href="http://movie.douban.com/subject/26378698/"
href="http://movie.douban.com/subject/3338862/"
href="http://movie.douban.com/subject/25777784/"
href="http://movie.douban.com/subject/25881067/"
href="http://movie.douban.com/subject/25766663/"
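
The post does not show how this list was collected. As a rough sketch only, assuming the homepage HTML is already available as a string, Python's standard-library `html.parser` could pull out every `href` that points at a `/subject/` page. The `SubjectLinkParser` class and the `sample_html` fragment are hypothetical and exist only for illustration; they are not the crawler used here.

```python
from html.parser import HTMLParser


class SubjectLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at Douban subject pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and "movie.douban.com/subject/" in value:
                self.links.append(value)


# Hypothetical fragment standing in for the real homepage HTML.
sample_html = """
<a href="http://movie.douban.com/subject/26289144/?from=showing">link A</a>
<a href="http://movie.douban.com/subject/26289144/?from=showing">link B</a>
<a href="http://movie.douban.com/subject/26289144/cinema/">link C</a>
<a href="http://movie.douban.com/">home</a>
"""

parser = SubjectLinkParser()
parser.feed(sample_html)
for link in parser.links:
    print('href="%s"' % link)
```

Links are kept in page order and duplicates are not removed, which matches the repeated entries in the list above.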
 

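These URL suffixes fall into four types:

1. No suffix: the movie information page

2. ?from=reviews: same as above

3. ?from=showing: same as above

4. cinema: the cinema selection page (ticket purchase)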
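
The four types above can be told apart mechanically. The following is a minimal sketch, assuming every URL has the shape `/subject/<id>/` with an optional `cinema/` path segment or `from` query parameter, as in the crawled list. The helper names `classify`, `page_kind`, and `subject_id` are made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def classify(url):
    """Return the suffix type of a Douban subject URL."""
    parts = urlparse(url)
    # Path looks like /subject/<id>/ or /subject/<id>/cinema/
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 3 and segments[2] == "cinema":
        return "cinema"
    source = parse_qs(parts.query).get("from")
    if source:
        return "from=" + source[0]
    return "no suffix"


def page_kind(url):
    """Map a suffix type to the page it leads to (types 1-3 are the same page)."""
    return "cinema selection" if classify(url) == "cinema" else "movie info"


def subject_id(url):
    """Extract the numeric subject id, e.g. '26289144'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[1]


urls = [
    "http://movie.douban.com/subject/10533913/",
    "http://movie.douban.com/subject/26289144/?from=reviews",
    "http://movie.douban.com/subject/26289144/?from=showing",
    "http://movie.douban.com/subject/26289144/cinema/",
]
for u in urls:
    print(subject_id(u), classify(u), page_kind(u))
```

Since the first three types all lead to the same movie information page, a crawler could key its visited set on `subject_id` rather than on the full URL to avoid fetching the same page several times.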

posted @ 2015-08-18 14:26  蒙面