PHP Regex Matching - 慧姐

Take the data fetched with curl, then use regular expressions to pull out the parts you need.
preg_match('/<ul[^>]*class="honor_2014"[^>]*>(.*?)<\/ul>/s', $file_contents, $matches);
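This matches the entire contents of the ul whose class is honor_2014.
Then loop over the results, sort them into categories, and save them to the database.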
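This is a minimal sketch of the two steps above. It runs against a hypothetical sample page instead of real curl output. The li pattern and the `$items` row shape are assumptions for illustration, and the database insert is left out because the source does not show it.

```php
<?php
// Hypothetical sample page; in the article this string comes from curl.
$file_contents = <<<HTML
<div>
<ul class="honor_2014">
  <li><a href="/honor/1.html">Award A</a></li>
  <li><a href="/honor/2.html">Award B</a></li>
</ul>
</div>
HTML;

$items = [];
// Step 1: grab everything inside the ul with class honor_2014 (s lets . span newlines).
if (preg_match('/<ul[^>]*class="honor_2014"[^>]*>(.*?)<\/ul>/s', $file_contents, $matches)) {
    // Step 2: loop over each li inside that block (hypothetical li markup).
    preg_match_all('/<li><a href="([^"]*)">(.*?)<\/a><\/li>/s', $matches[1], $lis, PREG_SET_ORDER);
    foreach ($lis as $li) {
        // Hypothetical row shape; this is where you would categorize and insert into the DB.
        $items[] = ['url' => $li[1], 'name' => $li[2]];
    }
}
print_r($items);
```

PREG_SET_ORDER groups each li's captures together, which makes the per-row loop straightforward.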

Match the h1 to get the title:

/<h1>(?P<title>.*?)<\/h1>/is
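Here is a small sketch of that pattern with preg_match on a hypothetical HTML string. It shows that the named group `(?P<title>...)` is available both by name and by number. The uppercase `<H1>` still matches because of the `i` modifier.

```php
<?php
// Hypothetical HTML fragment for illustration.
$html = "<html><body>\n<H1>PHP regex notes</H1>\n<p>...</p></body></html>";

// i: case-insensitive, so <H1> matches <h1>; s: . may also match newlines.
if (preg_match('/<h1>(?P<title>.*?)<\/h1>/is', $html, $m)) {
    // The named group is stored under 'title' and under its number, 1.
    echo $m['title'], "\n";
}
```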

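When matching a div, match all the way to its end, and make sure both the start marker and the end marker are unique. Otherwise the matched range is inaccurate, because you have never actually specified where the content ends. Also note that * is greedy: it matches up to the last possible occurrence. *? is non-greedy (lazy): it stops at the first occurrence.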

For the string 1@2@3@4, (.*)@ captures 1@2@3, while (.*?)@ captures 1.
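The same 1@2@3@4 example can be run directly with preg_match. This short check compares the greedy and lazy captures, and also the full matches in index 0.

```php
<?php
$s = '1@2@3@4';

preg_match('/(.*)@/', $s, $greedy);  // .* runs to the end, then backtracks to the LAST @
preg_match('/(.*?)@/', $s, $lazy);   // .*? stops at the FIRST @

echo $greedy[1], "\n"; // 1@2@3   (full match $greedy[0] is 1@2@3@)
echo $lazy[1], "\n";   // 1       (full match $lazy[0] is 1@)
```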

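The simplest version, /<div class="content">(?P<content>.*?)<div class="clear"><\/div>/is, is enough: it ends on the unique <div class="clear"></div> marker that follows the content.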
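This sketch uses a hypothetical fragment where the content div contains a nested div. It shows why the end marker matters. Stopping lazily at the first `</div>` ends inside the nested div. Ending on the unique clear div captures the whole body, including the content div's own closing tag.

```php
<?php
// Hypothetical page fragment: the content div contains a nested div.
$html = '<div class="content"><div class="inner">A</div><p>B</p></div><div class="clear"></div>';

// Ends at the first </div>, which belongs to the nested div: range is wrong.
preg_match('/<div class="content">(?P<content>.*?)<\/div>/is', $html, $wrong);

// Ends on the unique <div class="clear"></div> marker: range is complete.
preg_match('/<div class="content">(?P<content>.*?)<div class="clear"><\/div>/is', $html, $right);

echo $wrong['content'], "\n"; // <div class="inner">A
echo $right['content'], "\n"; // <div class="inner">A</div><p>B</p></div>
```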

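\S matches any non-whitespace character. Do not confuse it with the s modifier in /is: s lets . match newlines too, and i makes the match case-insensitive.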
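This quick check contrasts the s modifier with the \S escape on small made-up strings. Without s, the lazy `.*?` cannot cross the newline, so the match fails. `\S+` stops at the first space.

```php
<?php
$html = "<h1>first line\nsecond line</h1>";

// Without s, "." does not match "\n", so this returns 0 (no match).
$without = preg_match('/<h1>(.*?)<\/h1>/i', $html, $m1);
// With s, "." also matches "\n", so this returns 1.
$with = preg_match('/<h1>(.*?)<\/h1>/is', $html, $m2);

// \S matches one non-whitespace character; \S+ stops at the space.
preg_match('/\S+/', 'abc def', $m3);

var_dump($without, $with, $m3[0]);
```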

preg_match_all('/<div class="content">(?P<content>.*?)<div class="clear"><\/div>/is', $file_contents,$mat);
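Here is a sketch of that preg_match_all call on a hypothetical page with two content blocks. With the default PREG_PATTERN_ORDER, `$mat['content']` holds the capture from every match, which is why the article later reads `$mat['content'][0]`. Each capture still ends with the content div's own `</div>`.

```php
<?php
// Hypothetical page with two content blocks.
$file_contents = '<div class="content">one</div><div class="clear"></div>'
    . '<div class="content">two</div><div class="clear"></div>';

// Returns the number of matches found.
$n = preg_match_all('/<div class="content">(?P<content>.*?)<div class="clear"><\/div>/is', $file_contents, $mat);

// $mat['content'] is an array: one entry per match.
foreach ($mat['content'] as $i => $block) {
    echo $i, ': ', $block, "\n";
}
```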

Match a tags and get their href:
preg_match_all('/<a href="(?P<url>\S*?)" class="honor_img">/is',$file_contents,$matches);
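This sketch runs that pattern on hypothetical links. Only the anchors with class honor_img are returned. Because `\S` cannot match a space, the lazy `\S*?` cannot run past the end of one href into a later tag.

```php
<?php
// Hypothetical links: two honor_img anchors and one unrelated anchor.
$file_contents = '<a href="/honor/1.html" class="honor_img"><img src="1.jpg"></a>'
    . '<a href="/honor/2.html" class="honor_img"><img src="2.jpg"></a>'
    . '<a href="/about.html" class="nav">About</a>';

preg_match_all('/<a href="(?P<url>\S*?)" class="honor_img">/is', $file_contents, $matches);

// Only the hrefs followed by class="honor_img" are captured.
print_r($matches['url']);
```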

Get the keywords and description:
preg_match_all('/<meta name="keywords" content=(?P<keywords>.*?)>/is', $file_contents,$keywords);
preg_match_all('/<meta name="description" content=(?P<description>.*?)>/is', $file_contents,$description);
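Running those two patterns on hypothetical meta tags shows what they capture. Because they start right after `content=` and stop at the first `>`, the quotes (and a trailing ` /` on self-closing tags) end up in the result. The last pattern is a hypothetical variant that captures only the text inside the quotes.

```php
<?php
// Hypothetical head markup.
$file_contents = '<meta name="keywords" content="php,regex">'
    . '<meta name="description" content="Notes on preg_match" />';

preg_match_all('/<meta name="keywords" content=(?P<keywords>.*?)>/is', $file_contents, $keywords);
preg_match_all('/<meta name="description" content=(?P<description>.*?)>/is', $file_contents, $description);
// $keywords['keywords'][0]       is "php,regex" including the quotes
// $description['description'][0] is "Notes on preg_match" / (quotes and slash included)

// Hypothetical variant: capture only what sits between the quotes.
preg_match('/<meta name="keywords" content="(?P<keywords>[^"]*)"/is', $file_contents, $kw);
echo $kw['keywords'], "\n"; // php,regex
```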

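// Replace every single quote with a double quote; otherwise the SQL breaks when the text is spliced into a single-quoted SQL string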
$content = preg_replace("/'/is", '"', $mat['content'][0]);
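This small check on a made-up string shows that the article's preg_replace and a plain str_replace give the same result. The SQL error only arises when text is pasted into a quoted SQL string. Binding the value through a prepared statement (for example PDO::prepare with a bound parameter) avoids it without altering the text. That is a swapped-in technique, not the article's.

```php
<?php
$raw = "It's Tom's page";

// The article's approach: swap every ' for " with a regex.
$content = preg_replace("/'/is", '"', $raw);
// The same replacement without a regex.
$same = str_replace("'", '"', $raw);

echo $content, "\n"; // It"s Tom"s page
```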

Get the title:
preg_match_all('/<h1>(?P<title>.*?)<\/h1>/is', $file_contents,$match);
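This sketch uses hypothetical input to compare the result shapes. preg_match stores the group as a string. preg_match_all stores each group as an array of captures, so the title is `$match['title'][0]`. When nothing matches, preg_match_all returns 0 and the group array is empty, so a fallback guards the `[0]` access.

```php
<?php
$file_contents = '<h1>Title</h1>';

preg_match('/<h1>(?P<title>.*?)<\/h1>/is', $file_contents, $one);      // $one['title'] is a string
preg_match_all('/<h1>(?P<title>.*?)<\/h1>/is', $file_contents, $match); // $match['title'] is an array

$title = $match['title'][0] ?? '';

// No h1 at all: 0 matches, empty array for the group.
$none = preg_match_all('/<h1>(?P<title>.*?)<\/h1>/is', '<p>no title</p>', $empty);
$fallback = $empty['title'][0] ?? '';
```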

Get the elements nested inside an enclosed div:
preg_match_all('/<div class="content">(?P<content>.*?)<div class="clear"><\/div>/is', $file_contents,$mat);

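.*? is a very common combination: . matches any character, * repeats it any number of times, and the trailing ? makes that repetition lazy, so it matches as few characters as possible.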
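"As few as possible" can mean zero characters. This small check shows that `.*?` with nothing after it captures an empty string. With an end marker after it, the same `.*?` expands just far enough to reach that marker.

```php
<?php
// Nothing follows .*?, so it matches the minimum: zero characters.
preg_match('/<h1>(.*?)/', '<h1>Title</h1>', $m);

// An end marker forces it to expand up to the first </h1>.
preg_match('/<h1>(.*?)<\/h1>/', '<h1>Title</h1>', $m2);

var_dump($m[1], $m2[1]);
```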

Posted on 2016-06-08 14:41 by 慧姐