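Innovation Training Progress Report (5.14)

We had plenty of time over the weekend, and the project made significant progress. Here is what our group has done over the past few days: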

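1. We settled on Scrapy as our web-scraping framework and studied together how to extract the key content from HTML with xpath() and css().

2. We are continuing with the SDUOJ Python/Django framework; the members in charge of the front end have been studying Django models and syntax closely.

3. We completed the basic database design.

4. Helping one another, we have a first working version that scrapes HDU problems, stores them in the database, and displays them on a page.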

 

Difficulties and solutions:

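1. Setting up Scrapy. Configuring Scrapy gave us a great deal of trouble. First, following the installation commands on the official Scrapy site as-is almost always failed, and most of the blog posts we consulted had small errors of their own. Second, some members had both Python 2 and Python 3 installed, which caused some baffling problems when running pip.

Solution: the pip problem with multiple Python versions was solved by running python3 -m pip install xxxx (our project uses Python 3). For the environment setup we followed the blog post http://www.cnblogs.com/wuxl360/p/5567065.html. Be sure to install pywin32, and take particular care that the pywin32 build matches your Python version.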

2. Some members are not yet fluent in Python syntax.

3. In database operations, single quotes inside strings must be escaped; otherwise the statement fails.

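4. Learning Django: we used the tutorial at http://www.runoob.com/django/django-model.html. The SDUOJ problem page cannot be reused directly, because its variable names are tied directly to its database. It needs partial changes: remove the code that depends on the original database and rewrite it to fit the test database we have just set up.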
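Difficulty 3 can be avoided altogether with parameterized queries: the SQL text contains placeholders, the values are passed separately, and the driver does the quoting. Below is a minimal sketch using Python's built-in sqlite3 as a stand-in for the project database (the log reports version 5.7.14, which looks like MySQL). With a MySQL driver such as PyMySQL the call has the same shape but the placeholder is %s instead of ?, and the reserved word desc would be quoted with backticks. The four-column table and the save_problem helper are assumptions for illustration; the helper mirrors the select-then-insert sequence visible in the log.

```python
import sqlite3

# Stand-in table (assumption): four of the Problem columns, in SQLite so the sketch runs anywhere.
# "desc" is an SQL keyword, so it is quoted as an identifier.
SCHEMA = """
CREATE TABLE IF NOT EXISTS problem (
    originOj  TEXT,
    problemId TEXT,
    title     TEXT,
    "desc"    TEXT,
    PRIMARY KEY (originOj, problemId)
)
"""


def save_problem(conn, item):
    """Insert item unless (originOj, problemId) is already stored; return True if inserted.

    Values travel as parameters, never pasted into the SQL text, so the driver
    quotes them and an apostrophe such as "it's" needs no manual escaping.
    """
    key = (item["originOj"], item["problemId"])
    found = conn.execute(
        "SELECT 1 FROM problem WHERE originOj = ? AND problemId = ?", key
    ).fetchone()
    if found:
        return False
    conn.execute(
        'INSERT INTO problem (originOj, problemId, title, "desc") VALUES (?, ?, ?, ?)',
        key + (item["title"], item["desc"]),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    item = {"originOj": "hdu", "problemId": "3456", "title": "Universal Oracle",
            "desc": "But it's not as bad as it sounds"}
    print(save_problem(conn, item))  # True
    print(save_problem(conn, item))  # False: the row already exists
    print(conn.execute('SELECT "desc" FROM problem').fetchone()[0])
```

Given the primary key on (originOj, problemId), the select and insert could also be collapsed into one statement (INSERT OR IGNORE in SQLite, INSERT IGNORE in MySQL).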

 

Scraping a problem with Scrapy

2017-05-14 21:36:53 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: crawl)
2017-05-14 21:36:53 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'crawl', 'NEWSPIDER_MODULE': 'crawl.spiders', 'SPIDER_MODULES': ['crawl.spiders']}
2017-05-14 21:36:53 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-05-14 21:36:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-05-14 21:36:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
<<<<<<<<<<<<<<<pipeline init>>>>>>>>>>>>>>>>>>>>
2017-05-14 21:36:54 [scrapy.middleware] INFO: Enabled item pipelines:
['crawl.pipelines.SolPipeline']
2017-05-14 21:36:54 [scrapy.core.engine] INFO: Spider opened
2017-05-14 21:36:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-05-14 21:36:54 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-05-14 21:36:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://acm.hdu.edu.cn/showproblem.php?pid=3456> (referer: None)
-------------------------------------------
desc : <div class="panel_content">In computer science, an oracle is something that gives you the answer to a particular question. For this problem, you need to write an oracle that gives the answer to everything. But it's not as bad as it sounds; you know that 42 is the answer to life, the universe, and everything.</div>
-------------------------------------------
>>>>>>>>>>>>>>>>>>>>>pipeline process
Database version : 5.7.14-log
>>>>>>>>>ProblemItem
processProblemItem
select * from problem  where originOj = 'hdu' and problemId = '3456'
---beautiful split two---
sql get!!!!!!!! : %s insert into problem values('hdu','3456','http://acm.hdu.edu.cn/showproblem.php?pid=3456','Universal Oracle','2000/1000 MS','32768/32768 K','In computer science, an oracle is something that gives you the answer to a particular question. For this problem, you need to write an oracle that gives the answer to everything. But it\'s not as bad as it sounds; you know that 42 is the answer to life, the universe, and everything.','The input consists of a single line of text with at most 1000 characters. This text will contain only well-formed English sentences. The only characters that will be found in the text are uppercase and lowercase letters, spaces, hyphens, apostrophes, commas, semicolons, periods, and question marks. Furthermore, each sentence begins with a single uppercase letter and ends with either a period or a question mark. Besides these locations, no other uppercase letters, periods, or question marks will appear in the sentence. Finally, every question (that is, a sentence that ends with a question mark) will begin with the phrase "What is..."','For each question, print the answer, which replaces the "What" at the beginning with "Forty-two" and the question mark at the end with a period. Each answer should reside on its own line. ','Let me ask you two questions. What is the answer to life? What is the answer to the universe?','Forty-two is the answer to life.
Forty-two is the answer to the universe.','2017-05-14 21:36:55')
2017-05-14 21:36:55 [scrapy.core.scraper] DEBUG: Scraped from <200 http://acm.hdu.edu.cn/showproblem.php?pid=3456>
{'desc': 'In computer science, an oracle is something that gives you the '
         'answer to a particular question. For this problem, you need to write '
         "an oracle that gives the answer to everything. But it\\'s not as bad "
         'as it sounds; you know that 42 is the answer to life, the universe, '
         'and everything.',
 'input': 'The input consists of a single line of text with at most 1000 '
          'characters. This text will contain only well-formed English '
          'sentences. The only characters that will be found in the text are '
          'uppercase and lowercase letters, spaces, hyphens, apostrophes, '
          'commas, semicolons, periods, and question marks. Furthermore, each '
          'sentence begins with a single uppercase letter and ends with either '
          'a period or a question mark. Besides these locations, no other '
          'uppercase letters, periods, or question marks will appear in the '
          'sentence. Finally, every question (that is, a sentence that ends '
          'with a question mark) will begin with the phrase "What is..."',
 'memoryLimit': '32768/32768 K',
 'originOj': 'hdu',
 'output': 'For each question, print the answer, which replaces the "What" at '
           'the beginning with "Forty-two" and the question mark at the end '
           'with a period. Each answer should reside on its own line. ',
 'problemId': '3456',
 'problemUrl': 'http://acm.hdu.edu.cn/showproblem.php?pid=3456',
 'sampleInput': 'Let me ask you two questions. What is the answer to life? '
                'What is the answer to the universe?',
 'sampleOutput': 'Forty-two is the answer to life.\r\n'
                 'Forty-two is the answer to the universe.',
 'timeLimit': '2000/1000 MS',
 'title': 'Universal Oracle',
 'updateTime': '2017-05-14 21:36:55'}
2017-05-14 21:36:55 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-14 21:36:55 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 236,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 3637,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 5, 14, 13, 36, 55, 418228),
 'item_scraped_count': 1,
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 5, 14, 13, 36, 54, 855729)}
2017-05-14 21:36:55 [scrapy.core.engine] INFO: Spider closed (finished)

  

 

A problem scraped with Scrapy and stored in the database

 

HDU5722 as displayed in our front-end framework; as you can see, the math symbols are still not rendered correctly

 

 

For the database, we drew up a basic ER diagram, a UML diagram, and a data description.

First draft of the UML diagram:

First draft of the ER diagram:

 

Data description:

Entity set   Attribute      Type and size   Nullable
USER         Account        varchar(15)
             Key            varchar(20)
             Nickname       varchar(20)     Y
             Sex            varchar(2)
             University     varchar(20)     Y
             Userid         Integer
             Blog           varchar(100)    Y
             isAdmin        Boolean
             desc           varchar(255)    Y
             Submit         Integer         N
             AC             Integer         N
Problem      ProId          Integer
             originOj       varchar(10)
             problemId      varchar(10)
             problemUrl     varchar(100)
             title          varchar(10)
             desc           varchar(255)
             timeLimit      varchar(10)
             memoryLimit    varchar(10)
             input          varchar(255)
             output         varchar(255)
             sampleInput    varchar(255)
             sampleOutput   varchar(255)
             updateTime     date
contest      ContestId      Integer
             ContestName    varchar(40)
             ContestPro     varchar(200)
             ContestSTime   date
             ContestLTime   Integer
             ContestAdmin   Integer         N
result       TestId         Integer
             ProId          Integer
             originOj       varchar(10)
             problemId      varchar(10)
             code           varchar(255)
             timec          Integer
             memoryc        Integer
             result         varchar(20)
             Time           Integer
             UserId         Integer

 

Members' blogs:

李忠利 http://blog.csdn.net/qq_26572969/article/details/72083094

李绩成 http://www.jianshu.com/p/f59afedde9b6

程轩昂 http://blog.csdn.net/c_x_a/article/details/72084626

沈松青 http://blog.csdn.net/ssq352906788/article/details/72055500

王禹秋 http://blog.csdn.net/qq_32498805/article/details/72230614

Posted 2017-05-14 22:17 by SDU-VJ