io : str or file-like
A URL, a file-like object, or a raw string containing HTML. Note that lxml only accepts the http, ftp and file url protocols. If you have a URL that starts with 'https' you might try removing the 's'.
Accepts a URL, a file, or a string. If an https URL is rejected, try removing the 's' before scraping.
match : str or compiled regular expression, optional
The set of tables containing text matching this regex or string will be returned. Unless the HTML is extremely simple you will probably need to pass a non-empty string here. Defaults to ‘.+’ (match any non-empty string). The default value will return all tables contained on a page. This value is converted to a regular expression so that there is consistent behavior between Beautiful Soup and lxml.
A regular expression; the tables that match it are returned.
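The sketch below shows `io` and `match` together. It assumes a recent pandas with lxml installed, and the two sample tables are made up for illustration. The raw HTML is wrapped in `io.StringIO` because recent pandas versions deprecate passing a literal HTML string as `io`.

```python
from io import StringIO

import pandas as pd

# Two made-up tables; only the second one contains the text "Price".
html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ann</td><td>31</td></tr>
</table>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Pen</td><td>2</td></tr>
</table>
"""

# The default match '.+' keeps every table that has any text.
all_tables = pd.read_html(StringIO(html))

# A regex (here a plain word) keeps only the tables whose text matches it.
price_tables = pd.read_html(StringIO(html), match="Price")

print(len(all_tables), len(price_tables))
print(price_tables[0].columns.tolist())
```

`read_html` always returns a list of DataFrames, even when only one table matches.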
flavor : str or None, container of strings
The parsing engine to use. ‘bs4’ and ‘html5lib’ are synonymous with each other; both exist for backwards compatibility. The default of None tries to parse with lxml and, if that fails, falls back on bs4 + html5lib.
The parser defaults to lxml, with bs4 + html5lib as the fallback.
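The `flavor` argument can be a single engine name or an ordered container of names. This minimal sketch assumes lxml is installed. With the list form, pandas tries the engines from left to right and moves on to the next one only if parsing fails, so "bs4" here serves purely as a fallback.

```python
from io import StringIO

import pandas as pd

html = "<table><tr><th>a</th></tr><tr><td>1</td></tr></table>"

# A single explicit engine (requires the lxml package).
via_lxml = pd.read_html(StringIO(html), flavor="lxml")

# An ordered container of engines: lxml first, bs4 + html5lib only as a fallback.
via_list = pd.read_html(StringIO(html), flavor=["lxml", "bs4"])

print(via_lxml[0].equals(via_list[0]))
```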
header : int or list-like or None, optional
The row (or list of rows for a MultiIndex) to use to make the column headers.
Specifies the row(s) that hold the column headers; a list creates a MultiIndex.
index_col : int or list-like or None, optional
The column (or list of columns) to use to create the index.
Specifies the column(s) used for the row labels; a list creates a MultiIndex.
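`header` and `index_col` are most useful when a table has no `<th>` cells, because pandas then has no markup telling it which row holds the labels. The sketch below assumes pandas with lxml installed and uses a made-up table.

```python
from io import StringIO

import pandas as pd

# A made-up table with only <td> cells. Without header=, its columns
# would simply be labelled 0 and 1.
html = """
<table>
  <tr><td>city</td><td>population</td></tr>
  <tr><td>Oslo</td><td>700</td></tr>
  <tr><td>Bergen</td><td>285</td></tr>
</table>
"""

# header=0: take the column labels from the first row.
# index_col=0: use the first column ("city") as the row labels.
df = pd.read_html(StringIO(html), header=0, index_col=0)[0]

print(df)
```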
skiprows : int or list-like or slice or None, optional
0-based. Number of rows to skip after parsing the column integer. If a sequence of integers or a slice is given, will skip the rows indexed by that sequence. Note that a single element sequence means ‘skip the nth row’ whereas an integer means ‘skip n rows’.
Skip the nth row (given as a sequence) or skip n rows (given as an integer).
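This sketch contrasts the integer, sequence, and slice forms of `skiprows`. It assumes pandas with lxml installed. The table is made up and has no header row, so its columns are labelled 0 and 1. The helper `first_column` is hypothetical and exists only to keep the example short.

```python
from io import StringIO

import pandas as pd

# Five made-up rows: r0 .. r4, with no header row.
rows = "".join(f"<tr><td>r{i}</td><td>{i}</td></tr>" for i in range(5))
html = f"<table>{rows}</table>"


def first_column(skiprows):
    """Hypothetical helper: values of column 0 after skipping rows."""
    return pd.read_html(StringIO(html), skiprows=skiprows)[0][0].tolist()


skip_int = first_column(2)                # integer: skip the first 2 rows
skip_list = first_column([2])             # one-element list: skip only row 2
skip_slice = first_column(slice(0, 4, 2))  # slice: skip rows 0 and 2

print(skip_int, skip_list, skip_slice)
```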
attrs : dict or None, optional
This is a dictionary of attributes that you can pass to identify the table in the HTML. These are not checked for validity before being passed to lxml or Beautiful Soup. However, these attributes must be valid HTML table attributes to work correctly. For example,
attrs = {'id': 'table'}
is a valid attribute dictionary because the ‘id’ HTML tag attribute is a valid HTML attribute for any HTML tag, as per the HTML specification.
attrs = {'asdf': 'table'}
is not a valid attribute dictionary because ‘asdf’ is not a valid HTML attribute, even if it is a valid XML attribute. Valid HTML 4.01 table attributes are listed in the HTML 4.01 specification. The working draft of the HTML 5 specification contains the latest information on table attributes for the modern web.
Pass a dictionary of attribute values that identify the table.
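A minimal sketch of selecting one table by its `id` attribute. It assumes pandas with lxml installed, and the two tables and their ids are made up. A table is returned only if it matches every key/value pair in `attrs`.

```python
from io import StringIO

import pandas as pd

html = """
<table id="staff"><tr><th>Name</th></tr><tr><td>Ann</td></tr></table>
<table id="prices"><tr><th>Item</th></tr><tr><td>Pen</td></tr></table>
"""

# Only the table whose id attribute equals "prices" is kept.
tables = pd.read_html(StringIO(html), attrs={"id": "prices"})

print(len(tables), tables[0].columns.tolist())
```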
parse_dates : bool, optional
boolean or list of ints or names or list of lists or dict, default False
- boolean. If True -> try parsing the index.
- list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
- list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
- dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_html.
Note: A fast-path exists for iso8601-formatted dates.
Parse dates.
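This sketch shows both routes described above. It assumes pandas with lxml installed, and the column names `iso` and `eu` are made up. The ISO 8601 column goes through `parse_dates`. The day-first column is not listed there, so it stays as text and is converted afterwards with `pd.to_datetime` and an explicit format.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>iso</th><th>eu</th></tr>
  <tr><td>2021-03-01</td><td>01/03/2021</td></tr>
  <tr><td>2021-03-02</td><td>02/03/2021</td></tr>
</table>
"""

# parse_dates accepts column names; ISO 8601 strings parse directly.
df = pd.read_html(StringIO(html), parse_dates=["iso"])[0]

# The day/month/year column was left as text, so convert it explicitly.
df["eu"] = pd.to_datetime(df["eu"], format="%d/%m/%Y")

print(df.dtypes)
```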
tupleize_cols : bool, optional
If False, try to parse multiple header rows into a MultiIndex; otherwise, return raw tuples. Defaults to False.
Deprecated since version 0.21.0: This argument will be removed and will always convert to MultiIndex
Deprecated; avoid using it.
thousands : str, optional
Separator to use to parse thousands. Defaults to ','.
Thousands separator.
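A short sketch of the default `thousands=','` compared with turning it off. It assumes pandas with lxml installed and uses a made-up `amount` column.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>amount</th></tr>
  <tr><td>1,234</td></tr>
  <tr><td>56,789</td></tr>
</table>
"""

# With the default thousands=',', the separator is stripped and the column is numeric.
parsed = pd.read_html(StringIO(html))[0]

# thousands=None skips that step, so the cells stay as strings.
raw = pd.read_html(StringIO(html), thousands=None)[0]

print(parsed["amount"].tolist(), raw["amount"].tolist())
```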
encoding : str or None, optional
The encoding used to decode the web page. Defaults to None. None preserves the previous encoding behavior, which depends on the underlying parser library (e.g., the parser library will try to use the encoding provided by the document).
The decoding to use; by default, the encoding provided by the document is used.
decimal : str, default ‘.’
Character to recognize as decimal point (e.g. use ‘,’ for European data).
New in version 0.19.0.
Decimal point character; defaults to '.'.
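A sketch of reading European-style numbers. It assumes pandas with lxml installed, and the `price` values are made up. Because the sample uses '.' to group thousands, `thousands` is also set to '.', alongside `decimal=','`.

```python
from io import StringIO

import pandas as pd

# '.' groups thousands and ',' marks the decimal point.
html = """
<table>
  <tr><th>price</th></tr>
  <tr><td>1.234,5</td></tr>
  <tr><td>0,25</td></tr>
</table>
"""

df = pd.read_html(StringIO(html), thousands=".", decimal=",")[0]

print(df["price"].tolist())
```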
converters : dict, default None
Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the cell (not column) content, and return the transformed content.
New in version 0.19.0.
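A dict of functions for converting certain columns: keys are column labels or integers, and values are conversion functions that take exactly one argument, the value of a cell in that column.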
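A minimal sketch of `converters`. It assumes pandas with lxml installed, and the `code` and `name` columns are made up. Each function receives one cell's text. Converting `code` with `str` keeps the leading zeros that numeric inference would otherwise drop.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>code</th><th>name</th></tr>
  <tr><td>007</td><td>ann</td></tr>
  <tr><td>042</td><td>bob</td></tr>
</table>
"""

df = pd.read_html(
    StringIO(html),
    # Keep codes as text (leading zeros survive) and title-case the names.
    converters={"code": str, "name": str.title},
)[0]

print(df["code"].tolist(), df["name"].tolist())
```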
na_values : iterable, default None
Custom NA values.
New in version 0.19.0.
Specifies which values are treated as NA.
keep_default_na : bool, default True
If na_values are specified and keep_default_na is False, the default NaN values are overridden; otherwise, the specified na_values are appended to the default NaN values.
New in version 0.19.0.
Keep the default NA values; used together with na_values.
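This sketch shows how `na_values` and `keep_default_na` interact. It assumes pandas with lxml installed, and the `score` column and the helper `score_column` are hypothetical. With the defaults kept, "-" is added to the standard NA strings. With `keep_default_na=False`, "-" is the only NA marker, so the literal text "NA" survives.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>score</th></tr>
  <tr><td>10</td></tr>
  <tr><td>-</td></tr>
  <tr><td>NA</td></tr>
</table>
"""


def score_column(**kwargs):
    """Hypothetical helper: parse the table and return its 'score' column."""
    return pd.read_html(StringIO(html), **kwargs)[0]["score"]


# "-" is appended to the default NA strings: both "-" and "NA" become missing.
appended = score_column(na_values=["-"])

# The defaults are dropped: only "-" is missing, and "NA" stays as text.
replaced = score_column(na_values=["-"], keep_default_na=False)

print(appended.isna().tolist(), replaced.isna().tolist())
```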