[Notes] Google Python Course

The original source is here.

Intro and string

  1. Python sets __name__ to '__main__' when a file is run directly as a script, rather than imported as a module

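  2. import package_name creates the namespace package_name, so you call package_name.func(). from package_name import func as alias binds the function directly to the name alias, without creating the package_name namespace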

  3. Use the methods of the built-in str type instead of the old string module

  4. Use // for integer division instead of / (// always floors; in Python 3, / returns a float)

  5. Something on str:

    1. A raw string: r'raw_string' (treat everything literally); a unicode string: u'unicode_string'
    2. Print without new line: print mystr, (note the trailing comma)
    3. s.lower(), s.upper() -- returns the lowercase or uppercase version of the string
    4. s.strip() -- returns a string with whitespace removed from the start and end
    5. s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various character classes
    6. s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the given other string
    7. s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found
    8. s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'
    9. s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case s.split() (with no arguments) splits on all whitespace chars.
    10. s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> aaa---bbb---ccc
    11. s.encode('utf-8') -- converts a unicode string to UTF-8 bytes. **The built-in print does not work fully with unicode strings.**
    12. after = unicode(before, 'utf-8') -- converts a UTF-8 encoded string to unicode
    13. The reason why Python doesn't have s.len() but uses len(s), which calls s.__len__(): link
    14. Note: the slice my_str[m:n] doesn't include my_str[n]. It ends at my_str[n-1]. However, my_str[m:] reaches to the end.
    15. str is immutable, i.e., cannot be changed. (You can convert it into a list to assign values to items)
  6. Formatted output: var_to_print = "%formatter1 %formatter2" % (value1, value2), where the values to print are given as a tuple

  7. Wrap an expression in parentheses () to continue it across multiple lines:

    1.   # add parens to make the long-line work:
        text = ("%d little pigs come out or I'll %s and %s and %s" %
          (3, 'huff', 'puff', 'blow down'))
      
  8. Do not wrap the boolean test in parentheses, e.g., if some_boolean_exp: (note the trailing colon)

  9. Difference between del and setting a variable to None: link
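
The string notes above can be tried out in a short sketch. It is written for Python 3 (an assumption; the notes themselves use Python 2 syntax such as the print statement), and all sample values are made up for illustration.

```python
# Sample values are hypothetical; they only illustrate the notes above.
s = '  Hello, World  '

stripped = s.strip()                 # whitespace removed from both ends
lowered = stripped.lower()
index = stripped.find('World')       # index of the first match
missing = stripped.find('xyz')       # find() returns -1 when absent
parts = 'aaa,bbb,ccc'.split(',')
joined = '---'.join(parts)

# Slicing: the end index is exclusive, so [0:5] stops at index 4.
first_five = stripped[0:5]
rest = stripped[7:]                  # an omitted end reaches the end of the string

# Strings are immutable; convert to a list to change single characters.
chars = list(stripped)
chars[0] = 'J'
changed = ''.join(chars)

# %-formatting with a tuple, wrapped in parentheses to span two lines.
text = ("%d little pigs come out or I'll %s and %s and %s" %
        (3, 'huff', 'puff', 'blow down'))

# Integer division with //.
halves = 7 // 2

print(stripped, lowered, index, missing, joined)
print(first_five, rest, changed, halves)
print(text)
```

Because every string method returns a new string, the original s is unchanged at the end of the sketch.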

List, tuple and sorting

  1. List assignment with = does not copy: it only makes the new name point to the same list

  2. Examples of for and in (prefer for/in over writing your own loop; use range to generate loop indices if you need them):

    1. list = ['larry', 'curly', 'moe']
      if 'curly' in list:
          print 'yay'
          
      ## print the numbers from 0 through 99
      for i in range(100):
          print i
      
  3. Example on while

    1. ## Access every 3rd element in a list
        i = 0
        while i < len(a):
          print a[i]
          i = i + 3
      
  4. List methods:

    1. list.append(elem) -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.
    2. list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
    3. list.extend(list2) adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().
    4. list.index(elem) -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear (use "in" to check without a ValueError).
    5. list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present)
    6. list.sort() -- sorts the list in place (does not return it). (The sorted() function shown below is preferred.)
    7. list.reverse() -- reverses the list in place (does not return it)
    8. list.pop(index) -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of append()).
    9. Common error: note that the above methods **do not** return the modified list; they modify the original list in place (append(), sort() and reverse(), for example, return None).
  5. Favor new_list = sorted(list, key=func) over list.sort(key=func), which sorts in place and returns None. sorted() works on any iterable, while sort() is only defined on lists. However, sort() is slightly faster than sorted() if the elements to sort are already in a list. The key function maps each element to a "proxy" value that is used for the comparison.

  6. Python's sort is stable: elements that compare equal keep their original relative order. For example, sorting an alphabetically ordered list by length leaves words of equal length in alphabetical order.

    1.   ## "key" argument specifying str.lower function to use for sorting
        strs = ['aa', 'BB', 'zz', 'CC']
        print sorted(strs, key=str.lower)  ## ['aa', 'BB', 'CC', 'zz']
      
      
      
  7. A tuple (elem1, elem2...) is immutable but can contain mutable elements (like a list). The tuple only holds references to its elements: those references cannot be reassigned, but a mutable object they point to can still be changed through its own methods. See this for explanation.

  8. To create a size-1 tuple, the lone element must be followed by a comma.

      tuple = ('hi',)   ## size-1 tuple
    
  9. Tuple assignment (unpacking): (x, y, z) = (42, 13, "hike")

  10. List comprehension:

  11. Example:

      nums = [1, 2, 3, 4]
    
      squares = [ n * n for n in nums ]   ## [1, 4, 9, 16]
    
  12. Conditional evaluation

      ## Select values <= 2
      nums = [2, 8, 1, 6]
      small = [ n for n in nums if n <= 2 ]  ## [2, 1]
    
      ## Select fruits containing 'a', change to upper case
      fruits = ['apple', 'cherry', 'bannana', 'lemon']
      afruits = [ s.upper() for s in fruits if 'a' in s ] # note the "if"
      ## ['APPLE', 'BANNANA']
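
The list and sorting notes above can be checked with a small sketch. It targets Python 3 (an assumption, since the notes use Python 2 print statements), and the sample lists are hypothetical.

```python
# Assignment with = does not copy: both names refer to the same list.
colors = ['red', 'blue']
alias = colors
alias.append('green')                # colors sees 'green' as well
copy = colors[:]                     # a full slice makes a shallow copy
copy.append('black')                 # colors is not affected

# sorted() returns a new list; list.sort() sorts in place and returns None.
strs = ['ccc', 'aaaa', 'd', 'bb', 'ab']
by_length = sorted(strs, key=len)    # key=len compares the lengths
in_place_result = strs.sort()        # strs itself is now sorted

# Stability: 'bb' came before 'ab' in the input, and both have length 2,
# so 'bb' stays ahead of 'ab' in by_length.

# Tuple unpacking, and a size-1 tuple with its trailing comma.
(x, y, z) = (42, 13, 'hike')
single = ('hi',)

print(colors, copy)
print(by_length, in_place_result, strs)
print(x, y, z, len(single))
```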
    

Dict, Hash, and Files

  1. Looping through the keys of a dict happens in an arbitrary order. Use for key in sorted(dict.keys()) to loop in sorted key order

  2. More dict example:

    1.   ## Get the .keys() list:
        print dict.keys()  ## ['a', 'o', 'g']
      
        ## Likewise, there's a .values() list of values
        print dict.values()  ## ['alpha', 'omega', 'gamma']
      
        ## Common case -- loop over the keys in sorted order,
        ## accessing each key/value
        for key in sorted(dict.keys()):
          print key, dict[key]
      
        ## .items() is the dict expressed as (key, value) tuples
        print dict.items()  ##  [('a', 'alpha'), ('o', 'omega'), ('g', 'gamma')]
      
        ## This loop syntax accesses the whole dict by looping
        ## over the .items() tuple list, accessing one (key, value)
        ## pair on each iteration.
        for k, v in dict.items(): print k, '>', v
        ## a > alpha    o > omega     g > gamma
      
      
      
  3. iterkeys(), itervalues() and iteritems() are slightly faster because they return iterators instead of building a full list

  4. dict formatted output:

      hash = {}
      hash['word'] = 'garfield'
      hash['count'] = 42
      s = 'I want %(count)d copies of %(word)s' % hash  # %d for int, %s for string
      # 'I want 42 copies of garfield'
    
  5. Difference between del and None: this. del can also be used to delete list and dict entries

  6. File open modes: 'rU' converts any end-of-line style to '\n', 'r' reads, 'w' overwrites, 'a' appends

  7. f.readlines() reads the whole file into memory as a list of lines, f.read() reads the whole file into a single string

  8. import codecs for reading a unicode file

  9. sys.exit(0): exits the program immediately (0 signals success, a non-zero value signals an error)
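
A minimal sketch of the dict and file notes above, assuming Python 3 (the notes themselves use Python 2). The file name notes.txt and the temporary directory are hypothetical; they exist only so the example can run without touching real files.

```python
import os
import tempfile

d = {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}

# Loop over the keys in sorted order, reading each value.
lines = ['%s > %s' % (key, d[key]) for key in sorted(d.keys())]

# del removes an entry from a dict (it works on list items too).
del d['o']

# %(name)s formatting pulls values out of a dict by key.
counts = {'word': 'garfield', 'count': 42}
sentence = 'I want %(count)d copies of %(word)s' % counts

# Write a small file, then read it back in two ways.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')
    with open(path, 'w') as f:       # 'w' overwrites any existing content
        f.write('first\n')
    with open(path, 'a') as f:       # 'a' appends to the end
        f.write('second\n')
    with open(path, 'r') as f:
        whole_text = f.read()        # the whole file as one string
    with open(path, 'r') as f:
        line_list = f.readlines()    # the whole file as a list of lines

print(lines)
print(sorted(d.keys()), sentence)
print(repr(whole_text), line_list)
```

The with statement closes each file automatically, which replaces the explicit f.close() call.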

Regex

import re
# use an r'' raw string so backslashes reach the regex engine unchanged
match = re.search(r'pattern', str)  # returns a match object for the first match only, or None
matches = re.findall(r'pattern', str)  # returns a list of all matching strings
  • Python's regex syntax is similar to Perl's

  • a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)

  • . (a period) -- matches any single character except newline '\n'

  • \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.

  • \b -- boundary between word and non-word

  • \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.

  • \t, \n, \r -- tab, newline, return

  • \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)

  • ^ = start, $ = end -- match the start or end of the string

  • \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure if a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.

  • + -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's

  • * -- 0 or more occurrences of the pattern to its left

  • ? -- match 0 or 1 occurrences of the pattern to its left

  • Regex matching is greedy by default; add a ? after a quantifier (e.g. *? or +?) for non-greedy matching (stop as soon as you can)

  • [] -- character set. A leading ^ inverts the set and - indicates a range

  • () -- groups patterns for output extraction; (?: ) groups without capturing

  • Example:

    •   str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
        tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', str)
        print tuples  ## [('alice', 'google.com'), ('bob', 'abc.com')]
        for tuple in tuples:
          print tuple[0]  ## username
          print tuple[1]  ## host
      
  • flags, passed as an extra argument, e.g. re.search(r'pattern', str, flag):

    • re.IGNORECASE -- ignore upper/lowercase differences for matching, so 'a' matches both 'a' and 'A'.
    • re.DOTALL -- allow dot (.) to match newline -- normally it matches anything but newline. This can trip you up -- you think .* matches everything, but by default it does not go past the end of a line. Note that \s (whitespace) includes newlines, so if you want to match a run of whitespace that may include a newline, you can just use \s*
    • re.MULTILINE -- Within a string made of many lines, allow ^ and $ to match the start and end of each line. Normally ^/$ would just match the start and end of the whole string.
  • re.sub(pat, replacement, str) -- substitution

    • str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
        ## re.sub(pat, replacement, str) -- returns new string with all replacements,
        ## \1 is group(1), \2 group(2) in the replacement
        print re.sub(r'([\w\.-]+)@([\w\.-]+)', r'\1@yo-yo-dyne.com', str)
        ## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
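
The regex notes above, including greedy versus non-greedy matching and the three flags, can be seen side by side in this sketch. It assumes Python 3, and the html and multi-line sample strings are made up for illustration.

```python
import re

text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# re.search() returns a match object for the first match, or None.
match = re.search(r'([\w.-]+)@([\w.-]+)', text)
first_user = match.group(1) if match else None
first_host = match.group(2) if match else None

# re.findall() returns every match; with two groups it gives a list of tuples.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)

# Greedy .* runs as far as it can; non-greedy .*? stops as soon as it can.
html = '<b>foo</b> and <i>bar</i>'
greedy = re.findall(r'<.*>', html)
lazy = re.findall(r'<.*?>', html)

# Flags change how the pattern is applied.
ignore_case = re.findall(r'blah', 'Blah BLAH blah', re.IGNORECASE)
line_starts = re.findall(r'^\w+', 'one two\nthree four', re.MULTILINE)
dot_default = re.search(r'a.*b', 'a\nb')            # . stops at the newline
dot_all = re.search(r'a.*b', 'a\nb', re.DOTALL)     # . may cross the newline

print(first_user, first_host, pairs)
print(greedy, lazy)
print(ignore_case, line_starts, dot_default is None, dot_all is not None)
```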
      

Utils

The os and os.path modules include many functions to interact with the file system. The shutil module can copy files.

  • os module docs

  • filenames = os.listdir(dir) -- list of filenames in that directory path (not including . and ..). The filenames are just the names in the directory, not their absolute paths.

  • os.path.join(dir, filename) -- given a filename from the above list, use this to put the dir and filename together to make a path

  • os.path.abspath(path) -- given a path, return an absolute form, e.g. /home/nick/foo/bar.html

  • os.path.dirname(path), os.path.basename(path) -- given dir/foo/bar.html, return the dirname "dir/foo" and basename "bar.html"

  • os.path.exists(path) -- true if it exists

  • os.mkdir(dir_path) -- makes one dir, os.makedirs(dir_path) makes all the needed dirs in this path

  • shutil.copy(source-path, dest-path) -- copy a file (dest path directories should exist)
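
The file system calls listed above fit together roughly as follows. This is a sketch assuming Python 3; the directory layout and file names are hypothetical and live inside a temporary directory that is deleted at the end.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    # os.makedirs() creates every missing directory along the path.
    nested = os.path.join(base_dir, 'dir', 'foo')
    os.makedirs(nested)

    src = os.path.join(nested, 'bar.html')
    with open(src, 'w') as f:
        f.write('<p>hi</p>')

    # os.listdir() gives bare names; join them with the directory for full paths.
    names = os.listdir(nested)
    full_paths = [os.path.join(nested, name) for name in names]

    dir_part = os.path.dirname(src)      # everything before the last component
    base_part = os.path.basename(src)    # the last component only
    absolute = os.path.abspath(src)

    # shutil.copy() needs the destination directory to exist already.
    dest = os.path.join(base_dir, 'copy.html')
    shutil.copy(src, dest)
    copied = os.path.exists(dest)

print(names, base_part, copied)
```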

The commands module is a simple way to run an external command and capture its output.

  • commands module docs
  • (status, output) = commands.getstatusoutput(cmd) -- runs the command, waits for it to exit, and returns its status int and output text as a tuple. The command is run with its standard output and standard error combined into the one output text. The status will be non-zero if the command failed. Since the standard-err of the command is captured, if it fails, we need to print some indication of what happened.
  • output = commands.getoutput(cmd) -- as above, but without the status int.
  • There is a commands.getstatus() but it does something else, so don't use it -- dumbest bit of method naming ever!
  • If you want more control over the running of the sub-process, see the "popen2" module (http://docs.python.org/lib/module-popen2.html)
  • There is also a simple os.system(cmd) which runs the command and dumps its output onto your output and returns its error code. This works if you want to run the command but do not need to capture its output into your python data structures.
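
The commands module exists only in Python 2. As a swapped-in alternative, not the course's own method, the sketch below uses subprocess.run from the Python 3 standard library to get the same two results: a status int and the combined output text. The child command is hypothetical; it simply starts the current Python interpreter to print one line, so the example does not depend on any particular shell.

```python
import subprocess
import sys

# Hypothetical command: run the current interpreter to print one line.
cmd = [sys.executable, '-c', 'print("hello from child")']

# stderr=subprocess.STDOUT merges standard error into standard output,
# similar to how commands.getstatusoutput() combines the two streams.
result = subprocess.run(cmd,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT,
                        universal_newlines=True)
status = result.returncode
output = result.stdout.strip()

# A non-zero status means the command failed, so report it.
if status != 0:
    sys.stderr.write('command failed with status %d\n' % status)
print(output)
```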

Python debugger pdb

Exception handling try/except:

  try:
    ## Either of these two lines could throw an IOError, say
    ## if the file does not exist or the read() encounters a low level error.
    f = open(filename, 'rU')
    text = f.read()
    f.close()
  except IOError:
    ## Control jumps directly to here if any of the above lines throws IOError.
    sys.stderr.write('problem reading:' + filename)
  ## In any case, the code then continues with the line after the try/except
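
The same pattern can be wrapped in a small helper. This is a sketch assuming Python 3, where IOError is an alias of OSError; the function name read_text and the missing path are hypothetical.

```python
import sys


def read_text(filename):
    """Return the text of filename, or None if it cannot be read."""
    try:
        # The with statement closes the file even if read() fails.
        with open(filename, 'r') as f:
            return f.read()
    except IOError:
        # Control jumps here if open() or read() raises IOError.
        sys.stderr.write('problem reading: ' + filename + '\n')
        return None


# Hypothetical path that is not expected to exist.
missing = read_text('/no/such/dir/no_such_file.txt')
print(missing)
```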

The module urllib provides url fetching -- making a url look like a file you can read from. The urlparse module can take apart and put together urls.

  • urllib module docs
  • ufile = urllib.urlopen(url) -- returns a file like object for that url
  • text = ufile.read() -- can read from it, like a file (readlines() etc. also work)
  • info = ufile.info() -- the meta info for that request. info.gettype() is the MIME type, e.g. 'text/html'
  • baseurl = ufile.geturl() -- gets the "base" url for the request, which may be different from the original because of redirects
  • urllib.urlretrieve(url, filename) -- downloads the url data to the given file path
  • urlparse.urljoin(baseurl, url) -- given a url that may or may not be full, and the baseurl of the page it comes from, return a full url. Use geturl() above to provide the base url.
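
How urljoin resolves a partial url against a base can be seen without any network access. The sketch assumes Python 3, where the urlparse module became urllib.parse (and urllib.urlopen became urllib.request.urlopen); the example.com urls are hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical base url, as geturl() might return it for a fetched page.
base_url = 'http://example.com/docs/index.html'

# A relative url is resolved against the directory of the base.
relative = urljoin(base_url, 'images/logo.png')

# A url starting with / is resolved against the host root.
rooted = urljoin(base_url, '/about.html')

# A url that is already full is returned unchanged.
absolute = urljoin(base_url, 'http://other.example.org/x')

# urlparse() takes a url apart into its components.
parts = urlparse(relative)

print(relative)
print(rooted)
print(absolute)
print(parts.scheme, parts.netloc, parts.path)
```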

posted on 2017-03-10 11:20 by 小硕鼠