[Python/Numpy] Operations on list/tuple/dictionary/numpy

Common Data Structures

Lists

Lists are mutable arrays.

Basic Operations

# Two ways to create an empty list
empty_list = []
empty_list = list()

# Create a list that contains different data types; this is allowed in Python
mylist = ["aa", "bb", 1, 2, ["Jack", 12]]

# Access an element by index
print(mylist[0]) # "aa"

# Append to end of list
mylist.append("append")

# Get length of list
len(mylist)

# Concatenate two lists
mylist += ["concatenate", "two"]

Slicing

List slicing is a useful way to access a slice of elements in a list.

nums = [0, 1, 2, 3, 4, 5, 6]

# Slices from start index (inclusive) to end index (exclusive)
print(nums[0:3]) # [0, 1, 2]

# When start index is not specified, it is start of list
# When end index is not specified, it is end of list
print(nums[:3]) # [0, 1, 2]
print(nums[5:]) # [5, 6]

# A bare : takes all elements along a dimension; this is very useful when working with numpy arrays
print(nums[:]) # [0, 1, 2, 3, 4, 5, 6]

# Negative indices count from the end of the list
print(nums[-1]) # 6
print(nums[-3:]) # [4, 5, 6]
print(nums[3:-2]) # [3, 4]

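Note: assigning nums to another variable does not copy the list. Both names refer to the same list object, so modifying the list through the new variable also changes nums. For example, after new_nums = nums, any change made through new_nums is visible in nums as well. Assigning the slice nums[:] to a new variable instead creates a new (shallow) copy, so adding, removing, or replacing elements in the new list does not affect nums.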
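
The difference between aliasing and copying can be checked directly. The following is a minimal sketch; the names `alias`, `copied`, `nested`, and `nested_copy` are illustrative and not from the original. Because `nums[:]` is a shallow copy, nested mutable elements are still shared, which the last part demonstrates.

```python
nums = [0, 1, 2, 3]

# Plain assignment binds a second name to the same list object
alias = nums
alias[0] = 100
print(nums)           # [100, 1, 2, 3]
print(alias is nums)  # True

# Slicing with [:] builds a new list (a shallow copy)
copied = nums[:]
copied[1] = 200
print(nums)            # [100, 1, 2, 3]
print(copied is nums)  # False

# Shallow copy: nested lists are still shared between the two lists
nested = [[1, 2], [3, 4]]
nested_copy = nested[:]
nested_copy[0].append(99)
print(nested)  # [[1, 2, 99], [3, 4]]
```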

Tuples

Tuples are immutable arrays. Unlike lists, tuples do not support item re-assignment.

# Two ways to create an empty tuple
empty_tuple = ()
empty_tuple = tuple()

# Use parentheses for tuples, square brackets for lists
names = ("Zach", "Jay")

# Index
print(names[0])

# Get length
len(names)

# Create a tuple with a single item, the comma is important
single = (10,)
print(single) # (10,)
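
To see what "no item re-assignment" means in practice, here is a short sketch. The variable names `updated` and `not_a_tuple` are made up for illustration.

```python
names = ("Zach", "Jay")

# Item assignment on a tuple raises TypeError
try:
    names[0] = "Richard"
except TypeError as err:
    error_message = str(err)
    print(error_message)

# To "change" a tuple, build a new one instead
updated = ("Richard",) + names[1:]
print(updated)  # ('Richard', 'Jay')

# Without the trailing comma, parentheses only group an expression
not_a_tuple = (10)
print(type(not_a_tuple).__name__)  # int
```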

Dictionary

Dictionaries are hash maps.

# Two ways to create an empty dictionary
phonebook = {}
phonebook = dict()

# Create dictionary with one item
phonebook = {"Zach": "12-37"}
# Add another item
phonebook["Jay"] = "34-23"

# Check if a key is in the dictionary
print("Zach" in phonebook) # True
print("Kevin" in phonebook) # False

# Get corresponding value for a key
print(phonebook["Jay"]) # 34-23

# Delete an item
del phonebook["Zach"]
print(phonebook) # {'Jay': '34-23'}
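
Looking up a key that is not present behaves differently depending on how the lookup is written. The sketch below continues the phonebook example using only built-in dict methods; the default value "unknown" is an arbitrary choice for illustration.

```python
phonebook = {"Jay": "34-23"}

# Indexing a missing key raises KeyError
try:
    phonebook["Kevin"]
except KeyError:
    print("Kevin is not in the phonebook")

# dict.get returns None (or a supplied default) instead of raising
print(phonebook.get("Kevin"))             # None
print(phonebook.get("Kevin", "unknown"))  # unknown

# dict.pop removes a key and returns its value
number = phonebook.pop("Jay")
print(number)     # 34-23
print(phonebook)  # {}
```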

Loops

# Basic for loop
for i in range(5):
    print(i)
    
# To iterate over a list
names = ["Zach", "Jay", "Richard"]
for name in names:
    print(name)

# To iterate over indices and values in a list
# Way 1
for i in range(len(names)):
    print(i, names[i])
# Way 2
for i, name in enumerate(names):
    print(i, name)
    
#########################################

# To iterate over a dictionary
phonebook = {"Zach": "12-37", "Jay": "34-23"}

# Iterate over keys
for name in phonebook:
    print(name)

# Iterate over values
for number in phonebook.values():
    print(number)
    
# Iterate over keys and values
for name, number in phonebook.items():
    print(name, number)

Numpy

Optimized library for matrix and vector computation.

Numpy is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of optimized, high-level mathematical functions to operate on these arrays.

Vectors can be represented as 1-D arrays of shape (N,) or 2-D arrays of shape (N,1) or (1,N). But it's important to note that the shapes (N,), (N,1), and (1,N) are not the same and may result in different behavior (we'll see some examples below involving matrix multiplication and broadcasting).

Matrices are generally represented as 2-D arrays of shape (M,N).

# Import numpy
import numpy as np

# Create numpy arrays from lists
x = np.array([1, 2, 3])
y = np.array([[3, 4, 5]])
z = np.array([[6, 7], [8, 9]])

# Get shapes
print(y.shape) # (1, 3)

# reshape
a = np.arange(10) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a.reshape((5, 2))
print(b)
'''
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
'''
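
The remark that shapes (N,), (N,1), and (1,N) behave differently can be illustrated with a small sketch. The names `v`, `col`, and `row` are chosen here for illustration only.

```python
import numpy as np

v = np.array([1, 2, 3])   # shape (3,)
col = v.reshape(3, 1)     # shape (3, 1), a column vector
row = v.reshape(1, 3)     # shape (1, 3), a row vector

print(v.shape, col.shape, row.shape)  # (3,) (3, 1) (1, 3)

# Transposing a 1-D array leaves it unchanged; a 2-D row becomes a column
print(v.T.shape)    # (3,)
print(row.T.shape)  # (3, 1)

# The same "+" yields different results depending on the shapes
print((v + v).shape)      # (3,)   element-wise sum
print((col + row).shape)  # (3, 3) broadcast to a full grid

# reshape accepts -1 for one dimension to be inferred from the size
print(np.arange(10).reshape(-1, 2).shape)  # (5, 2)
```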

Array Operations

There are many Numpy operations that can be used to reduce a numpy array along an axis.

x = np.array([[1,2], [3,4], [5,6]])

# np.max operation
print(np.max(x, axis = 1)) # [2 4 6]
print(np.max(x, axis = 1).shape) # (3,)

print(np.max(x, axis = 1, keepdims = True))
'''
[[2]
 [4]
 [6]]
'''
print(np.max(x, axis = 1, keepdims = True).shape) # (3, 1)

#######################################################
# some matrix operations

# take an element-wise product (Hadamard product)
# A and B must have the same shape (or shapes that can be broadcast together)
A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 3], [3, 3]])
print(A * B) 
'''
[[ 3  6]
 [ 9 12]]
'''

# do matrix multiplication with np.matmul or @
# for 2-D arrays, the number of columns of A must equal the number of rows of B
print(np.matmul(A, B))
print(A @ B)
'''
[[ 9  9]
 [21 21]]
'''

# dot product or a matrix vector product with np.dot
u = np.array([1, 2, 3])
v = np.array([1, 10, 100])

print(np.dot(u, v)) # 321
print(u.dot(v)) # 321

# Taking the dot product of a vector and a 2-D matrix is actually doing matrix multiplication
W = np.array([[1, 2], [3, 4], [5, 6]])
print(np.dot(v, W)) # [531 642]
# print(np.dot(W, v)) # ValueError: shapes (3,2) and (3,) not aligned: 2 (dim 1) != 3 (dim 0)
# We can fix the above issue by transposing W.
print(np.dot(W.T, v)) # [531 642]
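
How a 1-D vector is treated in a matrix product depends on which side it appears on. The sketch below reuses `v` and `W` from above and adds an illustrative `v_row`, which is not in the original, to contrast the 1-D case with an explicit 2-D row vector.

```python
import numpy as np

v = np.array([1, 10, 100])              # shape (3,)
W = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# 1-D @ 2-D: v is treated as a row vector and the result is 1-D
print((v @ W).shape)    # (2,)
print(v @ W)            # [531 642]

# 2-D @ 1-D: v is treated as a column vector and the result is 1-D
print((W.T @ v).shape)  # (2,)

# Explicit 2-D row vector: the result keeps two dimensions
v_row = v.reshape(1, 3)
print((v_row @ W).shape)  # (1, 2)

# Mismatched inner dimensions raise ValueError
try:
    W @ v
except ValueError:
    print("W @ v fails: (3, 2) and (3,) do not align")
```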

Slicing

Slicing and indexing numpy arrays extend Python's list slicing to N dimensions.

x = np.random.random((3, 4))

# Selects all of x
print(x[:])
'''
[[0.51640626 0.3041091  0.27188644 0.87484083]
 [0.79114758 0.99308623 0.98326875 0.04455941]
 [0.39529208 0.54231156 0.15966311 0.63360179]]
'''

# Selects the 0th and 2nd rows
print(x[np.array([0, 2]), :])
'''
[[0.51640626 0.3041091  0.27188644 0.87484083]
 [0.39529208 0.54231156 0.15966311 0.63360179]]
'''

# Selects row 1 as a 1-D vector, then its elements at indices 1 and 2
print(x[1, 1:3])
# [0.99308623 0.98326875]

# Boolean indexing
print(x[x > 0.5])
# [0.51640626 0.87484083 0.79114758 0.99308623 0.98326875 0.54231156 0.63360179]

# 3-D array of shape (3, 4, 1)
print(x[:, :, np.newaxis])
'''
[[[0.51640626] 
  [0.3041091]
  [0.27188644]
  [0.87484083]]
  
 [[0.79114758]
  [0.99308623]
  [0.98326875]
  [0.04455941]]
  
 [[0.39529208]
  [0.54231156]
  [0.15966311]
  [0.63360179]]]
'''
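
Since the random values above change on every run, a deterministic sketch can make the resulting shapes easier to follow. It uses `np.arange(12).reshape(3, 4)` in place of the random array, which is a substitution made here for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# An integer index drops that axis; a slice keeps it
print(x[1, 1:3])          # [5 6]
print(x[1, 1:3].shape)    # (2,)
print(x[1:2, 1:3].shape)  # (1, 2)

# Indexing with an integer array selects whole rows
print(x[np.array([0, 2]), :].shape)  # (2, 4)

# A boolean mask with the same shape as x returns a 1-D array of matches
print(x[x > 8])  # [ 9 10 11]

# np.newaxis inserts an axis of length 1
print(x[:, :, np.newaxis].shape)  # (3, 4, 1)
print(x[np.newaxis, :, :].shape)  # (1, 3, 4)
```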

Broadcasting

The term broadcasting describes how Numpy treats arrays with different shapes during arithmetic operations.

General Broadcasting Rules

When operating on two arrays, Numpy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when:

  • they are equal, or
  • one of them is 1 (in which case the elements along that axis are repeated to match the size of the other array)

x = np.random.random((3, 4))
y = np.random.random((3, 1))
z = np.random.random((1, 4))

# In this example, y and z are broadcast to match the shape of x.
# y is broadcast along dim 1.
s = x + y
# z is broadcast along dim 0.
p = x * z

# Another example
a = np.zeros((3, 3))
b = np.array([[1, 2, 3]])
print(a+b)
'''
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
'''

# A more complex example
a = np.random.random((3, 4))
b = np.random.random((3, 1))
c = np.random.random((3, ))

result1 = b + b.T
print(b.shape) # (3, 1)
print(b.T.shape) # (1, 3)
print(result1.shape) # (3, 3)

# result2 = a + c # ValueError: operands could not be broadcast together with shapes (3,4) (3,)

result3 = b + c
print(b)
print(c)
print(result3)
'''
[[0.14781386]
 [0.89302824]
 [0.28916391]]

[0.96525397 0.86351595 0.29259715]

[[1.11306782 1.01132981 0.44041101]
 [1.8582822  1.75654419 1.18562539]
 [1.25441788 1.15267986 0.58176106]]
'''
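
The two compatibility rules stated above can be written out as a small function. This is only a sketch: `broadcast_shape` is a hypothetical helper written for this post, not a Numpy function, and it treats a missing leading dimension as size 1. The last lines show one possible way to make the failing (3, 4) + (3,) case work by giving the 1-D array an explicit column shape.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if incompatible (hypothetical helper)."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((3, 4), (3, 1)))  # (3, 4)
print(broadcast_shape((3, 1), (3,)))    # (3, 3)
print(broadcast_shape((3, 4), (3,)))    # None

# One way to make (3, 4) + (3,) work: turn c into a column of shape (3, 1)
a = np.zeros((3, 4))
c = np.array([1.0, 2.0, 3.0])
print((a + c[:, np.newaxis]).shape)  # (3, 4)
```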

Efficient Numpy Code

When working with numpy, avoid explicit for-loops over indices/axes at all costs. For-loops will dramatically slow down your code.

We can time code using the %%timeit magic. Let's compare using an explicit for-loop vs. using numpy operations.

%%timeit
x = np.random.rand(1000, 1000)
for i in range(100, 1000):
    for j in range(x.shape[1]):
        x[i, j] += 5

459 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
x = np.random.rand(1000, 1000)
x[np.arange(100, 1000), :] += 5

12.2 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
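
The %%timeit magic only works in IPython/Jupyter. Outside a notebook, a similar comparison can be sketched with the standard-library timeit module. The function names `add_with_loops` and `add_vectorized` are made up for this example, the array is smaller (300 x 300) so it runs quickly, and the measured times will vary by machine.

```python
import timeit
import numpy as np

def add_with_loops(x):
    # Explicit Python loops over rows 100..n-1 and every column
    for i in range(100, x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += 5
    return x

def add_vectorized(x):
    # A single slice assignment performs the same update
    x[100:, :] += 5
    return x

base = np.random.rand(300, 300)

# Both versions should produce the same array
same = np.allclose(add_with_loops(base.copy()), add_vectorized(base.copy()))
print(same)  # True

loop_time = timeit.timeit(lambda: add_with_loops(base.copy()), number=3)
vec_time = timeit.timeit(lambda: add_vectorized(base.copy()), number=3)
print(f"loops: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```

Each timed call works on a fresh copy of `base`, so the copy cost is included in both measurements and the two runs start from the same data.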

posted @ 2024-03-17 21:21  hzyuan