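Introduction to the IPython Environment and the Basic Toolbox

  • Part 1: Introduction to Python
  • Part 2: IPython environment setup and introduction to the notebook
  • Part 3: Introduction to the basic toolbox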
  • Part 4: Working with Notebooks

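Part 1: Introduction to Python

Definition of Python

Python is an object-oriented, interpreted programming language. Its syntax is concise and clear, and it comes with a rich and powerful class library. It can easily glue together modules written in other languages (especially C/C++).

Python itself grew out of many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk, Unix shell, and other scripting languages. Python is the "concentrated essence": Guido van Rossum studied many languages, took the features he liked from them, and fused them into one.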

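Python features

  • High-level, simple, easy to learn, easy to read, easy to maintain

    • Python embodies a philosophy of simplicity: its syntax is simple and it is easy to pick up.
    • This pseudocode-like quality is one of Python's greatest strengths.
    • Python lets you concentrate on solving the problem rather than on figuring out the language itself.
    • Unlike some other languages, Python gives you few opportunities to write obscure code, so other people can quickly understand what you wrote, and you can quickly understand theirs.
  • Object-oriented, high-level

    • You do not need to deal with low-level details, whereas C/C++ requires you to work with pointers.
    • Compared with other languages, Python implements object-oriented programming in a powerful yet simple way.
  • Interpreted, robust

    • A Python program does not have to be compiled to a native binary ahead of time; the interpreter runs it directly from the source code.
    • With compiled languages (C/C++), the path is: source file -> compiler/linker -> executable.
    • When a Python program crashes because of an error, the interpreter prints a "stack trace" containing all the available information, including the cause of the crash and which piece of code failed (file name, line number, chain of calls, and so on).
  • Free and open source, portable

    • Runs on Unix-derived systems, the Win32 family, handheld platforms (PDAs/phones), game consoles (PSP), and more.
  • Extensible, embeddable

    • If a critical piece of code needs to run faster, or you want to keep an algorithm private, you can write that part in C or C++ and use it from your Python program.
    • You can embed Python in a C/C++ program to offer scripting capabilities to that program's users.
  • Rich libraries

    • The Python standard library is very large, covering regular expressions, documentation generation, unit testing, threading, databases, web browsers, and more.
    • In addition, there are other high-quality libraries such as wxPython, Twisted, and imaging libraries.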

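The Python ecosystem

  • Multiple interpreters
  • Web applications: Dropbox, Douban, Instagram
  • Scientific computing and big-data analysis: NumPy, SciPy, scikit-learn, Pandas, NLTK, Spark, Scrapy
  • Cloud computing: OpenStack, GAE, SAE, AWS
  • Automated testing: unittest, Nose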

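How to learn Python

  • Press F1 to open the help documentation and search for whatever you do not understand
  • Use help(variable_name) to see the methods and attributes of that variable
  • Read source code written by other people and other projects
  • Write small projects yourself (a calculator, a bookkeeping program, ...)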

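Choosing an interpreter

  • The official stock Python
    • Not recommended: you have to install every package yourself, one at a time
  • Pythonxy
    • An integrated Python environment for scientific computing, bundling many commonly used scientific packages such as numpy and matplotlib
  • Enthought Python Distribution (EPD)
    • EPD is a commercial Python distribution that includes a large number of scientific packages; it is free for teaching use
  • Anaconda (recommended)
    • Developed by Continuum Analytics (the company later renamed Anaconda, Inc.)
    • Supports both Python 2.x and Python 3.x
    • Supports Windows, Linux, and Mac OS
    • Bundles nearly all the commonly used packages for data analysis and scientific computing, such as NumPy, SciPy, Matplotlib, and IPython
    • Makes it easy to install third-party packages
    • Anaconda installs all of its files into its own directory (home/anaconda). Its installation therefore does not affect any other version of Python on your computer, and it needs no special privileges (such as admin or root) to install
    • conda list shows the packages installed with Anaconda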

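IDE

  • The built-in IDLE is not very pleasant to use
  • IPython
    • IPython is an interactive Python shell that is more powerful than the default Python shell: it supports tab completion of variable names, automatic indentation, and bash shell commands, and it has many useful built-in features and functions
    • Notebook functionality
  • Eclipse+pydev
  • PyCharm
    • A very capable Python integrated development environment
    • The Community edition is free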

Getting to know modules

  • Python code is organized in .py files
  • A typical module file begins with

    • #!/usr/bin/python
    • #encoding: utf-8 | #-*-coding:utf8-*- | #coding:utf8
  • Ways to execute a Python module

    • python .py
    • import
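
The layout described above can be sketched as follows. The file name greet.py and the function greet are hypothetical, chosen only for illustration. The __name__ check is a common convention that lets the same file be run as a script or loaded with import without executing the script part:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""greet.py: a hypothetical example module."""


def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"


if __name__ == "__main__":
    # Runs only when the file is executed as a script (python greet.py),
    # not when it is loaded with "import greet"
    print(greet("world"))
```

After "import greet" in another file or session, the function is available as greet.greet.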

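Installing and using pip and virtualenv (conda is covered in a later session)

pip

  • pip is the most popular tool for managing third-party (non-standard-library) Python packages.
  • Since Python 3.4, pip has been included with the standard Python installation

virtualenv

  • Often used together with pip; it lets us install Python packages into a designated path (folder)
  • Multiple Python runtime environments can coexist
  • Each environment maintains its own package versions
  • Different versions of the same module can be installed side by side
    • virtualenv Dev
    • activate: activate the environment
    • deactivate: leave the environment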
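
Whether the running interpreter sits inside such an isolated environment can be checked from Python itself. This is a minimal sketch; the helper name in_virtual_env is hypothetical, and the check relies on the sys.prefix and sys.base_prefix attributes of the standard sys module:

```python
import sys


def in_virtual_env():
    """Return True when the interpreter runs inside a virtual environment."""
    # Environments created by the venv module (and recent virtualenv releases)
    # make sys.prefix differ from sys.base_prefix; older virtualenv releases
    # set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_env())
```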
In [1]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Part 2: IPython environment setup

$ ipython
Python 2.7.11 |Anaconda 2.5.0 (x86_64)| (default, Dec  6 2015, 18:57:58) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: print("hello world")
hello world

Accessing documentation with ?

In [1]: help(len)
Help on built-in function len in module builtins:

len(...)
    len(object) -> integer

    Return the number of items of a sequence or mapping.
In [2]: len?
Type:        builtin_function_or_method
String form: <built-in function len>
Namespace:   Python builtin
Docstring:
len(object) -> integer

Return the number of items of a sequence or mapping.
In [3]: L = [1, 2, 3]
In [4]: L.insert?
Type:        builtin_function_or_method
String form: <built-in method insert of list object at 0x1024b8ea8>
Docstring:   L.insert(index, object) -- insert object before index
In [5]: L?
Type:        list
String form: [1, 2, 3]
Length:      3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
  • Accessing documentation for your own functions
In [6]: def square(a):
  ....:     """Return the square of a."""
  ....:     return a ** 2
  ....:
In [7]: square?
Type:        function
String form: <function square at 0x103713cb0>
Definition:  square(a)
Docstring:   Return the square of a.
In [4]:
len??
  • Accessing source code with ??
In [9]: len??
Type:        builtin_function_or_method
String form: <built-in function len>
Namespace:   Python builtin
Docstring:
len(object) -> integer

Return the number of items of a sequence or mapping.
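
Outside IPython, much of what ? and ?? report can be approximated with the standard inspect module. The sketch below is only an approximation of the IPython output, not how IPython implements it. Note that source code is not available for built-ins such as len, which is why len?? above shows nothing more than len? does:

```python
import inspect


def square(a):
    """Return the square of a."""
    return a ** 2


# Roughly what "square?" reports: type, signature, and docstring
print("Type:      ", type(square).__name__)
print("Definition:", "square" + str(inspect.signature(square)))
print("Docstring: ", inspect.getdoc(square))

# "square??" additionally shows the source; inspect.getsource does the same
# when the source is available (it is not for built-ins such as len)
try:
    print(inspect.getsource(square))
except (OSError, TypeError):
    print("source not available")
```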

Using the Tab key

Tab-completion of object contents

Like with the help function discussed before, Python has a built-in dir function that returns a list of these, but the tab-completion interface is much easier to use in practice. To see a list of all available attributes of an object, you can type the name of the object followed by a period (".") character and the Tab key:

In [10]: L.<TAB>
L.append   L.copy     L.extend   L.insert   L.remove   L.sort     
L.clear    L.count    L.index    L.pop      L.reverse

To narrow-down the list, you can type the first character or several characters of the name, and the Tab key will find the matching attributes and methods:

In [10]: L.c<TAB>
L.clear  L.copy   L.count  

In [10]: L.co<TAB>
L.copy   L.count

If there is only a single option, pressing the Tab key will complete the line for you. For example, the following will instantly be replaced with L.count:

In [10]: L.cou<TAB>
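
Names that begin with an underscore are left out of this list by default; to see them, type the underscore explicitly:
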
In [10]: L._<TAB>
L.__add__           L.__gt__            L.__reduce__
L.__class__         L.__hash__          L.__reduce_ex__

For brevity, we've only shown the first couple lines of the output. Most of these are Python's special double-underscore methods (often nicknamed "dunder" methods).
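
The built-in dir function mentioned earlier gives the same information in plain Python. A minimal sketch that mimics the completions shown in this section (the variable names are illustrative):

```python
L = [1, 2, 3]

# Names that do not start with an underscore: what L.<TAB> offers by default
public_names = [name for name in dir(L) if not name.startswith("_")]
print(public_names)

# Narrowing by prefix, like L.co<TAB>
print([name for name in public_names if name.startswith("co")])

# Names that start with an underscore, like L._<TAB>
private_names = [name for name in dir(L) if name.startswith("_")]
print(private_names[:3])
```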

Tab completion when importing

Tab completion is also useful when importing objects from packages. Here we'll use it to find all possible imports in the itertools package that start with co:

In [10]: from itertools import co<TAB>
combinations                   compress
combinations_with_replacement  count

Similarly, you can use tab-completion to see which imports are available on your system (this will change depending on which third-party scripts and modules are visible to your Python session):

In [10]: import <TAB>
Display all 399 possibilities? (y or n)
Crypto              dis                 py_compile
Cython              distutils           pyclbr
...                 ...                 ...
difflib             pwd                 zmq

In [10]: import h<TAB>
hashlib             hmac                http         
heapq               html                husl

(Note that for brevity, I did not print here all 399 importable packages and modules on my system.)

Wildcard matching

Tab completion is useful if you know the first few characters of the object or attribute you're looking for, but is little help if you'd like to match characters at the middle or end of the word. For this use-case, IPython provides a means of wildcard matching for names using the * character.

In [10]: *Warning?
BytesWarning                  RuntimeWarning
DeprecationWarning            SyntaxWarning
FutureWarning                 UnicodeWarning
ImportWarning                 UserWarning
PendingDeprecationWarning     Warning
ResourceWarning

Notice that the * character matches any string, including the empty string.

Similarly, suppose we are looking for a string method that contains the word find somewhere in its name. We can search for it this way:

In [10]: str.*find*?
str.find
str.rfind

I find this type of flexible wildcard search can be very useful for finding a particular command when getting to know a new package or reacquainting myself with a familiar one.
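
A similar search can be done in plain Python with the standard fnmatch module, which uses the same * wildcard. This is a sketch of the idea rather than of IPython's own implementation:

```python
import builtins
import fnmatch

# In the spirit of "*Warning?": built-in names that end in "Warning"
warning_names = [name for name in dir(builtins)
                 if fnmatch.fnmatchcase(name, "*Warning")]
print(warning_names)

# In the spirit of "str.*find*?": string methods with "find" in their name
find_methods = [name for name in dir(str)
                if fnmatch.fnmatchcase(name, "*find*")]
print(find_methods)
```

The pattern "*Warning" also matches the name Warning itself, because * matches the empty string.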

Jupyter Notebook

The Jupyter notebook is a browser-based graphical interface to the IPython shell, and builds on it a rich set of dynamic display capabilities. As well as executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic visualizations, mathematical equations, JavaScript widgets, and much more. Furthermore, these documents can be saved in a way that lets other people open them and execute the code on their own systems.

Though the IPython notebook is viewed and edited through your web browser window, it must connect to a running Python process in order to execute code. This process (known as a "kernel") can be started by running the following command in your system shell:

$ jupyter notebook

This command will launch a local web server that will be visible to your browser. It immediately spits out a log showing what it is doing; that log will look something like this:

$ jupyter notebook
[NotebookApp] Serving notebooks from local directory: /Users/jakevdp/PythonDataScienceHandbook
[NotebookApp] 0 active kernels 
[NotebookApp] The IPython Notebook is running at: http://localhost:8888/
[NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Upon issuing the command, your default browser should automatically open and navigate to the listed local URL; the exact address will depend on your system. If the browser does not open automatically, you can open a window and manually open this address (http://localhost:8888/ in this example).

Part 3: IPython features and the basic toolbox

1. Writing complete code

IPython tells you about the objects you're working with, showing you helpful usage info about functions, methods available, datatypes, and more.

In [2]:
things = ["First Name", "Last Name"]
def snakify(txt):
    "return string in snake_case"
    return txt.replace(" ","_").lower()
In [3]:
print ([snakify(thing) for thing in things] )
snakify( "My Name is Greg" )
['first_name', 'last_name']
Out[3]:
'my_name_is_greg'

Tip: on a Mac, comment and uncomment lines with Cmd + /.

2. Using interactive shell commands

Use many shell commands directly from IPython, including shell scripts.

A full intro to using the shell/terminal/command-line is well beyond the scope of this chapter, but for the uninitiated we will offer a quick introduction here. The shell is a way to interact textually with your computer. Ever since the mid 1980s, when Microsoft and Apple introduced the first versions of their now ubiquitous graphical operating systems, most computer users have interacted with their operating system through familiar clicking of menus and drag-and-drop movements. But operating systems existed long before these graphical user interfaces, and were primarily controlled through sequences of text input: at the prompt, the user would type a command, and the computer would do what the user told it to. Those early prompt systems are the precursors of the shells and terminals that most active data scientists still use today.

Someone unfamiliar with the shell might ask why you would bother with this, when many results can be accomplished by simply clicking on icons and menus. A shell user might reply with another question: why hunt icons and click menus when you can accomplish things much more easily by typing? While it might sound like a typical tech preference impasse, when moving beyond basic tasks it quickly becomes clear that the shell offers much more control of advanced tasks, though admittedly the learning curve can intimidate the average computer user.

As an example, here is a sample of a Linux/OSX shell session where a user explores, creates, and modifies directories and files on their system (osx:~ $ is the prompt, and everything after the $ sign is the typed command; text that is preceded by a # is meant just as description, rather than something you would actually type in):

osx:~ $ echo "hello world"             # echo is like Python's print function
hello world

osx:~ $ pwd                            # pwd = print working directory
/home/jake                             # this is the "path" that we're sitting in

osx:~ $ ls                             # ls = list working directory contents
notebooks  projects 

osx:~ $ cd projects/                   # cd = change directory

osx:projects $ pwd
/home/jake/projects

osx:projects $ ls
datasci_book   mpld3   myproject.txt

osx:projects $ mkdir myproject          # mkdir = make new directory

osx:projects $ cd myproject/

osx:myproject $ mv ../myproject.txt ./  # mv = move file. Here we're moving the
                                        # file myproject.txt from one directory
                                        # up (../) to the current directory (./)
osx:myproject $ ls
myproject.txt

Notice that all of this is just a compact way to do familiar operations (navigating a directory structure, creating a directory, moving a file, etc.) by typing commands rather than clicking icons and menus. Note that with just a few commands (pwd, ls, cd, mkdir, and mv) you can do many of the most common file operations. It's when you go beyond these basics that the shell approach becomes really powerful.
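
For comparison, the same file operations can be carried out from Python with the standard pathlib, os, shutil, and tempfile modules. The sketch below works in a temporary directory so that nothing on the real system is touched; the directory and file names mirror the shell session and are otherwise arbitrary:

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is deleted at the end of the block
with tempfile.TemporaryDirectory() as tmp:
    projects = Path(tmp) / "projects"
    projects.mkdir()                                  # mkdir projects
    (projects / "myproject.txt").write_text("notes")  # create a sample file

    print(Path.cwd())                                 # pwd
    print(sorted(os.listdir(projects)))               # ls projects

    myproject = projects / "myproject"
    myproject.mkdir()                                 # mkdir myproject
    # mv myproject.txt myproject/
    shutil.move(str(projects / "myproject.txt"), str(myproject))

    moved_listing = sorted(os.listdir(myproject))     # ls myproject
    print(moved_listing)
```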

In [4]:
# try these out too
! pwd
!ls -lt
/Users/lee/Documents/HZ-python-notebook-base/HZ-python-notebook-base
total 31296
drwxr-xr-x   7 lee  staff      238  8 11 18:43 课件html
-rwxr-xr-x   1 lee  staff   662209  8 11 18:42 第3讲-练习题.ipynb
-rw-r--r--   1 lee  staff  1850788  8 11 18:22 第4讲-其他一些模块介绍.ipynb
-rw-r--r--   1 lee  staff   491826  8 11 17:20 stocks.html
-rw-r--r--   1 lee  staff   306711  8 11 17:18 linked_panning.html
-rw-r--r--   1 lee  staff   277157  8 11 17:17 color_scatter.html
-rw-r--r--   1 lee  staff     7613  8 11 17:15 lines.html
-rwxr-xr-x   1 lee  staff  8524136  8 11 10:55 第3讲-基础工具库.ipynb
-rw-r--r--   1 lee  staff    11527  8 11 08:38 graph.pdf
-rw-r--r--   1 lee  staff    51316  8 11 08:38 graph.png
-rw-r--r--   1 lee  staff    50936  8 11 08:38 figure-2.jpg
-rw-r--r--@  1 lee  staff    36996  8 11 08:31 figure-1.jpg
-rwxr-xr-x   1 lee  staff   366763  8 10 14:56 第2讲-python环境和基础语法.ipynb
-rw-r--r--   1 lee  staff       15  8 10 14:06 me.txt
-rw-r--r--   1 lee  staff      371  8 10 11:11 mprun_demo.pyc
-rw-r--r--   1 lee  staff      181  8 10 11:10 mprun_demo.py
-rw-r--r--@  1 lee  staff      157  8 10 10:06 myscript.py
-rwxr-xr-x   1 lee  staff    26033  8  9 23:25 第1讲-数据分析方法概述.ipynb
-rw-r--r--@  1 lee  staff  3316934  7  7 16:54 Bokeh.pdf
drwxr-xr-x@ 10 lee  staff      340  6 28 07:14 data
drwxr-xr-x@ 38 lee  staff     1292  6 28 07:14 figures

Capture the output of a shell command and store the results in a python variable.

In [5]:
foo = !echo hello, world
foo
Out[5]:
['hello, world']
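
In a plain Python script, where the ! syntax is not available, the standard subprocess module serves the same purpose. A minimal sketch; the child process here is the Python interpreter itself, an assumption made so that the example does not depend on a particular shell:

```python
import subprocess
import sys

# Run a child process and capture what it prints to standard output
result = subprocess.run(
    [sys.executable, "-c", "print('hello, world')"],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # decode the output to str
    check=True,               # raise an error if the child process fails
)

# Split into lines, which matches the list that IPython returns
foo = result.stdout.splitlines()
print(foo)
```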

Use curl to download data.

In [6]:
# download salary_data.csv and save its contents to
# a local file in the same directory as this notebook
!curl http://www.justinmrao.com/salary_data.csv > ./salary_data.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  381k  100  381k    0     0  48250      0  0:00:08  0:00:08 --:--:--  103k
In [7]:
# peek at the first row
! head -n 1 salary_data.csv | tr  -s "," "\n"
team
year
player
contract_years_remaining
contract_thru
position
full_name
salary_year
salary_total
year_counter
obs
mean_salary

In [8]:
# peek at the second row
! head -n 2 salary_data.csv | tail -n 1 | tr -s "," "\n"
"Boston Celtics"
"2002-03"
"Bremer
 J.R."
1
"2002-03"
"G"
" Bremer"
349458
349458
1
2
456568.5
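
In the output above, the player name is split across two lines because tr replaces every comma, including the one inside the quoted field. Python's standard csv module respects the quoting. The sketch below uses a two-line sample reconstructed from the output shown above, which is an assumption; the real file is not reproduced here:

```python
import csv
import io

# Header and first record reconstructed from the output shown above
# (an assumption: this is not read from the real file)
sample = (
    "team,year,player,contract_years_remaining,contract_thru,position,"
    "full_name,salary_year,salary_total,year_counter,obs,mean_salary\n"
    '"Boston Celtics","2002-03","Bremer, J.R.",1,"2002-03","G",'
    '" Bremer",349458,349458,1,2,456568.5\n'
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)
first_row = next(reader)

# The quoted player name stays in one field instead of being split at its comma
for name, value in zip(header, first_row):
    print(name, "=", repr(value))
```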

3. Input and output history

In both the shell and the notebook, IPython exposes several ways to obtain the output of previous commands, as well as string versions of the commands themselves.

In [1]:
import math
In [2]:
math.sin(2)
Out[2]:
0.9092974268256817
In [3]:
math.cos(2)
Out[3]:
-0.4161468365471424
In [4]:
print (In)
['', 'import math', 'math.sin(2)', 'math.cos(2)', 'print (In)']
In [5]:
Out
Out[5]:
{2: 0.9092974268256817, 3: -0.4161468365471424}
In [6]:
print (In[1])
import math
In [7]:
print (Out[2])
0.9092974268256817
In [8]:
Out[2] ** 2 + Out[3] ** 2
Out[8]:
1.0
In [9]:
print (_)
1.0
In [10]:
print (__)
-0.4161468365471424
In [11]:
print (___)
0.9092974268256817
In [12]:
14 in Out
Out[12]:
False

Showing past commands with %history

In [13]:
%history -n 1-4
   1: import math
   2: math.sin(2)
   3: math.cos(2)
   4: print (In)

4. Errors and debugging

Code development and data analysis always require a bit of trial and error, and IPython contains tools to streamline this process.

Controlling exceptions: %xmode

Most of the time when a Python script fails, it will raise an Exception. When the interpreter hits one of these exceptions, information about the cause of the error can be found in the traceback, which can be accessed from within Python. With the %xmode magic function, IPython allows you to control the amount of information printed when the exception is raised.

In [1]:
def func1(a, b):
    return a / b

def func2(x):
    a = x
    b = x - 1
    return func1(a, b)
In [2]:
func2(1)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-2-b2e110f6fc8f> in <module>()
----> 1 func2(1)

<ipython-input-1-d849e34d61fb> in func2(x)
      5     a = x
      6     b = x - 1
----> 7     return func1(a, b)

<ipython-input-1-d849e34d61fb> in func1(a, b)
      1 def func1(a, b):
----> 2     return a / b
      3 
      4 def func2(x):
      5     a = x

ZeroDivisionError: division by zero

%xmode takes a single argument, the mode, and there are three possibilities: Plain, Context, and Verbose. The default is Context, and gives output like that just shown before. Plain is more compact and gives less information:

In [3]:
%xmode Plain
Exception reporting mode: Plain
In [4]:
func2(1)
Traceback (most recent call last):

  File "<ipython-input-4-b2e110f6fc8f>", line 1, in <module>
    func2(1)

  File "<ipython-input-1-d849e34d61fb>", line 7, in func2
    return func1(a, b)

  File "<ipython-input-1-d849e34d61fb>", line 2, in func1
    return a / b

ZeroDivisionError: division by zero
In [5]:
%xmode Verbose
Exception reporting mode: Verbose
In [6]:
func2(1)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-6-b2e110f6fc8f> in <module>()
----> 1 func2(1)
        global func2 = <function func2 at 0x104847730>

<ipython-input-1-d849e34d61fb> in func2(x=1)
      5     a = x
      6     b = x - 1
----> 7     return func1(a, b)
        global func1 = <function func1 at 0x1048476a8>
        a = 1
        b = 0

<ipython-input-1-d849e34d61fb> in func1(a=1, b=0)
      1 def func1(a, b):
----> 2     return a / b
        a = 1
        b = 0
      3 
      4 def func2(x):
      5     a = x

ZeroDivisionError: division by zero

This extra information can help you narrow in on why the exception is being raised. So why not use Verbose mode all the time? As code gets complicated, this kind of traceback can get extremely long. Depending on the context, the brevity of the default Context mode, or of Plain mode, is sometimes easier to work with.
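
The traceback text is also available from within plain Python through the standard traceback module. A minimal sketch using the same two functions; the result resembles the Plain mode output, although the file names in it will differ:

```python
import traceback


def func1(a, b):
    return a / b


def func2(x):
    a = x
    b = x - 1
    return func1(a, b)


try:
    func2(1)
except ZeroDivisionError:
    # format_exc() returns the traceback of the exception being handled as text
    report = traceback.format_exc()

print(report)
```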

Debugging

The standard Python tool for interactive debugging is pdb, the Python debugger. This debugger lets the user step through the code line by line in order to see what might be causing a more difficult error. The IPython-enhanced version of this is ipdb, the IPython debugger.

In IPython, perhaps the most convenient interface to debugging is the %debug magic command. If you call it after hitting an exception, it will automatically open an interactive debugging prompt at the point of the exception. The ipdb prompt lets you explore the current state of the stack, explore the available variables, and even run Python commands!

In [7]:
%debug
> <ipython-input-1-d849e34d61fb>(2)func1()
      1 def func1(a, b):
----> 2     return a / b
      3 
      4 def func2(x):
      5     a = x

ipdb> print (a)
1
ipdb> print (b)
0
ipdb> quit
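
What the debugger prompt shows for a and b can also be read programmatically from the traceback object. This is a non-interactive sketch of the same inspection, assuming the func1 and func2 definitions from above:

```python
import sys


def func1(a, b):
    return a / b


def func2(x):
    a = x
    b = x - 1
    return func1(a, b)


try:
    func2(1)
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # Follow the traceback to its last entry: the frame where the error occurred
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)
    # pdb.post_mortem(tb) would open an interactive prompt on this traceback

print(failing_function, failing_locals)
```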

Step-by-step debugging

In [8]:
%debug
> <ipython-input-1-d849e34d61fb>(2)func1()
      1 def func1(a, b):
----> 2     return a / b
      3 
      4 def func2(x):
      5     a = x

ipdb> up
> <ipython-input-1-d849e34d61fb>(7)func2()
      3 
      4 def func2(x):
      5     a = x
      6     b = x - 1
----> 7     return func1(a, b)

ipdb> print (x)
1
ipdb> up
> <ipython-input-6-b2e110f6fc8f>(1)<module>()
----> 1 func2(1)

ipdb> quit

Launching the debugger automatically with %pdb

In [9]:
%xmode Plain
%pdb on
func2(1)
Exception reporting mode: Plain
Automatic pdb calling has been turned ON
Traceback (most recent call last):

  File "<ipython-input-9-569a67d2d312>", line 3, in <module>
    func2(1)

  File "<ipython-input-1-d849e34d61fb>", line 7, in func2
    return func1(a, b)

  File "<ipython-input-1-d849e34d61fb>", line 2, in func1
    return a / b

ZeroDivisionError: division by zero
> <ipython-input-1-d849e34d61fb>(2)func1()
      1 def func1(a, b):
----> 2     return a / b
      3 
      4 def func2(x):
      5     a = x

ipdb> print (b)
0
ipdb> quit

Finally, if you have a script that you'd like to run from the beginning in interactive mode, you can run it with the command %run -d, and use the next command to step through the lines of code interactively.

Partial list of debugging commands

There are many more available commands for interactive debugging than we've listed here; the following table contains a description of some of the more common and useful ones:

Command        Description
list           Show the current location in the file
h(elp)         Show a list of commands, or find help on a specific command
q(uit)         Quit the debugger and the program
c(ontinue)     Quit the debugger, continue in the program
n(ext)         Go to the next step of the program
<enter>        Repeat the previous command
p(rint)        Print variables
s(tep)         Step into a subroutine
r(eturn)       Return out of a subroutine
For more information, use the help command in the debugger, or take a look at ipdb's online documentation.

5. Profiling and timing

  • %time: Time the execution of a single statement
  • %timeit: Time repeated execution of a single statement for more accuracy
  • %prun: Run code with the profiler
  • %lprun: Run code with the line-by-line profiler
  • %memit: Measure the memory use of a single statement
  • %mprun: Run code with the line-by-line memory profiler

Install the line_profiler and memory_profiler modules

Timing Code Snippets: %timeit and %time

In [10]:
%timeit sum(range(100))
The slowest run took 6.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.89 µs per loop

Note that because this operation is so fast, %timeit automatically does a large number of repetitions. For slower commands, %timeit will automatically adjust and perform fewer repetitions:

In [11]:
%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
1 loop, best of 3: 501 ms per loop
In [12]:
import random
L = [random.random() for i in range(100000)]
%timeit L.sort()
The slowest run took 19.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.81 ms per loop
In [13]:
import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()
sorting an unsorted list:
CPU times: user 54.4 ms, sys: 1.63 ms, total: 56 ms
Wall time: 56.3 ms
In [15]:
print ("sorting an already sorted list:")
%time L.sort()
sorting an already sorted list:
CPU times: user 5.49 ms, sys: 207 µs, total: 5.69 ms
Wall time: 5.66 ms

Notice how much faster the presorted list is to sort, but notice also how much longer the timing takes with %time versus %timeit, even for the presorted list! This is a result of the fact that %timeit does some clever things under the hood to prevent system calls from interfering with the timing. For example, it prevents cleanup of unused Python objects (known as garbage collection) which might otherwise affect the timing. For this reason, %timeit results are usually noticeably faster than %time results.
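
The standard timeit module offers similar repeated timing outside IPython, and by default it also switches garbage collection off while timing. A minimal sketch; the loop and trial counts are arbitrary choices:

```python
import timeit

# Run the statement 10000 times per trial, for 3 trials
trials = timeit.repeat("sum(range(100))", number=10000, repeat=3)

# Report the fastest trial, converted to time per loop
best_per_loop = min(trials) / 10000
print("best of 3: %.2f microseconds per loop" % (best_per_loop * 1e6))
```

Taking the minimum rather than the mean is the usual choice here, since slower trials mostly reflect interference from other processes.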

For %time as with %timeit, using the double-percent-sign cell magic syntax allows timing of multiline scripts:

In [16]:
%%time
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
CPU times: user 605 ms, sys: 6 ms, total: 611 ms
Wall time: 631 ms

Profiling full scripts: %prun

Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function %prun.

In [1]:
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

Now we can call %prun with a function call to see the profiled results:

In [17]:
%prun sum_of_lists(1000000)
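
Outside IPython, the built-in profiler can be driven directly through the standard cProfile and pstats modules. This is a sketch of one way to do it, with a smaller N than above so that it runs quickly; the exact rows in the report depend on the Python version:

```python
import cProfile
import io
import pstats


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


profiler = cProfile.Profile()
profiler.enable()
sum_of_lists(100000)
profiler.disable()

# Collect the report in a string: sorted by cumulative time, top 5 entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
profile_report = stream.getvalue()
print(profile_report)
```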
 

Line-by-line profiling with %lprun

Install line_profiler

$ pip install line_profiler
In [17]:
%load_ext line_profiler
In [20]:
%lprun -f sum_of_lists sum_of_lists(5000)
Timer unit: 1e-06 s

Total time: 0.015923 s
File: <ipython-input-16-fa2be176cc3e>
Function: sum_of_lists at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def sum_of_lists(N):
     2         1            2      2.0      0.0      total = 0
     3         6            9      1.5      0.1      for i in range(5):
     4     25005        15710      0.6     98.7          L = [j ^ (j >> i) for j in range(N)]
     5         5          201     40.2      1.3          total += sum(L)
     6         1            1      1.0      0.0      return total

The information at the top gives us the key to reading the results: the time is reported in microseconds and we can see where the program is spending the most time. At this point, we may be able to use this information to modify aspects of the script and make it perform better for our desired use case.

For more information on %lprun, as well as its available options, use the IPython help functionality (i.e., type %lprun? at the IPython prompt).

Profiling memory use: %memit and %mprun

Another aspect of profiling is the amount of memory an operation uses. This can be evaluated with another IPython extension, the memory_profiler. As with the line_profiler, we start by pip-installing the extension:

$ pip install memory_profiler
In [18]:
%load_ext memory_profiler
  • %memit (analogous to %timeit)
  • %mprun (analogous to %lprun)
In [4]:
%memit sum_of_lists(10000)
peak memory: 37.47 MiB, increment: 1.25 MiB
In [5]:
%%file mprun_demo.py
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L # remove reference to L
    return total
Overwriting mprun_demo.py
In [6]:
from mprun_demo import sum_of_lists
%mprun -f sum_of_lists sum_of_lists(10000)
('',)
Filename: mprun_demo.py

Line #    Mem usage    Increment   Line Contents
================================================
     1     37.7 MiB      0.0 MiB   def sum_of_lists(N):
     2     37.7 MiB      0.0 MiB       total = 0
     3     37.7 MiB      0.1 MiB       for i in range(5):
     4     37.7 MiB      0.0 MiB           L = [j ^ (j >> i) for j in range(N)]
     5     37.7 MiB      0.0 MiB           total += sum(L)
     6     37.7 MiB      0.0 MiB           del L # remove reference to L
     7     37.7 MiB      0.0 MiB       return total

Here the Increment column tells us how much each line changes the total memory in use. The Mem usage column includes the background memory used by the Python interpreter itself.
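
When memory_profiler is not installed, the standard tracemalloc module gives a rough alternative. Note the different measure: tracemalloc counts only memory allocated by Python objects while tracing is on, so its numbers are not directly comparable with the process-level figures above. A minimal sketch:

```python
import tracemalloc


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L  # remove reference to L
    return total


tracemalloc.start()
sum_of_lists(10000)
# Memory still allocated now, and the largest amount allocated at any one time
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak traced memory: %.2f MiB" % (peak_bytes / 2 ** 20))
```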

6. More resources

Web Resources

  • The IPython website: The IPython website links to documentation, examples, tutorials, and a variety of other resources.
  • The nbviewer website: This site shows static renderings of any IPython notebook available on the internet. The front page features some example notebooks that you can browse to see what other folks are using IPython for!
  • A gallery of interesting Jupyter Notebooks: This ever-growing list of notebooks, powered by nbviewer, shows the depth and breadth of numerical analysis you can do with IPython. It includes everything from short examples and tutorials to full-blown courses and books composed in the notebook format!
  • Video Tutorials: searching the Internet, you will find many video-recorded tutorials on IPython. I'd especially recommend seeking tutorials from the PyCon, SciPy, and PyData conferences by Fernando Perez and Brian Granger, two of the primary creators and maintainers of IPython and Jupyter.

Books

A wealth of media libraries and toolkits

IPython gives you inline plots and allows you to use virtually any browser supported media.

In [1]:
## Inline plotting
%matplotlib inline
import matplotlib.pyplot as plt
#import pylab as pl
X = range(10)
y = range(11,21)
plt.scatter(X,y, c='r')
plt.show()

Write and run javascript among other languages including Ruby, Octave, Julia, R, etc.

In [ ]:
%%javascript
function say_hi ( ) {
    alert("Hello, World");
};

say_hi();
console.log("Welcome!");

Render images from the web via a url or images on your local computer via the filename.

In [2]:
from IPython.display import Image
Image("http://ipython.org/_static/IPy_header.png")
Out[2]:


NumPy

One of the fundamental packages for scientific computing with Python. Many other popular packages and projects use NumPy under the hood, so it's pretty helpful to be familiar with its core concepts.

A collection of tools, utilities, and data structures for working with data.

  • A powerful N-dimensional array object
  • Element by element broadcasting operations (as opposed to iterating)
  • Tools for integrating C/C++ and Fortran code
  • Linear algebra and other mathematical and random number facilities

N-dimensional arrays

The ndarray is a bit like a python list only more efficient.

In [3]:
"""example taken from Scipy and NumPy by Eli Bressert (p. 6)"""
import numpy as np

def list_times(alist, scalar):
    for i, val in enumerate(alist):
        alist[i] = val * scalar
    return alist
In [4]:
arr = np.arange(1e7)
l   = arr.tolist()
In [5]:
%timeit arr * 1.1
10 loops, best of 3: 35.2 ms per loop
In [6]:
%timeit list_times(l, 1.1)
1 loop, best of 3: 1.17 s per loop
In [7]:
print ("len(l)", len(l))
print ("len(arr)", len(arr))
len(l) 10000000
len(arr) 10000000

The ndarray is capable of more advanced slicing.

In [8]:
l = [ [1,2], [3,4] ]
arr   = np.array(l)

print ("Value in Row One, Column One: %d" % l[0][0])
print ("Value in Row One, Column One: %d" % arr[0,0])
print ("Value in All Rows, Column Two: %s" % str(arr[::,1]))
print ("Value in Row Two, Both Columns: %s" % str(arr[1::,]))
Value in Row One, Column One: 1
Value in Row One, Column One: 1
Value in All Rows, Column Two: [2 4]
Value in Row Two, Both Columns: [[3 4]]

You can access elements stored in an ndarray with subscript notation: arr[rows, columns].

To select the entire 10th row, write arr[9, :]. The index 9 means row 10 (indexing starts at 0), and : means all columns.
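
A small array makes the rows-then-columns convention easy to check by eye. A minimal sketch with a 4 x 3 array; the shape and variable names are arbitrary choices for illustration:

```python
import numpy as np

arr = np.arange(12).reshape((4, 3))  # 4 rows x 3 columns, values 0..11

second_row = arr[1, :]       # row index 1, all columns
second_column = arr[:, 1]    # all rows, column index 1
top_left = arr[:2, :2]       # first two rows, first two columns

print(second_row)
print(second_column)
print(top_left)
```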

In [9]:
zero_to_1000 = np.arange(0,1000)               # create an array of integers from 0 to 999
zero_to_1000 = zero_to_1000.reshape( (500,2) ) # reshape into 2 dimensions (500 rows x 2 columns)
zero_to_1000[:100, 1]                          # select the 2nd column from the first 100 rows
Out[9]:
array([  1,   3,   5,   7,   9,  11,  13,  15,  17,  19,  21,  23,  25,
        27,  29,  31,  33,  35,  37,  39,  41,  43,  45,  47,  49,  51,
        53,  55,  57,  59,  61,  63,  65,  67,  69,  71,  73,  75,  77,
        79,  81,  83,  85,  87,  89,  91,  93,  95,  97,  99, 101, 103,
       105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
       131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155,
       157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181,
       183, 185, 187, 189, 191, 193, 195, 197, 199])
In [10]:
numbers = np.random.uniform(size=100)
numbers = numbers.reshape((50,2))
mask = (numbers >= 0.7) & (numbers <= 0.9)
mask[:10]
Out[10]:
array([[False, False],
       [False,  True],
       [False, False],
       [ True, False],
       [False, False],
       [ True,  True],
       [False,  True],
       [ True, False],
       [False, False],
       [False, False]], dtype=bool)
In [11]:
numbers[mask]
Out[11]:
array([ 0.71448078,  0.78642688,  0.76502446,  0.75724199,  0.71141945,
        0.83279868,  0.78487885,  0.85064108,  0.78789528,  0.77373465,
        0.76903859,  0.86070058,  0.84117924,  0.78322484,  0.89412315,
        0.89325357,  0.70454971,  0.72114956,  0.77694227,  0.72865709,
        0.70459049,  0.75678771,  0.7463222 ,  0.86741022,  0.72185287,
        0.87948935])


SciPy

SciPy contains functions and utilities for common scientific tasks. Most of SciPy's features rely on NumPy and many other scientific projects rely on SciPy.

We aren't using SciPy directly in this tutorial, and since it's rather large in scope, we'll just give you some of the project highlights.

  • Optimization and Minimization
  • Interpolation
  • Integration
  • Statistics
    • 80 continuous distributions
    • 60 statistical functions
  • Spatial and Clustering Analysis
  • Signal and Image Processing
  • Sparse Matrices


Matplotlib

In [12]:
%matplotlib inline
import pylab as pl
import matplotlib.pyplot as plt
import numpy as np
In [13]:
# Make some data to plot
x = np.linspace(start=0, stop=2*np.pi, num=10)
y1 = np.sin(x)
y2 = np.cos(x)

Lines

In [14]:
fig,ax=plt.subplots(1,figsize=(12,6))
ax.plot(x,y1,label='sin')
ax.plot(x,y2,label='cos')
ax.legend(fontsize=16)
ax.title.set_text("This is a graph\n")
ax.title.set_fontsize(28)
ax.axes.set_xlabel("X Axis", fontdict={"size":22})
ax.axes.set_ylabel("Y Axis", fontdict={"size":22})
fig.show()
/Users/lee/anaconda/lib/python3.5/site-packages/matplotlib/figure.py:397: UserWarning: matplotlib is currently using a non-GUI backend, so cannot show the figure
  "matplotlib is currently using a non-GUI backend, "

Histograms

In [15]:
d = np.random.randn(100) * 100.
m = d.mean()
s = d.std()
m_y = 3

fig = plt.figure(figsize=(12,6))
ax = plt.subplot(111)
ax.hist(d, 15)
ax.plot(m, m_y, "yo")
ax.plot([m - s, m + s], [m_y] * 2, "y--");

Multiple plots.

In [16]:
x = np.arange(0, 1000)
y = np.random.rand(1000)  # 1000 random numbers
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(10,10))
ax1.plot(x, y)
ax2.hist(y)
ax3.scatter(x, y)
ax4.boxplot(y)
plt.show()

ggplot2 matplotlib theme

Matplotlib takes some getting used to, but don't fret. There are some solid examples and recipes available online that are super helpful.

http://matplotlib.org/gallery.html


pandas

pandas is the utility belt for data analysts using python. The package centers around the pandas DataFrame, a two-dimensional data structure with indexable rows and columns. It has effectively taken the best parts of Base R, R packages like plyr and reshape2 and consolidated them into a single library.

Library highlights include:

  • DataFrame
  • Indexable rows and columns (e.g. selecting columns and rows via column/row names)
  • Boolean indexing
  • Graceful treatment of missing or NA values
  • Broadcasting (i.e. calling a function on an entire row or column without iteration)
  • Outstanding split, apply, combine, groupby functionality
  • Simple yet powerful string manipulation (again, applied over a large number of strings)
  • Built-in summary statistics (mean, std, corr, etc.)
  • Merging, joining, data set concatenation
In [17]:
import pandas as pd
df = pd.DataFrame({"A": range(10), "B": np.random.random(size=10)})
df.B.corr(df.A)
Out[17]:
-0.26656769337660036


scikit-learn(sklearn)

scikit-learn is composed of a wide number of machine learning algorithms, utilities, and transformations all following a common and consistent API.

Library highlights include:

  • Biclustering
  • Clustering
  • Covariance estimation
  • Cross decomposition
  • Dataset examples
  • Decomposition
  • Ensemble methods
  • Gaussian Process for Machine Learning
  • Generalized Linear Models
  • Manifold learning
  • Gaussian Mixture Models
  • Nearest Neighbors
  • Semi Supervised Classification
  • Support Vector Machines
  • Decision Trees


statsmodels

statsmodels is a statistics and econometrics package with useful tools for parameter estimation and statistical testing.

Features include:

  • linear regression models
  • GLMs
  • time series modeling
  • integration with pandas

Recommended reading

Learning IPython for Interactive Computing and Data Visualization

Some other tutorial help/resources :

Basic features

First things first: running code, getting help

In the notebook, to run a cell of code, hit Shift-Enter. This executes the cell and puts the cursor in the next cell below, or makes a new one if you are at the end. Alternately, you can use:

  • Alt-Enter to force the creation of a new cell unconditionally (useful when inserting new content in the middle of an existing notebook).
  • Control-Enter executes the cell and keeps the cursor in the same cell, useful for quick experimentation of snippets that you don't need to keep permanently.
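  • print is one of the most basic and frequently used operations in Python; it writes its arguments to the output as text.

  • print also accepts several expressions separated by commas; it then places a single space between the output items. The cell below uses the Python 2 statement form, which does not run on Python 3.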

In [ ]:
print 'hello,world'
print "hello","world"
print 'hello %s world' %(",")
print "My name is %s. I'm %d years old" %("Albert",18)
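  • In Python 3, print is a function, so it must be called with parentheses; the values to print are its arguments.

%s is replaced by a string

%d is replaced by an integer

%f is replaced by a floating-point value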
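The three placeholders can be tried side by side. This is a small sketch; the variable names and the value of height_m are hypothetical and only there to give %f something to format.

```python
name = "Albert"
age = 18
height_m = 1.8  # hypothetical value, only here to demonstrate %f

line_s = "name: %s" % name               # %s inserts str(value)
line_d = "age: %d" % age                 # %d inserts an integer
line_f = "height: %.2f m" % height_m     # %.2f keeps two decimal places
line_all = "%s is %d years old and %.1f m tall" % (name, age, height_m)

print(line_s)
print(line_d)
print(line_f)
print(line_all)
```

When a format string has more than one placeholder, the values must be passed as a tuple, as in the last line.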

In [18]:
print ('hello,world')
print ("hello","world")
print ('hello %s world' %(","))
print ("My name is %s. I'm %d years old" %("Albert",18))
hello,world
hello world
hello , world
My name is Albert. I'm 18 years old
In [19]:
with open("me.txt", "w") as logFile:   # open the file for writing; it is closed automatically
    print ("我很聪明!",end="",file=logFile)
In [20]:
# coding: UTF-8
myName = input("please input your name:")
print ("your name is :", myName)
please input your name:李lee
your name is : 李lee

Getting help

In [23]:
import numpy as np
??np.mean ##help(np.mean)

Accessing the underlying operating system

Note: the commands below work on Linux or Macs, but may behave differently on Windows, as the underlying OS is different. IPython's ability to access the OS is still the same, it's just the syntax that varies per OS.

In [21]:
!pwd
/Users/lee/Documents/HZ-python-notebook-base/HZ-python-notebook-base
In [22]:
!ls
Bokeh.pdf
color_scatter.html
data
figure-1.jpg
figure-2.jpg
figures
graph.pdf
graph.png
lines.html
linked_panning.html
me.txt
mprun_demo.py
mprun_demo.pyc
myscript.py
salary_data.csv
stocks.html
第1讲-数据分析方法概述.ipynb
第2讲-python环境和基础语法.ipynb
第3讲-练习题.ipynb
第3讲-基础工具库.ipynb
第4讲-其他一些模块介绍.ipynb
课件html
In [23]:
files = !ls
print("My current directory's files:")
for a_file in files:
    print(a_file)
My current directory's files:
Bokeh.pdf
color_scatter.html
data
figure-1.jpg
figure-2.jpg
figures
graph.pdf
graph.png
lines.html
linked_panning.html
me.txt
mprun_demo.py
mprun_demo.pyc
myscript.py
salary_data.csv
stocks.html
第1讲-数据分析方法概述.ipynb
第2讲-python环境和基础语法.ipynb
第3讲-练习题.ipynb
第3讲-基础工具库.ipynb
第4讲-其他一些模块介绍.ipynb
课件html
In [24]:
!echo $files
!echo {files[0].upper()}
[Bokeh.pdf, color_scatter.html, data, figure-1.jpg, figure-2.jpg, figures, graph.pdf, graph.png, lines.html, linked_panning.html, me.txt, mprun_demo.py, mprun_demo.pyc, myscript.py, salary_data.csv, stocks.html, 第1讲-数据分析方法概述.ipynb, 第2讲-python环境和基础语法.ipynb, 第3讲-练习题.ipynb, 第3讲-基础工具库.ipynb, 第4讲-其他一些模块介绍.ipynb, 课件html]
BOKEH.PDF

magic functions

The IPython 'magic' functions are a set of commands, invoked by prepending one or two % signs to their name, that live in a namespace separate from your normal Python variables and provide a more command-like interface. They take flags with -- and arguments without quotes, parentheses or commas. The motivation behind this system is two-fold:

  • To provide an orthogonal namespace for controlling IPython itself and exposing other system-oriented functionality.

  • To expose a calling mode that requires minimal verbosity and typing while working interactively. Thus the inspiration taken from the classic Unix shell style for commands.

In [25]:
%magic
In [26]:
%lsmagic
Out[26]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.
In [27]:
%timeit range(10)
1000000 loops, best of 3: 353 ns per loop
In [28]:
%%timeit
range(10)
range(100)
1000000 loops, best of 3: 732 ns per loop
In [29]:
%timeit L = [n ** 2 for n in range(1000)]
1000 loops, best of 3: 416 µs per loop
In [30]:
%timeit?
In [31]:
%run myscript.py
1 squared is 1
2 squared is 4
3 squared is 9

Part 4: Working with Notebooks

Directory Layout

  • IPython Notebooks are just files (.ipynb) on your file system
  • The Notebook server is aware of Notebooks in a single directory, which we call the Notebook directory
  • If you cd to a Notebook directory and type:

      ipython notebook

    you will see the Notebooks in that directory in the dashboard. Newer installations start the server with jupyter notebook instead.

Notebook Files

  • Are just that - files (.ipynb) on your file system
  • Contain JSON data
In [33]:
import nbformat
with open('第1讲-数据分析方法概述.ipynb') as f:
    nb = nbformat.read(f,4)

Saving and Exporting

IPython Notebooks can also be exported to .py files (see "File:Download As" menu item). You can tell the Notebook server to always save these .py files alongside the .ipynb files by starting the Notebook as:

ipython notebook --script

You can import Notebooks from the main Dashboard or simply by copying a Notebook into the Notebook directory.

Overview of the UI

  • Dashboard
  • Notebook area and cells
  • Menu
  • Toolbar

Cell types

  • Code
  • Markdown
  • Raw text
  • Heading

Keyboard Shortcuts

  • Shift-Enter to run a cell
  • Ctrl-Enter to run a cell in place
  • Show all keyboard shortcuts using Ctrl-m h

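Basic elements of Python: numbers, strings, and variables

Variables, names, and objects

Variables

All data in Python (booleans, integers, floats, strings, and even large data structures, functions, and programs) exists in the form of objects. An object is like a box with data inside. Objects have different types, such as boolean and integer, and the type determines which operations can be performed on the object. Python is strongly typed: the type of an existing object cannot be changed.

In Python 2, an int is a machine-sized integer: on a 32-bit build it holds values from -2147483648 to 2147483647, and on a 64-bit build typically from -9223372036854775808 to 9223372036854775807. A Python 2 long has unlimited precision. In Python 3, long was removed and int itself can hold integers of any size.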

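  • Python variables need no type declaration
  • Definition: variable_name = value
  • Variable names are case-sensitive
  • Naming rule: a name must start with a letter or an underscore
  • Several variables can be defined at once
    • a,b='python',10
  • Built-in types
    • Integer and long integer (long exists only in Python 2; Python 3 has a single int)
    • Floating point
    • Complex
    • Boolean
      • True,False
    • An empty list and None both evaluate to False in a boolean context
    • String
    • None
  • Arithmetic operators
    • Remainder: %
    • Floor division: //
  • Comparison operators
  • Logical operators
    • and
    • or
    • not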

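Variable naming rules

  • Start with a letter or an underscore
  • The remaining characters may be digits, letters, or underscores
  • Names are case-sensitive
  • Choose names that convey their meaning
  • Keywords cannot be used as names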

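Python keywords

and,class,def,elif,finally,if,lambda,print,while;

as,continue,else,for,import,not,raise,with;

assert,except,from,in,or,return,yield;

break,del,exec,global,is,pass,try,None;

print and exec are keywords only in Python 2; in Python 3 they are ordinary functions, while True, False and nonlocal are keywords as well.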
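Because the keyword list differs between Python versions, it is safer to ask the interpreter than to memorize it. A short sketch using the standard keyword module:

```python
import keyword

# Every reserved word of the interpreter that is running this cell.
print(keyword.kwlist)

# iskeyword() answers the question for a single name.
is_lambda_reserved = keyword.iskeyword("lambda")
is_print_reserved = keyword.iskeyword("print")   # False on Python 3
print(is_lambda_reserved, is_print_reserved)
```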

In [34]:
print (float(True))
print (float(False))
1.0
0.0
In [35]:
print (float(98))
print (float(99))
98.0
99.0
In [36]:
print (float('98.6'))
print (float('-1.5'))
print (float('1.0e4'))
98.6
-1.5
10000.0

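Variable assignment

  • Python is dynamically typed: variable types need not be declared in advance, and a variable's type and initial value are set when it is assigned

Basic numeric types

int (signed integer) 12 012 0x12 (012 is a Python 2 octal literal; Python 3 writes it as 0o12)

long (long integer, Python 2 only) 123456L -8383828l

bool (boolean) True False

float (floating point) 3.2 3.5e-2 -1.5E3

complex (complex number) 3.0+2j 4.8-3.5J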
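The literals above can be checked directly. This sketch assumes Python 3, so the octal literal is written 0o12 and there is no separate long type; the variable names are illustrative.

```python
octal_int = 0o12        # Python 3 spelling of octal; equals 10
hex_int = 0x12          # hexadecimal; equals 18
big_int = 2 ** 100      # int has no fixed width in Python 3
flag = True             # bool is a subclass of int
small_float = 3.5e-2    # scientific notation; equals 0.035
z = 3.0 + 2j            # complex number with real and imaginary parts

print(octal_int, hex_int)
print(big_int)
print(type(flag), int(flag))
print(small_float)
print(z.real, z.imag)
```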

In [37]:
counter = 0
miles = 1000.0
name = 'Bob'
counter = counter + 1
kilometers = 1.609 * miles
In [38]:
print (2+2*3+5%2)
print (8/5+ 8//5)
print (8.0/5)
print (8.0//5)
print (-2 * 3 + 21 // 4 ** 2)
9
2.6
1.6
1.0
-5
In [39]:
print (3.14 <= 3.14159)
print ('A'<= 'B')                 # compares character code values
print (3.14 != 3.14)              # the old <> operator was removed in Python 3
print ('a'<= 'A')                 # compares character code values
True
True
False
False
In [40]:
print ((2 < 4) and (2 == 4))
print ((2 > 4) or (2 < 4))
print (not (2<4))
print (3 < 4 < 5)
False
True
False
True
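  • Code blocks are delimited by indentation, which avoids typing lots of braces and keywords
    • Python uses indentation to mark a block; there are no braces as in other languages
    • Indentation is checked strictly: it must be consistent within a block, otherwise the code will not run
    • Indent with 4 spaces or with tabs, but do not mix the two
    • An IDE keeps the indentation consistent automatically
  • Use # for comments; a comment is a piece of text in the program that Python ignores
  • Use \ to continue a statement on the next line; the recommended (not enforced) maximum line length is 80 characters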
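The layout rules above (indentation, comments, and line continuation) can be seen together in one short sketch. The prices list is made-up sample data.

```python
# A comment starts with # and runs to the end of the line.
prices = [3.5, 12.0, 7.25]   # hypothetical sample data

total = 0.0
for price in prices:
    if price > 5:            # inside the loop: indented one level (4 spaces)
        total += price       # inside the if: indented two levels
# Back at column 0, so the loop body has ended.

# A backslash continues a statement on the next line ...
long_sum = 1 + 2 + 3 + \
           4 + 5
# ... and so do parentheses, which are usually the tidier choice.
long_sum_parens = (1 + 2 + 3 +
                   4 + 5)

print(total, long_sum, long_sum_parens)
```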


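Basic data structures

I. Sequences (lists, tuples, strings) and sets

Every element of a Python sequence has its own index. Python 2 has six built-in sequence types, of which lists and tuples are the most common; the others are strings, Unicode strings, buffer objects, and xrange objects. The focus below is on lists, tuples, and strings; sets, which are unordered and therefore not sequences, are covered last.

1. Lists

Lists are mutable, which is what most distinguishes them from strings and tuples. In one sentence: a list can be modified, while strings and tuples cannot.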

In [41]:
sample_list = ['a',1,('a','b')]
type(sample_list)
Out[41]:
list
In [42]:
print (sample_list)
['a', 1, ('a', 'b')]
In [43]:
sample_list.append(0) # append a value at the end
del sample_list[1] # delete the element at index 1
sample_list[0:0] = ['sample value'] # insert a value at the front
len(sample_list) # compute the length
Out[43]:
4
In [44]:
sample_list2=sample_list
In [45]:
sample_list[0]='test'
In [46]:
sample_list2
Out[46]:
['test', 'a', ('a', 'b'), 0]
In [47]:
sample_list3=sample_list[:]
In [48]:
sample_list3
Out[48]:
['test', 'a', ('a', 'b'), 0]

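Common list methods

  • Append an element: list.append(var)
  • Insert an element: list.insert(index,var)
  • Remove and return the last element (or the one at a given index): list.pop(), list.pop(index)
  • Remove the first occurrence of an element: list.remove(var)
  • Number of times an element occurs in the list: list.count(var)
  • Position of an element, raising an exception if it is absent: list.index(var)
  • Append all elements of another list: list.extend(other_list)
  • Sort in ascending order, in place: list.sort()
  • Reverse the current order, in place: list.reverse()
  • Build a list of strings by splitting a string (a string method, not a list method): str.split()
  • Join a list of strings into one new string, using the calling string as the separator (also a string method): separator.join(list)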
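The methods listed above can be walked through on a small list. This is an illustrative sketch; the fruits list is sample data.

```python
fruits = ["pear", "apple"]

fruits.append("fig")                 # add one element at the end
fruits.insert(0, "kiwi")             # insert before index 0
fruits.extend(["apple", "plum"])     # append every element of another list
print(fruits)

apple_count = fruits.count("apple")  # number of occurrences
apple_pos = fruits.index("apple")    # position of the first occurrence
print(apple_count, apple_pos)

last = fruits.pop()                  # remove and return the last element
fruits.remove("apple")               # remove the first "apple" only
fruits.sort()                        # sort in place, ascending
fruits.reverse()                     # reverse the current order in place
print(last, fruits)

words = "a,b,c".split(",")           # str.split returns a list
joined = "-".join(words)             # str.join takes a list of strings
print(words, joined)
```

Note that sort() and reverse() change the list in place and return None, so their result should not be assigned to a variable.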

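Copying a list

  • L1 = L
    • L1 is an alias of L; in C terms both hold the same pointer address, so operating on L1 operates on L. Function arguments are passed the same way
  • L1 = L[:]
    • L1 is a clone of L, that is, a separate (shallow) copy.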
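The difference between an alias and a copy becomes visible once the list contains another list. The sketch below also uses copy.deepcopy from the standard library, which the text above does not mention; it is added here only for contrast.

```python
import copy

original = [1, [2, 3]]
alias = original                  # same object under a second name
shallow = original[:]             # new outer list, but the inner list is shared
deep = copy.deepcopy(original)    # fully independent copy

original[0] = "changed"           # replaces a slot of the outer list
original[1].append(4)             # mutates the shared inner list

print(alias)      # follows every change
print(shallow)    # keeps its own slot 0, but sees the inner append
print(deep)       # unaffected
```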

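2. Tuples

Like a list, a tuple is a sequence; the only difference is that a tuple cannot be modified (strings share this property).

  • Can hold any number of Python objects of any type
  • Tuple elements are enclosed in parentheses ( )
  • Neither the number of elements nor the elements themselves can be changed
  • The index operator [ i ] returns the element at index i
  • The slice operator [ i : j] returns the elements from index i through index j-1
  • The first element has index 0 and the last element has index -1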
In [49]:
t1=tuple([1,2,3])
t2=tuple("jeff")
t3=tuple((1,2,3))
print (t1)
print (t2)
print (t3)
t4=tuple(123)
print (t4)
(1, 2, 3)
('j', 'e', 'f', 'f')
(1, 2, 3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-916e407da42b> in <module>()
      5 print (t2)
      6 print (t3)
----> 7 t4=tuple(123)
      8 print (t4)

TypeError: 'int' object is not iterable
In [50]:
t4.append(3)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-50-832cd73aeebe> in <module>()
----> 1 t4.append(3)

NameError: name 't4' is not defined
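  • A tuple is usually created by separating values with commas;

  • Most of the time a tuple is enclosed in parentheses;

  • An empty tuple is written as a pair of parentheses with nothing inside;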
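A brief sketch of the three creation rules above, plus what happens when code tries to modify a tuple. All names are illustrative.

```python
point = 3, 4            # the comma makes the tuple; parentheses are optional
single = (42,)          # a one-element tuple needs a trailing comma
not_a_tuple = (42)      # without the comma this is just the integer 42
empty = ()              # empty tuple

x, y = point            # unpacking into two names
print(point, single, empty, x, y)
print(type(not_a_tuple))

try:
    point[0] = 10       # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)
```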

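3. Strings

A string is a sequence of characters, which may be letters, digits, punctuation, or any other Unicode characters. Strings cannot be modified.

  • Most programmers spend far more time working with strings than with numbers. Logical thinking (and creativity) matters much more than mathematical ability.
  • Unicode support lets Python 3 strings contain any of the world's written languages as well as many special symbols. Unicode support is one of the main reasons Python 3 split from Python 2.
  • Computers handle strings much of the time: writing emails and articles, sending text or instant messages, publishing blog posts, searching for information on Baidu, browsing web pages, and so on.
  • The string type is one of the collection types built into Python.
  • A string is a collection of characters enclosed in single or double quotes
  • The index operator [ i ] returns the character at index i
  • The slice operator [ i : j] returns the substring from index i through index j-1
  • The first character has index 0 and the last character has index -1
  • The plus sign (+) concatenates strings
  • The asterisk (*) repeats a string
  • in tests whether a substring occurs in a string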
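The operators in the list above, applied to one sample string. This is a minimal sketch with illustrative variable names.

```python
s = "Hello world"

first = s[0]              # first character
last = s[-1]              # last character
head = s[0:5]             # characters at indexes 0 through 4
greeting = s + "!"        # concatenation builds a new string
echo = "ab" * 3           # repetition
has_world = "world" in s  # substring test

print(first, last, head, greeting, echo, has_world)

try:
    s[0] = "J"            # strings are immutable
except TypeError as err:
    print("TypeError:", err)
```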
In [51]:
str1='Hello world'
print (str1)
print (str1[0])
for c in str1:
    print (c)
Hello world
H
H
e
l
l
o
 
w
o
r
l
d
In [52]:
def cmp(a, b):
    return (a > b) - (a < b)
In [53]:
str2=str1.replace(' ',',') ### replace within a string
print (cmp(str1,str2))  ### compare strings
-1
In [54]:
print ('_'.join(str2))  ### join the characters with a separator
content = '%s;%s' % tuple((str1,str2)) 
print (content)
H_e_l_l_o_,_w_o_r_l_d
Hello world;Hello,world

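Common string methods

  • Convert to upper case: S.upper()
  • Convert to lower case: S.lower()
  • Capitalize the first letter and lower-case the rest: S.capitalize()
  • Whether every word starts with a capital letter (title case): S.istitle()
  • Whether all letters are upper case: S.isupper()
  • Whether all letters are lower case: S.islower()
  • Strip whitespace from both ends: S.strip()
  • Strip whitespace from the left end: S.lstrip()
  • Strip whitespace from the right end: S.rstrip()
  • Count occurrences of a substring: S.count(substr, [start, [end]])
  • Whether all characters are letters or digits: S.isalnum()
  • Whether all characters are letters: S.isalpha()
  • Whether all characters are whitespace: S.isspace()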
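A quick tour of several of these methods on one sample string. The string raw is made up for illustration; every method returns a new string and leaves the original untouched.

```python
raw = "  hello World 42  "

clean = raw.strip()                 # whitespace removed from both ends
upper = clean.upper()
capitalized = clean.capitalize()    # first letter upper, the rest lower
titled = clean.title()              # each word capitalized
l_count = clean.count("l")          # occurrences of the letter l

print(repr(clean))
print(upper, "|", capitalized, "|", titled)
print(titled.istitle(), upper.isupper(), clean.islower())
print(repr(raw.lstrip()), repr(raw.rstrip()))
print(l_count)
print("abc123".isalnum(), "abc123".isalpha(), "   ".isspace())
```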

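4. Sets

  • Create a set object with set()
  • Elements of a set are unique
  • Mathematical set operations are supported
  • Sets are unordered
  • Elements of a set must be immutable (hashable)
  • Indexing and slicing are not supported
  • set: mutable set
  • frozenset: immutable set
  • Methods specific to mutable sets (set)
    • s.add(item)
    • s.clear()
    • s.discard(item), s.remove(item)
    • s.update(t)
    • s.difference_update(t): remove from s the elements it shares with t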
In [55]:
empty_set = set()
empty_set
Out[55]:
set()
In [56]:
even_numbers = {0,2,2,4,6,8}
even_numbers
Out[56]:
{0, 2, 4, 6, 8}
In [57]:
set('letters')
Out[57]:
{'e', 'l', 'r', 's', 't'}
In [58]:
set(['apple','banana','orange'])
Out[58]:
{'apple', 'banana', 'orange'}
In [59]:
# when a dictionary is passed to set(), only its keys are used
set({'apple':'red','orange':'orange','cherry':'red'})
Out[59]:
{'apple', 'cherry', 'orange'}
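The cells above only create sets. The mathematical operations and the mutating methods from the list can be sketched as follows, using two small sample sets.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b             # elements in either set
common = a & b            # elements in both sets
only_a = a - b            # elements of a that are not in b
either_not_both = a ^ b   # symmetric difference
print(union, common, only_a, either_not_both)

a.add(10)
a.discard(99)             # no error when the item is absent (remove would raise KeyError)
a.difference_update(b)    # drop from a everything that is also in b
print(a)

frozen = frozenset(b)     # immutable, so it can itself be a set element
nested = {frozen}
print(nested)
```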

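II. Mappings (dictionaries)

Every element of a mapping has a name, technically called a key. The dictionary (also known as a hash table) is the only built-in mapping type in Python.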

In [60]:
fruit_dict = {"a":"apple","b":"banana","g":"grape","o":"orange"}  # a variable named dict would shadow the built-in type
print (fruit_dict["a"])
apple
In [61]:
del(fruit_dict["a"]) # delete the entry with key "a"
fruit_dict["g"]="grapefruit" # key "g" already exists, so its value is overwritten
In [62]:
for k in fruit_dict:
    print ("dict[%s] = "%k,fruit_dict[k])
dict[o] =  orange
dict[b] =  banana
dict[g] =  grapefruit
In [63]:
for (k,v) in fruit_dict.items():  ## iterate over key/value pairs
    print ("dict[%s] = "%k,v)
dict[o] =  orange
dict[b] =  banana
dict[g] =  grapefruit

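Common dictionary methods

  • dict.keys(): returns the dictionary's keys (a view in Python 3, a list in Python 2)
  • dict.values(): returns the dictionary's values
  • dict.get(k,d): returns the value for key k, or d if k is not in the dictionary
  • dict.update(dict2): merges dict2 into the dictionary; for keys present in both, the value from dict2 overwrites the original
  • dict.setdefault(k,d): returns the value for key k if it exists; otherwise inserts k with the default value d and returns d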
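The behavior of get, update and setdefault is easiest to see with one key that exists and one that does not. The stock dictionary below is made-up sample data.

```python
stock = {"apple": 3, "banana": 0}

print(list(stock.keys()))
print(list(stock.values()))

present = stock.get("apple", -1)     # key exists: its value is returned
missing = stock.get("cherry", -1)    # key absent: the default is returned
print(present, missing)

stock.update({"banana": 7, "cherry": 2})   # overwrite banana, add cherry
kept = stock.setdefault("apple", 100)      # key exists: nothing is changed
added = stock.setdefault("durian", 1)      # key absent: inserted with the default
print(kept, added)
print(stock)
```

Unlike get, setdefault modifies the dictionary when the key is missing.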

III. Array and DataFrame

Multi-dimensional data structures, covered in detail with the NumPy and pandas packages.


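Basic syntax

I. Conditions and conditional statements

  • The main tool for conditional tests in Python is the if statement. It has three parts: the keyword itself, a conditional expression whose truth value is tested, and a code block that runs when the expression is true or non-zero;
  • The syntax of the if statement is:

    if expression:
        expr_true_suite
  • Combined with an else clause:

    if expression:
        expr_true_suite
    else:
        expr_false_suite
  • Combined with an elif clause:

    if expression1:
        expr1_true_suite
    elif expression2:
        expr2_true_suite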
In [64]:
value=0
if value>1 and value<=2:
    print (1) 
elif value>2:
    print (2)
else:
    print (value)
0

二. 循环和循环语句

1. While

The while loop has the following form:

while expression:
   suite_to_repeat

The suite_to_repeat clause is executed repeatedly until expression evaluates to false.

In [66]:
count = 0
while (count < 9):
    print ('the index is:', count)
    count += 1
the index is: 0
the index is: 1
the index is: 2
the index is: 3
the index is: 4
the index is: 5
the index is: 6
the index is: 7
the index is: 8

2. For

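The for loop has the following form:

for iter_var in iterable:
    suite_to_repeat

A for loop visits every element of an iterable object (such as a sequence or an iterator) and ends once all items have been processed.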

In [67]:
for i in range(9):
    print ('the index is:', i)
the index is: 0
the index is: 1
the index is: 2
the index is: 3
the index is: 4
the index is: 5
the index is: 6
the index is: 7
the index is: 8
In [68]:
nameList = ['Donn', 'Shirley', 'Ben', 'Janice','David', 'Yen', 'Wendy']
for i, eachLee in enumerate(nameList):
    print ("%d %s Lee" % (i+1, eachLee))
1 Donn Lee
2 Shirley Lee
3 Ben Lee
4 Janice Lee
5 David Lee
6 Yen Lee
7 Wendy Lee

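III. Functions

In mathematics, a function is a correspondence that yields exactly one output for a given input. A function in a programming language means much the same, with some differences: it is a named block of statements that you can reuse whenever needed. It may take input, and it may return output.

The syntax of a function definition is:

def function_name(input):
    function_suite # function body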
In [69]:
def hello1():
    print ('hello, world')
    
test1=hello1()
hello, world
In [70]:
def hello2(name):
    return ('Hello,%s' %(name))
In [71]:
test2=hello2('Albert')
print (test2)
Hello,Albert
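The two cells above differ in one important way: hello1 prints but returns nothing, while hello2 returns a value. A small sketch makes the difference explicit and also shows a default argument; the function names greet and show are hypothetical.

```python
def greet(name, greeting="Hello"):
    """Return a greeting; `greeting` falls back to "Hello" when omitted."""
    return "%s,%s" % (greeting, name)

def show(name):
    print(greet(name))            # prints, but has no return statement

result = show("Albert")           # side effect only: prints Hello,Albert
custom = greet("Albert", greeting="Hi")

print(result)                     # a function without return yields None
print(custom)
```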

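IV. Classes

A class is a data structure that we can use to define objects. A class is an abstraction of a real-world entity expressed in program form, and instances are concrete realizations of it. By analogy, a class is a blueprint or model used to produce real things (instances). So why the term "class"? It most likely comes from the use of classes to identify and categorize the biological groups that living things belong to; a class can also give rise to subclasses that are similar but differ in some respects. The notion of a class in programming applies many of these characteristics.

The syntax of a class definition is:

class ClassName(object): 
     class_suite # class body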
In [72]:
class C(object):
    foo = 100

test3=C()
test3.foo
Out[72]:
100
In [73]:
class Complex:
    def __init__(self, realpart, imagpart): ## __init__ is the predefined initializer, called automatically when an instance is created
        self.r = realpart
        self.i = imagpart

x = Complex(3.0, -4.5)
x.r, x.i
Out[73]:
(3.0, -4.5)
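The two cells above show a class attribute and an initializer separately. The sketch below combines them with a method and a subclass; Account and SavingsAccount are hypothetical names chosen only for illustration.

```python
class Account(object):
    interest_rate = 0.02                # class attribute, shared by all instances

    def __init__(self, owner, balance=0.0):
        self.owner = owner              # instance attributes, one set per object
        self.balance = balance

    def deposit(self, amount):
        """Add amount to the balance and return the new balance."""
        self.balance += amount
        return self.balance

class SavingsAccount(Account):          # subclass: similar, but with one difference
    interest_rate = 0.05

acct = Account("Albert", 100.0)
acct.deposit(50.0)
savings = SavingsAccount("Bob")         # inherits __init__ and deposit

print(acct.balance, acct.interest_rate)
print(savings.balance, savings.interest_rate)
```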

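V. Modules

A module is the highest-level unit of program organization in Python; it packages program code and data for reuse. In practical terms, a module usually corresponds to a Python program file. Every file is a module, and after importing another module, a module can use the names that the imported module defines. Modules are processed with two statements (import and from) and one important built-in function.

  • Import a whole module: import module;
  • Import one function from a module: from somemodule import somefunction;
  • Import several functions from a module: from somemodule import firstfunc, secondfunc, thirdfunc
  • Import everything from a module: from somemodule import *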
In [74]:
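import numpy as np                     # recommended: access everything as np.<name>
from numpy import abs,sum,where,sqrt   # note: abs and sum now shadow the Python built-ins
from numpy import *                    # imports every public name; best kept to quick experiments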
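The same import forms work with any module. As a dependency-free sketch, here they are with the standard math module; the alias m is arbitrary.

```python
import math                   # whole module; names need the math. prefix
from math import sqrt, pi     # selected names, usable without a prefix
import math as m              # alias, in the same spirit as import numpy as np

root_a = math.sqrt(16)
root_b = sqrt(16)
floor_value = m.floor(2.7)

print(root_a, root_b, floor_value, pi)
print(m is math)              # the alias refers to the very same module object
```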

VI. Comprehensions

List comprehensions

  • Form 1: [expression for item in iterable]
  • Form 2: [expression for item in iterable if condition]

Dictionary comprehensions

  • Form: {key_expression : value_expression for item in iterable}

Set comprehensions

  • Form: {expression for item in iterable}
In [75]:
number_list = [number for number in range(1,6)]
number_list
Out[75]:
[1, 2, 3, 4, 5]
In [76]:
a_list = [number for number in range(1,6) if number % 2 == 1]
a_list
Out[76]:
[1, 3, 5]
In [77]:
word = 'letters'
letter_counts = {letter: word.count(letter) for letter in word}
letter_counts
Out[77]:
{'e': 2, 'l': 1, 'r': 1, 's': 1, 't': 2}
In [78]:
a_set = {number for number in range(1,6) if number % 3 == 1}
a_set
Out[78]:
{1, 4}

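Parallel iteration with zip()

  • The zip() function iterates over several sequences in parallel
  • zip() stops as soon as the shortest sequence is used up
  • zip() pairs up the elements of its arguments as tuples. Its return value is neither a tuple nor a list but an iterable that yields those tuples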
In [79]:
days = ['Monday','Tuesday','Wednesday']
fruits = ['apple','banana','peach']
drinks = ['coffee','beer','tea']
desserts = ['tiramisu','ice cream','pie','pudding']
for day,fruit,drink,dessert in zip(days,fruits,drinks,desserts):
    print (day,":drink",drink,"- eat",fruit,"- enjoy",dessert)
Monday :drink coffee - eat apple - enjoy tiramisu
Tuesday :drink beer - eat banana - enjoy ice cream
Wednesday :drink tea - eat peach - enjoy pie
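Because zip() returns a lazy iterable, it has to be wrapped in list() or dict() to see or keep the pairs. A short sketch, reusing sample data in the style of the cell above:

```python
days = ['Monday', 'Tuesday', 'Wednesday']
fruits = ['apple', 'banana', 'peach', 'plum']   # one item longer than days

pairs = list(zip(days, fruits))   # list() materializes the tuples
menu = dict(zip(days, fruits))    # each pair becomes a key/value entry

print(pairs)
print(menu)
```

The extra item 'plum' is dropped because zip() stops at the end of the shorter list.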

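Generating integer sequences with range()

  • range() returns a sequence of integers in a given interval without creating and storing a large data structure such as a list or tuple. This lets you create very large ranges without using up the computer's memory or crashing your program.
  • range() is used much like a slice: range(start,stop,step)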
In [80]:
for x in range(0,3):
    print (x)
0
1
2
In [81]:
for x in range(2,-1,-1):
    print (x)
2
1
0
In [82]:
print (list(range(2,-1,-1)))
print (list(range(0,11,2)))
[2, 1, 0]
[0, 2, 4, 6, 8, 10]
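The memory claim can be checked with sys.getsizeof. This sketch assumes Python 3, where range() is lazy; exact byte counts vary by platform, so only the comparison is printed.

```python
import sys

big = range(10 ** 6)              # no million-element list is built here

length = len(big)                 # a range still knows its length
last_value = big[-1]              # and supports indexing
contains = 999999 in big          # and membership tests

# The range object stays tiny compared with even a 1000-element list.
range_is_smaller = sys.getsizeof(big) < sys.getsizeof(list(range(1000)))

print(length, last_value, contains, range_is_smaller)
```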

Assignments

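Modes

Jupyter Notebook has two modes:

  • Edit mode. In edit mode a pencil icon appears in the top-right corner; press Esc to switch back to command mode.
  • Command mode. Press Enter (or double-click a cell) to switch to edit mode

Cell types

A Jupyter Notebook document consists of a series of cells. The two main cell types are:

  • markdown cell: in command mode, press m to turn the cell into a markdown cell
  • code cell: in command mode, press y to turn the cell into a code cell

Common shortcuts in command mode

  • Show shortcut help: h
  • Save: s
  • Move between cells: j, k
  • Add a cell: a, b
  • Delete a cell: dd
  • Edit cells (cut, copy, paste, undo delete): x, c, v, z
  • Interrupt the kernel: ii
  • Restart the kernel: 00

Splitting a cell

In edit mode, press control+shift+- to split the cell

Merging cells

In command mode, first select the cells to merge with shift+j or shift+k, then merge them with shift+m

Math formulas

Jupyter Notebook renders math formulas with MathJax; the rendered result looks like this $$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$

Inspecting objects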

In [83]:
import numpy as np
In [84]:
%pdoc np
np.*cos*?

Magic functions

  • Names starting with % are line magics
  • Names starting with %% are cell magics
In [85]:
%ls -lt
total 32064
-rwxr-xr-x   1 lee  staff   368620  8 11 22:32 第2讲-python环境和基础语法.ipynb*
-rw-r--r--   1 lee  staff       15  8 11 22:24 me.txt
-rw-r--r--   1 lee  staff   390643  8 11 22:12 salary_data.csv
drwxr-xr-x   7 lee  staff      238  8 11 18:43 课件html/
-rwxr-xr-x   1 lee  staff   662209  8 11 18:42 第3讲-练习题.ipynb*
-rw-r--r--   1 lee  staff  1850788  8 11 18:22 第4讲-其他一些模块介绍.ipynb
-rw-r--r--   1 lee  staff   491826  8 11 17:20 stocks.html
-rw-r--r--   1 lee  staff   306711  8 11 17:18 linked_panning.html
-rw-r--r--   1 lee  staff   277157  8 11 17:17 color_scatter.html
-rw-r--r--   1 lee  staff     7613  8 11 17:15 lines.html
-rwxr-xr-x   1 lee  staff  8524136  8 11 10:55 第3讲-基础工具库.ipynb*
-rw-r--r--   1 lee  staff    11527  8 11 08:38 graph.pdf
-rw-r--r--   1 lee  staff    51316  8 11 08:38 graph.png
-rw-r--r--   1 lee  staff    50936  8 11 08:38 figure-2.jpg
-rw-r--r--@  1 lee  staff    36996  8 11 08:31 figure-1.jpg
-rw-r--r--   1 lee  staff      371  8 10 11:11 mprun_demo.pyc
-rw-r--r--   1 lee  staff      181  8 10 11:10 mprun_demo.py
-rw-r--r--@  1 lee  staff      157  8 10 10:06 myscript.py
-rwxr-xr-x   1 lee  staff    26033  8  9 23:25 第1讲-数据分析方法概述.ipynb*
-rw-r--r--@  1 lee  staff  3316934  7  7 16:54 Bokeh.pdf
drwxr-xr-x@ 10 lee  staff      340  6 28 07:14 data/
drwxr-xr-x@ 38 lee  staff     1292  6 28 07:14 figures/
In [88]:
%%writefile hello.py
print("hello world")
Writing hello.py
In [89]:
%run hello.py
hello world
In [90]:
%%writefile message.py
print(message)
Writing message.py
In [91]:
message = "hello world"
In [92]:
%run message.py
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/Users/lee/Documents/HZ-python-notebook-base/HZ-python-notebook-base/message.py in <module>()
----> 1 print(message)

NameError: name 'message' is not defined
In [93]:
%run -i message.py
hello world
In [94]:
?%run