## Advanced Jupyter Notebooks: A Tutorial

[ Category: Development (Python) | Publisher: 店小二04 | Date: 2019 | Author: 红领巾 ]

Lying at the heart of modern data science and analysis, Jupyter Notebooks are an incredibly powerful tool at both ends of the project lifecycle. Whether you're rapidly prototyping ideas, demonstrating your work, or producing fully fledged reports, notebooks can provide an efficient edge over IDEs or traditional desktop applications.

Following on from "Jupyter Notebook for Beginners: A Tutorial", this guide will take you on a journey from the truly vanilla to the downright dangerous. That's right! Jupyter's wacky world of out-of-order execution has the power to faze, and when it comes to running notebooks inside notebooks, things can get complicated fast.

This guide aims to straighten out some sources of confusion and spread ideas that pique your interest and spark your imagination. There are already plenty of great listicles of neat tips and tricks, so here we will take a more thorough look at Jupyter's offerings.

This will involve:

- Warming up with the basics of shell commands and some handy magics, including a look at debugging, timing, and executing multiple languages.
- Exploring topics like logging, macros, running external code, and Jupyter extensions.
- Seeing how to enhance charts with Seaborn, beautify notebooks with themes and CSS, and customise notebook output.
- Finishing off with a deep look at topics like scripted execution, automated reporting pipelines, and working with databases.

If you're a JupyterLab fan, you'll be pleased to hear that 99% of this is still applicable; the only difference is that some Jupyter Notebook extensions aren't compatible with JupyterLab. Fortunately, awesome alternatives are already cropping up on GitHub.

Now we're ready to become Jupyter wizards!

Shell Commands

Every user will benefit, at least from time to time, from the ability to interact directly with the operating system from within their notebook. Any line in a code cell that you begin with an exclamation mark will be executed as a shell command. This can be useful when dealing with datasets or other files, and when managing your Python packages. As a simple illustration:

    !echo Hello World!
    !pip freeze | grep pandas

    Hello World!
    pandas==0.23.4

It is also possible to use Python variables in your shell commands by prepending a $ symbol, consistent with bash-style variable names.

    message = 'This is nifty'
    !echo $message

    This is nifty

Note that the shell in which ! commands are executed is discarded after execution completes, so commands like cd will have no effect. However, IPython magics offer a solution.
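The reason `cd` is lost can be reproduced in plain Python. The following is a minimal sketch, assuming a shell is available to `subprocess`; the variable names are illustrative. A `cd` executed in a child shell changes only that child's working directory, while a change made inside the current process (the kind of change a magic running in the kernel can make) persists.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()

# A `cd` in a child shell only affects that child process, which is
# discarded as soon as the command finishes.
subprocess.run("cd ..", shell=True, check=True)
after_child_cd = os.getcwd()

# Changing directory inside the current process does persist.
with tempfile.TemporaryDirectory() as scratch_dir:
    os.chdir(scratch_dir)
    chdir_took_effect = os.path.realpath(os.getcwd()) == os.path.realpath(scratch_dir)
    os.chdir(start_dir)  # restore before the temporary directory is removed

print("Unchanged after child `cd`:", after_child_cd == start_dir)
print("Changed after os.chdir:", chdir_took_effect)
```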

Basic Magics

Magics are handy commands built into the IPython kernel that make it easier to perform particular tasks. Although they often resemble Unix commands, under the hood they are all implemented in Python. There exist far more magics than it would make sense to cover here, but it's worth highlighting a variety of examples. We will start with a few basics before moving on to more interesting cases.

There are two categories of magic: line magics and cell magics. Respectively, they act on a single line or can be spread across multiple lines or entire cells. To see the available magics, you can do the following:

    %lsmagic

    Available line magics:
    %alias %alias_magic %autocall %automagic %autosave %bookmark %cd %clear %cls %colors %config %connect_info %copy %ddir %debug %dhist %dirs %doctest_mode %echo %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %macro %magic %matplotlib %mkdir %more %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %profile %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %ren %rep %rerun %reset %reset_selective %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode

    Available cell magics:
    %%! %%HTML %%SVG %%bash %%capture %%cmd %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile

    Automagic is ON, % prefix IS NOT needed for line magics.

As you can see, there are loads! Most are listed in the official documentation, which is intended as a reference but can be somewhat obtuse in places. Line magics start with a percent character, %, and cell magics start with two, %%.

It's worth noting that ! is really just a fancy magic syntax for shell commands. As you may have noticed, IPython provides magics in place of those shell commands that alter the state of the shell and whose effects are therefore lost when run with !. Examples include %cd, %alias and %env.

Let's go through some more examples.

Autosaving

First up, the %autosave magic lets you change how often your notebook will autosave to its checkpoint file.

    %autosave 60

    Autosaving every 60 seconds

It's that easy!

Displaying Matplotlib Plots

One of the most common line magics for data scientists is surely %matplotlib, which is of course for use with the most popular plotting library for Python, Matplotlib.

%matplotlib inline

Providing the inline argument instructs IPython to show Matplotlib plot images inline, within your cell outputs, enabling you to include charts inside your notebooks. Be sure to run this magic before you import Matplotlib, as it may not work otherwise; many people place it in the first code cell at the start of their notebook.

Now, let's start looking at some more complex features.

Debugging

The more experienced reader may have had concerns over the ultimate efficacy of Jupyter Notebooks without access to a debugger. But fear not! The IPython kernel has its own interface to the Python debugger, pdb, and several options for debugging with it in your notebooks. Executing the %pdb line magic will toggle on/off the automatic triggering of pdb on error across all cells in your notebook.

    %pdb
    raise NotImplementedError()

    Automatic pdb calling has been turned ON
    ---------------------------------------------------------------------------
    NotImplementedError                       Traceback (most recent call last)
    <ipython-input-31-022320062e1f> in <module>()
          1 get_ipython().run_line_magic('pdb', '')
    ----> 2 raise NotImplementedError()

    NotImplementedError:

    > <ipython-input-31-022320062e1f>(2)<module>()
          1 get_ipython().run_line_magic('pdb', '')
    ----> 2 raise NotImplementedError()

This exposes an interactive mode in which you can use the pdb commands.

Another handy debugging magic is %debug, which you can execute after an exception has been raised to delve back into the call stack at the time of failure.
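To make "the call stack at the time of failure" concrete, here is a rough plain-Python sketch of the information a post-mortem debugger inspects. The function `faulty` is hypothetical, and this is an illustration of the idea rather than how %debug is implemented. Outside a notebook, passing the same traceback to the standard library's `pdb.post_mortem` would open an interactive prompt at the failing frame.

```python
def faulty(x):
    # Hypothetical function that always fails with ZeroDivisionError.
    divisor = x - x
    return x / divisor

try:
    faulty(3)
except ZeroDivisionError as exc:
    tb = exc.__traceback__
    # Walk from the frame that caught the error down to the innermost
    # frame, where the exception was actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    innermost_function = tb.tb_frame.f_code.co_name
    local_vars = dict(tb.tb_frame.f_locals)
    # import pdb; pdb.post_mortem(exc.__traceback__)  # interactive equivalent

print(innermost_function, local_vars)
```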

As an aside, also note how the traceback above demonstrates how magics are translated directly into Python commands, where %pdb became get_ipython().run_line_magic('pdb', ''). Executing this instead is identical to executing %pdb.

Timing Execution

Sometimes in research, it is important to provide runtime comparisons for competing approaches. IPython provides two timing magics, %time and %timeit, each of which has both line and cell modes. The former simply times the execution of either a single statement or a whole cell, depending on whether it is used in line or cell mode.

    n = 1000000
    %time sum(range(n))

    Wall time: 32.9 ms
    499999500000

And in cell mode:

    %%time
    total = 0
    for i in range(n):
        total += i

    Wall time: 95.8 ms

The notable difference between %timeit and %time is that %timeit runs the specified code many times and computes an average. You can specify the number of loops per run with the -n option and the number of runs with -r, but if nothing is passed a fitting value will be chosen based on computation time.
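The same idea can be sketched outside a notebook with the standard-library `timeit` module. This is only an approximation of what the magic reports: the run and loop counts are fixed by hand here, and the choice of population standard deviation is an assumption made for illustration.

```python
import statistics
import timeit

n = 100_000
runs, loops = 7, 10

# Each of the 7 runs executes the statement 10 times and returns the
# total elapsed time in seconds for that run.
run_totals = timeit.repeat("sum(range(n))", globals={"n": n}, repeat=runs, number=loops)

per_loop = [total / loops for total in run_totals]
mean_s = statistics.mean(per_loop)
std_s = statistics.pstdev(per_loop)

summary = (
    f"{mean_s * 1e3:.3f} ms +/- {std_s * 1e6:.1f} us per loop "
    f"(mean +/- std. dev. of {runs} runs, {loops} loops each)"
)
print(summary)
```

In a notebook, %timeit takes care of this bookkeeping for you: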

    %timeit sum(range(n))

    34.9 ms ± 276 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Executing Different Languages

In the output of %lsmagic above, you may have noticed a number of cell magics named after various programming, scripting or markup languages, including HTML, JavaScript, Ruby, and LaTeX. Using these will execute the cell using the specified language. There are also extensions available for other languages such as R.

For example, to render HTML in your notebook:

    %%HTML
    This is <em>really</em> neat!

This is really neat!

Similarly, LaTeX is a markup language for displaying mathematical expressions, and can be used directly:

    %%latex
    Some important equations:
    $$E = mc^2$$
    $$e^{i \pi} = -1$$

Some important equations:

$$E = mc^2$$

$$e^{i \pi} = -1$$

Configuring Logging

Did you know that Jupyter has a built-in way to prominently display custom error messages above cell output? This can be handy for ensuring that errors and warnings about things like invalid inputs or parameterisations are hard to miss for anyone who might be using your notebooks. An easy, customisable way to hook into this is via the standard Python logging module.

(Note: Just for this section, we'll use some screenshots so that we can see how these errors look in a real notebook.)

The logging output is displayed separately from print statements or standard cell output, appearing above all of this.

This actually works because Jupyter notebooks listen to both standard output streams, stdout and stderr, but handle each differently; print statements and cell output route to stdout, and by default logging has been configured to stream over stderr.
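The split between the two streams can be checked in plain Python. The sketch below assumes nothing about Jupyter itself: it captures both streams and shows that `print` writes to stdout while a `logging.StreamHandler` created without arguments writes to stderr. The logger name `stream_demo` is hypothetical.

```python
import contextlib
import io
import logging

captured_out, captured_err = io.StringIO(), io.StringIO()

with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    demo_logger = logging.getLogger("stream_demo")  # hypothetical name
    demo_logger.propagate = False
    demo_logger.setLevel(logging.WARNING)

    stream_handler = logging.StreamHandler()  # no argument: uses sys.stderr
    demo_logger.addHandler(stream_handler)

    print("regular output")
    demo_logger.warning("something looks off")

    demo_logger.removeHandler(stream_handler)

stdout_text = captured_out.getvalue()
stderr_text = captured_err.getvalue()
print(repr(stdout_text), repr(stderr_text))
```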

This means we can configure logging to display other kinds of messages over stderr too.

We can customise the format of these messages like so:

Note that every time you run a cell that adds a new stream handler via logger.addHandler(handler), you will receive an additional line of output each time for each message logged. We could place all the logging config in its own cell near the top of our notebook and leave it be or, as we have done here, brute force replace all existing handlers on the logger. We had to do that in this case anyway to remove the default handler.

It's also easy to log to an external file, which might come in handy if you're executing your notebooks from the command line as discussed later. Just use a FileHandler instead of a StreamHandler:

handler = logging.FileHandler(filename='important_log.log', mode='a')
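For a fuller picture, here is a hedged end-to-end sketch that writes to a log file in a temporary directory and reads it back. The logger name `file_demo`, the format string, and the messages are assumptions made for illustration.

```python
import logging
import os
import tempfile

# Write to a scratch location so the example leaves the working directory alone.
log_path = os.path.join(tempfile.mkdtemp(), "important_log.log")

file_logger = logging.getLogger("file_demo")  # hypothetical name
file_logger.setLevel(logging.INFO)
file_logger.propagate = False

file_handler = logging.FileHandler(filename=log_path, mode="a")
file_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
file_logger.handlers = [file_handler]

file_logger.info("Notebook run started")
file_logger.warning("Input file is missing a column")

# Close the handler so the contents are flushed before reading the file back.
file_handler.close()
with open(log_path) as log_file:
    log_lines = log_file.read().splitlines()

print(log_lines)
```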

A final thing to note is that the logging described here is not to be confused with using the %config magic to change the application's logging level via %config Application.log_level="INFO", as this determines what Jupyter outputs to the terminal while it runs.

Extensions

As it is an open source webapp, plenty of extensions have been developed for Jupyter Notebooks, and there is a long official list. Indeed, in the Working with Databases section below we use the ipython-sql extension. Another of particular note is the bundle of extensions from Jupyter-contrib, which contains individual extensions for spell check, code folding and much more.

You can install and set this up from the command line like so:

    pip install jupyter_contrib_nbextensions
    jupyter contrib nbextension install --user
    jupyter nbextension enable spellchecker/main
    jupyter nbextension enable codefolding/main

This will install the jupyter_contrib_nbextensions package in Python, install it in Jupyter, and then enable the spell check and code folding extensions. Don't forget to refresh any notebooks live at the time of installation to load in changes.

Note that Jupyter-contrib only works in regular Jupyter Notebooks, but there are new extensions for JupyterLab now being released on GitHub.

Enhancing Charts with Seaborn

One of the most common exercises Jupyter Notebook users undertake is producing plots. But Matplotlib, Python's most popular charting library, isn't renowned for attractive results despite its customisability. Seaborn instantly prettifies Matplotlib plots and even adds some additional features pertinent to data science, making your reports prettier and your job easier. It's included in the default Anaconda installation or easily installed via pip install seaborn.

Let's check out an example. First, we'll import our libraries and load some data.

    import matplotlib.pyplot as plt
    import seaborn as sns
    data = sns.load_dataset("tips")

Seaborn provides some built-in sample datasets for documentation, testing and learning purposes, which we will make use of here. This