An aside: Why Python uses 0-based indexing: a blog post.
Software Carpentry section
Day 1 notebooks: session1, loadingdata and session3. These relate to this lesson.
Day 2 notebooks: session4_loops from this lesson, session5_lists from this lesson, session6_multiple_files from this lesson (what is a glob?), session7_making_choices from this lesson.
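As a short answer to "what is a glob?": a glob is a filename pattern containing wildcards such as `*`, and Python's standard `glob` module expands such a pattern into the list of matching paths. The sketch below is only an illustration; the helper name `find_csv_files` and the sample filenames are made up for the demo and are not taken from the notebooks.

```python
import glob
import os
import tempfile


def find_csv_files(directory):
    """Return the sorted paths of all .csv files directly inside directory."""
    pattern = os.path.join(directory, "*.csv")  # "*" matches any run of characters
    return sorted(glob.glob(pattern))


# Demo in a throwaway directory so the sketch does not touch real data.
# The filenames here are hypothetical examples.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["inflammation-01.csv", "inflammation-02.csv", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    matches = [os.path.basename(path) for path in find_csv_files(tmp)]
    print(matches)  # notes.txt is not listed because it does not match *.csv
```

`glob.glob` returns paths in arbitrary order, which is why the helper sorts them before returning.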
Day 4 (Wednesday): we moved on to command line programs, so here is the code for code/version.py and code/readings_01.py, based on this lesson. We continued that study on Thursday. For more sophisticated command line argument processing, look at argparse.
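To give a flavour of argparse, here is a minimal sketch of a program that takes one or more numbers plus an optional `--op` flag. It is not the code from code/readings_01.py; the names `build_parser` and `main` and the `--op` option are assumptions made up for this illustration. Passing an explicit list to `main` stands in for the command line, so the sketch also runs inside a notebook.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical number-summarising program."""
    parser = argparse.ArgumentParser(
        description="Summarise numbers given on the command line."
    )
    # One or more positional values, each converted to float by argparse.
    parser.add_argument("numbers", nargs="+", type=float, help="numbers to process")
    # Optional flag restricted to a fixed set of choices.
    parser.add_argument(
        "--op", choices=["sum", "min", "max"], default="sum",
        help="operation to apply (default: sum)",
    )
    return parser


def main(argv=None):
    """Parse argv (or sys.argv[1:] when argv is None) and print the result."""
    args = build_parser().parse_args(argv)
    operations = {"sum": sum, "min": min, "max": max}
    result = operations[args.op](args.numbers)
    print(result)
    return result


# Equivalent to running: python summarise.py --op max 3 7 5
main(["--op", "max", "3", "7", "5"])
```

In a real script you would call `main()` with no arguments under `if __name__ == "__main__":` so that argparse reads `sys.argv`, and you would get `--help` output and error messages for free.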
Exercise for command line: Arithmetic on the Command Line from Software Carpentry lesson on the command line. A solution is in code/my_calculator.py and another solution is code/myarith.py. A demo on using myarith as a module is here. Also have a look at this multi-number version: multi_arg_arithmetic.py.
An aside: A notebook to download the necessary data is download_code_and_data.ipynb.
Python for Biologists section
Programming for Biologists has an exercise on processing bird count data which is rendered in this notebook: bird_problem.
From page 65 in Python for Biologists there are a couple of exercises on writing FASTA files: exercise_writing_fasta.
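For orientation, a FASTA record is a header line starting with `>` followed by the sequence, usually wrapped to a fixed line width. The sketch below shows one way to write such records; it is not the book's solution, and the function name `write_fasta`, the default width of 60 and the sample records are assumptions for illustration. It writes to any open file handle, with `io.StringIO` used here in place of a real file.

```python
import io


def write_fasta(handle, records, width=60):
    """Write (header, sequence) pairs to handle in FASTA format.

    Each sequence is wrapped so that no line is longer than width characters.
    """
    for header, sequence in records:
        handle.write(">" + header + "\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Hypothetical records, written to an in-memory buffer instead of a file.
records = [("seq1", "ATCGTACGATCG"), ("seq2", "GGCC")]
buffer = io.StringIO()
write_fasta(buffer, records, width=5)
print(buffer.getvalue(), end="")
```

To write to disk instead, open a file with `with open("output.fasta", "w") as handle:` and pass `handle` to the same function.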
Monday 11 March 2019 exercises: monday_exercises. If GitHub is playing up and not rendering the notebook, you can download it by clicking on the “Raw” button and then using right-click and “Save page as” to save the file locally as “monday_exercises.ipynb”. Save the notebook in the folder where your other notebooks are and open it using Jupyter Lab.
Please read chapters 5 and 6 of Python for Biologists on your own time. We will return to the book with Chapter 8 (dictionaries) on Tuesday before covering Chapter 7 (regular expressions) on Wednesday.
Finally, k-mer counting.
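A k-mer is a substring of length k, and k-mer counting means tallying how often each one occurs as a window of width k slides along the sequence one position at a time. The sketch below is a minimal illustration using `collections.Counter`; the function name `count_kmers` and the example sequence are assumptions, not taken from the course notebook.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none at all if the sequence is shorter than k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATATAT", 2)
print(counts)  # Counter({'AT': 3, 'TA': 2})
```

Because a `Counter` is a dictionary, `counts.most_common(5)` gives the five most frequent k-mers, which ties in with the dictionaries chapter.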
An exercise for those who know Python: use ssh to log into il-slurmctl-ext.sanbi.ac.za (using your supplied username and password). There is a collection of data in /data/outbreak, including a large number of .fastq.gz files containing short sequences and a .fasta.gz file containing a “reference genome” of a bacterial species. Use snippy to try to map the samples to the reference and observe the results. Do they all map equally well?
Project Rosalind, named after Rosalind Franklin, is a bioinformatics programming challenge site that grew out of the work of Phillip Compeau and Pavel Pevzner, authors of Bioinformatics Algorithms: An Active Learning Approach.
Python Image Manipulation
Some playful image manipulation of the SANBI logo is in this image_manipulation notebook.
Plotting in Python
Python has many different options for plotting, and the plotting_gc notebook illustrates two of them: Matplotlib and Altair.
Statistics in Python
Next Generation Sequencing
Cute Pythons with Hats