Skip to content

Tag: programming

Regex with negatives lookahead and lookbehind

"Looking different directions" by Paul Kline at (https://www.flickr.com/photos/paulelijah/6717953239/)
“Looking different directions” by Paul Kline.

Problem: Match strings that contains a single quotation mark ('), but not multiple ones together.

Solution:

(?<!')'(?!')

This is a regex for a single quotation mark with a (?<!') in the left and a (?!’) in the right. The (?<!') is a ?< look behind if not ! a single quotation mark '. The (?!') is a look ahead ? if not ! a single quotation mark '.

Java code:

[java]import java.util.regex.Pattern;

public class RegexProblem {
public static void main(String args[]) {
Pattern single_quote = Pattern.compile("(?<!’)'(?!’)");
String[] phrases = {
"",
"’",
"a’a",
"aaa",
"aa’aa",
"aa”aa",
"aa”’aaa",
"aaa””aaa"
};
for(String phrase: phrases){
System.out.println(String.format("For %s is %s.", phrase,
single_quote.matcher(phrase).find()));
}
}
}
[/java]

The output is:

For  is false.
For ' is true.
For a'a is true.
For aaa is false.
For aa'aa is true.
For aa''aa is false.
For aa'''aaa is false.
For aaa''''aaa is false.

Java, printing arrays

As I keep forgetting, this post is to remind me that Java Java doesn’t have a pretty toString() for arrays objects, and it does for Lists.

[java]import java.util.List;
import java.util.ArrayList;
import java.util.Arrays;

public class ListsExample {
public static void main (String args[]) {
// as an array
String[] list1 = {"a","b","c"};
System.out.println(Arrays.toString(list1));

// as an List provided by Arrays
List<String> list2 = Arrays.asList("d", "e", "f");
System.out.println(list2);

// as an implementation of the interface List
// ArrayList (could also be LinkedList, Vector, etc)
List<String> list3 = new ArrayList<String>();
list3.add("g");
list3.add("h");
list3.add("i");
System.out.println(list3);
}
}
[/java]

The output is:

[a, b, c]
[d, e, f]
[g, h, i]

Telephone keypad combinations

Problem: Given a sequence of numbers, show all possible letter combinations in a telephone keypad.

Recursive solution in Python:
[python]
keyboard = {
‘1’: [],
‘2’: [‘a’,’b’,’c’],
‘3’: [‘d’,’e’,’f’],
‘4’: [‘g’,’h’,’i’],
‘5’: [‘j’,’k’,’l’],
‘6’: [‘m’,’n’,’o’],
‘7’: [‘p’,’q’,’r’,’s’],
‘8’: [‘t’,’u’,’v’],
‘9’: [‘w’,’x’,’y’,’z’],
‘0’: []
}

def printkeys(numbers, prefix=""):
if len(numbers)==0:
print prefix
return

for letter in keyboard[numbers[0]]:
printkeys(numbers[1:], prefix+letter)

printkeys("234")
[/python]

Output:

adg
adh
adi
aeg
aeh
aei
afg
afh
afi
bdg
bdh
bdi
beg
beh
bei
bfg
bfh
bfi
cdg
cdh
cdi
ceg
ceh
cei
cfg
cfh
cfi

permutations implemented in Python

In case you can’t use Python’s itertools or in case you want a simple, recursive python implementation for a permutation of a list:

[python]
def perm(a,k=0):
if(k==len(a)):
print a
else:
for i in xrange(k,len(a)):
a[k],a[i] = a[i],a[k]
perm(a, k+1)
a[k],a[i] = a[i],a[k]

perm([1,2,3])
[/python]

Output:

[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 2, 1]
[3, 1, 2]

This Python implementation is based in the algorithm presented in the book Computer Algorithms by Horowitz, Sahni and Rajasekaran.

Bash Brace Expansion

photo by whiskeyandtears at https://www.flickr.com/photos/whiskeyandtears/2140154564

[bash]$ echo {0..9}
0 1 2 3 4 5 6 7 8 9[/bash]

[bash]$ echo b{a,e,i,o,u}
ba be bi bo bu
[/bash]

[bash]$ echo x{0..9}y
x0y x1y x2y x3y x4y x5y x6y x7y x8y x9y[/bash]

[bash]$ echo {a..z}
a b c d e f g h i j k l m n o p q r s t u v w x y z[/bash]

[bash]$ echo {1..3} {A..C}
1 2 3 A B C[/bash]

[bash]$ echo {1..3}{A..C}
1A 1B 1C 2A 2B 2C 3A 3B 3C[/bash]

[bash]echo {a,b{1,2,3},c}
a b1 b2 b3 c[/bash]

[bash]$ mkdir -p {project1,project2}/{src,tst,bin,lib}/
$ find .
.
./project1
./project1/tst
./project1/bin
./project1/lib
./project1/src
./project2
./project2/tst
./project2/bin
./project2/lib
./project2/src
[/bash]

[bash]$ echo {{A..Z},{a..z}}
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z[/bash]

[bash]$ for i in {a..f} 1 2 {3..5} ; do echo $i;done
a
b
c
d
e
f
1
2
3
4
5[/bash]

The examples below requires Bash version 4.0 or greater.

[bash]$echo {001..9}
001 002 003 004 005 006 007 008 009
[/bash]

[bash]$ echo {1..10..2}
1 3 5 7 9
[/bash]

GenBank renaming

http://www.flickr.com/photos/maria_keays/1251843227/
DNA inspired sculpture by Charles Jencks. Creative Commons photo by Maria Keays.

What is GenBank?

The GenBank sequence database is a widely used collection of nucleotide sequences and their protein translations. A GenBank sequence record file typically has a .gbk or .gb extension and is filled with plain text characters. A example of GenBank file can be found here.

Filename problem

Although there are several metadata are available inside a GenBank record the name of the file are not always in accordance with the content of the file. This is potentially a source of confusion to organize files and requires an additional effort to rename the files according to their content.

Approach using Biopython

The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Among other tools, Biopython includes modules for reading and writing different sequence file formats including the GenBank’s record files.

Despite the fact that is possible to write a parser for GenBank’ files it would represent a redundant effort to develop and maintain such tool. Biopython can be delegated to perform parsing and focus the programming on renaming mechanism.

Biopython installation on Linux (Ubuntu 11.10) or Apple OS X (Lion)

For both Ubuntu 11.10 and OS X Lion, a modern version of Python already comes out of the box.

For Linux you just need to install the Biopython package. One method to install Biopython in a APT ready distribution as Ubuntu 11.10 (Oneiric Ocelot) is:

# apt-get install python-biopython

For an Apple OS X (Lion) you can install Biopython using easy_install, a popular package manager for the Python. Easy_install is bundled with Setuptools, a set of tools for Python.

To install the Setuptools download the .egg file for your python version (probably setuptools-0.6c11-py2.7.egg) and execute it as a Shell Script:

sudo sh setuptools-0.6c11-py2.7.egg

After this you already have easy_install in place and you can use it to install the Biopython library:

sudo easy_install -f http://biopython.org/DIST/ biopython

For both operational systems you can test if you already have Biopython installed using the Python iterative terminal:

$ python
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type “help”, “copyright”, “credits” or “license” for more information.
>>> import Bio
>>> Bio.__version__
‘1.57’
>>>

Automatic rename example through scripting

Below the Python source-code for a simple use of using Biopython to rename a Genbank file to it’s description after removing commas and spaces.

Using the the previous example of GenBank file, suppose you have a file called sequence.gb. To rename this file to the GenBank description metadata inside it you can use the script.

python gbkrename.py sequence.gb

And after this it will be called Hippopotamus_amphibius_mitochondrial_DNA_complete_genome.gbk.

Improvements

There is plenty of room for improvement as:

  • Better command line parsing with optparse and parameterization of all possible configuration.
  • A graphical interface
  • Handle special cases such multiple sequences in a single GenBank file.

Python, flatten a list

Surprisingly python doesn’t have a shortcut for flatten a list (more generally a list of lists of lists of…).

I made a simple implementation that doesn’t use recursion and tries to be written clearly.

I get a element from a “notflat” list (a list that can have another lists on it). If a element is not a list we store in our flat list. If the element is still a list we deal with him later. The flat list always have only elements that are not a list.
To preserve the original order we reverse the elements at the end.

OpenPixels: simple sprite sheet with Processing

/**
 * Openpixels example in Processing.
 * This simple example of how to get a sprite 
 * from a sprite sheet.
 */

PImage bg;
PImage sprite_sheet;
PImage player;

void setup() { 
  // load images
  bg = loadImage("kitchen.png");
  sprite_sheet = loadImage("guy.png");
  
  /* The sprite size is 32x49.
     Look guy.png, the "stand position" is at (36,102). */
  
  player = createImage(32, 49, ARGB);
  player.copy(sprite_sheet, 36, 102, 32, 49, 0, 0, 32, 49);

  // set screen size and background
  size(bg.width, bg.height);  
  background(bg);
  
  frameRate(30);
}

void draw() { 
  background(bg);
  image(player, 100, 50);
}

See more at OpenPixels.