Silveira Neto – Page 83

Merging k lists of size n

Published by Silveira on 2013-01-13

Merging k lists of size n, using two different approaches.
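This excerpt does not say which two approaches the post uses. As a rough sketch only, assuming the lists are already sorted, two common candidates are pairwise (divide and conquer) merging and a k-way merge driven by a min-heap. The function names below are made up for illustration.

```python
import heapq


def merge_two(a, b):
    """Merge two sorted lists into a new sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_pairwise(lists):
    """Approach 1 (assumed): merge the lists in pairs until one remains."""
    lists = [list(l) for l in lists]
    if not lists:
        return []
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            if i + 1 < len(lists):
                merged.append(merge_two(lists[i], lists[i + 1]))
            else:
                merged.append(lists[i])  # odd list out, carried to next round
        lists = merged
    return lists[0]


def merge_heap(lists):
    """Approach 2 (assumed): k-way merge using a min-heap.

    Each heap entry is (value, list index, element index).
    """
    heap = [(l[0], k, 0) for k, l in enumerate(lists) if l]
    heapq.heapify(heap)
    out = []
    while heap:
        value, k, i = heapq.heappop(heap)
        out.append(value)
        if i + 1 < len(lists[k]):
            heapq.heappush(heap, (lists[k][i + 1], k, i + 1))
    return out


data = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(merge_pairwise(data))
print(merge_heap(data))
```

With k lists of n elements each, both sketches do on the order of n·k·log k comparisons: the pairwise version runs about log k rounds over all n·k elements, and the heap version pays about log k per element.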

Plants for the apartment

Published by Silveira on 2012-10-20

I was looking for some plants to bring a touch of green and of life to the apartment.

Since this is a new subject for me, I did a bit of research and came across the TED talk “How to grow your own fresh air” by researcher Kamal Meattle. I decided to look for the plants he recommended:

Chrysalidocarpus lutescens or Dypsis lutescens, known in Brazil as areca-bambu or palmeira de jardim.
Sansevieria trifasciata, known in Brazil as espada-de-são-jorge.
Epipremnum aureum, known in Brazil as jibóia.

Not only do they purify the air, they are also easy to care for. I was especially interested in that second trait, given my constraints:

On many days I have little or no time to look after plants.
There is only one source of direct sunlight in the apartment, a window that is also where the air conditioner and the heater sit. There is no balcony or windowsill, and the plants cannot be placed outside the window either.
Four very distinct seasons, from a scorching summer with the sun setting after nine to a glacial cold with the sun setting at 4:40. For part of the year there is very little sun and the heater is on; for the rest of the year there is plenty of light but the air conditioner is on.
Low humidity.

I ended up choosing an Epipremnum aureum, known in the USA as “Money Plant” or “Pothos”, and a Chlorophytum comosum, known in Brazil as clorofito and in the USA as “Spider Plant”. My spider plant is of the Variegatum variety, which has dark green leaves. Each one cost about US$ 12 (roughly R$ 24 at the time of writing).

They have been very simple to care for and have survived very well in the conditions I described above. Recently I had to travel for two weeks and had to leave them without water. Before that I also had to keep them in a spot with little sun, because I was looking after a cat for a few days and both plants are toxic to cats. The pothos got a little weak, with some yellowish leaves, but after a week back under normal care it returned to normal. The spider plant, on the other hand, did great: it does not even look like it went without care, and it even grew a little. It is a really tough plant. I have even been seeing the variety

The result of caring for these plants was felt the moment they entered the apartment. The green they brought completely changed the place. I no longer know how I lived here without plants before. These two are in pots on a flat surface near the window, but they could also be in hanging pots. The next steps are to try other methods that allow growing something edible, such as tomatoes, and to try building a Window Farm :D.

So there is the tip: if you are looking for an easy, good-looking plant to grow inside an apartment with conditions like mine, or probably better, these two are good choices.

Latex test

Published by Silveira on 2012-08-30

This:

i\hbar\frac{\partial}{\partial t}\left|\Psi(t)\right>=H\left|\Psi(t)\right>

Produces this:
$latex i\hbar\frac{\partial}{\partial t}\left|\Psi(t)\right>=H\left|\Psi(t)\right>$

If you are seeing a complicated math formula in an image, then it worked.

Review: Superman: Red Son

Published by Silveira on 2012-08-20

Cover of the comic book Superman Red Son

One fine day I remembered that I love comics and that I happen to be living in the United States, where this kind of work happens to be easy to get. Feeling like a fool locked in a dark room on a sunny day, I decided to give the comics from here a chance. More specifically, superhero comics.

Although I prefer European and Japanese comics, I have read some American comics, from Disney to Will Eisner, but I always ignored the superhero ones. I always found it hard to take seriously someone with disproportionate superpowers or with underwear over their pants. It was the film adaptations (above all Zack Snyder's Watchmen and Christopher Nolan's Batman films) that sparked my curiosity to read superhero comics again.

That said, let me make it clear that I am a novice adventurer in the universe of superhero comics. I do not find it believable that people sing and tap dance in the middle of the street, yet I grant poetic license when watching musicals, and I will use the same device to accept the superpowers and uniforms of superheroes. Armed with that, I went looking for where to start, and every finger pointed to a pile of titles that includes Superman: Red Son by Mark Millar.

The whole rich plot starts from the premise “What if Superman had grown up in the Soviet Union?”.

It is not an absurd premise at all if you remember the circumstances in which Superman arrived on Earth. You probably know that Superman, then an alien baby named Kal-El from the planet Krypton, arrived here in a space capsule. Where did that capsule land? In the United States, more precisely in Kansas, in the town of Smallville. Even a small difference in the angle of atmospheric entry or in the speed of the ship would have made all the difference. In Superman: Red Son, the capsule landed in the Soviet Union.

(Have you ever stopped to think that Superman, the hero of the United States of America, is an illegal immigrant? It has nothing to do with Red Son, but it is something I noticed while reading.)

I will refrain from giving any detail beyond this premise, so as not to go further than what is already explicit on the cover or in the first pages. I can guarantee that the consequences of the premise above go far beyond Superman's uniform. I can also reveal that this is the most mind-bending and intelligent superhero comic I have ever laid hands on. The script is simply engaging and full of brilliant touches. There are many references to the superhero universe, to political events, and self-references that you need a second reading to notice (and even so without spoiling the first reading). And it has an ending… an ending of which I can only say that it is capable of blowing your head off and spreading your brains all over the room.

mind blowing gif

The paperback edition has 160 pages (it could have had many more) and costs about 12 dollars. There is also a hardcover collector's edition that I will very probably be forced to buy. There is also a “motion comic” version (a kind of cartoon with birth defects), but I recommend going straight to the comic. This only shows the plasticity of the story and how it can be adapted to other media. The story is so rich that it could be broken into several other works. Kick-Ass, another Mark Millar comic, has already been adapted for film.

Superman: Red Son showed me that superhero comics can be much more than I used to think. It is a work with a beginning, a middle and an end, self-contained, and it stands on its own even if you, like me, do not know much about the superhero universe. The line work and the colors are beautiful too. I strongly recommend that you get your hands on a copy and let your mind be blown as well.

GWU Computer Science Graduate Classes Graph

Published by Silveira on 2012-06-03

To help me decide which classes to take each semester, I scraped the graduate and undergraduate bulletins. For every class I could get the class name, prerequisites, credits, teacher, program, description, etc., in a formatted tabular document.

Using Python's csv library I could read the tables and convert the data to other formats. One format that is very useful for handling graph structures is the DOT language (part of the Graphviz project), in which you can describe both the graph structure and elements of the graph layout.

The Python source code that converts the tables to graphs is on GitHub.
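The conversion from a table to DOT could look roughly like the sketch below. It is not the script from the repository: the column names `code` and `prerequisites`, the `;` separator between prerequisites, and the sample course codes are all assumptions made for illustration.

```python
import csv
import io


def csv_to_dot(csv_text):
    """Return a DOT digraph with one edge per (prerequisite -> class) pair.

    Assumes hypothetical columns 'code' and 'prerequisites', with
    prerequisites separated by ';'.
    """
    lines = ["digraph classes {"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        course = row["code"].strip()
        lines.append('    "%s";' % course)  # declare the node itself
        for prereq in row["prerequisites"].split(";"):
            prereq = prereq.strip()
            if prereq:
                lines.append('    "%s" -> "%s";' % (prereq, course))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample data, not taken from the bulletin.
sample = "code,prerequisites\nCS 101,\nCS 201,CS 101\nCS 301,CS 101;CS 201\n"
dot = csv_to_dot(sample)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command.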

The final result (click to view in full size):

Limitations and comments:

Prerequisites are only displayed using AND logic. Other relations, such as OR (equivalent classes), are not shown.
Errors may exist due to the scraping process, the conversions, or errors in the original source.
The sources also include a function to convert the graph to Dracula (a JavaScript interactive graph representation), but the current result is too tangled.

Bioperl Install and examples

Published by Silveira on 2012-03-09

Perl is a widely used language in bioinformatics. Since I have already tried Python and Biopython for a few simple bioinformatics tasks, I will now try Perl and BioPerl.

Install on Ubuntu 11.10 (oneiric)

Perl already comes with Ubuntu. Bioperl can be installed (without CPAN):

$ sudo apt-get install bioperl

After the installation you have several tools in your PATH:

bp_aacomp, bp_biblio, bp_biofetch_genbank_proxy, bp_bioflat_index, bp_biogetseq, bp_blast2tree, bp_bulk_load_gff, bp_chaos_plot, bp_classify_hits_kingdom, bp_composite_LD, bp_das_server, bp_dbsplit, bp_download_query_genbank, bp_einfo, bp_extract_feature_seq, bp_fast_load_gff, bp_fastam9_to_table, bp_fetch, bp_filter_search, bp_flanks, bp_gccalc, bp_genbank2gff, bp_genbank2gff3, bp_generate_histogram, bp_heterogeneity_test, bp_hivq, bp_hmmer_to_table, bp_index, bp_load_gff, bp_local_taxonomydb_query, bp_make_mrna_protein, bp_mask_by_search, bp_meta_gff, bp_mrtrans, bp_mutate, bp_netinstall, bp_nexus2nh, bp_nrdb, bp_oligo_count, bp_pairwise_kaks, bp_parse_hmmsearch, bp_process_gadfly, bp_process_sgd, bp_process_wormbase, bp_query_entrez_taxa, bp_remote_blast, bp_revtrans-motif, bp_search2BSML, bp_search2alnblocks, bp_search2gff, bp_search2table, bp_search2tribe, bp_seq_length, bp_seqconvert, bp_seqfeature_delete, bp_seqfeature_gff3, bp_seqfeature_load, bp_seqret, bp_seqretsplit, bp_split_seq, bp_sreformat, bp_taxid4species, bp_taxonomy2tree, bp_translate_seq, bp_tree2pag, bp_unflatten_seq

You can try to import a Bioperl module to check if everything is working properly.

#!/usr/bin/perl -w
 
use Bio::Seq;

Writing a nucleotide sequence to a FASTA file

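#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

# Build an in-memory DNA sequence with an ID and a description.
my $seq_obj = Bio::Seq->new(-seq        => "gattaca",
                            -display_id => "#10191997",
                            -desc       => "Example",
                            -alphabet   => "dna");

# The leading '>' in the file name opens sequence.fasta for writing.
my $seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta', -format => 'fasta');

$seqio_obj->write_seq($seq_obj);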

The resulting sequence.fasta file will contain:

>#10191997 Example
gattaca

Reading a Genbank file
Opening the same example I used last time (Hippopotamus amphibius mitochondrion, complete genome).

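#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

# Open the GenBank file for reading.
my $seqio_obj = Bio::SeqIO->new(-file => "sequence.gb", -format => "genbank");

# Print the raw sequence of every record in the file.
while (my $seq_obj = $seqio_obj->next_seq) {
    print $seq_obj->seq, "\n";
}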

Online Querying Genbank

With BioPerl it is possible to programmatically query and retrieve data directly from GenBank, for example to retrieve the same hippopotamus mitochondrial genome used in the example above.

#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Entrez query: restrict by organism and by locus name.
my $query = "Hippopotamus amphibius[ORGN] AND NC_000889[LOCUS]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query);

my $gb_obj = Bio::DB::GenBank->new;

# Stream the matching records and print each ID and sequence length.
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

while (my $seq_obj = $stream_obj->next_seq) {
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}

Substitutions in a phylogenetic tree file

Published by Silveira on 2012-03-08

The newick tree

The Newick tree format is a way of representing graph-theoretical trees with edge lengths using parentheses and commas.

A newick tree example:

(((Espresso:2,(Milk Foam:2,Espresso Macchiato:5,((Steamed Milk:2,Cappucino:2,(Whipped Cream:1,Chocolate Syrup:1,Cafe Mocha:3):5):5,Flat White:2):5):5):1,Coffee arabica:0.1,(Columbian:1.5,((Medium Roast:1,Viennese Roast:3,American Roast:5,Instant Coffee:9):2,Heavy Roast:0.1,French Roast:0.2,European Roast:1):5,Brazilian:0.1):1):1,Americano:10,Water:1);

A graphical representation for the newick tree above (using the http://www.jsphylosvg.com/ library):

The Newick format is commonly used to store phylogenetic trees.

The problem

A phylogenetic tree can be highly branched and dense, and even with proper visualization software it can be difficult to analyse. Additionally, as a tree is produced by a chain of different software using data from the laboratory, the label of each leaf/node can be something that is not meaningful to a human reader.

For this particular problem, an example of a node label could be SXS_3014_Albula_vulpes_id_30.

There was a spreadsheet with more meaningful information, in which a node label could be used as a primary key. Example for the node above:

Taxon Order	Family	Genus	Species	ID
Albuliformes	Albulidae	Albula	vulpes	SXS_3014_Albula_vulpes_id_30

The problem consists of using the tree and the spreadsheet to produce a new tree with the same structure, where each node has a more meaningful label.

The approach

The new tree can be built by substituting each label of the initial tree with the respective information from the spreadsheet. A script can be used to automate this process.

The solution

After converting the spreadsheet to a CSV file, which can be handled more easily with Python's csv library, the problem is reduced to file handling and string substitution. Fortunately, due to the simplicity of the Newick format and its limited vocabulary, a tree parser is not necessary.
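A minimal sketch of that substitution, under stated assumptions: the CSV uses the column names from the example row above, and the new label is built as Family_Genus_Species, which is an illustrative choice and not necessarily what the original script does. The function name `relabel` and the second leaf `Other_leaf` are made up.

```python
import csv
import io


def relabel(newick, csv_text, key="ID", fields=("Family", "Genus", "Species")):
    """Replace every label found in the CSV with a more readable one.

    Plain string replacement is used instead of a tree parser. Longer
    labels are replaced first, so a label that is a prefix of another
    one cannot clobber it.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: len(row[key]), reverse=True)
    for row in rows:
        new_label = "_".join(row[f] for f in fields)
        newick = newick.replace(row[key], new_label)
    return newick


table = (
    "Taxon Order,Family,Genus,Species,ID\n"
    "Albuliformes,Albulidae,Albula,vulpes,SXS_3014_Albula_vulpes_id_30\n"
)
tree = "(SXS_3014_Albula_vulpes_id_30:1.0,Other_leaf:2.0);"
print(relabel(tree, table))
```

Labels that do not appear in the spreadsheet are left untouched, so the tree structure and edge lengths stay exactly as they were.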

Source-code at Github.

Difficulties found

The spreadsheet was originally a Microsoft Office Excel 2007 (.xlsx) file. The CSV conversion provided by Excel was not good and offered no configuration options. In the end, the conversion provided by the LibreOffice productivity suite was more configurable and produced a file that was easier to read with the csv library.

In the script, the DictReader class proved much more reliable in the long term and tolerant of changes in the spreadsheet, as long as the column names remain the same.

P.S. Due to the nature of the original sources for the tree and spreadsheet, I don’t have authorization to publish their complete, original content. The artificial data displayed here is merely illustrative.

GenBank renaming

Published by Silveira on 2012-02-20

DNA inspired sculpture by Charles Jencks. Creative Commons photo by Maria Keays.

What is GenBank?

The GenBank sequence database is a widely used collection of nucleotide sequences and their protein translations. A GenBank sequence record file typically has a .gbk or .gb extension and contains plain text. An example of a GenBank file can be found here.

Filename problem

Although several pieces of metadata are available inside a GenBank record, the name of the file does not always match its content. This is a potential source of confusion when organizing files, and renaming the files according to their content takes additional effort.

Approach using Biopython

The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Among other tools, Biopython includes modules for reading and writing different sequence file formats, including GenBank record files.

Although it is possible to write a parser for GenBank files, developing and maintaining such a tool would be a redundant effort. Parsing can be delegated to Biopython, leaving the programming to focus on the renaming mechanism.

Biopython installation on Linux (Ubuntu 11.10) or Apple OS X (Lion)

For both Ubuntu 11.10 and OS X Lion, a modern version of Python already comes out of the box.

For Linux you just need to install the Biopython package. One way to install Biopython on an APT-based distribution such as Ubuntu 11.10 (Oneiric Ocelot) is:

# apt-get install python-biopython

On Apple OS X (Lion) you can install Biopython using easy_install, a popular package manager for Python. easy_install is bundled with Setuptools, a set of tools for Python.

To install Setuptools, download the .egg file for your Python version (probably setuptools-0.6c11-py2.7.egg) and execute it as a shell script:

sudo sh setuptools-0.6c11-py2.7.egg

After this you already have easy_install in place and you can use it to install the Biopython library:

sudo easy_install -f http://biopython.org/DIST/ biopython

On both operating systems you can check whether Biopython is installed using the Python interactive interpreter:

$ python
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> Bio.__version__
'1.57'
>>>

Automatic rename example through scripting

The Python script gbkrename.py is a simple use of Biopython that renames a GenBank file after its description, removing commas and replacing spaces with underscores.

Using the previous example GenBank file, suppose you have a file called sequence.gb. To rename this file after the GenBank description metadata inside it, run the script:

python gbkrename.py sequence.gb

After this, the file will be called Hippopotamus_amphibius_mitochondrial_DNA_complete_genome.gbk.
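The script itself is not reproduced on this page. A minimal sketch of how such a rename could be written with Biopython's `SeqIO.read` is shown below; the helper names `description_to_filename` and `rename_genbank` are made up for illustration, and the cleanup rules (drop commas and a trailing period, turn spaces into underscores) are inferred from the example file name above rather than taken from the original code.

```python
import os
import sys


def description_to_filename(description, extension=".gbk"):
    """Turn a GenBank description line into a file name."""
    name = description.strip().rstrip(".")  # drop a trailing period, if any
    name = name.replace(",", "")            # remove commas
    name = "_".join(name.split())           # whitespace runs -> underscore
    return name + extension


def rename_genbank(path):
    """Rename one GenBank file after the description of its single record."""
    # Imported here so the helper above works without Biopython installed.
    from Bio import SeqIO

    record = SeqIO.read(path, "genbank")  # raises if the file has 0 or >1 records
    target = os.path.join(os.path.dirname(path),
                          description_to_filename(record.description))
    os.rename(path, target)
    return target


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(rename_genbank(path))
```

Keeping the name cleanup in its own function makes it easy to check without touching the file system, and `SeqIO.read` fails loudly on files with several records instead of silently picking one.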

Improvements

There is plenty of room for improvement, such as:

Better command line parsing with optparse and parameterization of all possible configuration.
A graphical interface.
Handling special cases such as multiple sequences in a single GenBank file.

1

Published by Silveira on 2012-01-08

Merry Christmas

Published by Silveira on 2011-12-21
