BioInfo Club - useful tips
Miscellaneous tips
** To type a literal Tab or Enter character in the shell:
press Ctrl+V first and then the special key. A literal Enter is
displayed as "^M".
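Ctrl+V only works when typing at the prompt. In a script, printf '\t' produces the same literal tab. This is a minimal sketch that assumes a POSIX shell, and the gene descriptions are made-up sample data. The tab delimiter matters here because the descriptions contain spaces. Without -t, sort would treat those spaces as column separators, so column 2 of "heat shock protein" would be "shock".

```shell
#!/bin/sh
# printf '\t' outputs one literal tab, the same character Ctrl+V Tab types
tab=$(printf '\t')

# Made-up tab-separated input: description <TAB> count
sorted=$(printf 'heat shock protein\t12\nkinase\t3\nribosomal protein L3\t7\n' |
    sort -t "$tab" -k 2,2n)   # numeric sort on the tab-delimited column 2

printf '%s\n' "$sorted"
```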
Useful commands
HEAD / TAIL
$head ctd.txt
shows the first 10 lines
$head -n 2 *.pdb
shows the first 2 lines of each file
$history | tail -n 15
shows the 15 most recent items in your command history
$tail -n +2 *.txt
shows from the second line to the end
$head -n -1 *.txt
shows all lines except the last one (a negative -n works with GNU head only)
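Here is a quick check of the head/tail commands above on a made-up 5-line input. This sketch assumes GNU coreutils, because head -n -1 is a GNU extension. `sample` is a hypothetical helper that stands in for a file.

```shell
#!/bin/sh
# Hypothetical stand-in for a file: a header line followed by 4 data rows
sample() { printf 'header\nrow1\nrow2\nrow3\nrow4\n'; }

first_two=$(sample | head -n 2)   # header, row1
no_header=$(sample | tail -n +2)  # row1 to row4: starts at line 2
no_last=$(sample | head -n -1)    # header to row3: drops the last line (GNU only)

printf '%s\n---\n' "$first_two" "$no_header" "$no_last"
```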
-------
GREP
prints the lines that contain the search pattern
$grep ">" *.fasta
shows the header lines of the FASTA files
$grep "\-122" ctd.txt
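searches for the negative number -122; the backslash stops grep from reading
-122 as an option (grep -e "-122" ctd.txt does the same without the escape)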
-c
shows only the number of matching lines
-v
shows only the lines that do not match the pattern. Inverted search.
-i
ignore case
-E
use extended regular expressions, in which +, ?, | and () work without
backslashes. Put the pattern in quotes, use [] for a character set or
range such as [A-Z], [[:space:]] instead of \s and [[:digit:]] instead of \d.
-n
shows the line number of each match
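The grep flags above are shown here on a tiny made-up FASTA sample. This sketch assumes GNU grep, and `sample` is a hypothetical stand-in for a .fasta file. It also uses grep -e, which marks the next argument as the pattern so that -122 is not read as an option.

```shell
#!/bin/sh
# Hypothetical stand-in for a .fasta file: three records
sample() {
    printf '>geneA kinase\nATGCCA\n>geneB Kinase\nATGTTT\n>geneC -122 shift\nATGAAA\n'
}

n_headers=$(sample | grep -c ">")               # number of header lines: 3
n_seq=$(sample | grep -v -c ">")                # number of non-header lines: 3
kinases=$(sample | grep -i "kinase")            # matches both "kinase" and "Kinase"
negative=$(sample | grep -e "-122")             # -e: -122 is the pattern, not an option
digits=$(sample | grep -E -c "[[:digit:]]{3}")  # lines with 3 digits in a row: 1
numbered=$(sample | grep -n -E "^>gene[AB]")    # "line:text" for the geneA/geneB headers

printf '%s\n' "$n_headers" "$n_seq" "$kinases" "$negative" "$digits" "$numbered"
```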
-------
AGREP
searches for approximate (nearly exact) matches
-d "\>"
uses > as the delimiter between records instead of end-of-line, so each
FASTA record is searched as one unit
-B -y
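-B returns only the best matches (those with the fewest errors); -y outputs
them without the confirmation prompt that -B otherwise shows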
$agrep -B -y -d "\>" CYG FPexcerpt.fta
-2
returns matches with up to this many errors (here 2) between query and
record, counting substitutions, insertions and deletions. The maximum
allowed is 8.
-l
only lists filenames that contain a match
-i
case-insensitive search
-------
CUT
$cut -f 1,3 *.txt
returns fields 1 and 3 (fields are tab-delimited by default)
$cut -f 1-3 *.txt
returns fields 1 to 3 (tab-delimited)
$cut -c 16-20,30 *.txt
returns characters 16 to 20 and character 30 of each line
$grep ">" *.fta | cut -c 2-11
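prints characters 2 to 11 of each header line (here the gene names),
dropping the leading >. With several files, grep prefixes each line with
the file name; add -h to turn that off.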
$head *.txt | cut -f 5,7 -d ","
returns fields 5 and 7 of comma-delimited lines; the output is also
comma-delimited.
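Here is a sketch of the cut examples above on made-up data. It assumes GNU or BSD cut. The helper functions tsv, csv and fasta are hypothetical stand-ins for files.

```shell
#!/bin/sh
# Hypothetical tab-separated input: id, length, organism
tsv() { printf 'id1\t120\tE. coli\nid2\t95\tB. subtilis\n'; }
# Hypothetical comma-separated input with 7 columns
csv() { printf 'a,b,c,d,e,f,g\n1,2,3,4,5,6,7\n'; }
# Hypothetical FASTA whose headers carry a 10-character gene name after ">"
fasta() { printf '>GENE000001 description\nACGT\n>GENE000002 other\nTTGA\n'; }

cols_1_3=$(tsv | cut -f 1,3)                   # tab is the default delimiter
cols_5_7=$(csv | cut -f 5,7 -d ",")            # the output keeps the comma
names=$(fasta | grep ">" | cut -c 2-11)        # drop ">" and keep 10 characters
chars=$(printf 'ABCDEFGHIJ\n' | cut -c 2-4,8)  # characters 2 to 4 and 8: BCDH

printf '%s\n' "$cols_1_3" "$cols_5_7" "$names" "$chars"
```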
-------
SORT
$grep ">" *.fasta | sort
sorts the FASTA header lines alphabetically
-n
sorts by numerical value rather than alphabetically
-f
ignores case (compares lines as if they were uppercase; the output keeps
the original case)
-r
sorts in reverse order
-k 3
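sorts lines on column 3, with columns delimited by spaces or tabs.
Strictly, -k 3 uses everything from column 3 to the end of the line; use
-k 3,3 to sort on column 3 alone.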
$head *.txt | sort -k 3
-t ","
uses commas as the column delimiter (combine with -k)
-u
keeps only one copy of each set of identical lines (like sort | uniq)
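Here is a sketch of the sort options on made-up data, assuming GNU sort. Alphabetical and numeric order differ for numbers of different lengths, which is why -n matters for columns of counts or lengths.

```shell
#!/bin/sh
# Hypothetical stand-in for a file: name, strand, length (space-separated)
table() { printf 'geneA + 100\ngeneB - 9\ngeneC + 25\n'; }

by_len=$(table | sort -k 3,3n)        # numeric on column 3: 9, 25, 100
by_len_text=$(table | sort -k 3,3)    # alphabetical on column 3: 100, 25, 9
by_len_desc=$(table | sort -k 3,3nr)  # numeric, largest first
# Comma-separated input: -t "," sets the delimiter, -k 2,2n sorts on column 2
by_csv=$(printf 'b,2\na,10\nc,1\n' | sort -t "," -k 2,2n)
# -f ignores case, so -u keeps only one copy of Kinase/kinase
unique=$(printf 'Kinase\nkinase\nATPase\n' | sort -f -u)

printf '%s\n---\n' "$by_len" "$by_len_text" "$by_len_desc" "$by_csv" "$unique"
```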
-------
UNIQ
collapses each run of identical adjacent lines into a single line.
Duplicates that are not next to each other are kept, so sort first.
-c
counts how many times each line occurs in a run of adjacent copies and
prints that count in front of the line
$cut -c 12-21 ctd.txt | uniq -c
-f 4
ignores the first 4 fields (columns separated by any number of spaces or
tabs) when determining uniqueness
-i
ignore case when determining uniqueness
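Here is a sketch of uniq on a made-up list of residue names, assuming GNU uniq. It shows why the input is usually sorted before uniq -c.

```shell
#!/bin/sh
# Hypothetical stand-in for a file: residue names, one per line
residues() { printf 'ALA\nALA\nGLY\nALA\ngly\n'; }

runs=$(residues | uniq)                       # ALA GLY ALA gly: only adjacent copies merge
counts=$(residues | sort | uniq -c)           # sorting first makes all copies adjacent
counts_ci=$(residues | sort -f | uniq -i -c)  # -i also merges GLY and gly
# -f 1 skips the first field, so both lines compare as " ALA" and only the first is kept
skip=$(printf 'chainA ALA\nchainB ALA\n' | uniq -f 1)

printf '%s\n---\n' "$runs" "$counts" "$counts_ci" "$skip"
```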