Sunday, December 25, 2011

Some Linux commands useful for manipulating text files

I have recently learned some Linux utilities for manipulating text files. They are extremely handy, far more so than I expected. I should have learned them before I wrote non-reusable, verbose and error-prone Java code to process text a week ago.

1. cat

This is the simplest tool in this category. With no arguments, the cat command copies its input to its output unchanged (an identity filter). When supplied a list of file names, it concatenates them to stdout.
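A minimal sketch of both uses. The scratch directory and the file names a.txt and b.txt are made up for illustration.

```shell
# Hypothetical scratch files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/a.txt"
printf 'gamma\n' > "$tmpdir/b.txt"

# With file arguments, cat writes their contents to stdout in order.
cat_files=$(cat "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$cat_files"

# With no arguments, cat copies stdin to stdout unchanged.
cat_stdin=$(printf 'hello\n' | cat)
echo "$cat_stdin"

rm -r "$tmpdir"
```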

2. head

Displays the first few lines of a specified file (10 by default). It is particularly useful when the file is big and you only want to see the header: you don't have to open the whole file in an editor, which can make the system slow.
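As a small illustration, assuming a made-up 20-line file called lines.txt:

```shell
# Hypothetical 20-line sample file ("line 1" ... "line 20").
tmpdir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$tmpdir/lines.txt"

# Without options, head prints the first 10 lines;
# here we keep only the last of them to show where it stops.
head_default_last=$(head "$tmpdir/lines.txt" | tail -n 1)
echo "$head_default_last"

# -n chooses how many lines to print.
head_three=$(head -n 3 "$tmpdir/lines.txt")
echo "$head_three"

rm -r "$tmpdir"
```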

3. tail

Displays the last part of a file (the last 10 lines by default). It is the counterpart of head.
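A short sketch, again using a made-up 20-line file:

```shell
# Hypothetical 20-line sample file ("line 1" ... "line 20").
tmpdir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$tmpdir/lines.txt"

# Print the last 2 lines.
tail_two=$(tail -n 2 "$tmpdir/lines.txt")
echo "$tail_two"

# With a leading "+", -n counts from the top: start printing at line 19.
tail_from_19=$(tail -n +19 "$tmpdir/lines.txt")
echo "$tail_from_19"

rm -r "$tmpdir"
```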

4. cut 

The cut command prints selected parts of each input line. It can select fields (tab-separated by default, or split on a delimiter character you specify), or it can select a range of character positions.
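The three modes might look like this. The sample records (names, ages, cities) are invented for the example.

```shell
# Hypothetical tab-separated sample data.
tmpdir=$(mktemp -d)
printf 'alice\t30\tparis\nbob\t25\trome\n' > "$tmpdir/people.tsv"

# -f selects fields; the default delimiter is a tab.
cut_fields=$(cut -f 1,3 "$tmpdir/people.tsv")
echo "$cut_fields"

# -c selects a range of character positions.
cut_chars=$(cut -c 1-3 "$tmpdir/people.tsv")
echo "$cut_chars"

# -d sets a different delimiter, here a comma.
cut_csv=$(printf 'x,y,z\n' | cut -d ',' -f 2)
echo "$cut_csv"

rm -r "$tmpdir"
```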

5. sort 

Sorts the lines of a file in either lexicographic or numeric order. It can also use a key field for comparison to sort the lines of delimiter-separated files.
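A sketch of the difference between the two orders and of key-based sorting. The sample values are made up, and LC_ALL=C is set here only so the lexicographic order does not depend on the locale.

```shell
# Lexicographic order compares character by character, so "100" sorts before "9".
sort_lex=$(printf '10\n9\n100\n' | LC_ALL=C sort)
echo "$sort_lex"

# -n compares the lines as numbers instead.
sort_num=$(printf '10\n9\n100\n' | sort -n)
echo "$sort_num"

# -t sets the field delimiter; -k 2,2n sorts numerically on the second field only.
sort_key=$(printf 'bob,25\nalice,30\ncarol,7\n' | sort -t ',' -k 2,2n)
echo "$sort_key"
```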

6. uniq 

Removes or reports adjacent duplicate lines. Because only adjacent lines are compared, the input is usually sorted first. Useful for finding duplicate and non-duplicate lines.
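The word "adjacent" matters, as this small sketch with made-up input tries to show:

```shell
# On unsorted input only neighbouring duplicates collapse, so the final "a" survives.
uniq_unsorted=$(printf 'a\na\nb\na\n' | uniq)
echo "$uniq_unsorted"

# Sorting first brings all duplicates together.
uniq_sorted=$(printf 'a\na\nb\na\n' | sort | uniq)
echo "$uniq_sorted"

# -d reports only lines that are repeated; -u only lines that are not.
uniq_dups=$(printf 'a\na\nb\na\n' | sort | uniq -d)
uniq_singles=$(printf 'a\na\nb\na\n' | sort | uniq -u)
echo "$uniq_dups"
echo "$uniq_singles"
```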

7. wc

The word count utility, wc, counts the number of lines, words or characters. The line-counting feature is my favorite because I do not have to open the file in an editor to find out how long it is.
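A minimal sketch on a made-up two-line file. Reading the file through a redirect keeps the file name out of the output, and the tr call only strips the padding that some wc implementations add.

```shell
# Hypothetical sample file: 2 lines, 3 words, 14 bytes.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/sample.txt"

wc_lines=$(wc -l < "$tmpdir/sample.txt" | tr -d ' ')   # number of lines
wc_words=$(wc -w < "$tmpdir/sample.txt" | tr -d ' ')   # number of words
wc_bytes=$(wc -c < "$tmpdir/sample.txt" | tr -d ' ')   # number of bytes
echo "$wc_lines $wc_words $wc_bytes"

rm -r "$tmpdir"
```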

8. tr

Copies standard input to standard output, substituting or deleting selected characters. It can be used to translate or filter out a range of characters, for example to convert lower case to upper case.
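A few typical conversions, sketched on made-up input strings:

```shell
# Substitute: map every lower-case letter to its upper-case counterpart.
tr_upper=$(printf 'Hello World\n' | tr '[:lower:]' '[:upper:]')
echo "$tr_upper"

# Delete: -d removes every character in the given set (here, digits).
tr_nodigits=$(printf 'a1b2c3\n' | tr -d '0-9')
echo "$tr_nodigits"

# Squeeze: -s collapses runs of a repeated character into one.
tr_squeezed=$(printf 'a   b\n' | tr -s ' ')
echo "$tr_squeezed"
```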

9. grep 

Searches for lines that match the given regular expression and prints them. Often used to pick out the results we are interested in.
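For instance, on a made-up log file (app.log and its contents are invented for this sketch):

```shell
# Hypothetical log file.
tmpdir=$(mktemp -d)
printf 'error: disk full\ninfo: started\nERROR: timeout\n' > "$tmpdir/app.log"

# Print the lines matching the pattern (case-sensitive by default).
grep_exact=$(grep 'error' "$tmpdir/app.log")
echo "$grep_exact"

# -i ignores case; -c prints the number of matching lines instead of the lines.
grep_count=$(grep -i -c 'error' "$tmpdir/app.log")
echo "$grep_count"

# -v inverts the match: print the lines that do NOT match.
grep_inverse=$(grep -i -v 'error' "$tmpdir/app.log")
echo "$grep_inverse"

rm -r "$tmpdir"
```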

10. sed

A more powerful tool than those above. Like grep, it looks for patterns one line at a time, but it can also change the lines. It is a non-interactive text editor: the editing commands come in as a script. There is an interactive editor, ed, which accepts largely the same commands. As a Unix filter, sed can do the job of several of the previously mentioned tools, and its substitution syntax is also used in Vim.
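A few one-line scripts give a feel for it. The input text is made up for the example.

```shell
# s/old/new/ replaces the first match on each line.
sed_first=$(printf 'foo bar foo\nbaz\n' | sed 's/foo/qux/')
echo "$sed_first"

# The g flag replaces every match on the line.
sed_all=$(printf 'foo bar foo\nbaz\n' | sed 's/foo/qux/g')
echo "$sed_all"

# /pattern/d deletes the matching lines (roughly what grep -v does).
sed_delete=$(printf 'foo bar foo\nbaz\n' | sed '/baz/d')
echo "$sed_delete"

# -n suppresses automatic printing; 2p prints only line 2.
sed_line2=$(printf 'foo bar foo\nbaz\n' | sed -n '2p')
echo "$sed_line2"
```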

The above are the most commonly used Linux text manipulation utilities. Although they are already very convenient individually, combining them with pipes makes them even more powerful and flexible. If you are a Linux admin or an academic researcher, these are commands you should know.
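As one possible sketch of such a pipeline, here is a way to find the most frequent value in a column. The file requests.txt and its contents are invented for illustration.

```shell
# Hypothetical request log: method and path separated by a space.
tmpdir=$(mktemp -d)
printf 'GET /a\nGET /b\nPOST /a\nGET /a\n' > "$tmpdir/requests.txt"

# cut     : keep only the path column
# sort    : bring identical paths next to each other
# uniq -c : collapse them and prefix each with its count
# sort -rn: highest count first
# head    : keep the top entry
# sed     : strip the leading padding that uniq -c adds
top_path=$(cut -d ' ' -f 2 "$tmpdir/requests.txt" \
  | sort | uniq -c | sort -rn | head -n 1 | sed 's/^ *//')
echo "$top_path"

rm -r "$tmpdir"
```

Each stage does one small job, and any stage can be swapped out without touching the others.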
 
