2011年12月25日星期日

Learn Vim Progressively — 1st Level : Survive

I`m now learning vim, and I found a very interesting article which talks about how to learn vim progressively in four steps. I just paste the whole article here to share with others.

Want to learn vim (the best text editor known to human kind) the fastest way possible. I suggest you a way. Start by learning the minimal to survive, then integrate slowly all tricks.
Vim the Six Billion Dollar editor
Better, Stronger, Faster.
Learn vim and it will be your last text editor. There isn’t any better text editor I know. Hard to learn, but incredible to use.
I suggest you to learn it in 4 steps:
  1. Survive
  2. Feel comfortable
  3. Feel Better, Stronger, Faster
  4. Use vim superpowers
By the end of this journey, you’ll become a vim superstar.
But before we start, just a warning. Learning vim will be painful at first. It will take time. It will be a lot like playing a music instrument. Don’t expect to be more efficient with vim than with another editor in less than 3 days. In fact it will certainly take 2 weeks instead of 3 days.

1st Level – Survive

  1. Install vim
  2. Launch vim
  3. DO NOTHING! Read.
In a standard editor, typing on the keyboard is enough to write something and see it on the screen. Not this time. Vim is in Normal mode. Let’s get in Insert mode. Type on the letter i.
You should feel a bit better. You can type letters like in a standard notepad. To get back in Normal mode just tap the ESC key.
You know how to switch between Insert and Normal mode. And now, the list of command you can use in Normal mode to survive:
  • iInsert mode. Type ESC to return to Normal mode.
  • x → Delete the char under the cursor
  • :wq → Save and Quit (:w save, :q quit)
  • dd → Delete (and copy) current line
  • p → Paste
Recommended:
  • hjkl (highly recommended but not mandatory) → basic cursor move (←↓↑→). Hint: j look like a down arrow.
  • :help <command> → Show help about <command>, you can start using :help without anything else.
Only 5 commands. This is very few to start. Once these command start to become natural (may be after a complete day), you should go on level 2.
But before, just a little remark on Normal mode. In standard editors, to copy you have to use the Ctrl key (Ctrl-c generally). In fact, when you press Ctrl, it is a bit like if all your key change meaning. With vim in Normal mode, it is a bit like if your Ctrl key is always pushed down.
A last word about notations:
  • instead of writing Ctrl-λ, I’ll write <C-λ>.
  • command staring by : will must end by <enter>. For example, when I write :q it means :q<enter>.

Some Linux commands useful for manipulating text files

I have learned some Linux utilities for manipulating text files these days. They are extremely handy which is far beyond my expectation. I should have learned them before I used Java to write those unresuable, verbose and error-prone codes to process texts a week ago.

1. cat

This is the simplest tool in this category. The cat command copies its input to output unchanged (identity filter). When supplied a list of file names, it concatenates them onto stdout.

2. head

Display the first few lines of a specified file. Particularly useful when the file is big and you wants to see only the header. You don`t have to open the whole file (which makes the system slow) using this command.

3. tail

Displays the last part of a file, similar to head.

4. cut 

The cut command prints selected parts of input lines. It can select columns (assumes tab-separated input); can select a range of character positions; can also specify delimiter characters.

5. sort 

Sort each line of the file using either lexicographic or arithmetic order. It can use a key for comparison to sort the lines in delimiter seperated files.

6. uniq 

Remove or report adjacent duplicate lines. Useful for finding duplicate and non-duplicate lines.

7. wc

The word count utility, wc, counts the number of lines, characters or words. The line counting feature is my best favorite because it does not have to open the file.

8. tr

Copies standard input to standard output with substitution or deletion of selected characters. It can be used to filter a range of characters in order to make certain conversions.

9. grep 

Search substrings that match the given regular expression, and print the line they reside in. Often used to search for the results we are interested in.

10. sed

More powerful tool compared to the above. It Looks for patterns one line at a time, like grep , but changes lines of the file. It`s a non-interactive text editor. Editing commands come in as a script. There is an interactive editor ed which accepts the same commands. It`s a A Unix filter which is the superset of previously mentioned tools, and it`s syntax is also used in VIM.

Above are the most common used Linux text manipulating utilities. Although they are already very convenient individually, combining them using pipes can make them even more powerful and flexible. If you are a linux admin or academic researcher, these are the commands you must know.
 

2011年12月12日星期一

More array problems

Today I summarize some array problems that an be solved using hash tables.
Remove duplicates in an array.
Before starting with this question, we should ask the interviewer what`s the type of elements in the array. If the answer is character, well, the only thing we need is a boolean array of size 256 (assuming the charset is ASCII). If the answer is integer or other larger types, we should use a hash table.

We solve this question using two indecies, reader and writer. We iterate each element using reader and find its existence in hash table. If it does not exist, we insert an entry in hash table, copy the value indexed by reader to that of writer, and increment both indecies. Otherwise, we increment only the reader index.

This solution costs O(n) time and O(n) space. If extra space is not allowed, we can use any in-place sorting algorithm (e.g. quick sort) to sort the array and then use similar reader and writer technique to remove duplicates. This will cost O(nlogn) time and O(1) space.
Decide if two strings are anagrams.
If number of each character in two strings are the same, then they are anagrams. We can scan the first string and increment the number of occurence of each character using a hash table. Then we scan the second string, decrementing the counter of each encountered character. In the end, if all the entries in the hash table are zero, then the two strings are anagrams.

Another extremely simple solution is to sort two strings and compare if there are equal.
Find the intersection of two arrays.
We assume that there is no duplicate in each array. We scan the first array and mark the occurence of each element. Then we scan the second array and find if each element can be found in hash table. If so, we append it to the result.
Given a number N, find pairs of number a1, a2 in an array such that a1 + a2 = N.
We will scan the array for two passes. In the first pass, we calculate N - a1, where a1 is each element. We insert it into hash table which key is N - a1 and value is a1`s index. Then in the second pass, we use N - a2, where a2 is each element, to find the target index using the hash table.

2011年12月10日星期六

More linked list problems

There are a lot of commonly asked linked list questions. They can range from simple to much more challenging. The following are some questions that can all be solved using two pointers ptr1 and ptr2 with different moving speed. They are relatively simple so I won`t post code here, but just the method:
Find the middle element of a linked list.
Let ptr1 moves 1 step and ptr2 moves 2 steps in each loop. When ptr2 reaches the end, ptr1 is pointing the middle element. 
Find the nth last element of a linked list.
Let ptr2 move n steps ahead first, and then let two pointers move together. When ptr2 reaches the end, ptr1 is pointing the nth last element.
Two linked lists of different length intersect at one node. Find this node.
We should first find the lengths of two linked lists by iterating through the whole list. Suppose the length of two lists are len1 and len2, len1 < len2, the difference d = len2 - len1. Now reset two pointers and let the pointer of longer list move d steps first. Then move two pointers simultaneously and compare the data of them. When data is equal, we find the node.
Given a circular linked list, find node at the beginning of the loop.
Let N be the node we want to find. In each loop, ptr1 moves 1 step and ptr2 moves 2 steps, until they meet at one node. The distance from this node to N is equal to the distance from head node to N. Now we reset ptr1 to head node and begin to move two pointers simultaneously. When they meet again, N is found there.

Maximum subsequence sum problem

The problem we’d like to solve can be stated as:
Given a sequence of numbers, find the maximum sum of a contiguous subsequence of those numbers.
One obvious solution is that we can simply enumerate every possible subsequence and calculate their sum. This will take O(n^2) time.

There is a linear solution called Kadane`s algorithm. Following is the code:

int max_sub_sum(int arr[], int len) {
    int max_local = 0;
    int max_global = 0;
    for(int i = 0; i < len; i++) {
        max_local += arr[i];
        if(max_local < 0)
            max_local = 0;
        if(max_local > max_global)
            max_global = max_local;
    }
    return max_global;
}

Why does it work? If max_local becomes negative, it will not contribute to the next subsequence by adding the next element. Actually the next element itself would be larger than the sum of it and max_local, so we can simply restart summing from here.

Note that this algorithm only works when there is at least one positive integer in the array.

Some interviewers may ask this question with slight modification that the input array is a circular array. In this case, we can concatenate the input array with itself, and apply Kadane`s algorithm.

2011年12月9日星期五

Ellipse fitting and parameter conversion

The third assignment of Computer Vision course involves an interesting problem. Given a large number of points in Cartesian coordinate system, how to find an ellipse that best fit these points. In this paper the illustration and solution of this problem is presented.

The theory and proofs are not trivial and I can`t understand all of them. However, it`s implementation is very simple. Below is a six-line Matlab implementation of the algorithm.


function a = fit_ellipse(x, y)
    D = [x.*x x.*y y.*y x y ones(size(x))];
    S = D' * D;
    C(6, 6) = 0; C(1, 3) = -2; C(2, 2) = 1; C(3, 1) = -2;
    [gevec, geval] = eig(S, C);
    [NegR, NegC] = find(geval < 0 & ~isinf(geval));
    a = gevec(:, NegC);
end

The input of this function is two vectors containing x and y coordinates of the points, respectively. The output is the 6 parameters of implicit equation of the ellipse.

Our TA is very smart and he used a trick to simplify the problem for us to implement. So we can avoid solving the general eigenvector problem, which is not provided by OpenCV library. However, the eigen()function provided by OpenCV can only solve symmetric matrix, which is not our case. Fortunately, there is another linear algebra C++ library called Eigen, and OpenCV provided functions to convert data structures needed by both libraries back and forth. With Eigen, we can get things done, but it leaves me rather messy code. This is why I don`t post my C++ implementation here.

Having only the implicit equation is not sufficient, because we cannot get any useful information from it directly. To visualize the ellipse, we have to get semi-major axis, semi-minor axis, rotation and center from the 6 parameters. Below is the C++ code:

    // Input parameters: a, b, c, d, e, f
    phi = atan(b / (a - c)) * 0.5; // rotation
    double cos_phi = cos(phi);
    double sin_phi = sin(phi);
    double u = (2 * c * d - b * e) / (b * b - 4 * a * c);
    double v = (2 * a * e - b * d) / (b * b - 4 * a * c);
    center = cv::Vec2d(u, v);        // center

    // eliminate rotation and recalculate 6 parameters
    double aa = a * cos_phi * cos_phi - b * cos_phi * sin_phi + c * sin_phi * sin_phi;
    double bb = 0;
    double cc = a * sin_phi * sin_phi + b * cos_phi * sin_phi + c * cos_phi * cos_phi;
    double dd = d * cos_phi - e * sin_phi;
    double ee = d * sin_phi + e * cos_phi;
    double ff = 1 + (d * d) / (4 * a) + (e * e) / (4 * c);

    sx = sqrt(ff / aa);              // semi-major axis
    sy = sqrt(ff / cc);              // semi-minor axis

    if(sx < sy) {
        double temp = sx;
        sx = sy;
        sy = temp;
    }

Now we can use information to visualize the ellipse. Here is an example of my result:

The red dots are cross section points of human body and we want to fit. The ellipse shown is the result. We can see that waist, left and right biceps are quite good, but chest, hips, left and right thigh are not satisfactory. One possible explaination is that there may be many noisy points outside this visual range that affect the result.

Find the sum of all prime numbers below N

This is another classic question. Everyone would immediately come up with an idea that we can iterate from 2 to N and check if it is a prime number. If so, we add it to the sum.

bool isprime(int num) {
    int j = sqrt((float)num);
    for(int i = 2; i <= j; i++) {
        if(num % i == 0)
            return false;
    }
    return true;
}

int primesum(int range) {
    int sum = 2;
    for(int i = 3; i <= range; i += 2) {
        if(isprime(i))
            sum += i;
    }
    return sum;
}

This solution uses Trail Division to test if a number is prime. The trial factors need go no further than sqrt(n) because, if n is divisible by some number p, then n = p * q and if q were smaller than p, n would have earlier been detected as being divisible by q or a prime factor of q

When calculating the sum, we only need to check even numbers. This would save half of the time.

The time complexity of this method is O(n * sqrt(n)), and it requires O(1) space. The algorithm can be further optimized if we have a smaller pre-generated prime number table and test only prime factors.


If extra space is allowed, there is a better solution which is called Seive of Eratosthenes. The basic idea is to directly generate composite numbers rather than testing for prime numbers. The Seive of Eratosthenes algorithm is described as follows:
  1. Create a list of consecutive integers from 2 to n: (2, 3, 4, ..., n).
  2. Initially, let p equal 2, the first prime number.
  3. Starting from p, count up in increments of p and mark each of these numbers greater than p itself in the list. These numbers will be 2p, 3p, 4p, etc.; note that some of them may have already been marked.
  4. Find the first number greater than p in the list that is not marked; let p now equal this number (which is the next prime).
  5. If there were no more unmarked numbers in the list, stop. Otherwise, repeat from step 3.
When the algorithm terminates, all the numbers in the list that are not marked are prime.

int primesum(int range) {
    int sum = 2;
    bool* prime_table = new bool[range];
    memset(prime_table, true, sizeof(bool));

    for(int i = 2; i <= range / 2; i++) {
        if(prime_table[i]) {
            for(int j = 2 * i; j <= range; j += i) {
                prime_table[j] = false;
            }
        }
    }
    for(int i = 3; i <= range; i += 2) {
        if(prime_table[i])
            sum += i;
    }
    return sum;
}

The time complexity is O(nloglogn), and space complexity if O(n).

Above is only the implementation of original version. There are some refinements of this algorithm. For further informaton, please go to this link http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.

Reverse a singly linked list


This is a very classic interview question. It demonstrates the ability to work with basic data structure and pointers. The seemingly easy question has some pitfalls. Nearly every candidate starts with two temporary pointers and eventually finds out that you need three. You need to point to the current node (the one you’re handling), the previous node (so you can point back to it), and the next node (so you can prevent lost of the next node). Also, you should not forget to make the first node point to NULL.

There are two solutions to this question. One is iterative and the other is recursive. 

Node* reverse_iterative(Node* head) {
    Node* prev = head;
    Node* curr = head->next;
    Node* next = head->next->next;
    prev->next = NULL;

    while(curr->next) {
        curr->next = prev;
        prev = curr;
        curr = next;
        next = next->next;
    }
    curr->next = prev;
    return curr;
}


The recursive version is shown as follow:

Node* reverse(Node* prev, Node* curr) {
    // If reached the end of the list, return the head
    if(curr->next == NULL) {
        curr->next = prev;
        return curr;
    }
    // Remember the next node
    Node* next = curr->next;
    // Change current pointer to previous node
    curr->next = prev;
    return reverse(curr, next);
}

Node *reverse_recursive(Node* curr)
{
    return reverse(NULL, curr);
}

Although recursive version is more straightforward, in general we prefer the iterative version because if the list is very large, this version would fail because of stack overflow.

Note that there are some variations of this question like "How to print a singly linked list in reverse order". In this case, if we are allowed to alter the data structure, we can reverse linked list first and print it from the beginning.

2011年12月8日星期四

Find projection of a vector on a plane


Given a vector V in 3D space and a normal vector N of a plane, how to calculate the projection of V on the plane?
The following is my C++ and OpenCV implementation of the algorithm.
 
Vec3d findProjection(Vec3d V, Vec3d N) {
    Vec3d v, n;
    // We convert both vectors to unit vectors
    normalize(V, v);
    normalize(N, n);
    // Then calculate the cos angle between n and v
    double cosVN = v.dot(n);
    // Make sure the angle is in range of -90 to 90 degrees
    if(cosVN < 0) {
        cosVN = -cosVN;
        n = -n;
    }
    double V_len = sqrt(V.dot(V));
    double _N_len = V_len * cosVN;
    // _N is the projection of V on N
    Vec3d _N = _N_len * n;       
    return V - _N;
}