Tuesday, November 20, 2018

Howto - Linux Delete Common Lines From Two Files


Question: How can I remove the lines that two files have in common, keeping only the lines unique to one file?

Answer:

#cat test1
www.xyz.com/abc-1
www.xyz.com/abc-7
www.xyz.com/abc-8
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5

#cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6





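This can be done with the Linux command “comm”, which compares two sorted files line by line and prints three columns: lines unique to the first file, lines unique to the second file, and lines common to both. The basic syntax of this command is as follows.
comm [-1] [-2] [-3] test1 test2
-1 Suppress the output column of lines unique to test1.
-2 Suppress the output column of lines unique to test2.
-3 Suppress the output column of lines common to test1 and test2.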
test1 Name of the first file to compare.
test2 Name of the second file to compare.
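To make the three columns concrete, here is a minimal sketch that recreates the two sample files in a scratch directory and runs “comm” with no options. The scratch directory and the names test1.sorted, test2.sorted and columns.txt are assumptions made for this illustration only. The inputs are sorted into separate files first because “comm” expects sorted input.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files from this post.
printf '%s\n' www.xyz.com/abc-1 www.xyz.com/abc-7 www.xyz.com/abc-8 \
    www.xyz.com/abc-2 www.xyz.com/abc-3 www.xyz.com/abc-4 www.xyz.com/abc-5 > test1
printf '%s\n' www.xyz.com/abc-2 www.xyz.com/abc-3 www.xyz.com/abc-4 \
    www.xyz.com/abc-5 www.xyz.com/abc-6 > test2

# comm expects sorted input; LC_ALL=C keeps sort and comm on the same collation.
LC_ALL=C sort test1 > test1.sorted
LC_ALL=C sort test2 > test2.sorted

# No options: print all three columns. Column 2 is indented by one tab,
# column 3 by two tabs.
LC_ALL=C comm test1.sorted test2.sorted > columns.txt
cat columns.txt
```

In the output, abc-1, abc-7 and abc-8 start at the left margin (unique to test1), abc-6 is indented by one tab (unique to test2), and abc-2 through abc-5 are indented by two tabs (common to both files).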
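“comm” expects both input files to be sorted, so we need to sort them first. To get the lines unique to test1, we can combine “comm” and “sort” as follows. The options -23 suppress columns 2 and 3, leaving only the lines unique to the first file. The <( ) process substitution requires bash or another shell that supports it, and the result is written in sorted order, not in the original order of test1.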
# comm -23 <(sort test1) <(sort test2) > test3
Swapping the order of the two files gives the lines unique to test2 instead:
#comm -23 <(sort test2) <(sort test1) > test7
[/home/y100n0]
#cat test7
www.xyz.com/abc-6

With test1 given first, the result is the lines unique to test1:
#comm -23 <(sort test1) <(sort test2) > test8
[/home/y100n0]
#cat test8
www.xyz.com/abc-1
www.xyz.com/abc-7
www.xyz.com/abc-8
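The same results can be reproduced without process substitution, which may help in shells that do not support <( ). The following is only a sketch under stated assumptions: the scratch directory and the output names only_in_test1, only_in_test2 and in_both are hypothetical names chosen for this example. It sorts each file into a temporary copy and then calls “comm” with the option pairs -23, -13 and -12.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files from this post.
printf '%s\n' www.xyz.com/abc-1 www.xyz.com/abc-7 www.xyz.com/abc-8 \
    www.xyz.com/abc-2 www.xyz.com/abc-3 www.xyz.com/abc-4 www.xyz.com/abc-5 > test1
printf '%s\n' www.xyz.com/abc-2 www.xyz.com/abc-3 www.xyz.com/abc-4 \
    www.xyz.com/abc-5 www.xyz.com/abc-6 > test2

# Sort into separate files instead of using <( ); LC_ALL=C keeps
# sort and comm on the same collation order.
LC_ALL=C sort test1 > test1.sorted
LC_ALL=C sort test2 > test2.sorted

# -23 keeps column 1 only: lines unique to test1.
LC_ALL=C comm -23 test1.sorted test2.sorted > only_in_test1
# -13 keeps column 2 only: lines unique to test2.
LC_ALL=C comm -13 test1.sorted test2.sorted > only_in_test2
# -12 keeps column 3 only: lines common to both files.
LC_ALL=C comm -12 test1.sorted test2.sorted > in_both

cat only_in_test1
```

Here only_in_test1 matches test8 above and only_in_test2 matches test7. Using -13 on the same file order is an alternative to swapping the two arguments and using -23.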