




List all duplicate directories

find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'
2014-02-25 22:50:09
User: knoppix5
Functions: awk du echo find ls read sort wc

Very quick! It is based only on each directory's content size (du -s) and the character count of its filename listing (ls -1 | wc -m). If both numbers are equal, two (or more) directories are most likely identical.

If in doubt, apply:

diff -rq path_to_dir1 path_to_dir2

AWK function taken from here:

http://stackoverflow.com/questions/2912224/find-duplicates-lines-based-on-some-delimited-fileds-on-line
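
As a quick sanity check, here is a minimal test case (the /tmp paths and file names are purely illustrative): two identical toy directories and one that differs, run from a scratch directory:

mkdir -p /tmp/dupetest/a /tmp/dupetest/b /tmp/dupetest/c
echo hello > /tmp/dupetest/a/file.txt
echo hello > /tmp/dupetest/b/file.txt
echo "something else" > /tmp/dupetest/c/other.txt
cd /tmp/dupetest
find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'

./a and ./b should be reported as a duplicate pair; . and ./c should not appear.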


What others think

*flatcap shrieks in terror

After looking at the command for five minutes, I ran it on my work directory.

There are no duplicate dirs in my work area, but your command says otherwise.

Most of the false dupes were .git dirs.

Problem 1: counting the number of chars in filenames (in the root) is a very poor measure of "sameness".

My git repos all look the same and so do many of my test dirs (example.c Makefile).

Risk of false positives (high)
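
As a rough illustration of that false positive (hypothetical paths): two directories that each hold a single example.c with completely different contents report the same filename character count, and on most filesystems the same du -s size, so the heuristic pairs them up:

mkdir -p /tmp/fptest/proj1 /tmp/fptest/proj2
echo 'int main(void){return 0;}' > /tmp/fptest/proj1/example.c
echo 'a completely different program' > /tmp/fptest/proj2/example.c
du -s /tmp/fptest/proj1 /tmp/fptest/proj2    # usually prints identical block counts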

Problem 2: "du -s" is bad in both directions. It will match different dirs of the same size

AND it WON'T match identical dirs in some circumstances.

Create a directory with 1000 files in, then delete 999 of them. Now copy that directory.

Most likely "du -s dir_one dir_two" will show different sizes.

Risk of false positives (high). Risk of false negatives (medium).
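
A rough way to reproduce that false negative (assuming a filesystem like ext4, where directory blocks are not shrunk when entries are deleted):

mkdir -p /tmp/dutest/dir_one
cd /tmp/dutest/dir_one
for n in $(seq 1 1000); do echo x > "file_$n"; done   # grow the directory to 1000 entries
rm file_{1..999}                                      # keep only file_1000
cd ..
cp -r dir_one dir_two                                 # dir_two only ever held one entry
du -s dir_one dir_two                                 # sizes usually differ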

Problem 3: awk uses $1$2 as its index. This means that $1=1 $2=23 will match $1=12 $2=3.

This is unlikely to be seen in sorted numbers, but it is possible.

Risk of false positive (very low).
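
A quick way to see the collision (the traversal order of awk's array is unspecified):

printf '1 23\n12 3\n' | awk '{ count[$1$2]++ } END { for (k in count) print k, count[k] }'

This prints "123 2": two different lines collapse onto a single index.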

Now to the commands. The sort command can be simplified:

sort -sn -k1,1 -k2,2

You're echoing the results of ls and du, therefore the numbers will be space-delimited. So: awk -F space is OK

awk -F' '

I mentioned the index being risky, so this is safer:

{ idx=$1"."$2;

A simple dot to separate the two numbers. awk doesn't care about the type of idx.
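
Re-running the collision example with the dot separator shows the two lines now get distinct indexes:

printf '1 23\n12 3\n' | awk '{ count[$1"."$2]++ } END { for (k in count) print k, count[k] }'

This prints "1.23 1" and "12.3 1" (in some order).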

Now the rest of the awk program. It looks like it was copied from the awk one-liner to "uniq" the input.

It's storing every line in array, when it only needs to keep the previous one (the input is sorted).

{ new=$1"."$2; if (new == old) { if (oldline) { print oldline; oldline = ""; } print; } else { old = new; oldline = $0; } }

This does the same thing, but using less memory.

new = numbers from current line

old = number from previous line (empty to start)

oldline = copy of previous entire line (in case the new one matches)

To save space :-) it can be condensed to:

{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}

Leaving the new command:

find . -type d|while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -sn -k1,1 -k2,2|awk -F' ' '{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}'

Enjoy :-)

Comment by flatcap 79 weeks ago

Thank you, flatcap, for your constructive criticism.

I noticed there was no command here for finding duplicate directories.

The command above can give a rough starting point for estimating candidates for a more accurate test with:

diff -rq path_to_dir1 path_to_dir2

which helps if disk space is about to be exhausted (that is, when some hard links have to be considered).

Comment by knoppix5 79 weeks ago

I had a good think about how I'd solve the problem.

But I didn't come up with any reliable solutions that didn't involve a lot of processing power.

I'll keep thinking...

Comment by flatcap 79 weeks ago

Any fs property which can be expressed numerically will do. What about:

time ...echo $(ls -1 "$i"|wc -m)...|wc -l

159

real 0m1.104s

user 0m0.088s

sys 0m0.112s

time ...echo $(ls "$i"| tee >(egrep -c a)...|wc -l

109

real 0m2.105s

user 0m0.060s

sys 0m0.144s

109 lines of output vs. 159, so the risk of false positives should be lower(?).

Comment by knoppix5 78 weeks and 6 days ago

actually

echo $(ls "$i"| tee >(egrep -c a) >(egrep -c e)|tail -2|tr -d '\n')

Comment by knoppix5 78 weeks and 6 days ago
