What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again.

Delete that bloated snippets file you've been using and share your personal repository with the world. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.


If you have a new feature suggestion or find a bug, please get in touch via http://commandlinefu.uservoice.com/

Get involved!

You can sign in using OpenID credentials, or register a traditional username and password.

First-time OpenID users will be automatically assigned a username which can be changed after signing in.


Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…).



News

2011-03-12 - Confoo 2011 presentation
Slides are available from the commandlinefu presentation at Confoo 2011: http://presentations.codeinthehole.com/confoo2011/
2011-01-04 - Moderation now required for new commands
To try and put an end to the spamming, new commands now require moderation before they appear on the site.
2010-12-27 - Apologies for not banning the trolls sooner
Have been away from the interwebs over Christmas. Will be more vigilant henceforth.
2010-09-24 - OAuth and pagination problems fixed
Apologies for the delay in getting Twitter's OAuth supported. Annoying pagination gremlin also fixed.

List all duplicate directories

find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'
2014-02-25 22:50:09
User: knoppix5
Functions: awk du echo find ls read sort wc

Very quick! It is based only on the content sizes and the character counts of the filenames. If both numbers are equal, then two (or more) directories are most likely identical.

If in doubt, verify with:

diff -rq path_to_dir1 path_to_dir2

AWK function taken from here:

http://stackoverflow.com/questions/2912224/find-duplicates-lines-based-on-some-delimited-fileds-on-line
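
For readability, here is the same command spread over several lines with comments (a reformatting sketch; the behaviour is unchanged):

find . -type d | while read i; do
    # fingerprint: character count of the directory listing, plus du's size and path
    echo $(ls -1 "$i" | wc -m) $(du -s "$i")
done |
sort -s -n -k1,1 -k2,2 |
awk -F'[ \t]+' '{
    idx = $1 $2                  # key: the two numbers concatenated
    if (array[idx] == 1) {       # duplicates already printed: print this one too
        print
    } else if (array[idx]) {     # second occurrence: print stored line + this one
        print array[idx]
        print
        array[idx] = 1           # mark the key as printed
    } else {
        array[idx] = $0          # first occurrence: remember the whole line
    }
}'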


What others think

*flatcap shrieks in terror

After looking at the command for five minutes, I ran it on my work directory.

There are no duplicate dirs in my work area, but your command says otherwise.

Most of the false dupes were .git dirs.

Problem 1: counting the number of chars in filenames (in the root) is a very poor measure of "sameness".

My git repos all look the same, and so do many of my test dirs (example.c, Makefile).

Risk of false positives (high).
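
For example, two hypothetical directories with the same filenames but different contents give identical counts:

mkdir repo_a repo_b
printf 'int main(){}\n' > repo_a/example.c; touch repo_a/Makefile
printf 'different\n'    > repo_b/example.c; touch repo_b/Makefile
ls -1 repo_a | wc -m     # 19
ls -1 repo_b | wc -m     # also 19, despite the different contents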

Problem 2: "du -s" is bad in both directions. It will match different dirs of the same size, AND it WON'T match identical dirs in some circumstances.

Create a directory with 1000 files in it, then delete 999 of them. Now copy that directory.

Most likely "du -s dir_one dir_two" will show different sizes.

Risk of false positives (high). Risk of false negatives (medium).
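
That experiment as a sketch (on most filesystems a directory inode grows with entries but is not shrunk when they are removed, while a fresh copy is allocated at minimal size):

mkdir dir_one
touch dir_one/file{0001..1000}    # grow the directory inode to hold 1000 entries
rm dir_one/file{0002..1000}       # the inode usually keeps its enlarged size
cp -r dir_one dir_two             # the copy gets a small, freshly allocated inode
du -s dir_one dir_two             # most likely two different sizes for identical content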

Problem 3: awk uses $1$2 as its index. This means that $1=1 $2=23 will match $1=12 $2=3.

This is unlikely to be seen in sorted numbers, but it is possible.

Risk of false positives (very low).
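
A quick demonstration of the collision, and of the dot-separated key suggested below:

printf '1 23 x\n12 3 y\n' | awk '{print $1 $2}'      # both lines print "123": collision
printf '1 23 x\n12 3 y\n' | awk '{print $1 "." $2}'  # "1.23" vs "12.3": distinct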

Now to the commands. The sort command can be simplified (keeping both numeric keys, so identical pairs stay adjacent):

sort -sn -k1,1 -k2,2

You're echoing the results of ls and du, so the numbers will be space-delimited, which means awk's plain space delimiter is fine:

awk -F' '

I mentioned the index being risky, so this is safer:

{ idx=$1"."$2;

A simple dot to separate the two numbers. awk doesn't care about the type of idx.

Now the rest of the awk program. It looks like it was copied from the awk one-liner to "uniq" the input.

It's storing every line in array, when it only needs to keep the previous one (the input is sorted).

{ new=$1"."$2; if (new == old) { if (oldline) { print oldline; oldline = ""; } print; } else { old = new; oldline = $0; } }

This does the same thing, but using less memory.

new = numbers from current line

old = number from previous line (empty to start)

oldline = copy of previous entire line (in case the new one matches)
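
The same program laid out with comments (logic unchanged):

{
    new = $1 "." $2          # key of the current line
    if (new == old) {        # same key as the previous line?
        if (oldline) {       # first repeat: print the stored line as well
            print oldline
            oldline = ""
        }
        print                # print every duplicate
    } else {
        old = new            # remember the new key...
        oldline = $0         # ...and the whole line, in case the next one matches
    }
}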

To save space :-) it can be condensed to:

{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}

Leaving the new command:

find . -type d|while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -sn -k1,1 -k2,2|awk -F' ' '{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}'

Enjoy :-)

Comment by flatcap 42 weeks and 2 days ago

Thank you, flatcap, for your constructive criticism.

I noticed the lack of any command for finding duplicate directories.

The command above can give a rough starting point for picking candidates, to be checked more accurately with:

diff -rq path_to_dir1 path_to_dir2

useful when disk space is about to be exhausted (i.e. when some duplicates might be replaced with hard links).

Comment by knoppix5 42 weeks and 1 day ago

I had a good think about how I'd solve the problem.

But I didn't come up with any reliable solutions that didn't involve a lot of processing power.

I'll keep thinking...

Comment by flatcap 42 weeks and 1 day ago
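
One content-based (if slow) possibility, as a rough sketch not taken from the thread: fingerprint each directory by hashing the sorted checksums of its top-level files, then report directories whose fingerprints collide. It ignores filenames and nested directories, so it is only a starting point:

find . -type d | while read -r d; do
    # hash of the sorted md5 sums of the regular files directly inside "$d"
    sum=$(find "$d" -maxdepth 1 -type f -exec md5sum {} + 2>/dev/null |
          awk '{print $1}' | sort | md5sum | awk '{print $1}')
    echo "$sum $d"
done | sort | uniq -w32 --all-repeated=separate    # group dirs sharing a fingerprint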

Any fs property which can be expressed numerically will do. What about:

time ...echo $(ls -1 "$i"|wc -m)...|wc -l

159
real 0m1.104s
user 0m0.088s
sys 0m0.112s

time ...echo $(ls "$i"| tee >(egrep -c a)...|wc -l

109
real 0m2.105s
user 0m0.060s
sys 0m0.144s

109 lines of output vs 159, so the risk of false positives is lower(?).

Comment by knoppix5 42 weeks and 1 day ago

Actually, that should be:

echo $(ls "$i"| tee >(egrep -c a) >(egrep -c e)|tail -2|tr -d '\n')

Comment by knoppix5 42 weeks and 1 day ago
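
For reference, that fingerprint dropped into the original pipeline might look like this (a sketch; the ordering of process-substitution output is not guaranteed, so the two counts can occasionally race):

find . -type d | while read i; do
    # counts of filenames containing 'a' and 'e', concatenated, plus du's size and path
    echo $(ls "$i" | tee >(egrep -c a) >(egrep -c e) | tail -2 | tr -d '\n') $(du -s "$i")
done | sort -sn -k1,1 -k2,2 | awk -F' ' '{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}'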

