cleanup-history-rs
Filters my `.bash_history` through a set of regexes, deduplicates it, and sorts it by most recently used.
Based on https://github.com/naggie/dotfiles/blob/master/scripts/cleanup-history
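The filter, deduplicate, and sort pipeline described above can be sketched roughly as follows. This is not this project's code. `Entry`, `is_noise`, and `cleanup` are hypothetical names. The regex filters are replaced by plain string checks so the sketch needs only the standard library. "Most recently used" is assumed to mean newest last, which matches the order bash keeps in the file.

```rust
use std::collections::HashMap;

/// One history entry: a Unix timestamp plus the (possibly multi-line) command text.
#[derive(Debug, PartialEq)]
struct Entry {
    ts: u64,
    cmd: String,
}

/// Hypothetical stand-in for the real regex filters: a made-up noise list
/// checked against the first word, so this sketch stays dependency-free.
fn is_noise(cmd: &str) -> bool {
    let first_word = cmd.split_whitespace().next().unwrap_or("");
    matches!(first_word, "" | "ls" | "cd" | "history")
}

/// Filter, then deduplicate (keeping each command's newest timestamp), then sort.
fn cleanup(entries: Vec<Entry>) -> Vec<Entry> {
    let mut newest: HashMap<String, u64> = HashMap::new();
    for e in entries.into_iter().filter(|e| !is_noise(&e.cmd)) {
        let slot = newest.entry(e.cmd).or_insert(e.ts);
        *slot = (*slot).max(e.ts);
    }
    // Sort oldest -> newest so the most recently used command ends up last.
    let mut out: Vec<Entry> = newest
        .into_iter()
        .map(|(cmd, ts)| Entry { ts, cmd })
        .collect();
    // Break timestamp ties on the text, since HashMap iteration order is arbitrary.
    out.sort_by(|a, b| a.ts.cmp(&b.ts).then_with(|| a.cmd.cmp(&b.cmd)));
    out
}
```

Keeping only the newest timestamp per command is what lets a plain timestamp sort produce a most-recently-used order.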
Notes on .bash_history
Format, as an annotated sample file (the `echo` commands describe what each part demonstrates):
```
#1593575811
echo each command has a timestamp immediately before it
#2
#1593575811
echo 'after multiple timestamp lines, `history` will show the timestamp 1593575811'
#1593575811
#3
echo after multiple timestamp lines, this will show the timestamp 3
#1
echo 'when you run `history` this will show up with a timestamp long ago but still at the end of the list'
#1593575811
echo this will have the same timestamp as others above, duplicates don\'t matter
#1593575812
#1593575813
#1593575814
#1593575815
#1593575816
#1593575817
#1593575818
#1593575819
#1593575820
#1593575821
echo 'once you `history -w` all these extra timestamps will get removed'
#1593576854
for ((i=0;i<5;i++)); do echo $i; done
#1593576854
echo ^^ that was written on multiple lines
#1593576874
echo 'foo
bar'
#1593576874
echo ^^ that was also written on multiple lines, cmdhist=on, lithist=off
```
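The reading rules the sample demonstrates can be modelled with a small parser. A run of timestamp lines sets a pending timestamp, and the last one wins. The next line starts a new entry carrying that timestamp. Any further non-timestamp lines are continuation lines of the same entry. `write_history` mirrors what the sample says `history -w` does: it emits one timestamp per entry, so the redundant ones disappear. This is a hypothetical sketch (`Entry`, `parse_history`, and `write_history` are illustrative names). It assumes the file was written with timestamps, as in the sample.

```rust
/// One parsed entry. `ts` is `None` only for lines before any timestamp;
/// continuation lines are joined to `cmd` with '\n'.
#[derive(Debug, PartialEq)]
struct Entry {
    ts: Option<u64>,
    cmd: String,
}

/// A line counts as a timestamp if it is '#' immediately followed by a digit.
fn is_timestamp(line: &str) -> bool {
    let b = line.as_bytes();
    b.len() >= 2 && b[0] == b'#' && b[1].is_ascii_digit()
}

/// Parse .bash_history text, assuming it was written with timestamps so that
/// untimestamped lines belong to the previous entry (multi-line commands).
fn parse_history(text: &str) -> Vec<Entry> {
    let mut entries: Vec<Entry> = Vec::new();
    let mut pending: Option<u64> = None;
    for line in text.lines() {
        if is_timestamp(line) {
            // Each timestamp line overwrites the pending one, so the last wins.
            // Only the leading digits are used, so "#1234 bar" reads as 1234;
            // in this sketch an overflowing value falls back to 0.
            let digits: String = line[1..].chars().take_while(|c| c.is_ascii_digit()).collect();
            pending = Some(digits.parse().unwrap_or(0));
        } else if pending.is_none() && !entries.is_empty() {
            // No timestamp since the previous command: a continuation line.
            let last = entries.last_mut().unwrap();
            last.cmd.push('\n');
            last.cmd.push_str(line);
        } else {
            // A timestamp (or the start of the file) precedes this line: new entry.
            entries.push(Entry { ts: pending.take(), cmd: line.to_string() });
        }
    }
    entries
}

/// Serialize entries with exactly one timestamp line each.
fn write_history(entries: &[Entry]) -> String {
    let mut out = String::new();
    for e in entries {
        if let Some(ts) = e.ts {
            out.push_str(&format!("#{}\n", ts));
        }
        out.push_str(&e.cmd);
        out.push('\n');
    }
    out
}
```

Round-tripping the sample through `parse_history` and then `write_history` keeps `echo 'foo` and `bar'` together as one entry. It also collapses each run of timestamps to its last value.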
Gotchas
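If a line starts with `#\d+` (a `#` followed by one or more digits), bash interprets it as a timestamp when it reads the history file. This happens even when the line is part of a multi-line command. In both sessions below, the command is written to the file intact. After `history -c` and `history -r`, however, the line starting with `#1234` is consumed as a timestamp. Its text is lost, the command is split in two, and the remainder is dated 1969 (timestamp 1234).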
```
$ export HISTFILE=./foo
$ history -c
$ echo 'this
#1234
that'
$ history -w
$ cat foo
#1594044806
echo 'this
#1234
that'
#1594044814
history -w
$ history -c
$ history -r
$ history
1 2020-07-06 08:16.14 | history -r
2 2020-07-06 08:15.15 | echo 'this
3 1969-12-31 17:20.34 | that'
4 2020-07-06 08:15.25 | history -w
5 2020-07-06 08:16.16 | history
$ history -c
$ echo 'foo
#1234 bar
baz'
$ history -w
$ history # correct in memory
1 2020-07-06 08:24.30 | echo 'foo
#1234 bar
baz'
2 2020-07-06 08:24.38 | history -w
3 2020-07-06 08:24.41 | history
$ cat foo
#1594045470
echo 'foo
#1234 bar
baz'
#1594045478
history -w
$ history -c # clear in-memory history
$ history -r # reread from file
$ history # now incorrectly interprets `#1234 bar` as a timestamp
1 2020-07-06 08:19.49 | history -r
2 2020-07-06 08:19.09 | echo 'foo
3 1969-12-31 17:20.34 | baz'
4 2020-07-06 08:19.31 | history -w
5 2020-07-06 08:19.51 | history
```
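One way to spot commands that will not survive this round trip is to look for any line of an entry that starts with `#` followed by a digit. The sketch below is hypothetical: `looks_like_timestamp` and `misread_lines` are illustrative names and not part of this project. A cleanup pass could, for example, drop or flag such entries before rewriting the file. That use is an assumption, not documented behaviour of this tool.

```rust
/// True if bash would read `line` as a timestamp: '#' followed by a digit.
fn looks_like_timestamp(line: &str) -> bool {
    let mut chars = line.chars();
    chars.next() == Some('#') && chars.next().map_or(false, |c| c.is_ascii_digit())
}

/// Lines of a (possibly multi-line) command that `history -r` would misread
/// as timestamps, losing their text and splitting the command.
fn misread_lines(cmd: &str) -> Vec<&str> {
    cmd.lines().filter(|l| looks_like_timestamp(l)).collect()
}
```

Applied to the two commands above, it returns `#1234` and `#1234 bar` respectively.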
Benchmarks
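The output line counts differ (73149 lines from the Python script vs. 64638 from the Rust version, both starting from the same 86636-line backup) because the two use slightly different regexes ¯\\\_(ツ)\_/¯. I think the comparison is still close enough to be informative.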
```
$ wc -l bash_history.bak
86636 bash_history.bak
$ hyperfine --warmup=5 --prepare='cp bash_history.bak bash_history_python' \
--export-markdown=bash-history-python.txt \
--time-unit=millisecond \
'python3 cleanup-history.py bash_history_python'
$ wc -l bash_history_python
73149 bash_history_python
$ hyperfine --warmup=5 --prepare='cp bash_history.bak bash_history_rust' \
--export-markdown=bash-history-rust.txt \
--time-unit=millisecond \
'cleanup-history-rs/target/release/cleanup-history bash_history_rust'
$ wc -l bash_history_rust
64638 bash_history_rust
```
| Command | Mean [ms] | Min [ms] | Max [ms] |
|---|---|---|---|
| `python3 cleanup-history.py bash_history_python` | 2069.9 ± 112.4 | 1935.1 | 2356.4 |
| `cleanup-history-rs/target/release/cleanup-history bash_history_rust` | 653.5 ± 22.1 | 631.2 | 698.9 |
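For scale, the ratio of the mean run times and the number of lines each version removed are worked out below. The two versions filter with different regexes and produce different output, so treat the ratio as a rough figure rather than a like-for-like speedup.

$$
\begin{aligned}
\text{mean time ratio} &= \frac{2069.9\ \text{ms}}{653.5\ \text{ms}} \approx 3.17 \\
\text{lines removed (Python)} &= 86636 - 73149 = 13487 \\
\text{lines removed (Rust)} &= 86636 - 64638 = 21998
\end{aligned}
$$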