16 February, 2015

Showing the progress of awk scripts

When running awk scripts on large data files, you may want to know how long the process will take. Here is a simple script that prints the percentage of the data processed so far and an estimate of when processing will finish:
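BEGIN {
    ecat = "cat >&2"       # pipe target that sends progress output to stderr
    clear = "\033[2K\r"    # ANSI: erase the current line, then return to column 1
    start = systime()      # start time in seconds (systime() is a gawk extension)
    lines = 18000000       # total number of input lines; set this to match your data
}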

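{
    if (NR % 1000 == 0) {
        frac = NR / lines
        elapsed = systime() - start
        # remaining minutes: projected total time (elapsed/frac) minus time already spent
        eta = (elapsed / frac - elapsed) / 60
        # %% prints a literal percent sign
        printf("%s %.3f%% (ETA: %i minutes)", clear, frac * 100, eta) | ecat
        fflush(ecat)       # flush so the update appears immediately
    }
}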
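The script uses an ANSI terminal escape sequence (ESC[2K, followed by a carriage return) to erase the previously printed line, so the percentage and ETA always appear on the same line of your terminal. Progress is written to stderr, so it does not interfere with data written to stdout. The total number of lines is hard-coded in the lines variable and must match your input for the estimate to be meaningful. Example output: 7.061% (ETA: 4 minutes)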
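
Because the estimate depends on knowing the total line count, one option is to measure it with wc -l and pass it in with awk's -v option instead of hard-coding it. The following is only a sketch under a few assumptions: run_with_progress and sample.txt are hypothetical names, the per-line work is a stand-in that sums the first column, and your awk must provide systime() and fflush() (gawk does).

```shell
#!/bin/sh
# Sketch: count the input lines first, then hand the total to awk via -v.
# run_with_progress is a hypothetical helper name, not part of the original post.
run_with_progress() {
    total=$(wc -l < "$1")
    awk -v lines="$total" '
    BEGIN {
        ecat  = "cat >&2"        # progress goes to stderr
        clear = "\033[2K\r"      # erase line, return to column 1
        start = systime()
    }
    {
        sum += $1                # stand-in for the real per-line processing
        if (NR % 1000 == 0) {
            frac    = NR / lines
            elapsed = systime() - start
            eta     = (elapsed / frac - elapsed) / 60   # remaining minutes
            printf("%s %.3f%% (ETA: %d minutes)", clear, frac * 100, eta) | ecat
            fflush(ecat)
        }
    }
    END {
        printf("\n") | ecat      # finish the progress line
        close(ecat)              # wait for cat before writing the result
        print sum                # data output goes to stdout
    }' "$1"
}

# Demo on generated sample data: the numbers 1 to 5000, one per line.
seq 1 5000 > sample.txt
run_with_progress sample.txt > result.txt
```

Counting the lines up front means reading the file twice, which is usually cheap compared with the processing itself but may matter for very large inputs or for data arriving on a pipe.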