Fancy numbering of records


banner

For a full list of BASHing data blog posts, see the index page.     RSS

In A Data Cleaner's Cookbook I demonstrate a few of the ways you can give records a serial number using the nl command — for example:

nl demo

With nl alone you can't add a string before a number. To do that, or to embed the serial record numbering within a string, you can use seq with its -f option ("printf" formatting) and paste the result to the list of records:

paste <(seq -f "Record_%03g" 3) nl_file 

paste <(seq -f "Record_%03g_of_3" 3) nl_file

seq demo

Rather than serial record numbers, you can add a field with unique "random" numbers, like a UUID. A simple way to do this is with a while loop:

while read line; do echo -e "$(uuidgen)\t$line"; done < nl_file 

while read; do echo -e "$(uuidgen)\t$REPLY"; done < nl_file

while1

The loop trick can also be used to put the UUID at the end of the record:

while read line; do echo -e "$line\t$(uuidgen)"; done < nl_file

while2

or in the middle, with a bit of cutting and paste-ing:

while read line; do paste <(cut -f1 <<<"$line") <(echo "$(uuidgen)") <(cut -f2 <<<"$line"); done < nl_file

while3

I find UUIDs are overkill for unique record numbering. For a random string of 10 alphanumeric characters I would use

head -n3 /dev/urandom | tr -cd '[:alnum:]' | cut -c -10

like this:

while read line; do echo -e "$(head -n3 /dev/urandom | tr -cd '[:alnum:]' | cut -c -10)\t$line"; done < nl_file

while4

What if your data table has a header line and you don't want the header numbered? See the "Ignore a header" section of the header page in A Data Cleaner's Cookbook.

Interesting numbering problems arise with irregular multi-line records, as in the file "records" shown below. The records are irregular in this case because they might occupy 2, 3 or 4 lines, although they are always separated with "---":

multi1

How to insert a serial number before each record? In this case I've added a 4-digit number with AWK:

awk 'NR==1 {$0="0001" RS $0} /^---/ {$0=$0 RS sprintf("%04d",(++c)+1)} 1' records

multi2

Last update: 2019-02-17