Extract the desired columns with their values

Please help me with this little script I'm writing. I'm trying to grep certain columns, with their values, out of a large tab-separated file (mainFileWithValues.txt) that has this format:

 A	B	C	......... (total 700 columns)
 80	2.08	23
 14	1.88	30
 12	1.81	40

The column names are in column.nam:

 $ cat columnnam.nam
 A
 B
 .
 .
 .

(up to 20 names)

First I get the column number from the big file using:

 sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c
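How this pipeline works: on the header line, the first sed deletes everything from the first occurrence of the name onward, the second sed deletes every character that is not a tab, and wc -c counts the remaining tabs plus the trailing newline, which equals the 1-based column number. Because the first step is a substring match, a name such as A can also match inside a longer or earlier header such as BA and yield the wrong column. A minimal sketch of an exact-match lookup with awk, assuming a tab-separated file whose header is on line 1 (the function name colnum is hypothetical):

```shell
#!/bin/bash
# colnum NAME FILE (hypothetical helper): print the 1-based index of the header
# field that is exactly NAME in the tab-separated FILE.
# Exit status is 1 when no header field matches. Assumes the header is line 1.
colnum() {
    awk -F '\t' -v name="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == name) { print i; exit 0 }
            exit 1
        }
    ' "$2"
}
```

It can then feed cut directly, e.g. cut -f "$(colnum B mainFileWithValues.txt)" mainFileWithValues.txt.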

Then I extract the values with cut.

I wrote a for loop:

 #!/bin/bash
 for i in `cat columnnam.nam`
 do
     cut -f`sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' | wc -c` mainFileWithValues.txt >> test.txt
 done

 $ cat test.txt
 A
 80
 14
 12
 B
 2.08
 1.88
 1.81

My problem is that I want the output in test.txt to be laid out in columns, like the main file, i.e.:

 A	B
 80	2.08

How can I fix this in my script?
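Staying close to the asker's cut-based approach: the loop calls cut once per name, so each column is written out on its own, one below the other. Since cut -f accepts a comma-separated list of fields, one way to keep the columns side by side is to collect all the indices first and call cut only once. This is a sketch under stated assumptions (tab-separated data, header on line 1, one name per line in the names file); the function name extract_columns is hypothetical. Note that cut always prints the selected fields in the file's own order, whatever order the list is given in.

```shell
#!/bin/bash
# extract_columns NAMES_FILE DATA_FILE (hypothetical helper)
# Looks up each wanted name in the header, joins the indices into one
# comma-separated list (e.g. "1,2") and calls cut once, so the selected
# columns stay side by side. Assumes a tab-separated DATA_FILE with its
# header on line 1.
extract_columns() {
    local names_file=$1 data_file=$2 fields="" name idx
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue          # skip blank lines in the names file
        # split the header one field per line, then do an exact, fixed-string match
        idx=$(head -n 1 "$data_file" | tr '\t' '\n' | grep -n -x -F -- "$name" | head -n 1 | cut -d: -f1)
        if [ -n "$idx" ]; then
            fields="${fields:+$fields,}$idx"
        else
            echo "extract_columns: column '$name' not found" >&2
        fi
    done < "$names_file"
    [ -n "$fields" ] || return 1             # nothing matched: do not call cut -f ""
    cut -f "$fields" "$data_file"
}
```

Usage: extract_columns columnnam.nam mainFileWithValues.txt > test.txt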

Here is a one-liner:

 awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"")}print ""}' columns.nam mainfile

Explanation:

[Note: the header matching is case-insensitive; remove tolower() if you want strict matching.]

 awk '
     FNR==NR{                     # Here we read the columns.nam file
         h[NR]=$1;                # h -> array, NR -> array key, $1 -> array value
         next                     # go to the next line
     }
     {                            # Here we read the second file
         for(i=1; i in h; i++)    # iterate over array h
         {
             if(FNR==1)           # on the 1st row of the second file, parse the header
             {
                 for(j=1; j<=NF; j++)    # iterate over the fields of the 1st row
                 {
                     # if this is the field we are looking for
                     if(tolower(h[i])==tolower($j))
                     {
                         # then store it:
                         # d -> array, i -> array key (order number of the wanted column)
                         # j -> array value (column number in the main file)
                         d[i]=j;
                         break
                     }
                 }
             }
             # for all records:
             # if the wanted field was found, print it;
             # d[i] gives its column number
             printf("%s%s",i>1 ? OFS:"", i in d ? $(d[i]): "");
         }
         # print the newline character
         print ""
     }
 ' columns.nam mainfile

Test results:

 $ cat mainfile
 A	B	C
 80	2.08	23
 14	1.88	30
 12	1.81	40
 $ cat columns.nam
 A
 C
 $ awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"")}print ""}' columns.nam mainfile
 A C
 80 23
 14 30
 12 40
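As the note says, dropping tolower() makes the header match case-sensitive. A sketch of that strict variant, otherwise the same program as the one-liner, wrapped in a shell function whose name (pick_columns_strict) is hypothetical:

```shell
#!/bin/bash
# pick_columns_strict NAMES_FILE MAIN_FILE (hypothetical wrapper)
# Same logic as the one-liner, but with tolower() removed, so header
# names must match the names file exactly, including case.
pick_columns_strict() {
    awk '
        FNR == NR { h[NR] = $1; next }          # wanted names, in order
        {
            for (i = 1; i in h; i++) {
                if (FNR == 1)                   # header row: find each wanted name
                    for (j = 1; j <= NF; j++)
                        if (h[i] == $j) { d[i] = j; break }
                # print the field if found, otherwise leave the slot empty
                printf("%s%s", i > 1 ? OFS : "", i in d ? $(d[i]) : "")
            }
            print ""
        }
    ' "$1" "$2"
}
```

With a header of a and B and a names file listing A and B, column A is no longer found, so its slot stays empty and each output line starts with the separator.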

You can also put the program in a script file and run it:

 akshay@db-3325:/tmp$ cat col_parser.awk
 FNR == NR {
     h[NR] = $1;
     next
 }
 {
     for (i = 1; i in h; i++) {
         if (FNR == 1) {
             for (j = 1; j <= NF; j++) {
                 if (tolower(h[i]) == tolower($j)) {
                     d[i] = j;
                     break
                 }
             }
         }
         printf("%s%s", i > 1 ? OFS : "", i in d ? $(d[i]) : "");
     }
     print ""
 }
 akshay@db-3325:/tmp$ awk -v OFS="\t" -f col_parser.awk columns.nam mainfile
 A	C
 80	23
 14	30
 12	40

Similar answer:

AWK to display a column based on column name and remove header and last delimiter

Another approach:

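 awk '
     NR == FNR { hdr[$1]; next }               # wanted column names
     FNR == 1 {                                # header row of the main file
         for (i = 1; i <= NF; i++)
             if ($i in hdr) h[i]               # remember wanted column numbers
     }
     {
         s = ""
         # walk fields left to right; "for (i in h)" has no defined order
         for (i = 1; i <= NF; i++)
             if (i in h) s = s (s == "" ? "" : OFS) $i
         print s
     }
 ' column.nam mainFileWithValues.txt
 A B
 80 2.08
 14 1.88
 12 1.81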

For aligned output, pipe the above command into column -t.
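The two answers also differ in output order. The one-liner prints columns in the order they are listed in the names file, whereas the second approach selects columns by their position in the header and, when those positions are visited left to right, prints them in the main file's order (awk's for (i in h) loop promises no order at all). A small demonstration on made-up data; the temporary file names are hypothetical:

```shell
#!/bin/bash
# Demo on made-up data: compare the column order produced by the two answers.
tmp=$(mktemp -d)
printf 'A\tB\tC\n80\t2.08\t23\n' > "$tmp/main"
printf 'C\nA\n' > "$tmp/names"        # names deliberately listed as C, then A

# First answer (the one-liner, unchanged): follows the order of the names file
by_names=$(awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"")}print ""}' "$tmp/names" "$tmp/main")

# Second approach, fields walked left to right: follows the main file's order
by_file=$(awk 'NR == FNR { hdr[$1]; next }
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i in hdr) h[i] }
    { s = ""; for (i = 1; i <= NF; i++) if (i in h) s = s (s == "" ? "" : OFS) $i; print s }' "$tmp/names" "$tmp/main")

# prints: "C A", "23 80", "---", "A C", "80 23" (one per line)
printf '%s\n---\n%s\n' "$by_names" "$by_file"
rm -r "$tmp"
```

Pick the first form when the names file defines the desired output order, and the second when the original column order should be kept.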