Bash: Cut a CSV File from specific Column Names

Question

I have a CSV file with a lot of useless information, and I want to copy only the columns I need from that file into another CSV file.

Current State:

First Name,Middle Name, Last Name, Title, Suffix, Nickname, Given Yomi, Surname Yomi....
Angel,,Romero,,,Romi,, ....

The new file should look something like this:

First Name, Last Name, Nickname
Angel, Romero, Romi

I want to do that with something like cut, but selecting by column names rather than field numbers. Something like this:

cut -d',' -f"First Name" file

I know that doesn't work, but is there another way?

As you don't need reordering, cut -d ',' -f '1,3,6' is enough.
– Fravadona, Commented Dec 6, 2021 at 13:26
You asked the exact same question. Please edit your original question instead of opening a new one.
– Aserre, Commented Dec 6, 2021 at 13:43
"I want to do that by using something like cut and the Column names, not just the Field numbers." Why? If that's really what you want, try SQL. Should be reasonably easy to import into any small DB.
– Paul Hodges, Commented Dec 6, 2021 at 14:12
The csvcut command from csvkit does exactly this: csvcut -Sc 'First Name','Last Name' file.csv
– glenn jackman, Commented Dec 6, 2021 at 15:14
@glennjackman I tried that, but it always tells me 'csvcut: command not found' even though I installed csvkit.
– Mahmoud Abdulkarim, Commented Dec 6, 2021 at 15:31

aborruso · Accepted Answer · 2021-12-07 07:53:17Z

2

The tool is Miller:

mlr --csv cut -o -f "field A","field B" input.csv >output.csv

Here is the documentation for the cut verb.


2 Comments

Mahmoud Abdulkarim Over a year ago

For some reason I can't install the tool.

aborruso Over a year ago

@MahmoudAbdulkarim what's your operating system? What errors did you get during the installation?

user448810 · Accepted Answer · 2021-12-06 16:06:49Z

$ cat csvcut.awk
# csvcut.awk

function csvsplit(str, arr,     i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    #     doubled double-quotes represent a single embedded quote
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
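        else if (s == "MAYBEDOUBLE") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" }
            # a second quote means an embedded quote, so we are still inside the quoted field
            else if (substr(str,i,1) == qt) { s = "INQUOTES" } } }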
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }

BEGIN { # read and store output field names
    for (i=1; i<ARGC; i++) { fields[++nfields] = ARGV[i]; ARGV[i] = "" } }

NR == 1 { # read and store input field names, write output header
    for (i=1; i<=csvsplit($0,arr); i++) { names[arr[i]] = i }
    for (i=1; i<=nfields; i++) { printf "%s%s", sep, fields[i]; sep = "," }
    printf "\n" }

NR > 1 { # read input record, split fields, write output record
    delete csv; sep = ""; n = csvsplit($0, csv)
    for (i=1; i<=nfields; i++) {
        printf "%s%s", sep, csv[names[fields[i]]]; sep = "," }
    printf "\n" }
$ cat mahmoud.input
FirstName,MiddleName,LastName,Title,Suffix,Nickname,GivenYomi,SurnameYomi
Angel,,Romero,,,Romi,,
$ awk -f csvcut.awk FirstName LastName Nickname <mahmoud.input
FirstName,LastName,Nickname
Angel,Romero,Romi

Ed Morton · Accepted Answer · 2021-12-06 15:20:19Z

1

awk -v tags='First Name,Last Name,Nickname' '
    BEGIN {
        FS=", *"; OFS=", "
        numOutFlds = split(tags,outFldNr2tag)
    }
    NR==1 {
        for (inFldNr=1; inFldNr<=NF; inFldNr++) {
            tag = $inFldNr
            tag2inFldNr[tag] = inFldNr
        }
    }
    {
        for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
            tag = outFldNr2tag[outFldNr]
            inFldNr = tag2inFldNr[tag]
            val = $inFldNr
            printf "%s%s", val, (outFldNr<numOutFlds ? OFS : ORS)
        }
    }
' file
First Name, Last Name, Nickname
Angel, Romero, Romi
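
One thing to watch for with this approach: if a requested name is not in the header, the lookup returns an empty value, and awk evaluates `$""` as `$0`, so the whole input line is printed in that column. Below is a minimal sketch of a guard against that, under the same assumptions as the script above (no quoted fields, commas optionally followed by spaces). The function name `select_cols` and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# select_cols FILE 'Name1,Name2,...'  (hypothetical wrapper, for illustration only)
# Assumes no quoted fields; commas may be followed by spaces.
select_cols() {
    awk -v tags="$2" '
        BEGIN { FS = ", *"; OFS = ", "; n = split(tags, want) }
        NR == 1 {
            # map each header name to its field number
            for (i = 1; i <= NF; i++) pos[$i] = i
            # refuse to continue if any requested name is missing from the header
            for (o = 1; o <= n; o++)
                if (!(want[o] in pos)) {
                    printf "unknown column: %s\n", want[o] > "/dev/stderr"
                    bad = 1
                }
            if (bad) exit 1
        }
        {
            # print the requested fields in the requested order (header line included)
            for (o = 1; o <= n; o++)
                printf "%s%s", $(pos[want[o]]), (o < n ? OFS : ORS)
        }
    ' "$1"
}

# Example call (file name is a placeholder):
#   select_cols file 'First Name,Last Name,Nickname'
```

With an unknown name, the function prints a message to standard error and returns a non-zero status instead of emitting malformed rows.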

edited Dec 6, 2021 at 15:20

answered Dec 6, 2021 at 15:13


dawg · Accepted Answer · 2021-12-06 17:29:06Z

Given that you have a straight CSV without the variable space, you can use Ruby's csv parser directly (without cleaning the csv file first...)

Given:

cat file
First Name,Middle Name,Last Name,Title,Suffix,Nickname,Given Yomi,Surname Yomi
Angel,,Romero,,,Romi,,

You can just filter each csv row:

ruby -r csv -e 'BEGIN{wanted=["First Name", "Last Name", "Nickname"]
                      puts wanted.to_csv
                      }
CSV.parse($<.read, headers:true).each{
    |h| puts h.to_hash.select{
    |k,v| wanted.include?(k) }.values.to_csv}' file

Prints:

First Name,Last Name,Nickname
Angel,Romero,Romi

The advantage here is that full csv files are supported including quoted fields with embedded delimiters.
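
To see why that matters, here is a small shell sketch of what happens when a quoted field contains a comma and the line is split on every comma, as cut or awk -F, would do. The sample row is made up for illustration and is not from the original file.

```shell
#!/usr/bin/env bash
# Made-up sample row: the Last Name field is quoted and contains a comma.
line='Angel,,"Romero, Jr.",,,Romi,,'

# Splitting on every comma yields 9 fields instead of the 8 named in the header
naive_count=$(printf '%s\n' "$line" | awk -F, '{ print NF }')

# so field 6 is no longer the nickname (it has shifted to field 7)
naive_nick=$(printf '%s\n' "$line" | cut -d, -f6)

printf 'fields: %s, field 6: "%s"\n' "$naive_count" "$naive_nick"
```

A real CSV parser treats "Romero, Jr." as a single field, so the columns stay aligned with the header.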

rungekutta · Accepted Answer · 2021-12-12 11:25:15Z

1

Maybe late and not very general, but very simple if you don't need to reuse the script:

awk 'BEGIN {FS=", *"; OFS=","}{print $1,$3,$6}' input.csv > output.csv
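
If hard-coding the field numbers is the only obstacle, they can be looked up from the header line first and then handed to cut. The following is a rough sketch, assuming a simple CSV with no quoted fields and no spaces around the commas. The function name `cut_by_name` is hypothetical, and, like cut itself, it outputs columns in file order rather than in the order requested.

```shell
#!/usr/bin/env bash
# cut_by_name FILE NAME...  (hypothetical helper, not an option of cut itself)
# Assumes a simple CSV: no quoted fields, no spaces around the commas.
cut_by_name() {
    local file=$1; shift
    local wanted list
    # one requested name per line, used as a list of fixed-string patterns
    wanted=$(printf '%s\n' "$@")
    # put each header name on its own line, keep the line numbers of exact
    # matches, and join those numbers with commas (for example 1,3,6)
    list=$(head -n 1 "$file" | tr ',' '\n' | grep -nxF -e "$wanted" | cut -d: -f1 | paste -sd, -)
    [ -n "$list" ] || { echo "cut_by_name: no matching columns" >&2; return 1; }
    cut -d, -f "$list" -- "$file"
}

# Example call (file names are placeholders):
#   cut_by_name input.csv "First Name" "Last Name" "Nickname" > output.csv
```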

