Cutadapt

We can use Cutadapt to trim further our output and the script more_tidying.sh.

Note

Remember we are following the Hyb_baits_pipeline from the paper Nicholls et al. 2015, with scripts available on github: https://github.com/ckidner/Targeted_enrichment.

You can see the content of the script more_tidying.sh here:
#! /bin/bash -x
# to trim the trimmomatic output one more time
# Needs ck_empties.sh and ck_remove.sh
#Catherine Kidner 3 Nov 2014


echo "Hello world"

acc=$1

echo "You're working on accession $1"


fwd_p=${acc}_forward_paired.fq.gz
rev_p=${acc}_reverse_paired.fq.gz
fwd_un_p=${acc}_forward_unpaired.fq.gz
rev_un_p=${acc}_reverse_unpaired.fq.gz

fwd_p_done=${acc}_trimmed_1.fastq
rev_p_done=${acc}_trimmed_2.fastq
fwd_u_done=${acc}_trimmed_1u.fastq
rev_u_done=${acc}_trimmed_2u.fastq

cutadapt -a AGATCGGAAGAGC $fwd_p > $fwd_p_done 2>> cut_out
cutadapt -a AGATCGGAAGAGC $rev_p > $rev_p_done 2>> cut_out
cutadapt -a AGATCGGAAGAGC $fwd_un_p > $fwd_u_done 2>> cut_out
cutadapt -a AGATCGGAAGAGC $rev_un_p > $rev_u_done 2>> cut_out
./ck_empties.sh $acc
./ck_remove.sh $acc


exit 0

You can see in the top commented section of the script that it requires two other scripts: # Needs ck_empties.sh and ck_remove.sh, also available in GitHub, so make sure you have the correct path for ck_empties.sh and ck_remove.sh the scripts inside more_tidying.sh. We will use more_tidying.sh in a loop with out trimmomatic output as input:

while read f ; do ./more_tidying.sh "$f" ; done < acc

This script will produce four outputs ending in _trimmed_1.fastq (forward paired), _trimmed_2.fastq (reverse paired), _trimmed_1u.fastq (forward unpaired) and _trimmed_2u.fastq (reverse unpaired).

Here is what we have in our folder now:

ls
FG113                                 FG35_1.fastq.gz                       FGIntype_1_fastqc.zip                 KGD465_2_fastqc.html                  zygia917_1.fastq.gz
FG113.empties                         FG35_1_fastqc.html                    FGIntype_2.fastq.gz                   KGD465_2_fastqc.zip                   zygia917_1_fastqc.html
FG113_1.fastq.gz                      FG35_1_fastqc.zip                     FGIntype_2_fastqc.html                KGD465_forward_paired.fq.gz           zygia917_1_fastqc.zip
FG113_1_fastqc.html                   FG35_2.fastq.gz                       FGIntype_2_fastqc.zip                 KGD465_forward_paired_fastqc.html     zygia917_2.fastq.gz
FG113_1_fastqc.zip                    FG35_2_fastqc.html                    FGIntype_forward_paired.fq.gz         KGD465_forward_paired_fastqc.zip      zygia917_2_fastqc.html
FG113_2.fastq.gz                      FG35_2_fastqc.zip                     FGIntype_forward_paired_fastqc.html   KGD465_forward_unpaired.fq.gz         zygia917_2_fastqc.zip
FG113_2_fastqc.html                   FG35_forward_paired.fq.gz             FGIntype_forward_paired_fastqc.zip    KGD465_forward_unpaired_fastqc.html   zygia917_forward_paired.fq.gz
FG113_2_fastqc.zip                    FG35_forward_paired_fastqc.html       FGIntype_forward_unpaired.fq.gz       KGD465_forward_unpaired_fastqc.zip    zygia917_forward_paired_fastqc.html
FG113_forward_paired.fq.gz            FG35_forward_paired_fastqc.zip        FGIntype_forward_unpaired_fastqc.html KGD465_reverse_paired.fq.gz           zygia917_forward_paired_fastqc.zip
FG113_forward_paired_fastqc.html      FG35_forward_unpaired.fq.gz           FGIntype_forward_unpaired_fastqc.zip  KGD465_reverse_paired_fastqc.html     zygia917_forward_unpaired.fq.gz
FG113_forward_paired_fastqc.zip       FG35_forward_unpaired_fastqc.html     FGIntype_reverse_paired.fq.gz         KGD465_reverse_paired_fastqc.zip      zygia917_forward_unpaired_fastqc.html
FG113_forward_unpaired.fq.gz          FG35_forward_unpaired_fastqc.zip      FGIntype_reverse_paired_fastqc.html   KGD465_reverse_unpaired.fq.gz         zygia917_forward_unpaired_fastqc.zip
FG113_forward_unpaired_fastqc.html    FG35_reverse_paired.fq.gz             FGIntype_reverse_paired_fastqc.zip    KGD465_reverse_unpaired_fastqc.html   zygia917_reverse_paired.fq.gz
FG113_forward_unpaired_fastqc.zip     FG35_reverse_paired_fastqc.html       FGIntype_reverse_unpaired.fq.gz       KGD465_reverse_unpaired_fastqc.zip    zygia917_reverse_paired_fastqc.html
FG113_reverse_paired.fq.gz            FG35_reverse_paired_fastqc.zip        FGIntype_reverse_unpaired_fastqc.html KGD465_trimmed_1.fastq                zygia917_reverse_paired_fastqc.zip
FG113_reverse_paired_fastqc.html      FG35_reverse_unpaired.fq.gz           FGIntype_reverse_unpaired_fastqc.zip  KGD465_trimmed_1.fastq.gz             zygia917_reverse_unpaired.fq.gz
FG113_reverse_paired_fastqc.zip       FG35_reverse_unpaired_fastqc.html     FGIntype_trimmed_1.fastq              KGD465_trimmed_1u.fastq               zygia917_reverse_unpaired_fastqc.html
FG113_reverse_unpaired.fq.gz          FG35_reverse_unpaired_fastqc.zip      FGIntype_trimmed_1.fastq.gz           KGD465_trimmed_2.fastq                zygia917_reverse_unpaired_fastqc.zip
FG113_reverse_unpaired_fastqc.html    FG35_trimmed_1.fastq                  FGIntype_trimmed_1u.fastq             KGD465_trimmed_2.fastq.gz             zygia917_trimmed_1.fastq
FG113_reverse_unpaired_fastqc.zip     FG35_trimmed_1.fastq.gz               FGIntype_trimmed_2.fastq              KGD465_trimmed_2u.fastq               zygia917_trimmed_1.fastq.gz
FG113_trimmed_1.fastq                 FG35_trimmed_1u.fastq                 FGIntype_trimmed_2.fastq.gz           acc                                   zygia917_trimmed_1u.fastq
FG113_trimmed_1.fastq.gz              FG35_trimmed_2.fastq                  FGIntype_trimmed_2u.fastq             cut_out                               zygia917_trimmed_2.fastq
FG113_trimmed_1u.fastq                FG35_trimmed_2.fastq.gz               KGD465                                fastqcfiles                           zygia917_trimmed_2.fastq.gz
FG113_trimmed_2.fastq                 FG35_trimmed_2u.fastq                 KGD465.empties                        fastqctrimfile                        zygia917_trimmed_2u.fastq
FG113_trimmed_2.fastq.gz              FGIntype                              KGD465_1.fastq.gz                     more_tidying.sh
FG113_trimmed_2u.fastq                FGIntype.empties                      KGD465_1_fastqc.html                  renaming.sh
FG35                                  FGIntype_1.fastq.gz                   KGD465_1_fastqc.zip                   zygia917
FG35.empties                          FGIntype_1_fastqc.html                KGD465_2.fastq.gz                     zygia917.empties

Tip

In case you are doing your anaylisis in Crop Diversity HPC, you can run this step as an array. An array job is a job in which the script is run concomitantly for each sample and it will be much quicker. In this link you can find more about array jobs: https://help.cropdiversity.ac.uk/slurm-overview.html#array-jobs. Below an example of how to run the more_tidying.sh script as an array:

#!/bin/bash

# to trim the trimmomatic output one more time
# Needs ck_empties.sh and ck_remove.sh
# Catherine Kidner 3 Nov 2014
# Adjusted for an array job in Mar 2022, Flavia Pezzini

#SBATCH --job-name="cutadapt"
#SBATCH --export=ALL
#SBATCH --mail-user=youremail@yourdomain # enter your email to receive a message once it is done.
#SBATCH --mail-type=END,FAIL
#SBATCH --output ./slurm-%x-%A_%a.out # %x gives job name, %A job ID, %a array index
#SBATCH --partition=short
#SBATCH --cpus-per-task=4 #number of threads, not cores
#SBATCH --mem=1G #adjust this according to your data.
#SBATCH --array=0-4 # the number of samples you have. We have five accessions we use 0-4 because Bash array is zero-indexed (instead of 1-5). It is a good practice to ask for a maximum of 25 tasks at a time.


acc=$(sed -n "$SLURM_ARRAY_TASK_ID"p /path/to/my/acc/file)

echo "Hello world"

echo "You're working on accession $acc"

fwd_p=${acc}_forward_paired.fq
rev_p=${acc}_reverse_paired.fq
fwd_un_p=${acc}_forward_unpaired.fq
rev_un_p=${acc}_reverse_unpaired.fq

fwd_p_done=${acc}_trimmed_1.fastq
rev_p_done=${acc}_trimmed_2.fastq
fwd_u_done=${acc}_trimmed_1u.fastq
rev_u_done=${acc}_trimmed_2u.fastq

cutadapt -a AGATCGGAAGAGC $fwd_p > $fwd_p_done 2>> cut_out
cutadapt -a AGATCGGAAGAGC $rev_p > $rev_p_done 2>> cut_out
cutadapt -a AGATCGGAAGAGC $fwd_un_p > $fwd_u_done 2>> cut_out
cutadapt -a AGATCGGAAGAGC $rev_un_p > $rev_u_done 2>> cut_out
./ck_empties.sh $acc
./ck_remove.sh $acc

exit 0

To submit just type: sbatch more_tidying_array.sh