linguist.page@gmail.com

Count lines, words, and bytes in a file

wc filename

Count only lines in a corpus file

wc -l corpus.txt

Count words in a corpus file

wc -w corpus.txt

Count characters in a corpus file (use -c instead to count bytes)

wc -m corpus.txt
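
When a count feeds into a script, reading the file on standard input keeps the filename out of the output, so only the number is captured. A minimal sketch, assuming a hypothetical three-line sample.txt created purely for illustration:

```shell
# Create a small sample file (hypothetical data, for illustration only)
printf 'the cat sat\non the mat\nthe end\n' > sample.txt

# Redirecting the file to stdin makes wc print only the number
lines=$(wc -l < sample.txt)
words=$(wc -w < sample.txt)
bytes=$(wc -c < sample.txt)

echo "lines=$lines words=$words bytes=$bytes"
```

Some wc implementations pad the number with leading spaces, so treat the captured value as an integer rather than comparing it as a string.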

Sort file contents alphabetically

sort filename

Sort in reverse order

sort -r filename

Collapse adjacent duplicate lines

uniq filename
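
Because uniq compares only neighbouring lines, its result depends on the order of the input. A minimal sketch of the difference, assuming a hypothetical fruits.txt in which the repeated line is not adjacent to its copy:

```shell
# Hypothetical sample: "apple" appears twice, but not on adjacent lines
printf 'apple\nbanana\napple\n' > fruits.txt

# uniq alone compares only neighbouring lines, so both copies survive
unsorted_count=$(uniq fruits.txt | wc -l)

# Sorting first brings the copies together, so uniq can collapse them
sorted_count=$(sort fruits.txt | uniq | wc -l)

echo "without sort: $unsorted_count lines, with sort: $sorted_count lines"
```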

Remove duplicate lines from sorted output

sort corpus.txt | uniq

Count frequency of each unique line

sort corpus.txt | uniq -c

Sort by frequency, descending (gives word counts when the file has one word per line)

sort corpus.txt | uniq -c | sort -rn
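
The frequency pipeline above counts whole lines, so running text first has to be split into one token per line. A minimal sketch, assuming plain whitespace tokenization (no lowercasing or punctuation handling) and a hypothetical sample file text.txt:

```shell
# Hypothetical running-text sample, for illustration only
printf 'the cat saw the dog\nthe dog ran\n' > text.txt

# 1. tr -s turns each run of whitespace into a single newline: one token per line
# 2. sort groups identical tokens; uniq -c counts each group
# 3. sort -rn puts the most frequent token first
tr -s '[:space:]' '\n' < text.txt | sort | uniq -c | sort -rn > freq.txt

head -n 3 freq.txt
```

The order of tokens that share the same count is not meaningful here; add a secondary sort key if a stable ranking is needed.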

Show only lines that occur more than once

sort corpus.txt | uniq -d

Shuffle lines randomly for training data prep

shuf corpus.txt > corpus_shuffled.txt
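
A shuffled file is usually then cut into training and held-out portions. A minimal sketch, assuming GNU coreutils (shuf is not available on every system) and a hypothetical 10-line corpus_demo.txt with an 8/2 split; the file names and split sizes are illustrative only:

```shell
# Hypothetical 10-line corpus, for illustration only
seq 1 10 > corpus_demo.txt

# Shuffle once, then cut the shuffled file into two non-overlapping parts
shuf corpus_demo.txt > corpus_demo_shuffled.txt
head -n 8 corpus_demo_shuffled.txt > train.txt    # first 8 lines
tail -n +9 corpus_demo_shuffled.txt > dev.txt     # line 9 onward

wc -l train.txt dev.txt
```

Splitting the saved shuffled file, rather than calling shuf twice, ensures that no line ends up in both parts.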