Extract a specific column from a TSV file
cut -f 2 data.tsv
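A small self-contained run of the command above, assuming a hypothetical three-column TSV (id, word, part of speech) written to a temporary directory. It shows that tab is cut's default delimiter, so no `-d` is needed, and that `-f` also accepts a comma-separated list or a range.

```shell
# Hypothetical sample data in a throwaway directory, removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'id\tword\tpos\n1\tkitab\tNOUN\n2\tqalam\tNOUN\n' > "$tmpdir/data.tsv"

# cut splits on tabs by default, so -d is not needed for TSV.
col2=$(cut -f 2 "$tmpdir/data.tsv")
printf '%s\n' "$col2"          # word, kitab, qalam (one per line)

# Several fields can be selected with a list or a range, e.g. 1,3 or 2-3.
cols13=$(cut -f 1,3 "$tmpdir/data.tsv")
printf '%s\n' "$cols13"        # id<TAB>pos, 1<TAB>NOUN, 2<TAB>NOUN
```

Lines that contain no tab at all are printed whole by default; adding `-s` suppresses them.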
Extract a specific column from a simple CSV file (no commas inside quoted fields)
cut -d ',' -f 1 data.csv
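A sketch with a hypothetical two-column CSV (Arabic transliteration, English gloss). It drops the header row with `tail -n +2`, then shows the main limitation: cut splits on every comma and knows nothing about CSV quoting, so a quoted field containing a comma is cut in half. Files like that need a CSV-aware parser instead.

```shell
# Hypothetical sample data in a throwaway directory, removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'ar,en\nkitab,book\nqalam,pen\n' > "$tmpdir/data.csv"

# First column, with the header row dropped by tail -n +2.
first=$(cut -d ',' -f 1 "$tmpdir/data.csv" | tail -n +2)
printf '%s\n' "$first"         # kitab, qalam (one per line)

# Caveat: a comma inside quotes still counts as a delimiter for cut.
printf '"Hello, world",greeting\n' > "$tmpdir/quoted.csv"
broken=$(cut -d ',' -f 1 "$tmpdir/quoted.csv")
printf '%s\n' "$broken"        # "Hello   (not the full quoted field)
```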
Merge two parallel corpus files side by side
paste arabic.txt english.txt > parallel.txt
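paste joins line N of the first file with line N of the second, separated by a tab. The sketch below assumes a hypothetical two-sentence corpus (transliterated to keep it ASCII) and adds a line-count check first: if one side is shorter, paste does not fail but pads the missing side with empty fields, which silently misaligns a parallel corpus.

```shell
# Hypothetical sample corpus in a throwaway directory, removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf 'marhaban\nshukran\n' > arabic.txt
printf 'hello\nthank you\n' > english.txt

# Both sides must have the same number of lines to stay aligned.
ar_lines=$(wc -l < arabic.txt | tr -d ' ')
en_lines=$(wc -l < english.txt | tr -d ' ')
if [ "$ar_lines" -ne "$en_lines" ]; then
  echo "line counts differ: $ar_lines vs $en_lines" >&2
  exit 1
fi

paste arabic.txt english.txt > parallel.txt
cat parallel.txt               # marhaban<TAB>hello, shukran<TAB>thank you
```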
Merge files with a custom delimiter
paste -d '|' arabic.txt english.txt > parallel.txt
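A custom delimiter only works if it never appears in the text itself. This sketch, using the same hypothetical sample sentences, checks for `|` in the inputs with `grep -F` (literal match), merges the files, and then recovers one side with cut using the same delimiter.

```shell
# Hypothetical sample corpus in a throwaway directory, removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf 'marhaban\nshukran\n' > arabic.txt
printf 'hello\nthank you\n' > english.txt

# Refuse to merge if the delimiter already occurs in either file.
if grep -qF '|' arabic.txt english.txt; then
  echo "'|' already occurs in the input; choose another delimiter" >&2
  exit 1
fi

paste -d '|' arabic.txt english.txt > parallel.txt
head -n 1 parallel.txt         # marhaban|hello

# Round trip: cut with the same delimiter recovers either side.
english_side=$(cut -d '|' -f 2 parallel.txt)
printf '%s\n' "$english_side"  # hello, thank you (one per line)
```

`-d` actually takes a list of characters used in rotation between columns; with only two input files, just the first character is used. Quote `|` as shown, since the shell would otherwise treat it as a pipe.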
Split a large corpus into smaller files by line count
split -l 10000 corpus.txt chunk_
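A scaled-down run of the same command, assuming `seq` is available (it is on GNU/Linux and macOS): a hypothetical 25-line corpus split with `-l 10` instead of `-l 10000`, so the output is easy to inspect. split names the pieces with the given prefix plus alphabetic suffixes (`aa`, `ab`, ...), and the last chunk holds the remainder.

```shell
# Hypothetical toy corpus in a throwaway directory, removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

seq 1 25 > corpus.txt
split -l 10 corpus.txt chunk_
ls chunk_*                     # lists chunk_aa, chunk_ab, chunk_ac
last_count=$(wc -l < chunk_ac | tr -d ' ')
echo "$last_count"             # 5 (the remainder goes into the last chunk)

# Suffixes sort alphabetically, so the glob rejoins the chunks in their original order.
cat chunk_* > rejoined.txt
cmp -s corpus.txt rejoined.txt && echo "rejoined file matches the original"
```

For a parallel corpus, splitting both sides with the same `-l` value and different prefixes (for example `ar_` and `en_`, hypothetical names) keeps line N of each side in the same-numbered chunk. `-a` widens the suffix if the default two letters (676 files) are not enough.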