2a The purpose of this exercise was to introduce you to a set of tools and to show that no single converter will magically work on every document; sometimes you will need to try several tools. PDF in particular is designed to produce documents that look similar on paper or on a screen, whereas what we need for this exercise is a tool that can extract the text in a logical order, which is not always possible when converting from one format to another. The quote that “PDF is a terrible format to convert from” is from the Calibre manual.
2b The purpose of this exercise was to introduce the need for processing with a plain text file and the benefits of concordance analysis over a simple keyword search: a concordance shows every occurrence of a term together with its surrounding context, not just whether the term appears. Hopefully you will have a chance to explore some of the more advanced features of this tool later.
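To make the difference from a keyword search concrete, here is a minimal keyword-in-context sketch in Python. It is not how the tool used in the exercise works internally; the function name `concordance` and the `width` parameter are made up for illustration, and the sample sentence is invented.

```python
import re

def concordance(text, keyword, width=30):
    """Return one line per hit: left context, [keyword], right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Replace line breaks so each hit stays on a single line.
        left = left.replace("\n", " ")
        right = right.replace("\n", " ")
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text, for illustration only.
sample = "The whale surfaced near the ship. Nobody had seen a white whale before."
for line in concordance(sample, "whale", width=20):
    print(line)
```

A plain keyword search would only report that "whale" occurs twice; the aligned context lines let you compare how the word is used each time.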
2c The purpose of this exercise was to introduce part-of-speech (POS) tagging. Some analysis tools require you to generate this type of data before you can use them; in my case, I used it to prepare data for sentiment analysis. The full list of the 58 tags that the tagger applies is described at http://www.laurenceanthony.net/software/tagant/resources/treetagger_tagset.pdf.
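As a rough idea of what can be done with tagged text afterwards, the sketch below counts how often each tag occurs. It assumes the tagger wrote each token as `word_TAG` separated by spaces; check what your own output looks like, since other layouts are possible. The function name `count_tags` is hypothetical, and the sample line was written by hand for illustration and is not real tagger output.

```python
from collections import Counter

def count_tags(tagged_text, sep="_"):
    """Count POS tags in text where each token looks like word<sep>TAG."""
    counts = Counter()
    for token in tagged_text.split():
        # Split on the LAST separator so words containing "_" still parse.
        word, found, tag = token.rpartition(sep)
        # Skip tokens that carry no tag (assumed format: word_TAG).
        if found and word and tag:
            counts[tag] += 1
    return counts

# Hand-written sample in the assumed word_TAG layout.
sample = "The_DT cat_NN sat_VVD on_IN the_DT mat_NN ._SENT"
for tag, n in count_tags(sample).most_common():
    print(tag, n)
```

Counts like these are a quick sanity check on the tagging before the data is passed on to another tool.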