2a AntFileConverter

  • 1. Run AntFileConverter
  • 2. Convert the 12th May 2015 Folder
  • 3. Check your results (You should have errors, although you may not notice them all.)

2b AntConc

  • 4. Run AntConc
  • 5. Open a single document, then try some of the analysis tools.
  • 6. Open a whole folder

2c TagAnt

  • 7. Run TagAnt
  • 8. Tag at least 1 document for parts of speech (POS), then open it in your text editor to see what it did.

Why did you do this?

2a The purpose of this exercise was to introduce you to a set of tools and to make you consider that there is no single converter that will just magically work on every document. Sometimes you will need to try several tools. PDF in particular is designed to make documents that will look similar on paper (or a screen), whereas what we need for this exercise is a tool that can extract text in a logical order. That may not always be possible when extracting from another format. The quote that “PDF is a terrible format to convert from” is from the Calibre manual.
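
To see why reading order can go wrong, here is a toy sketch in Python. It is not how AntFileConverter or any real converter works; it simply assumes a page is stored as text fragments with positions, and the fragment data, the function names and the `split_x` column boundary are all invented for illustration.

```python
# Toy model (an assumption, not a real PDF parser): each fragment is
# (x, y, text), where y grows down the page and x grows to the right.
fragments = [
    (0, 0, "Left column, line 1"),
    (300, 0, "Right column, line 1"),
    (0, 20, "Left column, line 2"),
    (300, 20, "Right column, line 2"),
]

def naive_order(frags):
    """Read strictly top to bottom, then left to right."""
    return [text for x, y, text in sorted(frags, key=lambda f: (f[1], f[0]))]

def column_order(frags, split_x):
    """Read the whole left column first, then the whole right column."""
    left = sorted((f for f in frags if f[0] < split_x), key=lambda f: f[1])
    right = sorted((f for f in frags if f[0] >= split_x), key=lambda f: f[1])
    return [text for x, y, text in left + right]

print(naive_order(fragments))        # the two columns are interleaved
print(column_order(fragments, 150))  # each column is read in full
```

The naive ordering mixes the two columns together, which is one kind of error to look for when you check your converted files.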

2b The purpose of this exercise was to introduce the need for processing with plain text files and the benefits of concordance analysis over a simple keyword search. Hopefully you will have a chance to explore some of the more advanced features of this tool later.

2c The purpose of this exercise was to introduce Parts Of Speech (POS) tagging. Some analysis tools will require you to generate this type of data before you can use them. In my case I used this to prepare data for sentiment analysis. The full list of the 58 tags that TagAnt applies is described at http://www.laurenceanthony.net/software/tagant/resources/treetagger_tagset.pdf.
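
Once a file is tagged, a short script can start to use the tags. The Python sketch below assumes the tagged text uses a `word_TAG` layout with tokens separated by spaces; check your own output in a text editor first, because your settings may produce a different layout. The sample line is hand-written, not real TagAnt output, and `parse_tagged` is a hypothetical helper name.

```python
from collections import Counter

def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs (assumed layout)."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so a word that
        # itself contains an underscore keeps it.
        word, sep, tag = token.rpartition("_")
        if sep:  # skip any token that has no underscore at all
            pairs.append((word, tag))
    return pairs

# Hand-written example line using tag names from the tagset linked above.
tagged = "The_DT cat_NN sat_VBD on_IN the_DT mat_NN ._SENT"

pairs = parse_tagged(tagged)
counts = Counter(tag for _, tag in pairs)

print(pairs[:3])
print(counts.most_common())
```

Counting tags like this is a simple first check that the tagger did something sensible before you pass the data on to another tool.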
