Tools
The resources for the workshop
Please be respectful with your use of Mass Observation content.
Calibre is an e-book manager. We needed it to convert the books we were using from formats we couldn’t use into one we could. We had a set of e-books, most of them in EPUB, AZW or PDF format, and we needed to convert them into UTF-8 encoded plain text so that we could analyse the text using other tools. There were enough books that we were grateful to be able to process them all in a single job rather than one at a time.
Calibre can ingest and export many more formats than these. It also handles several types of encryption and digital rights management, and it has some excellent features for working with metadata. Calibre can import AZW, AZW3, AZW4, CBZ, CBR, CBC, CHM, DJVU, DOCX, EPUB, FB2, HTML, HTMLZ, LIT, LRF, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, SNB, TCR, TXT and TXTZ. It can export AZW3, EPUB, DOCX, FB2, HTMLZ, OEB, LIT, LRF, MOBI, PDB, PMLZ, RB, PDF, RTF, SNB, TCR, TXT, TXTZ and ZIP.
We were surprised that we needed to convert anything at all; however, somebody had to make the text usable before the work could proceed.
Calibre's purpose is to be an e-book manager; what we needed was format conversion for a range of electronic document types. When I looked for tools to do this job, Calibre was the one I came across that converted the types we needed. Because it can run conversions as a batch process, large folders of unsorted documents could be handled with a few mouse clicks.
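The same batch job can also be scripted rather than clicked through. What follows is only a minimal sketch, not the way we ran the workshop conversion: it assumes that Calibre's command-line converter, ebook-convert, is installed and on your PATH, and the function names, folder arguments and the list of extensions are made up for illustration. It relies on ebook-convert choosing the output format from the extension of the output file name.

```python
import subprocess
from pathlib import Path

# Input extensions taken from the formats mentioned in the text (an assumption;
# extend this set for your own collection).
SOURCE_EXTENSIONS = {".epub", ".azw", ".pdf"}


def plan_conversions(source_dir, output_dir):
    """Return (source, target) path pairs, one per e-book found in source_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    pairs = []
    for source in sorted(source_dir.iterdir()):
        if source.is_file() and source.suffix.lower() in SOURCE_EXTENSIONS:
            # Keep the original extension in the name so that book.epub and
            # book.pdf do not overwrite each other's output.
            pairs.append((source, output_dir / (source.name + ".txt")))
    return pairs


def convert_all(source_dir, output_dir, converter="ebook-convert"):
    """Convert every planned book and return the sources that failed."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for source, target in plan_conversions(source_dir, output_dir):
        # The .txt extension on the target asks for plain-text output.
        result = subprocess.run([converter, str(source), str(target)])
        if result.returncode != 0:
            failed.append(source)
    return failed
```

Calling convert_all("books", "text") would then try every EPUB, AZW and PDF file in a folder called books and return a list of the ones that could not be converted. Calling plan_conversions first shows what would be converted without running anything.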
The pitfall of using Calibre as a format converter is that you have to adapt to the way the tool works, in particular to the library folder where Calibre stores its copies of documents and exported results. This means you will have to find its library folder location, which is one of the options you set when you install the program.
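Once you know where the library folder is, the converted text files can be gathered into one place for analysis. This is a rough sketch under stated assumptions: the function name and both folder arguments are hypothetical, the library path is whatever you chose at install time, and the destination folder should sit outside the library so that copies are not picked up again on a later run.

```python
import shutil
from pathlib import Path


def collect_text_files(library_dir, destination_dir):
    """Copy every .txt file found under library_dir into destination_dir."""
    library_dir, destination_dir = Path(library_dir), Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() builds the full list first, so copying does not disturb the walk.
    for source in sorted(library_dir.rglob("*.txt")):
        # Prefix with the parent folder name so same-named files do not collide.
        target = destination_dir / f"{source.parent.name} - {source.name}"
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

The function returns the list of copies it made, which is a quick way to see how many text files the library actually contained.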
Calibre may not supply you with the sort of error or progress reports you expect - so you should check your results.
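One way to check the results is to test each output file for the things we cared about: that it exists, that it is not empty, and that it really is UTF-8. The sketch below is illustrative only; the function names and the problem labels are my own, and it says nothing about whether the text inside is a good conversion, only that the file is usable as input for other tools.

```python
from pathlib import Path


def check_text_file(path):
    """Return a short description of the problem with path, or None if it looks fine."""
    path = Path(path)
    if not path.exists():
        return "missing"
    data = path.read_bytes()
    if not data.strip():
        return "empty"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return None


def report_problems(paths):
    """Map the name of each problematic file in paths to its problem."""
    problems = {}
    for path in paths:
        problem = check_text_file(path)
        if problem is not None:
            problems[Path(path).name] = problem
    return problems
```

Passing in the list of text files you expected to get back gives a dictionary of the ones that need a second look; an empty dictionary means every file passed these basic checks.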