Tika in Action

Jukka L. Zitting and Chris A. Mattmann

Apache Tika is an open source toolkit that makes it easy for search engines, content management systems and other applications to detect and extract content from digital documents in all major file formats.

Tika in Action is a hands-on guide for developers working with search engines, content management systems and other similar applications who want to exploit the information locked in digital documents. It introduces you to the world of mining text and binary documents and other information sources like Internet media types and Dublin Core metadata. The book shows where Tika fits within this landscape and how readers can use Tika to build and extend applications. The book's many case studies give real-world experience from domains ranging from search engines to digital asset management and scientific data processing.

In addition to the architectural overviews, developers will find more detailed information in chapters that focus on advanced features like XMP metadata processing, automatic language detection, and custom parser extensions. The book also describes common file formats like MS Word, PDF, HTML, and ZIP and the open source libraries used to process files in these formats. The included code examples are designed to support hands-on experimentation.

This book requires no previous knowledge of Tika or text mining techniques, and will be most valuable to readers with a working knowledge of Java. Tika in Action fits perfectly with other Manning books including Lucene in Action, Mahout in Action, Taming Text, Algorithms of the Intelligent Web, and Collective Intelligence in Action.
