We may earn an affiliate commission when you visit our partners.
Provider: Coursera

Limpieza de datos para el procesamiento de lenguaje natural (Data Cleaning for Natural Language Processing)

Hernán Daniel Merlino

This course gives you the knowledge needed to extract, clean, and prepare data from different sources so it can be used in an NLP process.

To take this course you need basic to intermediate programming knowledge, ideally including basic familiarity with Python. It also helps to know the Jupyter Notebook environment that comes with Anaconda.

Application development uses Python 3.6 or later. Alternatively, you can use the Anaconda environment with the same Python version.

The examples are edited in the Anaconda Notebook, but students can use any text editor that recognizes Anaconda notebooks.

Libraries you need to have installed for the course: NLTK, Pandas, Scikit-learn, and data extraction libraries.
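
The requirements above can be checked before starting. The following is a minimal sketch, not part of the course material: the function name `check_environment` is hypothetical, and the import names `nltk`, `pandas`, and `sklearn` are assumed to be the ones under which NLTK, Pandas, and Scikit-learn are installed. The course does not name its data extraction libraries, so none are checked here.

```python
import importlib.util
import sys

# Assumed import names; "sklearn" is the import name of Scikit-learn.
REQUIRED_MODULES = ["nltk", "pandas", "sklearn"]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 6)):
    """Report whether Python is recent enough and which modules can be found."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Any entry reported as missing can then be installed with the package manager of your choice before opening the first notebook.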

What's inside

Syllabus

Web Scraping for Natural Language Processing
This module gives you the knowledge needed to build a program that extracts data from HTML-based web pages.
HTML Parsing for Natural Language Processing
This module describes the steps needed to preprocess HTML pages and extract information from them. It also details different approaches to the task.
Advanced Scraping Techniques
This module presents advanced scraping techniques for extracting data from HTML pages that are built with various JavaScript libraries.
Text Manipulation Techniques
Once text has been extracted from HTML pages, which are a common source of information, other types of data sources can be added, such as PDF, DOC, XLS, and image files. This module covers techniques for collecting information from those sources and unifying it into a single set of documents.
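
To give a feel for the kind of work the syllabus describes, here is a minimal sketch of extracting visible text from an HTML string and merging it with plain-text sources into one document set. It is an illustration only and not the course's own code: it uses Python's standard `html.parser` rather than any library the course may teach, and the names `TextExtractor`, `html_to_text`, `normalize`, and `build_corpus` are hypothetical. Text from PDF, DOC, XLS, or image files is assumed to have been extracted already by some other tool.

```python
import re
import unicodedata
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps block elements apart; it can also split a
    # word that is interrupted by an inline tag, which is a known trade-off.
    return " ".join(parser.parts)


def normalize(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def build_corpus(sources):
    """Unify (doc_id, kind, raw) tuples, where kind is 'html' or 'text'."""
    corpus = []
    for doc_id, kind, raw in sources:
        text = html_to_text(raw) if kind == "html" else raw
        corpus.append({"id": doc_id, "kind": kind, "text": normalize(text)})
    return corpus
```

Each record keeps its source type next to the cleaned text, so later steps can treat web pages and other documents uniformly while still knowing where each one came from. Pages that build their content with JavaScript would need a different extraction step, since this sketch only sees the HTML it is given.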

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches how to extract, clean, and prepare data sources for NLP
Aimed at people with programming and Python knowledge
Uses libraries such as NLTK, Pandas, and Scikit-learn
Covers scraping techniques for extracting data from HTML pages
Teaches advanced techniques for extracting data from HTML pages that use JavaScript
Includes techniques for manipulating and unifying different types of data, such as PDF, DOC, XLS, and images

Reviews summary

NLP data cleaning fundamentals

This NLP data cleaning course is an excellent resource for those new to natural language processing. You'll get a comprehensive overview with helpful examples and documentation to jumpstart your NLP journey.

Career center

Learners who complete Limpieza de datos para el procesamiento de lenguaje natural will develop knowledge and skills that may be useful to these careers:
Natural Language Processing Engineer
Natural Language Processing Engineers design and build systems that can understand and generate human language. They use a variety of techniques to process and analyze text, and they develop algorithms to solve problems such as machine translation and text summarization. This course may be helpful to Natural Language Processing Engineers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Natural Language Processing Engineers to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which is essential for Natural Language Processing Engineers.
Technical Writer
Technical Writers create and maintain technical documentation, such as user manuals, white papers, and training materials. They use a variety of writing and editing skills to communicate complex technical information clearly and concisely. This course may be helpful to Technical Writers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Technical Writers to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which is essential for Technical Writers.
Data Scientist
Data Scientists use statistical and programming techniques to extract knowledge and insights from data. They develop and apply machine learning models to solve business problems. This course may be helpful to Data Scientists because it teaches techniques for extracting and cleaning data from websites and documents. This can help Data Scientists to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Data Scientists who need to analyze text data.
Machine Learning Engineer
Machine Learning Engineers design, build, and deploy machine learning models. They use a variety of programming languages and tools to automate the process of training and deploying machine learning models. This course may be helpful to Machine Learning Engineers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Machine Learning Engineers to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Machine Learning Engineers who need to analyze text data.
Information Architect
Information Architects design and organize websites and other digital content to make it easy for users to find and use. They use a variety of techniques to create intuitive and user-friendly interfaces. This course may be helpful to Information Architects because it teaches techniques for extracting and cleaning data from websites and documents. This can help Information Architects to understand how users interact with websites and to design interfaces that are more effective. Additionally, the course covers techniques for manipulating text, which can be useful for Information Architects who need to create and manage website content.
Search Engine Optimizer
Search Engine Optimizers help businesses improve the visibility of their websites in search engine results pages. They use a variety of techniques to optimize website content and structure, and they track website traffic to measure the effectiveness of their optimization efforts. This course may be helpful to Search Engine Optimizers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Search Engine Optimizers to understand how search engines work and to optimize website content more effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Search Engine Optimizers who need to create and manage website content.
Librarian
Librarians organize and manage libraries and other information centers. They help users find and access information, and they develop and implement library programs and services. This course may be helpful to Librarians because it teaches techniques for extracting and cleaning data from websites and documents. This can help Librarians to organize and manage library collections more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Librarians who need to create and manage library catalogs and other finding aids.
User Experience Researcher
User Experience Researchers study how users interact with products and services. They use a variety of research methods to identify user needs and preferences, and they design and evaluate user interfaces to improve the user experience. This course may be helpful to User Experience Researchers because it teaches techniques for extracting and cleaning data from websites and documents. This can help User Experience Researchers to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for User Experience Researchers who need to analyze user feedback.
Web Developer
Web Developers design and develop websites and other web applications. They use a variety of programming languages and tools to create websites that are both functional and visually appealing. This course may be helpful to Web Developers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Web Developers to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Web Developers who need to create and manage website content.
Data Engineer
Data Engineers design, build, and maintain data pipelines that collect, transform, and store data. They use a variety of programming languages and tools to automate data processing tasks. This course may be helpful to Data Engineers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Data Engineers to design and build data pipelines that can collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Data Engineers who need to process text data.
Knowledge Engineer
Knowledge Engineers design and build knowledge bases that can be used by computers to solve problems. They use a variety of techniques to represent and organize knowledge in a way that is both computer-readable and human-understandable. This course may be helpful to Knowledge Engineers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Knowledge Engineers to build knowledge bases that are more comprehensive and accurate. Additionally, the course covers techniques for manipulating text, which can be useful for Knowledge Engineers who need to create and manage knowledge base content.
Data Analyst
Data Analysts collect, clean, and analyze data to help businesses make informed decisions. They use a variety of statistical and programming techniques to identify trends, patterns, and relationships in data. This course may be helpful to Data Analysts because it teaches techniques for extracting and cleaning data from websites and documents. This can help Data Analysts to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Data Analysts who need to analyze text data.
Research Analyst
Research Analysts collect, analyze, and interpret data to help businesses make informed decisions. They use a variety of statistical and programming techniques to identify trends, patterns, and relationships in data. This course may be helpful to Research Analysts because it teaches techniques for extracting and cleaning data from websites and documents. This can help Research Analysts to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Research Analysts who need to analyze text data.
Software Engineer
Software Engineers design, develop, and maintain software systems. They use a variety of programming languages and tools to create software that meets the needs of users. This course may be helpful to Software Engineers because it teaches techniques for extracting and cleaning data from websites and documents. This can help Software Engineers to collect and prepare data more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Software Engineers who need to process text data.
Content Moderator
Content Moderators review and assess digital content to ensure that it meets community standards, advertising guidelines, and the law. They flag inappropriate content, including hate speech, violence, and pornography. This course may be helpful to Content Moderators because it teaches techniques for extracting and cleaning data from websites and documents. This can help Content Moderators to identify and flag inappropriate content more efficiently and effectively. Additionally, the course covers techniques for manipulating text, which can be useful for Content Moderators who need to summarize or redact content.

Reading list

We've selected six books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Limpieza de datos para el procesamiento de lenguaje natural.
This guide covers web scraping techniques using Python, including advanced techniques for handling complex web pages.
An excellent starting point for anyone looking for the fundamentals of natural language processing. Although it does not focus specifically on data cleaning, it provides solid grounding in essential NLP methods and techniques, which makes it valuable reading for this course.
This book provides a comprehensive guide to using Python effectively for data science. Although it does not focus specifically on NLP, it offers a solid foundation in Python best practices and techniques, which are essential for processing text data.
This comprehensive guide covers JavaScript, which is widely used to create interactive web pages.
This book offers an accessible introduction to machine learning, a field closely related to NLP. It provides a basic understanding of the fundamental concepts and algorithms that are useful for processing text data.


© 2016 - 2024 OpenCourser