Big Data Analytics con Python e Spark 2.4: il Corso Completo from Udemy

Learn to use the latest technologies for Big Data analysis with Spark and Python, one of the world's most popular programming languages.

We have entered the era of Big Data. Data is the new oil, and knowing how to process and analyze it means a secure job in the very near future and a huge competitive advantage over business rivals.

In this course we will learn to work with Big Data using Spark, the world's most popular distributed computing framework, used in production by giants such as Amazon, Microsoft, Oracle, Verizon, and Cisco.

What will we do during the course?

In the first section of the course we will introduce the topic of Big Data: what it is, where it comes from, and how it can be exploited.

We will look at the main technologies used for Big Data (Apache Hadoop, Hadoop MapReduce, and Spark) and clarify their differences, weaknesses, and strengths.

In the second section we will see how to install and configure Spark, first on a local machine, using VirtualBox to create a virtual machine running Ubuntu, and then on a remote machine created with Amazon Web Services (specifically AWS EC2).

In the third section we will learn to create a cluster of machines with Spark, in two different ways:

Using AWS EMR (Elastic MapReduce)
Using Databricks, a Big Data analytics platform co-founded by the creator of Spark

In the fourth section we will study Spark's main data structure, the Resilient Distributed Dataset (RDD), first introducing the theory behind how it works and then working through some practical exercises to explore its API.
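
The idea behind an RDD can be illustrated without Spark. The sketch below is plain Python, not the PySpark API: it splits a collection into partitions, applies a map step to each partition independently, and merges the partial results with a reduce step, which is roughly the pattern that RDD transformations and actions express. The function names (partition, map_partition, word_count) and the sample reviews are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def partition(records, num_partitions):
    """Split a list into roughly equal chunks, one per simulated worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(lines):
    """Map step: count the words found within a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(records, num_partitions=2):
    """Count words by mapping each partition, then reducing the partial counts."""
    partials = [map_partition(p) for p in partition(records, num_partitions)]
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample data standing in for a much larger dataset.
reviews = ["great product", "great price", "bad product"]
totals = word_count(reviews)
print(totals["great"], totals["product"], totals["bad"])
```

The result does not depend on how many partitions are used, which is what makes this kind of computation easy to distribute. In PySpark, a comparable word count is usually written with RDD methods such as flatMap, map, and reduceByKey.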

In the fifth section we will get our hands dirty with the first lab, in which we will analyze a dataset containing 22.5 million Amazon product reviews.

In the sixth section we will introduce a higher-level data structure that Spark provides in its more recent versions: the DataFrame. We will briefly discuss how it works and then see how it can be used in practice. We will also see how to create a SQL table from a DataFrame and then query it with SELECT statements.

In the seventh section we will carry out a second lab, using a DataFrame to analyze 28 million movie reviews.

In the eighth section we will discuss time series and analyze Apple's stock from 1980 to today.

In the ninth section we will discuss Machine Learning, discovering how it works and what it is for, and studying the two basic models for Regression and Classification, respectively:

Linear Regression
Logistic Regression
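
As a rough illustration of how these two models relate, the plain-Python sketch below computes a linear regression prediction as a weighted sum of features, and a logistic regression prediction as that same sum passed through the sigmoid function to obtain a probability. The weights, bias, and sample values are hypothetical, hand-picked numbers, not parameters trained on real data, and the function names are only for illustration.

```python
import math


def linear_predict(weights, bias, features):
    """Linear regression: a weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def logistic_predict(weights, bias, features):
    """Logistic regression: the linear score squashed into (0, 1) by the sigmoid."""
    score = linear_predict(weights, bias, features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical, hand-picked parameters (not trained on real data).
weights, bias = [2.0, -1.0], 0.5
sample = [1.0, 2.5]

estimate = linear_predict(weights, bias, sample)        # 2.0 - 2.5 + 0.5 = 0.0
probability = logistic_predict(weights, bias, sample)   # sigmoid(0.0) = 0.5
label = 1 if probability >= 0.5 else 0                  # threshold at 0.5
print(estimate, probability, label)
```

Regression returns the numeric score directly, while classification thresholds the probability to pick a class. Training, which is what a library such as MLlib handles, consists of finding the weights and bias that best fit the data.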

At the end of this section we will introduce Spark's MLlib (Machine Learning Library) module, which lets us build distributed Machine Learning models.

In sections ten and eleven we will see how to use the MLlib module with its DataFrame API to solve simple regression and classification problems, such as:

Estimating the value of homes from their characteristics
Recognizing a malignant breast tumor from a needle biopsy

In section twelve we will use what we have learned about Machine Learning and MLlib to build a Sentiment Analysis model using the Yelp dataset, which contains over 5 GB of reviews of venues and businesses.

To train the Machine Learning model on the entire dataset, we will use an AWS EMR cluster, learning how to configure a cluster and import large amounts of data into the Hadoop Distributed File System (HDFS) from an S3 bucket using the s3-dist-cp utility.

In the section that follows we will introduce one of Spark's most popular extensions: Spark Streaming, which lets us analyze and process data streams in real time.

In the section after that we will carry out a project using Spark Streaming and the Twitter API: we will monitor, in real time, all published tweets related to a topic of our choice and create an interactive chart of the most popular hashtags.
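
Spark Streaming treats a live stream as a sequence of small batches. The sketch below mimics that idea in plain Python, without Spark or the Twitter API: each micro-batch of tweet texts is folded into a running count of hashtags, from which the current top hashtags can be read at any time. The sample tweets and the function names (extract_hashtags, update_counts) are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return the lowercase hashtags found in one tweet's text."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def update_counts(running, batch):
    """Fold one micro-batch of tweets into the running hashtag counts."""
    for text in batch:
        running.update(extract_hashtags(text))
    return running


# Hypothetical micro-batches standing in for a live stream of tweets.
batches = [
    ["Learning #Spark today", "#spark and #python"],
    ["More #Python", "#spark streaming demo"],
]

running = Counter()
for batch in batches:
    update_counts(running, batch)
    print(running.most_common(2))  # current top hashtags after this batch
```

Keeping only a running counter, rather than every tweet, is what lets this kind of computation keep up with an unbounded stream.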

Why take this course?

Big Data is the future. Knowing how to exploit it will be a huge advantage for professionals and entrepreneurs alike, so don't miss this opportunity.

What's inside

Syllabus

Introduction

What is Big Data?

Frequently Asked Questions

The advantages of Big Data

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers

An introductory course that teaches the basics of Spark and Python for Big Data analysis

Focuses on hands-on Big Data analysis using Spark and Python

Provides hands-on exercises with large datasets, such as Amazon product reviews and movie reviews

Introduces advanced concepts such as stream processing, time series analysis, and Machine Learning

Taught by Giuseppe Gullo, an Artificial Intelligence professional

Requires some familiarity with programming and data handling

Reviews summary

Highly rated course on data analysis with Python and Spark 2.4

According to students, this course on data analysis with Python and Spark 2.4 is largely positive. Learners appreciate the engaging lessons, the easy-to-understand examples, and the instructor's expertise. The course comprehensively covers the key concepts of data analysis, including structured and unstructured data, machine learning, and data visualization. Students found the course easy to follow and interesting, and many reported gaining new skills and knowledge. However, some students reported problems with outdated code and discrepancies between the video lessons and the course materials. Overall, this course is highly recommended for students who want to deepen their knowledge of data analysis.

The course thoroughly covers the key concepts of data analysis.

"Very informative and great fit for me.Course was clearly presented as I am new to online learning."

"The course is very informative and well-presented."

"The course covered all the areas of LangChain that I needed to build my project and customize it to my needs. Overall, excellent."

Students report engaging, easy-to-follow lessons.

"It’s very engaging so far"

"I like the way the course is set up. I've found the topics easy to remember so far"

"Well explained and really good to see that he takes into account feedback from the students and keep updating the course according to the feedback"

The instructor is considered competent and knowledgeable.

"knowledgeable and easy to understand"

"The Instructor is GREAT!!!"

"Thank you for sharing your knowledge and making it such a pleasant experience."

Problems were reported with outdated code and discrepancies between the video lessons and the course materials.

"yes, it is a good experience until now."

"Contents were good. However, there are some outdated codes which do not functional anymore. Needs to be updated."

"It is very difficult to follow up, and the codes in resources do not match the code in the video."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Analytics con Python e Spark 2.4: il Corso Completo with these activities:

Review Spark Basics

Review the basics of Spark to ensure a foundation for success in this course.

Review the Spark documentation
Complete a Spark tutorial
Take a Spark quiz or assessment

Spark Study Group

Join a study group to discuss course concepts and collaborate on projects.

Find or start a Spark study group
Meet regularly to discuss course materials
Work together on assignments and projects
Provide support and feedback to each other

Spark RDD Exercises

Complete hands-on exercises to strengthen your understanding of Spark RDDs.

Solve RDD problems on LeetCode or HackerRank
Create your own RDD-based Spark application
Attend a Spark RDD workshop

Spark DataFrame Analysis Project

Apply your DataFrame skills to analyze a real-world dataset.

Choose a dataset of interest
Load the dataset into Spark
Create DataFrames from the dataset
Perform data analysis tasks on the DataFrames
Visualize the results of your analysis

Contribute to Spark Open Source

Contribute to the Spark community by volunteering your time to the project.

Submit a bug report
Find an area of the Spark project to contribute to
Fix a bug or implement a new feature
Review code and provide feedback
Participate in the Spark community forums

Spark Machine Learning Tutorial

Deepen your understanding of Spark Machine Learning by creating your own tutorial.

Choose a Machine Learning algorithm to cover
Implement the algorithm using Spark MLlib
Write a step-by-step tutorial explaining your implementation
Share your tutorial with the community

Kaggle Spark Competition

Put your Spark skills to the test by participating in a Kaggle competition.

Find a Kaggle competition that uses Spark
Build a Spark-based solution
Submit your solution and track your progress
Learn from the competition and improve your Spark skills

Big Data Analytics Project

Apply the skills learned in this course to a comprehensive Big Data Analytics project.

Define the project scope and goals
Gather and prepare the necessary data
Build a Spark-based data pipeline
Perform data analysis and modeling
Present your findings and insights

Career center

Learners who complete Big Data Analytics con Python e Spark 2.4: il Corso Completo will develop knowledge and skills that may be useful to these careers:

Data Analyst

Data Analysts play a vital role in the success of organizations by collecting, analyzing, and interpreting data to improve decision-making. This course provides a comprehensive introduction to Big Data Analytics using Python and Spark, two of the most popular technologies in the field. By taking this course, you will gain the skills necessary to work with large datasets, identify trends, and make predictions, which are essential for success in Data Analyst roles.

Data Scientist

Data Scientists are in high demand due to their expertise in extracting insights from complex data. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Data Scientists. The course covers topics such as data preprocessing, feature engineering, and model building, which are crucial for success in this field.

Machine Learning Engineer

Machine Learning Engineers are responsible for building and deploying machine learning models. This course provides a practical introduction to Machine Learning using Spark MLlib, a popular library for distributed machine learning. By taking this course, you will gain the skills to develop and implement machine learning solutions, which are increasingly used in various industries.

Data Engineer

Data Engineers are responsible for building and maintaining the infrastructure that supports data analysis. This course provides an overview of Big Data technologies, including Spark and Hadoop, which are essential for Data Engineers. By taking this course, you will gain the skills to design and manage data pipelines, which are critical for handling large amounts of data.

Business Analyst

Business Analysts use data to improve business processes and make better decisions. This course provides a solid foundation in Big Data Analytics, which is increasingly used by Business Analysts to gain insights from large datasets. By taking this course, you will learn how to analyze data, identify trends, and make recommendations, which are essential for success in this field.

Software Engineer

Software Engineers with knowledge of Big Data technologies are in high demand. This course provides a comprehensive introduction to Spark, a popular framework for distributed computing. By taking this course, you will gain the skills to develop and deploy scalable data-intensive applications, which are essential for success in this field.

Database Administrator

Database Administrators (DBAs) are responsible for managing and maintaining databases. This course provides an overview of Big Data technologies, including Spark and Hadoop, which are increasingly used to store and process large datasets. By taking this course, you will gain the skills to manage and optimize Big Data databases, which is essential for success in this field.

Statistician

Statisticians use data to solve problems and make predictions. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Statisticians. By taking this course, you will gain the skills to analyze large datasets, identify trends, and make predictions, which are essential for success in this field.

Operations Research Analyst

Operations Research Analysts use data to improve the efficiency of systems and processes. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Operations Research Analysts. By taking this course, you will gain the skills to analyze large datasets, identify inefficiencies, and make recommendations for improvement, which are essential for success in this field.

Market Research Analyst

Market Research Analysts use data to understand consumer behavior and market trends. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Market Research Analysts. By taking this course, you will gain the skills to analyze large datasets, identify trends, and make recommendations for marketing strategies, which are essential for success in this field.

Financial Analyst

Financial Analysts use data to make investment decisions. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Financial Analysts. By taking this course, you will gain the skills to analyze large datasets, identify trends, and make recommendations for investment strategies, which are essential for success in this field.

Risk Analyst

Risk Analysts use data to identify and assess risks. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Risk Analysts. By taking this course, you will gain the skills to analyze large datasets, identify risks, and make recommendations for risk mitigation strategies, which are essential for success in this field.

Quantitative Analyst

Quantitative Analysts use data to develop and implement mathematical models for financial analysis. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Quantitative Analysts. By taking this course, you will gain the skills to analyze large datasets, develop models, and make predictions, which are essential for success in this field.

Actuary

Actuaries use data to assess and manage risk. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Actuaries. By taking this course, you will gain the skills to analyze large datasets, identify risks, and make recommendations for risk management strategies, which are essential for success in this field.

Data Journalist

Data Journalists use data to tell stories and explain complex issues. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Data Journalists. By taking this course, you will gain the skills to analyze large datasets, identify trends, and present findings in a clear and engaging way, which are essential for success in this field.
