We may earn an affiliate commission when you visit our partners.
Giuseppe Gullo and Profession AI

Learn to use the latest Big Data analysis technologies with the world's most popular programming language: Spark and Python.

We have entered the era of Big Data. Today data is the new oil, and knowing how to process and analyze it means having a guaranteed job in the very near future and an enormous competitive advantage over business rivals.

In this course we will learn to work with Big Data using Spark, the world's most popular distributed computing framework, used in production by giants such as Amazon, Microsoft, Oracle, Verizon, and Cisco.

What will we do during the course?

In the first section of the course we will introduce the topic of Big Data: what it is, where it comes from, and how it can be exploited.

We will look at the main technologies used for Big Data (Apache Hadoop, Hadoop MapReduce, and Spark), clarifying their differences, weaknesses, and strengths.

In the second section we will see how to install and configure Spark, first on a local machine, using VirtualBox to create a virtual machine running Ubuntu, and then on a remote machine created with Amazon Web Services, specifically AWS EC2.

In the third section we will learn to create a cluster of machines running Spark, and we will do it in two different ways:

  • Using AWS EMR (Elastic MapReduce)

  • Using Databricks, a Big Data analytics platform co-founded by the creator of Spark.

In the fourth section we will study Spark's core data structure, the Resilient Distributed Dataset (RDD), introducing the theory behind how it works and then doing some hands-on exercises to explore its API.
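
As a rough illustration of the programming model that RDD exercises usually revolve around, here is a minimal plain-Python sketch of a partitioned word count. It does not use Spark at all: `parallelize`, `flat_map`, `map_records`, and `reduce_by_key` are hypothetical helper names chosen to echo common RDD operations, and the two input lines are made-up sample text.

```python
def parallelize(records, num_partitions):
    """Split a list into partitions (round-robin), standing in for a distributed dataset."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def flat_map(partitions, fn):
    """Apply fn to every record and flatten the results, partition by partition."""
    return [[out for record in part for out in fn(record)] for part in partitions]


def map_records(partitions, fn):
    """Apply fn to every record, keeping the partition layout unchanged."""
    return [[fn(record) for record in part] for part in partitions]


def reduce_by_key(partitions, fn):
    """Combine values per key inside each partition, then merge the partial results."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = fn(local[key], value) if key in local else value
        for key, value in local.items():
            merged[key] = fn(merged[key], value) if key in merged else value
    return merged


# Made-up sample input, not taken from the course material.
lines = ["spark makes big data simple", "big data needs big clusters"]

parts = parallelize(lines, 2)
words = flat_map(parts, str.split)
pairs = map_records(words, lambda word: (word, 1))
word_counts = reduce_by_key(pairs, lambda a, b: a + b)
```

The point of the sketch is the shape of the computation: each step works on one partition at a time, and only the final merge needs to see results from every partition, which is what lets a real cluster spread the work across machines.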

In the fifth section we will get our hands dirty with the first lab, in which we will analyze a dataset containing 22.5 million Amazon product reviews.

In the sixth section we will introduce a higher-level data structure that Spark provides in its more recent versions: the DataFrame. We will briefly discuss how it works and then see how it can be used in practice. We will also see how to create a SQL table from a DataFrame and then query it with SELECT queries.

In the seventh section we will do a second lab, using a DataFrame to analyze 28 million movie reviews.

In the eighth section we will talk about time series and analyze Apple's stock from 1980 to the present.
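
To give a flavor of the kind of calculation a time series analysis of stock prices typically starts with, here is a small plain-Python sketch that computes a moving average and simple period-over-period returns. The function names are hypothetical, and the prices are made-up toy values, not real Apple quotes.

```python
def moving_average(values, window):
    """Return the mean of each consecutive run of `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def simple_returns(values):
    """Return the relative change between each value and the one before it."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Toy closing prices, invented for illustration only.
closing_prices = [10.0, 11.0, 12.0, 11.0, 13.0]

ma3 = moving_average(closing_prices, 3)
returns = simple_returns(closing_prices)
```

A moving average smooths out short-term noise, while returns make prices from different periods comparable; on a real dataset the same logic would be expressed with the analysis library's own windowing functions.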

In the ninth section we will talk about Machine Learning, discovering how it works and what it is for, and studying the two basic models for regression and classification, respectively:

  • Linear Regression

  • Logistic Regression
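
The two models listed above can be sketched in a few lines of plain Python for the single-feature case. This is only a minimal illustration under stated assumptions: `fit_linear` uses the closed-form least-squares solution, `fit_logistic` uses plain gradient descent with an arbitrarily chosen learning rate and number of epochs, and all data points are invented toy values. It is not how MLlib implements these models.

```python
import math


def fit_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Regression on toy points that lie exactly on y = 2x + 1.
slope, intercept = fit_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Classification on toy points: negative values are class 0, positive are class 1.
w, b = fit_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])


def predict(x):
    """Classify x as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


predictions = [predict(x) for x in [-2.0, -1.0, 1.0, 2.0]]
```

Linear regression outputs a number directly, whereas logistic regression passes the same linear combination through a sigmoid to obtain a probability that is then thresholded into a class.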

At the end of this section we will introduce Spark's MLlib (Machine Learning Library) module, which lets us build distributed Machine Learning models.

In sections ten and eleven we will see how to use the MLlib module through its DataFrame API to solve simple regression and classification problems, such as:

  • Estimating the value of homes from their features

  • Recognizing a malignant breast tumor from a needle biopsy

In section twelve we will use what we have learned about Machine Learning and MLlib to build a Sentiment Analysis model using the Yelp dataset, which contains over 5 GB of reviews of venues and businesses.

To train the Machine Learning model on a dataset this large we will use an AWS EMR cluster, learning how to configure a cluster and how to import large amounts of data into the Hadoop Distributed File System (HDFS) from an S3 bucket using the s3-dist-cp utility.

In the thirteenth section we will introduce one of Spark's hottest extensions: Spark Streaming, which lets us analyze and process data streams in real time.

In the fourteenth section we will carry out a project using Spark Streaming and the Twitter API: we will monitor in real time all tweets published on a topic of our choice, and we will create an interactive chart of the most popular hashtags.
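
The core idea of such a project, counting hashtags as batches of tweets arrive, can be sketched in plain Python. This is only an illustration of the micro-batch pattern, not Spark Streaming or the Twitter API: `extract_hashtags` and `process_stream` are hypothetical names, and the tweets are invented sample text.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return all hashtags in a tweet, lowercased so that #Spark and #spark match."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def process_stream(batches, top_n=3):
    """Consume batches one at a time, keep running totals, and yield the current top hashtags."""
    totals = Counter()
    for batch in batches:
        for tweet in batch:
            totals.update(extract_hashtags(tweet))
        yield totals.most_common(top_n)


# Invented sample tweets, grouped into two micro-batches.
batches = [
    ["Learning #Spark today", "#spark and #python together"],
    ["#Python for #BigData", "more #spark"],
]

snapshots = list(process_stream(batches))
```

Each snapshot is the ranking that a live chart would redraw after a batch arrives; in a real streaming job the running totals would be kept as distributed state rather than in a single in-memory counter.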

Why take this course?

Big Data is the future, and knowing how to exploit it will be a huge advantage for professionals and entrepreneurs alike. Don't miss this opportunity.

What's inside

Syllabus

Introduction
What is Big Data?
Frequently Asked Questions
The advantages of Big Data

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
An introductory course that teaches the basics of Spark and Python for Big Data analysis
Focuses on hands-on Big Data analysis using Spark and Python
Provides hands-on exercises with large datasets, such as Amazon product reviews and movie reviews
Introduces advanced concepts such as data stream processing, time series analysis, and Machine Learning
Taught by Giuseppe Gullo, an Artificial Intelligence professional
Requires some familiarity with programming and data management

Reviews summary

Highly regarded course on data analysis with Python and Spark 2.4

According to students, this course on data analysis with Python and Spark 2.4 is largely positive. Learners appreciate the engaging lessons, the easy-to-understand examples, and the instructor's expertise. The course thoroughly covers the key concepts of data analysis, including structured and unstructured data, machine learning, and data visualization. Students found the course easy to follow and interesting, and many reported gaining new skills and knowledge. However, some students reported problems with outdated code and discrepancies between the video lessons and the course materials. Overall, this course is highly recommended for students who want to deepen their knowledge of data analysis.
The course covers the key concepts of data analysis thoroughly.
"Very informative and great fit for me. Course was clearly presented as I am new to online learning."
"The course is very informative and well-presented."
"The course covered all the areas of LangChain that I needed to build my project and customize it to my needs. Overall, excellent."
Students report engaging, easy-to-follow lessons.
"It’s very engaging so far"
"I like the way the course is set up. I've found the topics easy to remember so far"
"Well explained and really good to see that he takes into account feedback from the students and keep updating the course according to the feedback"
The instructor is regarded as competent and knowledgeable.
"knowledgeable and easy to understand"
"The Instructor is GREAT!!!"
"Thank you for sharing your knowledge and making it such a pleasant experience."
Problems were reported with outdated code and discrepancies between the video lessons and the course materials.
"yes, it is a good experience until now."
"Contents were good. However, there are some outdated codes which do not functional anymore. Needs to be updated."
"It is very difficult to follow up, and the codes in resources do not match the code in the video."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Analytics con Python e Spark 2.4: il Corso Completo with these activities:
Review Spark Basics
Review the basics of Spark to ensure a foundation for success in this course.
  • Review the Spark documentation
  • Complete a Spark tutorial
  • Take a Spark quiz or assessment
Spark Study Group
Join a study group to discuss course concepts and collaborate on projects.
  • Find or start a Spark study group
  • Meet regularly to discuss course materials
  • Work together on assignments and projects
  • Provide support and feedback to each other
Spark RDD Exercises
Complete hands-on exercises to strengthen your understanding of Spark RDDs.
  • Solve RDD problems on LeetCode or HackerRank
  • Create your own RDD-based Spark application
  • Attend a Spark RDD workshop
Spark DataFrame Analysis Project
Apply your DataFrame skills to analyze a real-world dataset.
  • Choose a dataset of interest
  • Load the dataset into Spark
  • Create DataFrames from the dataset
  • Perform data analysis tasks on the DataFrames
  • Visualize the results of your analysis
Contribute to Spark Open Source
Contribute to the Spark community by volunteering your time to the project.
  • Submit a bug report
  • Find an area of the Spark project to contribute to
  • Fix a bug or implement a new feature
  • Review code and provide feedback
  • Participate in the Spark community forums
Spark Machine Learning Tutorial
Deepen your understanding of Spark Machine Learning by creating your own tutorial.
  • Choose a Machine Learning algorithm to cover
  • Implement the algorithm using Spark MLlib
  • Write a step-by-step tutorial explaining your implementation
  • Share your tutorial with the community
Kaggle Spark Competition
Put your Spark skills to the test by participating in a Kaggle competition.
  • Find a Kaggle competition that uses Spark
  • Build a Spark-based solution
  • Submit your solution and track your progress
  • Learn from the competition and improve your Spark skills
Big Data Analytics Project
Apply the skills learned in this course to a comprehensive Big Data Analytics project.
  • Define the project scope and goals
  • Gather and prepare the necessary data
  • Build a Spark-based data pipeline
  • Perform data analysis and modeling
  • Present your findings and insights

Career center

Learners who complete Big Data Analytics con Python e Spark 2.4: il Corso Completo will develop knowledge and skills that may be useful to these careers:
Data Analyst
Data Analysts play a vital role in the success of organizations by collecting, analyzing, and interpreting data to improve decision-making. This course provides a comprehensive introduction to Big Data Analytics using Python and Spark, two of the most popular technologies in the field. By taking this course, you will gain the skills necessary to work with large datasets, identify trends, and make predictions, which are essential for success in Data Analyst roles.
Data Scientist
Data Scientists are in high demand due to their expertise in extracting insights from complex data. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Data Scientists. The course covers topics such as data preprocessing, feature engineering, and model building, which are crucial for success in this field.
Machine Learning Engineer
Machine Learning Engineers are responsible for building and deploying machine learning models. This course provides a practical introduction to Machine Learning using Spark MLlib, a popular library for distributed machine learning. By taking this course, you will gain the skills to develop and implement machine learning solutions, which are increasingly used in various industries.
Data Engineer
Data Engineers are responsible for building and maintaining the infrastructure that supports data analysis. This course provides an overview of Big Data technologies, including Spark and Hadoop, which are essential for Data Engineers. By taking this course, you will gain the skills to design and manage data pipelines, which are critical for handling large amounts of data.
Business Analyst
Business Analysts use data to improve business processes and make better decisions. This course provides a solid foundation in Big Data Analytics, which is increasingly used by Business Analysts to gain insights from large datasets. By taking this course, you will learn how to analyze data, identify trends, and make recommendations, which are essential for success in this field.
Software Engineer
Software Engineers with knowledge of Big Data technologies are in high demand. This course provides a comprehensive introduction to Spark, a popular framework for distributed computing. By taking this course, you will gain the skills to develop and deploy scalable data-intensive applications, which are essential for success in this field.
Database Administrator
Database Administrators (DBAs) are responsible for managing and maintaining databases. This course provides an overview of Big Data technologies, including Spark and Hadoop, which are increasingly used to store and process large datasets. By taking this course, you will gain the skills to manage and optimize Big Data databases, which is essential for success in this field.
Statistician
Statisticians use data to solve problems and make predictions. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Statisticians. By taking this course, you will gain the skills to analyze large datasets, identify trends, and make predictions, which are essential for success in this field.
Operations Research Analyst
Operations Research Analysts use data to improve the efficiency of systems and processes. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Operations Research Analysts. By taking this course, you will gain the skills to analyze large datasets, identify inefficiencies, and make recommendations for improvement, which are essential for success in this field.
Market Research Analyst
Market Research Analysts use data to understand consumer behavior and market trends. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Market Research Analysts. By taking this course, you will gain the skills to analyze large datasets, identify trends, and make recommendations for marketing strategies, which are essential for success in this field.
Financial Analyst
Financial Analysts use data to make investment decisions. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Financial Analysts. By taking this course, you will gain the skills to analyze large datasets, identify trends, and make recommendations for investment strategies, which are essential for success in this field.
Risk Analyst
Risk Analysts use data to identify and assess risks. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Risk Analysts. By taking this course, you will gain the skills to analyze large datasets, identify risks, and make recommendations for risk mitigation strategies, which are essential for success in this field.
Quantitative Analyst
Quantitative Analysts use data to develop and implement mathematical models for financial analysis. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Quantitative Analysts. By taking this course, you will gain the skills to analyze large datasets, develop models, and make predictions, which are essential for success in this field.
Actuary
Actuaries use data to assess and manage risk. This course provides a practical introduction to Big Data Analytics using Python and Spark, essential technologies for Actuaries. By taking this course, you will gain the skills to analyze large datasets, identify risks, and make recommendations for risk management strategies, which are essential for success in this field.
Data Journalist
Data Journalists use data to tell stories and explain complex issues. This course provides a solid foundation in Big Data Analytics using Python and Spark, essential technologies for Data Journalists. By taking this course, you will gain the skills to analyze large datasets, identify trends, and present findings in a clear and engaging way, which are essential for success in this field.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data Analytics con Python e Spark 2.4: il Corso Completo.
Is the official guide to Spark, written by its creators. It provides a deep dive into Spark's internals and covers advanced topics such as performance tuning, security, and integration with other big data technologies.
Comprehensive guide to Spark, covering everything from its architecture and programming model to advanced topics such as graph processing and machine learning. It's an excellent resource for both beginners and experienced Spark users.
Is the official guide to Hadoop, written by its creator. It provides a deep dive into Hadoop's architecture and programming model, and covers advanced topics such as performance tuning, security, and integration with other big data technologies.
Comprehensive guide to using Python for data analysis. It covers a wide range of topics, from data manipulation and visualization to machine learning. It's a valuable resource for anyone who wants to use Python for data analysis.
Provides a practical introduction to data science, covering topics such as data collection, analysis, and visualization. It's a good resource for business professionals who want to learn more about how data can be used to improve decision-making.
Guide to using Apache Spark for advanced analytics. It covers a wide range of topics, including machine learning, streaming data, and graph analytics.
Hands-on guide to using Python for deep learning. It covers a wide range of topics, including deep neural networks, convolutional neural networks, and recurrent neural networks.
Practical guide to using Python for natural language processing. It covers a wide range of topics, including natural language understanding, natural language generation, and machine translation.
