This project contains the source code and related files for solving data science problems with Python. It explores libraries such as Pandas, Matplotlib, NumPy, and Beautiful Soup, along with regular expressions (Regex), among others.
data-science/scrape/beutifulsoup/
A simple scraper that collects information from a web page. It uses the Requests and BeautifulSoup libraries to fetch the page and parse its HTML with the html.parser parser, then extracts every link referenced in the page. The code pre-processes the data and passes the template an object containing the list of anchors, so the page renders all the scraped links.
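The flow described above could look roughly like the sketch below. It is not the project's actual code: the helper names `extract_links` and `scrape_links` and the sample HTML are hypothetical, and the parsing step is run on an inline string so the example works without network access.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return one {'text', 'href'} dict per anchor that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"text": a.get_text(strip=True), "href": a["href"]}
        for a in soup.find_all("a", href=True)
    ]


def scrape_links(url):
    """Download a page and extract its links (performs a network request)."""
    import requests  # imported here so extract_links also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_links(response.text)


# Hypothetical sample page; the second anchor has no href and is skipped.
sample_html = (
    '<p><a href="/docs">Docs</a> <a name="top">no link</a> '
    '<a href="https://example.com"> Example </a></p>'
)
links = extract_links(sample_html)
```

In a Django view, a list like `links` would typically be placed in the context dictionary passed to the template, which can then loop over it to render the anchors.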
data-science/books-filter/
This project aims to provide a user-friendly and efficient platform for managing and exploring a catalog of books using Django and XML data. It can be expanded and customized to meet specific requirements and user needs.
data-science/regex/
This project uses regular expressions (Regex) to extract, preprocess, and analyze information from raw data. It develops a text-processing pipeline with two goals: cleaning and preprocessing text data, and extracting structured information through pattern recognition. It is written in Python and uses the 're' module for regular expressions.
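A minimal sketch of such a pipeline, using only the standard `re` module. The sample text, the patterns, and the function names `clean` and `extract` are illustrative assumptions, not taken from the project.

```python
import re

# Hypothetical raw input with irregular whitespace.
raw = "  Contact:   Ana <ana@example.com>,\tphone 555-0134;  order #A-1029 on 2023-09-14.  "

# Deliberately simple patterns for illustration; real-world email
# validation needs a stricter expression.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def clean(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def extract(text):
    """Clean the text, then pull out emails and the first ISO-style date."""
    cleaned = clean(text)
    date = DATE.search(cleaned)
    return {
        "text": cleaned,
        "emails": EMAIL.findall(cleaned),
        "date": date.groupdict() if date else None,
    }


record = extract(raw)
```

Named groups such as `(?P<year>...)` turn a match directly into a dictionary, which is a convenient way to move from free text to structured fields.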
data-science/pandas/data-cleaning/
This Django project demonstrates how to use the Pandas library to read data from CSV files, sort, modify, and format it, and then present the processed data on a web page using Django's built-in functionality. The goal is a web application that displays structured, formatted CSV data as easily digestible HTML.
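As a rough illustration of the read, sort, format, and render steps, the sketch below works on a small inline CSV. The column names and values are invented for the example, and the Django view itself is left out.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,city,sales
carol,Lima,1520.5
alice,Quito,980
bob,Lima,2310.75
"""

df = pd.read_csv(io.StringIO(csv_text))

# Sort by the numeric column before it is turned into display text.
df = df.sort_values("sales", ascending=False).reset_index(drop=True)

# Modify and format: capitalize names, show sales with two decimals.
df["name"] = df["name"].str.title()
df["sales"] = df["sales"].map("{:,.2f}".format)

# Render the frame as an HTML table string.
table_html = df.to_html(index=False)
```

A view could pass `table_html` to a template in its context; since the string is already HTML, the template would need to mark it as safe for it to render as a table.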
data-science/pandas/data-cleaning/census/
This application uses census data from the United States Census Bureau. The dataset is read from the file "census.csv" and processed as a Pandas DataFrame. Counties serve as both political and geographical subdivisions within the United States. The dataset contains population statistics for counties and states across the USA for the years 2010 to 2015.
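The sketch below shows one way county and state rows might be separated and summarized with a DataFrame. The column names (`SUMLEV`, `STNAME`, `CTYNAME`, `POPESTIMATE2010`, `POPESTIMATE2015`) and the convention that `SUMLEV == 50` marks a county row are assumptions about the layout of "census.csv", and the rows are placeholder values, not real census figures.

```python
import io

import pandas as pd

# Placeholder rows; assumed layout, not real data.
csv_text = """SUMLEV,STNAME,CTYNAME,POPESTIMATE2010,POPESTIMATE2015
40,State A,State A,300,330
50,State A,County A1,100,120
50,State A,County A2,200,210
40,State B,State B,500,490
50,State B,County B1,500,490
"""

census = pd.read_csv(io.StringIO(csv_text))

# Keep only county-level rows (assumed to be SUMLEV == 50).
counties = census[census["SUMLEV"] == 50].copy()

# Population change per county between the first and last year.
counties["change"] = counties["POPESTIMATE2015"] - counties["POPESTIMATE2010"]

# Aggregate county estimates back up to state level.
state_totals = counties.groupby("STNAME")["POPESTIMATE2015"].sum()

# County with the largest absolute growth.
fastest = counties.sort_values("change", ascending=False).iloc[0]
```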
data-science/pandas/hypothesis/
This project tests the hypothesis that university towns have housing markets that are more resilient during economic downturns. It combines several datasets and applies statistical analyses to examine housing price dynamics in these towns.
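One common way to compare two groups of towns is a two-sample t-test; the source does not say which test the project uses, so treat the choice as an assumption. The values below are made-up price ratios (price before the downturn divided by price at its lowest point) used only to show the mechanics.

```python
from scipy import stats

# Made-up illustrative ratios; a lower ratio means a smaller price drop.
university_towns = [0.98, 1.01, 0.97, 1.02, 0.99, 1.00]
other_towns = [1.08, 1.12, 1.05, 1.10, 1.07, 1.11]

# Two-sample t-test on the means of the two groups.
result = stats.ttest_ind(university_towns, other_towns)
t_stat, p_value = result.statistic, result.pvalue

# Reject the null hypothesis of equal means at the 1% level (assumed threshold).
different = p_value < 0.01
```

A negative `t_stat` here means the first group has the lower mean ratio; whether that holds for real data depends entirely on the datasets used.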
data-science/data-visualization/
This project examines historical temperature trends using Pandas and Matplotlib in a Python (Django) project. The outcome is a visualization of temperature data that shows how temperatures have changed over time and illustrates how data visualization can convey complex information.
data-science/data-visualization/customize/
data-science/data-visualization/subplots/
This project analyzes and visualizes daily climate data from various regions of Indonesia from 2010 to 2020. Using Matplotlib and Pandas, it arranges the climate trends of different weather stations in subplots. The focus is on identifying temperature variations and illustrating long-term trends with trendlines, which makes patterns and changes over time easier to detect.
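The subplot-plus-trendline idea can be sketched as follows. The station names and temperature series are synthetic, generated only for the example, and the trendline is a least-squares straight line from `numpy.polyfit`, which is one possible choice rather than the project's confirmed method.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily temperatures for two hypothetical stations.
dates = pd.date_range("2010-01-01", "2020-12-31", freq="D")
days = np.arange(len(dates))
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "Station A": 26.0 + 0.0002 * days + rng.normal(0, 0.8, len(dates)),
        "Station B": 27.5 + 0.0001 * days + rng.normal(0, 0.8, len(dates)),
    },
    index=dates,
)

# One subplot per station, sharing the time axis.
fig, axes = plt.subplots(len(df.columns), 1, sharex=True, figsize=(8, 6))
slopes = {}
for ax, name in zip(axes, df.columns):
    # Fit a degree-1 polynomial: temperature = slope * day + intercept.
    slope, intercept = np.polyfit(days, df[name].to_numpy(), 1)
    slopes[name] = slope
    ax.plot(df.index, df[name], linewidth=0.5, label="daily mean")
    ax.plot(df.index, intercept + slope * days, color="red", label="trend")
    ax.set_title(name)
    ax.set_ylabel("°C")
    ax.legend(loc="upper left")
fig.tight_layout()
```

Calling `fig.savefig(...)` afterwards would write the figure to an image file that a web page can display.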
data-science/machine_learning/supervised/knn/
This data science project combines full-stack web development and data analysis to build a predictive model for cancer detection with the K-Nearest Neighbors (KNN) algorithm from Python's scikit-learn library. Its primary objective is to integrate this model into a Django web application so that users can run cancer diagnosis predictions.
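A minimal KNN sketch with scikit-learn is shown below. The source does not name the dataset, so the example assumes scikit-learn's bundled breast cancer dataset; the choice of `n_neighbors=5`, the feature scaling step, and the train/test split are likewise illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed dataset: features and binary labels bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# KNN is distance-based, so features are standardized first.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of correct predictions
prediction = model.predict(X_test[:1])  # predicted class for one sample
```

In a Django application, a fitted pipeline like `model` could be loaded once and called from a view with the values submitted through a form.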