PyPI: The Python Package Index

What is PyPI?

The Python Package Index (PyPI) is a repository of software for the Python programming language. PyPI helps you find and install software developed and shared by the Python community.

PyPI is a web-based service that lets you browse, search for, and download Python packages. To use PyPI, you need to have a Python interpreter installed on your computer. Once you have Python installed, you can use the pip command-line tool to install packages.

To search for packages on PyPI, browse to https://pypi.org and use the search box on the site. (Older tutorials suggest the pip search command, but the API behind it has been disabled and the command no longer works against PyPI.) For example, searching for “hello” returns a list of all packages on PyPI that contain the word “hello” in their name or description.

To download a package from PyPI, you can use the pip install command. For example, to install the “hello” package, you would use the following command:

pip install hello

This will download the “hello” package from PyPI and install it on your computer.
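Once a package is installed, you can also confirm its presence and version from Python itself with the standard-library importlib.metadata module. The sketch below looks up pip rather than the hypothetical “hello” package, simply because pip is certain to be installed:

```python
from importlib import metadata

# Look up the installed version of a distribution; this raises
# metadata.PackageNotFoundError if the package is not installed.
version = metadata.version("pip")
print(version)
```

This is the same metadata pip itself records when it installs a package.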

How does PyPI work?

PyPI is a web-based service that is hosted by the Python Software Foundation. PyPI is made up of two parts:

  • A web server that hosts the PyPI website
  • A database that stores information about Python packages

The web server provides a way for users to search for and download Python packages. The database stores information about each package, including its name, version, description, and license.

When a user searches for a package on PyPI, the web server queries the database for packages that match the search criteria. The web server then returns a list of packages that match the search criteria to the user.
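The matching step described above can be pictured with a toy sketch. This is not PyPI's actual implementation, just an illustration of substring matching over package names and descriptions:

```python
# Toy package "database": a list of metadata records (illustrative data).
packages = [
    {"name": "hello-world", "description": "prints a friendly greeting"},
    {"name": "requests", "description": "HTTP for humans"},
]

def search(term, pkgs):
    # Return names of packages whose name or description contains the term.
    term = term.lower()
    return [p["name"] for p in pkgs
            if term in p["name"].lower() or term in p["description"].lower()]

print(search("hello", packages))
```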

When a user downloads a package from PyPI, the web server points the installer at PyPI’s file hosting (served through a content delivery network), which sends the package files to the user’s computer.
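Besides the website, PyPI exposes per-project metadata over a JSON API at https://pypi.org/pypi/&lt;name&gt;/json, which tools can query instead of scraping pages. The endpoint shapes below are PyPI's real API; the helper function itself is just an illustrative sketch:

```python
# Sketch: building request URLs for PyPI's JSON metadata API.
def pypi_json_url(name, version=None):
    if version:
        # Metadata for one specific release of the project
        return f"https://pypi.org/pypi/{name}/{version}/json"
    # Metadata for the project's latest release
    return f"https://pypi.org/pypi/{name}/json"

print(pypi_json_url("requests"))
```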

News about malware on PyPI

In recent years, there have been a number of reports of malware being distributed on PyPI. The most common technique is “typosquatting”: uploading a malicious package whose name closely resembles that of a popular project such as “requests” or “certifi”, in the hope that users will install it by mistake. Packages distributed this way have been caught stealing credentials and executing arbitrary code on the computers of users who installed them.

These incidents have raised concerns about the security of PyPI. In response to these concerns, the Python Software Foundation has taken steps to improve the security of PyPI. These steps include:

  • Requiring two-factor authentication, first for maintainers of the most heavily downloaded (“critical”) projects and later for all accounts that upload packages.
  • Adding “trusted publishing”, which lets projects upload releases directly from CI services such as GitHub Actions without long-lived passwords or API tokens.
  • Making it easier to report malicious packages and removing them from the index more quickly.

These steps have helped to improve the security of PyPI, but they are not a guarantee that malware will not be distributed on PyPI in the future. Users should always be careful when downloading packages from PyPI and should only download packages from trusted sources.
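One practical precaution along these lines is hash pinning: pip can verify downloads against known digests with its --require-hashes mode. The underlying check is a simple digest comparison, sketched below (the file contents are made up for illustration):

```python
import hashlib

def verify_sha256(data, expected_hex):
    # Compare the SHA-256 digest of the downloaded bytes against the
    # digest published for the release file.
    return hashlib.sha256(data).hexdigest() == expected_hex

artifact = b"example wheel contents"
good_digest = hashlib.sha256(artifact).hexdigest()
print(verify_sha256(artifact, good_digest))           # matching digest
print(verify_sha256(b"tampered bytes", good_digest))  # mismatch
```

A tampered download produces a different digest and fails the check, which is why pinned hashes protect against a compromised index or mirror.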

How to add a new project on PyPI

To add a new project to PyPI, you need to register an account on pypi.org, create a Python package, and then upload the package to PyPI.

To create a Python package, you need to create a directory for your project and then create a file called setup.py in the directory. The setup.py file contains information about your project, such as its name, version, description, and license.
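A minimal setup.py might look like the following; the project name and metadata values are placeholders, and newer projects often declare the same information in a pyproject.toml file instead:

```python
# Minimal setup.py; all metadata values here are illustrative.
from setuptools import setup, find_packages

setup(
    name="hello",
    version="0.1.0",
    description="An example package that says hello",
    license="MIT",
    packages=find_packages(),
)
```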

Once you have created your Python package, you can build its distribution files with the build tool (python -m build) and upload them to PyPI using the twine command-line tool. Both tools can be installed with pip (pip install build twine).

To upload your package to PyPI, you need to run the following command:

twine upload dist/*

This will upload all of the files in the dist directory to PyPI.

Once your package has been uploaded to PyPI, it will be available for users to download.

Conclusion

PyPI is a valuable resource for Python developers. It allows developers to find and install software developed and shared by the Python community. However, users should be aware that malware has been distributed on PyPI in the past. Users should always be careful when downloading packages from PyPI and should only download packages from trusted sources.

Apache Airflow 2.6.0: What’s New and How to Use It


Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative³. Apache Airflow 2.6.0 was released on April 30, 2023, bringing many minor features and improvements to the community¹. In this article, we will highlight some of the notable new features shipped in this release and show a simple example of how to use the taskflow decorator to define a DAG.

Some Notable New Features

  • Trigger logs can now be viewed in webserver: Trigger logs have now been added to task logs. They appear right alongside the rest of the logs from your task. Adding this feature required changes across the entire Airflow logging stack, so be sure to update your providers if you are using remote logging¹.
  • Grid view improvements: The grid view has received a number of minor improvements in this release. Most notably, there is now a graph tab in the grid view. This offers a more integrated graph representation of the DAG, where choosing a task in either the grid or graph will highlight the same task in both views. You can also filter upstream and downstream from a single task¹.
  • Trigger UI based on DAG level params: A user-friendly form is now shown to users triggering runs for DAGs with DAG level params. See the Params docs for more details¹.
  • Consolidation of handling stuck queued tasks: Airflow now has a single configuration, [scheduler] task_queued_timeout, to handle tasks that get stuck in queued for too long. With a simpler implementation than the outgoing code handling these tasks, tasks stuck in queued will no longer slip through the cracks and stay stuck¹.
  • Cluster Policy hooks can come from plugins: Cluster policy hooks (e.g. dag_policy), can now come from Airflow plugins in addition to Airflow local settings. By allowing multiple hooks to be defined, it makes it easier for more than one team to run hooks in a single Airflow instance. See the cluster policy docs for more details¹.
  • Notification support added: The notifications framework allows you to send messages to external systems when a task instance or DAG run changes state. For example, you can post a message to Slack by passing on_success_callback=[send_slack_notification(text="The DAG {{ dag.dag_id }} succeeded", channel="#general", username="Airflow")] when defining the DAG. As of today, Slack is the only system supported out of the box¹.
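The stuck-queued consolidation mentioned above comes down to a single setting. For example, to fail tasks that have sat in the queued state for more than ten minutes, you could configure airflow.cfg as below (the 600-second value is just an example; like any Airflow option, it can also be set through the AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT environment variable):

```
[scheduler]
task_queued_timeout = 600
```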

A Simple Example of Taskflow Decorator

The taskflow decorator is a way of defining DAGs using Python functions as tasks. It was introduced in Airflow 2.0 and has been improved since then. The taskflow decorator allows you to define multiple tasks with minimal boilerplate code and automatically handles XComs for you.

Here is a simple example of how to use the taskflow decorator to define a DAG that extracts, transforms, and loads some data:

from datetime import datetime, timedelta

from airflow.decorators import dag, task

@dag(
    dag_id="my_dag",
    description="A simple DAG using the taskflow decorator",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    default_args={
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    },
)
def my_dag():

    @task
    def extract():
        # Extract data (a hard-coded list stands in for reading a file)
        return [1, 2, 3]

    @task
    def transform(data):
        # Transform the data
        return [x * 2 for x in data]

    @task
    def load(data):
        # Load the data (printing stands in for writing to a database)
        print(data)

    # Calling the decorated functions chains the tasks and passes their
    # return values between them as XComs
    load(transform(extract()))

my_dag()

As you can see, each function decorated with @task becomes a task in the DAG. The function arguments and return values are automatically passed between tasks as XComs, and you can write any Python code inside the function bodies. Note that taskflow tasks are chained by calling the functions, rather than with the >> operator used with classic operators.

To run this DAG, save it as a Python file (e.g. simple_taskflow.py) and place it in your dags folder. Then you can trigger it from the webserver UI or the CLI.

Conclusion

Apache Airflow 2.6.0 is a minor release that brings many new features and improvements to the community. Some of the notable new features include trigger logs in webserver, grid view enhancements, trigger UI based on DAG level params, consolidation of handling stuck queued tasks, cluster policy hooks from plugins, and notification support. We also showed a simple example of how to use the taskflow decorator to define a DAG using Python functions as tasks.

If you want to learn more about Apache Airflow 2.6.0, you can check out the release notes, documentation, and blog.


Source:
(1) apache-airflow · PyPI. https://pypi.org/project/apache-airflow/.
(2) What’s New in Apache Airflow 2.6.0 | Apache Airflow. https://airflow.apache.org/blog/airflow-2.6.0/.
(3) Release Notes — Airflow Documentation – Apache Airflow. https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html.