diff --git a/module4.ipynb b/module4.ipynb index b6892ec..7e699ac 100644 --- a/module4.ipynb +++ b/module4.ipynb @@ -298,6 +298,214 @@ "print(result.stdout)\n", "print(result.stderr)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced subprocess management" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import subprocess\n", + "\n", + "# Copy the current environment so the parent's own environment stays unchanged\n", + "my_env = os.environ.copy()\n", + "# Prepend /opt/myapp/ to the PATH that the child process will see\n", + "my_env[\"PATH\"] = os.pathsep.join([\"/opt/myapp/\", my_env[\"PATH\"]])\n", + "\n", + "# Run myapp with the modified environment\n", + "result = subprocess.run([\"myapp\"], env=my_env)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Python subprocesses**\n", + "\n", + "In Python, there are usually many different ways to accomplish the same task. Some are easier to write, some are better suited to a given task, and some use less computing power. Subprocesses are a way to call and run other applications from within Python, including other Python scripts. The subprocess module runs new code and applications by launching them as new processes from your Python program. Because subprocess lets you spawn new processes, it is also a useful way to run multiple processes in parallel instead of sequentially.\n", + "\n", + "Python subprocess can launch processes to:\n", + "\n", + "- Open multiple data files in a folder simultaneously.\n", + "- Run external programs.\n", + "- Connect to input, output, and error pipes and get return codes.\n", + "\n", + "**Comparing subprocess to OS and Pathlib**\n", + "\n", + "Again, Python has multiple ways to achieve most tasks. The subprocess module is extremely powerful because it lets you do from Python anything you could do in the shell, and then get the results back into Python. But just because you can use subprocess doesn’t mean you always should.
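\n", + "\n", + "As a sketch of getting results back into Python, including the pipes and return codes from the list above, the following example sends text to a child process's stdin and reads back its stdout, stderr, and return code. It is a minimal illustration, not the only approach. The current interpreter (sys.executable) stands in for an external program, and the child's code, its input text, and the exit code 3 are hypothetical choices made for the demo.\n", + "\n", + "```python\n", + "import subprocess\n", + "import sys\n", + "\n", + "# Hypothetical child program: reads stdin, writes to stdout and stderr, exits with code 3\n", + "child_code = \"import sys; data = sys.stdin.read(); print(data.upper()); sys.stderr.write('warning'); sys.exit(3)\"\n", + "\n", + "result = subprocess.run(\n", + "    [sys.executable, '-c', child_code],\n", + "    input='hello from the parent',  # sent to the child's stdin\n", + "    capture_output=True,  # collect stdout and stderr\n", + "    text=True,  # work with str instead of bytes\n", + ")\n", + "\n", + "print(result.stdout.strip())  # HELLO FROM THE PARENT\n", + "print(result.stderr)  # warning\n", + "print(result.returncode)  # 3\n", + "```\n", + "\n", + "Because check=True is not passed, run() returns normally even though the child exits with a non-zero code. The code is simply stored in result.returncode.\n", + "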
\n", + "\n", + "Let’s compare subprocess to two of its alternatives: OS, which has been covered in other readings, and Pathlib. For tasks like getting the current working directory or creating a directory, OS and Pathlib are more direct (or “Pythonic,” meaning they use the language as it was intended). Using subprocess for tasks like these is like using a sledgehammer to crack a nut: it is heavy-duty and can be overkill for simple operations.\n", + "\n", + "As a comparison, each of the following commands gets the current working directory. The snippets assume import os, import subprocess, and from pathlib import Path have been run, and the subprocess versions assume a Unix-like system such as Linux or macOS.\n", + "\n", + "Subprocess:\n", + "\n", + "```python\n", + "cwd_subprocess = subprocess.check_output(['pwd'], text=True).strip()\n", + "```\n", + "\n", + "OS:\n", + "\n", + "```python\n", + "cwd_os = os.getcwd()\n", + "```\n", + "\n", + "Pathlib:\n", + "\n", + "```python\n", + "cwd_pathlib = Path.cwd()  # returns a Path object rather than a string\n", + "```\n", + "\n", + "Similarly, each of the following commands creates a directory.\n", + "\n", + "Subprocess:\n", + "\n", + "```python\n", + "subprocess.run(['mkdir', 'test_dir_subprocess2'])\n", + "```\n", + "\n", + "OS:\n", + "\n", + "```python\n", + "os.mkdir('test_dir_os2')\n", + "```\n", + "\n", + "Pathlib:\n", + "\n", + "```python\n", + "test_dir_pathlib2 = Path('test_dir_pathlib2')\n", + "test_dir_pathlib2.mkdir(exist_ok=True)  # exist_ok=True: no error if the directory already exists\n", + "```\n", + "\n", + "**When to use subprocess**\n", + "\n", + "Subprocess is best used when you need to interface with external processes, run complex shell commands, or need precise control over input and output. Keep in mind that subprocess starts a new process for every command it runs. Functions such as os.getcwd() and Path.cwd() run inside the Python interpreter itself, so subprocess has more overhead for simple tasks.\n", + "\n", + "**Other advantages include:**\n", + "\n", + "- Subprocess can run any shell command, providing greater flexibility.\n", + "- Subprocess can capture stdout and stderr easily.\n", + "\n", + "On the other hand, OS is useful for basic file and directory operations, for managing environment variables, and when you don't need the object-oriented approach provided by Pathlib.
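\n", + "\n", + "To illustrate the environment variable management mentioned above, here is a minimal sketch that sets a variable with OS and shows that a child process started with subprocess inherits it by default. DEMO_GREETING is a hypothetical variable name, and the current interpreter (sys.executable) stands in for an external program.\n", + "\n", + "```python\n", + "import os\n", + "import subprocess\n", + "import sys\n", + "\n", + "# DEMO_GREETING is a hypothetical variable used only for this demo\n", + "os.environ['DEMO_GREETING'] = 'hello'\n", + "print(os.getenv('DEMO_GREETING'))  # hello\n", + "\n", + "# Child processes inherit the parent's environment unless env= is passed\n", + "child = subprocess.run(\n", + "    [sys.executable, '-c', 'import os; print(os.environ[\"DEMO_GREETING\"])'],\n", + "    capture_output=True,\n", + "    text=True,\n", + ")\n", + "print(child.stdout.strip())  # hello\n", + "```\n", + "\n", + "This differs from the my_env cell at the start of this section, which passes a modified copy through env= so that the parent's own environment stays unchanged.\n", + "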
\n", + "\n", + "**Other advantages include:**\n", + "\n", + "- OS provides a simple way to interface with the operating system for basic operations.\n", + "- OS is part of the standard library, so it's widely available.\n", + "\n", + "Finally, Pathlib is most helpful when you work extensively with file paths, when you want an object-oriented and intuitive way to handle file system tasks, or when you're working on code where readability and maintainability are crucial.\n", + "\n", + "**Other advantages include:**\n", + "\n", + "- Pathlib provides an object-oriented approach to handling file system paths.\n", + "- Compared to OS, Pathlib is more intuitive for file and directory operations.\n", + "- Pathlib is more readable for path manipulations.\n", + "\n", + "**Where subprocess shines**\n", + "\n", + "The basic ways of using subprocess are the run() function and the Popen class. The module also provides the call(), check_output(), and check_call() functions. Usually, you will want run(), or one of the two check functions when appropriate. However, when spawning parallel processes or communicating between subprocesses, Popen has a lot more power.\n", + "\n", + "You can think of run() as the simplest way to run a command (it’s all right there in the name) and Popen as the most fully featured way to call external commands. The run(), call(), check_output(), and check_call() functions are all wrappers around the Popen class.\n", + "\n", + "**Run**\n", + "\n", + "The run() function is the recommended approach to invoking subprocesses.
 It runs the command, waits for it to complete, and then returns a CompletedProcess instance that contains information about the process. The examples below assume a Unix-like system such as Linux or macOS, where echo is available as a program.\n", + "\n", + "Using run() to execute the echo command:\n", + "\n", + "```python\n", + "result_run = subprocess.run(['echo', 'Hello, World!'], capture_output=True, text=True)\n", + "result_run.stdout.strip()  # Extract stdout and strip any extra whitespace\n", + "```\n", + "\n", + "Output:\n", + "\n", + "'Hello, World!'\n", + "\n", + "**Call**\n", + "\n", + "The call() function runs a command, waits for it to complete, and then returns its return code. call() belongs to the older API and run() should be used in new code, but it’s good to see how it works.\n", + "\n", + "Using call() to execute the echo command:\n", + "\n", + "```python\n", + "return_code_call = subprocess.call(['echo', 'Hello from call!'])\n", + "return_code_call\n", + "```\n", + "\n", + "Output:\n", + "\n", + "0\n", + "\n", + "The returned value 0 indicates that the command was executed successfully.\n", + "\n", + "**check_call and check_output**\n", + "\n", + "Use check_call() when you need only the status of a command, and check_output() when you also need its output. These are good for situations such as file I/O, where a file might not exist or the operation might otherwise fail.
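\n", + "\n", + "When a command does fail, the check functions report it by raising subprocess.CalledProcessError. The sketch below is a minimal illustration under stated assumptions: the current interpreter (sys.executable) stands in for a failing command, and the exit code 2 is an arbitrary choice. It also shows that run() raises the same exception when check=True is passed.\n", + "\n", + "```python\n", + "import subprocess\n", + "import sys\n", + "\n", + "# Stand-in for a failing command: a child Python process that exits with code 2\n", + "failing_cmd = [sys.executable, '-c', 'import sys; sys.exit(2)']\n", + "failed_codes = []\n", + "\n", + "try:\n", + "    subprocess.check_call(failing_cmd)\n", + "except subprocess.CalledProcessError as err:\n", + "    failed_codes.append(err.returncode)\n", + "    print('check_call failed with exit code', err.returncode)\n", + "\n", + "# run() raises the same exception when check=True is passed\n", + "try:\n", + "    subprocess.run(failing_cmd, check=True)\n", + "except subprocess.CalledProcessError as err:\n", + "    failed_codes.append(err.returncode)\n", + "    print('run failed with exit code', err.returncode)\n", + "```\n", + "\n", + "Catching CalledProcessError lets a script report the problem and carry on instead of stopping with a traceback.\n", + "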
\n", + "\n", + "The check_call() function is similar to call() but raises a CalledProcessError exception if the command returns a non-zero exit code.\n", + "\n", + "Using check_call() to execute the echo command:\n", + "\n", + "```python\n", + "return_code_check_call = subprocess.check_call(['echo', 'Hello from check_call!'])\n", + "return_code_check_call\n", + "```\n", + "\n", + "Output:\n", + "\n", + "0\n", + "\n", + "The returned value 0 indicates that the command was executed successfully.\n", + "\n", + "Using check_output() to execute the echo command:\n", + "\n", + "```python\n", + "output_check_output = subprocess.check_output(['echo', 'Hello from check_output!'], text=True)\n", + "output_check_output.strip()  # Extract stdout and strip any extra whitespace\n", + "```\n", + "\n", + "Output:\n", + "\n", + "'Hello from check_output!'\n", + "\n", + "Note: check_output() also raises a CalledProcessError if the command returns a non-zero exit code. For more on CalledProcessError, see the Exceptions section of the Python subprocess documentation.\n", + "\n", + "**Popen**\n", + "\n", + "Popen offers more advanced features than the functions above. It lets you spawn a new process, connect to its input, output, and error pipes, and obtain its return code.\n", + "\n", + "Using Popen to execute the echo command:\n", + "\n", + "```python\n", + "process_popen = subprocess.Popen(['echo', 'Hello from popen!'], stdout=subprocess.PIPE, text=True)\n", + "output_popen, _ = process_popen.communicate()  # wait for the process and read its stdout\n", + "output_popen.strip()  # Extract stdout and strip any extra whitespace\n", + "```\n", + "\n", + "Output:\n", + "\n", + "'Hello from popen!'\n", + "\n", + "**Pro tip**\n", + "\n", + "Popen is very useful when you need asynchronous behavior and the ability to pipe information between a subprocess and the Python program that started it. Imagine you want to start a long-running command in the background and then continue with other tasks in your script. Later on, you want to be able to check whether the process has finished.
 Here’s how you would do that using Popen.\n", + "\n", + "```python\n", + "import subprocess\n", + "import time\n", + "\n", + "# Use Popen for asynchronous behavior: start a command that runs for 5 seconds\n", + "process = subprocess.Popen(['sleep', '5'])\n", + "message_1 = \"The process is running in the background...\"\n", + "\n", + "# Give it a couple of seconds to demonstrate the asynchronous behavior\n", + "time.sleep(2)\n", + "\n", + "# Check if the process has finished (poll() returns None while it is still running)\n", + "if process.poll() is None:\n", + "    message_2 = \"The process is still running.\"\n", + "else:\n", + "    message_2 = \"The process has finished.\"\n", + "\n", + "print(message_1, message_2)\n", + "```\n", + "\n", + "Output:\n", + "\n", + "The process is running in the background... The process is still running.\n", + "\n", + "The process runs in the background while the script continues with other tasks (in this case, simply waiting for two seconds). The script then checks whether the process is still running. The check happens after 2 seconds but the sleep command runs for 5 seconds, so poll() returns None, confirming that the subprocess has not finished running." ] } ], "metadata": {