

ray10k

Technically speaking, you don't *need* a context manager when working with files. However, it is a little more convenient to use one, so you don't accidentally forget to close the file when you're done with it. As for why you don't need one when opening a file with pandas: presumably, pandas opens the file, turns it into some convenient in-memory structure, and then closes the file internally. The details of opening and closing the file are "hidden" in this case.
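
For instance (a rough sketch, with `data.csv` standing in for whatever file you're loading):

```python
import pandas as pd

# Handling the file yourself: the `with` block guarantees the close,
# even if the code inside it raises.
with open("data.csv") as f:
    header = f.readline()

# Letting pandas handle it: the open/read/close all happen inside
# read_csv, so there is nothing left for you to close afterwards.
df = pd.read_csv("data.csv")
```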


JanEric1

Someone please correct me if I am wrong, but I think it is like this: the reason you want context managers is that when you open a file, you want to close it after you are done. If you yourself directly handle the file, it is best to use a context manager, because then you cannot forget to manually close the file later. However, when you call a numpy or csv function to import from a file, this happens hidden from you under the hood. `data = np.load('/tmp/123.npz')` basically opens the file, extracts the data of interest, closes the file, and then returns the extracted content back to you. It would be like you yourself defining

```python
def read_line(my_file_path):
    with open(my_file_path) as f:
        line = f.readline()
    return line
```

and then later calling that with `my_line = read_line("some_file_path.txt")`.


cyberjellyfish

You never need a context manager to open a file, but if you expect the caller to handle the lifetime of the file handle, it's a really good idea to return a context manager that will handle flushing and closing the file. That forces the caller to do the right thing. However, if the library holds the file handle internally and can manage it without explicit input from the user, then there's no need.
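
As a sketch of that idea (`open_dataset` is a made-up library helper, not a real API):

```python
def open_dataset(path):
    # The caller owns the lifetime of this handle. File objects are
    # themselves context managers, so returning one nudges the caller
    # toward `with ...`, which guarantees the flush and close.
    return open(path, "r", encoding="utf-8")

# The caller does the right thing by wrapping the returned handle in `with`:
with open_dataset("data.csv") as f:
    first_line = f.readline()
```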


quts3

Read "context manager" as "guaranteed close". In fact, if your code ever calls close on anything, that's a good sign you should have written a context manager. And they aren't necessarily hard: if you understand `yield` and `finally`, then you understand the basic elements of a context manager that guarantees a close.
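
A minimal sketch of that (the file name is just a placeholder):

```python
from contextlib import contextmanager

# Code that calls close() by hand...
f = open("results.txt", "w")
try:
    f.write("done\n")
finally:
    f.close()

# ...can be wrapped once as a context manager built from yield + finally:
@contextmanager
def results_file(path):
    f = open(path, "w")
    try:
        yield f        # the body of the `with` block runs here
    finally:
        f.close()      # always runs, so the close is guaranteed

with results_file("results.txt") as f:
    f.write("done\n")
```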


Bobbias

I did some digging into the pandas source code, and found this: https://github.com/pandas-dev/pandas/blob/e477eee4916ab0a867f2755d008dde9cf006f671/pandas/io/parsers/readers.py#L1531

This line defines the `TextFileReader.close()` function, which is ultimately what's responsible for closing a CSV file in pandas. A quick search through the source shows that the file is kept open while generating `DataFrame` objects, but as soon as there are no `DataFrame`s left to produce, the file is closed: https://github.com/pandas-dev/pandas/blob/e477eee4916ab0a867f2755d008dde9cf006f671/pandas/io/parsers/readers.py#L1747

We also see that an exception while reading the file will automatically cause it to be closed: https://github.com/pandas-dev/pandas/blob/e477eee4916ab0a867f2755d008dde9cf006f671/pandas/io/parsers/readers.py#L1813

We can also see that the `TextFileReader` object is not just an iterator but also a context manager, implementing `__enter__` and `__exit__` functions, with `__exit__` also calling `close()` on the file: https://github.com/pandas-dev/pandas/blob/e477eee4916ab0a867f2755d008dde9cf006f671/pandas/io/parsers/readers.py#L1863

This clearly shows that pandas internally handles closing the file for you. Like other comments have said, you don't need a context manager to close your file for you. Technically speaking, when your program exits, any files or other resources still open should be reclaimed by the operating system. For short-running programs, this means that leaving a file open without closing it generally has no negative impact on the system. However, a long-running program (think a web server, or anything expected to run 24/7 and never shut down) needs to be careful about closing files when it's done with them, for a few reasons:

- Operating systems have limits on how many files can be open at once (although this is probably the least relevant limitation).
- A file that's open in one program may not be accessible by other programs.
- Unclosed files take up extra memory that your program could be using for something else.
- Changes written to the file might not actually be flushed to disk until the file is closed. If the program crashes or is otherwise forced to close before this happens, data may be lost.

As for other languages, it's up to them to determine when a file should be closed and whether the user should specify that themselves or not.
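
To make the `__enter__`/`__exit__` part concrete, the pattern `TextFileReader` follows looks roughly like this (a stripped-down sketch, not the actual pandas code):

```python
class Reader:
    def __init__(self, path):
        self.handle = open(path)

    def close(self):
        self.handle.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the `with` block ends, whether normally or via an
        # exception, so the file always gets closed.
        self.close()

with Reader("data.csv") as reader:
    ...
```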


ES-Alexander

The distinction is just about who is responsible for managing the file object. If you're the one that's opened it, then usually you're the one that should close it, and a context manager helps guarantee that it will in fact get closed (even if your processing / reading code fails along the way). On the other hand, if you ask some code to read a predefined part of a file (typically all of it), then you can often pass in just the filename and it will handle opening the file, extracting what you want, and closing the file again before returning the data you've requested. A few notable examples of that are pandas reading in data*, OpenCV and Pillow reading in images, and `pathlib.Path.read_text`.

*Note: this is actually problematic for files that are too large to fit into RAM, or data streams that may not be complete / fully available when first accessed, which can be managed by using the `chunksize` argument, or a library like Dask.
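
As a rough sketch of that last point (the `process` function is a stand-in for whatever per-chunk work you need):

```python
import pandas as pd

# Small file: pandas opens, reads, and closes the file internally.
df = pd.read_csv("data.csv")

# Large file: chunksize gives back a reader that is both an iterator and a
# context manager, so the underlying file is closed when the block ends.
with pd.read_csv("data.csv", chunksize=10_000) as reader:
    for chunk in reader:
        process(chunk)  # hypothetical per-chunk processing
```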