Find out what files a program touched or used
12.05.26
## Find out what files a program touched or used

```
inotifywait -r -m -e access,open --format '%w%f' appfolder >> accessed_files.log
sort accessed_files.log | uniq > unique_accessed_files.txt
```
https://man7.org/linux/man-pages/man1/inotifywait.1.html
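The command above also logs directories, because opening a directory (for example to list it) raises an `open` event too. Here is a minimal sketch of one way to keep only file events, assuming the format string is changed to `'%e %w%f'` so that each line starts with the event names (directory events carry `ISDIR`). The function name `files_only` and the log name `events.log` are hypothetical.

```bash
# Sketch: keep only non-directory paths from an inotifywait log.
# Assumes the log was recorded with --format '%e %w%f', e.g.:
#   inotifywait -r -m -e access,open --format '%e %w%f' appfolder >> events.log
# so every line looks like "OPEN appfolder/file" or "OPEN,ISDIR appfolder/sub/".
# `files_only` and `events.log` are illustrative names, not part of inotify-tools.
files_only() {
    # Skip lines whose event field mentions ISDIR, then strip that field
    # so only the path (which may contain spaces) is printed.
    awk '$1 !~ /ISDIR/ { sub(/^[^ ]+ /, ""); print }' "$1"
}

# Usage: files_only events.log | sort -u > unique_accessed_files.txt
```

Paths that contain newlines would still break the one-path-per-line assumption.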
Anonymous
**Goal:**

1. Record the output of `inotifywait` (which lists accessed files) into a file.
2. Process that file to extract a unique list of all the files that were accessed.

**Prerequisites:**

* You have `inotify-tools` installed on your Linux system. If not, you can usually install it via your package manager (e.g., `sudo apt-get install inotify-tools` on Debian/Ubuntu, `sudo yum install inotify-tools` or `sudo dnf install inotify-tools` on Fedora/CentOS; on CentOS this may require enabling the EPEL repository).
* You have a directory structure you want to monitor.

---

### Step 1: Record `inotifywait` Output to a File

Run `inotifywait` long enough to capture a representative sample of your project's file access, and redirect its output to a file.

**Command:**

```bash
# Optionally navigate to your project's root directory first
# cd /path/to/your/project/

# Run inotifywait and redirect its output to a file.
# Replace '/path/to/your/project/' with the actual path to monitor.
# The output will be saved to 'accessed_files.log'.
inotifywait -r -m -e access,open --format '%w%f' /path/to/your/project/ > accessed_files.log
```

**Explanation:**

* `inotifywait`: The command-line tool from `inotify-tools`.
* `-r`: **Recursive.** Watch the directory and all its subdirectories.
* `-m`: **Monitor.** Keep running indefinitely until you manually stop it (e.g., by pressing `Ctrl+C`).
* `-e access,open`: **Events to watch.** We're specifically interested in `access` (a file was read) and `open` (a file or directory was opened). You could add `modify` or `close_write` if you want to track changes as well, but for "what files are *used*", `access` and `open` are usually sufficient.
* `--format '%w%f'`: **Output format.**
    * `%w`: The watched directory in which the event occurred (with `-r`, this can be a subdirectory of the path you specified).
    * `%f`: The name of the file within that directory that caused the event.
    * Together, `%w%f` gives the path to the accessed file, starting with the path you passed on the command line.
* `/path/to/your/project/`: **The directory to monitor.** Make sure to replace this with the actual path to your project.
* `>`: **Redirect standard output.** This takes the output that `inotifywait` would normally print to your terminal and writes it into the specified file.
* `accessed_files.log`: **The output file.** This file will be created in your current directory (or overwritten if it already exists).

**How to use this:**

1. **Open a terminal.**
2. **Run the command above.**
3. **Let it run** while you perform the typical operations of your program, run your test suite, or use your application as you normally would for a representative period. The more workflows you exercise, the more complete the data.
4. **Press `Ctrl+C`** in the terminal where `inotifywait` is running to stop it.

Now, `accessed_files.log` contains one path per line for every access or open event, so the same path usually appears many times. Directories that were opened are logged as well.

---

### Step 2: Extract Unique Files from the Log

Once `inotifywait` has stopped, process `accessed_files.log` to get a clean list of unique files.

**Command (using standard Linux utilities):**

```bash
# Process the log file to get a unique list of files
sort accessed_files.log | uniq > unique_accessed_files.txt
```

**Explanation:**

* `sort accessed_files.log`: Sorts the lines of `accessed_files.log`. Sorting is a prerequisite for `uniq`.
* `|`: **Pipe.** This sends the output of the `sort` command as input to the `uniq` command.
* `uniq`: Filters out adjacent duplicate lines. Because `sort` has already put identical paths next to each other, `uniq` removes every repeat.
* `>`: **Redirect standard output.** This takes the output of `uniq` and writes it into a new file.
* `unique_accessed_files.txt`: **The output file.** This file will contain every path that was accessed or opened at least once during your `inotifywait` monitoring session, with no duplicates.
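The deduplication step can also be done in one command, and the same log shows how often each path was touched. The sketch below assumes a log with one path per line, as produced in Step 1; the function names `unique_paths` and `access_counts` are made up for illustration.

```bash
# Sketch: summarize an inotifywait log that holds one path per line.
# `unique_paths` and `access_counts` are illustrative names.

# Print each distinct path once; `sort -u` is equivalent to `sort | uniq`.
unique_paths() {
    LC_ALL=C sort -u -- "$1"
}

# Print "count path" lines, most frequently logged paths first.
access_counts() {
    LC_ALL=C sort -- "$1" | uniq -c | LC_ALL=C sort -k1,1nr
}

# Usage:
#   unique_paths accessed_files.log > unique_accessed_files.txt
#   access_counts accessed_files.log | head
```

Setting `LC_ALL=C` makes the sort order byte-based and predictable, which matters if the result is later compared against another sorted list.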
---

### Step 3: Identify Potentially Deletable Files

Now you have a list of files that *were* used (`unique_accessed_files.txt`). To find files that *can be deleted*, compare this list against the complete list of files in your project directory.

**Method:**

1. **Get a list of all files in your project:**

    ```bash
    # Use the same path spelling you passed to inotifywait,
    # so the paths in both lists match line for line:
    find /path/to/your/project/ -type f > all_project_files.txt
    ```

    * `find /path/to/your/project/ -type f`: Finds all regular files (`-type f`) within your project directory and its subdirectories.

2. **Compare the lists:** You can use `grep` or `comm` (or a script) for this. `comm` is very efficient for comparing sorted files.

    **Using `comm` (requires both lists to be sorted):**

    * Sort `all_project_files.txt`. `find` does not guarantee any particular order, so this step is required, and it must use the same locale as the `sort` in Step 2:

        ```bash
        sort all_project_files.txt -o all_project_files.txt
        ```

    * Compare the two sorted files:

        ```bash
        comm -23 all_project_files.txt unique_accessed_files.txt > potentially_unused_files.txt
        ```

        * `comm` outputs three columns:
            * Column 1: Lines unique to `file1` (our `all_project_files.txt`).
            * Column 2: Lines unique to `file2` (our `unique_accessed_files.txt`).
            * Column 3: Lines common to both files.
        * `-23`: Suppresses columns 2 and 3, leaving only column 1 (lines in `all_project_files.txt` that are NOT in `unique_accessed_files.txt`). These are your candidates for deletion.

**Result:** The `potentially_unused_files.txt` file will now contain the files that were present in your project but were *not* detected by `inotifywait` as being accessed or opened.

---

### Important Considerations & Next Steps:

* **Representative Run:** The accuracy of this method depends heavily on how thoroughly you exercise your application or tests while `inotifywait` is active. If you miss a workflow that reads a file, that file will appear unused.
* **False Positives:**
    * **Configuration:** Files only read under specific configuration settings might be missed.
    * **Build Artifacts:** Files generated by your build process that your program *uses* but doesn't explicitly `open`/`access` at runtime might be flagged.
    * **System/Framework Usage:** Some files might be accessed by underlying libraries or the OS in ways `inotifywait` doesn't catch, or that are outside the scope of your specified path.
    * **Caching:** If a file was read and cached in memory before monitoring started, later uses will not trigger any event on the file system.
* **What about files your program *writes*?** Writing produces no `access` event, so `-e access,open` tells you nothing about writes as such. If you also want to know which files were changed, you might need to:
    * Add the `modify` and `close_write` events to `inotifywait`.
    * Or, use a method that compares a file list before and after your run.
* **Verification:** **NEVER blindly delete files from `potentially_unused_files.txt`.**
    * **Review the list carefully.**
    * **Manually test your application** after removing a few candidate files to ensure no functionality breaks.
    * Consider moving files to a "quarantine" or "trash" folder for a few days before permanent deletion.

This process is a useful way to identify candidates, but human oversight and testing are crucial for safety.
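The comparison from Step 3 and the quarantine suggestion could be combined roughly as follows. This is a sketch under assumptions, not a hardened tool: `list_unused` and `quarantine` are hypothetical names, the directory must be spelled exactly as it was on the `inotifywait` command line, and paths containing newlines are not handled.

```bash
# Sketch: find files that never appear in the access log, then move them
# aside instead of deleting them. Function names are illustrative.

# list_unused DIR LOG: print regular files under DIR that are absent from LOG.
list_unused() {
    lu_all=$(mktemp)
    lu_used=$(mktemp)
    # Sort both lists under the same locale so comm sees a consistent order.
    find "$1" -type f | LC_ALL=C sort -u > "$lu_all"
    LC_ALL=C sort -u -- "$2" > "$lu_used"
    # Column 1 only: lines present in the full list but not in the log.
    LC_ALL=C comm -23 "$lu_all" "$lu_used"
    rm -f "$lu_all" "$lu_used"
}

# quarantine LIST DEST: move each file named in LIST below DEST,
# recreating its original path there so it can be restored later.
quarantine() {
    while IFS= read -r path; do
        [ -f "$path" ] || continue
        mkdir -p -- "$2/$(dirname -- "$path")"
        mv -- "$path" "$2/$path"
    done < "$1"
}

# Usage:
#   list_unused /path/to/your/project/ accessed_files.log > potentially_unused_files.txt
#   quarantine potentially_unused_files.txt /path/to/quarantine
```

Moving rather than deleting keeps the step reversible: if something breaks, the file can be copied back from the mirrored path under the quarantine directory.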
12.05.26
Anonymous