I am trying to create large dummy files on a drive using dd. I am currently doing this:

#!/bin/bash
writeFile(){ #$1 - destination directory/filename, $2 - source filepath $3 - blocksize, $4 - blockcount $5 - log file name

if [ "$#" -ne 5 ]; then
    echo "Bad number of args - Should be 5, not $#"
    return 1;
fi

dest_filepath=$1
src_filepath=$2
block_size=$3
block_count=$4
log_file=$5

int_regex='^[0-9]+$' 

file_size=$(($block_size * $block_count))
src_file_size=`ls -l $src_filepath | awk '{print $5}'`
full_iter=0
while [[ $file_size -ge $src_file_size ]]; do
    file_size=$((file_size - $src_file_size))
    full_iter=$((full_iter + 1))
done

section_block_count=$(($src_file_size / $block_size))
echo $section_block_count $block_size
topping_off_block_count=$(($file_size / $block_size))

dest_dir=$(dirname $dest_filepath)
if [ -d "$dest_dir" ] && [ -r $src_filepath ] && [[ $block_size =~ $int_regex ]] && [[ $block_count =~ $int_regex ]]; then
    data_written=0
    for (( i=0 ; i < $full_iter ; i=$((i+1)) )); do
        (time dd of=$dest_filepath if=$src_filepath bs=$block_size count=$section_block_count seek=$data_written) >> $log_file 2>&1 #Output going to external file
        data_written=$(($data_written + $src_file_size +1 ))
        echo $data_written
    done

    if [[ $file_size -gt 0 ]]; then
        (time dd of=$dest_filepath if=$src_filepath bs=$block_size count=$topping_off_block_count seek=$data_written) >> $log_file 2>&1 & #Output going to external file
    fi
    return 0;
fi

return 1;   
}

However, this isn't working: it's either writing from the src_filepath only once, or writing over the same part of the file multiple times, and I don't know how to find out the difference. In this particular case, I'm writing from a 256MB file 4 times to create a single 1GB file, but I want to keep it generic so that it works for any source and destination size.

The aim is to fragment a hard drive, and measure the output of dd (rate of transfer specifically) and the time it took.

I am on an embedded system with limited functionality, and the OS is a very cut-down version of Linux using BusyBox.

How do I alter this so that it will write the correct size file?

Yann
  • Why don't you just `cat` the file? Something like `for i in a b c d; do cat $file1 >> $file2; done`? You seem to have chosen an extremely complex way to get this done, what is your actual objective? – terdon Oct 28 '14 at 15:34
  • Try adding `conv=notrunc` to the `dd` lines. – Mark Plotnick Oct 28 '14 at 16:00
  • @MarkPlotnick I gave it a go, but apparently my busybox system dd doesn't have support for `conv` -.- – Yann Oct 28 '14 at 16:04
  • Ah, in that case, please [edit] your question and include exactly what you're trying to do. Please also specify your OS and shell language (I know you tagged as `bash` but it should also be mentioned in the question since you have no shebang line). – terdon Oct 28 '14 at 16:23
  • busybox != bash – ErlVolton Oct 28 '14 at 16:24
  • I agree with terdon – you seem to have gone out of your way to make this much more complicated than it needs to be. But (1) “it's either writing from the `src_filepath` only once, or writing over the same part of the file multiple times, I don't know how to find out the difference.” It’s writing over the same part of the file multiple times. You can debug things like this by inserting a `set -x` command before any statements whose execution you want to monitor. See [How to debug a bash script?](http://unix.stackexchange.com/q/155551/80216) – G-Man Says 'Reinstate Monica' Oct 28 '14 at 16:32
  • A couple of possible fixes: (2) Use the `seek=` option to `dd`, if your version supports it. (3) Rather than `dd of=$dest_filepath … >> $log_file 2>&1`, do `dd … >> $dest_filepath 2>> $log_file`. (4) A bit of general advice: always quote your shell variable references (e.g., `"$dest_filepath"` and `"$log_file"`) unless you have a good reason not to, and you’re sure you know what you’re doing. – G-Man Says 'Reinstate Monica' Oct 28 '14 at 16:33
  • @G-Man I'm writing from the same file, so to look at it, there would be no difference between it writing once and over the same bit 4 times. Also, I am using `seek`. I'm not sure I follow your third point, and it's getting a little long for a comment, why not explain it in an answer? – Yann Oct 28 '14 at 16:37
  • Without `conv=notrunc`, `dd` will truncate the output file every time it's run. Can you get a traditional `dd` executable for your system? – Mark Plotnick Oct 28 '14 at 16:58
  • i think what you want to do is `dd <file\n$(cat file file file file file)\nIN`. If you have a `tee` at your disposal which will handle `-` args, then `tee – mikeserv Dec 04 '14 at 11:54

1 Answer

Replying to the comments: conv=notrunc stops dd from truncating the output file, but it doesn't make dd seek to the end. (It leaves O_TRUNC out of the open(2) system call, but doesn't add O_APPEND.)
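
The difference is easy to see on a small scratch file. This is a minimal sketch, assuming a dd that supports conv= (such as GNU coreutils dd, which the asker's BusyBox build apparently is not); the file and variable names are made up for illustration.

```shell
# Scratch file for the demonstration (hypothetical, removed at the end).
f=$(mktemp)
printf 'AAAAAAAA' > "$f"

# conv=notrunc keeps the existing bytes, but dd still writes from offset 0.
printf 'BB' | dd of="$f" conv=notrunc 2>/dev/null
notrunc_result=$(cat "$f")    # BBAAAAAA: the first two bytes were overwritten

# Letting the shell open the file with >> (O_APPEND) puts the data at the end.
printf 'CC' | dd 2>/dev/null >> "$f"
append_result=$(cat "$f")     # BBAAAAAACC

echo "$notrunc_result $append_result"
rm -f "$f"
```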

Answering the question: If you insist on using dd instead of cat, then get the shell to open the output file for append, and have dd write to its stdout.

dd if=src bs=128k count=$count of=/dev/stdout >> dest 2>> log
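
Wrapped in a loop, the same idea builds a large file from repeated copies of the source. This is a rough sketch rather than a script tested on BusyBox: append_copies and its parameter order are hypothetical, and it relies only on the if= and bs= operands plus the shell's >> redirection.

```shell
# append_copies SRC DEST COPIES BLOCK_SIZE LOG   (hypothetical helper)
# Appends COPIES full copies of SRC to DEST; dd's statistics go to LOG.
append_copies() {
    src=$1 dest=$2 copies=$3 bs=$4 log=$5
    : > "$dest"                      # start from an empty destination file
    i=0
    while [ "$i" -lt "$copies" ]; do
        # The shell opens DEST for append, so each pass lands after the last.
        dd if="$src" bs="$bs" >> "$dest" 2>> "$log" || return 1
        i=$((i + 1))
    done
}
```

For the 256MB-to-1GB case in the question, this would be called as something like `append_copies src dest 4 128k log`.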

Also, if you're trying to fragment your drive, you could do a bunch of fallocate(1) allocations to use space, and then start using dd once the drive is near full. util-linux's fallocate program is a simple front-end to the fallocate(2) system call.

XFS, for example, will detect the open/append pattern and leave its speculatively preallocated space beyond EOF allocated for a few seconds after the file is closed. So on XFS, a loop that appends to the same file repeatedly won't produce as much fragmentation as writing many small files.

You're on an embedded system, so I assume you're not using XFS. Even so, with a decently smart filesystem you might still see less fragmentation from your close/reopen/write-more pattern than you'd expect. Maybe run sync between writes, so the FS allocates and writes out all your data before it learns there's more coming.

Peter Cordes