Dear Bill
It is great that you are experimenting with file systems and large partitions. This is an area where things start to get quite complex, and there is no single best-practice solution. Linux is a great tool, but it is by no means the only option; some higher-end NAS and SAN products may provide a better and easier-to-administer solution. 3ware make good cards, but if performance is not a high priority, Linux software RAID is also fairly capable.
I noticed a few things:
- The msdos partition table format is limited to 2TB partitions, so you will want to switch to GPT (use mklabel gpt).
- fdisk doesn't support large partitions; you should be using parted exclusively at this point.
There is plenty of information online about creating large partitions on Linux.
Alternatively you might want to consider using LVM. Given the size of the volume, it might make things easier to manage.
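For example, a minimal LVM sketch (the volume and group names here are hypothetical, and pvcreate/vgcreate destroy whatever is on the partition, so this assumes an empty /dev/sdb1):

```shell
# Turn the big partition into an LVM physical volume,
# pool it in a volume group, and carve out a logical volume.
pvcreate /dev/sdb1
vgcreate datavg /dev/sdb1              # "datavg" is an arbitrary name
lvcreate -l 100%FREE -n data datavg    # one LV using all free space
mkfs.xfs /dev/datavg/data
mount /dev/datavg/data /mnt
```

The win is that you can later grow the volume group by adding physical volumes and extend the logical volume (lvextend plus xfs_growfs) without ever repartitioning.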
(I'm sorry if the following is offensive, but I feel it must be said - this is supposed to be constructive criticism!)
I noticed that you are working for a lab and that you are the "Sr. Technical Advisor". As a consultant who deals with the deployment of various storage technologies on a regular basis, I find the story below personally frightening.
You mention that you've already lost data, and that you've decided to move to RAID6 because of this... RAID is not a backup solution. It never was and never will be, and using RAID6 is not going to improve the situation very much. Some kind of frequent, reliable off-line backup is the bare minimum in my mind for actual data protection. If the data is important (and it sounds like you have a fair amount of it at ~5.5TB), you need to have a policy in place to manage it. Not having one when dealing with important data might be construed as negligence, especially in the event of data loss. How much trouble is it if you lose the data? Does it represent man-years or man-hours?
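Even something as simple as a nightly rsync to a second machine is better than nothing (a sketch only -- "backuphost" is a hypothetical host with enough space; proper backup software or tape is preferable at 5.5TB):

```shell
# Mirror the data volume to another machine. --delete keeps the copy
# exact, so combine this with snapshots or rotation for real safety.
rsync -a --delete /raid/ backuphost:/backups/raid/
```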
The [apparent] lack of experience could cause a lot of problems down the road. For your own peace of mind, you might want to seriously consider hiring a consultant who specialises in storage if what you are putting together is intended for production - even if it is simply to confirm you are on the right track, both reliability- and performance-wise.
Other things you might want to consider:
- Are you using a UPS to protect the system, including the hard drives?
- Is the unit located in a secure cabinet in a secure room? [Is the data confidential?]
- Is there suitable air conditioning and environmental monitoring?
- Is there suitable fire protection?
- Is it possible to have an online backup to a different building/lab on campus?
I do wish you the best of luck but "thar be the dragons".
Kind regards,
Samuel

On 19/08/2009, at 3:56 AM, Sherman, William R wrote:

Hello,
I have a largish (~5.5TB) partition onto which I thought I had successfully created a filesystem -- and have been using it, but recently I've been given cause for concern that there are grave issues that have the potential for the loss of a large amount of data.
I was made aware of my potentially dangerous situation after rebooting my machine and finding that not only did my large partition not mount, but that the partition was listed by the OS as being only about 1.5TB.
I managed to convince the system that the partition was okay, and it mounts again, but I get this error (which I also get when rebooting):

  Aug 14 19:07:33 angelico kernel: I/O error in filesystem ("sdb1") meta-data dev sdb1 block 0x2ae7bffff ("xfs_read_buf") error 5 buf count 512
I'm convinced that this error is related to the incongruous partition size issue that I am seeing.
For some background information, here are some details about my system and what I learned when first creating the partition:
I am running Fedora-11 with a 3ware 9650SE RAID controller card with 8 Seagate 1TB drives configured as a ~5.8TB RAID-6 system. Using the RAID bios controls, I configured the system to have two logical units. One unit is for the OS (or actually OSes) and is only 96GB, the other is for all the data, and is (or is supposed to be) ~5.7TB.
The 96GB unit shows up as drive "/dev/sda" and I've partitioned it into four partitions to hold multiple OSes (though thus far, I've only put Fedora-11 on the machine). The 2nd logical unit is referred to as "/dev/sdb", and I've only created one partition to consume the entire unit.
Of course the reason I've created the RAID-6 system to for the protection of my data -- got burned once with RAID-5 when I was under the gun to get some data off a system with one bad drive -- a 2nd drive went bad, and all was lost. So, putting my data in peril is something I'm not interested in (if that isn't a tautology).
When creating the partitions, I first tried to use the partition manager that Fedora 11 provides, and it reported that it had successfully created a "5623675 MB" partition. But after the system came up, "/proc/partitions" disagreed, reporting:

  # cat /proc/partitions
  major minor    #blocks  name
     8     0   100663295  sda
     8     1    33554432  sda1
     8     2    33554432  sda2
     8     3    12582912  sda3
     8     4           1  sda4
     8     5    20964352  sda5
     8    16  5758648320  sdb
     8    17  1463676507  sdb1
So somewhere the partition size got messed up.
I then went to the partitioning tool I'm familiar with: fdisk. But when creating the partition, it only allowed me to create a partition of 267349 cylinders. If I tried anything larger, it reported "Value out of range." The man page for fdisk didn't say anything about partition size limitations, so I'm not sure why that's failing.
I then tried the "cfdisk" program, and it seemed to accept my larger partition size, but after writing the partition table and quitting, "/proc/partitions" reported the same values as above -- no change.
I then tried "parted", but didn't find an option that would allow for the "xfs" filesystem (or even ext3).
So I moved on to "sfdisk" which also seemed to accept my desired partition size, but again after writing the partition table and quitting, there was no change in the values of "/proc/partitions".
So I decided to give "parted" another try. And it seemed to work, though now when I look back at my notes there was something that I should have been concerned about in the output.
The essential commands I gave to "parted" are:

  # parted /dev/sdb
  (parted) rm 1
  (parted) mkpart primary 0 -0
  (parted) print
  (parted) quit
At this point, I checked "/proc/partitions" and was happy to see the results I had been shooting for:

  # cat /proc/partitions
  major minor    #blocks  name
     8     0   100663295  sda
     8     1    33554432  sda1
     8     2    33554432  sda2
     8     3    12582912  sda3
     8     4           1  sda4
     8     5    20964352  sda5
     8    16  5758648320  sdb
     8    17  5758648320  sdb1
From there I was able to create an 'xfs' filesystem and mount it:

  % df
  Filesystem           1K-blocks      Used   Available Use% Mounted on
  /dev/sda1             33027952   6217352    25132880  20% /
  tmpfs                  5076716      1820     5074896   1% /dev/shm
  /dev/sr0                   388       388           0 100% /media/Bluebirds
  /dev/sdb1           5758517248      4320  5758512928   1% /mnt
And I proceeded happily along until after a reboot, I got the above 'meta-data' error, and "/proc/partitions" was again reporting the partition to be only 1463676507 1K blocks.
So thinking my data might be lost anyway (and fortunately still having the original data available), I did an experiment. I tried just doing a re-partition with "parted", and then mounting it -- and that worked!

  # parted /dev/sdb
  (parted) print
  (parted) rm 1
  (parted) mkpart primary 0 -0
  (parted) print
  (parted) quit
  # cat /proc/partitions
  [...]
     8    17  5758648320  sdb1
  # mount /raid    (renamed from previous)
And it worked, and the files were still there!
But then I looked back and noticed a discrepancy I'd missed before. That final 'print' command to "parted" responded with:

  Model: AMCC 9650SE-8LP DISK (scsi)
  Disk /dev/sdb: 5897GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos

  Number  Start  End     Size    Type     File system  Flags
   1      512B   1499GB  1499GB  primary  xfs
That "1499GB" isn't right. In fact, I compared the two 'print' outputs (before removing and recreating the partition, and after), and they were identical. So "parted" reported that it had created the exact same partition as I started with -- but for some reason unknown to me, it still causes "/proc/partitions" to recognize the full disk as the partition, and the mount procedure works (whereas it failed before).
So, I'm at a loss for what the problem is. I looked at the data from the 3Ware RAID system, and it is reported as expected.
So, I'm hoping someone more familiar with "parted", and even more familiar with Linux partitions might be able to shed some light into the situation and illuminate my mind.
My apologies for the length of this email, but I figured all this would be requested (and useful) in determining the situation.
Thank you in advance,
Bill

--
Bill Sherman
Sr. Technical Advisor
Advanced Visualization Lab
Indiana University
address@hidden
_______________________________________________
bug-parted mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/bug-parted